John Sanford’s NIH Presentation, Kirk Durston’s Paper

Posted on October 20, 2018 by stcordova

On Thursday 10/18/18 John Sanford presented at the prestigious Masur Auditorium of the National Institutes of Health:

https://calendar.nih.gov/app/MCalInfoView.aspx?EvtID=36417

Recordings are being edited, formatted and finalized and will be, God willing, released in due time.

On 10/19/18, the next day, John and I conferenced with Kirk Durston, the lead author of the following paper, which lays some of the groundwork for one of the purposes of the nested hierarchy in biology: it makes discovery of 3D protein structure possible because of the patterns of sequence diversity and similarity.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3524763/

Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families.
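To make the idea of "associations among aligned sites" concrete, here is a minimal sketch that scores how strongly two alignment columns co-vary using mutual information. This is a simpler stand-in chosen for illustration, not the k-modes attribute clustering algorithm the paper applies. The toy alignment and the function name `mutual_information` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two aligned sites.

    High values mean the residue at one site helps predict the residue
    at the other; zero means the sites vary independently.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_joint = c / n
        p_indep = (count_a[a] / n) * (count_b[b] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


# Hypothetical toy alignment (not real ubiquitin or transthyretin data).
# Rows are sequences, columns are aligned sites.
toy_alignment = ["MKDA", "MKDG", "MREA", "MREG"]
columns = list(zip(*toy_alignment))

print(mutual_information(columns[1], columns[2]))  # sites 1 and 2 co-vary: 1.0
print(mutual_information(columns[1], columns[3]))  # sites 1 and 3 independent: 0.0
print(mutual_information(columns[0], columns[1]))  # site 0 is invariant: 0.0
```

Note that the fully conserved site (column 0) scores zero against everything: a site that never varies cannot reveal which other sites it is associated with.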

84 thoughts on “John Sanford’s NIH Presentation, Kirk Durston’s Paper”

stcordova on October 22, 2018 at 4:36 pm said:

Allan,

Your reply was pretty good, much better than anyone here.

However, I should point out, small molecule binding could be the order of the day for proteins with stable folds like Topoisomerase. Topo has a comparably dense set of Post Translational Modifications to the Histones, so the sequences are under more functional constraint than you suppose, except the constraints are species specific.

From a less mathy perspective, if we take the analogy of a picture, we can understand Kirk’s work from a zoomed out perspective…

A completely blank white screen is in a low entropy condition, and is not very informative. A screen filled with random pixels (a high entropy condition) is also not very informative, at least at a superficial level (unless the “random” pixels weren’t random, but were some sort of decipherable code). Something in between minimal and maximal entropy might be informative, as in a picture that gives a recognizable image.

A completely conserved protein is like a blank screen, it has not much information on structure. A totally unconserved protein across species doesn’t really exist because otherwise we couldn’t classify it into a proper family. So something in-between, like Ubiquitin, that has parts conserved and unconserved, can reveal structural characteristics. I wouldn’t be too quick to say the other residues are under NO selection, maybe weaker selection.
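The screen analogy can be put in numbers with per-site Shannon entropy. The following is a minimal sketch on a hypothetical toy alignment, not real protein data; the function name `column_entropy` is made up for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one aligned site."""
    total = len(column)
    probs = [n / total for n in Counter(column).values()]
    return sum(p * math.log2(1 / p) for p in probs)


# Hypothetical toy alignment: rows are sequences, columns are aligned sites.
alignment = [
    "MKAL",
    "MKGV",
    "MRAI",
    "MRGL",
]

entropies = [column_entropy(col) for col in zip(*alignment)]
print(entropies)  # [0.0, 1.0, 1.0, 1.5]
```

The first site is the "blank screen": it never varies, so its entropy is zero. The remaining sites vary to different degrees, and only varying sites can show patterns of association with one another.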

But, from a Young Life Creation (YLC) perspective, this is adequate to explain a POSSIBLE reason for the patterns of diversity and similarity. It is as Dembski said, an embedded user manual.
DNA_Jock on October 22, 2018 at 4:57 pm said:

If a protein is near 100% conserved, it is disordered

Sal, I will happily insult the intelligence of anyone who believes this.
However, I have seen zero evidence to suggest that Kirk, or any other biologist, thinks something this obviously wrong.
The impression that Sal got, even after a three hour conversation with the poor guy, is not evidence of anything.
Cue Sal whining about my mis-interpreting what he meant. I gave you a chance to back away from “If a protein is near 100% conserved, it is disordered”, and you doubled down.
Allan Miller on October 22, 2018 at 8:06 pm said:

stcordova,

Topoisomerase binds DNA, which I wouldn’t call a ‘small molecule’. And to correct a later supposition, I wasn’t arguing that residues are under NO selection. But it is in the nature of substitution that, generally, almost everything can be substituted by something, of similar size and polarity or charge. There are squillions of ways of making an alpha helix, for example, and they aren’t dotted around unconnected in sequence space. If the only constraint is internal interaction, there are more degrees of freedom than if there are external interactions – particularly if multiple different systems interact with the protein, e.g. ubiquitin. I liken it to the http:// tag, which is arbitrary but not changeable ad hoc.
Rumraket on October 22, 2018 at 8:19 pm said:

Let’s not forget that many residues which are selected against are “only” lower in fitness, not necessarily nonfunctional or lethal.

When something has been strongly conserved over time, that doesn’t mean there aren’t other functional variants of the protein even in the immediate vicinity in sequence space. Rather they are usually just lower fitness variants which are strictly still capable of supporting the life and functionality of the host organism.

It is through such lower fitness zones that ancestor reconstruction-type experiments often show that present (levels of) functions have evolved.
stcordova on October 22, 2018 at 8:39 pm said:

Allan Miller:

Topoisomerase binds DNA, which I wouldn’t call a ‘small molecule’.

I was referring to the large number of post translational modifications on Topo which are obvious binding sites much like the many binding sites on Histone 3 for its post translational modifications.

Is the proper term binding site or some other term?
John Harshman on October 23, 2018 at 1:39 am said:

stcordova: So, notwithstanding John Harshman’s glib dismissals, other people, not just myself, believe Kirk has something to contribute.

To be clear: the glib dismissal was of you, not Kirk. He may well have something to contribute, but you don’t, and one can’t tell what he has to contribute by reading anything you post.
Entropy on October 23, 2018 at 2:26 am said:

Sal,
Sorry, but no decent scientist would believe that 100% conservation means that the protein is disordered. It just doesn’t follow. Consider that you might have misunderstood something about that work, because it’s obvious that 100% conservation doesn’t tell you anything about whether the 3D structure is stable or disordered, or anything in-between.
phoodoo on October 23, 2018 at 3:34 am said:

DNA_Jock: Sal, I will happily insult the intelligence of anyone who believes this.
However, I have seen zero evidence to suggest that Kirk, or any other biologist, thinks something this obviously wrong.
The impression that Sal got, even after a three hour conversation with the poor guy, is not evidence of anything.
Cue Sal whining about my mis-interpreting what he meant. I gave you a chance to back away from “If a protein is near 100% conserved, it is disordered”, and you doubled down.

Shame on you, mini-Alan.

Lizzie sure can pick em.
Allan Miller on October 23, 2018 at 4:26 am said:

stcordova: I was referring to the large number of post translational modifications on Topo which are obvious binding sites much like the many binding sites on Histone 3 for its post translational modifications.

Is the proper term binding site or some other term?

The proper term would probably be ‘substrate’. A particular residue in A is the target for a post translational modification, which is mediated by an enzyme B with binding sites for the small molecule and the neighbourhood of the attachment in A. This interaction does not place a constraint on A in itself, unlike the situation in ubiquitin. If the residue in A changes such that B can’t act there, it’s a change of the same character as one between two unmodified residues.
Alan Fox on October 23, 2018 at 8:07 am said:

phoodoo,
Make complaints about admins and moderation in the appropriate thread.
Alan Fox on October 23, 2018 at 1:52 pm said:

Moved a comment to guano. See my previous comment.
DNA_Jock on October 23, 2018 at 1:55 pm said:

Wrong thread, phoodoo.
If you want to try to explain WHY you didn’t like a particular comment, while staying within the rules, that might be on-topic.
ETA: Ninja’d! (My point being: phoodoo may not have understood my comment…)
stcordova on October 23, 2018 at 3:34 pm said:

John Harshman: To be clear: the glib dismissal was of you, not Kirk. He may well have something to contribute, but you don’t, and one can’t tell what he has to contribute by reading anything you post.

Well it’s fair game to mock what I say. But I’m the one who pointed out to Kirk Histone 3 and his eyes lit up. 🙂
Mung on October 23, 2018 at 5:44 pm said:

How very Halloween that sounds!
colewd on October 23, 2018 at 6:19 pm said:

stcordova,

Well it’s fair game to mock what I say. But I’m the one who pointed out to Kirk Histone 3 and his eyes lit up.

Histone 3 because of its sequence preservation?
Entropy on October 23, 2018 at 7:50 pm said:

stcordova:
Just as an FYI, I pointed out to Kirk and John Sanford, that in the course of my journey, I found out the Histone 3 protein is almost 100% identical in all mammals, and 99% identical between humans and Arabidopsis thaliana (a plant), and 90% similar to yeast. Where the heck did evolutionary theory predict that?

Look for the meaning of negative/purifying selection Sal.

You’re welcome

ETA: Did you mean Histone H3? If so, there’s more than one version in humans alone, not identical between them (thus not 100% in all mammals).
I tried a couple, and the best results I got:
It’s 99% identical between humans and mouse (thus not 100% in all mammals).
It’s 97% identical between human and A. thaliana.
It’s 88% identical between humans and yeast.
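For readers who want to see how a percent-identity figure like these is computed, here is a minimal sketch. It assumes the two sequences are already aligned to the same length, with `-` marking gaps, and it excludes gapped positions from the denominator (other conventions exist and give slightly different numbers). The sequences are hypothetical toy strings, not real histone H3 data, and `percent_identity` is an illustrative name.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, ungapped positions where both residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Keep only positions where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy sequences (assumed already aligned).
print(percent_identity("MKDAL", "MKDAV"))  # 4 of 5 match: 80.0
print(percent_identity("MK-AL", "MKDAV"))  # gap skipped, 3 of 4 match: 75.0
```

The result depends on which sequence variants are compared and how they are aligned, which is one reason different searches report different percentages for the "same" protein.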
stcordova on October 23, 2018 at 9:35 pm said:

colewd:
stcordova,

Histone 3 because of its sequence preservation?

Yes, in animals and plants. It’s like it didn’t evolve. One can’t build much of a nested hierarchy with it for plants and animals.

Also the salient feature is the Histone Tail. H3 is Histone 3 in the diagram. You can see the significance of the long tail sticking out (the 3D depiction is only by inference).
stcordova on October 23, 2018 at 9:36 pm said:

Histones
Alan Fox on October 23, 2018 at 9:48 pm said:

stcordova: You can see the significance of the long tail sticking out (the 3D depiction is only by inference).

It’s a cartoon representation, Sal. A model.
Rumraket on October 23, 2018 at 10:10 pm said:

stcordova: Yes, in animals and plants. It’s like it didn’t evolve.

How so? It’s conserved, so what?
stcordova on October 23, 2018 at 10:33 pm said:

How so? It’s conserved, so what?

It’s like the evolutionary clock stopped after it poofed on the scene. It’s a very critical protein, it’s not an enzyme, and it’s otherwise non-descript. It may as well be a random polypeptide. So…it seems some serious machinery is built to support the language it represents — the histone code. Otherwise the protein itself is non-descript.
stcordova on October 23, 2018 at 10:39 pm said:

Rumraket: It’s conserved, so what?

Kirk predicts that deep conservation like this means not folded. That’s structurally informative (like a user’s manual). Where does phylogeny predict this sort of relationship? Folded proteins, on the other hand, cluster in the ways Kirk points out in his paper, and the clustering also has 3D significance in terms of function. Where does phylogeny predict this?

If phylogeny doesn’t, then this serves as a user manual, hence is consistent with Dembski’s vision of steganography — hidden messages to mankind in biology.

THAT explains a possible reason for the nested hierarchical patterns — and in the case of histone 3 — the lack thereof.
Entropy on October 23, 2018 at 10:59 pm said:

stcordova,

Again, there’s more than one H3 just in humans. Not 100% identical to each other. Thus, not 100% conserved in mammals, since it’s not even 100% conserved within a single species.

Leaving aside that inability to find remote homologs doesn’t mean that a protein didn’t evolve, the Pfam PF00125, where all H3s belong, contains H2A, H2B, H3, and H4 proteins. In other words, all of those histones are related to each other, and, sorry, they’re not identical.

If we click on the Pfam tab in that link, we find that other domains are similar to PF00125, for example: CBFD_NFYB_HMF, TFIID-31kDa, TAF, TFIID_20kDa, CENP-T_C and CENP-S in Pfam. None of them identical. Additionally, Bromo_TP, CBFD_NFYB_HMF, CENP-S, CENP-T_C, CENP-W, DUF1931, PAF, TAF, TAFII28, TFIID-18kDa, TFIID-31kDa, TFIID_20kDa and YscO-like in SCOP. None of them identical.

This took a few minutes checking Sal. I know, I know. You’re a biophysics guy and, according to you, guys in biophysics don’t know how to search for literature or through protein and domain databases. Right?
DNA_Jock on October 23, 2018 at 11:12 pm said:

stcordova,

Interesting.
Would you say, Sal, that the conservation of the N-terminal tail (1 to ~38) is
a) significantly greater
b) significantly less
or
c) about the same
as the conservation of the C-terminal domain (~39 to 136)
?

Asking for a friend.
Entropy on October 23, 2018 at 11:25 pm said:

colewd:
Histone 3 because of its sequence preservation?

Here a pseudo-copy for you so that you won’t be fooled by Sal’s ignorant claims:

There’s more than one H3 just in humans. Not 100% identical to each other. Thus, not 100% conserved in mammals, since it’s not even 100% conserved within a single species.

Leaving aside that inability to find remote homologs doesn’t mean that a protein didn’t evolve, the Pfam PF00125, where all H3s belong, contains H2A, H2B, H3, and H4 proteins. In other words, all of those histones are related to each other, and, sorry, they’re not identical.

If we click on the Pfam tab in that link, we find that other domains are similar to PF00125, for example: CBFD_NFYB_HMF, TFIID-31kDa, TAF, TFIID_20kDa, CENP-T_C and CENP-S in Pfam. None of them identical. Additionally, Bromo_TP, CBFD_NFYB_HMF, CENP-S, CENP-T_C, CENP-W, DUF1931, PAF, TAF, TAFII28, TFIID-18kDa, TFIID-31kDa, TFIID_20kDa and YscO-like in SCOP. None of them identical.

This took a few minutes checking. I know, I know. Sal’s a biophysics guy and, according to him, guys in biophysics don’t know how to search for literature or through protein and domain databases. So he has an excuse for not finding these homologs during “the course of his journey” (which was said with some pride, as if it was quite an accomplishment, yet the guy didn’t even know that there’s more than one H3 in humans alone).
John Harshman on October 24, 2018 at 12:49 am said:

stcordova: Well it’s fair game to mock what I say. But I’m the one who pointed out to Kirk Histone 3 and his eyes lit up.

Boy, I’m really glad it’s fair game to mock what you say, when you say stuff like that. You rule, Sal!
Allan Miller on October 24, 2018 at 11:36 am said:

stcordova: It’s like the evolutionary clock stopped after it poofed on the scene. It’s a very critical protein, it’s not an enzyme, and it’s otherwise non-descript. It may as well be a random polypeptide. So…it seems some serious machinery is built to support the language it represents — the histone code. Otherwise the protein itself is non-descript.

It’s almost as if it arose, became embedded in DNA management, and ended up losing its early plasticity in surviving descendant lineages, preserved by purifying selection.
phoodoo on October 24, 2018 at 2:15 pm said:

Allan Miller: became embedded in DNA management,

DNA management? What’s that a euphemism for?
Allan Miller on October 24, 2018 at 3:52 pm said:

phoodoo: DNA management? What’s that a euphemism for?

Nothing. See also ‘control’, ‘switch’, ‘signalling’, ‘maintenance’ and so on. They aren’t euphemisms either. But I know you’re just itching to make some capital out of casual word choice, so fill your boots.
stcordova on October 24, 2018 at 4:28 pm said:

It was both my view and, more importantly, that of Kirk’s dissertation committee, that Kirk had a viable and novel tool to use bioinformatics to assist in structural biology. This is independent of one’s view of universal common descent.

From an experimental standpoint, it remains to be seen if Kirk’s idea is usable in the laboratory context and in biotech.

On a more philosophical note: is common descent a good explanation for the pattern, especially the higher order ones that Kirk discovered? If not, then it would seem Kirk’s discovery accords with Bill Dembski’s vision of embedded user manuals in biology. At this stage it’s too early to explore that question.

My immediate interest is to get Kirk’s work tested out in the lab. That may be forthcoming and I hope I’ll get assigned to work on it. In the meantime, my present work is on analyzing post translational modifications.

I’d like to thank all the participants for their comments.

When Dr. Sanford releases his NIH video, I’ll post it here at TSZ.
Entropy on October 25, 2018 at 1:12 am said:

stcordova:
On a more philosophical note: is common descent a good explanation for the pattern, especially the higher order ones that Kirk discovered?

You have a conceptual mess there, Sal. There’s no reason why common descent would have to explain why a highly disordered sequence should be highly preserved (I insist that this is evidently wrong, that either you misunderstood, or Kirk got it wrong, because we do know of highly conserved sequences that have very stable 3D structures). Predicting such a thing would belong to how proteins work, not their phylogenetic relationships. Expecting common descent to explain everything, even things it is not supposed to explain, is a philosophical blunder on your part, Sal.

stcordova:
If not, then it would seem Kirk’s discovery accords with Bill Dembski’s vision of embedded user manuals in biology. At this stage it’s too early to explore that question.

Not just too early, but too stupid. Sorry, but this is like saying that unless common descent explained how gravitation works, then god-did-it. Common descent is not supposed to explain gravity. Thus, it simply doesn’t follow.
dazz on October 25, 2018 at 2:16 am said:

Been reading the discussions over at peacefulscience on Durston’s paper “Measuring the functional sequence complexity of proteins” and most of the objections Prof. Swamidass presents seem to be identical to those against Puccio’s ID “theory”, only with more math involved. In fact, just like Puccio, Durston argues that extant proteins must be representative of the totality of sequence space and mostly arrived at by drift.

*sigh*
Allan Miller on October 25, 2018 at 9:34 am said:

dazz,

Yes, a neat way to make a sequence even more ‘miraculous’ would be simply to render its bearers extinct. As if by magic, sequence space loses a member, and functional proteins become even less likely.
stcordova on November 14, 2018 at 11:24 pm said:

Well, after a long awaited release here is Dr. Sanford’s 10/18/18 presentation at the National Institutes of Health!

FINALLY! Video of John Sanford’s 10/18/18 presentation at the NIH