How not to sample protein space

Mung has drawn our attention to a post by Kirk Durston at ENV. This is my initial reaction to his method to establish the likelihood of generating a protein with AA permease (amino acid membrane transport) capability.

Durston: “Hazen’s equation has two unknowns for protein families: I(Ex) and M(Ex). However, I have published a method to solve for a minimum value of I(Ex) using actual data from the Protein Family database (Pfam),

Translation: I have published a method to solve for a minimum value of I(Ex) among proteins that presently exist.

I downloaded 16,267 sequences from Pfam for the AA permease protein family. After stripping out the duplicates, 11,056 unique sequences for AA Permease remained.

Translation: I took some proteins that actually exist. I implicitly assume that they are a representative, unbiased sample of all the AA permeases that could exist.

the results showed that a minimum of 466 [think he means 433 – that’s the number he plugs in later anyway] bits of functional information are required to code for AA permease.

Translation: the results show that the smallest number of bits in this minuscule and biased sample of the entire space is 433.

Using Hazen’s equation to solve for M(Ex), we find that M(Ex)/N is less than 10^-140 where N = 20^433.

Translation: starting from my extremely tiny sample of protein space, multiplying up any distortions (eg those due to common origin or evolution) and ignoring redundancy, modularity, exaptation, site-specific variations in constraint and the possibility of anything more economically specified than an existing protein, the chance of hitting a 433-bit AA permease by a mechanism not actually known in biology is – ta-dah! – 1 in 10^140.
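As a rough check on the arithmetic in the quoted passage, here is a minimal sketch. It assumes Hazen's equation has the form I(Ex) = -log2(M(Ex)/N); the function name is made up for illustration and does not come from Durston's post. Under that assumption the bit count converts directly into a fraction of sequence space.

```python
import math

def log10_functional_fraction(bits):
    """log10 of M(Ex)/N, assuming I(Ex) = -log2(M(Ex)/N)."""
    return -bits * math.log10(2)

# The two bit counts mentioned in the post.
for bits in (433, 466):
    print(f"{bits} bits -> M(Ex)/N ~ 10^{log10_functional_fraction(bits):.1f}")
```

If the assumed form is right, 466 bits gives roughly 10^-140 and 433 bits roughly 10^-130. Either way the conversion only turns bits into a fraction; the bits themselves still come from the sample, which is where the objection lies.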

441 thoughts on “How not to sample protein space”

  1. Kirk,

    Evolutionary sampling of sequence space

    For those who have experience writing genetic algorithms, you know that if we just use mutations, we will be stuck sampling only nearby sequence space. Crossover, however, can drop us into a completely different area of sequence space from which a new search area can unfold. Deletions can prune the size of sequence space, giving us simpler solutions. So if evolution can utilize crossover and deletions, it is not stuck sampling ever-widening circles of a local area of sequence space. The question is, how effective has its sampling been?

    An analogy would be with dropping on an island, and exploring it thoroughly (or at least ascending the nearby peaks), and using this as a guide to the frequency of islands in the sea. Crossover largely does the former – the chimeric ends of sequences being recombined both typically descend from a single ancestral sequence. It takes part-solutions with different subsequent histories, each of which has proven its worth independently, and generates ‘novel’ solutions in a less hit-and-miss way than mutation. But it is still ‘on the island’.

    The recombinational method I think most undermines the case, however, involves integration of sequence between different proteins. Again, the protein set at any given time has been enriched in those sequences that are at least not inimical to function. But instead of reciprocal swap of two ends of the same ancestral sequence, one can get migration of entire sub-ORF modules. This, I think takes one out of the local region explorable by point mutation and crossover. If (for example) an amphipathic helix exists in one protein, copying it to another makes that protein membrane-bound in one multi-residue step, but without the massive improbability associated with that many simultaneous substitutions.

    The probability in such arenas is a dependent one, of that module existing and being inserted at a given site.
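The distinction drawn above between point mutation, crossover of related sequences, and transfer of a module from a different protein can be sketched as a toy. Everything here is an assumption for illustration: random strings stand in for proteins, `transplant_module` simply overwrites a window, and Hamming distance from the ancestor stands in for "how far from the island" a variant lands.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def random_sequence(length, rng):
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

def point_mutation(seq, rng):
    """Substitute one residue with a different one."""
    i = rng.randrange(len(seq))
    new = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[i]])
    return seq[:i] + new + seq[i + 1:]

def crossover(a, b, rng):
    """Single-point crossover: head of a joined to tail of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def transplant_module(host, donor, start, length):
    """Toy module transfer: overwrite a window of host with the donor's window."""
    return host[:start] + donor[start:start + length] + host[start + length:]

rng = random.Random(0)
ancestor = random_sequence(60, rng)

# Two lineages that each drift three point mutations away from the ancestor.
lineage_a = lineage_b = ancestor
for _ in range(3):
    lineage_a = point_mutation(lineage_a, rng)
    lineage_b = point_mutation(lineage_b, rng)

recombinant = crossover(lineage_a, lineage_b, rng)
unrelated = random_sequence(60, rng)
chimera = transplant_module(ancestor, unrelated, start=10, length=20)

print("one point mutation:", hamming(point_mutation(ancestor, rng), ancestor))
print("crossover of relatives:", hamming(recombinant, ancestor))
print("module from unrelated protein:", hamming(chimera, ancestor))
```

The crossover product can differ from the ancestor only where one of its parents already did, so it stays 'on the island'. The transplanted window can change up to twenty residues in a single event, which is the multi-residue step described above.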

  2. Allan Miller,

    Despite stcordova’s protests, everything is made from DNA

    Nonsense. The first cell didn’t have its membrane made from its DNA. Does DNA make the Golgi apparatus? DNA makes mitochondria? Really?

    Maybe someday Allan will have some science to support his PoV on DNA. I doubt that will happen any time soon.

  3. Allan Miller:
    Mung,

    Leaving evolution to proceed through the ones that don’t die.

    Are you proposing a Principle of Universal Lethality of ATP Binding? On what grounds? You can imagine something being problematic, therefore it is problematic?

    ATP binding via designed AA sequences. Not one of those sequences arose via stochastic processes, and cells require much more than ATP binding.

    Lenski has demonstrated the severe limits of evolutionary processes. So you need more than your rhetoric.

  4. Here is a thought- instead of attacking Kirk why don’t you just show him/ us how easy it is for stochastic processes to produce proteins? That would be the scientific venue, anyway

  5. Frankie,

    Nonsense. The first cell didn’t have its membrane made from its DNA. Does DNA make the Golgi apparatus? DNA makes mitochondria? Really?

    Did you read me as saying that everything is made of DNA?

    Yes, DNA ‘makes’ both the Golgi apparatus and mitochondria. Unless you know different. If you suggest that they are ‘really’ made by enzymes (which are in turn ‘really’ made by other enzymes), I would suggest in return that you learn a little basic molecular biology.

  6. Frankie,

    Here is a thought- instead of attacking Kirk why don’t you just show him/ us how easy it is for stochastic processes to produce proteins? That would be the scientific venue, anyway

    The topic of the thread is in fact Kirk Durston’s method of sampling protein space, so that is naturally what I address. If you would like a thread about something else, off you go and write it.

  7. Rumraket: Then your line of reasoning basically collapses and you simply can’t use these calculations to estimate the density of function in sequence space.

    The above was written in response to a step in my procedure that removes insertions that occur only a minority of times, usually less than 10 or 15 percent. As I mentioned earlier, an insertion may serve some functional purpose that we may not know of.

    Why nothing ‘collapses’
    The objective of my research was to find an upper limit for M(Ex)/N, not a lower limit. Obviously, the more insertions I strip out, the higher (more probable) M(Ex)/N becomes. The idea is to look for the shortest, functional version of a protein within a protein family, not the most arcane, improbable version. I want to make it as easy as possible for evolution to find a member of that family, not as hard as possible.

  8. Allan Miller:
    Frankie,

    Did you read me as saying that everything is made of DNA?

    Yes, DNA ‘makes’ both the Golgi apparatus and mitochondria. Unless you know different. If you suggest that they are ‘really’ made by enzymes (which are in turn ‘really’ made by other enzymes), I would suggest in return that you learn a little basic molecular biology.

    Too bad that you don’t have any evidence to support your claim. Mitochondria replicate and make copies of themselves.

    Maybe someday you will have evidence for your claim but today is not that day

  9. Allan Miller:
    Frankie,

    The topic of the thread is in fact Kirk Durston’s method of sampling protein space, so that is naturally what I address. If you would like a thread about something else, off you go and write it.

    Yes and my suggestion is the way to refute his claim. Too bad that you cannot show that stochastic processes can produce any functional protein. I know that must hurt

  10. Kirk: Obviously, the more insertions I strip out, the higher (more probable) M(Ex)/N becomes.

    This is only true because your ‘method’ fails to account for the variability that is allowed, as evidenced by the existence of such insertions. If you accounted for this allowed variability, M(Ex)/N would go through the roof.

  11. the chance of hitting a 433-bit AA permease by a mechanism not actually known in biology is – ta-dah! – 1 in 10^140.

    And all you have to do to refute that is show that stochastic processes can produce such a protein. Yet you can’t

  12. Mung,

    From, of, what’s the difference?

    It depends whether you want to play silly equivocation games or not. Clearly there is a sense in which ‘from’ and ‘of’ are synonymous. Clearly that is not the sense I was intending. Ever heard of the term ‘charitable reading’?

    For the record, I am not saying that everything is made of DNA. How anyone could be so stupid as to think I would be so stupid is beyond me.

  13. OMagain:
    The adults are talking Frankie. Why don’t you go and play with a toaster?

    Do I hear a subvocalized “in a bathtub” at the end of that suggestion?

  14. Frankie,

    Too bad that you don’t have any evidence to support your claim. Mitochondria replicate and make copies of themselves.

    Using the mitochondrial genome to make their own tRNAs and many of their proteins, plus additional enzymes from the main genome in the nucleus. In both cases, coming from DNA.

    I love how you guys school molecular biologists in molecular biology. It’s soooo cute!

  15. Frankie,

    And all you have to do to refute that is show that stochastic processes can produce such a protein.

    That you think such a demonstration would refute Dr Durston’s case shows that you don’t understand the argument, on either side. Durston is taking a particular approach to sampling. I am addressing that.

  16. Allan Miller:
    Frankie,

    Using the mitochondrial genome to make their own tRNAs and many of their proteins, plus additional enzymes from the main genome in the nucleus. In both cases, from DNA.

    I love how you guys school molecular biologists in molecular biology. It’s soooo cute!

    LoL! Nothing of what you say is controversial nor being debated. DNA does not and cannot do anything without existing proteins and enzymes. And your last blurb is totally unsupported.

  17. Allan Miller:
    Frankie,

    That you think such a demonstration would refute Dr Durston’s case shows that you don’t understand the argument, on either side. Durston is taking a particular approach to sampling. I am addressing that.

    LoL! That you cannot demonstrate such a thing proves that your position has nothing, Allan.

  18. Patrick: Do I hear a subvocalized “in a bathtub” at the end of that suggestion?

    Yes you and OM should play with a plugged-in toaster while in a tub full of water.

    Yes adults are having a discussion which means OM needs to stay out of it.

    (prediction- my comment will be put in guano but the others will be allowed to stay)

  19. Kirk,

    The idea is to look for the shortest, functional version of a protein within a protein family, not the most arcane, improbable version. I want to make it as easy as possible for evolution to find a member of that family, not as hard as possible.

    It remains the fact that you are sampling an evolved portion of the space, which inevitably biases the set towards tuned modern function. Your map consists solely of a couple of yards around the highest point of the island. Clearly, that is a very restricted space, and the chance of parachuting blindly upon it astronomically small. But it says nothing about the capacity of evolution to land on the shore of such an island, which depends on their size and frequency.
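The effect of sampling only a closely related, evolved cluster can be sketched with a toy alignment. This is not Durston's published procedure: the estimator (log2(20) minus per-site Shannon entropy, summed over columns) is an assumption chosen for illustration, and the tolerated-residue sets, the 10% substitution rate and the sample sizes are all invented.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GROUND_BITS = math.log2(len(AMINO_ACIDS))

def site_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in Counter(column).values())

def estimated_bits(alignment):
    """Assumed estimator: sum over sites of log2(20) minus observed entropy."""
    return sum(GROUND_BITS - site_entropy(col) for col in zip(*alignment))

rng = random.Random(1)
LENGTH, SAMPLE, TOLERATED = 50, 500, 5

# Hypothetical ground truth: each site tolerates 5 of the 20 residues.
tolerated = [rng.sample(list(AMINO_ACIDS), TOLERATED) for _ in range(LENGTH)]
true_bits = LENGTH * (GROUND_BITS - math.log2(TOLERATED))

# Unbiased sample: every tolerated residue equally likely at every site.
unbiased = ["".join(rng.choice(site) for site in tolerated) for _ in range(SAMPLE)]

# Clustered sample: descendants of one ancestor, each site redrawn 10% of the time.
ancestor = [site[0] for site in tolerated]
clustered = [
    "".join(rng.choice(site) if rng.random() < 0.1 else anc
            for site, anc in zip(tolerated, ancestor))
    for _ in range(SAMPLE)
]

print(f"true bits:            {true_bits:.1f}")
print(f"unbiased sample:      {estimated_bits(unbiased):.1f}")
print(f"clustered descendants: {estimated_bits(clustered):.1f}")
```

In this toy the clustered sample sees less variation per site than the space actually allows, so the estimator reports more bits, and hence a smaller functional fraction, than the ground truth built into the simulation.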

  20. Allan Miller:
    Frankie,

    Using the mitochondrial genome to make their own tRNAs and many of their proteins, plus additional enzymes from the main genome in the nucleus. In both cases, coming from DNA.

    I love how you guys school molecular biologists in molecular biology. It’s soooo cute!

    Take a genome- any genome- just the DNA and see what happens. That is the easiest way to refute your claim, Allan.

  21. Allan Miller:
    Kirk,

    It remains the fact that you are sampling an evolved portion of the space, which inevitably biases the set towards tuned modern function. Your map consists solely of a couple of yards around the highest point of the island. Clearly, that is a very restricted space, and the chance of parachuting blindly upon it astronomically small. But it says nothing about the capacity of evolution to land on the shore of such an island, which depends on their size and frequency.

    You don’t know if it is an evolved portion. Evolved from what? Evolved how? You do realize that evolution by design is still evolution…

  22. Also, Frankie’s claim was originally

    Frankie: ATP binding via designed AA sequences. Not one of those sequences arose via stochastic processes…

    Which betrays quite the failure to understand what Keefe & Szostak did.

    Is “Mutagenic PCR” stochastic, Frankie?

  23. DNA_Jock:
    Also, Frankie’s claim was originally

    Which betrays quite the failure to understand what Keefe & Szostak did.

    Is “Mutagenic PCR” stochastic, Frankie?

    Then it should be easy for you to refute my claim. Mutagenic PCR is manmade. Did nature produce those sequences? No.

  24. Frankie,

    LoL! That you cannot demonstrate such a thing proves that your position has nothing, Allan.

    That you keep saying that and only that proves that you are, as usual, blowing smoke.

    I wonder if Dr Durston considers this imagined demonstration the only reasonable refutation of his figures. Kirk – care to comment?

  25. Allan Miller:
    Frankie,

    That you keep saying that and only that proves that you are, as usual, blowing smoke.

    I wonder if Dr Durston considers this imagined demonstration the only reasonable refutation of his figures. Kirk – care to comment?

    Whatever, Allan. It is clear that your position has no clue as to how proteins arose. That is the only point I am making.

  26. Frankie,

    Take a genome- any genome- just the DNA and see what happens. That is the easiest way to refute your claim, Allan.

    That does not refute ‘the claim’. Can you point to any enzyme or RNA whose sequence is not determined by the sequence of one or more stretches of DNA?

  27. Frankie,

    It is clear that your position has no clue as to how proteins arose.

    I have very well-formed ideas on how proteins arose. This thread, however, is about Dr Durston’s method. I realise you may wish to change the subject, but there you go.

  28. Frankie,

    You don’t know if it is an evolved portion. Evolved from what? Evolved how?

    What does it matter? Kirk is assuming evolution arguendo and investigating its limitations, specifically with respect to protein space. I am critiquing his approach.

  29. Allan,

    It remains the fact that you are sampling an evolved portion of the space, which inevitably biases the set towards tuned modern function. Your map consists solely of a couple of yards around the highest point of the island. Clearly, that is a very restricted space, and the chance of parachuting blindly upon it astronomically small. But it says nothing about the capacity of evolution to land on the shore of such an island, which depends on their size and frequency.

    Nicely put.

  30. Allan Miller:
    Frankie,

    That does not refute ‘the claim’. Can you point to any enzyme whose sequence is not determined by the sequence of one or more stretches of DNA?

    Of course it refutes your claim. DNA cannot do anything without pre-existing proteins and enzymes. DNA doesn’t do anything by itself- besides degrade.

    Can you point to any DNA that produced proteins without pre-existing proteins? No

  31. Allan Miller:
    Frankie,

    I have very well-formed ideas on how proteins arose. This thread, however, is about Dr Durston’s method. I realise you may wish to change the subject, but there you go.

    Test your ideas then. I will leave it at that.

  32. Allan Miller:
    Frankie,

    What does it matter? Kirk is assuming evolution arguendo and investigating its limitations, specifically with respect to protein space. I am critiquing his approach.

    And Lenski has shown evolution has severe limitations. The paper “Waiting for two Mutations” presents another huge obstacle for you

  33. Frankie,

    Of course it refutes your claim. DNA cannot do anything without pre-existing proteins and enzymes. DNA doesn’t do anything by itself- besides degrade.

    So what? All the pre-existing proteins and enzymes are made from DNA->RNA templates. All of them. It all comes from DNA. It does not have to come from the chromosome in the current cell for this to be true. A DNA copy in the parent cell furnishes the daughter cells with their initial load of enzymes and RNAs, which extract other enzymes, including more copies of themselves.

  34. Frankie,

    And Lenski has shown evolution has severe limitations. The paper “Waiting for two Mutations” presents another huge obstacle for you

    Neither of these is germane to the OP. I am awaiting a substantive comment from you on Dr Durston’s methodology for exploring the limits of evolution.

  35. Allan Miller: All the pre-existing proteins and enzymes are made from DNA->RNA templates


    That is your untestable opinion. How the heck is that even possible? Show your work

    A DNA copy in the parent cell furnishes the daughter cells with their initial load of enzymes and RNAs, which extract other enzymes, including more copies of themselves.

    Exactly! DNA didn’t do it. DNA can’t even replicate itself.

  36. Allan Miller:
    Frankie,

    Neither of these is germane to the OP. I am awaiting a substantive comment from you on Dr Durston’s methodology for exploring the limits of evolution.

    And we are awaiting substantive comment from anyone who can refute his claims. So far it isn’t looking so good.

  37. Sampling functional protein space is like sampling the space of functional locks and keys: one has some flexibility in looking for really easy to find lock-and-key combinations versus highly intricate ones.

    Why should evolution pick the more intricate and extravagant constructs? It’s the peacock’s tail problem. We have complex structures that should be selected against, not for. Darwin had it bass-ackward like a lot of his other illogical ideas which people drool over as if they were the product of genius.

    Darwin knew something was wrong with his theory when he saw the Peacock’s tail, but he never could bring himself to realize that tail symbolized what’s wrong with his theory.

  38. stcordova: Darwin knew something was wrong with his theory when he saw the Peacock’s tail, but he never could bring himself to realize that tail symbolized what’s wrong with his theory.

    Cool. Any other century-dead people’s minds you can read?

  39. Kirk: I want to make it as easy as possible for evolution to find a member of that family, not as hard as possible.

    Yeah, and that is all you’re doing. Locating a member of a particular family of proteins. You’re not estimating the density of all possible functional proteins in sequence space. You can’t from where you are standing.

  40. Proteins have binding and other interactions. What defines functional is the apparatus around the protein, much as a complex key is rather meaningless if it doesn’t correspond to a lock.

    Example: human insulin can be expressed in transgenic, genetically engineered bacteria. Is the insulin functional in the bacterial space? Does the question of “functionality” even have meaning for insulin in an E. coli? No.

    So how complex is the insulin molecule? Well, it really is a statement of how complex the machinery is that utilizes it as well, isn’t it? I mean — beta cells, tyrosine kinase receptors, regulation — oooh, not good if there is too much or too little insulin since death becomes a consequence.

    The question is how improbable is the emergence of something as complex as an insulin-regulated metabolism. This is like asking the improbability of Rube Goldberg machines.

    The specificity of binding interactions is a start. If they are highly specific (therefore improbable), when non-specific ones are possible, then why and how can they evolve?

    Additionally, protein space isn’t just about catalytic ability but the parts it connects to. For example, for the longest time it wasn’t understood why there is codon bias when different codons generate the same protein. Now that we know a little more about miRNAs and their targeting of mRNA transcripts etc., we know the genes expressing the protein are regulated by microRNAs that recognize the specific nucleotide sequences even though there is degeneracy in the protein code.

    Also, how does one define functional? Histones have N-terminal tails, and those tails are where post-translational epigenetic modification marks are placed. One can’t just willy-nilly say, “Hey, a histone that allows DNA to wrap around it is sufficient to define functionality, and therefore the protein space.” NOT SO.

    The post translational aspects of proteins enable residue specific changes vital to the organism. For example, Residue 27 on the protein called Histone 3 is highly essential to stem cell differentiation. We can create laundry lists of residues that are just dedicated to providing Random Access Memory marks and attendant addressing schemes to make cells work. It’s not just about making a catalytic action, it’s way more than that.

    So protein specificity:

    1. DNA specific gene bases despite codon degeneracy because of microRNA targeting and other mRNA regulatory mechanisms

    2. multiple ligand and receptor sites to enable connection with other proteins, RNAs and DNAs

    3. post translational epi-proteomic modifications that provide signalling capability

    4. who knows what else…

  41. Frankie,

    Me: All the pre-existing proteins and enzymes are made from DNA->RNA templates

    Frankie: That is your untestable opinion. How the heck is that even possible? Show your work

    You are just gibbering. Untestable opinion my arse. If you mutate the DNA the protein changes. Crack open any molecular biology textbook. Look up ‘transcription’. Then look up ‘translation’. All of the proteins and RNA in your cell come from DNA – as in, are template-copied into RNA and potentially translated in a ribosome. And that includes the polymerases and RNA of the transcription and translation machinery.

    DNA didn’t do it. DNA can’t even replicate itself.

    The fact remains that the proteins that replicate DNA come from DNA – as in, are template-specified by it. Where in hell else do you think proteins come from? Space?

    Try turning off transcription. Take some death cap. You’ll be fine for about 12 hours, as you use up your current load of protein. Then, you’ll start to feel decidedly unwell as the supply of proteins dries up. Then, in very short order, you’ll be as dead as a doornail. Because the proteins come from DNA, and death cap turns off the tap.

    This is basic molecular biology.

  42. Frankie,

    And we are awaiting substantive comment from anyone who can refute his claims. So far it isn’t looking so good.

    Use your scroll wheel. There are extensive multi-paragraph posts from me and others going into the method and its flaws, in detail. Kirk responded with multi-paragraph responses of his own, suggesting that he, at least, understood what was being said. So far, your own responses indicate simple incomprehension.

  43. Allan, to Frankie:

    So far, your own responses indicate simple incomprehension.

    A statement that applies not only to this thread but to Frankie’s entire history at TSZ.

  44. Allan Miller: For the record, I am not saying that everything is made of DNA. How anyone could be so stupid as to think I would be so stupid is beyond me.

    They don’t call us IDiots for no reason!

  45. Allan Miller,

    If you mutate the DNA the protein changes.

    Not necessarily but that isn’t even relevant. For DNA to do that requires an existing suite of proteins

    All of the proteins and RNA in your cell come from DNA

    That is your opinion. However given how cells replicate it is a given that everything in the cell is passed on and the DNA doesn’t recreate it.

    DNA doesn’t do anything on its own. That is the whole point. If you get DNA you still need all of those proteins or else nothing happens. And in the first cell DNA could not have made the proteins it requires.

    The fact remains that the proteins that replicate DNA come from DNA

    Or they came from the designer who knew what it takes to get a living cell.

    Without DNA polymerase you get nothing. And if you don’t already have it then you aren’t going to make it.

  46. Allan Miller:
    Frankie,

    Use your scroll wheel. There are extensive multi-paragraph posts from me and others going into the method and its flaws, in detail. Kirk responded with multi-paragraph responses of his own, suggesting that he, at least, understood what was being said. So far, your own responses indicate simple incomprehension.

    I read them and your OP. It all amounts to whining. Try to get your tripe in peer-review.
