How not to sample protein space

Mung has drawn our attention to a post by Kirk Durston at ENV. This is my initial reaction to his method for establishing the likelihood of generating a protein with AA permease (amino acid membrane transport) capability.

Durston: “Hazen’s equation has two unknowns for protein families: I(Ex) and M(Ex). However, I have published a method to solve for a minimum value of I(Ex) using actual data from the Protein Family database (Pfam) …”

Translation: I have published a method to solve for a minimum value of I(Ex) among proteins that presently exist.

I downloaded 16,267 sequences from Pfam for the AA permease protein family. After stripping out the duplicates, 11,056 unique sequences for AA Permease remained.

Translation: I took some proteins that actually exist. I implicitly assume that they are a representative, unbiased sample of all the AA permeases that could exist.
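To make the sampling worry concrete, here is a minimal Python sketch. It assumes that Durston's Pfam method rests on per-site Shannon entropies of an alignment, with functional information summed over sites as log2(20) minus the observed entropy at each site. It ignores gaps and sequence weighting and is not his code. The residue set `AVLI`, the ancestor `AAA` and the three-site length are hypothetical toy values.

```python
import itertools
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"     # the 20 standard residues
H_GROUND = math.log2(len(AMINO_ACIDS))   # ~4.32 bits per site if all 20 are equally likely


def site_entropy(column):
    """Shannon entropy (bits) of the residues observed at one aligned site."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def functional_bits(sequences):
    """Sum over sites of (H_ground - H_site), after dropping exact duplicates.

    Assumes equal-length, gap-free sequences and no sequence weighting.
    """
    unique = sorted(set(sequences))  # strip duplicates, as done with the Pfam download
    length = len(unique[0])
    if any(len(s) != length for s in unique):
        raise ValueError("sequences must be aligned to the same length")
    return sum(H_GROUND - site_entropy([s[i] for s in unique]) for i in range(length))


# Hypothetical toy "family": any of A, V, L or I works at each of 3 sites.
ALLOWED = "AVLI"
LENGTH = 3

# An unbiased sample: every functional sequence, each exactly once (4**3 = 64).
complete = ["".join(p) for p in itertools.product(ALLOWED, repeat=LENGTH)]

# A biased sample: one ancestor plus its single-substitution descendants (10 sequences).
ancestor = "AAA"
descendants = [ancestor] + [
    ancestor[:i] + r + ancestor[i + 1:]
    for i in range(LENGTH)
    for r in ALLOWED
    if r != ancestor[i]
]

exact = -math.log2(len(complete) / len(AMINO_ACIDS) ** LENGTH)  # Hazen: -log2(M/N)
print(f"exact value     : {exact:.2f} bits")
print(f"complete sample : {functional_bits(complete):.2f} bits")
print(f"related sample  : {functional_bits(descendants):.2f} bits")
```

In this toy case the complete sample recovers the exact value, log2(125) or about 6.97 bits, while the sample of close relatives reports about 8.9 bits for the very same set of functional sequences. Shared ancestry alone inflates the estimate.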

the results showed that a minimum of 466 [not to be confused with 433, the sequence length he plugs in later as N = 20^433] bits of functional information are required to code for AA permease.

Translation: the results show that the smallest number of bits in this minuscule and biased sample of the entire space is 466.

Using Hazen’s equation to solve for M(Ex), we find that M(Ex)/N is less than 10^-140 where N = 20^433.

Translation: starting from my extremely tiny sample of protein space, multiplying up any distortions (e.g. those due to common origin or evolution) and ignoring redundancy, modularity, exaptation, site-specific variations in constraint and the possibility of anything more economically specified than an existing protein, the chance of hitting a 466-bit AA permease by a mechanism not actually known in biology is – ta-dah! – 1 in 10^140.
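A quick check of the numbers quoted above, as a Python sketch. It assumes Hazen's definition I(Ex) = -log2(M(Ex)/N) and that N = 20^433 counts every sequence of length 433 over the 20 standard amino acids. It works in log10 to keep the magnitudes readable.

```python
import math

LOG10_2 = math.log10(2)


def log10_fraction(bits):
    """Hazen: I(Ex) = -log2(M(Ex)/N), so log10(M(Ex)/N) = -I(Ex) * log10(2)."""
    return -bits * LOG10_2


def bits_for_log10_fraction(log10_f):
    """Inverse: the functional information implied by a given log10(M(Ex)/N)."""
    return -log10_f / LOG10_2


LENGTH = 433                         # sequence length used in N = 20**433
log10_N = LENGTH * math.log10(20)    # log10 of the number of possible sequences

for bits in (433, 466):
    print(f"{bits} bits -> M(Ex)/N = 10^{log10_fraction(bits):.1f}")
print(f"M(Ex)/N = 10^-140 -> {bits_for_log10_fraction(-140):.1f} bits")
print(f"N = 20^{LENGTH} = 10^{log10_N:.1f}")

# Upper bound on the number of functional sequences implied by 466 bits,
# next to the size of the Pfam sample quoted in the post.
print(f"M(Ex) <= 10^{log10_N + log10_fraction(466):.1f}")
print(f"sample size = 10^{math.log10(11056):.1f}")
```

On those assumptions, a fraction below 10^-140 corresponds to roughly 465 bits or more, which matches Durston's 466, whereas 433 bits would give only about 10^-130. Even the bound this implies, up to about 10^423 functional sequences, dwarfs a sample of 11,056.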

441 thoughts on “How not to sample protein space”

  1. colewd:
    Allan Miller,

    This does not cite a single mutation but I am showing it to you because P 53 is a mission critical protein in the cell cycle and shows the mutationally sensitive areas.
    http://dx.doi.org/10.1177%2F1947601911408889

    It seems to mostly show how it confers advantages to tumor cells, not a particularly good example of mutations somehow ruining the cell cycle. Tumor cells are still alive and reproducing very well, so well in fact they might outcompete the normal cells of the body of a multicellular organism and thereby kill it. But that is irrelevant with respect to a hypothetical primordial and simpler form of cellular life, which would have been single-celled and not multicellular and thereby immune to tumors. In that respect your reference merely shows how mutations can make cells evolve.

  2. Allan Miller:
    Frankie,

    I see no good reason to dismiss the possibility.

    There isn’t any evidence for it. That is good enough for science. No one knows what it would contain and no one knows how it would function.

  3. colewd:
    Allan Miller,

    I need to think about this, but currently I am skeptical that we have identified a mechanism that can consistently tune at all. What is the mechanism that you think does this?

    Mutation and natural selection, as the reference you gave shows very well. Besides this, the long-term evolution experiment. The fitness of the organisms has been consistently improving for over 20 years and shows no signs of stopping. So mutation and selection provably and consistently tune organisms.

  4. Allan Miller,

    You just baldly stated his sample space was wrong. And if there isn’t any evidence for other AAps then science can’t just make them up.

    Also if one can assume evolution without the unnecessary and untestable baggage of Common Descent.

    I don’t have to show stochastic processes doing anything.

    We infer ID because 1) you can’t and 2) it matches the criteria.

  5. 1- Natural selection includes mutations
    2- Those mutations have to be happenstance occurrences- yet we have no idea how to make that determination
    3- NS eliminates, it doesn’t select and there is a huge difference between the 2- Ernst Mayr
    4- With NS whatever is good enough survives and gets the chance to reproduce. There is no driving towards a higher fitness.
    5- Whatever survives can be the fastest, slowest, biggest, smallest, longest, shortest, heaviest, lightest

    NS is nothing more than contingent serendipity

  6. Rumraket,

    Mutation and natural selection, as the reference you gave shows very well. Besides this, the long-term evolution experiment. The fitness of the organisms has been consistently improving for over 20 years and shows no signs of stopping. So mutation and selection provably and consistently tune organisms.

    What proteins were tuned in this experiment? Where was improved protein to protein interaction observed?

  7. colewd:
    Rumraket,

    What proteins were tuned in this experiment? Where was improved protein to protein interaction observed?

    http://www.nature.com/nature/journal/v461/n7268/extref/nature08480-s1.pdf

    This lists the known amino acid changes in the proteins of the E coli genome. I can’t answer the last question because it isn’t actually known what many of the mutated genes do. The thing we can say however, is that these mutations contribute to increased fitness of the E coli cells over the course of the experiment. So clearly something has been tuned by these mutations both inside the protein coding regions and in the regulatory regions of the genome.

    Here is the paper itself (behind a paywall, unfortunately): http://www.nature.com/nature/journal/v461/n7268/full/nature08480.html

  8. colewd,

    This does not cite a single mutation but I am showing it to you because P 53 is a mission critical protein in the cell cycle and shows the mutationally sensitive areas.

    I can only say again: P53 is mutationally sensitive now. There are all manner of co-evolutionary processes that have the capacity to lead to a modern protein being mutationally sensitive without its ancestral sequence being anywhere near so sensitive, if at all.

    There does seem to be a general unwillingness or incapacity among ID advocates to consider the possibilities available to evolution at all. I term it ‘evolution blindness’. Perhaps it is what tends people towards ID in the first place, or perhaps the tendency develops once ID has been embraced, I dunno. But the kinds of possibility I point to seem obvious, at least as in-principle mechanisms for achievement of the modern state. And yet I have to spell them out.

    The kind of gradual acquisition of ‘brittleness’ that I sketch must, surely, be accepted as a logical possibility that has at least the potential to account for any given non-negotiable modern sequence?

    And this is just the functional argument. When one adds in selection, allowing that many sequences may perform the function to a degree but only one or a few do it well, surely one could see that this is exactly what one would expect from selection? Historic tuning has located an approximate optimum configuration. All configurations including its ancestors will be detrimental against that standard. All changes are downhill. That does not mean that all changes always were downhill. That would make no sense, in a scenario where selection pushed it up the hill in the first place.

  9. And just to reiterate Rumraket’s point: P53 is a bad example: it acts to suppress unrestrained mitosis in animal cells, allowing for co-ordination of a soma. There is an uneasy truce among the cells of a soma: they forgo their replication for the opportunities afforded by enhancement of the survival of their genes in gametes. But the mechanism of suppression is only temporary or at least it can break.

    Meanwhile, suppression of mitosis in a single-celled organism, a reasonably inferred ancestor, would be unlikely to be advantageous. P53 is not central to the cell cycle on the broad scale.

  10. Allan Miller: There does seem to be a general unwillingness or incapacity among ID advocates to consider the possibilities available to evolution at all. I term it ‘evolution blindness’. Perhaps it is what tends people towards ID in the first place, or perhaps the tendency develops once ID has been embraced, I dunno.

    There is a simple, clear reason for ID-pushers’ ‘evolution blindness’, and said reason is that ID is one of the various forms of Creationism (other such forms are Young-Earth Creationism, and Day-Age Creationism, to name only two). What drives ID-pushers to reject evolution is, therefore, not sincere doubt which they’ve arrived at after sober examination of the data; rather, what drives ID-pushers to reject evolution is the realization that evolution is against their religion. To the Creationist mind, the wrongness of evolution is an unquestionable presupposition, every bit as unshakably Certain as the goodness of God. Evolution cannot be right, full stop. And whatever intellectually-gymnastic feats of pretzel-logic are required to ‘support’ that position in the face of contradicting evidence, said feats of pretzel-logic must be totally good and right and valid because evolution is just wrong, okay!?!!?.

    Any model of ID-pushers which regards them as genuine seekers after truth, rather than as religious zealots on a Crusade, is a fundamentally invalid model.

  11. Allan Miller,

    Yes, Allan, and what you have is also evolutionary blindness: you see whatever you want and you think your imagination is evidence.
    BTW ID is not anti-evolution.

    Perhaps if you had some evidence and some way to test your claims IDists would take note. Until then you don’t have anything to see.

  12. cubist,

    Wow, what another load of crap. ID is not anti-evolution.

    What drives people to reject evolutionism is simple: it makes untestable claims. I know that you want and need it to be something else but it isn’t. If you guys could test and validate your claims ID would be a non-starter, as one of ID’s claims is that yours doesn’t have a mechanism capable of producing the diversity of life observed, including biological systems and subsystems.

    So perhaps it is time that you guys work on that as opposed to whining about why people reject evolutionism.

  13. Allan Miller: There does seem to be a general unwillingness or incapacity among ID advocates to consider the possibilities available to evolution at all. I term it ‘evolution blindness’.

    ok, but the possibilities would be part of the search space, right?

  14. OM is asking a question that has been answered- the same way GAs make it through the spaces in order to solve the problems they are designed to solve- active searches, ie evolution by intelligent design.

  15. cubist: Any model of ID-pushers which regards them as genuine seekers after TRUTH, rather than as religious zealots on a Crusade, is a fundamentally invalid model.

    Fixed that for you.

  16. Frankie: evolution by intelligent design.

    What role does intelligent design play in evolution by intelligent design? Does it set a target protein that is then searched for?

  17. Allan Miller,

    And just to reiterate Rumraket’s point: P53 is a bad example: it acts to suppress unrestrained mitosis in animal cells, allowing for co-ordination of a soma. There is an uneasy truce among the cells of a soma: they forgo their replication for the opportunities afforded by enhancement of the survival of their genes in gametes. But the mechanism of suppression is only temporary or at least it can break.

    I just don’t see evidence that a simple untuned set of molecules can sustain life. I see evidence that it can’t. I think your alpha helix idea is very clever but what can an alpha helix on its own do? The Lenski experiment showed the enabling of an existing enzyme not the tuning of one. This is what I would expect. The size of the sequence space drifts the protein to non function if it allowed high mutation rates. Do you think that Rumraket’s mechanism of RMNS can produce a complex DNA sequence? I think the lack of a credible mechanism for large scale evolutionary change is why ID has emerged.

  18. Mung,

    What is the Allan Miller Approved way to sample protein space?

    Not This One.

    I’m not offering up a better way to do it at all. Just pointing out the flaws in Kirk Durston’s approach.

  19. Mung,

    ok, but the possibilities would be part of the search space, right?

    All real proteins would be in the space of all possible proteins, yes …

  20. Mung:
    What is the Allan Miller Approved way to sample protein space?

    I know, I’m not Allan Miller. But, if you’re trying to determine the actual density of all possible functions of proteins in amino acid sequence space, then it can’t be done, for the simple reason that it is context-specific. You’d have to test all sequences in all environments under all circumstances. There are too many variables, so it can’t be done.

  21. Patrick:
    Moved a comment to Guano: “accusing others of ignorance . . . is off topic”

    Truth hurts- deal with it. But I understand that you have to protect your own

  22. colewd: Do you think that Rumraket’s mechanism of RMNS can produce a complex DNA sequence?

    What is a complex DNA sequence? Define your terms. Give examples.

    I think the lack of a credible mechanism for large scale evolutionary change is why ID has emerged.

    I think religious fundamentalism is why ID has emerged.

    Phillip Johnson:
    “This [the intelligent design movement] isn’t really, and never has been, a debate about science, it’s about religion and philosophy.” World Magazine, 30 November 1996

    “The Intelligent Design movement starts with the recognition that ‘In the beginning was the Word,’ and ‘In the beginning God created.’ Establishing that point isn’t enough, but it is absolutely essential to the rest of the gospel message.” Foreword to Creation, Evolution, & Modern Science (2000)

    “Our strategy has been to change the subject a bit so that we can get the issue of intelligent design, which really means the reality of God, before the academic world and into the schools.” American Family Radio (10 January 2003)

  23. colewd,

    I just don’t see evidence that a simple untuned set of molecules can sustain life.

    My own view is that proteins came some considerable time after the beginning of life. A long time before now, but a long time (in millions of years) after the original problem of getting replication going. Once proteins became embedded in life, RNA-only organisms were quickly rendered extinct.

    RNA brings its own problems, but it has some very interesting properties.

    I think your alpha helix idea is very clever but what can an alpha helix on its own do?

    Not much, but it’s by no means the only regular structure formed by amino acids. I invoke it as an illustrative aid, to pursue the point that the apparently ‘digital’ nature of primary sequence is soon subsumed in a much more ‘analogue’ whole, where overall properties of a peptide string do not depend on any one fixed residue.

    People get hung up on enzymes. That is not the only function of protein. To try those helixes again, the property of binding to a membrane may simply help to stabilise it. This does not require exquisite specificity.

    The Lenski experiment showed the enabling of an existing enzyme not the tuning of one.

    It shows a bit of both. There are numerous changes in the Lenski flasks, beyond citrate metabolism.

    This is what I would expect. The size of the sequence space drifts the protein to non function if it allowed high mutation rates.

    It’s not the size of the sequence space, but its neighbourhood that matters. If mutation occurs at too high a rate, I guess it would be a problem. But this simply means that the kinds of protein we see today necessarily awaited the kinds of mutation rate we see today. If mutation rate placed a historic limit on protein size, specificity or complexity, then proteins found in that era would be those permitted by that limit, and not the modern equivalents.

    Do you think that Rumraket’s mechanism of RMNS can produce a complex DNA sequence?

    Certainly. Complexity is just a set of interacting parts. Any added component that enhances the survival of the new set will prosper as a linked whole. Of course complexity is not a universal, and is subject to constraint. But the kind of complexities that tend to exercise us – like … us – are not actually as complex as the basics of cellular metabolism.

    I think the lack of a credible mechanism for large scale evolutionary change is why ID has emerged.

    Credible is in the eye of the beholder. Doesn’t bother biologists much, bothers the religious enormously. I’m not sure what large scale evolutionary change is definitively out of scope for cumulative mutation, recombination, drift and selection. I’ve seen various proposals, but no hard reasons as to why.

  24. Rumraket: You’d have to test all sequences in all environments under all circumstances. There are too many variables, so it can’t be done.

    And why isn’t too many variables a problem for evolution, or at the very least for genetic algorithms being used as evidence for evolution?

    Is it yet another miracle that the actual environments that evolution has been able to test have led to life as we know it today? That seems amazing to me, and utterly fortuitous.

  25. Mung: Rumraket: You’d have to test all sequences in all environments under all circumstances. There are too many variables, so it can’t be done.

    And why isn’t too many variables a problem for evolution, or at the very least for genetic algorithms being used as evidence for evolution?

    You are confused again, Mung. This thread is about Kirk trying to give an estimate of the number of functional proteins among all possible amino acid sequences. This is what I’m saying can’t be done. The calculation method Kirk proposes to do this kind of sampling is not accomplishing what he wants.

    For evolution itself it is irrelevant, because evolution isn’t trying to determine how many functional proteins there are for use in some esoteric argument. Evolution isn’t writing papers or blog-posts and participating in debate.

    Mung: Is it yet another miracle that the actual environments that evolution has been able to test have led to life as we know it today? That seems amazing to me, and utterly fortuitous.

    This is another version of the fine-tuning argument. You’re essentially arguing it is an amazing coincidence that the laws of physics allow atoms to form and combine in such a way as to produce functional proteins, DNA and living organisms. An argument that is completely irrelevant to whether evolution or even a natural origin of life is possible. It’s a strange sort of moving of the goal posts. First the argument is about whether evolution is possible – and it is. Then it’s about if evolution can even get started, about the origin of life – we don’t really know much about it. Then you go even further and say, but look at the entire cosmos and how it all works such that it seems life can exist and function.

    Why is the world such that it is even possible for life to exist and evolve? I don’t know and I don’t claim to know at all, in fact I have no fucking clue, do you?

    But I’d be REALLY amazed if we discovered life couldn’t possibly exist, yet did anyway.

  26. Frankie: That doesn’t even make any sense.

    I know what evolution is, more or less. If Intelligent Design Evolution is like GAs, solving problems they are designed to solve, then what problems are organisms designed to solve and how do you know that?

    Frankie: OM is asking a question that has been answered- the same way GAs make it through the spaces in order to solve the problems they are designed to solve- active searches, ie evolution by intelligent design.

    My original question was how does ID help life navigate protein space? If it’s the “same way” then I can point to the intelligent design bit in a programmed GA. The program. I can’t point to the same bit in biology.

    Where in the cell is the current state of the problem being solved by the GA stored Frankie?

  27. Rumraket,

    It seems to mostly show how it confers advantages to tumor cells, not a particularly good example of mutations somehow ruining the cell cycle. Tumor cells are still alive and reproducing very well, so well in fact they might outcompete the normal cells of the body of a multicellular organism and thereby kill it. But that is irrelevant with respect to a hypothetical primordial and simpler form of cellular life, which would have been single-celled and not multicellular and thereby immune to tumors. In that respect your reference merely shows how mutations can make cells evolve.

    My point here is that AA specificity goes up the more surfaces that a protein encounters. P 53 is an example of a nuclear protein and I used it because it is so well studied in the cancer field. We could use JNK, RAS, Beta Catenin, MEK etc. as well. I agree with your point that simple single-celled organisms don’t require these complex protein interactions. On the other hand they do require protein complexes that have multi-surface interactions. ATP synthase, the ribosome and the bacterial flagellar motor are examples. How many mutations to any of these multi-protein complexes will kill the organism?

  28. OM,

    If you don’t like ID then all you have to do is step up and find scientific support for the claims of your position. Trying to explain ID to someone like you is a losing proposition.

  29. colewd: How many mutations to any of these multi protein complexes will kill the organism?

    Probably the vast majority.

    How can I say this and still believe they evolved? What am I missing?

  30. Mung,

    Is it yet another miracle that the actual environments that evolution has been able to test have led to life as we know it today? That seems amazing to me, and utterly fortuitous.

    Life? Huh. Thinks it’s all that. Looka’ me, I can replicate and stuff.

    When it can operate without water, or in space, then I’ll be impressed. So far, it’s a thin scum on a middling sized rock, occupying a fairly narrow set of watered environments, though admittedly with elaborate contrivance.

  31. Patrick:
    Moved another comment to Guano. Address the post, not the poster.

    And more BS- why is it OK to attack IDists, then?

  32. Frankie: If you don’t like ID then all you have to do is step up and find scientific support for the claims of your position.

    I don’t need to do that, it’s already been done. Even if you don’t like it.

  33. Rumraket,

    How can I say this and still believe they evolved? What am I missing?

    They could have evolved, but does this raise your skepticism that the mechanism is the blind watchmaker?

  34. OMagain: Where in the cell is the current state of the problem being solved by the GA stored Frankie?

    I know the answer to this one! It’s in the RAM.

    HT: Salvador

  35. What is OM talking about? No one knows how to test the claim that ATP synthase arose via stochastic processes. The same goes for all biological systems and subsystems.

    But if OM thinks it knows better please provide references and we will consider them.

  36. colewd,

    Is there a difference?

    Of course. If the modern system is less capable of change than the ancient one, of course there is a difference. Which one must surely accept as a logical possibility that would still lead to what we see. Sequences evolve in the presence of each other, and can be ‘pinned’ in place by interactions not present at the start.

    Ubiquitin, for example, which Kirk goes into in the J Med Model paper linked. It is … ubiquitous. Many systems depend upon recognising it, and therefore it cannot be changed without screwing one or more of them. But it does not have a strongly sequence-dependent function as such. It is as ‘specific’ as the http:// tag, which could once have been anything non-wordy, but no longer can. The systems weren’t waiting for ubiquitin to be invented, with binding sites primed and ready.

  37. Allan Miller,

    My own view is that proteins came some considerable time after the beginning of life.

    So you agree that life is a non-physical entity. Good. Physical living organisms have to have proteins. Can’t be a living organism without them.
