Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular the assertion that if we observe 500 bits of it, this is a reliable indicator of Design, as here, at about the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t. There would seem to be three possible ways of arguing for it, and in the end, two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information.  In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function.  This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observation of more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function, but also includes cases where there would be a path along which function increases. In such cases, seeing 500 bits of functional information, we cannot conclude that it is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails, as it is not reliable.

(3) The third possibility is an additional condition added to the design inference. It does not simply define “complex functional information” as a case where we can define a level of function that makes the probability of the set less than 2^{-500}; it also declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information”. That additional condition allows us to safely conclude that normal evolutionary forces can be dismissed, by definition. But it leaves the reader to do the heavy lifting, as the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once they have done that, they will find that the additional step of concluding that the genotypes have “complex functional information” adds nothing to their knowledge. CFI becomes a useless add-on that sounds deep and mysterious but tells you nothing except what you already know. And there seems to be some indication that gpuccio does use this additional condition.
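For scale, the 500-bit criterion corresponds to a target region occupying less than 2^{-500} of sequence space. In protein terms, since each position can take one of 20 amino acids, that is roughly the information in 116 fully specified positions. A quick back-of-envelope calculation (my numbers, not gpuccio’s):

```python
import math

# Each fully specified amino-acid position carries log2(20) bits
bits_per_site = math.log2(20)            # ~4.32 bits
sites_for_500_bits = 500 / bits_per_site # positions needed to total 500 bits

print(round(bits_per_site, 2))           # 4.32
print(round(sites_for_500_bits, 1))      # 115.7
```

So a stretch of about 116 exactly required residues would, by itself, carry 500 bits; the dispute below is over what such a number does and does not license us to conclude.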

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function >=
Ex.”

which is identical to my definition, in particular my definition of functional information as the
upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)

We shall see later that there may be some ways in which gpuccio’s definition
is modified from Szostak’s. Jack Szostak and his co-authors never attempted any use of his definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.
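To fix ideas, Szostak’s quantity can be written as FI(Ex) = −log2 of the fraction of all possible sequences whose function is at least Ex. A minimal sketch (the activity levels below are a made-up toy set; real sequence spaces are far too large to enumerate):

```python
import math

def functional_information(levels, threshold):
    """Szostak-style functional information, in bits.

    `levels` holds one activity level per possible sequence (for real
    proteins this set is astronomically large; this is a toy stand-in).
    FI(Ex) = -log2( fraction of sequences with level >= Ex ).
    """
    n_functional = sum(1 for x in levels if x >= threshold)
    if n_functional == 0:
        raise ValueError("no sequence meets the threshold; FI is undefined")
    return -math.log2(n_functional / len(levels))

# Toy space of 1024 sequences, 2 of which meet the cutoff:
toy = [0.0] * 1022 + [0.9, 1.0]
print(functional_information(toy, 0.5))  # -log2(2/1024) = 9.0 bits
```

Note that nothing in this definition says anything about how the sequences below the cutoff are distributed, which will matter below.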

gpuccio seems to be making one of three possible arguments:

Possibility #1 That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

Use of such a theorem was attempted by William Dembski in his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2002). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI changed the specification before and after the evolutionary processes acted, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it.  gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agree with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, gpuccio does on the other hand associate the argument with Dembski here, by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

That figure being Dembski’s, this leaves it unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem that replaces it.

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us choose a cutoff level of function, which defines the set of sequences whose function exceeds that level, without any condition that the other sequences have zero function. Neither did Durston impose such a condition. And as we’ve seen, gpuccio associates his argument with theirs.

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.
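One way to see why the zero-function condition matters: on a smoothly graded toy landscape, the FI at the peak can comfortably exceed 500 bits even though every mutational step toward the peak increases function. A sketch (binary toy alphabet, all numbers invented for illustration):

```python
import math
from math import comb

L = 600  # length of a toy binary sequence; "function" is defined as the
         # fraction of positions matching one fixed target sequence

def fi(cutoff):
    """Szostak-style FI in bits: -log2 of the fraction of all 2**L
    sequences whose function level (matches/L) is >= cutoff."""
    k_min = math.ceil(cutoff * L)
    n_func = sum(comb(L, k) for k in range(k_min, L + 1))
    return -math.log2(n_func / 2**L)

# The exact target has FI = 600 bits, comfortably over 500 ...
print(fi(1.0))            # 600.0
# ... yet the landscape is one smooth hill: every mutation toward the
# target raises the function level, so selection has an uphill path.
print(round(fi(0.9), 1))
```

High FI at the top of the hill, by itself, says nothing about whether the slopes leading up to it exist.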

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal than Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements.  And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it be achieved by natural evolutionary forces (see my discussion of this here, in the section on “Dembski’s revised CSI argument”, which refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be imposing a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each being counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.
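The difference between the two weightings is easy to see numerically. In a toy example (all numbers invented), the same target set can have high FI under Szostak’s equal weighting while being quite probable under a distribution reflecting an evolutionary process that climbs toward it:

```python
import math

n_sequences = 2**20     # toy genotype space
target_size = 4         # genotypes meeting the function cutoff

# Szostak: every sequence is counted equally
fi_szostak = -math.log2(target_size / n_sequences)       # 18.0 bits

# Dembski-style: probability of landing in the target set under the
# evolutionary process itself (a made-up figure, standing in for a
# process that selection steers toward the target)
p_target_under_evolution = 0.5
fi_dembski_style = -math.log2(p_target_under_evolution)  # 1.0 bit

print(fi_szostak, fi_dembski_style)
```

The design inference needs the second quantity to be large; the first quantity being large does not, by itself, establish that.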

gpuccio has a similar condition in the requirements for concluding that complex
functional information is present:  We can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observation of 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) simply does not hold. Or if it does, it is as a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that
are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

1,971 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

  1. colewd: Joe Felsenstein,

    My reading is that when it is possible for NS to get to sequences in the target set by following a path of increasing function, gpuccio does not count those cases as complex functional information. Do you read gpuccio as counting those as complex functional information.

    Bill Cole: I just read parts of a couple of his ops and I think he agrees with you here. He would agree that NS can potentially migrate around an isle of function. When he returns to active commenting at UD I will verify this.

    Difficult, because gpuccio does not explicitly state that he excludes FI introduced by natural selection. But I think Joe’s reading is closer to the truth. This part summarizes his thoughts on the matter:

    Joe Felsenstein says:

    “Or has gpuccio restricted the 500-Bits-Rule somehow, such as requiring that all sequences outside of the target set have no function at all? ”

    I have not restricted anything. A 500 bits function requires at least 500 specific bits to be implemented. So, the function is not there if those bits are not there. If someone can show that the function can be implemented with, say, 100 bits, then the functional complexity of the function is 100 bits, and not 500 bits.

    So, if I observe a function that requires 500 bits to be there, there is no gradual way of implementing it. As, in the thief example, there is no gradual way to find the key to the big safe by step by step attempts.

    The same is true for a new protein, unrelated to existing ones at the sequence level, and with a new function: if the transition to the new functional protein requires at least 500 new bits of functional information, it cannot be achieved by gradual increasingly functional steps. Like the key for the big safe.

    This is a description of a binary all-or-nothing function, not a quantitative one as required by the Hazen/Szostak definition. This allows him to dismiss any inconvenient demonstrations of experimental evolution introducing significant amounts of FI, by declaring it either “below the threshold of a naturally selectable function” or “tweaking an existing function”. The latter is taken care of by the cryptic requirement that all this FI be “new and original”. In gpuccio’s view, a protein can never cross the magic line from non-function to function unaided. Hence, although he does not appear to exclude NS as a source of FI in principle, he has raised the bar to effectively achieve this very outcome.

    But this is just my reading of his words. It would certainly be good if he commented on this himself.

  2. Corneel: In gpuccio’s view, a protein can never cross the magic line from non-function to function unaided. Hence, although he does not appear to exclude NS as a source of FI in principle, he has raised the bar to effectively achieve this very outcome.

    Thanks, Corneel, for pointing out what gpuccio is doing. And that it is not 100% certain that this is what gpuccio really is doing.

    colewd, let’s be careful before deciding that we agree. I’m not talking about change of sequences within the set of sequences whose function is above the threshold. I’m arguing that if gpuccio sees a path for NS from outside that set to inside it, gpuccio declares that case not to be an example of CFI.

    (Or maybe gpuccio doesn’t, as Corneel has also noted, it’s unclear at best).

  3. Rumraket,

    No there is nothing incoherent about that. That’s what happens when the antibodies from your immune system mutate to become specific towards antigens on foreign material.

    This is an interesting example; however, I assume you understand that in this example the cell is actually hypermutating, deterministically, a very small substrate, i.e. 10 AA or less. It is also binding one protein.

    How about a system where dozens of proteins are interacting (ubiquitin system) to simply regulate a function. Where is the step by step process there?

  4. Corneel,

    This is a description of a binary all-or-nothing function, not a quantitative one as required by the Hazen/Szostak definition.

    It is not an all or nothing function; it is simply saying that for the function specified, 500 bits are required. If through experiment it was discovered that there is a path that requires less than 500 bits, then the 500-bit specification would be incorrect.

    The experiments you speak of require less than 500 bits. In the Hayashi case a lower-level function theoretically requires less than 500 bits. I say theoretically as it does not count the other 300 AA required for the function in the bit calculation.

  5. Joe Felsenstein,

    colewd, let’s be careful before deciding that we agree. I’m not talking about change of sequences within the set of sequences whose function is above the threshold. I’m arguing that if gpuccio sees a path for NS from outside that set to inside it, gpuccio declares that case not to be an example of CFI.

    If there is an attainable (given a practical amount of resources) path then the threshold has not been set properly.

    Given infinite time your threshold concept is ok.

    Given limited time you are essentially dealing with an “event horizon” given the number of trials to cross the bridge from non function to function.

  6. colewd: This is an interesting example; however, I assume you understand that in this example the cell is actually hypermutating, deterministically, a very small substrate, i.e. 10 AA or less.

    Yes and no. It is true that there is a portion of the genes that code for antibodies, called the hypervariable region, that is mutated through targeted somatic hypermutation, but the mutations are still blind (it’s not like the cell knows what the effect of the mutation will be, nor even what kind of DNA base it will be turned into).

    But antibodies also mutate outside the hypervariable region just at a lower level.

    It is also binding one protein.

    Antibodies have to be able to bind the epitope of the antigen, but also interact with proteins that sit on the outside of both killer and memory T-cells, and various B-cells such as macrophages. Another part of the antibody has a signaling function.

    How about a system where dozens of proteins are interacting (ubiquitin system) to simply regulate a function. Where is the step by step process there?

    The same thing, buddy. Mutations happen that affect the function of the protein, and they either are visible to selection or not, and thus their effect on the fitness of the host organism determines whether they are selected for or against.

    The fact that there are multiple levels of constraint operating on the protein doesn’t mean it can’t possibly change. You are stuck in dichotomous thinking. It doesn’t go from complete to zero freedom to change with nothing in between.

    Another example is various viral coat proteins that simultaneously have to constitute a structural element that keeps the virus particle compact and protected, some of which also function as a signaling molecule that instigates intracellular transport in host cells the virus attempts to invade, or as membrane transport channels, and even sometimes as enzymes like reverse transcriptase. To make matters even worse, in addition to having the combined constraints of all these functions, it also has to constantly change and evolve because if it does not (viral coat proteins sit on the viral surface), it becomes the target of host immune system antibodies and the virus will quickly go extinct.

    How is this possible if multiple levels of constraint completely prevented protein evolution? And how is somatic hypermutation even possible if deleterious mutations so massively outnumber beneficial ones that selection is supposed to be powerless against them?

  7. Rumraket,

    How is this possible if multiple levels of constraint completely prevented protein evolution? And how is somatic hypermutation even possible if deleterious mutations so massively outnumber beneficial ones that selection is supposed to be powerless against them?

    You’re comparing apples and oranges. Rapid mutation that is limited to a confined substrate vs slow mutation that can occur anywhere in the genome. A 10 AA substrate that confines mutation is a very different animal than a 300 AA substrate that does not.

  8. Rumraket,

    The fact that there are multiple levels of constraint operating on the protein doesn’t mean it can’t possibly change.

    Do I detect a strawman:-)

  9. colewd: If there is an attainable (given a practical amount of resources) path then the threshold has not been set properly.

    There you go, if the evidence shows x amount of FI can evolve, keep pushing the threshold further. Anything but dealing with the evidence.

  10. colewd: You’re comparing apples and oranges. Rapid mutation that is limited to a confined substrate vs slow mutation that can occur anywhere in the genome.

    The hypervariable site is not the only part of antibodies that mutates. It is the spot targeted by somatic hypermutation, but there are regions outside of this that also mutate well above the normal rate for the rest of the genome. But even so, that just goes to show that just because a protein has multiple functions doesn’t mean it can’t evolve. It simply proves that it is possible for mutations to positively affect one function without negatively affecting the other functions.

  11. colewd: Do I detect a strawman:-)

    How have I strawmanned you? What was the point you were trying to make by pointing out that proteins have multiple functions and must interact with other proteins too, besides some particular function, if not that these facts make protein evolution implausible? That was the point you were making right? And that was the point I addressed, so how have I strawmanned you? Perhaps if that wasn’t the point you wished to make you have communicated your intentions badly?

  12. Rumraket,

    And that was the point I addressed, so how have I strawmanned you?

    The fact that there are multiple levels of constraint operating on the protein doesn’t mean it can’t possibly change.

    Where did I say a protein can’t possibly change?

    Change and a step by step process improving adaption are apples and oranges.

    The adaptive immune system is not the ubiquitin system.

  13. colewd:

    Joe Felsenstein,

    … I’m arguing that if gpuccio sees a path for NS from outside that set to inside it, gpuccio declares that case not to be an example of CFI.

    If there is an attainable (given a practical amount of resources) path then the threshold has not been set properly.

    There is indeed a disagreement between us. In Szostak’s and Hazen’s definition of Functional Information, there is a scale, and each sequence has a level of “function” somewhere on that scale. There is a cutoff (anywhere on that scale), and the FI is calculated, a value that depends on where you set the cutoff.

    That is OK with Szostak and OK with Hazen, but you and gpuccio make a further stipulation: that there must be a level which “is function” and below it all levels “are not function”. And that if there isn’t, you both argue that there is a need to set the threshold differently.

    Using Szostak’s and Hazen’s definitions, there is no necessity that “function” is absent from all sequences that have levels of function below the threshold. It is quite possible that there exist paths for natural selection (in a population) to change sequences so that one moves to an FI greater than 500 bits (from lower FI values).

    Hence there is no reason to think that these normal evolutionary processes cannot achieve 500 bits of FI, where FI is defined as it is by Szostak and Hazen.

  14. colewd: Change and a step by step process improving adaption are apples and oranges.

    No, a step by step process is an example of change.

    The adaptive immune system is not the ubiquitin system.

    Really? It also isn’t a car wash, or an oak tree.

    These statements are vacuous. What conclusion do you wish us to draw from them?

  15. Joe Felsenstein,

    It is quite possible that there exist paths for natural selection (in a population) to change sequences so that one moves to an FI greater than 500 bits (from lower FI values).

    How have you determined that this is “quite possible”?

    How many bits (in all populations) of evolutionary resources since the origin of life do you think you have to achieve this?

    We can quibble over whether the “event horizon” is part of the definition or not, but I would argue that the odds of making any appreciable dent in the FI over time are vanishingly small, as the number of sequences that can cross over into functional information given reasonable time and resources is vanishingly small compared to the total sequence space. Again, assuming 500 bits of functional information.

    Using Szostak’s and Hazen’s definitions, there is no necessity that “function” is absent from all sequences that have levels of function below the threshold.

    As Gpuccio mentioned we are dealing with 2000 distinct protein families. Function does not mean a functional path.

  16. Rumraket,

    No, a step by step process is an example of change.

    And apples and oranges are fruit:-) When I argue a specific and you change it to a generic you have changed the argument.

  17. colewd: It is not an all or nothing function; it is simply saying that for the function specified, 500 bits are required. If through experiment it was discovered that there is a path that requires less than 500 bits, then the 500-bit specification would be incorrect.

    Ah, now the penny drops.

    All-or-nothing is how I read his thief-at-the-safe example: the safe opens or it doesn’t. That is a binary function, regardless of where the threshold is placed, and gpuccio states that he thinks of protein function (and biological systems) in that way. If that is how he views biological function, then of course there can be no path leading into it (I see that Joe was a step ahead of me).

    And if I am not mistaken, this is how you view biological systems as well. You can’t change a phone number and expect to be able to reach the same person with it. Either the watch works, or it is broken. Do I see that correctly?

  18. Corneel,

    Yes, gpuccio and colewd are taking “function” to either exist or not, and are taking the threshold as set properly when it separates the sequences that have no function from ones that have the function.

    Szostak and Hazen do not make this special set of assumptions.

  19. Joe Felsenstein,

    Yes, gpuccio and colewd are taking “function” to either exist or not, and are taking the threshold as set properly when it separates the sequences that have no function from ones that have the function.

    Correction: gpuccio and colewd are saying that the AA sequence can generate “the specified function” or not, and expect that the “threshold” demarcates sequences that can generate “the specified function” from ones that cannot. The sequences that can generate the specified function are called functional information or FI.

    If we observe this function in nature and estimate 500 bits of FI are required to generate it we can confidently infer design.

  20. colewd: If we observe this function in nature and estimate 500 bits of FI are required to generate it we can confidently infer design.

    But that estimate is merely based on the number of conserved sequences performing the function of interest. That tells us nothing about how many other sequences are “out there” that can perform the function well enough for an organism to survive, nor does it tell us anything about whether there is a path to those sequences that natural selection can take.

    You are looking at a collection of sequences that sit somewhere on a local hilltop, but that doesn’t tell you what the shape of the hill is near the base, or how wide the hill is, or how many such hills exist, or the steepness of the slope leading up to the top. You can’t just declare that because you can only find some number X of sequences that perform the function, this somehow magically rules out that there could be a selectable path to the level of function seen at the top of the hill. You don’t have the kind of information that could support that conclusion.

    By couching it in terms of FI and calculating that because the sequences at the top of the hill are so few that the ensemble exhibits 500 bits of FI, you don’t accomplish that either. It’s basically just a very roundabout way of presenting an argument from ignorance without explicitly pointing to the ignorance. Yet that is essentially what it is.

  21. colewd: Correction: gpuccio and colewd are saying that the AA sequence can generate “the specified function” or not, and expect that the “threshold” demarcates sequences that can generate “the specified function” from ones that cannot.

    Why is that distinction important? Neither you nor gpuccio are very keen on co-option as an explanation so why would the capability to perform some other unspecified function be relevant?

  22. Corneel,

    Why is that distinction important? Neither you nor gpuccio are very keen on co-option as an explanation so why would the capability to perform some other unspecified function be relevant?

    We are not observing an unspecified function. We are observing a specific biological function and AA sequences between species separated by deep time.

    In the case of ATP synthase this sequence had billions of years to substitute all its AA’s; however, 2 species separated by that deep time ended up with the majority of those sequences conserved.

    If there really exists a selectable path to a specific function for proteins as a minimum you need the ability for the AA’s to tolerate variation.

    For Rum’s and Joe’s claim to be true you don’t need a selectable path; you need a huge number of selectable paths for random variation to find one in 500 bits of sequence space. Once it finds the selectable path it must stay on that path.

    Two proteins performing the same function separated by billions of years ending up in almost the same place contradicts the concept of many selectable paths.

    If there are fewer than an enormous number of selectable paths to the observed function, the conservatism of the 500-bit calculation makes it a strong validation of the design inference as the best explanation of the observed functional sequence.

  23. colewd:
    We are not observing an unspecified function. We are observing a specific biological function and AA sequences between species separated by deep time.

    And that’s the problem. You’re concluding too much from sequences that had already gotten stuck into a function before that separation. That tells you nothing about whether it could have been something entirely different. It just tells you something about the sequences after-the-fact. This is the main point you guys are missing. Whether a sequence is stuck into some particular point in sequence space or not doesn’t tell you anything about whether it could have evolved or not. It just tells you about the state of affairs for what you’re observing, and how long ago it might have gotten “stuck.”

    colewd:
    In the case of ATP synthase, this sequence had billions of years to substitute all its AAs; however, two species separated by that deep time ended up with the majority of those sequences conserved.

    1. As I’ve explained before, for all of the substitutable AAs to be changed, you’d require a lot of mutations. So many that you’re implying that a lot of sequence space would have been explored.

    2. The sequences didn’t end up with the majority conserved, they ended up with a proportion of conserved AAs that is beyond what would be expected by chance, which is due to the sequences being homologous. Whether they could change much more or not is not tested by such comparison, and is irrelevant to whether the sequence ancestral to those two evolved or not.
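
    Point 2 can be made concrete with a back-of-the-envelope model (a deliberate simplification: 20 equally likely amino acids, no gaps, no composition bias, and an assumed alignment length of roughly 450 sites for the beta chain). Under that chance model, identity anywhere near 60% over a protein-length alignment is essentially impossible without common ancestry.

```python
from math import comb
from fractions import Fraction

def p_at_least(k, n, p):
    # Exact binomial upper tail P(X >= k) for n sites with per-site
    # match probability p, computed with rationals to avoid underflow.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Simplified chance model (an assumption): 20 equally likely amino
# acids gives a 1/20 per-site match probability; ~450 aligned sites
# is a rough stand-in for the length of the F1 beta chain.
tail = p_at_least(270, 450, Fraction(1, 20))   # P(>= 60% identity)
print(tail < Fraction(1, 10**100))             # True
```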

    colewd:
    If there really exists a selectable path to a specific function for proteins, then at a minimum you need the AAs to be able to tolerate variation.

    While the functions are evolving, sure. Once they’ve established a lot of interdependencies, no. Again, you’re looking at them after-the-fact. Not before.

    colewd:
    For Rum’s and Joe’s claim to be true, you don’t need a selectable path; you need a huge number of selectable paths for random variation to find one in 500 bits of sequence space. Once it finds the selectable path, it must stay on that path.

    More like once it finds a path and follows it, it might get stuck within some limited sequence space, which is what gets you guys confused. Joe is actually talking about the evolution of a function, while you’re talking about the conservation of a sequence once it has gone through the path and found a local minimum, as if staying within a local minimum was the same as getting to, evolving towards, a local minimum.

    colewd:
    Two proteins performing the same function, separated by billions of years, ending up in almost the same place contradicts the concept of many selectable paths.

    Sorry Bill, but you’re contradicting yourself. If you’re thinking that the two sequences “ended up” in the same place independently, then there must be many selectable paths leading to the same place. If, instead, you’re thinking that they didn’t diverge much, you’re, again, taking the wrong data for the conclusion you wish to attain. The two sequences didn’t end up in almost the same place. The two sequences were already there before separating into the two lineages analyzed. It’s not too surprising that they’d stay there if the sequence had already reached some local minimum.

    colewd:
    If there are fewer than an enormous number of selectable paths to the observed function, the conservatism of the 500-bit calculation makes it a strong validation of the design inference as the best explanation of the observed functional sequence.

    Nope. It just suggests that the sequence was already at some local minimum before it diverged into the lineages analyzed.

    The measure is about the information shared by the sequences, Bill. Information shared because the sequences share common ancestry. It’s not the amount of functional information evolved, let alone the proportion of sequences that could evolve to attain some function. Even if gpuccio’s self-defeating assumptions were correct, at best you’d still be talking about how restricted the sequence might be after-having-evolved-some-function. Not about the amount of functional information of the sequence.

  24. colewd: In the case of ATP synthase, this sequence had billions of years to substitute all its AAs; however, two species separated by that deep time ended up with the majority of those sequences conserved.

    And for some of the sequences that accumulated mutations, they took on other functions.

  25. colewd: If there really exists a selectable path to a specific function for proteins, then at a minimum you need the AAs to be able to tolerate variation.

    Bill, those previous “steps” in the path are further down the slope. Selection goes uphill, not down. It would be odd to expect there to be variants that have significantly lower fitness than the sequences around the top of the hill. Once selection finds a hilltop, you expect it to generally stay there unless the environment somehow changes such that whatever function the hilltop is a manifestation of, no longer constitutes a hilltop in the fitness landscape.

    But the sequences a bit further downhill, while worse than the sequences at the top of the hill, are still better than sequences at the very base of the hill. So there could easily be many paths up the slope to the top, but once up there, selection will preserve the function found at that local optimum. The reason we don’t see more diversity of sequence is because the sequences at the top have significantly higher fitness than sequences further away, but that doesn’t mean sequences further away don’t also have higher fitness than sequences even further away.

    You can’t rule this out just by picking out the sequences we see that sit on the very top of the hill (the local optimum) and then calculating that because they are only a tiny subset of the total amount of possible sequences, they exhibit 500 bits of FI and this then means they can’t evolve. It simply doesn’t follow. Finding that the sequences you know about together exhibit 500 bits of FI tells you nothing about how big the hill is, or how many paths can be taken to the top, or how broad the base is, or the steepness of the slope leading up.

    You know NOTHING about this and you don’t get knowledge about it by calculating that those sequences known about at the top imply 500 bits of FI.
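
    The "many paths, narrow top" picture is easy to demonstrate in a toy model (a sketch, not biology: length-6 sequences over a 4-letter alphabet, with fitness defined as the number of matches to an arbitrary target). Greedy uphill walks from random starting sequences all reach the same single peak, but by many different routes.

```python
import random

ALPHABET = "ACDE"      # toy 4-letter alphabet, length-6 sequences
TARGET = "AAAAAA"      # the lone peak; fitness = matches to TARGET

def fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))

def climb(seq, rng):
    # Greedy adaptive walk: accept any one-letter change that strictly
    # increases fitness; stop when no uphill step exists.
    path = [seq]
    improved = True
    while improved:
        improved = False
        moves = [(i, c) for i in range(len(seq)) for c in ALPHABET]
        rng.shuffle(moves)
        for i, c in moves:
            cand = seq[:i] + c + seq[i + 1:]
            if fitness(cand) > fitness(seq):
                seq = cand
                path.append(seq)
                improved = True
                break
    return tuple(path)

rng = random.Random(1)
starts = ["".join(rng.choice(ALPHABET) for _ in range(6))
          for _ in range(200)]
paths = {climb(s, rng) for s in starts}
ends = {p[-1] for p in paths}
print(len(ends))        # 1: every walk reaches the same narrow peak...
print(len(paths) > 50)  # True: ...by many different uphill paths
```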

    And you’re still completely ignoring the fact that ATP synthase belongs to the superfamily of P-loop kinases that have many diverse but related functions. Some closer to ATP synthase than others, like GTP synthases.

    For Rum’s and Joe’s claim to be true, you don’t need a selectable path; you need a huge number of selectable paths for random variation to find one in 500 bits of sequence space.

    There could be many, you have no idea how many paths lead to the top of the hill where the sequences we know about sit. Nor what fraction of such paths lead to other functions.

    Two proteins performing the same function, separated by billions of years, ending up in almost the same place contradicts the concept of many selectable paths.

    No not at all. Finding the same hilltop says nothing about how many paths lead up to the top of that hill. All that says is that the top of that hill is quite narrow. It says nothing about from how many locations the top can be reached by selection.

    If there are fewer than an enormous number of selectable paths to the observed function, the conservatism of the 500-bit calculation makes it a strong validation of the design inference as the best explanation of the observed functional sequence.

    None of that made any sense.

  26. colewd: For Rum’s and Joe’s claim to be true, you don’t need a selectable path; you need a huge number of selectable paths for random variation to find one in 500 bits of sequence space.

    And for gpuccio’s claims to be true, firstly we have to work out what he’s actually claiming. See the difference?

  27. Entropy,

    . Joe is actually talking about the evolution of a function, while you’re talking about the conservation of a sequence once it has gone through the path and found a local minimum, as if staying within a local minimum was the same as getting to, evolving towards, a local minimum.

    I apologize for not going through all your points due to time constraints; however, the above is the key point. Evolution explaining its stuck location is not credible. If its origin was a random sequence, it is highly improbable that it could find such a rare function. The Hayashi paper confirms this, as the random sequence finds a sub-optimized location. The estimated journey to the observed location is beyond evolutionary resources to find it.

    You need to validate some of the facts you cited, as I am not sure you are right. An example is the Beta chain of the F1 subunit of ATP synthase, where the sequence identity between E. coli and humans is greater than 60%.

  28. colewd: I apologize for not going through all your points due to time constraints; however, the above is the key point. Evolution explaining its stuck location is not credible.

    Why? How would it move away from the hill if it constitutes a local optimum?

    If its origin was a random sequence, it is highly improbable that it could find such a rare function.

    How rare is it? Why couldn’t it find it? Could it have evolved from another similar sequence, like GTP synthase, or any of the countless other P-loop kinases?

    The Hayashi paper confirms this, as the random sequence finds a sub-optimized location.

    That would seem to indicate the opposite, as exactly what we’re saying is that evolution gets stuck at local optima. It would be odd if all the hills were the exact same height. Selection finds a hill, climbs to a peak and then stays there, and sometimes the sequences we see in life are the ones stuck around the top of some local hill. That says nothing about whether it is possible to climb the hill to begin with, or how many paths to such hills there are, or how many such hills there are, or how broad they are, etc.
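
    A toy two-peak landscape (an illustration of the point, not a model of the Hayashi et al. experiment) shows this behaviour directly: uphill-only walks from random starting points sometimes end on the lower peak and stay there.

```python
import random

# Toy rugged landscape on 12-bit genotypes: two peaks, one higher.
PEAK_HI = 0b111111111111   # global optimum, fitness 24
PEAK_LO = 0b000000000000   # local optimum, fitness 14

def fitness(g):
    def closeness(g, peak):
        return 12 - bin(g ^ peak).count("1")   # 12 - Hamming distance
    # Fitness is the better of two slopes, one per peak:
    return max(2 * closeness(g, PEAK_HI), closeness(g, PEAK_LO) + 2)

def climb(g, rng):
    # Accept only strictly uphill single-bit flips; stop when stuck.
    while True:
        bits = list(range(12))
        rng.shuffle(bits)
        for b in bits:
            cand = g ^ (1 << b)
            if fitness(cand) > fitness(g):
                g = cand
                break
        else:
            return g

rng = random.Random(7)
ends = [climb(rng.randrange(4096), rng) for _ in range(300)]
stuck_low = sum(e == PEAK_LO for e in ends)
print(0 < stuck_low < 300)   # True: some walks end on the lower peak
```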

    The estimated journey to the observed location is beyond evolutionary resources to find it.

    Why? You just blindly declare it, but what supports that claim?

    You need to validate some of the facts you cited, as I am not sure you are right. An example is the Beta chain of the F1 subunit of ATP synthase, where the sequence identity between E. coli and humans is greater than 60%.

    Yes, and? We know there are many other beta chains of homologous NTP synthases, like the highly divergent ATP synthases cited months ago by Entropy and DNA_jock, not to mention other ATP synthases besides F1, and then there’s the whole family of P-loop kinases (of which ATP synthase is just one “clade” in a larger family tree).
    Regardless of that, the degree of divergence of homologous proteins with the same function says nothing about the shape of the base of the hill and its surroundings, how many such hills with similar functions exist, or whether and to what extent they overlap hills that exhibit other functions.

  29. Rumraket: Why? How would it move away from the hill if it constitutes a local optimum?

    How rare is it? Why couldn’t it find it? Could it have evolved from another similar sequence, like GTP synthase, or any of the countless other P-loop kinases?

    That would seem to indicate the opposite, as exactly what we’re saying is that evolution gets stuck at local optima. It would be odd if all the hills were the exact same height. Selection finds a hill, climbs to a peak and then stays there, and sometimes the sequences we see in life are the ones stuck around the top of some local hill. That says nothing about whether it is possible to climb the hill to begin with, or how many paths to such hills there are, or how many such hills there are, or how broad they are, etc.

    Why? You just blindly declare it, but what supports that claim?

    Yes, and? We know there are many other beta chains of homologous NTP synthases, like the highly divergent ATP synthases cited months ago by Entropy and DNA_jock, not to mention other ATP synthases besides F1, and then there’s the whole family of P-loop kinases (of which ATP synthase is just one “clade” in a larger family tree).
    Regardless of that, the degree of divergence of homologous proteins with the same function says nothing about the shape of the base of the hill and its surroundings, how many such hills with similar functions exist, or whether and to what extent they overlap hills that exhibit other functions.

    Can someone please tell me how much speculation is enough? Especially in math? How much room is there between 2+2 unless one is a moron like Krauss?

  30. Yeah! 2+2 has always been 5…I just don’t know what I was thinking…I should have been a theoretical cosmologist…You can always get away with 2+2 inaccuracy if you include quantum mechanics unpredictability…

  31. Rumraket,

    Why? How would it move away from the hill if it constitutes a local optima?

    The problem is a random search finding the hill in the first place when it is mutating through an enormous search space. Optimized sequences appear very rare (Hayashi paper).

  32. colewd:
    Rumraket,

    The problem is a random search finding the hill in the first place when it is mutating through an enormous search space.

    But you said that evolution being “stuck” is not credible, now you’re changing your argument to finding the hill instead. So do you concede there’s no issue with getting stuck on a local optimum?

    Optimized sequences appear very rare (Hayashi paper).

    That doesn’t mean it can’t evolve. All that would imply is that most sequences that evolve aren’t the best ones possible, but stuck at local optima instead.

    So why do you think ATP synthase is “optimized”?

  33. colewd: Optimized sequences appear very rare (Hayashi paper).

    What does it say about the strength and quality of gpuccio’s claims that he absolutely refuses to formalise them into a paper?

  34. Rumraket,

    But you said that evolution being “stuck” is not credible, now you’re changing your argument to finding the hill instead. So do you concede there’s no issue with getting stuck on a local optimum?

    If it is stuck, it is evidence that the solution is sequence sensitive. If it is sequence sensitive, that supports the design inference.

  35. It seems to me that there is a self-contradiction here: conservation of a sequence is supposed to mean that there are no other regions of the sequence space that have high levels of “function”. Why? Because there has been ample time to find them, and they weren’t found.

    Once it has been concluded that there is only one high-function region of the sequence space, then that is supposed to indicate that it must have been found by Design. Why? Because there has not been ample time to find it, but it was nevertheless found.

    I don’t think that the argument works. What have I misunderstood in that argument?

  36. colewd: If it is stuck, it is evidence that the solution is sequence sensitive.

    Yes. As you move further away from the top of the hill, mutations become deleterious, as in they have lower fitness, so are selected against compared to the sequences that are found at the very top. But for a sequence even further down the hill, even the sequences that are not all the way at the top, but just below it, are still better than those even further down. You understand the concept of a hill, right?

    Taking a few steps away from the top takes you further down, but there are still many more steps down to the bottom. Every step uphill from the bottom is beneficial, but once you’re at the top, you’re stuck there because there are no more steps that lead further up.

    If it is sequence sensitive, that supports the design inference.

    No that just implies a local optimum.

    A is better than B, B is better than C, C is better than D, D is better than E. Once you get to A, going back to B is deleterious. But going from E to A is beneficial all the way. Finding sequences that are all “A” doesn’t mean you can’t get to A from E.

    I don’t know how many other ways to explain this. You can climb a ladder, and you get further up every step. Or walk a flight of stairs. Once you’re at the top, any other step on the ladder/stairs is a step down. But Any step from the bottom is a step up. And any steps other than the top and bottom on the ladder/stairs have both a higher and a lower step near it. And selection is the process that biases towards steps up.
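
    The ladder can be written out directly (illustrative fitness values only): starting at the bottom, every accepted step is uphill; starting at the top, no move is accepted.

```python
# Toy fitness values for the five rungs (illustrative only):
fitness = {"E": 1, "D": 2, "C": 3, "B": 4, "A": 5}
neighbours = {"E": ["D"], "D": ["E", "C"], "C": ["D", "B"],
              "B": ["C", "A"], "A": ["B"]}

def climb(state):
    # Greedy uphill walk: take any neighbouring step with higher
    # fitness; stop when none exists.
    steps = [state]
    while True:
        better = [n for n in neighbours[state]
                  if fitness[n] > fitness[state]]
        if not better:
            return steps      # stuck: no step leads further up
        state = better[0]
        steps.append(state)

print(climb("E"))   # ['E', 'D', 'C', 'B', 'A'] -- uphill all the way
print(climb("A"))   # ['A'] -- from the top, every move is a step down
```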

    Are you getting this at some point?

  37. Joe Felsenstein:
    It seems to me that there is a self-contradiction here: conservation of a sequence is supposed to mean that there are no other regions of the sequence space that have high levels of “function”. Why? Because there has been ample time to find them, and they weren’t found.

    Once it has been concluded that there is only one high-function region of the sequence space, then that is supposed to indicate that it must have been found by Design. Why? Because there has not been ample time to find it, but it was nevertheless found.

    I don’t think that the argument works. What have I misunderstood in that argument?

    I believe the argument goes like this. Because there’s no functional pathway for NS to work its way to the present, highly optimized sequence (and you better buy his premise here, or else you must provide a full, step-by-step description of such a pathway), it had to get there in a purely random search. But 400-plus million years of drift would mean no sequence similarity, therefore sweet jeesus.

    Problem is, a cosmic designer could get anywhere in sequence space much faster than drift, so how come, after 400M years of “neutral design”, it just happened to land on a sequence with significant similarity to its distant relatives? What a lazy-ass designer, and what a lucky bastard to have that optimum right there in the vicinity of the ancestral sequence!

  38. colewd:
    I apologize for not going through all your points due to time constraints; however, the above is the key point. Evolution explaining its stuck location is not credible.

    1. You’re missing the point. Here it goes again: gpuccio is measuring conservation of sequences that are already stuck in a local minimum. That doesn’t tell him or you anything about FI. It just tells you that the sequences are homologous, and that they’re somewhat stuck in that local minimum.

    2. Evolution explains paths towards getting stuck in a local minimum. What you’re saying is like saying that the shape of a hole doesn’t explain why stones get stuck at the bottom.

    colewd:
    If its origin was a random sequence, it is highly improbable that it could find such a rare function.

    1. The origin is not just random sequences. It’s sequences going through rounds of selection/mutation/recombination/etc.

    2. You don’t know if the function is rare or improbable.

    colewd:
    The Hayashi paper confirms this, as the random sequence finds a sub-optimized location. The estimated journey to the observed location is beyond evolutionary resources to find it.

    Again, no. The paper had experimental limitations. Why do I need to repeat this point?

    colewd:
    You need to validate some of the facts you cited, as I am not sure you are right.

    Which facts? That gpuccio’s bullshit is based on the examination of sequences that are already in a local minimum? That doesn’t need citation. The sequences perform the same function in the two lineages; therefore, it already had the function before diverging into the two lineages, aka, it was already in that local minimum. It’s simple logic.

    colewd:
    An example is the Beta chain of the F1 subunit of ATP synthase, where the sequence identity between E. coli and humans is greater than 60%.

    Yes. That’s an example of what I’m saying. The sequence was already performing a function before the bacterial and human lineages separated, and it’s therefore stuck there with mutations/selection/etc keeping it dancing around the local minimum.

    Again, you cannot tell whether other sequences could have done that work or not. You cannot tell how improbable it would be for a protein to evolve that would perform that function. You can just tell how similar the sequences have remained after-the-function. Again, you’re mistaking conservation-of-a-function for a no-function-to-function transition.

  39. Entropy is using the holes instead of hills analogy of the fitness landscape, just to clarify. Here sequences “roll” into a hole and stay at the bottom, rather than “climb” hills and stay at the top.

  40. Joe Felsenstein,

    conservation of a sequence is supposed to mean that there are no other regions of the sequence space that have high levels of “function”.

    Conservation of sequences means that, for the identified function, amino acid substitutions in the preserved sequences are being removed from the populations that utilize these proteins. This is occurring despite very different evolutionary paths.

  41. colewd:
    Joe Felsenstein,

    Conservation of sequences means that, for the identified function, amino acid substitutions in the preserved sequences are being removed from the populations that utilize these proteins.

    Yup, purifying selection, deleterious mutants being held to low frequencies in the population.

    But somehow gpuccio and you take this to mean that this process widely explores the space of all possible sequences, and would find all other islands of adaptation in long times.

    I know the relevant mathematical theory and can tell you that you’re wrong about that.

    What I don’t get at all is why you also think, at the same time, that starting from one sequence, the process wouldn’t find its way uphill on the fitness surface.

    These two assertions by you and gpuccio contradict each other.

    This is occurring despite very different evolutionary paths.

    I wish I knew what that sentence meant.

  42. Joe Felsenstein: I wish I knew what that sentence meant.

    What makes it doubly difficult to attempt to parse is that presumably neither colewd nor gpuccio thinks that those different evolutionary paths existed at all, believing as they do that evolution is not the explanation for these extant configurations.

  43. I am just puzzled why they think that evolution is unable to climb the nearest peak in the fitness surface, but once it is there, that after a while it has somehow explored all the others, so we can use that as evidence that they don’t exist.

  44. colewd:
    If there really exists a selectable path to a specific function for proteins, then at a minimum you need the AAs to be able to tolerate variation.

    Which argues against a distinction between “function” and “a specified function”. If it is broke, it ain’t working. Period. So why insist on a specified function, without actually, you know, specifying a function?

    So I ask again, why is it so important to you that the threshold demarcates sequences that can generate “the specified function” from ones that cannot, rather than just any “function”. There is nothing in your argument that makes such a distinction relevant. Your description of organisms remains a spitting image of the clockwork beings from the watchmaker analogy, where changing the little cogs and springs will lead to malfunction.

  45. Joe Felsenstein,

    Yup, purifying selection, deleterious mutants being held to low frequencies in the population.

    We agree on this point.

    But somehow gpuccio and you take this to mean that this process widely explores the space of all possible sequences, and would find all other islands of adaptation in long times.

    We know you cannot explore even a tiny fraction of this space. The inference is based on the observation of a sequence that is in a highly constrained state. The highly constrained state indicates a high level of functional information.

    The hypothesis is that highly constrained functional AA sequences are rare compared to the total search space.

    What I don’t get at all is why you also think, at the same time, that starting from one sequence, the process wouldn’t find its way uphill on the fitness surface.

    What does finding its way uphill mean with a protein subunit like the ATP synthase F1 beta chain?

    Bill: This is occurring despite very different evolutionary paths.

    Joe:I wish I knew what that sentence meant.

    If we are comparing the E. coli beta chain protein with the human beta chain protein: one split from the common ancestor over a billion years ago and remained a bacterium; the other became a eukaryotic cell and eventually a human. These are very different paths with very different population characteristics.

  46. Corneel,

    Which argues against a distinction between “function” and “a specified function”. If it is broke, it ain’t working. Period. So why insisting on a specified function, without actually, you know, specifying a function.

    The method compares a specific function of two or more different animals.

    The functions are known and specified; however, what we are comparing is often just a piece of the overall function. I don’t know how we make a coherent comparison of different proteins or different groups of proteins with different functions.

  47. Joe Felsenstein,

    I am just puzzled why they think that evolution is unable to climb the nearest peak in the fitness surface,

    If it can’t find the fitness surface how can it climb it?

  48. colewd: The method compares a specific function of two or more different animals.

    The functions are known and specified however what we are comparing is often just a piece of the overall function. I don’t know how we make a coherent comparison of different proteins or different groups of proteins with different functions.

    Do you mean that you consider the molecular function of e.g. E3 ubiquitin-protein ligases in humans to be different from that in other animals? Like, a spark plug in a Volvo car does not perform the same function as the same plug in a Ford because they are part of a different configuration? If that is what you mean, that would be completely bonkers! All this time you were hammering on the fact that this and that protein was highly conserved “despite very different evolutionary paths”, yet you think they all have different functions? How does that make sense from your design view of life?
