Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular gpuccio’s assertion that if we observe 500 bits of it, this is a reliable indicator of Design, as here, at about the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information ) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t work. There would seem to be three possible ways of arguing for it, and in the end two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information.  In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function.  This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observation of more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function, but also includes cases where there would be a path along which function increases. In such cases, seeing 500 bits of functional information, we cannot conclude from this that it is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails, as it is not reliable.

(3) The third possibility is that an additional condition is added to the design inference. It simply declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information”. It does not simply define “complex functional information” as a case where we can define a level of function that makes the probability of the set less than 2^{-500}. That additional condition allows us to safely conclude that normal evolutionary forces can be dismissed, by definition. But it leaves the reader to do the heavy lifting, as the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once readers have done that, they will find that the additional step of concluding that the genotypes have “complex functional information” adds nothing to their knowledge. CFI becomes a useless add-on that sounds deep and mysterious but actually tells you nothing except what you already know. And there seems to be some indication that gpuccio does use this additional condition.

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function >=
Ex.”

which is identical to my definition, in particular my definition of functional information as the
upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)

We shall see later that there may be some ways in which gpuccio’s definition is modified from Szostak’s. Jack Szostak and his co-authors never attempted any use of his definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.
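
As a minimal sketch of Szostak’s definition quoted above (not gpuccio’s own procedure), functional information can be computed as the negative base-2 logarithm of the fraction of sequences whose function is at or above the cutoff Ex. The six-base target and the match-counting “function” below are hypothetical, chosen only so that every sequence can be enumerated.

```python
import math
from itertools import product


def functional_information(sequences, function, threshold):
    """Bits of FI: log2(total sequences / sequences with function >= threshold)."""
    seqs = list(sequences)
    n_meeting = sum(1 for s in seqs if function(s) >= threshold)
    if n_meeting == 0:
        raise ValueError("no sequence meets the threshold")
    return math.log2(len(seqs) / n_meeting)


# Hypothetical toy system: every DNA sequence of length 6.
TARGET = "ACGTAC"  # arbitrary illustrative target, not a real motif


def toy_function(seq):
    # Illustrative "degree of function": number of sites matching TARGET.
    return sum(a == b for a, b in zip(seq, TARGET))


all_seqs = ["".join(p) for p in product("ACGT", repeat=6)]

for ex in range(len(TARGET) + 1):
    fi = functional_information(all_seqs, toy_function, ex)
    print(f"Ex = {ex}: FI = {fi:.3f} bits")
```

Note that nothing in this calculation asks whether sequences below the cutoff have zero function or merely less of it. The number of bits depends only on how many sequences sit at or above the cutoff.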

gpuccio seems to be making one of three possible arguments:

Possibility #1 That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

William Dembski attempted to use such a theorem, his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2001). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI changed the specification before and after evolutionary processes, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it.  gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agre with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, on the other hand gpuccio does associate the argument with Dembski here by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

That figure being Dembski’s, this leaves it unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem that replaces it.

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us choose a cutoff level of function, which defines the set of sequences whose function is at or above that level, without any condition that the other sequences have zero function. Neither did Durston impose such a condition. And as we’ve seen, gpuccio associates his argument with theirs.

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.
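
The point that a large FI value alone says nothing about reachability can be illustrated with a toy model. This is a sketch under loudly stated assumptions: a single target sequence of 250 bases (which is 500 bits of FI when only that sequence meets the cutoff), and a hypothetical fitness function that counts matching sites, so that function rises smoothly toward the target. It does not show that real proteins have such landscapes; that is the empirical question the islands-of-function argument addresses.

```python
import math
import random

ALPHABET = "ACGT"
LENGTH = 250  # 250 sites at 2 bits per site = 500 bits for one target sequence


def fitness(seq, target):
    # Hypothetical smooth landscape: fitness = number of matching sites.
    return sum(a == b for a, b in zip(seq, target))


def climb(target, rng, max_steps=200_000):
    """One-site mutation plus selection; changes that lower fitness are rejected."""
    current = [rng.choice(ALPHABET) for _ in target]
    score = fitness(current, target)
    steps = 0
    while score < len(target) and steps < max_steps:
        steps += 1
        site = rng.randrange(len(target))
        proposal = rng.choice(ALPHABET)
        # Change in the number of matches if the proposal were accepted.
        gain = (proposal == target[site]) - (current[site] == target[site])
        if gain >= 0:
            current[site] = proposal
            score += gain
    return steps, score


rng = random.Random(0)
target = [rng.choice(ALPHABET) for _ in range(LENGTH)]
fi_bits = LENGTH * math.log2(len(ALPHABET))
steps, score = climb(target, rng)

print(f"FI of the single target sequence: {fi_bits:.0f} bits")
print(f"matches after {steps} mutation steps: {score}/{LENGTH}")
```

In this toy landscape the 500-bit target is reached after a modest number of steps, whereas on a landscape with zero function everywhere off the target the same 500 bits would be effectively unreachable. The FI value is identical in both cases, so it cannot by itself distinguish them.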

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection, then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal from Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements. And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it be achieved by natural evolutionary forces (see my discussion of this here in the section on “Dembski’s revised CSI argument” that refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be making a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each being counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.
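
The difference between the two ways of assigning probability can be sketched numerically. The toy process below is an assumption made purely for illustration (eight sites, 200 mutation steps, rejection of any change that destroys a match); it is not Dembski’s or gpuccio’s actual calculation. It compares the bits obtained by counting every sequence equally with the bits obtained from the estimated probability that the process ends at the target.

```python
import math
import random

ALPHABET = "ACGT"
LENGTH = 8     # short enough that the simulation runs quickly
STEPS = 200    # mutation steps per simulated lineage (illustrative)
TRIALS = 2000  # lineages used to estimate the probability


def reaches_target(target, rng, steps):
    """Run a toy mutation-plus-selection process; report whether it ends at the target."""
    current = [rng.choice(ALPHABET) for _ in target]
    for _ in range(steps):
        site = rng.randrange(len(target))
        proposal = rng.choice(ALPHABET)
        # Accept unless the change would destroy an existing match.
        if current[site] != target[site] or proposal == target[site]:
            current[site] = proposal
    return current == target


rng = random.Random(1)
target = [rng.choice(ALPHABET) for _ in range(LENGTH)]

# Szostak-style: every sequence counted equally.
uniform_bits = LENGTH * math.log2(len(ALPHABET))

# Dembski-style: probability of the target given the (toy) evolutionary process.
hits = sum(reaches_target(target, rng, STEPS) for _ in range(TRIALS))
p_process = hits / TRIALS

print(f"bits with all sequences equiprobable: {uniform_bits:.1f}")
print(f"estimated probability under the toy process: {p_process:.3f}")
if hits:
    print(f"bits under the toy process: {math.log2(1 / p_process):.3f}")
```

The second number can only be obtained by already knowing how likely the process is to reach the target, which is exactly the heavy lifting that the added condition leaves to the user.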

gpuccio has a similar condition in the requirements for concluding that complex functional information is present. We can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observing 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) is simply nonexistent. Or if it does exist, it is as a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

1,971 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

  1. Joe Felsenstein: I disagree with Mung about whether FI measures information, and I disagree with Hazen about whether it measures “complexity”.

    Won’t be the first time you’ve been wrong. 🙂

  2. Right or wrong about those, what about the fact that *no one* has presented any conservation law that justifies the use of 500 Bits of FI as a criterion for confidently concluding in favor of Design?

  3. Joe Felsenstein: An “ID theorist” might well do the right things when dealing with racetrack tips.

    What is the right thing to do when dealing with racetrack tips? How does a stranger whispering in your ear change the odds of any of the horses?

    Joe Felsenstein: We could be more and more specific about this particular hypothetical case, but let’s not waste time on the details. Except to note that, in such cases, anyone betting should pay close attention to the probabilities of the different horses winning and the probability of me giving you the name of the real winner.

    I was prepared to assume that your analogy was a bit off. But no, your whole thinking is off.

    Biology is not about probabilities, but about the nature of things. A statistician will of course never get this. As long as your entire world consists of probabilities, you will never get into the real issue, and it’s easy to sidetrack you with yet another declaration of probability, such as 500 bits.

    Anyway, back to your fascinating analogy. How does a stranger whispering in someone’s ear change the odds of the horses? Or were you not talking about the odds of the horses at all, but really about rigging the game, like making people bet on a particular horse, and that changes how much a bet may win? And evolution is like that?

  4. Erik: Biology is not about probabilities, but about the nature of things. A statistician will of course never get this.

    You know that phrase not even wrong? For you, Erik, an entirely new phrase will have to be coined.

  5. Joe Felsenstein,

    But in any case the issue here is whether 500 bits of the quantity FI, whatever it is that it represents, is a reliable indicator of Design. It isn’t, and there is no argument that it is.

    The argument that you have agreed to is that function includes all naturally selectable paths. Those paths simply become part of the numerator of the equation. If the result is 500 bits or more then we can safely infer design.

  6. OMagain: You know that phrase not even wrong? For you, Erik, an entirely new phrase will have to be coined.

    Felsenstein’s analogy is not even wrong, insofar as biology matters. But if biology doesn’t matter, then of course I am very mistaken in assuming that it does.

  7. Erik: But if biology doesn’t matter, then of course I am very mistaken in assuming that it does.

    Then, what is the “real issue”?

  8. colewd: If the result is 500 bits or more then we can safely infer design.

    You keep declaring that. Your ability to repeat that same statement does not magically transform it into unassailable truth.

    No we can’t infer design, safely or otherwise. For reasons already explained like twenty times. You don’t know whether there is a selectable path to the function regardless of how many sequences we see. And as I explained in a previous post, there is no degree of lack of conservation that could ever realistically bring the FI of the system below 500 bits if the protein in question is of a sufficient length.

    Even for known cases where one biological difference evolved from another you would still calculate significantly over 500 bits. Like the example I gave with the lactate dehydrogenases evolving from the malate dehydrogenases. The total FI of all known LDH and MDH sequences is well in excess of 500 bits. It’s over 1400 bits actually. And one function (conversion of pyruvate into lactate) evolved from the other function (reduction of oxaloacetate to malate).

    No, the number of bits of FI is not a factor that allows you to infer design.

  9. Rumraket,

    Even for known cases where one biological difference evolved from another you would still calculate significantly over 500 bits. Like the example I gave with the lactate dehydrogenases evolving from the malate dehydrogenases. The total FI of all known LDH and MDH sequences is well in excess of 500 bits. It’s over 1400 bits actually. And one function (conversion of pyruvate into lactate) evolved from the other function (reduction of oxaloacetate to malate).

    No, the number of bits of FI is not a factor that allows you to infer design.

    And the origin of the 1400 bits that make up this family is?

    Like the example I gave with the lactate dehydrogenases evolving from the malate dehydrogenases.

    And the evidence it evolved is?

  10. colewd: And the origin of the 1400 bits that make up this family is?

    It doesn’t matter where the MDH family came from, the LDH’s evolved from them and the LDH’s exhibit well over 500 bits of FI. You are claiming that there is no selectable path to a function that exhibits 500 bits. Take function X, calculate the number of bits for molecules that have this function. If you calculate 500 bits or more, infer ID mustadunit.

    But that is demonstrably false, as the LDH function evolved by a selectable path from the MDH function.

    It is completely irrelevant where the MDH function itself came from, that’s just you moving the goalposts. We could be completely ignorant about that, that wouldn’t actually imply that it didn’t or couldn’t evolve. All that could ever reflect is the state of the breadth of our knowledge. It cannot ever become a statement about possibility or plausibility. Ignorance about how X evolved isn’t evidence for it being designed.

    You cannot derive the shape of the fitness landscape from the number of bits of FI you calculate. Look again at the picture I showed you earlier and then please answer the question it contains.

    And the evidence it evolved is?

    Read the paper. They used ancestor reconstruction to recreate inferred ancestral states and tested them for function in the laboratory. Using this method they were able to show that the function evolved by gene duplication of an ancient MDH enzyme almost 1 billion years ago, and further mutations in one of the duplicates brought the LDH function about to selectable levels through a promiscuous intermediate that could catalyze both reactions. This LDH-capable duplicate then further increased in specificity such that today it works exclusively on pyruvate.

    Key question for IDcreationists: If the inferred evolutionary history did not actually take place, why is it possible to recreate ancestral versions of the enzymes by using phylogenetic methods, and test them in the lab and find that they actually work? This is direct evidence that there is a selectable path between functions, and that this path was traversed in actual history. Otherwise there’d be no reason for an inferred ancestral state to yield a functional molecule. Even more obvious is that the older the inferred ancestral state, the more the MDH function is increased and the lower the LDH function. Why would that be the case?

  11. Rumraket,

    You are claiming that there is no selectable path to a function that exhibits 500 bits.

    This is not the claim. Gpuccio excludes protein families evolving from his analysis.

  12. colewd: This is not the claim. Gpuccio excludes protein families evolving from his analysis.

    Really? Where does he do this? Of course he doesn’t do this, as all proteins belong to some protein family. Bill, stop just making shit up please.

  13. Rumraket: No, the number of bits of FI is not a factor that allows you to infer design.

    You seem to be arguing that we have to rule out all possible alternative scenarios in order to infer design. I don’t find that to be a realistic expectation.

  14. Erik: How does a stranger whispering in your ear change the odds of any of the horses?

    Joe lives in a state where they have horse racing but I wonder if he has actually ever been to the races. They post the odds for each horse where anyone present can see them. 🙂

  15. Mung: You seem to be arguing that we have to rule out all possible alternative scenarios in order to infer design. I don’t find that to be a realistic expectation.

    I don’t know how you came to the conclusion that I think you must rule out all possible alternatives before you can infer design. I don’t think that’s how this works at all. But when it comes to the so-called 500-bits-rule it seems to be based on the premise that it successfully accomplishes ruling out evolution, so I have tried to argue that it does not actually accomplish that.

    I don’t argue that case because I think design is some sort of last resort that can only be concluded when all other things are ruled out. I merely do it to point out what I see as an error in valid logical reasoning and an example of bad science.

    I think in order to infer design you must have a design hypothesis with explanatory power that makes testable predictions. As in it must predict and explain a set of data significantly better than any competing predictive hypothesis.

  16. Rumraket,

    Really? Where does he do this? Of course he doesn’t do this, as all proteins belong to some protein family. Bill, stop just making shit up please.

    You need to understand his methods. He clearly distinguishes 2000 protein superfamilies as his target.

    If you want we can ask him but the first beer when we meet is on the one who is incorrect here.

  17. Joe Felsenstein: …what about the fact that *no one* has presented any conservation law that justifies the use of 500 Bits of FI as a criterion for confidently concluding in favor of Design?

    I don’t worry about it too much because I don’t use it (500 bits of FI) as an argument for ID.

    I am however interested in why people think that evolutionary processes can create FI.

  18. colewd: You need to understand his methods. He clearly distinguishes 2000 protein superfamilies as his target.

    None of this makes any sense.

  19. Rumraket,

    Here is his discussion of superfamilies on his TSS op.

    To conclude, a few words about the issue of the sequence space not entirely traversed.

    We have 2000 protein superfamilies that are completely unrelated at sequence level. That is evidence that functional protein sequences are not bound to any particular region of the sequence space.

    Moreover, neutral variation in non coding and non functional sequences can go any direction, without any specific functional constraints. I suppose that neo-darwinists would recognize that parts of the genomes is non functional, wouldn’t they? And we have already seen elsewhere (in the ubiquitin thread discussion) that many new genes arise from non coding sequences.

    So, there is no reason to believe that the functional space has not been traversed. But, of course, neutral variation can traverse it only at very low resolution.

    IOWs, there is no reason that any specific part of the sequence space is hidden from RV. But of course, the low probabilistic resources of RV can only traverse different parts of the sequence space occasionally.

    It’s like having a few balls that can move freely on a plane, and occasionally fall into a hole. If the balls are really few and the plane is extremely big, the balls will be able to potentially traverse all the regions of the plane, but they will pass only through a very limited number of possible trajectories. That’s why finding a very small hole will be almost impossible, wherever it is. And there is no reason to believe that small functional holes are not scattered in the sequence space, as protein superfamilies clearly show.

    So, it’s not true that highly functional proteins are hidden in some unexplored treasure trove in the sequence space. They are there for anyone to find them, in different and distant parts of the sequence space, but it is almost impossible to find them through a random walk, because they are so small.

    And yet, 2000 highly functional superfamilies are there.

  20. Yeah none of that means he “excludes” protein superfamilies from his argument. Rather he is referring explicitly to superfamilies of proteins as entities to be explained. So no Bill, your rather odd excuse earlier doesn’t work at all.

  21. On another note, that is one confused mess of an argument Gpuccio is making there. It’s hard to even make sense of it. The entire space has been traversed, but only at “low resolution” (whatever that means?), so that means it can’t find the functions in the sequence space occupied by protein superfamilies, because those are tiny holes (Why? We aren’t told, in fact the whole idea of a superfamily implies huge amounts of variation so his reasoning here is just completely off).

  22. Rumraket,

    Yeah none of that means he “excludes” protein superfamilies from his argument. Rather he is referring explicitly to superfamilies of proteins as entities to be explained. So no Bill, your rather odd excuse earlier doesn’t work at all.

    If a gene is duplicated with 500bits of FI how much FI do you think has been generated? My point is in the case of superfamilies you are not starting from 0 bits FI after a gene duplication. I was not able to pull all the info to your satisfaction but I believe this is his method.

  23. Rumraket,

    On another note, that is one confused mess of an argument Gpuccio is making there. It’s hard to even make sense of it. The entire space has been traversed, but only at “low resolution” (whatever that means?), so that means it can’t find the functions in the sequence space occupied by protein superfamilies, because those are tiny holes (Why? We aren’t told, in fact the whole idea of a superfamily implies huge amounts of variation so his reasoning here is just completely off).

    If I can make one AA substitution across the sequence that is the entire space traversed at low resolution. If the AA’s are preserved I observe this through DNA substitutions.

    His reasoning is fine. The size of the hole is determined by the sequence preservation. If these sequences are preserved then we are witnessing low variation over time but the family is utilizing different sequences in its design. This is very clear evidence that the sequences were not generated by mutation as in one case we see variation and the other preservation.

    There is no explanation why on the one hand we see a super family with very different sequences and yet mutation stops for specific family members in different species except for very high FI as the result of a designed sequence.

  24. colewd: If a gene is duplicated with 500bits of FI how much FI do you think has been generated?

    Actually if I were to calculate the FI for both sequences combined it would basically double (we’d get >2800 bits for two 330 AA sequences). Roughly speaking double the length of the sequence, double the FI. According to the method of Hazen & Szostak it doesn’t matter whether the sequence constitutes a long series of repeats.

    But look at what you are basically arguing here. If a functional protein exists that exhibits >500 bits of FI, then it doesn’t matter, according to the implications of the question you are posing, how many other functions could subsequently evolve from this protein, the FI would remain the same. That is basically what you are implying with your question. Which is absolutely insane, because that would imply that every function found for any protein found anywhere in life could evolve from a single ancestral protein, and you’d insist no new FI evolved because the ancestor already exhibited 500 bits of FI. I could scarcely imagine a more obvious case of a person flailing to come up with ridiculous excuses for why evolution producing FI mysteriously doesn’t count.

    But the duplicate doesn’t start out with the function in question so we can’t calculate the FI for a function that has yet to evolve. That function evolves subsequently. That is when the FI for that function evolves.

    FI is calculated with respect to a particular function E(x). You set a minimum threshold for it E(min) and then calculate the FI for all sequences that meet or exceed that threshold. The function in question for the LDH enzymes is conversion of pyruvate into lactate. You obviously don’t calculate FI for a set of proteins that don’t even have that function (such as a duplicate of an MDH enzyme that doesn’t have >E(min) LDH function), that wouldn’t make sense. Whether the protein can do another function is irrelevant to how FI(Ex) is calculated.

    Look, either you are using the method of Hazen & Szostak or you are not. If you want to suggest we need to include multiple functions when calculating FI you’re going to have to come up with some way of doing that, because Hazen & Szostak’s method doesn’t allow for that.

    At best you could calculate the FI for each function separately (so you would have E(LDH) and E(MDH) and a number for each), but then it would still be the case that >1400 bits of FI for the LDH function could evolve, in addition to the already existing ~1400 bits of FI for the MDH function.

  25. Bill how do you know which of these “hills” your sequences sit on? And how do you know how far out from the “peak” of that hill they extend?

  26. colewd: as the result of a designed sequence.

    Why does that follow? Can you fill in the bit in the middle there?

  27. colewd: There is no explanation why on the one hand we see a super family with very different sequences and yet mutation stops for specific family members in different species except for very high FI as the result of a designed sequence.

    Yes there is an explanation for that. It’s called a local optimum (or local minimum in the “holes” metaphor). Purifying selection keeps them constrained to the collection of sequences that are “best” as the other sequences immediately around them have lower fitness.

  28. Rumraket,

    But look at what you are basically arguing here. If a functional protein exists that exhibits >500 bits of FI, then it doesn’t matter, according to the implications of the question you are posing, how many other functions could subsequently evolve from this protein, the FI would remain the same. That is basically what you are implying with your question. Which is absolutely insane, because that would imply that every function found for any protein found anywhere in life could evolve from a single ancestral protein, and you’d insist no new FI evolved because the ancestor already exhibited 500 bits of FI. I could scarcely imagine a more obvious case of a person flailing to come up with ridiculous excuses for why evolution producing FI mysteriously doesn’t count.

    If I duplicate a one page paper I have not increased the information content; I have simply duplicated the same information. If I change the title by dropping 2 words in the second copy I have simply modified the information content and lost 2 words of information, or perhaps 30 bits. Can you point me to where Szostak talks about information change with a duplication?

  29. Rumraket,

    Yes there is an explanation for that. It’s called a local optimum (or local minimum in the “holes” metaphor). Purifying selection keeps them constrained to the collection of sequences that are “best” as the other sequences immediately around them have lower fitness.

    How did the gene families form in the wake of all this purifying selection? Why are most all the family of genes getting stuck at a local optimum at very different sequences? Do you see a trend here 🙂

  30. Rumraket,

    Bill how do you know which of these “hills” your sequences sit on? And how do you know how far out from the “peak” of that hill they extend?

    They are on multiple hills as there is more than one function and due to the observation of preservation they are at the top.

  31. OMagain,

    Why does that follow? Can you fill in the bit in the middle there?

    The evolutionists' explanation is that they are on a big island at a local optimum. The problem with this is the existence of protein super families with very diverse sequences that are also at local optima. If I compare multiple WNT family ligands which are organ specific, the family will have around 30% sequence similarity, whereas the same WNT in different species will have 80 to 90 percent similarity, with similarity not necessarily correlated with the time since the split. If the first WNT was on the island, how did it move up to the peak in one instance and then down the island, travel across the universe in all different directions one at a time to 10 more islands, and yes, right to the peak? Oh yeah, I forget, co-evolution 🙂

  32. colewd:
    Rumraket,

    If I duplicate a one page paper I have not increased the information content I have simply duplicated the same information.

    We are not just analyzing gene duplications Bill, we are looking at the divergence of the genes after duplication has happened.

    If a duplication of this sequence with function A:
    AAGTCTCAT
    can evolve by mutation into this sequence and gain function B:
    GAGGCTCGT
    So that now we have both sequences with function A, and function B:
    AAGTCTCATGAGGCTCGT
    Then it would obviously be ridiculous to claim that no information has been added.

    Yet that is what you are doing. You seem to focus entirely on the first step, the duplication, and completely ignore the subsequent divergence and emergence of the new function in the mutating duplicate gene.

    First there was the MDH enzyme; this got duplicated so there are now two of the same MDH gene. You want to say the mere duplication of the MDH enzyme is not new information. Okay, fine.

    But then one of the copies changed. A mutation happened in one of them, and that mutated copy could now do something else in addition to the function it already had (it could reduce pyruvate into lactate, so it was a lactate dehydrogenase (LDH) in addition to being a malate dehydrogenase (MDH)).
    This mutation was beneficial, so it was retained by natural selection. Among further mutations that happened were some that increased the LDH activity (and reduced the MDH activity) of the mutated duplicate. Eventually the duplicate had become fully specialized as an LDH enzyme and had completely lost the MDH activity (the other copy retained the MDH ability throughout). The organism that carries these two genes now has one gene that is fully specialized towards MDH, and one that is fully specialized towards LDH. It did not have this to begin with (it only had the MDH gene), and these genes are not identical copies of each other.

    There is no sensible way that you can claim that no new information has been added to the system. Yet that is what you are doing.

    Taken to an extreme, we could imagine that all proteins across the diversity of life ultimately evolved by many consecutive duplications and divergence from some ancestral protein, and gradually became completely dissimilar sequences with different functions, and you would claim that no new information has been added simply because one of the steps in this process involves copying a sequence so that there are two identical copies of it. Then you proceed to completely ignore and discount the subsequent mutational and functional divergence of the duplicates.

    And based on this complete dismissal you conclude no FI has evolved. That’s ridiculous. You are obviously just being irrational in your deeply motivated denial that evolution can produce >500 bits of FI.
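
The toy sequences in the comment above can be compared directly. The following is only a sketch of the bookkeeping: it counts the positions at which the diverged duplicate differs from the original and builds the combined sequence. It says nothing about function, which is simply asserted for these made-up sequences, and the helper name `hamming` is chosen here for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


seq_a = "AAGTCTCAT"  # original copy, function A (toy example from the comment)
seq_b = "GAGGCTCGT"  # diverged duplicate, function B (toy example from the comment)

# Zero-based positions where the duplicate has changed
changed = [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]
combined = seq_a + seq_b

print("substitutions:", hamming(seq_a, seq_b))
print("changed positions:", changed)
print("combined:", combined)
```

Immediately after duplication the two copies would differ at zero positions; the point of the comment is that the state after divergence is not the same as the state right after copying.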

  33. Rumraket: Bill Cole: If I duplicate a one page paper I have not increased the information content I have simply duplicated the same information.

    We are not just analyzing gene duplications Bill, we are looking at the divergence of the genes after duplication has happened.

    And of course we are happily ignoring the fact that gene duplication results in a potential doubling of the amount of gene product, which will definitely increase function.

    In addition, copy number variations and tandem repeats are responsible for a plethora of interesting novel functions (for bad or for good). Yes Bill, in biology, duplications often have functional consequences.

  34. colewd:
    Rumraket,

    How did the gene families form in the wake of all this purifying selection?

    You are stuck in black and white thinking again. You seem to think purifying selection means ALL mutations are ALWAYS removed.

    There can be purifying selection while there is also divergence. It comes in degrees, and having multiple copies of a gene can relax the selective pressure on some of them as slightly deleterious mutations are tolerated due to dosage compensation (meaning if a mutation slightly decreases the activity of the gene, this small loss of activity can be compensated for by having more of such genes expressed). This can allow mutations to accumulate in some of the copies, until eventually a new function emerges in these mutating duplications. If this new function is beneficial, selection can further drive these already mutated copies up a new hill in the landscape. Now purifying selection is retaining duplicates on both hills (though still not perfectly, which is why we still see some divergence).

    That is one way among several that can happen. And yes there is actual evidence of this. See for example this paper: Reconstruction of Ancestral Metabolic Enzymes Reveals Molecular Mechanisms Underlying Evolutionary Innovation through Gene Duplication

    Why are most all the family of genes getting stuck at a local optimum at very different sequences?

    What else would they be? It would be silly to think there is one sequence that can do all functions simultaneously.

    Do you see a trend here?

    Yes, gene duplications diverge over time, find new functions, and if they’re beneficial they’re retained in varying degrees by purifying selection at local optima on hills that are near-enough each other in sequence space.

  35. colewd: Rumraket: Bill how do you know which of these “hills” your sequences sit on? And how do you know how far out from the “peak” of that hill they extend?

    colewd: They are on multiple hills as there is more than one function and due to the observation of preservation they are at the top.

    But then how can you claim to know that there is no slope leading up to the top that selection could climb from a sequence further down?

  36. colewd: They are on multiple hills as there is more than one function and due to the observation of preservation they are at the top.

    Nope, for every specified function you assume one (=1) hill. After all, it is the sequence decided upon by the good Designer to perform a specific unique function, remember? And because you deny that the protein sequence can tolerate any mutation without losing that specified function we can infer it is of type A, with a teeny-weeny base (and summit).

    Are you happy with that?

  37. Rumraket: Actually if I were to calculate the FI for both sequences combined it would basically double (we’d get >2800 bits for two 330 AA sequences). Roughly speaking double the length of the sequence, double the FI.

    No, he is treating them as two separate sequences of the same length.

    But the duplicate doesn’t start out with the function in question so we can’t calculate the FI for a function that has yet to evolve.

    Wrong.

    FI is calculated with respect to a particular function E(x). You set a minimum threshold for it E(min) and then calculate the FI for all sequences that meet or exceed that threshold.

    So it has nothing to do with whether or not the function has evolved yet.
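
The recipe in the comment above can be sketched by brute force on a tiny, made-up sequence space. Everything specific here is an assumption for illustration: a 4-letter alphabet, sequences of length 4, and a hypothetical activity function that scores a sequence by how many positions match an arbitrary target. FI for a threshold is then minus log2 of the fraction of all sequences whose activity meets or exceeds it.

```python
import math
from itertools import product

ALPHABET = "ACGT"  # assumed toy alphabet
TARGET = "ACGT"    # hypothetical "best" sequence, chosen arbitrarily


def activity(seq):
    """Hypothetical degree of function: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def count_functional(e_min):
    """Count sequences whose activity meets or exceeds the threshold e_min."""
    return sum(
        1
        for seq in product(ALPHABET, repeat=len(TARGET))
        if activity(seq) >= e_min
    )


def functional_information(e_min):
    """FI in bits: -log2(fraction of sequences with activity >= e_min)."""
    total = len(ALPHABET) ** len(TARGET)
    return -math.log2(count_functional(e_min) / total)


for e_min in range(len(TARGET) + 1):
    print(e_min, count_functional(e_min), round(functional_information(e_min), 2))
```

Note that the calculation runs over all possible sequences and depends only on the chosen function and threshold, not on which sequences any organism currently carries. Real protein spaces are far too large to enumerate, so in practice the fraction has to be estimated.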

  38. According to this paper FI is a measure of information.

    A new measure of information — functional information — is required to account for all possible sequences that could potentially carry out an equivalent biochemical function, independent of the structure or mechanism used.

  39. Mung: Wrong.

    So it has nothing to do with whether or not the function has evolved yet.

    If there are no sequences that have the function (meet the minimum threshold), whatever you set the threshold to, you end up with -log2(0/20^L) = infinity.

    So the highest possible amount of FI is when the function is absent. Yeah, I’m going to assume that means we aren’t supposed to calculate FI for absent functions. Call me crazy.

  40. Rumraket: If there are no sequences that have the function (meet the minimum threshold), whatever you set the threshold to, you end up with -log2(0/20^L) = infinity.

    So the highest possible amount of FI is when the function is absent. Yeah, I’m going to assume that means we aren’t supposed to calculate FI for absent functions. Call me crazy.

    I guess in that case, FI is undefined, not infinite? I mean it doesn’t make sense to pick a function not present at all in any of the sequences of the ensemble.
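
The disagreement over "infinite" versus "undefined" can be seen numerically. The sketch below adopts one convention, returning `None` when no sequence meets the threshold; that choice, and the function name `fi_bits`, are assumptions made here rather than anything taken from Hazen and Szostak. It also shows that FI grows without bound as the number of functional sequences shrinks toward zero.

```python
import math


def fi_bits(n_functional, n_total):
    """FI in bits, or None if no sequence meets the threshold (treated as undefined)."""
    if n_functional == 0:
        return None  # convention assumed here: undefined rather than infinite
    return -math.log2(n_functional / n_total)


n_total = 20 ** 10  # all sequences of length 10 over 20 amino acids
for n in (10 ** 6, 10 ** 3, 1, 0):
    print(n, fi_bits(n, n_total))

# Python's own log function refuses the zero case outright
try:
    math.log2(0)
except ValueError as err:
    print("math.log2(0) raises ValueError:", err)
```

Either reading leads to the same practical point: the formula gives no usable number for a function that no sequence in the space can perform.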

  41. Rumraket: If there are no sequences that have the function (meet the minimum threshold), whatever you set the threshold to, you end up with -log2(0/20^L) = infinity.

    Therefore design. Q.E.D.

  42. Rumraket: Explain why you think it is none.

    If the two sequences are the same, and have the same function, and have the same degree of function, then for purposes of calculating FI they are the same sequence.

  43. Corneel,

    In addition, copy number variations and tandem repeats are responsible for a plethora of interesting novel functions (for bad or for good). Yes Bill, in biology, duplications often have functional consequences.

    This is not the argument but carry on 🙂

  44. Rumraket,

    You are stuck in black and white thinking again. You seem to think purifying selection means ALL mutations are ALWAYS removed.

    I find when I stick to the black and white at night I then get to enjoy your creativity in the morning. :-)

    So we have WNT evolving new tissue related variants through gene duplications. Now with the duplication of WNT we also need Frizzled (WNT receptor) to co-mutate (made up) with the WNT, plus all the other proteins that bind with Frizzled.

    The big assumption here is that sequence space and functional space are almost the same. Yet specific binding is important, and these proteins are also finding sequences that are intolerant of mutation. The need for creativity is on the rise.

  45. Corneel,

    Nope, for every specified function you assume one (=1) hill. After all, it is the sequence decided upon by the good Designer to perform a specific unique function, remember? And because you deny that the protein sequence can tolerate any mutation without losing that specified function we can infer it is of type A, with a teeny-weeny base (and summit).

    Are you happy with that?

    How do you envision a protein with 6 different functions climbing up a single hill?

  46. Mung: If the two sequences are the same, and have the same function, and have the same degree of function, then for purposes of calculating FI they are the same sequence.

    But the combination of the two being present in a genome affects the degree of function beyond the degree of function of only one. We’d have to calculate FI for the two of them combined. Where one could fail to meet the minimum threshold, two of them combined could.
