Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular gpuccio’s assertion that observing 500 bits of it is a reliable indicator of Design, as stated here, at about the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information ) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t work. There would seem to be three possible ways of arguing for it, and in the end two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information.  In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function.  This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observation of more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function, but also includes cases where there would be a path along which function increases. In such cases, on seeing 500 bits of functional information, we cannot conclude that the function is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails, as it is not reliable.

(3) The third possibility is an additional condition that is added to the design inference. It does not simply define “complex functional information” as a case where we can define a level of function that makes the probability of the set less than 2^{-500}; it further declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information”. That additional condition allows us to safely conclude that normal evolutionary forces can be dismissed — by definition. But it leaves the reader to do the heavy lifting, as the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once they have done that, they will find that the additional step of concluding that the genotypes have “complex functional information” adds nothing to our knowledge. CFI becomes a useless add-on that sounds deep and mysterious but tells you nothing you did not already know. And there seems to be some indication that gpuccio does use this additional condition.

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function >= Ex.”

which is identical to my definition, in particular my definition of functional information as the upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)
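
In symbols, Szostak’s definition amounts to this (a restatement of the quoted definition, nothing more):

$$ I(E_x) \;=\; -\log_2 F(E_x) $$

where $F(E_x)$ is the fraction of all possible sequences whose degree of function is at least $E_x$. On that definition, “500 bits of functional information” says only that $F(E_x) < 2^{-500}$.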

We shall see later that there may be some ways in which gpuccio’s definition is modified from Szostak’s. Jack Szostak and his co-authors never attempted any use of his definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.

gpuccio seems to be making one of three possible arguments:

Possibility #1 That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

Use of such a theorem was attempted by William Dembski, his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2002). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI changed the specification before and after the evolutionary process acted, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it.  gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agre[e] with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, gpuccio does nevertheless associate the argument with Dembski here by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

That figure being Dembski’s, this leaves it unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem that replaces it.

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature[s] that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us choose a cutoff level of function, which picks out the set of sequences whose function is greater than that level, with no condition that the other sequences have zero function. Neither did Durston impose such a condition. And as we’ve seen, gpuccio associates his argument with theirs.

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection, then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal from Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements. And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it be achieved by natural evolutionary forces (see my discussion of this here, in the section on “Dembski’s revised CSI argument”, which refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be imposing a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each sequence counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.
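
To put the contrast in symbols (my notation, not Dembski’s or Szostak’s): Szostak’s fraction is

$$ F(E_x) \;=\; \frac{\#\{s : f(s) \ge E_x\}}{\#\{\text{all possible sequences}\}} $$

with every sequence weighted equally, whereas the revised-CSI condition instead asks whether the probability of reaching the target set, given all evolutionary processes including selection, is less than $2^{-500}$. That second quantity has to be estimated before the label can be applied, which is exactly what makes applying the label an afterthought.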

gpuccio has a similar condition in the requirements for concluding that complex functional information is present; we can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observation of 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) simply does not exist. Or if it does exist, it exists as a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

1,971 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

  1. DNA_Jock,

    You did not answer my question, to wit, are you okay with this calculation? I am not trying to strawman Cole-Puccio math. I believe that this is an accurate reflection of your calculations. If it is not, you need to explain, very precisely, just why it is not.

    This is your strawman. Gpuccio is comparing proteins with the same function. The same function is a critical point. You’re addicted to sharpshooting :-)

  2. Joe Felsenstein,

    The 500-bit figure is supposed to rule out mutation alone as capable of building that much FI into the genome. I agree, it is a conservative figure for that.

    I think you have conceded the argument here. By definition, if there is a selectable path to the sequence (less than 500 bits apart) it has to be less than 500 bits of FI.

  3. colewd: I think you have conceded the argument here. By definition, if there is a selectable path to the sequence (less than 500 bits apart) it has to be less than 500 bits of FI.

    gpuccio defines things that way, as gpuccio does say that if there is such a path then we don’t define it as 500 bits of FI.

    But gpuccio also says that gpuccio’s definition is the same as Szostak’s, and of course there is no such lack-of-path condition in Szostak and/or Hazen’s definitions.

    So we end up with
    (i) gpuccio having no mathematical proof that natural selection cannot achieve 500 bits of FI,
    (ii) gpuccio and you both defining 500 bits of FI as having been achieved, but only in the case where natural selection cannot do the job.

    So it is just like William Dembski’s latest-and-greatest argument for why Complex Specified Information cannot be achieved by natural selection — namely, he only defines it as being CSI if it cannot be achieved by natural selection.

    And in both cases that leaves it up to us to figure out whether natural selection can do the job. The user does all the heavy lifting, and then the CSI or the 500-Bits-of-FI is just a useless label they paste on later. Adding that label does nothing, as the objective was to prove that NS can’t bring about the adaptation, and that is already achieved when you get ready to paste on the label.

    In short, there is no 500-Bits-Rule that is of any use to anyone except to dazzle naïve readers with sciency phrases.

  4. colewd: I think we can rule it out as a viable mechanism.

    Why?

    You are struggling to make your point with the simplest life forms.

    What are you even blathering about here?

  5. colewd: No. Loss of information is quite easy to explain. Gain is another challenge.

    Any loss is in principle reversible. If an A-to-G mutation causes a loss of information, then the reverse G-to-A mutation must be a gain of an amount equal to the amount lost.

    What’s the problem?

  6. Joe Felsenstein,

    But gpuccio also says that gpuccio’s definition is the same as Szostak’s, and of course there is no such lack-of-path condition in Szostak and/or Hazen’s definitions.

    This is where you are mistaken. Szostak references his ATP experiment in his definition. It either binds ATP or it does not. There is no selectable path unless that function exists somewhere in sequence space.

    A selectable path means that there is function, and it must be included in the 500 bit calculation.

    Your only argument is that 500 bits of FI don’t exist in living organisms. This is a different OP.

    And in both cases that leaves it up to us to figure out whether natural selection can do the job.

    Natural selection can only do the job if the function is pervasive in sequence space. What we are observing in cases like the beta chain of the F1 subunit of ATP synthase is AA sequence preservation over deep time, as measured by the prokaryotic-eukaryotic split. What is the selectable path you envision here?

    In short, there is no 500-Bits-Rule that is of any use to anyone except to dazzle naïve readers with sciency phrases.

    The 500 bit rule allows for a large margin of safety for the design inference. Again, your challenge is to argue that 500 bits of FI don’t exist in living organisms.

    In short, there is no 500-Bits-Rule that is of any use to anyone except to dazzle naïve readers with sciency phrases.

    You already accepted that mutations alone cannot generate this. That is a significant concession given natural selection generating protein sequences is a theoretical construct at this point.

  7. Rumraket,

    Any loss is in principle reversible. If an A-to-G mutation causes a loss of information, then the reverse G-to-A mutation must be a gain of an amount equal to the amount lost.

    What’s the problem?

    All this assumes the pre-existence of enough information for a living organism to self-replicate. Without this, these mutations are meaningless.

  8. colewd: Without an enormous amount of selectable sequences natural selection is not a viable explanation for the current observed FI in nature.

    Of course it is, you have no idea what those sequences evolved from.

    How far you can go in X steps with N mutations depends on where you start. Obviously.

    In one of Gpuccio’s applications of his nonsensical method he relies on deliberately excluding a huge collection of sequences with other functions adopting similar folds. For example, he ignores all P-loop kinases that aren’t classified ATPase/ATP synthase, to make the “target” appear smaller and more conserved than it actually is. But those other sequences are evidence that ATPase/ATP synthase could have evolved by natural selection from proteins performing other functions.
    In fact it is much less conserved than he thinks; when particular mutations happen, they change the function of the protein. To pick something that is obviously related, there is a host of similar enzymes called GTPase/GTP synthase. They even function in the same way, by pumping protons or other ions to drive GTP synthesis or hydrolysis. The GTP synthases/GTPases are just one branch of the P-loop kinase superfamily besides the ATPase/ATP synthase proteins.

  9. colewd:

    Joe Felsenstein,

    But gpuccio also says that gpuccio’s definition is the same as Szostak’s, and of course there is no such lack-of-path condition in Szostak and/or Hazen’s definitions.

    This is where you are mistaken. Szostak references his ATP experiment in his definition. It either binds ATP or it does not. There is no selectable path unless that function exists somewhere in sequence space.

    No, you are totally mistaken. In his 2003 paper Szostak’s definition does not say that this applies or does not apply to ATP synthase. Here is his definition from the 2003 paper:

    Imagine a pile of DNA, RNA or protein molecules of all possible sequences, sorted by activity with the most active at the top. A horizontal plane through the pile indicates a given level of activity; as this rises, fewer sequences remain above it. The functional information required to specify that activity is −log2 of the fraction of sequences above the plane.

    Show me what part of those sentences “reference” his ATP synthase experiments. Or make any reference to whether or not paths to that level of function exist.

  10. colewd: All this assumes the pre-existence of enough information for a living organism to self-replicate. Without this, these mutations are meaningless.

    No shit Sherlock? And here I thought we were talking about mutating feldspar.

    Your claim that explaining loss of information is easy assumes the pre-existence of information-that-can-be-lost.

    You implied there was some sort of issue with new information evolving by natural selection, but that loss of information is easy. How is your statement here in any way a response?

  11. Joe Felsenstein,

    Show me what part of those sentences “reference” his ATP synthase experiments. Or make any reference to whether or not paths to that level of function exist.

    He does not reference ATP synthase; he references his experiment of binding ATP, where the measured frequency is 10^-11 per 70 AA. So there is no selection here; the function is binary. It binds or does not bind. In this case there is no hill to climb, only a binary result.

    as this rises, fewer sequences remain above it.

    This part of his statement is assumptive.
    Removing his evolutionist assumption, we have:

    Imagine a pile of DNA, RNA or protein molecules of all possible sequences, sorted by activity with the most active at the top. A horizontal plane through the pile indicates a given level of activity; the functional information required to specify that activity is −log2 of the fraction of sequences above the plane.

    And now Gpuccio’s method is very consistent with the definition. His method of calculating the ratio between the sequences below the plane and the sequences above the plane identifies those that can afford AA substitutions over deep time and eliminates them from the calculation, thus reducing the number of bits calculated.

    If the calculation is above 500 bits, he infers design.

  12. colewd: So there is no selection here; the function is binary. It binds or does not bind. In this case there is no hill to climb, only a binary result.

    This is not true; binding comes in varying degrees of strength, and they deliberately introduced rounds of high mutation to increase it.

    What do you think “selection round” refers to in this figure?

  13. Rumraket,

    You implied there was some sort of issue with new information evolving by natural selection, but that loss of information is easy. How is your statement here in any way a response?

    Unless your theory can support the origin of genetic information, it cannot be the default explanation.

    The mechanisms you require to support your hypothesis were not always there, given the simple-to-complex model.

    What supports my statement is the quantity of deleterious mutations vs beneficial ones.

  14. Rumraket,

    This is not true; binding comes in varying degrees of strength, and they deliberately introduced rounds of high mutation to increase it.

    How is binding strength measured?

  15. colewd: This part of his statement is assumptive.

    Gpuccio actively relies on the truth of that assumption. If he didn’t, his method could not possibly make sense even to himself.

    And now Gpuccio’s method is very consistent with the definition. His method of calculating the ratio between the sequences below the plane and the sequences above the plane identifies those that can afford AA substitutions over deep time and eliminates them from the calculation, thus reducing the number of bits calculated.

    Bill, the larger the difference in proportion between the sequences above and below the plane, the higher the number of bits. So it isn’t reducing the number of bits, it’s increasing the number of bits.

    And Gpuccio is ignoring those sequences that perform other functions (such as GTP synthase and many others) besides ATP synthesis/hydrolysis, thus basically ruling out that natural selection could have changed one into the other. But he’s done this a priori. So it isn’t his method that led him to conclude that 500 bits of FI means natural selection can’t do it; he’s simply begun by declaring as invalid the evidence that implies it could.

    And let’s not forget that Entropy and DNA_Jock have previously pointed out that Gpuccio ignores homologous but divergent mitochondrial ATP synthases. In other words, the level of conservation he decides on, even were he to insist we can only look at ATP synthases as a function (thus excluding homologous proteins like GTP synthase), is still arbitrarily set by him and he is actually excluding sequences known to have ATP synthase functions.

  16. colewd: Unless your theory can support the origin of genetic information then it cannot be the default explanation.

    And it can. Genetic information can originate by a combination of mutation and natural selection. Like when the random protein in Hayashi et al improved the infectivity of coliphage M13 from a level equal to a deletion-mutant, to a seventeen-thousand-fold increase, over a mere 20 generations.

    The mechanisms you require to support your hypothesis were not always there, given the simple-to-complex model.

    Are you blathering about the origin of life? We don’t know how life originated. We do know what makes it possible for life that already exists to evolve, and how that can result in the emergence of new genetic information: Mutation and drift+selection.

    What supports my statement is the quantity of deleterious mutations vs beneficial ones.

    What is that quantity? By the way, that is an interesting statement, as you seem to directly imply that what determines the quantity of information in a given biological system is its effect on fitness. That because deleterious mutations are more likely than beneficial ones, it must be more likely for mutations to destroy rather than create information.

    Deleterious mutations reduce fitness and therefore reduce genetic information, and beneficial mutations increase fitness and therefore increase genetic information, right? That must be what you mean to imply, otherwise your bringing up the proportion of deleterious to beneficial mutations makes no sense.

    But then you must also accept that any fitness increase in an organism constitutes an increase in genetic information. Do you have a mathematical theory that relates reproductive success to genetic information? I’m sure there are many here who’d be interested in such a theory.

    If natural selection can’t prevent deleterious mutations, nor fix beneficial ones because they’re too unlikely (as you seem to imply), then why has fitness continually increased in the Long Term Evolution Experiment?

    Why was it possible for Keefe and Szostak to improve the ATP binding capacity of a protein with random mutations and selection, if those mutations are supposed to be too unlikely compared to deleterious ones?

    Why was it possible for coliphage M13 to evolve a 17000-fold increase in infectivity in Hayashi et al? Shouldn’t the deleterious mutations have overwhelmed the beneficial ones?

  17. colewd,

    Your statements about what Szostak does in a particular case, and what gpuccio does to infer design, are utterly irrelevant to the question of how FI is defined by Szostak.

    I gave his statement of the definition of FI. You were utterly unable to show how all these other issues came into that definition.

    In short, Szostak’s definition includes nothing about whether natural selection can or cannot bring about this FI. Period. End. Finished.

  18. Joe Felsenstein,

    In short, Szostak’s definition includes nothing about whether natural selection can or cannot bring about this FI. Period. End. Finished.

    So if natural selection is really a factor, why doesn’t Jack mention it?

  19. Rumraket,

    Bill, the larger the difference in proportion between the sequences above and below the plane, the higher the number of bits. So it isn’t reducing the number of bits, it’s increasing the number of bits.

    The fewer AA substitutions, the larger the difference. Again, more AA substitutions decrease the bit size calculation.

    Gpuccio actively relies on the truth of that assumption. If he didn’t, his method could not possibly make sense even to himself.

    Why do you think this is true?

    And Gpuccio is ignoring those sequences that perform other functions (such as GTP synthase and many others) besides ATP synthesis/hydrolysis, thus basically ruling out that natural selection could have changed one into the other. But he’s done this a priori. So it isn’t his method that led him to conclude that 500 bits of FI means natural selection can’t do it; he’s simply begun by declaring as invalid the evidence that implies it could.

    He is working from the definition of functional information. If the functions are different, so is the information. Apples-to-oranges comparisons accomplish nothing.

    Deleterious mutations reduce fitness and therefore reduce genetic information, and beneficial mutations increase fitness and therefore increase genetic information, right?

    There is not a one-to-one correlation.

    And let’s not forget that Entropy and DNA_Jock have previously pointed out that Gpuccio ignores homologous but divergent mitochondrial ATP synthases.

    He deliberately picks sequences with the same function. That’s his method and his argument. Entropy and Jock’s attempt at a strawman is duly noted. I have you guys trying to change gpuccio’s methodology and Joe getting upset if we deviate one iota from Szostak’s definition.

    If natural selection can’t prevent deleterious mutations, nor fix beneficial ones because they’re too unlikely (as you seem to imply), then why has fitness continually increased in the Long Term Evolution Experiment?

    This is a very good question. How are you defining an increase in fitness in this case?

    Why was it possible for coliphage M13 to evolve a 17000-fold increase in infectivity in Hayashi et al? Shouldn’t the deleterious mutations have overwhelmed the beneficial ones?

    Another good question. It probably depends on the starting point. How in real living organisms are we seeing the opposite effect of what we are seeing in the experiments?

  20. Joe Felsenstein,

    He doesn’t mention any evolutionary forces, or any theological forces. Does he have to?

    Not if the available selectable function is part of the calculation.

  21. colewd:
    Joe Felsenstein,

    Not if the available selectable function is part of the calculation.

    Well, I am sorry to hear that you are unhappy with Szostak for not putting anything about selection into his definition of FI. But he might ignore your desire to have that in the definition.

    What he did was try to use Orgel’s idea to make a more sensible definition of how much information was involved in narrowing down the set of sequences to a set that had at least X amount of function.

    And that does not, at that point, speak to the question of how or whether natural selection could get the evolving system there. It isn’t intended to do that. That comes later.

  22. colewd: Rumraket: Deleterious mutations reduce fitness and therefore reduce genetic information, and beneficial mutations increase fitness and therefore increase genetic information, right?

    You: There is not a one-to-one correlation.

    Are you telling us that deleterious mutations can increase the amount of genetic (functional?) information? That is quite the admission. I am really curious what you believe to be the relationship between fitness and functional information. Is it generally positive, or not? If it is, natural selection becomes a prime candidate for increasing the amount of FI in a population. If it isn’t, then you shouldn’t be using “beneficial mutations are rare” as an argument that mutation cannot introduce new variants with an increased function.

  23. colewd: He [gpuccio] deliberately picks sequences with the same function. That’s his method and his argument. Entropy and Jock’s attempt at a strawman is duly noted. I have you guys trying to change gpuccio’s methodology and Joe getting upset if we deviate one iota from Szostak’s definition.

    You have been using your strawman argument a lot lately. Yet it is simply false that gpuccio is careful to pick sequences with the same function. Case in point: the TRIM62 protein. We have both seen the graphs that he uses to demonstrate his beloved “information jumps”. He makes those by comparing the human copies of a protein to their orthologs in other species. You can see that he also includes comparisons to orthologs in cnidarians (cnid).

    However, gpuccio has conceded that

    TRIM 62 as we oberve it in humans, a 475 AAs long protein with about 1000 bits of total potential functional information in BLAST comparisions, is not present in cnidaria [..]

    So gpuccio just included those proteins in his comparison because those were the ones that popped up in the BLAST session. It was in fact Rum and I who pointed out that the most likely function of those orthologs is E3 ubiquitin-protein ligase function, just like TRIM62. In reality, nobody really knows what the actual function of those proteins is, and we are all just guessing based on the presence of conserved protein domains.

    Now if I were feeling charitable, I could argue that, given the actual function that gpuccio uses in his calculations (sequence similarity), any ortholog qualifies. But I am not feeling charitable today, because I am getting tired of this obfuscation game. Therefore I am calling bullshit on your claim that gpuccio has any consideration for the actual function of a protein, or actually cares about whether an ortholog fulfills the same function as its human counterpart.

  24. colewd:
    Rumraket: Bill, the larger the difference in proportion between the sequences above and below the plane, the higher the number of bits. So it isn’t reducing the number of bits, it’s increasing the number of bits.

    colewd: The fewer AA substitutions, the larger the difference. Again, more AA substitutions decrease the bit size calculation.

    No, that is incorrect; you can perform the calculation yourself.

    Let’s do an example. According to Hazen et al 2007:
    “Functional information is determined by identifying the fraction of all sequences that achieve a specified outcome.

    Consider, for example, sequences of 10 letters that have a high probability (Ex ≅ 1) of evoking a positive response from the fire department. Such sequences might include “FIREONMAIN,” “MAINSTFIRE,” or “MAPLENMAIN.” Additionally, some messages containing phonetic misspellings (FYRE or MANE), mistakes in grammar or usage (FIREOFMAIN), or typing errors (MAZLE or NAPLE) may also yield a significant but lower probability of response (0 ≪ Ex < 1). Given these variants, on the order of 1,000 combinations of 10 letters might initiate a rapid response to the approximate location of the fire. Thus
    I(1) ≈ -log2[1000/(26^10)] ≈ 36 bits."

    So let’s say there are 10,000,000,000,000,000,000 (ten quintillion) protein sequences of length 100 AA that can perform a particular function:
    -log2(10×10^18/(20^100)) ≈ 369 bits of functional information.

    But what happens if we reduce the number of sequences to just 1000 that can perform the function?
    -log2(10^3/(20^100)) ≈ 422 bits of functional information.
    Looks like the number of bits increased.

    Let’s reduce the number to just 1 sequence:
    -log2(1/(20^100)) ≈ 432 bits of functional information.

    Clearly as we move the floor up, the number of bits goes up.
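
    These numbers are easy to check. Here is a minimal Python sketch of the Hazen-style calculation (my illustration, not Gpuccio’s code), done in log space because 20^100 is far too big for a float:

    import math

    def functional_information(n_functional, alphabet_size, length):
        # Hazen et al. 2007: I(Ex) = -log2(fraction of sequences with function >= Ex)
        # = log2(total number of sequences) - log2(number of functional sequences)
        return length * math.log2(alphabet_size) - math.log2(n_functional)

    print(functional_information(1000, 26, 10))     # fire messages: ~37 bits (the paper rounds to 36)
    print(functional_information(10**19, 20, 100))  # ~369 bits
    print(functional_information(10**3, 20, 100))   # ~422 bits
    print(functional_information(1, 20, 100))       # ~432 bits

    Raising the floor (shrinking n_functional) can only push the number of bits up, never down.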

    Rumraket: Gpuccio actively relies on the truth of that assumption. If he didn’t, his method could not possibly make sense even to himself.

    colewd: Why do you think this is true?

    Because it’s the very thing that makes the “target” appear small and the number of bits increase. If this wasn’t so, Gpuccio would have no reason to set a floor for things like ATP synthase, or to insist that wild-type levels of function are the only thing that counts for Hayashi et al 2007.

    Look Bill, you clearly don’t understand what is going on here.

    Rumraket: And Gpuccio is ignoring those sequences that perform other functions (such as GTP synthase and many others) besides ATP synthesis/hydrolysis, thus basically ruling out that natural selection could have changed one into the other. But he’s done this a priori. So it isn’t his method that led him to conclude that 500 bits of FI means natural selection can’t do it; he’s simply begun by declaring as invalid the evidence that implies it could.

    colewd: He is working from the definition of functional information. If the functions are different, so is the information.

    But then you can’t rule out that natural selection could have created the information. You are basically conceding that if natural selection can enhance the function of a protein from being specific towards one substrate, towards another, then it can create new information. It’s just that Gpuccio’s method is incapable of ruling out such transitions, as it simply does not consider them. It doesn’t matter whether he thinks the system exhibits 500 bits of FI or not, when that tells us nothing about whether those 500 bits could have been brought about by, for example, natural selection acting on an already existing protein by gradually turning its substrate specificity from one substrate to another. By, for example, changing a GTP synthase into an ATP synthase.

    Apples-to-oranges comparisons accomplish nothing.

    If the apple-juice-making protein can be turned into an orange-juice-making protein, and both proteins exhibit 500 bits of FI (and, as you say, they’re different information), it surely matters for whether natural selection can make 500 bits of functional information.

    Rumraket: Deleterious mutations reduce fitness and therefore reduce genetic information, and beneficial mutations increase fitness and therefore increase genetic information, right?

    colewd: There is not a one-to-one correlation.

    Then what is the correlation? Beneficial mutations sometimes increase genetic information? How often? Deleterious mutations sometimes reduce genetic information? What are the numbers here?

    And let’s not forget that Entropy and DNA_Jock have previously pointed out that Gpuccio ignores homologous but divergent mitochondrial ATP synthases.

    He deliberately picks sequences with the same function. That’s his method and his argument.

    Yes, and as we have just seen above, one flaw of his method is that it simply does not consider cases where natural selection can change the function of a protein from one thing to another. So he doesn’t get to declare that, just because he can set a floor such that the tip of the cone exhibits 500 bits of FI, natural selection could not have created it. He simply can’t claim to know that that is the case. It becomes particularly obvious when we are aware of similar sequences with other functions that are just blithely ignored. How does Gpuccio know natural selection has not created any of these functions from the others?

    Entropy and Jock’s attempt at a strawman is duly noted. I have you guys trying to change gpuccio’s methodology and Joe getting upset if we deviate one iota from Szostak’s definition.

    Yes, we want Gpuccio’s attempt to employ Szostak’s methodology to be valid and to include cases it presently doesn’t even consider in Gpuccio’s usage. We want to improve it. The problem is that when the method is changed to include such cases, it leads to a different conclusion than the one Gpuccio and you want, as it then becomes completely obvious that either natural selection CAN create 500 bits of FI, or at least that it can’t be ruled out simply by finding that system X exhibits 500 bits of FI.

    Rumraket: If natural selection can’t prevent deleterious mutations, nor fix beneficial ones because they’re too unlikely (as you seem to imply), then why has fitness continually increased in the Long Term Evolution Experiment?

    colewd: This is a very good question. How are you defining an increase in fitness in this case?

    Relative Fitness Data Through Generation 10,000.

    Rumraket: Why was it possible for coliphage M13 to evolve a 17000-fold increase in infectivity in Hayashi et al? Shouldn’t the deleterious mutations have overwhelmed the beneficial ones?

    colewd: Another good question. It probably depends on the starting point. How in real living organisms are we seeing the opposite effect of what we are seeing in the experiments?

    Opposite? How is it opposite? We see a higher degree of function from the protein in wild-type phage, which has existed for hundreds of millions of years, than what evolved in a few weeks of a laboratory experiment? How is that “opposite”? How is that even a surprise?

    Maybe there was another starting point? Maybe it evolved from recombination of other proteins? Or from noncoding DNA?

    Maybe adding 200 million generations of mutation and natural selection on top of 20 has a larger effect than extrapolation from the NK model of protein fitness landscapes implies?

  25. Corneel,

    Are you telling us that deleterious mutations can increase the amount of genetic (functional?) information?

    I am saying there is not a one-to-one correlation.

    You can see that he also includes comparisons to orthologs in cnidarians (cnid).

    Because the line on the graph is slightly above zero? This claim seems like an odd reach for you.

    Therefore I am calling bullshit on your claim that gpuccio has any consideration for the actual function of a protein, or actually cares about whether an ortholog fulfills the same function as its human counterpart.

    I note your opinion here but I disagree with you. His methodology is all about comparing proteins with the same function.

  26. Rumraket,

    colewd:
    Rumraket: Bill, the larger the difference in proportion between the sequences above and below the plane, the higher the number of bits. So it isn’t reducing the number of bits, it’s increasing the number of bits.

    colewd: The fewer AA substitutions, the larger the difference. Again, more AA substitutions decrease the bit size calculation.

    Let’s try to sync up here first. As an extreme example, gpuccio compares protein A in bacteria to protein A in humans, which have identical sequences despite a split 1 billion years ago. In this case the bit size would be at a maximum value. The plane would be at the very top of the pyramid. As the comparison shows AA substitutions, the bit value drops. The plane is now moving down the pyramid.

  27. Joe Felsenstein,

    Well, I am sorry to hear that you are unhappy with Szostak for not putting anything about selection into his definition of FI. But he might ignore your desire to have that in the definition.

    Do you agree that a sequence needs to be functional to be selectable? If so, then the selectable sequences are a subset (or the full set) of the functional sequences as calculated by Szostak.

  28. Rumraket: You are basically conceding that if natural selection can enhance the function of a protein from being specific towards one substrate, towards another, then it can create new information.

    This just doesn’t follow at all. You’re equivocating over the term “information.” It’s an abuse of FI not endorsed by Hazen or Szostak.

  29. Rumraket: Any loss is in principle reversible. If an A-to-G mutation causes a loss of information, then the reverse G-to-A mutation must be a gain of an amount equal to the amount lost.

    No, that’s clearly yet another loss of information.

  30. colewd: Me: Are you telling us that deleterious mutations can increase the amount of genetic (functional?) information?

    Bill: I am saying there is not a one-to-one correlation.

    Oh-kay. It’s not yet crystal clear to me what you mean by that. Could you perhaps address the rest of my comment? That might clear things up.

    colewd: Me: You can see that he also includes comparisons to orthologs in cnidarians (cnid).

    Bill: Because the line on the graph is slightly above zero? This claim seems like an odd reach for you.

    Because there are data points labeled “cnid” in the graph. That’s short for cnidarian, you know. Read the relevant part of gpuccio’s ubiquitin OP; he makes no secret of this.

    colewd: I note your opinion here but I disagree with you. His methodology is all about comparing proteins with the same function.

    Yet neither you nor gpuccio can tell me what that function is that TRIM62 and its orthologs are supposed to be doing.

  31. Rumraket: … then why has fitness continually increased in the Long Term Evolution Experiment?

    Why was it possible for Keefe and Szostak to…

    Why was it possible for coliphage M13 to …

    God.

  32. Joe Felsenstein: In short, Szostak’s definition includes nothing about whether natural selection can or cannot bring about this FI. Period. End. Finished.

    Agree wholeheartedly with Joe. FI is simply proposed as a measure of system complexity.

    I think all talk of FI being generated, or created, or increased, or decreased, is nonsense. To speak in those terms is to use information in a different sense.

  33. colewd: If the functions are different, so is the information.

    Actually, no. The functions could be different and the FI could be the same amount.

  34. colewd: Let’s try to sync up here first. As an extreme example, gpuccio compares protein A in bacteria to protein A in humans, which have identical sequences despite a split 1 billion years ago. In this case the bit size would be at a maximum value. The plane would be at the very top of the pyramid. As the comparison shows AA substitutions, the bit value drops. The plane is now moving down the pyramid.

    This confused mess is what you get for trying to assimilate gpuccio’s warped logic, Bill. What guarantee do you have that this hypothetical strongly conserved protein has the highest possible degree of function, and that it is the only one capable of achieving this? You don’t, but you just assumed this was the case because it is preserved in humans. This is Texas sharpshooting taken to its most extreme.

    Indulge me, and please tell me which bit value you supposed was dropping as the sequence similarity decreased. Was it the one associated with the bacterial sequence, the human sequence or the ancestral sequence?

  35. Mung: I think all talk of FI being generated, or created, or increased, or decreased, is nonsense. To speak in those terms is to use information in a different sense.

    Dammit Mung. We read all that stuff about functional information for nothing?

  36. Corneel,

    This confused mess is what you get for trying to assimilate gpuccio’s warped logic, Bill. What guarantee do you have that this hypothetical strongly conserved protein has the highest possible degree of function, and that it is the only one capable of achieving this?

    So you don’t believe that conserved function indicates FI. That’s fine. This is science, so guarantees are out the window. We also don’t know for sure that a very different sequence could not do the same thing.

    What you forget is that most proteins don’t act on their own but have an interdependent relationship with other proteins. Their degree of function is not independent. Your concept of increasing function by mutating a single protein is incoherent.

    I like his measure because it tests this tolerance over deep time.

    You don’t, but you just assumed this was the case because it is preserved in humans. This is Texas sharpshooting taken to its most extreme.

    Gpuccio has not assumed anything. He understands this is an indirect measure.

    The only Texas sharpshooting going on is your strawman saying that gpuccio is assuming something.

  37. Mung: This just doesn’t follow at all. You’re equivocating over the term “information.” It’s an abuse of FI not endorsed by Hazen or Szostak.

    I’m not equivocating, I’m responding to a statement Bill Cole made. He’s clearly not, in the post I respond to, using Hazen and Szostak’s definition of functional information (FI). Mung, perhaps you should read the posts I respond to?

  38. Mung: No, that’s clearly yet another loss of information.

    So information cannot possibly be gained, good to know. Then it can’t be designed either. Actually, I take it you were really just joking, right?

  39. This thread proves that creotards can’t gain information. I can totally see a theorem coming out of this.

  40. colewd:
    Joe Felsenstein,

    Do you agree that a sequence needs to be functional to be selectable? If so, then the selectable sequences are a subset (or the full set) of the functional sequences as calculated by Szostak.

    It can also be selected as a side effect of some other function than the one being used for the FI calculation.

    The point I am making is that Szostak’s and Hazen’s definition of FI is a measure of the improbability of that high a degree of function, when the distribution that we draw from counts all sequences equally.

    gpuccio (and presumably colewd too) add further conditions that rule out our calling it FI if it can be achieved by natural selection. So they are not using Szostak’s definition, but instead one that, among other things, calculates a totally useless quantity.

  41. Joe Felsenstein,

    gpuccio (and presumably colewd too) add further conditions that rule out our calling it FI if it can be achieved by natural selection. So they are not using Szostak’s definition, but instead one that, among other things, calculates a totally useless quantity.

    Reading gpuccio’s post on the TSS fallacy, it appears he is fine with NS as part of FI.

    For optimizing a specific protein, is it anything more than NS working through the available function in that specific sequence?

    By definition, the sequence created by the mutation must be functional in order for it to improve the system.

    The question is: does NS affect the probability of mutation finding the functional sequence space already defined by Szostak’s equation?

    It can also be selected as a side effect of some other function than the one being used for the FI calculation.

    I am not sure what this means.

  42. colewd: Reading gpuccio’s post on the TSS fallacy, it appears he is fine with NS as part of FI.

    My reading is that when it is possible for NS to get to sequences in the target set by following a path of increasing function, gpuccio does not count those cases as complex functional information. Do you read gpuccio as counting those as complex functional information?

    I do think gpuccio is very unclear on this.

  43. Joe Felsenstein,

    My reading is that when it is possible for NS to get to sequences in the target set by following a path of increasing function, gpuccio does not count those cases as complex functional information. Do you read gpuccio as counting those as complex functional information?

    I just read parts of a couple of his OPs and I think he agrees with you here. He would agree that NS can potentially migrate around an island of function. When he returns to active commenting at UD I will verify this.

  44. colewd: So you don’t believe that conserved function indicates FI.

    Conserved by … purifying selection?!? So there is a general positive relationship between fitness and functional information? (I have a distinct feeling we’ve been through this).
    To answer your question: sure I do, but sequence space is a BIG place. Any observed protein is unlikely to be the global optimum.

    colewd: What you forget is that most proteins don’t act on their own but have an interdependent relationship with other proteins. Their degree of function is not independent. Your concept of increasing function by mutating a single protein is incoherent.

    Like in enzyme-substrate binding, you mean? Or protein-protein interactions? Do you suppose that sequence similarity to modern proteins is a poor proxy for substrate affinity when the substrate is itself subject to evolutionary change? Or may even be swapped with another substrate altogether? That is a wonderful insight, and I fully agree.

    So how many bits of functional information does TRIM62 really have, when we correct for this interdependence with its substrate(s)?

    colewd: Gpuccio has not assumed anything. He understands this is an indirect measure.

    I strongly doubt that gpuccio understands that, but I was talking about you actually. Why did you put the conserved protein at the top of the pyramid in your hypothetical example? The only reason I can think of is that you bought into gpuccio’s implicit claim that the way proteins appear in modern humans is the optimal way to bring about undefined function X.

    colewd: The only Texas sharpshooting going on is your strawman saying that gpuccio is assuming something.

    Could you add an ad hominem too, please? I was going for a hat trick.

  45. colewd: What you forget is that most proteins don’t act on their own but have an interdependent relationship with other proteins. Their degree of function is not independent. Your concept of increasing function by mutating a single protein is incoherent.

    No, there is nothing incoherent about that. That’s what happens when the antibodies from your immune system mutate to become specific towards antigens on foreign material, to pick just one example. No other proteins have to mutate, just the ones encoding antibodies. And they really do increase in function towards a particular pathogen, even while retaining their necessary ability to interact with other proteins.

    And in so far as mutations in a protein might by chance cause it to interfere with other cellular functions in a degree visible to natural selection, they are selected against. That doesn’t mean proteins can’t tolerate mutations just because they might be involved in more than one type of process or interaction. You are stuck in black and white thinking again. The fact that several factors might constrain protein evolution does not mean there is no protein evolution. You seem to think the truth must be either one of these two extremes. That we think proteins are free to change in any and all ways, which you contrast with your own view that almost nothing can change because everything is this extremely fine-tuned but ultra-rigid machine where the slightest change to anything causes the entire house of cards to collapse.

    But there’s a reality in between those two extremes: in so far as there are more layers of constraint, protein evolution just slows down, as tolerable changes become increasingly rare but are never truly at zero. And the interactions between molecules are fundamentally noisy, stochastic processes.
