Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular gpuccio’s assertion that observing 500 bits of it is a reliable indicator of Design, as here, about at the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information ) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t work. There would seem to be three possible ways of arguing for it, and in the end two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information.  In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function.  This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observation of more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function, but also includes cases where there would be a path along which function increases. In such cases, seeing 500 bits of functional information, we cannot conclude from this that it is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails, as it is not reliable.
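A toy case may make this concrete. The sketch below is my own illustration, not anything gpuccio or Szostak has proposed: it assumes a hypothetical genome of 600 two-state sites whose level of function is simply the number of 1s, so every site offers an uphill step. Under that assumption the all-1s genotype carries 600 bits of functional information, yet a crude mutation-and-selection loop reaches it without difficulty.

```python
import math
import random

L = 600  # hypothetical number of two-state sites (an assumption for illustration)


def fi_bits(k, length=L):
    """FI for 'at least k ones': -log2 of the fraction of genotypes with >= k ones."""
    count = sum(math.comb(length, j) for j in range(k, length + 1))
    return length - math.log2(count)  # equals -log2(count / 2**length)


def climb(length=L, max_steps=200_000, seed=1):
    """Toy selection: one random site mutates per step; only improvements are kept."""
    rng = random.Random(seed)
    genotype = [rng.randint(0, 1) for _ in range(length)]
    fitness = sum(genotype)
    steps = 0
    while fitness < length and steps < max_steps:
        site = rng.randrange(length)
        if genotype[site] == 0:
            # A 0 -> 1 change raises function, so selection keeps it.
            genotype[site] = 1
            fitness += 1
        # A 1 -> 0 change would lower function, so selection rejects it.
        steps += 1
    return fitness, steps


final_fitness, steps_used = climb()
print(f"reached {final_fitness} ones in {steps_used} steps, "
      f"FI = {fi_bits(final_fitness):.0f} bits")
```

The landscape is deliberately the opposite extreme from an island of function. It says nothing about real proteins; it only shows that the number of bits, taken alone, does not distinguish the two situations.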

(3) The third possibility is an additional condition that is added to the design inference. It simply declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information”. It does not simply define “complex functional information” as a case where we can define a level of function that makes the probability of the set less than 2^{-500}. That additional condition allows us to safely conclude that normal evolutionary forces can be dismissed, by definition. But it leaves the reader to do the heavy lifting, as the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once they have done that, they will find that the additional step of concluding that the genotypes have “complex functional information” adds nothing to our knowledge. CFI becomes a useless add-on that sounds deep and mysterious but actually tells you nothing except what you already know. And there seems to be some indication that gpuccio does use this additional condition.

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function >=

which is identical to my definition, in particular my definition of functional information as the
upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)

We shall see later that there may be some ways in which gpuccio’s definition is modified from Szostak’s. Jack Szostak and his co-authors never attempted any use of his definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.
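For readers who want the upper-tail definition in concrete form, here is a minimal sketch. The function name, the ten-site binary sequence space and the "count the 1s" measure of function are all assumptions made up for illustration. The only thing taken from the definition is that functional information is minus the base-2 logarithm of the fraction of configurations at or above a chosen level of function.

```python
import math
from itertools import product


def functional_information(function_values, threshold):
    """-log2 of the fraction of configurations whose function is >= threshold."""
    values = list(function_values)
    n_at_or_above = sum(1 for v in values if v >= threshold)
    if n_at_or_above == 0:
        raise ValueError("no configuration reaches the threshold")
    return -math.log2(n_at_or_above / len(values))


# Hypothetical toy space: all 1024 binary strings of length 10,
# with "function" defined (arbitrarily) as the number of 1s.
space = list(product((0, 1), repeat=10))
values = [sum(s) for s in space]

fi_top = functional_information(values, 10)   # only one string qualifies
fi_half = functional_information(values, 5)   # 638 of 1024 strings qualify
print(f"threshold 10: {fi_top:.2f} bits, threshold 5: {fi_half:.2f} bits")
```

Note that nothing in this calculation asks whether the configurations below the threshold have zero function or merely less of it.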

gpuccio seems to be making one of three possible arguments:

Possibility #1. That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

William Dembski attempted to use such a theorem, his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2002). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI changed the specification before and after evolutionary processes, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it.  gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agre with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, on the other hand gpuccio does associate the argument with Dembski here by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

That figure being Dembski’s, this leaves it unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem that replaces it.

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us choose a cutoff level of function, which defines the set of sequences with function at or above that level, without any condition that the other sequences have zero function. Durston imposed no such condition either. And as we’ve seen, gpuccio associates his argument with theirs.

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection, then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal from Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements. And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it be achieved by natural evolutionary forces (see my discussion of this here in the section on “Dembski’s revised CSI argument” that refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be making a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each being counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.

gpuccio has a similar condition in the requirements for concluding that complex functional information is present. We can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observing 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) is simply nonexistent. Or if it does exist, it is a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

907 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

  1. Rumraket: What are we then even trying to explain?

    Good question.

    …then how are we to even make sense of the question “where did the information come from” that creationists like to ask?

    To make sense of that question we’d have to ask what they mean by “information” in that context. It’s probably not “functional information” as defined by Szostak et al.

    As to where FI comes from, I think I explained that already. It comes from the mind of man.

  2. Mung:

    DNA_Jock: All other sequences (in this sequence space) have lower FI.

    What you ought to say is that all other sequences (in this sequence space) have lower degree of function.

    You all still don’t understand the plane intersecting the cone at all.

    What you wrote here makes me think that you, Mung, do not understand FI, and do not understand the plane intersecting the cone.
    My statement and your “what you ought to say” rewriting of it are logically identical.
    For a given function and sequence space, FI increases monotonically with increasing degree of function.
    If for element X the numerator is 1, that means all other elements have a lower degree of function, and a lower FI.

    Mung: If the FI were a property of each individual sequence then you would not need to know the amounts of function of all possible sequences.

    It disappoints me that you can write something this dumb after I had explained

    DNA_Jock: What that level is depends on the activity level of the entire set, same as if we were talking about what growth percentile your kid is in.

    Mung Jr’s FI = -logbase2(1- Mung Jr’s percentile)
    Mung Jr is in the “90th percentile w.r.t. height”. Mung Jr’s height percentile is a property of Mung Jr, yet it does depend on the height of all other humans. Do you get it now?

    Mung: He is saying that no other sequences have the same degree of function.

    NO. He is saying that no other sequences have the same or greater degree of function

  3. Mung: To make sense of that question we’d have to ask what they mean by “information” in that context. It’s probably not “functional information” as defined by Szostak et al.

    As to where FI comes from, I think I explained that already. It comes from the mind of man.

    There’s this thing called Functional Information. Whether it “really is” information, whether it is “new information”, whether it originates from the process of natural selection, or was already lying around somewhere out there

    are completely irrelevant to this discussion.

    The point is that whatever FI is or is not, wherever it originates, whether it is “new”, it has been asserted by gpuccio and by colewd that seeing 500 bits of FI in the genome is a reliable indicator of ID. The question is whether that rule works.

    At this point the issue has become whether any FI can end up in the genome after a process of natural selection.

    The answers are No (the 500 bits rule does not work) and Yes, some FI can end up in the genome after natural selection.

  4. Joe Felsenstein,

    The answers are No (the 500 bits rule does not work) and Yes, some FI can end up in the genome after natural selection.

    Were almost 1000 comments in and these are still the unsupported assertions you started with.

  5. colewd:
    Joe Felsenstein,

    Were almost 1000 comments in and these are still the unsupported assertions you started with.

    No. The shoe is on the other foot. You and gpuccio stated a general rule that 500 bits of FI is impossible by ordinary evolutionary processes. Asked why, you provide no reason that this is generally true, but demand that we disprove it instead.

    Yawn …

  6. Yes, it is ironic that the 500 bits rule is based on our ignorance about sequence space. The assumption made, completely unjustified, is that the sequences we see in extant life are the only ones possible and that no slope exists to the current level of function. The degree of conservation of the sequence is actually completely irrelevant. Even if ATP synthase was only conserved at 1% across the diversity of life, it would not be possible to meet the threshold where the system exhibits less than 500 bits of FI.

    Suppose we knew about 10^4 different functional variants of ATP synthase subunit beta.

    According to Gpuccio, that would mean ATP synthase subunit beta exhibits:
    -log2(10^4 / 20^500) ≈ 2147 bits.

    Then suppose we were to experimentally determine that there was a slope for selection to climb towards ATP synthase function for the beta subunit, and this would massively increase the number of functional sequences. Let’s say we experimentally map a large portion of a slope all the way down from nonfunction and up to a local optimum where extant ATP synthase sequences are found. We find that there are 10^50 possible ATP synthase subunit beta sequences on this slope. An incredible number.

    So Gpuccio reduces his threshold so that now 10^50 sequences meet the minimum desired function for ATP synthase subunit beta, and recalculates FI:

    -log2(10^50 / 20^500) ≈ 1994 bits.

    So even if ATP synthase subunit beta could evolve from nonfunction, and there were 10^50 possible ATP synthase subunit beta sequences, the 500 bits rule says it can’t evolve because even with 10^50 possible ATP synthase sequences the system still exhibits 1994 bits.

    That raises the question, how many would there have to be before the system exhibits less than 500 bits of FI?

    There’d have to be 10^501 (ten to the five hundred and first power) possible ATP synthase subunit beta sequences for the system to exhibit less than 500 bits of FI before Gpuccio would declare it evolvable.
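That break-even point can be checked with exact integer arithmetic. This is a minimal sketch assuming, as the comment does, a 20-letter alphabet, a length of 500 and a uniform count over sequence space. The variable and function names are mine.

```python
import math

ALPHABET = 20         # amino acids
LENGTH = 500          # residues, as assumed in the comment above
THRESHOLD_BITS = 500  # the "500 bits" rule

space_size = ALPHABET ** LENGTH

# FI < 500 bits  <=>  n / space_size > 2**-500  <=>  n > space_size / 2**500.
# The division is exact because 20**500 / 2**500 = 10**500.
boundary = space_size // 2 ** THRESHOLD_BITS


def fi_bits(n_functional):
    """FI in bits when n_functional of the space_size sequences meet the threshold."""
    return math.log2(space_size) - math.log2(n_functional)


print(boundary == 10 ** 500)               # the boundary is exactly 10^500
print(f"{fi_bits(10 ** 501):.1f} bits")    # comfortably below 500
```

So any count above 10^500 functional sequences would bring the system under 500 bits, and the 10^501 quoted above is safely past that boundary.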

    That is absolutely ridiculous. It doesn’t even matter how conserved the sequence is. The fact that it is ~500 amino acids in length is what makes it exhibit inordinate amounts of FI. The single most contributing factor to FI quantity is sequence length. The degree of conservation across life’s diversity or history could never add up to reducing the amount of FI exhibited by the system below 500 bits if we are dealing with a protein with a length of 500 amino acids.

    Adding more sequences that meet the threshold has a negligible impact on FI, compared to adding more sequence length. This is important, because even if we were to massively increase the number of known possible functional variants of a sequence, we could never hope to meet the 500 bits threshold.

    For ten sequences of L=500:
    -log2(10/(20^500)) ≈ 2157 bits.

    Let’s increase the threshold by 1000:
    -log2(10^4/(20^500)) ≈ 2147 bits.

    Let’s increase the threshold by 1000 again:
    -log2(10×10^6/(20^500)) ≈ 2137 bits.

    So going from 10 sequences to 10 million sequences reduces the number of bits exhibited by the system by a mere 20.

    Let’s add 50 to the sequence length instead:
    -log2(10/(20^550)) ≈ 2373 bits.
    So we added about 216 bits to the system with that 50 aa insertion (each added residue contributes log2(20) ≈ 4.32 bits).
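The figures in this comment are easy to reproduce. The sketch below assumes the same uniform count over a 20-letter alphabet. The helper name `fi_bits` and the case labels are mine, and the printed values are unrounded, so they can differ in the last digit from the truncated figures quoted above.

```python
import math


def fi_bits(n_functional, length, alphabet=20):
    """-log2(n_functional / alphabet**length), computed in log space."""
    return length * math.log2(alphabet) - math.log2(n_functional)


cases = {
    "10 sequences, L=500": (10, 500),
    "10^4 sequences, L=500": (10 ** 4, 500),
    "10^7 sequences, L=500": (10 ** 7, 500),
    "10^50 sequences, L=500": (10 ** 50, 500),
    "10 sequences, L=550": (10, 550),
}

for label, (n, length) in cases.items():
    print(f"{label}: {fi_bits(n, length):.2f} bits")

# Effect of a million-fold increase in functional sequences vs. 50 extra residues.
drop_from_more_sequences = fi_bits(10, 500) - fi_bits(10 ** 7, 500)
gain_from_more_length = fi_bits(10, 550) - fi_bits(10, 500)
print(f"drop: {drop_from_more_sequences:.1f} bits, gain: {gain_from_more_length:.1f} bits")
```

Working in log space avoids forming the enormous ratio directly, and it makes the asymmetry visible: each extra residue adds log2(20) bits, while each tenfold increase in functional sequences removes only log2(10) bits.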

    Or we can do another thought-experiment. Suppose we experimentally demonstrated the evolution of ATP synthase from an arbitrary GTP synthase, so we do the FI calculation and include all known ATP synthase and GTP synthase sequences (a few tens of thousands), plus a large ensemble of experimentally derived intermediate sequences that exhibit some degree of both ATP and GTP synthesis. As in there is a selectable route between them. In the experiment we determine there are about twenty billion possible intermediate sequences, in addition to the known ATP and GTP synthase sequences. So in total we factor in that twenty billion and thirty thousand sequences meet the threshold.

    -log2(20.00003×10^9 / 20^500) ≈ 2126 bits.

    So even if we were to experimentally demonstrate a path for selection to traverse from another function, with twenty billion different possible intermediate sequences, Gpuccio would still declare it well above being evolvable. But how does Gpuccio know that such a transition isn’t actually possible right now? He just assumes it isn’t and he wants it proven to be possible, otherwise he’s just going to assume the 500 bits rule stands.

    The 500 bits rule is outright question begging nonsense. It’s basically just a blind declaration that “X can’t evolve, prove me wrong!” couched in sciency-sounding technobabble. And it’s based on a method for calculating FI that is skewed massively towards inflating FI for a longer sequence, while the number of possible sequences that meet some arbitrary threshold of function has no hope of ever bringing the FI of the system down such that Gpuccio could agree it was evolvable.

    Again, it’s just question-begging. It assumes what it is supposed to prove.

  7. colewd: Were almost 1000 comments in and these are still the unsupported assertions you started with.

    Let me help with what’s next. Now that anybody who had the vaguest interest in this has said their piece, interest will wane. Due to the nature of the beast, this waning interest will be taken as a sign that gpuccio’s claims are correct as “nobody can defeat them”.

    Neither you nor gpuccio will learn from this extended discussion, apparently because you don’t actually really understand what’s being talked about. This is clear from the asymmetry in the comments from you colewd (short, simple) responding to the long, complex explanations of complex things that are being spoon fed to you. And most responses are ignored by you anyway.

    So continue to believe that those assertions are unsupported. The point is if that were really a true reflection of the situation you’d be planning your next grant application and Joe would be wondering where it all went so very wrong.
