Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular gpuccio’s assertion that if we observe 500 bits of it, this is a reliable indicator of Design, as here, at about the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t. There seem to be three possible ways of arguing for it, and in the end two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information.  In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function.  This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observation of more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function; it also includes cases where there is a path along which function increases. In such cases, on seeing 500 bits of functional information, we cannot conclude that it is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails: it is not reliable.

(3) The third possibility is an additional condition added to the design inference. Instead of simply defining “complex functional information” as a case where we can set a level of function that makes the probability of the set less than 2^{-500}, it declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information”. That additional condition allows us to conclude that normal evolutionary forces can be dismissed, by definition. But it leaves the reader to do the heavy lifting: the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once they have done that, they will find that the further step of declaring that the genotypes have “complex functional information” adds nothing to our knowledge. CFI becomes a useless add-on that sounds deep and mysterious but tells you nothing you did not already know. And there seems to be some indication that gpuccio does use this additional condition.

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function >=
Ex.”

which is identical to my definition, in particular my definition of functional information as the
upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)

We shall see later that there may be some ways in which gpuccio’s definition
is modified from Szostak’s. Jack Szostak and his co-authors never attempted any use of his definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.
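To make Szostak’s definition concrete, here is a toy calculation (my own illustrative numbers, chosen only to show the arithmetic, not any biological claim):

```python
import math

def functional_information(n_meeting_cutoff, alphabet_size, length):
    """Szostak-style FI: -log2 of the fraction of all possible
    sequences whose degree of function meets or exceeds the cutoff Ex."""
    total = alphabet_size ** length
    return -math.log2(n_meeting_cutoff / total)

# Toy example: a 10-residue peptide over the 20 amino acids, with
# 1000 sequences at or above the function cutoff.
print(round(functional_information(1000, 20, 10), 1))  # ≈ 33.3 bits
```

Note that nothing in this calculation asks whether the qualifying sequences are reachable by selection; it is purely a count of how rare the function is among all sequences.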

gpuccio seems to be making one of three possible arguments:

Possibility #1 That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

Use of such a theorem was attempted by William Dembski, his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2002). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI changed the specification before and after evolutionary processes, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it.  gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agre[e] with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, on the other hand gpuccio does associate the argument with Dembski here by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

That figure being Dembski’s, this leaves it unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem that replaces it.

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us define a cutoff level of function to define a set of sequences that had function greater than that, without any condition that the other sequences had zero function. Neither did Durston. And as we’ve seen gpuccio associates his argument with theirs.

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal than Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements.  And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it be achieved by natural evolutionary forces (see my discussion of this here in the section on “Dembski’s revised CSI argument” that refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be making a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each being counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.

gpuccio has a similar condition in the requirements for concluding that complex functional information is present. We can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observation of 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) is simply nonexistent. Or if it does exist, it exists only as a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that
are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

1,971 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

  1. Rumraket: What are we then even trying to explain?

    Good question.

    …then how are we to even make sense of the question “where did the information come from” that creationists like to ask?

    To make sense of that question we’d have to ask what they mean by “information” in that context. It’s probably not “functional information” as defined by Szostak et al.

    As to where FI comes from, I think I explained that already. It comes from the mind of man.

  2. Mung:

    DNA_Jock: All other sequences (in this sequence space) have lower FI.

    What you ought to say is that all other sequences (in this sequence space) have lower degree of function.

    You all still don’t understand the plane intersecting the cone at all.

    What you wrote here makes me think that you, Mung, do not understand FI, and do not understand the plane intersecting the cone.
    My statement and your “what you ought to say” rewriting of it are logically identical.
    For a given function and sequence space, FI increases monotonically with increasing degree of function.
    If for element X the numerator is 1, that means all other elements have a lower degree of function, and a lower FI.

    Mung: If the FI were a property of each individual sequence then you would not need to know the amounts of function of all possible sequences.

    It disappoints me that you can write something this dumb after I had explained

    DNA_Jock: What that level is depends on the activity level of the entire set, same as if we were talking about what growth percentile your kid is in.

    Mung Jr’s FI = -logbase2(1- Mung Jr’s percentile)
    Mung Jr is in the “90th percentile w.r.t. height”. Mung Jr’s height percentile is a property of Mung Jr, yet it does depend on the height of all other humans. Do you get it now?
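    DNA_Jock’s percentile formula is easy to check numerically (a quick sketch; fi_from_percentile is my own name for it):

```python
import math

# FI = -logbase2(1 - percentile), with the percentile expressed
# as a fraction between 0 and 1.
def fi_from_percentile(p):
    return -math.log2(1.0 - p)

print(round(fi_from_percentile(0.90), 2))   # 90th percentile -> ≈ 3.32 bits
print(round(fi_from_percentile(0.999), 2))  # 99.9th percentile -> ≈ 9.97 bits
```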

    Mung: He is saying that no other sequences have the same degree of function.

    NO. He is saying that no other sequences have the same or greater degree of function

  3. Mung: To make sense of that question we’d have to ask what they mean by “information” in that context. It’s probably not “functional information” as defined by Szostak et al.

    As to where FI comes from, I think I explained that already. It comes from the mind of man.

    There’s this thing called Functional Information. Whether it “really is” information, whether it is “new information”, whether it originates from the process of natural selection, or was already lying around somewhere out there

    are completely irrelevant to this discussion.

    The point is that whatever FI is or is not, wherever it originates, whether it is “new”, it has been asserted by gpuccio and by colewd that seeing 500 bits of FI in the genome is a reliable indicator of ID. The question is whether that rule works.

    At this point the issue has become whether any FI can end up in the genome after a process of natural selection.

    The answers are No (the 500 bits rule does not work) and Yes, some FI can end up in the genome after natural selection.

  4. Joe Felsenstein,

    The answers are No (the 500 bits rule does not work) and Yes, some FI can end up in the genome after natural selection.

    We’re almost 1000 comments in and these are still the unsupported assertions you started with.

  5. colewd:
    Joe Felsenstein,

    We’re almost 1000 comments in and these are still the unsupported assertions you started with.

    No. The shoe is on the other foot. You and gpuccio stated a general rule that 500 bits of FI is impossible by ordinary evolutionary processes. Asked why, you provide no reason that this is generally true, but demand that we disprove it instead.

    Yawn …

  6. Yes, it is ironic that the 500 bits rule is based on our ignorance about sequence space. The assumption made, completely unjustified, is that the sequences we see in extant life are the only ones possible and that no slope exists up to the current level of function. The degree of conservation of the sequence is actually completely irrelevant. Even if ATP synthase were only 1% conserved across the diversity of life, it would not be possible to meet the threshold where the system exhibits less than 500 bits of FI.

    Suppose we knew about 10^4 different functional variants of ATP synthase subunit beta.

    According to Gpuccio, that would mean ATP synthase subunit beta exhibits:
    -log2(10^4 / 20^500) ≈ 2147 bits.

    Then suppose we were to experimentally determine that there was a slope for selection to climb towards ATP synthase function for the beta subunit, and this would massively increase the number of functional sequences. Let’s say we experimentally map a large portion of a slope all the way down from nonfunction and up to a local optimum where extant ATP synthase sequences are found. We find that there are 10^50 possible ATP synthase subunit beta sequences on this slope. An incredible number.

    So Gpuccio reduces his threshold so that now 10^50 sequences meet the minimum desired function for ATP synthase subunit beta, and recalculates FI:

    -log2(10^50 / 20^500) ≈ 1994 bits.

    So even if ATP synthase subunit beta could evolve from nonfunction, and there were 10^50 possible ATP synthase subunit beta sequences, the 500 bits rule says it can’t evolve because even with 10^50 possible ATP synthase sequences the system still exhibits 1994 bits.

    That raises the question, how many would there have to be before the system exhibits less than 500 bits of FI?

    There’d have to be 10^501 (ten to the five hundred and first power) possible ATP synthase subunit beta sequences for the system to exhibit less than 500 bits of FI before Gpuccio would declare it evolvable.

    That is absolutely ridiculous. It doesn’t even matter how conserved the sequence is. The fact that it is ~500 amino acids in length is what makes it exhibit inordinate amounts of FI. The single most contributing factor to FI quantity is sequence length. The degree of conservation across life’s diversity or history could never add up to reducing the amount of FI exhibited by the system below 500 bits if we are dealing with a protein with a length of 500 amino acids.

    Adding more sequences that meet the threshold has a negligible impact on FI, compared to adding more sequence length. This is important, because even if we were to massively increase the number of known possible functional variants of a sequence, we could never hope to meet the 500 bits threshold.

    For ten sequences of L=500:
    -log2(10/(20^500)) ≈ 2157 bits.

    Let’s increase the number of qualifying sequences 1000-fold:
    -log2(10^4/(20^500)) ≈ 2147 bits.

    Let’s increase the number 1000-fold again:
    -log2(10×10^6/(20^500)) ≈ 2137 bits.

    So going from 10 sequences to 10 million sequences reduces the number of bits exhibited by the system by a mere 20.

    Let’s add 50 to the sequence length instead:
    -log2(10/(20^550)) ≈ 2373 bits.
    So we basically added 200 bits to the system with that 50 aa insertion.
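    Those figures can be reproduced directly (a quick sketch; fi_bits is my own helper name):

```python
import math

# Compute -log2(n / 20**L) as L*log2(20) - log2(n), which avoids
# forming astronomically large intermediate numbers.
def fi_bits(n_functional, length, alphabet=20):
    return length * math.log2(alphabet) - math.log2(n_functional)

print(round(fi_bits(10, 500), 1))     # ≈ 2157.6 bits
print(round(fi_bits(10**4, 500), 1))  # ≈ 2147.7 bits
print(round(fi_bits(10**7, 500), 1))  # ≈ 2137.7 bits
print(round(fi_bits(10, 550), 1))     # ≈ 2373.7 bits
```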

    Or we can do another thought-experiment. Suppose we experimentally demonstrated the evolution of ATP synthase from an arbitrary GTP synthase, so we do the FI calculation and include all known ATP synthase and GTP synthase sequences (a few tens of thousands), plus a large ensemble of experimentally derived intermediate sequences that exhibit some degree of both ATP and GTP synthesis. As in there is a selectable route between them. In the experiment we determine there are about twenty billion possible intermediate sequences, in addition to the known ATP and GTP synthase sequences. So in total we factor in that twenty billion and thirty thousand sequences meet the threshold.

    -log2(20.00003×10^9 / 20^500) ≈ 2126 bits.

    So even if we were to experimentally demonstrate a path for selection to traverse from another function, with twenty billion different possible intermediate sequences, Gpuccio would still declare it well above being evolvable. But how does Gpuccio know that such a transition isn’t actually possible right now? He just assumes it isn’t and he wants it proven to be possible, otherwise he’s just going to assume the 500 bits rule stands.

    The 500 bits rule is outright question begging nonsense. It’s basically just a blind declaration that “X can’t evolve, prove me wrong!” couched in sciency-sounding technobabble. And it’s based on a method for calculating FI that is skewed massively towards inflating FI for a longer sequence, while the number of possible sequences that meet some arbitrary threshold of function has no hope of ever bringing the FI of the system down such that Gpuccio could agree it was evolvable.

    Again, it’s just question-begging. It assumes what it is supposed to prove.

  7. colewd: We’re almost 1000 comments in and these are still the unsupported assertions you started with.

    Let me help with what’s next. Now that anybody who had the vaguest interest in this has said their piece, interest will wane. Due to the nature of the beast, this waning interest will be taken as a sign that gpuccio’s claims are correct as “nobody can defeat them”.

    Neither you nor gpuccio will learn from this extended discussion, apparently because you don’t really understand what’s being talked about. This is clear from the asymmetry between your comments, colewd (short, simple), and the long, complex explanations of complex things that are being spoon-fed to you. And most responses are ignored by you anyway.

    So continue to believe that those assertions are unsupported. The point is if that were really a true reflection of the situation you’d be planning your next grant application and Joe would be wondering where it all went so very wrong.

  8. Rumraket,

    Well put.
    With any of these bit-counting approaches, there is a carefully obscured assumption of independence. Find a 20 amino acid motif, that’s a one in 10^26 chance. Tandem duplication, and suddenly it’s one in 10^26 times less probable: grand total one in 10^52. Nope, that’s not how it works. But it does make it easy to generate arbitrarily large numbers.
    I also enjoy the strange effect of adding or subtracting an amino acid or two from the repertoire.
    gpuccio’s 500mer has a whopping 2,161 bits of unlikelihood; but what if there were 21 amino acids — say we decided to include formylglycine (we are looking at an aryl sulfatase) or selenocysteine — then his 500mer has 2,196 bits. The extra 35 bits doesn’t sound like much, but he suddenly decided that his protein is 39 billion times less likely than before. Really? Include both formylglycine and selenocysteine and his 500mer is 497 billion billion times less likely than it was yesterday.
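    The sensitivity to repertoire size is easy to reproduce (a sketch of the arithmetic above, with my own variable names):

```python
import math

# Bits of "unlikelihood" for a single 500-mer as the amino-acid
# repertoire grows from 20 to 21 to 22 residues.
for alphabet in (20, 21, 22):
    print(alphabet, round(500 * math.log2(alphabet), 1))

# The ~35 extra bits from a 21st amino acid correspond to a factor
# of 2**35.2, i.e. roughly 39 billion.
factor = 2 ** (500 * math.log2(21 / 20))
print(round(factor / 1e9, 1))  # ≈ 39.3 (billion)
```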

  9. Rumraket,

    The 500 bits rule is outright question begging nonsense. It’s basically just a blind declaration that “X can’t evolve, prove me wrong!” couched in sciency-sounding technobabble. And it’s based on a method for calculating FI that is skewed massively towards inflating FI for a longer sequence, while the number of possible sequences that meet some arbitrary threshold of function has no hope of ever bringing the FI of the system down such that Gpuccio could agree it was evolvable.

    The 500 bit rule allowed for an interesting discussion. It is true that we cannot measure every possible sequence, but as proteins in multicellular organisms become multifunctional, more of the 3D real estate needs to bind to something.

    This should limit completely different sequences doing the job. It also makes describing the protein by a single peak or island difficult.

    If you study what the beta chain does, it joins the alpha chain in an alternating configuration and creates a joint binding site with the alpha chain, taking one ADP molecule and catalyzing a reaction with a phosphate to make ATP. A lot of the surface area of this structure must successfully bind to the alpha site, so it is not surprising that it is mutationally sensitive.

    Joe is claiming that our argument is that evolutionary processes are impossible and it is possible one of us stated this.

    Gpuccio’s argument that I have seen is that 500 bits of FI make it safe to infer design.

    As Corneel said, asking your opponent to prove a negative is a safe way to argue but says little about your position.

    Does any one want to defend the argument that if we see 500 bits of FI it is safe to infer natural evolutionary processes?

  10. DNA_Jock: What you wrote here makes me think that you, Mung, do not understand FI, and do not understand the plane intersecting the cone.

    🙂

    I knew there was a reason I like you.

    My statement and your “what you ought to say” rewriting of it are logically identical.

    They are not.

    For a given function and sequence space, FI increases monotonically with increasing degree of function.

    Have I argued otherwise?

    If for element X the numerator is 1, that means all other elements have a lower degree of function, and a lower FI.

    I don’t know what you mean by an element. Do you mean a sequence in the sequence space? If so then you are merely repeating yourself. I already agreed that all other sequences would have a lower degree of function.

    Perhaps I can pose a different question. What is the point of calculating the FI for each individual sequence, when all that matters for a sequence is its degree of function? For all that matters is where that sequence is, either at or above the threshold, or below the threshold, for calculating the FI. It’s the degree of function that is relevant when it comes to the individual sequence, not the FI.

    He is saying that no other sequences have the same or greater degree of function

    Yes, you are correct. But it should have been obvious that is what I meant. And the overall point stands.

  11. Joe Felsenstein: The answers are No (the 500 bits rule does not work) and Yes, some FI can end up in the genome after natural selection.

    I like the way you boil it down.

    I don’t know if the rule works or not. I am not convinced that it does. I don’t think you’ve seen me arguing that it does. But I also don’t see that anyone has shown that it does not work. I accept that people are skeptical. I don’t fault them for that.

    As to whether FI can end up in the genome “after natural selection” or some other biological process, I have not seen anyone show how. So for now I consider it a draw.

    Can FI end up in the genome “before natural selection” or “without natural selection”? And if not, why not?

  12. Rumraket: This is important, because even if we were to massively increase the number of known possible functional variants of a sequence, we could never hope to meet the 500 bits threshold.

    Perhaps that is why gpuccio thinks 500 bits implies design. Perhaps it really is important.

  13. LOL Mung. You baldly assert that the two statements are not equivalent, and then, when I put them next to each other, you accuse me of ‘merely repeating myself’.
    Thank you for that concession.

    Mung: Perhaps I can pose a different question. What is the point of calculating the FI for each individual sequence, when all that matters for a sequence is its degree of function? For all that matters is where that sequence is, either at or above the threshold, or below the threshold, for calculating the FI. It’s the degree of function that is relevant when it comes to the individual sequence, not the FI.

    That’s why I brought up your notional son’s height.

    What is the point of calculating the percentile for any individual person, when all that matters for a person is his height? For all that matters is where that height is, either at or above the threshold, or below the threshold, for calculating the percentile. It’s the height that is relevant when it comes to the individual, not the percentile.

    The percentile is merely a useful way of re-expressing height (or weight), which can be useful when comparing and contrasting different metrics (height vs weight, say).
    As I noted above FI = -logbase2(1-percentile)

  14. Mung: Perhaps that is why gpuccio thinks 500 bits implies design. Perhaps it really is important.

    You obviously missed the two points that conservation of sequence is then an irrelevant point, and that even with an experimental demonstration of a protein with one function evolving into another, with literally every amino acid changed, it still wouldn’t bring the FI of the system down below 500 bits provided both proteins were of sufficient length.

    Which means that 500 bits implies design is merely a question-begging assertion. Calculating that the system exhibits 500 bits of FI (which could be true) doesn’t actually show it couldn’t evolve. It doesn’t even imply it. There’s absolutely no reason to think that. It’s a technobabble way of saying “Goddidit prove me wrong with an experiment where you evolve X from scratch or from something else”.

  15. Rumraket,

    You obviously missed the two points that conservation of sequence is then an irrelevant point, and that even with an experimental demonstration of a protein with one function evolving into another, with literally every amino acid changed, it still wouldn’t bring the FI of the system down below 500 bits provided both proteins were of sufficient length.

    This is not representative of the proteins that gpuccio is using for his calculations.

  16. colewd: This is not representative of the proteins that gpuccio is using for his calculations.

    Sure it is. Take ubiquitin as an example. If I were to find 10^200 different functional sequences of some ubiquitylation-related protein that is on the order of 300 amino acids long or more, that still wouldn’t bring the FI of that system down below 500 bits.

  17. Rumraket,

    Sure it is. Take ubiquitin as an example. If I were to find 10^200 different functional sequences of some ubiquitylation-related protein that is on the order of 300 amino acids long or more, that still wouldn’t bring the FI of that system down below 500 bits.

    The protein that gpuccio is selecting has, in its current function, a limit to how much it can mutate. Let’s table that for discussion.

    Do you think it is reasonable that an AA sequence, starting from a random sequence or from a completely different function, would find function inside what you described above, where there are 10^200 selectable sequences out of a total of 20^300 possible sequences?

  18. colewd: The 500 bit rule allowed for an interesting discussion. It is true that we cannot measure every possible sequence but as proteins in multicellular organisms get multifunctional more of the 3D real estate needs to bind to something.

    So what?

    This should limit completely different sequences doing the job.

    No, there’s no reason to think that. Changes in one protein can actually open the way for changes in the other protein. Didn’t you read the article I linked earlier from the Thornton lab?

    There are ATP synthases that are almost completely dissimilar at the sequence level (V-type compared to F-type), yet they still have beta and alpha subunits that bind each other just fine.

    It also makes describing the protein by a single peak or island difficult.

    What do you mean by “describing by a single peak or island”?

    If you study what the beta chain does, it joins the alpha chain in an alternating configuration and creates a joint binding site with the alpha chain, taking one ADP molecule and catalyzing a reaction with a phosphate molecule to make ATP. There is a lot of the surface area of this structure that must successfully bind to the alpha site, so it is not surprising that it is mutationally sensitive.

    And that doesn’t mean it couldn’t evolve. It could just as well indicate that the sequences sit on a local optimum, and that selection has moved them up there, so that now a few mutations are neutral as long as they keep the sequences at the top of the hill, while the rest have lower fitness and are eliminated by purifying selection.

    Joe is claiming that our argument is that evolutionary processes are impossible, and it is possible that one of us stated this.

    Gpuccio’s argument that I have seen is that 500 bits of FI make it safe to infer design.

    Yeah that’s Gpuccio’s conclusion and it is still unsupported for reasons explained now something like twenty times in this thread.

    As Corneel said, asking your opponent to prove a negative is a safe way to argue but says little about your position.

    You’re not being asked to prove a negative. Rather we are trying to explain to you that you don’t get to establish design as true until evolution can falsify it by experiment. To establish a hypothesis as an explanation for some entity you need an actual hypothesis that predicts something about the entity.

    Does any one want to defend the argument that if we see 500 bits of FI it is safe to infer natural evolutionary processes?

    Yeah, sure. The Lactate Dehydrogenase family of enzymes are roughly 330 amino acids long. There are tens of thousands of variants of it across the diversity of life, but nowhere near enough to bring the ensemble below 500 bits.

    The LDHs exhibit ~1412 bits of FI.
    -log2(10^4 / 20^330) ≈ 1412 bits.

    They evolved from malate dehydrogenase (MDH) ancestors almost 1 billion years ago. See this: An atomic-resolution view of neofunctionalization in the evolution of apicomplexan lactate dehydrogenases.

    “Abstract
    Malate and lactate dehydrogenases (MDH and LDH) are homologous, core metabolic enzymes that share a fold and catalytic mechanism yet possess strict specificity for their substrates. In the Apicomplexa, convergent evolution of an unusual LDH from MDH produced a difference in specificity exceeding 12 orders of magnitude. The mechanisms responsible for this extraordinary functional shift are currently unknown. Using ancestral protein resurrection, we find that specificity evolved in apicomplexan LDHs by classic neofunctionalization characterized by long-range epistasis, a promiscuous intermediate, and few gain-of-function mutations of large effect. In canonical MDHs and LDHs, a single residue in the active-site loop governs substrate specificity: Arg102 in MDHs and Gln102 in LDHs. During the evolution of the apicomplexan LDH, however, specificity switched via an insertion that shifted the position and identity of this ‘specificity residue’ to Trp107f. Residues far from the active site also determine specificity, as shown by the crystal structures of three ancestral proteins bracketing the key duplication event. This work provides an unprecedented atomic-resolution view of evolutionary trajectories creating a nascent enzymatic function.”

    So here we have a case where it is safe to infer evolutionary processes for 1412 bits.
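    The LDH figure quoted above can be verified directly. A sketch of the calculation, using the numbers from the comment (~10^4 known variants, ~330 amino acids), again working in log space to avoid forming 20^330 as a float:

```python
import math

# -log2(10^4 / 20^330) = 330*log2(20) - log2(10^4)
# Both inputs are the rough figures quoted in the comment above.
n_functional = 1e4      # ~tens of thousands of known LDH variants
length = 330            # amino acids

fi_bits = length * math.log2(20) - math.log2(n_functional)
print(round(fi_bits))   # → 1413 (the thread's "~1412" to within rounding)
```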

  19. Rumraket,

    Yeah, sure. The Lactate Dehydrogenase family of enzymes are roughly 330 amino acids long. There are tens of thousands of variants of it across the diversity of life, but nowhere near enough to bring the ensemble below 500 bits.

    The LDHs exhibit ~1412 bits of FI.
    -log2(10^4 / 20^330) ≈ 1412 bits.

    Great. A competitive challenge to gpuccio’s calculation. :-)

  20. Rumraket,

    You’re not being asked to prove a negative. Rather we are trying to explain to you that you don’t get to establish design as true until evolution can falsify it by experiment.

    I love this. Did you ever see the old film, or read the book, Catch-22?

  21. Rumraket: Calculating that the system exhibits 500 bits of FI (which could be true) doesn’t actually show it couldn’t evolve.

    I agree with you. And Joe. It doesn’t show that either the sequence or the function could not have evolved. FI does not address that question. I think I’ve said that before. 🙂

  22. DNA_Jock: LOL Mung. You baldly assert that the two statements are not equivalent, and then, when I put them next to each other, you accuse me of ‘merely repeating myself’.
    Thank you for that concession.

    You claimed that the two are logically the same, did you not? Then you just keep repeating that claim. Saying the same thing over and over doesn’t make it true. You need to show how they are logically the same.

    Degree of function is not logically the same as FI. From the fact that some sequence can be assigned some degree of function it does not logically follow that the sequence has FI. That leap is a non-sequitur.

    You assert that a sequence has some degree of function and you assert that the sequence also has FI. Then you repeat those assertions. I say you are merely repeating yourself. And from this you deduce that they must be logically the same thing. Brilliant.

    What is your case for the claim that individual sequences have their own individual FI? Is it that people have their own individual height? FI is a measure. Height is not a measure. [ETA: height is a measure. Need to rephrase that. :)]

    How many inches do you have? It’s a nonsense question. Like asking how much FI a sequence has.

  23. Mung: As to whether FI can end up in the genome “after natural selection” or some other biological process, I have not seen anyone show how. So for now I consider it a draw.

    Then you probably forgot my discussion of a simple example with two sequences, in which higher function means higher fitness (in a comment here). In that case, which scarcely stretches the bounds of plausibility, after natural selection we are very likely to have more FI.

    Can FI end up in the genome “before natural selection” or “without natural selection”? And if not, why not?

    Genetic drift and/or mutation can change a population so as to have more FI, by accident. But when there is natural selection, with higher “function” associated with higher fitness, the increase of FI is much more likely.
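    Joe’s two-genotype point can be illustrated with a toy Wright–Fisher simulation. This is only a sketch: the fitness and FI values below are invented for illustration, not taken from any real sequences; the only feature that matters is that the higher-function genotype gets both higher fitness and higher FI.

```python
import random

# Toy haploid Wright-Fisher model with selection. Genotype "A" has the
# higher function, so it is assigned higher fitness and higher FI.
# All numbers are assumptions for illustration.
random.seed(1)
N = 1000
fitness = {"A": 1.2, "B": 1.0}
fi_bits = {"A": 10.0, "B": 2.0}   # illustrative FI values, not measured ones

pop = ["A"] * 100 + ["B"] * 900   # "A" starts at 10% frequency

def mean_fi(p):
    return sum(fi_bits[g] for g in p) / len(p)

before = mean_fi(pop)
for _ in range(50):               # 50 generations of selection + drift
    pop = random.choices(pop, weights=[fitness[g] for g in pop], k=N)
after = mean_fi(pop)

print(before < after)  # → True: selection raised the population's mean FI
```

With a 20% fitness advantage, “A” effectively fixes within 50 generations, so the population’s mean FI climbs from 2.8 bits toward 10 bits; with drift alone it would only wander.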

  24. Joe Felsenstein: In that case, which scarcely stretches the bounds of plausibility, after natural selection we are very likely to have more FI.

    No Joe. “In the genome” is not the same thing as “in the population,” at least not in my world. To me, individuals have a genome, populations do not. And your claim that I was responding to was specifically about FI “in the genome.”

    Joe Felsenstein: Yes, some FI can end up in the genome after natural selection.

    Whatever it is has to already be in the genome before natural selection (or drift) can lead to its spread throughout the population.

    after natural selection we are very likely to have more FI.

    We are very likely to have more FI where? You always leave your sentences unfinished. We are very likely to have more FI in the population? The population is not the genome. Your claim was about getting FI into the genome.

    Genetic drift and/or mutation can change a population so as to have more FI, by accident.

    So you’re talking about the population and I’m talking about the genome. Are they the same thing to you?

  25. No, Mung,

    The statements were about the global optimum, viz:

    DNA_Jock:
    All other sequences (in this sequence space) have lower FI.

    Mung:
    What you ought to say is that all other sequences (in this sequence space) have lower degree of function.

    So you are mis-characterizing our previous interaction. Unintentionally, I assume.
    Please stop.

    Joe Felsenstein: Then you probably forgot my discussion of a simple example with two sequences, in which higher function means higher fitness (in a comment here). In that case, which scarcely stretches the bounds of plausibility, after natural selection we are very likely to have more FI.

    Genetic drift and/or mutation can change a population so as to have more FI, by accident. But when there is natural selection, with higher “function” associated with higher fitness, the increase of FI is much more likely.

    Joe,
    Is there any way we could experimentally test any of your long list of speculations?
    I don’t want to insult you but your blah, blah, blahs are so boring my kids don’t want to read this blog anymore…
    I got a donation recently I can’t use for my next project because the contract prohibits that… I wanted to return it to the donor but the poor old lady has already done her taxes and listed it as an expense…
    Do you think you could use it to towards a good cause? Maybe Harshman can help you out with his experimental experience with the birdies.. Or Larry Moran is probably done with his 90 % Junk DNA book coming out soon…

  27. Mung: So you’re talking about the population and I’m talking about the genome. Are they the same thing to you?

    Um, no. I actually do work on the distinction between the two, and the relationships between them. Like, for the last 58 years, and many, many papers, and an online book, and being, like, the only person who has done a comprehensive bibliography of that field (in 1981). Thanks for the lecture.

  28. J-Mac: I don’t want to insult you but your blah, blah, blahs are so boring my kids don’t want to read this blog anymore…

    We live in hope that the rest of your family will come to feel the same.

  29. Joe Felsenstein: Um, no.

    So is FI in the genome or is it in the population? You keep switching back and forth between the two. How does natural selection put FI into the genome?

  30. Joe Felsenstein: We live in hope that the rest of your family will come to feel the same.

    Yeah, but how does the suffering on the greater scale your unfounded teachings inflict justify that? Why should we feel the miseries of your unfounded teachings? Unless you present some scientific, experimental evidence for your shitty claims, I think we are all in agreement we can, and should treat it as science fiction…

  31. Mung: So is FI in the genome or is it in the population? You keep switching back and forth between the two. How does natural selection put FI into the genome?

    I’ve got news for you. The FI is in the genome, and the genome is in the population.

  32. It is interesting that so many “ID theorists” consider that the important step is the origin of new alleles (or of new haplotypes) and that subsequent changes in gene frequency (or haplotype frequency) are not very interesting. Evolutionary biologists consider both processes but definitely consider the changes of frequency of great interest.

    Say we’re at a horse-race, and I show you a list of 8 horses that are in the next race. I offer, for 5 dollars, to whisper in your ear a well-informed assessment of which horse will win the next race.

    If you are an “ID theorist” I presume that your answer will be “No thanks, I’ll save my money, because all the information about that is right there in the list of 8 horses”.

    If you were, say, Claude Shannon, you would pay the 5 dollars.

  33. Joe Felsenstein,

    Your analogy raises a bunch of questions:

    Based on what or in what sense is your information well-informed so that it’s worth 5 dollars?

    How did you get it? If the information is derived from the horses, then everybody else (who has been paying enough attention to the horses) has it too and you would not be any better informed than anybody else. But if the information entails that the race is rigged, then how is this a good analogy? Is evolution a rigged game?

  34. colewd: You’re not being asked to prove a negative. Rather we are trying to explain to you that you don’t get to establish design as true until evolution can falsify it by experiment.

    I love this. Did you ever see the old film, or read the book, Catch-22?

    There is no catch 22 involved here. I think you misunderstood what I wrote.

    It is you who is advocating the view that Design is the explanation for X by default [when we detect 500 bits of FI], and you’re just going to believe that unless we can show to your satisfaction that evolution could produce those 500 bits of FI.

    It’s a strange type of null-hypothesis testing where you assume that your null is true until someone does an experiment that falsifies it. And I’m saying you don’t get to do that.

  35. Erik: If the information is derived from the horses, then everybody else (who has been paying enough attention to the horses) has it too and you would not be any better informed than anybody else.

    You would think so, but some of the participants at TSZ seem to be bent on remaining uninformed. 😉

  36. Erik: Your analogy raises a bunch of questions: …

    We could be more and more specific about this particular hypothetical case, but let’s not waste time on the details. Except to note that, in such cases, anyone betting should pay close attention to the probabilities of the different horses winning and the probability of me giving you the name of the real winner. Which our “ID theorists” insist do not matter, at least when they come to calculate how much “information” is being transmitted.

  37. Joe Felsenstein: I’ve got news for you. The FI is in the genome, and the genome is in the population.

    I don’t believe you. And the mere fact that you can assert it does not make it so.

    Is FI in every bit of a genome or only in parts of a genome?

    You’re saying that you could calculate the FI for a genome? Do different parts of a genome have different FI, and if so what do you do, average it all out to get the FI that is in that genome?

  38. Joe Felsenstein: If you were, say, Claude Shannon, you would pay the 5 dollars.

    You are doing precisely what Tom warned against. And you’re wrong about Shannon Information too. This is just too bizarre.

    In Shannon information the meaning in the message is irrelevant, Joe. So I wouldn’t pay 5 bucks for it, because I could not be certain that it would provide any information at all about the horses. I would pay you 5 bucks to go get me a drink, though.

  39. Rumraket: It’s a strange type of null-hypothesis testing where you assume that your null is true until someone does an experiment that falsifies it. And I’m saying you don’t get to do that.

    We’re not doing null hypothesis testing.

    It is not uncommon in science to hold a hypothesis or theory to be true unless and until it is falsified or until a better theory comes along. Am I wrong?

  40. Corneel: You would think so, but some of the participants at TSZ seem to be bent on remaining uninformed.

    I think you just fail to appreciate the effort that takes!

    😀

  41. Joe Felsenstein,

    Which our “ID theorists” insist do not matter, at least when they come to calculate how much “information” is being transmitted.

    The ID theorists would be interested in the source of the information and the historical accuracy of the transmitter.

    If the source and the transmitter could be determined to be accurate and that information predicted a favorite, the ID theorist would place the bet.
    If the transmitter was inducing random copying errors the ID theorist would be skeptical depending on the error rate.

    At no time would the ID theorist believe that the information could be delivered by a random-letter transmitter alone, even if, by serendipity, that random-letter transmitter allowed a key word through by random copying errors and allowed him to pick a winner.

    The question the ID theorist would ask is what are the possible sources of improving the information so he could make more money per race?

  42. colewd: The ID theorists would be interested in the source of the information and the historical accuracy of the transmitter.

    Bullshit. IDiots, A.K.A. creotards, claim ID is not about the identity of the designer.

  43. An “ID theorist” might well do the right things when dealing with racetrack tips. The question is whether that behavior is consistent with what they say about information — that the event that creates the first copy of the correct tip is what counts, and that frequencies of that tip among all tips does not matter.

    In any case, the definition of Functional Information does rate all sequences according to a “function” scale, and computes a quantity for each possible sequence, based on the probability that a randomly chosen sequence would do that well or better. We can discuss how the frequencies of sequences that actually occur in a population can change, and how the average FI changes, without establishing that this is, or is not, the correct way to measure “information”.
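    The definition described here, FI(Ex) = −log2 of the fraction of all sequences whose function meets or exceeds the level Ex, can be sketched on a tiny toy sequence space. The two-letter alphabet and the “count the A’s” function below are purely illustrative stand-ins, small enough to enumerate exhaustively:

```python
import itertools
import math

# Toy sequence space: all length-8 strings over a 2-letter alphabet.
alphabet = "AB"
length = 8

def score(seq):
    # Arbitrary toy "degree of function": the number of A's in the sequence.
    return seq.count("A")

space = ["".join(s) for s in itertools.product(alphabet, repeat=length)]

def fi(threshold):
    # Fraction of the whole space doing at least this well, then -log2 of it.
    frac = sum(score(s) >= threshold for s in space) / len(space)
    return -math.log2(frac)

# FI rises as the function threshold gets harder to meet:
print([round(fi(t), 2) for t in (4, 6, 8)])  # → [0.65, 2.79, 8.0]
```

Note that FI here is a property of a function level over the whole space, not of any single sequence, which is exactly the point being disputed upthread.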

  44. Joe, Shannon information has nothing to do with what some outside huckster is trying to sell, even if that huckster is you.

    Given that you are wrong about Shannon information, I must remain skeptical when it comes to anything you have to say about functional information. FI is not a measure of information, and Hazen et al. do not claim that it is.

  45. I disagree with Mung about whether FI measures information, and I disagree with Hazen about whether it measures “complexity”.

    But in any case the issue here is whether 500 bits of the quantity FI, whatever it is that it represents, is a reliable indicator of Design. It isn’t, and there is no argument that it is.

  46. Joe Felsenstein: …anyone betting should pay close attention to the probabilities of the different horses winning and the probability of me giving you the name of the real winner.

    If the horses have a probability of winning, the Shannon information can be calculated from that probability distribution without any need for an external informant. You, sir, are entirely superfluous.
