Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

Posted on May 21, 2018 by Joe Felsenstein

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular gpuccio’s assertion that observing 500 bits of it is a reliable indicator of Design, as here, at about the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information ) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t work. There would seem to be three possible ways of arguing for it, and in the end two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information. In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function. This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observation of more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function, but also includes cases where there would be a path along which function increases. In such cases, seeing 500 bits of functional information, we cannot conclude from this that it is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails, as it is not reliable.
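A toy sketch can make this concrete. It is an illustration under loud assumptions, not a model of any real protein and not anything gpuccio or Szostak wrote: the genotype is a string of 500 two-state sites, "function" is simply the number of sites matching a target, and the names `climb`, `n_sites` and `max_steps` are made up for the example. Only one genotype in $2^{500}$ has full function, so the target set carries 500 bits of functional information, yet mutation plus selection reaches it because there is a path along which function increases.

```python
import math
import random

def climb(n_sites=500, seed=1, max_steps=100_000):
    """Mutation plus selection on a toy landscape with a smooth path.

    Function is the number of sites matching the target (all ones here).
    Each step mutates one random site and keeps the mutant only if
    function does not decrease.
    """
    rng = random.Random(seed)
    genotype = [rng.randint(0, 1) for _ in range(n_sites)]
    function = sum(genotype)
    steps = 0
    while function < n_sites and steps < max_steps:
        site = rng.randrange(n_sites)
        mutant_function = function + (1 if genotype[site] == 0 else -1)
        if mutant_function >= function:      # selection step
            genotype[site] = 1 - genotype[site]
            function = mutant_function
        steps += 1
    return function, steps

# Only 1 of 2**500 genotypes has full function, so the target set has
# -log2(2**-500) = 500 bits of functional information.
fi_bits = -math.log2(2.0 ** -500)
final_function, steps_taken = climb()
print(fi_bits, final_function, steps_taken)
```

On a landscape like this the climb should finish within a few thousand steps, whereas blind sampling would need on the order of $2^{500}$ draws. Nothing in the sketch says that real proteins sit on such a landscape; that is the islands-of-function question. It only shows that the number of bits, by itself, does not settle it.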

(3) The third possibility is an additional condition that is added to the design inference. It simply declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information”. It does not simply define “complex functional information” as a case where we can define a level of function that makes the probability of the set less than $2^{-500}$. That additional condition allows us to safely conclude that normal evolutionary forces can be dismissed, by definition. But it leaves the reader to do the heavy lifting, as the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once they have done that, they will find that the additional step of concluding that the genotypes have “complex functional information” adds nothing to our knowledge. CFI becomes a useless add-on that sounds deep and mysterious but actually tells you nothing except what you already know. And there seems to be some indication that gpuccio does use this additional condition.

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function >= Ex.”

which is identical to my definition, in particular my definition of functional information as the upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)

We shall see later that there may be some ways in which gpuccio’s definition is modified from Szostak’s. Jack Szostak and his co-authors never attempted any use of his definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.
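For readers who want the quantity spelled out, here is a minimal sketch of the calculation as Hazen, Szostak and co-authors define it: functional information is $-\log_2$ of the fraction of configurations whose function is at least the cutoff Ex. The function name `functional_information` and the toy numbers are assumptions made for illustration; real sequence spaces are far too large to enumerate like this.

```python
import math

def functional_information(function_values, e_x):
    """Bits of functional information at cutoff e_x.

    function_values: the degree of function of every possible
    configuration (a toy, fully enumerated space).
    Returns -log2 of the fraction with function >= e_x.
    """
    n_meeting = sum(1 for v in function_values if v >= e_x)
    if n_meeting == 0:
        # The fraction is zero, so the logarithm is undefined.
        raise ValueError("no configuration reaches this level of function")
    return -math.log2(n_meeting / len(function_values))

# Toy space of 1024 configurations with graded function:
# one at 1.0, fifteen at 0.5, the rest at 0.1 (none at zero).
toy = [1.0] + [0.5] * 15 + [0.1] * 1008
print(functional_information(toy, 1.0))   # 10.0 bits (1 in 1024)
print(functional_information(toy, 0.5))   # 6.0 bits (16 in 1024)
```

Note that the cutoff only picks out an upper tail. Nothing in the definition requires the configurations below the cutoff to have zero function, which matters for Possibility #2.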

gpuccio seems to be making one of three possible arguments:

Possibility #1 That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

William Dembski attempted to use such a theorem, his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2002). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI changed the specification before and after evolutionary processes, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it. gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agre with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, on the other hand gpuccio does associate the argument with Dembski here by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

That figure being Dembski’s, this leaves it unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem that replaces it.

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us choose a cutoff level of function, which picks out the set of sequences with function at or above that level, without any condition that the other sequences have zero function. Neither did Durston impose such a condition. And as we’ve seen gpuccio associates his argument with theirs.

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection, then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal from Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements. And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it be achieved by natural evolutionary forces (see my discussion of this here in the section on “Dembski’s revised CSI argument” that refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be making a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each being counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.

gpuccio has a similar condition in the requirements for concluding that complex functional information is present. We can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observing 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) is simply nonexistent. Or if it does exist, it is as a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

1,971 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

Rumraket on June 25, 2018 at 5:44 pm said:

Mung: Therefore design. Q.E.D.

LOL that’s hilarious. So when evolution fails to produce a particular function, it has generated an infinite amount of FI.
Mung on June 25, 2018 at 5:50 pm said:

Rumraket: And based on this complete dismissal you conclude no FI has evolved.

I don’t know what you mean when you say that FI has evolved.

If a duplication of this sequence with function A:
AAGTCTCAT
can evolve by mutation into this sequence and gain function B:
GAGGCTCGT
So that now we have both sequences with function A, and function B:
AAGTCTCATGAGGCTCGT
Then it would obviously be ridiculous to claim that no information has been added.

Is this what you mean by FI evolving? So a function that had not previously evolved would have an FI of infinity? So an infinite amount of FI was added?
Mung on June 25, 2018 at 5:51 pm said:

Does Mung’s argument that infinite bits of Functional Information implies Design work?
Mung on June 25, 2018 at 5:55 pm said:

Rumraket: We’d have to calculate FI for the two of them combined. Where one could fail to meet the minimum threshold, two of them combined could.

I honestly don’t know what you mean here, but I think it’s a good discussion to have.

So do I have to include all sequences of length 1, and all sequences of length 2, and all sequences of length 3, etc., in order to calculate FI, even though, none of these sequences have any degree of function?

That just doesn’t seem right.
Rumraket on June 25, 2018 at 5:57 pm said:

colewd: So we have WNT evolving new tissue related variants through gene duplications. Now with the duplication of WNT we also need Frizzled (WNT receptor) to co-mutate (made up) with the WNT plus all the other proteins that bind with Frizzled.

What the hell are you now talking about?

This is just more moving of goalposts. You don’t have a sensible answer for what I wrote about your silly idea that duplication and subsequent evolution of duplicates towards gaining new functions can’t yield any new FI, so now you’re just skipping to something else and start blathering about some other pet system you want to have explained how evolved.

I take that as a concession. You now accept that the evolution of the LDH enzyme from MDH ancestors is a case of evolution generating >1400 bits of FI.

The big assumption here is that sequence space and functional space are almost the same. Yet specific binding is important also these proteins are finding sequences that are intolerant of mutation. The need for creativity is on the rise.

I don’t even know what the hell this means to say. It is just blather.
Rumraket on June 25, 2018 at 6:01 pm said:

Mung: Is this what you mean by FI evolving? So a function that had not previously evolved would have an FI of infinity? So an infinite amount of FI was added?

No I genuinely don’t think we calculate FI for functions that don’t exist. Because it would be nonsensical to do that. And one way to see that it would be nonsensical is because they give infinities. It would be nonsensical to think that the absence of a function yields an infinite amount of functional information.

So we simply refrain from calculating FI for a function that does not exist. Only when the function exists do we attempt to calculate the FI for it.

I made a joke in response to you making what I take to be a joke (the “therefore design. Q.E.D”). That’s it.
colewd on June 25, 2018 at 6:04 pm said:

Rumraket,

This is just more moving of goalposts. You don’t have a sensible answer for what I wrote about your silly idea that duplication and subsequent evolution of duplicates towards gaining new functions can’t yield any new FI, so now you’re just skipping to something else and start blathering about some other pet system you want to have explained how evolved.

I am trying to show you how your gene duplication hypothesis is almost certainly wrong. If I moved the goal posts, I apologize.

As far as functional information goes I think you need to bone up on the subject. Mung appears to have a pretty good handle on it.
Mung on June 25, 2018 at 6:12 pm said:

Rumraket: That is one way among several that can happen. And yes there is actual evidence of this. See for example this paper: Reconstruction of Ancestral Metabolic Enzymes Reveals Molecular Mechanisms Underlying Evolutionary Innovation through Gene Duplication–

And you say you don’t believe in miracles?

Structural analysis and activity measurements on resurrected and present-day enzymes suggest that both activities cannot be fully optimized in a single enzyme. However, gene duplications repeatedly spawned daughter genes in which mutations optimized either isomaltase or maltase activity.

How convenient. Not just one, but both were optimized. Ain’t evolution grand!
Mung on June 25, 2018 at 6:15 pm said:

Rumraket: This is just more moving of goalposts.

No, I left the goalposts where they were. I moved the football field.
Mung on June 25, 2018 at 6:17 pm said:

Rumraket: You now accept that the evolution of the LDH enzyme from MDH ancestors is a case of evolution generating >1400 bits of FI.

What is the FI for the MDH enzyme, and what is the FI for the ancestral MDH sequences?
OMagain on June 25, 2018 at 6:23 pm said:

colewd: I am trying to show you how your gene duplication hypothesis is almost certainly wrong.

Do you then accept that the evolution of the LDH enzyme from MDH ancestors is a case of evolution generating >1400 bits of FI? As you did not comment on that.

It’s what you don’t say that is more interesting than what you do say colewd….
Rumraket on June 25, 2018 at 6:23 pm said:

Mung: I honestly don’t know what you mean here, but I think it’s a good discussion to have.

I agree, and I agree there is a significant danger of ambiguity about how to treat this case.

What I was thinking of was something like this:
Suppose we define a minimum threshold for function E(min) = X for some functional system.

The function could be anything, like the minimum amount of activity of protein biosynthesis to allow a cell to grow and divide. Pretty important function.

Now we could happen to know about sequences of ribosomal RNA (that carry out protein biosynthesis) that have mutations in them, such that if there were only one such ribosomal gene in the genome of the organism, it would have too low activity to allow the cell to grow and divide before it degrades and dies.

But, rather than change the sequence, it is possible this problem could be solved by dosage compensation (in fact that is actually the case, all organisms have dozens of ribosomal gene copies, some even have several hundred).

More ribosomal genes, more combined activity of all of them, and through mere duplication we could meet the minimum threshold.

If we just take a single ribosomal RNA sequence, it patently doesn’t meet the minimum threshold for function. So we can’t calculate the FI for E(min) using only a single sequence.

But if we have, say, five such sequences, they do meet the minimum threshold for function.

Now the question is, how do we treat this case using Hazen & Szostak’s method?

I can see one way of doing that, by just adding the total sequence length up and calculating FI.

That doesn’t mean that is the only way to meet E(min); there could be other ways that have less FI. For example, another mutated gene with very high activity could meet E(min) with fewer copies, and that case would have less FI than the case of five copies of the low-activity gene.

I don’t see what is wrong with that actually. It seems intuitively agreeable to me that a function that literally requires more genes also “has more” FI.

So do I have to include all sequences of length 1, and all sequences of length 2, and all sequences of length 3, etc., in order to calculate FI, even though, none of these sequences have any degree of function?

That just doesn’t seem right.

I agree that doesn’t make sense. I don’t see how I have suggested that.
Rumraket on June 25, 2018 at 6:25 pm said:

Mung: And you say you don’t believe in miracles?

What is a miracle? Define miracle.
Rumraket on June 25, 2018 at 6:26 pm said:

colewd: As far as functional information goes I think you need to bone up on the subject. Mung appears to have a pretty good handle on it.

Thank you for your commentary. I will give it all the consideration I think it merits. Particularly given how you have yourself demonstrated such a penetrating grasp of the subject that you should be considered a judge of who else “has a handle on it”.
OMagain on June 25, 2018 at 6:27 pm said:

Rumraket: What is a miracle? Define miracle.

Mung has previously accepted “the laws of Nature can be suspended” as a definition.
Rumraket on June 25, 2018 at 6:32 pm said:

OMagain: Mung has previously accepted “the laws of Nature can be suspended” as a definition.

No actually I think he’s jokingly saying that if skeptics accepted evidence that a miracle occurred, that would itself be a miracle. 🙂
Rumraket on June 25, 2018 at 6:42 pm said:

Mung: What is the FI for the MDH enzyme, and what is the FI for the ancestral MDH sequences?

I don’t really know how to answer the question here. The threshold is set for a particular function, so you can’t really calculate the FI for one particular sequence.

When I calculated FI for the MDH function I just set E(min) such that all known variants of the protein qualify. That gives approximately 1430 bits FI.
colewd on June 25, 2018 at 7:41 pm said:

OMagain,

Do you then accept that the evolution of the LDH enzyme from MDH ancestors is a case of evolution generating >1400 bits of FI? As you did not comment on that.

I think the first thing to establish is if the cause is design or mutation and I have not read the paper close enough to have an opinion.

If you read Mung’s opinion, which I agree with, a gene duplication is not adding information; it is simply copying it. The mutations beyond that are trivial. If they are not trivial then we are probably looking at a separate design.
Mung on June 25, 2018 at 8:18 pm said:

Rumraket: The threshold is set for a particular function, so you can’t really calculate the FI for one particular sequence.

Joe and DNA_Jock disagree with you. I agree with you. Funny.

When I calculated FI for the MDH function I just set E(min) such that all known variants of the protein qualify. That gives approximately 1430 bits FI.

But all known variants would not be the ancestral sequence(s). How did you decide that the ancestral sequences would not provide you with the same result, 1430 bits?

Wouldn’t you agree that if you are going to say there has been a gain in information that if you can’t quantify it you can’t say how much of a gain there was?

The FI for LDH seems to be about the same. So where is the gain in FI?

Also, when you speak of the ancestral sequence(s) why would they not have the same value for FI as the modern sequences? The function hasn’t changed, the minimum threshold has not changed, the quantity of sequences hasn’t changed, therefore the same FI.
Rumraket on June 25, 2018 at 8:19 pm said:

colewd: The mutations beyond that are trivial.

In what sense? Their effect? The types of mutations? What does it even mean to say a mutation is trivial?

If they’re trivial, then it is trivial for new functions to evolve. Is that what you’re saying? It really reads like you just felt like saying something. An argument from the feelies.

If they are not trivial then we are probably looking at a separate design.

What separates trivial from non-trivial mutations? Give examples and explain why one is trivial and the other is not. Then explain why the “triviality” of a genetic mutation implies design.

Bill, you just say stuff. That’s mostly your output. You argue by just saying random stuff that comes to your mind. It’s barely thought about stuff that you just type into your browser and post.
Mung on June 25, 2018 at 8:29 pm said:

Rumraket: It’s barely thought about stuff that you just type into your browser and post.

Do you miss Salvador?
colewd on June 25, 2018 at 8:32 pm said:

Rumraket,

What separates trivial from non-trivial mutations? Give examples and explain why one is trivial and the other is not. Then explain why the “triviality” of a genetic mutation implies design.

When you duplicate a gene get a small modification and catalyze a different reaction the majority of the source of the FI is the original protein.

We are not even discussing the new protein without the old one.

So if you want to figure out the source of the FI the primary analysis is the original protein. The paper is not about generating 1400 bits of FI de novo. It is about the idea of generating a new enzyme by mutation and selection of an existing enzyme.
colewd on June 25, 2018 at 8:54 pm said:

Rumraket,

What separates trivial from non-trivial mutations? Give examples and explain why one is trivial and the other is not. Then explain why the “triviality” of a genetic mutation implies design.

A trivial beneficial mutation could be one that enables an adaption by a few AA changes. An example is malaria resistance by the sickle cell mutation.

This is what I believe mutations can do. On the other hand I believe a complex adaption requires design. I believe that a random search through a sequence will most often lead to the loss of FI.

I see that I brought out an ambiguous word, I apologize.
Rumraket on June 25, 2018 at 9:05 pm said:

Mung: Joe and DNA_Jock disagree with you. I agree with you. Funny.

Actually, now that I think about it, you can: you just set the threshold such that only 1 sequence meets it and calculate FI for that. Any sequence of the same length as the particular one you want to calculate FI for would have an identical amount of FI.

So it’s roughly 1430 bits for any particular MDH homologue. They’re all about 330 amino acids long.
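As a check on that figure, here is the arithmetic for the single-sequence case, assuming 20 equally weighted amino acids at each of 330 positions and exactly one sequence meeting the threshold. The variable names are illustrative only.

```python
import math

AMINO_ACIDS = 20      # assumed alphabet size, all residues weighted equally
length = 330          # approximate length of an MDH homologue

# One qualifying sequence out of 20**330 gives a fraction of 20**-330.
# That is too small for a float, so work with logarithms directly:
# -log2(20**-330) = 330 * log2(20).
fi_bits = length * math.log2(AMINO_ACIDS)
print(round(fi_bits, 1))   # 1426.2
```

That is about 1426 bits, consistent with "roughly 1430". Any threshold that lets more than one sequence qualify would give a smaller number.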

Rumraket: When I calculated FI for the MDH function I just set E(min) such that all known variants of the protein qualify. That gives approximately 1430 bits FI.

Mung: But all known variants would not be the ancestral sequence(s). How did you decide that the ancestral sequences would not provide you with the same result, 1430 bits?

I have not decided that they would not. They would. The ancestral MDH exhibits ~1430 bits of FI, so does any particular modern counterpart.

In addition to the MDH enzymes, we have the LDH enzymes, which also exhibit 1430 bits of FI. They shared common descent with the MDH enzymes, but they are separate functions so their FI is calculated separately.

Wouldn’t you agree that if you are going to say there has been a gain in information that if you can’t quantify it you can’t say how much of a gain there was?

We calculate FI for a particular function (and whether we pick a single sequence, or we pick all known variants capable of the function, we can in fact do that). But the functions are different, remember? So for different functions we calculate the FI for each of them.

The FI for LDH seems to be about the same. So where is the gain in FI?

The 1430 bits of FI for the LDH function was gained in addition to the 1430 bits for the MDH function. There are now 2860 bits of total functional information, 1430 bits for the MDH function, and 1430 bits for the LDH function.

Also, when you speak of the ancestral sequence(s) why would they not have the same value for FI as the modern sequences?

With respect to a single function, they would (in so far as they didn’t change in length).

The function hasn’t changed, the minimum threshold has not changed, the quantity of sequences hasn’t changed, therefore the same FI.

For one particular function, yes. But another function evolved in a duplicate by mutation of that duplicate. That new function also has about 1430 bits of FI. So a genome that contains both genes contains at least 2860 bits of FI. 1430 bits for function A, 1430 bits for function B. Any other view would be absurd.
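The addition being claimed here holds under one specific reading, sketched below on a tiny made-up example: the function of the combined sequence is defined as "part one has function A and part two has function B", and the two parts are scored independently. The predicates `has_a` and `has_b`, the 4-letter sequences and the helper `fi` are all hypothetical. Whether that is the right way to define function for a genome is exactly the point in dispute in this thread.

```python
import itertools
import math

ALPHABET = "ACGT"

def has_a(seq):
    """Hypothetical function A: exactly one 4-mer qualifies."""
    return seq == "AAGT"

def has_b(seq):
    """Hypothetical function B: four 4-mers qualify."""
    return seq.startswith("GAG")

def fi(predicate, length):
    """-log2 of the fraction of all sequences of this length that qualify."""
    seqs = ("".join(p) for p in itertools.product(ALPHABET, repeat=length))
    hits = sum(1 for s in seqs if predicate(s))
    return -math.log2(hits / len(ALPHABET) ** length)

fi_a = fi(has_a, 4)                                        # 8.0 bits
fi_b = fi(has_b, 4)                                        # 6.0 bits
fi_both = fi(lambda s: has_a(s[:4]) and has_b(s[4:]), 8)   # 14.0 bits
print(fi_a, fi_b, fi_both)
```

With independent parts the qualifying fractions multiply, so the bits add: 8 + 6 = 14. A different definition of the combined function would give a different number.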

It would be preposterous to say, for example, that if we have two apparently unrelated proteins with different functions, that each of them exhibit 1000 bits of FI and therefore the genome of an organism that have both has at least 2000 bits of FI.
But if we subsequently learn that they evolved from a common ancestor protein that only had one of those functions, that means the genome now suddenly only has 1000 bits because somehow for mysterious reasons the method by which that 2nd gene came about makes it not count.

It’s obviously just a silly ad-hoc excuse making effort to exclude the evolution of new functions from being capable of producing any new FI that wasn’t “already there”.

We could have twenty different highly specific enzymes. But because they ultimately derive from a common ancestor, only one of them counts. So we calculate the FI for one and ignore the 19 others, they don’t exhibit any FI.

C’mon. Anyone can see through this crap. You’re just trying to exclude evolution from producing FI with motivated ad-hoc reasoning.
Rumraket on June 25, 2018 at 9:08 pm said:

colewd: When you duplicate a gene get a small modification and catalyze a different reaction the majority of the source of the FI is the original protein.

We are not even discussing the new protein without the old one.

So if you want to figure out the source of the FI the primary analysis is the original protein. The paper is not about generating 1400 bits of FI de novo. It is about the idea of generating a new enzyme by mutation and selection of an existing enzyme.

This is just shitty excuses Bill.

So if a designer creates a protein 330 amino acids in length with a particular function, that designer has created 1430 bits of FI.

Now the designer creates a new protein, that has some similarity to the previous one, but this one has a new function.

Has the designer not created any new FI?
colewd on June 25, 2018 at 9:19 pm said:

Rumraket,

Has the designer not created any new FI?

Interesting question 🙂 Mung, what do you think?
Mung on June 25, 2018 at 10:23 pm said:

colewd: Interesting question. Mung, what do you think?

I don’t think the designer creates FI. I think humans create FI.
Mung on June 25, 2018 at 10:35 pm said:

Rumraket: You’re just trying to exclude evolution from producing FI with motivated ad-hoc reasoning.

Hardly.

I’m just pointing out that if you start with 1400 bits of FI and end up with 1400 bits of FI that evolution hasn’t produced 1400 bits of FI. Now if you started with 0 bits of FI and ended up with 1400 bits of FI, that’s something worth discussing.

It would be preposterous to say, for example, that if we have two apparently unrelated proteins with different functions, that each of them exhibits 1000 bits of FI and therefore the genome of an organism that has both has at least 2000 bits of FI.

The genome doesn’t have FI and I have no reason to think that FI is additive in the way you seem to think it is.

By analogy here is what you are saying. You have a short paper that consists of a series of sentences, no two of which are the same, each of which has its own “function.” You calculate the FI for each sentence. And then you add up all the FI for each sentence and claim the paper has x bits of FI. That’s what you are claiming you can do and I think it’s silly for the paper example and likewise just as silly for the genome example.

You don’t get the FI of the paper by adding up the individual FI for each sentence in the paper. To get the FI of the paper you’d have to define the “function” of the paper, not by adding up the FI for the “function” of each of its parts.
Mung on June 25, 2018 at 10:41 pm said:

Rumraket: So we calculate the FI for one and ignore the 19 others; they don’t exhibit any FI.

I’ve never said that. For each defined function and defined degree of function and set of sequences that perform that function to some degree you can calculate the FI.

I’m saying that you can’t then create one single long sequence from them all and declare the FI of the single long sequence is the sum of the FI of the parts of that sequence. That’s not the way it works.
DNA_Jock on June 25, 2018 at 11:08 pm said:

I think there is a subtlety here that could (conceivably) help colewd out of his endless confusion.
Hazen and Szostak talk about the FI as defined for a specific function. We can imagine the sequence –> activity level surface for any specific function, and we can imagine calculating the activity level –> FI mapping that results, all the way from 0 FI to 4.322 x the length of the amino acid sequence. For every single function imaginable.
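As a rough illustration of the upper bound mentioned above, the following sketch computes FI in bits from a count of functional sequences. The function names are hypothetical, and it assumes a 20-letter amino acid alphabet and the -log2[F(Ex)] definition quoted elsewhere in this thread, with F(Ex) taken as the fraction of all sequences of a given length that meet the activity threshold.

```python
import math

def functional_information(n_functional, length, alphabet_size=20):
    """FI in bits: -log2(n_functional / alphabet_size**length).

    Computed as a difference of logs so that very long sequences
    do not require forming a tiny floating-point fraction.
    """
    if n_functional < 1:
        raise ValueError("need at least one sequence meeting the threshold")
    return length * math.log2(alphabet_size) - math.log2(n_functional)

def max_functional_information(length, alphabet_size=20):
    """Upper bound: exactly one sequence of this length has the function."""
    return functional_information(1, length, alphabet_size)

# Bits per residue at the upper bound, about 4.322 for 20 amino acids.
print(round(max_functional_information(1), 3))

# If every possible sequence meets the threshold, FI is zero (up to rounding).
print(abs(functional_information(20**50, 50)) < 1e-9)
```

For a 330-residue protein this bound comes to roughly 1426 bits, in line with the approximate figures used in the thread.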
For straightforward functions, like kcat-LDH and kcat-MDH, these surfaces and the resulting activity –> FI mapping are invariant.
Which is a useful property if, like Hazen and Szostak, one is trying to understand the relationship between activity level and FI.
The problem is when we start talking about Natural Selection (or Intelligent Selection, for that matter). Now there is a second mapping, from kcat-LDH and kcat-MDH (and the billions of other potentially selectable functions) to the unitary fitness surface for the sequence space. Because it is fitness that gets selected.
Now, as a massive simplification, one could stipulate that for ATP synthases, fitness is a function of kcat/Km for ATP synthesis, and nothing else, and that it is a non-decreasing function of kcat/Km. Therefore (conveniently) FI for fitness is the same as FI for kcat/Km-ATPsynth. Hey, the better you synthesize ATP, the better off you’ll be…we can get behind that approximation, okay?
But the mapping from kcat to fitness is not invariant; it is highly context dependent. It depends (Dawkins’s huge error in the Selfish Gene) on the other genes in that individual organism. It depends on copy number. It depends on a pile of context that varies over time and from individual to individual.
So trying to parse movement over fitness surfaces in terms of FI is problematic.
Instead, just think about the fitness surface, and how it relates to the kcat surfaces.
At time zero, we have an enzyme that dehydrogenates lactate. Cool. It’s okay at that. It also dehydrogenates malate. Really badly. No-one really cares. Fitness is 1000 x kcat LDH + 1 x kcat MDH. The population of sequences is exploring this little valley where LDH matters a lot, and MDH hardly at all.
The gene duplicates! Holy shit, what happens?
The answer is that the fitness surface, for both copies, suddenly gets MUCH flatter. Mutations that would have been fatal yesterday are now mildly deleterious, such that they may fix. If that happens (the most likely outcome), then the fitness surface for the surviving copy reverts to what it looked like before the duplication. We are back where we started, but with a chunk of currently useless DNA added to the genome. No biggie.
HOWEVER, sometimes one of the copies wanders across that flattened surface to an area of high MDH activity. That’s awesome, and MDH activity becomes an important new driver of fitness. And the new enzyme becomes better and better at using malate. Loses its LDH activity in the process. The fitness surface of the LDH copy changes back to something close to what it was originally (slightly different: catalyzing MDH has become irrelevant for that copy), and the MDH copy gets more and more specialized.
Joe Felsenstein on June 26, 2018 at 1:07 am said:

Excellent explanation. When there are two enzymes and two separate reactions, the parameters of one reaction would indicate a level of function F1 and of the other would indicate a level F2. Fitness would indicate the overall “function” F which would be dependent on both F1 and F2. The most sensible scale of “function” would then be fitness.
Rumraket on June 26, 2018 at 6:13 am said:

Mung: Hardly.

I’m just pointing out that if you start with 1400 bits of FI and end up with 1400 bits of FI that evolution hasn’t produced 1400 bits of FI.

Once again, there are two different functions. What you’re basically saying is the 2nd function doesn’t count.

Now if you started with 0 bits of FI and ended up with 1400 bits of FI, that’s something worth discussing.

You do, for the 2nd function.
Rumraket on June 26, 2018 at 6:22 am said:

Mung: I don’t think the designer creates FI. I think humans create FI.

@Bill

This is Mung’s way of avoiding the question.

What Mung is saying is that it is humans that decide to calculate -log2[F(Ex)] for systems with a particular function, so the sequences don’t “have” any FI.

So in Mung’s view the designer has not created any FI at all, neither for the 1st nor 2nd gene. By implication Mung should agree that both the 1st and 2nd sequence exhibit FI, different FI for different functions, insofar as a human decides to calculate it. But Mung thinks we can’t add up FI for different functions, they’re separate, so we can never say there’s “more”.

Function A exhibits 1400 bits of FI.
Function B exhibits 1400 bits of FI.

But there’s no more than 1400 bits of FI in this system. There could be twenty thousand different genes with twenty thousand different functions, each exhibiting 1400 bits of FI for their respective functions. So in grand total there is… 1400 bits of FI, is all we can say according to Mung.

So you remember earlier in this thread when you wrote there are millions of bits of FI to be explained? Mung disagrees, you can’t add them up. You have to look at them separately. And if there was already something functional from which something else evolved, then no additional FI evolves, the system has what it started out with.
Rumraket on June 26, 2018 at 6:49 am said:

Mung: The genome doesn’t have FI and I have no reason to think that FI is additive in the way you seem to think it is.

That makes no sense I’m sorry to say. The implication of what you’re saying is that the quantity of functional information for an organism that has a 200 megabase functional genome partitioned into something like 45,000 functional genes, is potentially no more than an organism that has a 300 kilobase functional genome partitioned into 2000 functional genes.

It seems rather intuitively agreeable to me that an organism whose genome is larger and encodes more elements with different functions, also “contains” more FI, the FI for each of those functional elements just added up.

By analogy here is what you are saying. You have a short paper that consists of a series of sentences, no two of which are the same, each of which has its own “function.” You calculate the FI for each sentence. And then you add up all the FI for each sentence and claim the paper has x bits of FI. That’s what you are claiming you can do and I think it’s silly for the paper example and likewise just as silly for the genome example.

Yes, the FI for each sentence while different, adds up to the total FI for the paper. Why not?

You don’t get the FI of the paper by adding up the individual FI for each sentence in the paper. To get the FI of the paper you’d have to define the “function” of the paper, not by adding up the FI for the “function” of each of its parts.

I think you do get the FI of the paper by adding up the FI for each functional element of it. At least it’d come out the same as if you just set the function for the entire paper and tried to calculate its FI.

Whether you have 10 sequences of length 100, calculate the FI for each and add them up, or you have one sequence of length 1000 and calculate the FI for that, the FI comes out the same.

So whether we treat the paper as 10 different fragments and add up the FI for each, or we treat the entire text of it as one long sequence with a single function, we get the same total FI.

10 x -log2(1/26^100) ≈ 4700 bits
-log2(1/26^1000) ≈ 4700 bits
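The arithmetic above can be checked with a short sketch. It assumes, as the example does, a 26-letter alphabet and that exactly one sequence of each length meets the threshold (an exact match); the function name is hypothetical.

```python
import math

def exact_match_fi(length, alphabet_size=26):
    """FI in bits when exactly one sequence of this length is functional."""
    return length * math.log2(alphabet_size)

# Ten fragments of length 100, each requiring an exact match, summed.
ten_fragments = 10 * exact_match_fi(100)

# One sequence of length 1000 requiring an exact match.
one_sequence = exact_match_fi(1000)

print(round(ten_fragments), round(one_sequence))  # prints: 4700 4700
```

The two totals agree only because each fragment is scored as an exact match; the sketch says nothing about cases where the function of the whole is defined differently from the functions of its parts.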
Corneel on June 26, 2018 at 8:19 am said:

Mung: Does it come with a cherry on top?

Sure, but no whipped cream. I have bad memories of whipped cream.
Corneel on June 26, 2018 at 8:38 am said:

colewd: This is not the argument but carry on 🙂

Yes, it is. You argued that a duplication would be incapable of increasing information. Since we are discussing duplication of biopolymer sequences and their effect on functional information (which relies on degree of function) your statement is simply wrong.

colewd: How do you envision a protein with 6 different functions climbing up a single hill?

Are you taking drugs?

Ah no, sorry. You are talking about pleiotropy, aren’t you? If you want me to answer that you need to specify what the hill represents here and what the different functions are.
Corneel on June 26, 2018 at 8:59 am said:

colewd: When you duplicate a gene, get a small modification, and catalyze a different reaction, the majority of the source of the FI is the original protein.

EXACTLY!

So a protein sequence that starts off at a high level of functional complexity is introduced into a novel function. This effectively collapses the 500-bit limit rule, which does not allow for that sort of thing. Instead, it relies on the assumption that for every novel function a new search needs to be initiated from scratch, which is clearly false.
Corneel on June 26, 2018 at 9:08 am said:

Rumraket: Mung: You don’t get the FI of the paper by adding up the individual FI for each sentence in the paper. To get the FI of the paper you’d have to define the “function” of the paper, not by adding up the FI for the “function” of each of its parts.

Rum: I think you do get the FI of the paper by adding up the FI for each functional element of it. At least it’d come out the same as if you just set the function for the entire paper and tried to calculate its FI.

When analogies cease to enlighten, but become sources of confusion themselves, it is time to drop them.

Either you both specify which function you believe you are discussing, or you seek out a biological example. If you choose the latter, I suggest you follow Joe’s advice and adopt fitness as your function.
dazz on June 26, 2018 at 9:58 am said:

Corneel: EXACTLY!

So a protein sequence that starts off at a high level of functional complexity is introduced into a novel function. This effectively collapses the 500-bit limit rule, which does not allow for that sort of thing. Instead, it relies on the assumption that for every novel function a new search needs to be initiated from scratch, which is clearly false.

Gene duplication doesn’t poof functionality out of nowhere the way jeebus does it, so it’s never going to be a satisfactory explanation for Billy and his ilk
Mung on June 26, 2018 at 3:43 pm said:

Rumraket: You do, for the 2nd function.

No you don’t. Don’t you read what you write? Haven’t you already stated that the ancestral sequence had some degree of function for LDH?

Alternatively the LDH function has not yet evolved so the FI is infinite. Either way it’s not zero to begin with.
Mung on June 26, 2018 at 3:44 pm said:

dazz: Gene duplication doesn’t poof functionality out of nowhere…

That’s right. Gene duplication just duplicates a gene.
Mung on June 26, 2018 at 3:49 pm said:

Corneel: When analogies cease to enlighten, but become sources of confusion themselves, it is time to drop them.

But the analogy was enlightening. 🙂

Rumraket thinks that you can take a book or paper and divide it into parts, assign different functions to each part, calculate the FI for each function/part and add them all up, and that gives you the FI of the book or paper.

I now know that Rumraket is clueless when it comes to FI.
Mung on June 26, 2018 at 4:01 pm said:

Rumraket: I don’t see what is wrong with that actually. It seems intuitively agreeable to me that a function that literally requires more genes also “has more” FI.

And this is how intuition can mislead you. The number of genes is irrelevant because there is no mapping or encoding from number of genes to number of sequences that can perform the function at or above the threshold of function relative to the number of possible sequences, and that is what matters.

Seriously man, stop arguing with me for a minute and think.
Mung on June 26, 2018 at 4:07 pm said:

Corneel: I suggest you follow Joe’s advice and adopt fitness as your function.

Since Hazen and Szostak don’t do that I don’t feel the need to do it either. And Rumraket’s argument (and Joe’s too for that matter) depends on having distinct functions.
Corneel on June 26, 2018 at 4:10 pm said:

Mung: Rumraket thinks that you can take a book or paper and divide it into parts, assign different functions to each part, calculate the FI for each function/part and add them all up, and that gives you the FI of the book or paper.

…which is correct if you sum exact matches to several target sequences. This is what Rum showcased in his example, and it correctly mimics the gpuccio-style approach.

Whereas if you try to sum the FI for a dozen different molecular functions this doesn’t fly, which is what you are getting at.

So are you both on the same page? (pun totally intended!)
Corneel on June 26, 2018 at 4:25 pm said:

Mung: The number of genes is irrelevant because there is no mapping or encoding from number of genes to number of sequences that can perform the function at or above the threshold of function relative to the number of possible sequences, and that is what matters.

Why should we restrict ourselves to sequences? Hazen et al. specify a system that can assume a number of different configurations. Once we have a segmental duplication we have two alleles in the population: One carrying the duplicate gene and the other lacking it. These can become part of different multi-locus genotypes. Joe has put up a lovely example of that.

Mung: Since Hazen and Szostak don’t do that I don’t feel the need to do it either. And Rumraket’s argument (and Joe’s too for that matter) depends on having distinct functions.

Not true, increases in both functions are assumed to have adaptive value.
Mung on June 26, 2018 at 4:27 pm said:

Corneel: …which is correct if you sum exact matches to several target sequences.

But that’s a pretty big if. Would the FI of the book or paper go down if for some parts of the book more than one sequence meets or exceeds the threshold of function for that part of the book?
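One way to explore that question numerically is sketched below. The counts are invented for illustration, the 26-letter alphabet is carried over from the earlier text example, and the sketch assumes one particular definition of the whole's function: the whole counts as functional exactly when every part independently meets its own threshold. Under a different definition of the whole's function the comparison need not hold.

```python
import math

ALPHABET = 26   # assumed alphabet size, as in the earlier text example
PART_LEN = 100  # hypothetical length of each part

def fi_bits(n_functional, length, alphabet_size=ALPHABET):
    """FI in bits: -log2(n_functional / alphabet_size**length)."""
    return length * math.log2(alphabet_size) - math.log2(n_functional)

# FI of one part drops when more sequences meet its threshold.
exact = fi_bits(1, PART_LEN)       # only one sequence works
relaxed = fi_bits(1000, PART_LEN)  # hypothetical: 1000 sequences work
print(relaxed < exact)

# Hypothetical counts of functional sequences for three parts.
counts = [1, 1000, 50]
sum_of_parts = sum(fi_bits(n, PART_LEN) for n in counts)

# ASSUMPTION: the whole is functional iff every part meets its threshold,
# so the number of functional wholes is the product of the part counts.
whole = fi_bits(math.prod(counts), PART_LEN * len(counts))
print(math.isclose(sum_of_parts, whole))
```

Under that assumption the sum of the parts matches the FI of the whole even when parts tolerate many sequences; the disagreement in the thread is over whether that is the right way to define the function of the whole.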

It really doesn’t matter, because his approach is all wrong. If you want to calculate the FI for some system you have to first define the function of that system, not the function of each of its constituent parts. Joe makes the same mistake.

What is the function of a genome? What is the threshold of function of a genome? How many different sequences can there be for a genome?
Mung on June 26, 2018 at 4:36 pm said:

Joe Felsenstein: I disagree with Mung about whether FI measures information, and I disagree with Hazen about whether it measures “complexity”.

A complexity metric is of little utility unless its conceptual framework and predictive power result in a deeper understanding of the behavior of complex systems.

Functional information provides a measure of complexity by quantifying the probability that an arbitrary configuration of a system of numerous interacting agents (and hence a combinatorially large number of different configurations) will achieve a specified degree of function.

It is important to emphasize that functional information, unlike previous complexity measures, is based on a statistical property of an entire system of numerous agent configurations (e.g., sequences of letters, RNA oligonucleotides, or a collection of sand grains) with respect to a specific function.

Functional Information as a Measure of System Complexity

etc. etc.

http://www.pnas.org/content/104/suppl_1/8574
Rumraket on June 26, 2018 at 5:08 pm said:

Mung: Mung: Now if you started with 0 bits of FI and ended up with 1400 bits of FI, that’s something worth discussing.

Rumraket: You do, for the 2nd function.

Mung: No you don’t.

Yes I do. When the 2nd function has evolved, you calculate FI for it. You don’t calculate FI for a function that isn’t there.

I mean unless you’re an idiot, but you can do what you want. Just be aware that if you do that, you’re not doing something that makes sense.

It’s like trying to measure how pleasing a painting yet to be painted makes random people in the street feel. You can “act out” taking that poll, but “there’s no there, there”.

Don’t you read what you write?

Yeah I was right then, I am still right now.

Haven’t you already stated that the ancestral sequence had some degree of function for LDH?

Have I? Where?

Alternatively the LDH function has not yet evolved so the FI is infinite. Either way it’s not zero to begin with.

That’s what you get when you have 0 in the denominator of a fraction. That doesn’t mean the absence of function has the highest possible amount of FI.

If you disagree, then something completely absurd follows. Like if evolution fails to evolve a particular function (no sequences have evolved that meet the minimum threshold), then an infinite amount of FI has evolved. So any mutation that doesn’t result in a new function, creates an infinite amount of FI.

How many sequences meet the minimum threshold? None. So zero.
How long are those sequences? Well they don’t exist, so 0.
What is the alphabet size of those sequences? Well they don’t exist, so 0.
Not that it matters how long they are or how big the alphabet size is, any number you plug in will still get you 0 when you have zero sequences that meet the minimum threshold.
-log2(0/0^0) = ∞

Let me repeat myself: If there are no sequences that have the function (meet the minimum threshold), you end up with
-log2[F(Ex)] = infinity. So the highest possible amount of FI is when the function is absent. Yeah, I’m going to assume that means we aren’t supposed to calculate FI for absent functions. Call me crazy.

And here you are, insisting we both can (as in we can plug in the numbers) and should calculate FI for absent functions.
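The degenerate case can be made concrete with a small sketch. The function name and the choice to return infinity are assumptions made for illustration; the formula is the -log2[F(Ex)] quoted above, with F(Ex) taken as the fraction of sequences meeting the threshold.

```python
import math

def fi_bits(n_functional, n_total):
    """FI in bits as -log2(n_functional / n_total).

    Returns math.inf when no sequence meets the threshold, since the
    fraction is zero and math.log2(0) would raise ValueError.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_functional == 0:
        return math.inf  # assumed convention: function absent, FI unbounded
    return -math.log2(n_functional / n_total)

print(fi_bits(1, 2**10))   # prints: 10.0
print(fi_bits(0, 26**10))  # prints: inf
```

Whether an infinite value should be read as "the largest possible FI" or as "FI is undefined for a function that is absent" is exactly the point in dispute; the code only shows that the formula gives no finite answer there.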
Rumraket on June 26, 2018 at 5:20 pm said:

Mung: Rumraket thinks that you can take a book or paper and divide it into parts, assign different functions to each part, calculate the FI for each function/part and add them all up, and that gives you the FI of the book or paper.

Yeah and you have not argued why not. The individual functions contribute to the function of the entity as a whole. They each play some role in it. For books or for organisms. I don’t see what’s wrong with that.

I now know that Rumraket is clueless when it comes to FI.

Thank you for that mere commentary but I think you need to do something more convincing than just declare it.