Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

Posted on May 21, 2018 by Joe Felsenstein

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular gpuccio’s assertion that observing 500 bits of it is a reliable indicator of Design, as here, in about the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t work. There seem to be three possible ways of arguing for it, and in the end, two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information. In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function. This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observation of more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function, but also includes cases where there would be a path along which function increases. In such cases, seeing 500 bits of functional information does not allow us to conclude that it is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails, as it is not reliable.

(3) The third possibility is an additional condition that is added to the design inference. It simply declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information”. It does not simply define “complex functional information” as a case where we can define a level of function that makes the probability of the set less than $2^{-500}$. That additional condition allows us to safely conclude that normal evolutionary forces can be dismissed, by definition. But it leaves the reader to do the heavy lifting, as the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once they have done that, they will find that the additional step of concluding that the genotypes have “complex functional information” adds nothing to our knowledge. CFI becomes a useless add-on that sounds deep and mysterious but actually tells you nothing except what you already know. And there seems to be some indication that gpuccio does use this additional condition.
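
For reference, the 500-bit threshold and the probability $2^{-500}$ are the same condition stated two ways. Writing $P$ for the probability assigned to the set of genotypes at or above the chosen level of function (a uniform frequency in Szostak's version, a process-based probability in Dembski's) and $I$ for the functional information in bits, the equivalence is below; the decimal value is plain arithmetic, not a figure taken from gpuccio or Dembski.

```latex
I = -\log_2 P \;\ge\; 500
\quad\Longleftrightarrow\quad
P \;\le\; 2^{-500} \approx 3.05 \times 10^{-151}
```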

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function >= Ex.”

which is identical to my definition, in particular my definition of functional information as the upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)

We shall see later that there may be some ways in which gpuccio’s definition is modified from Szostak’s. Jack Szostak and his co-authors never attempted any use of his definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.

gpuccio seems to be making one of three possible arguments:

Possibility #1 That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

Use of such a theorem was attempted by William Dembski, in the form of his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2001). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI changed the specification before and after evolutionary processes, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it. gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agre[e] with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, on the other hand gpuccio does associate the argument with Dembski here by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

Since that figure is Dembski’s, it remains unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem to replace it.

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us choose a cutoff level of function, which picks out the set of sequences whose function is at least that high, without any condition that the other sequences have zero function. Durston did not impose such a condition either. And as we’ve seen, gpuccio associates his argument with theirs.
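
To see that Szostak's definition is silent about the shape of the landscape outside the target set, here is a minimal Python sketch on a hypothetical toy space of binary sequences of length 12 (the length, the binary alphabet and both landscapes are assumptions chosen for illustration). An "island" landscape, in which only one sequence has any function, and a graded landscape, in which function rises one mutation at a time, assign the same functional information to the top level of function.

```python
import math
from itertools import product

L = 12  # hypothetical sequence length, small enough to enumerate all 2**L sequences

def fraction_at_least(function, threshold, length=L):
    """Fraction of all binary sequences of this length whose function >= threshold."""
    seqs = list(product((0, 1), repeat=length))
    hits = sum(1 for s in seqs if function(s) >= threshold)
    return hits / len(seqs)

def functional_information(function, threshold, length=L):
    """Szostak-style FI in bits: -log2 of the fraction meeting the threshold."""
    return -math.log2(fraction_at_least(function, threshold, length))

def island(s):
    """Island of function: only the all-ones sequence has any function."""
    return 1 if all(s) else 0

def graded(s):
    """Graded hill: function is the number of 1s, so single mutations can climb."""
    return sum(s)

fi_island = functional_information(island, 1)
fi_graded = functional_information(graded, L)
print(fi_island, fi_graded)  # 12.0 12.0
```

Because the FI calculation only counts how many sequences reach the threshold, it cannot by itself tell the two cases apart; that distinction has to come from knowledge of the rest of the landscape.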

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection, then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal than Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements. And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it could be achieved by natural evolutionary forces (see my discussion of this here, in the section on “Dembski’s revised CSI argument”, which refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be making a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each being counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.
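
The gap between these two kinds of probability can be sketched in Python under stated assumptions: a hypothetical binary landscape in which function is the number of 1s, and a crude adaptive walk (a random single-bit mutation each step, kept only if function increases) standing in for "all evolutionary processes". Neither the landscape nor the walk is Dembski's or gpuccio's model; they only show that the two probabilities of reaching the same sequence can differ enormously.

```python
import random

L = 20  # hypothetical sequence length

def function(s):
    """Toy graded function: number of 1s in a binary sequence."""
    return sum(s)

def adaptive_walk(rng, steps):
    """Hypothetical adaptive walk: propose one random bit flip per step and
    keep it only if function increases. Returns True if the walk reaches
    the all-ones (maximum-function) sequence within the given steps."""
    s = [rng.randint(0, 1) for _ in range(L)]
    for _ in range(steps):
        if function(s) == L:
            return True
        i = rng.randrange(L)
        mutant = s.copy()
        mutant[i] = 1 - mutant[i]
        if function(mutant) > function(s):
            s = mutant
    return function(s) == L

rng = random.Random(1)  # fixed seed so the run is reproducible
uniform_p = 1 / 2**L    # Szostak-style: every sequence counted equally
walks = 1000
process_p = sum(adaptive_walk(rng, 2000) for _ in range(walks)) / walks
print(uniform_p, process_p)
```

Under these assumptions the uniform probability of the top sequence is $2^{-20}$, about one in a million, while essentially every simulated walk reaches it; which probability one plugs in decides the answer.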

gpuccio has a similar condition in the requirements for concluding that complex functional information is present. We can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observing 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) simply does not exist. Or if it does exist, it is a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

1,971 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

Rumraket on June 21, 2018 at 8:06 am said:

colewd: If 50 AA can tolerate lots of AA acid substitutions then the base is much wider.

That doesn’t make sense. You can’t establish the total size of the hill near the base simply from sequences that sit at the top. You can’t derive the width (or height) of the Mt. Everest simply by being given a picture of the very top.

Look again at the drawing in my previous post. If the sequences we have all sit at the top because those sequences with mutations in them have lower fitness and are selected against, then merely seeing that the sequences are clustered together and are “conserved” doesn’t tell us how broad the base is, it could be a tiny peak with a huge base.

Here’s a new picture below. Only if the “hill” looks like A will the extant sequences (red dots at the top) be able to tell us how broad the hill is. For the rest of these hill shapes, purifying selection will be biasing mutants towards the uppermost part of the hills.

I submit that the hill represented by A is physically unrealistic and there’s zero reason to think a protein with a fitness landscape correctly represented by such a hill exists in reality.
All experiments and all knowledge we have of protein biochemistry tells us that their functions come in degrees that go from very low levels of activity to much higher, which implies hills in the fitness landscape of protein functions have the shape of actual hills or mountains (mostly like C and D), they have broad bases, smooth or curved slopes, and a much narrower top.

The hill simply represents the quantity of functional sequences that are similar.

It doesn’t represent just that. The height and slope of the hill represents differences in fitness between different sequences. The higher up on the hill, the higher fitness the sequences have. That would explain why we don’t get information about the slope or base of the hill simply by looking at extant sequences, because when we move away from the peak, we are moving downhill which means we move into an area of lower fitness. Purifying selection would be constantly in operation biasing against that.

This is essentially what you’re saying too, except that you seem to imply that any movement away from the top of the hill constitutes a drop down into outright lethality or zero function, instead of a gradual decrease in function (and thus fitness of carriers).
Rumraket on June 21, 2018 at 8:14 am said:

colewd: Has it occurred to you that it is because I am dealing with lots of unsupported claims?

Nobody except you has been making unsupported claims. We are correcting errors in reasoning you make. We aren’t claiming to know for a fact how the fitness landscape of F-type ATP synthase subunit beta looks, we are trying to get YOU to understand that you don’t either and you can’t derive its shape by just seeing that we have a relatively well-conserved collection of sequences in extant life.

Getting you to understand that you’re making an unwarranted extrapolation is not making unsupported claims, it is us attempting to get YOU to stop doing that very thing.

If you can show evidence of an exception to simply pushing against unsupported claims I will accept your criticism.

And once again I must point out that you are revealing a double standard. You want evidence to falsify an extrapolation you have in your mind, an extrapolation that isn’t actually based on evidence, just pure assumption.

So you want us to give evidence and experiments, but for your own part you can just assume and extrapolate anything you want and we must then falsify experimentally your assumptions.

Sorry Bill, but you’re going to have to stop drawing conclusions the data cannot support. And the data we have cannot support the inference that extant protein sequences have no slope for selection to climb towards extant functions. You simply can’t claim to know this just by observing a collection of conserved sequences. It is not even remotely implied. Look at the drawings – think!
Rumraket on June 21, 2018 at 8:18 am said:

colewd:
DNA_Jock,

Why would it not tell us that the hill is small therefore the base is small?

Look at the drawings Bill. How can you know what the total size, or the shape of the hill is, just by observing sequences that could be sitting just at the top? We have no way of ruling out (without conducting actual experiments) that we are seeing sequences that just sit on a local optimum.

If the sequences we see sit at the top of a hill like C or D in my last drawing, then you don’t have the kind of information that would tell you how far out the base extends, or how steep the slope leading up is.
Rumraket on June 21, 2018 at 8:22 am said:

Mung: Rumraket: Here the double standard is revealed once again. To Bill Cole it is entirely acceptable that “Design” can operate exclusively by “inference to the best explanation”, but evolution must provide “demonstration”.

Mung: Then how do you explain the following?

colewd: Once you buy into this evolutionary inference then you try and jam all square pegs into all the round holes.

Bill understands that we too are sometimes making inferences, but he wants experimental demonstrations from us instead. But he’s okay with himself doing inferences only. Hence he has a double standard.
Corneel on June 21, 2018 at 8:29 am said:

Mung: Would I say it if it were not true?

Good point

Mung: I certainly believe it is true. And Joe isn’t claiming that evolution can create FI.

Semantics, semantics

Natural selection can make a population acquire a system configuration that displays a degree of function belonging to a set that needs 500 bits of functional information more to be specified than the set to which its original configuration belonged.

You are right: Much better!
Corneel on June 21, 2018 at 9:31 am said:

colewd: There is a difference between the WNT signaling pathway and the WNT ligand. BIG DIFFERENCE.

Yeah. A BIG BIG DIFFERENCE. Yet you stated that part of the function of the Wnt and Shh ligands was to initiate transcription. Excuse me? How are these molecules going to pull that off without binding the DNA? Do all homologs of WNT1 and SHH have DNA binding activity? I think not.

This is your confusion, not mine. You are mixing up biological levels. Molecular function (protein binding, DNA binding) is not the same as a pathway function (signaling), and a pathway function is not necessarily tied up with some fixed higher level biological function (morphogenesis, regulating cell proliferation).

colewd: Me: I’d say exactly zero, because this function didn’t change

Why do you believe this?

Because, to the best of my knowledge, all the past and present homologs of the ligands you mentioned could fulfill the function “receptor binding”. Since your function is binary all-or-nothing, there is no way to increase beyond this level. This is the consequence of you shifting the goal posts. There has hardly been any change in the function of the individual conserved components of the Wnt and hedgehog signaling pathways, but what did change is regulation: where are they expressed, when are they expressed, and what transcriptional targets do they control?

colewd: Is there an explanation how randomly changing DNA of one gene changes the other or is this an evolutionary inference 🙂

This follows from taking a biochemical view of receptor-ligand binding instead of your lock-and-key view. That has been explained to you several times before.

colewd: So you start with coevolution and now here is the simplified WNT Sonic pathways. https://goo.gl/images/X1LzS3

AND Notch signaling as well, I see. Did you know Notch ALSO was first discovered in Drosophila, just like Wingless and hedgehog? Can you guess what function it was found doing first? Look closely at the wingtips.

Co-option baby.
Rumraket on June 21, 2018 at 11:03 am said:

colewd: Is there an explanation how randomly changing DNA of one gene changes the other or is this an evolutionary inference

Intermolecular epistasis shaped the function and evolution of an ancient transcription factor and its DNA binding sites.

“Abstract
Complexes of specifically interacting molecules, such as transcription factor proteins (TFs) and the DNA response elements (REs) they recognize, control most biological processes, but little is known concerning the functional and evolutionary effects of epistatic interactions across molecular interfaces. We experimentally characterized all combinations of genotypes in the joint protein-DNA sequence space defined by an historical transition in TF-RE specificity that occurred some 500 million years ago in the DNA-binding domain of an ancient steroid hormone receptor. We found that rampant epistasis within and between the two molecules was essential to specific TF-RE recognition and to the evolution of a novel TF-RE complex with unique derived specificity. Permissive and restrictive epistatic mutations across the TF-RE interface opened and closed potential evolutionary paths accessible by the other, making the evolution of each molecule contingent on its partner’s history and allowing a molecular complex with novel specificity to evolve.

From the conclusion:
“Epistasis and the evolution of molecular complexes
Complexity and interdependence are often thought to act as constraints on evolution (Lewontin, 1984; Bonner, 1988; Kauffman, 1993; Wagner and Altenberg, 1996; Orr, 2000; Schank and Wimsatt, 2001). Previous studies of epistasis within a molecule have shown that interactions among mutations constrict the number of passable evolutionary trajectories through sequence space and make the outcomes of evolution contingent upon the prior occurrence of permissive mutations (Weinreich et al., 2006; Bridgham et al., 2009; Podgornaia and Laub, 2015). The joint sequence space of multiple interacting molecules contains more dimensions and therefore far more paths between functional joint genotypes than does either molecule’s separate sequence space (Gavrilets, 2004). Our work shows that intermolecular epistasis is indeed rampant and blocks many of these additional pathways.

But intermolecular epistasis also has an opposite effect, which introduces new degrees of freedom into the evolutionary process. Permissive epistatic mutations across the molecular interface open paths for the other molecule that would otherwise have been blocked. As a result, the absolute number of trajectories (and ultimate endpoints) available to either partner is larger than it would appear to be if only the ‘slice’ or profile through the joint sequence space represented by variability in a single molecule were considered. Under the scenario we examined, for example, an ERE regulated by an immutable AncSR1-DBD could not reach SRE without losing functionality, and AncSR1 could not reach AncSR1+RH if it regulated only immutable binding sites. But because amino acid replacements in the TF are permissive for mutations in the RE, and subsequent RE mutations are permissive for replacements in the protein, both members of the complex can explore regions of their own sequence space and reach genotypes that were not previously available to either.

Molecular complexes are pervasive in biology. The extensive intermolecular epistasis that we observed among a small number of sites in a simple binary complex suggests that the reach of intermolecular epistasis may be vast. If the structure of that epistasis is anything like that present in the AncSR1-RE complex, then the web of contingent events that structures evolutionary trajectories is likely to be dense, with the evolutionary potential of any one molecule depending on prior events in other molecules. But if some of those events are permissive, as our results suggest they are likely to be, then the potential of evolution to generate new functional complexes by the combined action of mutation, drift and selection is greater than it may appear if the molecules are viewed only in isolation. Complexity and interdependence not only constrain evolution; they can also buy freedom for the parts of a system to reach new states, if the historical events are right.
My bolds.
DNA_Jock on June 21, 2018 at 11:37 am said:

Bill,
Please look long and hard at Rumraket’s diagrams. He is trying to explain to you, with helpful graphics, what all of us have been trying to explain to you in words on this thread.
To avoid potential confusion I will note that, when I talk about “holes” rather than hills, and when Entropy talks about “dancing around a local minimum”, we have inverted the landscape, such that “more fit” is DOWN, instead of up. We do this to make it more intuitive that sequences will congregate around a local optimum; the arguments are unchanged.
Understand that gpuccio is assuming that all hills have the shape “A” above, yet all the data that we have argues that hills have the shape “D”. You cannot say jack about the base by merely assessing the peak.
ETA: off topic, more flagrant abuse of site rules by DNA_Jock, changing <b> tags to <strong> in Rumraket’s post. Because I can.
Rumraket on June 21, 2018 at 11:56 am said:

What’s the difference between strong and bold anyway?
DNA_Jock on June 21, 2018 at 12:11 pm said:

Rumraket,
In my hands on this site, bold tagging has no effect, whereas strong tagging works.
[The word ‘bold‘ is flanked by <b> tags…]
Rumraket on June 21, 2018 at 12:14 pm said:

DNA_Jock:
Rumraket,
In my hands on this site, bold tagging has no effect, whereas strong tagging works.
[The word ‘bold‘ is flanked by <b> tags…]

How odd, to me they look identical as both just make thicker letters.

Test. Bolded.
Test. Strong.
Joe Felsenstein on June 21, 2018 at 12:56 pm said:

In the discussion of shapes of peaks, I think colewd is assuming that one can go only a small distance (accumulating one or two mutations) before the “function” drops all the way to zero.

Even if true, that of course leaves us still not knowing whether there might be other such peaks elsewhere. But gpuccio and colewd seem somehow to think that knowing the narrowness of the one peak shows us that there can’t be any others.
Joe Felsenstein on June 21, 2018 at 1:20 pm said:

On the issue of whether FI can increase when the frequency of an allele changes during the process of natural selection (*) let me simplify the issue, as one can argue that my formulas for change in FI should be done differently.

The issue is not whether FI typically increases, or always increases, but whether it can ever increase — is there some reason it can’t? In the case where higher “function” means higher fitness, it can.

In Szostak and Hazen’s method, we calculate FI using the set of all possible sequences, counted equally. Each has some value of function. We see a sequence with a given value of function, $f$. How much FI does it have? We set $f$ as the threshold value, and figure out what fraction $P$ of all sequences have function greater than or equal to $f$. The FI for that one sequence is then $-\log_2(P)$.
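
The recipe in the preceding paragraph can be written out as a short Python sketch. The four-letter alphabet, the length-4 sequences and the "function" (number of positions matching a hypothetical reference string) are assumptions chosen so that the whole sequence space can be enumerated; they are not taken from Szostak and Hazen.

```python
import math
from itertools import product

ALPHABET = "ACGT"   # hypothetical alphabet
TARGET = "GATC"     # hypothetical reference sequence defining the toy function

def function(seq):
    """Toy function: number of positions at which seq matches TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))

def fi_of_sequence(seq):
    """FI of one sequence: use its own function value f as the threshold,
    count the sequences in the whole space with function >= f, and return
    -log2(P), written as log2(N / hits) so that P = 1 gives exactly 0.0."""
    f = function(seq)
    space = ["".join(p) for p in product(ALPHABET, repeat=len(TARGET))]
    hits = sum(1 for s in space if function(s) >= f)
    return math.log2(len(space) / hits)

for s in ["CCCA", "GCCA", "GACA", "GATA", "GATC"]:
    print(s, function(s), round(fi_of_sequence(s), 3))
```

The printed FI values rise with the level of function, from 0 bits for a sequence with no matches (every sequence meets that threshold) to 8 bits for the best sequence, which is the only one of the $4^4 = 256$ sequences at its level.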

If we have a population whose genes have two different sequences, we can calculate FI for each of the two sequences in the way just described. The calculation uses the full set of all possible sequences, not just the sequences in the particular population. Whichever sequence has the higher function will then have the higher FI.

If the higher function goes along with higher fitness, then as the allele with the higher FI increases in frequency, the average FI of the sequences in the population increases. Hence Mung’s argument that increasing gene frequency of that sequence means lower FI is incorrect.
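
A minimal Python sketch of this argument, assuming a haploid two-allele model with constant fitnesses. The FI values, fitnesses, starting frequency and number of generations are all hypothetical, and this is an illustration rather than the formulas used earlier in the thread. Each sequence's FI is fixed (it is computed against the whole sequence space), so only the allele frequency changes, and the population-average FI rises with it.

```python
# Hypothetical haploid two-allele model; every number below is an assumption.
FI_HIGH, FI_LOW = 8.0, 4.3   # FI in bits of the higher- and lower-function sequences
W_HIGH, W_LOW = 1.05, 1.00   # higher function taken to mean higher fitness

def next_freq(p):
    """One generation of haploid selection on the higher-FI allele."""
    mean_w = p * W_HIGH + (1 - p) * W_LOW
    return p * W_HIGH / mean_w

def mean_fi(p):
    """Average FI of the sequences in the population at allele frequency p."""
    return p * FI_HIGH + (1 - p) * FI_LOW

p = 0.01  # starting frequency of the higher-FI allele
history = []
for _ in range(200):
    history.append(mean_fi(p))
    p = next_freq(p)

print(round(history[0], 3), round(history[-1], 3))
```

Since $W_{HIGH} > W_{LOW}$, the frequency of the higher-FI allele rises every generation, so the average FI never decreases in this model.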

Anyone who thinks that there is some general principle that FI cannot increase as a result of natural evolutionary processes only has to consider simple cases like this to see that there is no such principle.

——–
(*) I am trying to avoid all sorts of words that will set off the Definition Police, but probably “process” will do that too. I intend to ignore their objections and stick to the issue of change in FI. It does not matter for my argument here whether natural selection “is” a “force” or whether it “causes” anything.
phoodoo on June 21, 2018 at 2:29 pm said:

dazz: I swear if you were any more stupid you’d have to be watered every other day

Jock is very busy breaking other rules.
Mung on June 21, 2018 at 3:25 pm said:

Joe Felsenstein: How much FI does it have?

None. Let’s start with that.

Perhaps you can refer me to where Hazen/Szostak ever calculate the FI for an individual sequence.

Joe Felsenstein: Whichever sequence has the higher function will then have the higher FI.

FI is not a measure of how much or how little “function” a sequence has.

Does FYRE ON MAIN have more or less function than FIRE ON MANE?
Mung on June 21, 2018 at 3:40 pm said:

Joe Felsenstein: If you have a single mutant, and it has a higher level of function, and because of that a higher fitness, then as it increases in gene frequency in the population the Functional Information of the population increases.

A single mutant what, and what is the function that it has a higher level of?

If you say that the FI of the population increases then you must say what the function of the population is.

Don’t you agree that if you are going to calculate the FI that you must first define the function in question?
Mung on June 21, 2018 at 3:45 pm said:

Functional Information as a Measure of System Complexity

– Szostak et al.

Functional information is defined only in the context of a specific function x.

– Szostak et al.
DNA_Jock on June 21, 2018 at 3:50 pm said:

Mung: Perhaps you can refer me to where Hazen/Szostak ever calculate the FI for an individual sequence.

Page 3 of Hazen et al. 2007, where they say “To determine the functional information of any particular sequence we must specify three parameters:”.
They are clearly calculating I(Ex) for a particular sequence, in the context of the sequence space (of 26^n).
DNA_Jock on June 21, 2018 at 3:52 pm said:

phoodoo,
Good point, phoodoo.

dazz: more content and less insult, please.
Rumraket on June 21, 2018 at 4:16 pm said:

Mung: Perhaps you can refer me to where Hazen/Szostak ever calculate the FI for an individual sequence.

“Consider, for example, sequences of 10 letters that have a high probability (Ex ≅ 1) of evoking a positive response from the fire department. Such sequences might include “FIREONMAIN,” “MAINSTFIRE,” or “MAPLENMAIN.” Additionally, some messages containing phonetic misspellings (FYRE or MANE), mistakes in grammar or usage (FIREOFMAIN), or typing errors (MAZLE or NAPLE) may also yield a significant but lower probability of response (0 ≪ Ex < 1). Given these variants, on the order of 1,000 combinations of 10 letters might initiate a rapid response to the approximate location of the fire. Thus
I(1) ≈ -log2[1000/(26^10)] ≈ 36 bits."

So for a single sequence of length L = 10, it is just
I(1) ≈ -log2[1/(26^10)] ≈ 47 bits.

There’s absolutely nothing preventing us from calculating the FI of a single sequence if we assume Ex is constrained such that only a single perfect sequence has the desired function (like a password). It’s just -log2 of 1 (that one sequence) divided by the size of the sequence space.
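
The arithmetic in the two formulas above can be checked directly. This is a minimal sketch, assuming (as in the quoted example) a 26-letter alphabet, sequences of length 10, all sequences counted equally, and a known count of sequences that meet the function threshold; the helper name `functional_information_bits` is hypothetical, not something from Hazen et al.

```python
import math

def functional_information_bits(n_functional: int, alphabet_size: int, length: int) -> float:
    """Return -log2(n_functional / alphabet_size**length).

    n_functional is the number of sequences whose degree of function
    meets or exceeds the chosen threshold (hypothetical helper).
    """
    space_size = alphabet_size ** length  # every possible sequence, counted equally
    return -math.log2(n_functional / space_size)

# Fire-department example: on the order of 1,000 acceptable 10-letter messages.
print(round(functional_information_bits(1000, 26, 10), 2))  # 37.04
# "Password" case: exactly one sequence meets the threshold.
print(round(functional_information_bits(1, 26, 10), 2))     # 47.0
```

The exact arithmetic gives about 37.0 bits for the first case (the quotation gives ≈ 36 bits) and about 47.0 bits for the second, which depends only on the size of the sequence space.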
Mung on June 21, 2018 at 4:20 pm said:

DNA_Jock: dazz: more content and less insult, please.

You might as well ask him to just stop posting.
Mung on June 21, 2018 at 4:29 pm said:

Rumraket: There’s absolutely nothing preventing us from calculating the FI of a single sequence…

Sure there is. You’re not calculating the FI of the sequence. You’re calculating the FI of a system where only one sequence meets or exceeds the threshold.

We conclude that rigorous analysis of the functional information of a system with respect to a specified function x requires knowledge of two attributes: (i) all possible configurations of the system (e.g., all possible sequences of a given length in the case of letters or RNA nucleotides) and (ii) the degree of function x for every configuration.

Sorry, but you guys just don’t understand FI, or you’re talking about something else entirely. You’re doing the same thing you’re accusing gpuccio of. Call it TSZFI.
dazz on June 21, 2018 at 4:57 pm said:

Mung: Sure there is. You’re not calculating the FI of the sequence. You’re calculating the FI of a system where only one sequence meets or exceeds the threshold.

Not at all, it’s you who doesn’t understand FI. Using the level of function of a single sequence to set the threshold doesn’t imply that only that sequence meets the threshold.
colewd on June 21, 2018 at 5:15 pm said:

Joe Felsenstein,

In the discussion of shapes of peaks, I think colewd is assuming that one can go only a small distance (accumulating one or two mutations) before the “function” drops all the way to zero.

With 2000 protein superfamilies that perform different functions, and often multiple functions, is the island, hill and peak concept a reasonable way to describe even more than a few of them?

From a historical perspective we can measure preservation. It is an indirect measurement, yet it captures a very profound observation: how much flexibility does the observed protein have to change its sequence over time, given reproductive variation?

If the answer is small, that tells us that we are observing a sequence that is profoundly functionally relevant. As we observe more of these, the design inference stands out.
DNA_Jock on June 21, 2018 at 5:17 pm said:

Mung,

I see the source of your error.
In this quote

We conclude that rigorous analysis of the functional information of a system with respect to a specified function x requires knowledge of two attributes: (i) all possible configurations of the system (e.g., all possible sequences of a given length in the case of letters or RNA nucleotides) and (ii) the degree of function x for every configuration.

The word ‘system’ is used to refer to an element within sequence space, NOT the entire ensemble. They used this somewhat confusing turn of phrase because they want to generalize their metric beyond simple linear sequences.
An individual element can have a level of FI.
What that level is depends on the activity level of the entire set, same as if we were talking about what growth percentile your kid is in.
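
DNA_Jock’s percentile analogy can be made concrete on a space small enough to enumerate. This is a minimal sketch, assuming a toy space of all binary strings of length 4 and a hypothetical degree-of-function score equal to the number of 1s; neither the space nor the score comes from Hazen et al. Each element’s FI is -log2 of the fraction of the whole set whose function is at least as high as that element’s, written here as log2(N / count).

```python
import itertools
import math

def toy_function(seq: str) -> int:
    """Hypothetical degree-of-function score: count of '1' characters."""
    return seq.count("1")

def fi_of_element(seq: str, space: list) -> float:
    """FI of one element: log2(N / #elements with function >= its function)."""
    threshold = toy_function(seq)
    at_or_above = sum(1 for s in space if toy_function(s) >= threshold)
    return math.log2(len(space) / at_or_above)

# Enumerate the whole (tiny) sequence space: 2**4 = 16 strings.
space = ["".join(bits) for bits in itertools.product("01", repeat=4)]

for seq in ["0000", "0100", "0110", "1110", "1111"]:
    print(seq, round(fi_of_element(seq, space), 3))
```

Here 0000 gets 0 bits because every element meets its threshold, while the unique top scorer 1111 gets log2(16) = 4 bits. Changing the scores of other elements would change an element’s FI even if its own score stayed the same, which is the sense in which its level depends on the activity of the entire set.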
colewd on June 21, 2018 at 5:26 pm said:

Corneel,

This follows from taking a biochemical view of receptor-ligand binding instead of your lock-and-key view. That has been explained to you several times before.

Did you know that if a single OH is missing from a small molecule in the cell’s nucleus, transcription of hundreds of relevant molecules will not occur?
Rumraket on June 21, 2018 at 5:54 pm said:

colewd: Did you know that if a single OH is missing from a small molecule in the cell’s nucleus, transcription of hundreds of relevant molecules will ~~not~~ occur at a lower level?

Fify.

Did you know that highly specific regulatory proteins that bind to DNA and initiate transcription still bind completely random and nonfunctional DNA at low but measurable levels, and initiate transcription regardless?

http://www.homolog.us/blogs/blog/2013/07/17/random-dna-sequence-mimics-encode/
colewd on June 21, 2018 at 6:34 pm said:

Rumraket,

Did you know that highly specific regulatory proteins that bind to DNA and initiate transcription still bind completely random and nonfunctional DNA at low but measurable levels, and initiate transcription regardless?

In the case I am describing, real biological regulation does not occur when an OH is missing from a small molecule.
Rumraket on June 21, 2018 at 6:58 pm said:

colewd: In the case I am describing, real biological regulation does not occur when an OH is missing from a small molecule.

Is there a particular example you are thinking of, or are you making a general statement about all transcription?
colewd on June 21, 2018 at 7:08 pm said:

Joe Felsenstein,

In Szostak and Hazen’s method, we calculate FI using the set of all possible sequences, counted equally. Each has some value of function. We see a sequence with a given value of function, f. How much FI does it have? We set f as the threshold value, and figure out what fraction P of all sequences have function greater than or equal to f. The FI for that one sequence is then -log2(P).

Given this methodology can you calculate the FI of the beta chain of the F1 ATP synthase subunit?
colewd on June 21, 2018 at 7:10 pm said:

Rumraket,

There is a very specific example, but let’s table it for now, as it could derail this very interesting thread.
Mung on June 21, 2018 at 7:36 pm said:

dazz: Using the level of function of a single sequence to set the threshold doesn’t imply that only that sequence meets the threshold.

Explain the “1” in the numerator of his equation, dazz.

I(1) ≈ -log2[1/(26^10)] ≈ 47 bits.
DNA_Jock on June 21, 2018 at 7:42 pm said:

Mung,

That’s the global optimum sequence. Its FI is equal to the information required to specify a single sequence. You will notice that this FI value depends only on the size of the sequence space.
All other sequences (in this sequence space) have lower FI.
It’s really pretty simple Mung.
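
DNA_Jock’s point about the global optimum can be checked on a toy space. This is a minimal sketch, assuming a hypothetical score equal to the number of positions matching an arbitrary target string over a 4-letter alphabet at length 5; the target is chosen only for illustration and does not come from the thread. With a unique best sequence, its FI equals log2 of the size of the space, and every other sequence has lower FI.

```python
import itertools
import math

ALPHABET = "ACGT"
TARGET = "GATCA"  # hypothetical optimum, chosen only for illustration

def score(seq: str) -> int:
    """Hypothetical degree of function: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))

space = ["".join(p) for p in itertools.product(ALPHABET, repeat=len(TARGET))]
scores = [score(s) for s in space]

def fi(level: int) -> float:
    """FI for a given level of function: log2(N / #sequences scoring >= level)."""
    count = sum(1 for v in scores if v >= level)
    return math.log2(len(space) / count)

fi_optimum = fi(score(TARGET))
print(len(space), fi_optimum)  # 1024 sequences; optimum FI is log2(1024) = 10.0
# Every non-optimal sequence has strictly lower FI, because the optimum is unique.
print(all(fi(score(s)) < fi_optimum for s in space if s != TARGET))
```

The strict inequality relies on the optimum being unique; if several sequences tied for the best score, they would share a lower FI.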
colewd on June 21, 2018 at 7:54 pm said:

DNA_Jock,

That’s the global optimum sequence.

Could it be described as the only sequence that will perform the specified function?
Joe Felsenstein on June 21, 2018 at 8:18 pm said:

colewd: Given this methodology can you calculate the FI of the beta chain of the F1 ATP synthase subunit?

No, of course not, because I don’t know the amounts of function of all possible sequences.
Mung on June 21, 2018 at 8:19 pm said:

DNA_Jock: It’s really pretty simple Mung.

Then why are you all so confused?

Even his I(1) indicates how he is defining his degree of function. It has nothing to do with a global optimum. He is saying that no other sequences have the same degree of function. Yet in principle there is nothing to prohibit two different sequences from having the same degree of function.

Which has more FI, beer or bier?
Joe Felsenstein on June 21, 2018 at 8:22 pm said:

Mung: Yet in principle there is nothing to prohibit two different sequences from having the same degree of function.

Does that disturb you? Should it?
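
Under the threshold definition, two sequences with the same degree of function simply get the same FI, and a two-way tie at the top lowers the optimum’s FI by exactly one bit compared with a unique optimum. This is a minimal sketch, assuming a space of all 4-letter strings over a-z and a hypothetical function table in which every unlisted string scores 0; the numbers are illustrative only.

```python
import math

SPACE_SIZE = 26 ** 4  # all 4-letter strings over a-z, counted equally

# Hypothetical degrees of function; every string not listed scores 0.
FUNCTION = {"beer": 1.0, "bier": 1.0, "bear": 0.2}

def fi(seq: str) -> float:
    """FI of seq: log2(SPACE_SIZE / #strings whose function >= seq's function)."""
    level = FUNCTION.get(seq, 0.0)
    if level <= 0.0:
        return 0.0  # every string scores at least 0, so the fraction P is 1
    count = sum(1 for v in FUNCTION.values() if v >= level)
    return math.log2(SPACE_SIZE / count)

print(fi("beer") == fi("bier"))  # True: tied function, tied FI
print(fi("bear") < fi("beer"))   # True: lower function, lower FI
# A two-way tie at the top costs one bit versus a unique optimum.
print(round(math.log2(SPACE_SIZE) - fi("beer"), 6))
```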
Mung on June 21, 2018 at 8:22 pm said:

Joe Felsenstein: No, of course not, because I don’t know the amounts of function of all possible sequences.

If the FI were a property of each individual sequence then you would not need to know the amounts of function of all possible sequences.
Mung on June 21, 2018 at 8:24 pm said:

Joe Felsenstein: Does that disturb you?

Terribly.
Mung on June 21, 2018 at 8:27 pm said:

colewd: Could it be described as the only sequence that will perform the specified function?

Wouldn’t make much sense to call it a global optimum, which at the very least implies less optimal function exists.
Mung on June 21, 2018 at 8:31 pm said:

DNA_Jock: All other sequences (in this sequence space) have lower FI.

What you ought to say is that all other sequences (in this sequence space) have lower degree of function.

You all still don’t understand the plane intersecting the cone at all.
Joe Felsenstein on June 21, 2018 at 8:43 pm said:

Mung: If the FI were a property of each individual sequence then you would not need to know the amounts of function of all possible sequences.

You would need to know what fraction of all possible sequences had function greater than or equal to that amount of function. That is most easily assessed if you know the amounts of function of all possible sequences.
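
When the function of every possible sequence is not known, the fraction P could in principle be estimated by scoring a random sample instead. This is a minimal sketch, assuming uniform random sampling and a hypothetical score (positions matching an arbitrary 12-letter target); sampling is a swapped-in illustration, not a method described in the thread. Because each position of a uniformly random sequence matches the target with probability 1/4, the exact P is also available from the binomial distribution, which lets the estimate be checked.

```python
import math
import random

ALPHABET = "ACGT"
TARGET = "ACGTACGTACGT"  # hypothetical optimum, used only to define a toy score
LENGTH = len(TARGET)

def score(seq: str) -> int:
    """Hypothetical degree of function: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))

def estimate_fi(threshold: int, n_samples: int, rng: random.Random) -> float:
    """Estimate FI = -log2(P) by scoring uniformly random sequences."""
    hits = 0
    for _ in range(n_samples):
        seq = "".join(rng.choice(ALPHABET) for _ in range(LENGTH))
        if score(seq) >= threshold:
            hits += 1
    if hits == 0:
        raise ValueError("no sampled sequence met the threshold; P is too small to estimate")
    return -math.log2(hits / n_samples)

def exact_fi(threshold: int) -> float:
    """Exact FI for this toy score: each position matches TARGET with probability 1/4."""
    p = sum(math.comb(LENGTH, k) * 0.25**k * 0.75**(LENGTH - k)
            for k in range(threshold, LENGTH + 1))
    return -math.log2(p)

rng = random.Random(0)  # fixed seed so the run is repeatable
print(round(estimate_fi(6, 50_000, rng), 2), round(exact_fi(6), 2))
```

The estimate should land close to the exact value of about 4.2 bits. With P anywhere near 2^-500, no feasible sample would contain a single hit, so this kind of sampling cannot estimate FI at that scale.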
Corneel on June 21, 2018 at 8:43 pm said:

Mung: colewd: Could it be described as the only sequence that will perform the specified function?

Wouldn’t make much sense to call it a global optimum, which at the very least implies less optimal function exists.

How, exactly, can a function be “less optimal”?

Please think about that and give your best answer.

Anyway, the correct answer is “only if all other sequences display zero function”, which implies an all-or-nothing function. However, the functions that Hazen et al. were thinking of were clearly continuous, hence other sequences can perform the specified function as well, albeit to a lower degree.
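
The difference between an all-or-nothing function and a continuous one can be illustrated by scoring one toy space two ways. This is a minimal sketch, assuming binary strings of length 6 and two hypothetical scorings: a step function in which only the all-1s string functions at all, and a graded score equal to the number of 1s. Neither scoring comes from Hazen et al.

```python
import itertools
import math

space = ["".join(b) for b in itertools.product("01", repeat=6)]  # 64 strings

def step_function(seq: str) -> int:
    """Hypothetical all-or-nothing function: only '111111' works at all."""
    return 1 if seq == "111111" else 0

def graded_function(seq: str) -> int:
    """Hypothetical continuous-style function: more 1s means more function."""
    return seq.count("1")

def fi(seq: str, f) -> float:
    """FI of seq under scoring f: log2(N / #strings with f >= f(seq))."""
    level = f(seq)
    count = sum(1 for s in space if f(s) >= level)
    return math.log2(len(space) / count)

for seq in ["111111", "111110", "111100"]:
    print(seq, fi(seq, step_function), round(fi(seq, graded_function), 3))
```

The best string gets 6 bits under both scorings. Under the step function the near-misses have zero function and 0 bits of FI; under the graded score they perform the function to a lower degree and receive intermediate FI values.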
Corneel on June 21, 2018 at 9:03 pm said:

colewd: Given this methodology can you calculate the FI of the beta chain of the F1 ATP synthase subunit?

Which one?
colewd on June 21, 2018 at 9:06 pm said:

Corneel,

Anyway, the correct answer is “only if all other sequences display zero function”, which implies an all-or-nothing function. However, the functions that Hazen et al. were thinking of were clearly continuous, hence other sequences can perform the specified function as well, albeit to a lower degree.

Sure, but at what point does it make sense to take a highly theoretical conversation and apply it to the real world?

If Joe can’t measure or observe natural selection creating functional information what good is his claim?
colewd on June 21, 2018 at 9:11 pm said:

Corneel,

Which one?

The one that exists in humans with the similar one that exists in E. coli.
Corneel on June 21, 2018 at 9:23 pm said:

colewd: If Joe can’t measure or observe natural selection creating functional information what good is his claim?

recommended substitutions:

Joe => gpuccio
natural selection => the Designer
Corneel on June 21, 2018 at 9:44 pm said:

colewd: The one that exists in humans with the similar one that exists in E. coli.

The human copy of course, silly me.

Did you note that there are bacteria that have more similar sequences than Escherichia coli in Rumraket’s tree?
Rumraket on June 21, 2018 at 9:47 pm said:

Mung: What you ought to say is that all other sequences (in this sequence space) have lower degree of function.

If all we were interested in was degree of function, then there’d be no reason to even have a measure of FI, since we already have measures of degree of function for these systems. For enzymes, which catalyze chemical reactions, we have kcat/KM. For binding strength we have Kd. For ability to live and reproduce, we have the selection coefficient S (fitness and similar measures). These are measures of degree of function that don’t consider information content.

But merely saying that an enzyme catalyzes substrate turnover at kcat/KM = 1.8×10^12 doesn’t tell you anything about genetic information.

If you want to say that Hazen and Szostak’s approach is a bad or strange way of measuring the quantity of functional information, then fine. But then what other way can we do that? If you can’t think of a way of doing such a calculation, of “measuring” how much functional information some entity or system in question contains, then how are we to even make sense of the question “where did the information come from?” that creationists like to ask? What are we then even trying to explain?

Does the genome contain information? Does the human genome contain more information than the E. coli genome?

Does a protein of length 100 amino acids contain less information than a protein of length 2800 amino acids?
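
Under Hazen and Szostak’s definition, the largest FI a sequence can have is reached when it is the only sequence meeting the threshold, which is log2 of the size of the sequence space (the same reasoning that gives ≈ 47 bits for a single 10-letter sequence earlier in the thread). This is a minimal sketch, assuming a 20-letter amino-acid alphabet and considering length only; it gives an upper bound, not the FI of any real protein, which would require knowing the function of every sequence.

```python
import math

def max_fi_bits(length: int, alphabet_size: int = 20) -> float:
    """Upper bound on FI: log2(alphabet_size ** length) = length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)

for length in (100, 2800):
    print(length, round(max_fi_bits(length), 1))

# Shortest length whose upper bound reaches 500 bits.
print(math.ceil(500 / math.log2(20)))
```

So a 100-residue protein can carry at most about 432 bits of FI and a 2800-residue protein at most about 12,101 bits; whether either comes close depends entirely on how rare sequences with that level of function are.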
Mung on June 21, 2018 at 10:01 pm said:

Rumraket: …doesn’t tell you anything about genetic information.

Neither does FI. Didn’t you read Tom’s cautionary comments earlier in the thread?

FI does not mean “the information in a sequence that has a function,” but that’s how people are treating it.

If you want to say that Hazen and Szostak’s approach is a bad or strange way of measuring the quantity of functional information, then fine.

That’s exactly what I am saying. That is a misunderstanding and a misuse of FI. FI is not a measure of “functional information” and it does not measure “the quantity of functional information.”

That is something you are all tacking on that does not exist in Hazen/Szostak.

In Hazen/Szostak FI is “a measure of system complexity.”