Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

Posted on May 21, 2018 by Joe Felsenstein

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular gpuccio’s assertion that if we observe 500 bits of it, this is a reliable indicator of Design, as here, at about the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information ) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t work. There would seem to be three possible ways of arguing for it, and in the end two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information. In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function. This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observation of more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function, but also includes cases where there would be a path along which function increases. In such cases, seeing 500 bits of functional information, we cannot conclude that it is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails, as it is not reliable.

(3) The third possibility is an additional condition that is added to the design inference. It simply declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information”. It does not simply define “complex functional information” as a case where we can define a level of function that makes the probability of the set less than $2^{-500}$. That additional condition allows us to safely conclude that normal evolutionary forces can be dismissed, by definition. But it leaves the reader to do the heavy lifting, as the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once they have done that, they will find that the additional step of concluding that the genotypes have “complex functional information” adds nothing to our knowledge. CFI becomes a useless add-on that sounds deep and mysterious but actually tells you nothing except what you already know. And there seems to be some indication that gpuccio does use this additional condition.
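
To make the 500-bit threshold concrete, here is the arithmetic behind it, writing $P$ for the fraction of all sequences at or above the chosen level of function (the symbol $P$ is only notation for this sketch):

$$-\log_2 P \ge 500 \iff P \le 2^{-500} \approx 3.05 \times 10^{-151}$$

So a claim of 500 bits of functional information is, on its own, a statement about how small that fraction is, not about whether evolutionary processes can reach the set.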

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function >= Ex.”

which is identical to my definition, in particular my definition of functional information as the upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)

We shall see later that there may be some ways in which gpuccio’s definition is modified from Szostak’s. Jack Szostak and his co-authors never attempted any use of his definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.

gpuccio seems to be making one of three possible arguments:

Possibility #1. That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

Use of such a theorem was attempted by William Dembski in his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2001). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI changed the specification before and after evolutionary processes, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it. gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agre with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, on the other hand gpuccio does associate the argument with Dembski here by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

That figure being Dembski’s, this leaves it unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem that replaces it.

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us choose a cutoff level of function, which defines the set of sequences that have function greater than that cutoff, without any condition that the other sequences have zero function. Durston did not impose such a condition either. And as we’ve seen, gpuccio associates his argument with theirs.

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection, then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal than Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements. And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it be achieved by natural evolutionary forces (see my discussion of this here, in the section on “Dembski’s revised CSI argument”, which refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be making a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each being counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.

gpuccio has a similar condition in the requirements for concluding that complex functional information is present. We can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observing 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) is simply nonexistent. Or if it does exist, it is as a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

1,971 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

Corneel on May 30, 2018 at 10:49 pm said:

gpuccio@UD

Incredible! Corneel insists with the weasel

Incredible! gpuccio insists on selectively fishing out comments from my exchange with Mung, which actually concerns the weasel program, and misses all the comments that are actually aimed at him.

How can he not understand that “matches” can be evaluated only because the information is already in the system, all of it, the whole phrase?

How can he be so blind?

Oh, the irony!

Anyway, I am perfectly aware that the target string is being evaluated to find the number of matches, but I don’t see why that would prevent us from calculating the functional information of the resulting strings, which is what the whole exercise is about.

Are you going to repeat your lament again how everybody’s arguments are pitiful, and how only Joe Felsenstein is a worthy opponent or will you be actually taking on some of the arguments that were raised? I am very interested whether cnidarians have a functional copy of TRIM62 for example, and how they pull that off without all the conserved information that humans have.

dazz on May 30, 2018 at 10:59 pm said:

Seems to me Puccio is missing the point big time:

Defending Intelligent Design theory: Why targets are real targets, probabilities real probabilities, and the Texas Sharp Shooter fallacy does not apply at all.

Puccio: they are making again the same error that we observed in the Weasel “argument”. Of course, if you alredy know where the functional isalnd is, or even better what the final solution is, it’s really easy to find it. It’s always easy to find what you already know

because a large portion of the sequence space in the weasel algo has some function (at least one match)

It would be interesting to hear from him what he thinks about the PI algorithm that Richard Hughes proposed a while back, with no targets or “islands” known beforehand.

Corneel on May 30, 2018 at 11:15 pm said:

Mung: According to the example we’ve been working with you want me to accept that the string GTAFSDRNJPGOZXDQBQBRWHHPODLL has some degree of function relative to METHINKS IT IS LIKE A WEASEL because one letter matches, which is pretty far afield considering the examples given by Hazen et al.

That is because the weasel has been (heh heh) designed to have a smoothly increasing fitness function. This allows it to start its approach to the target relatively fast (it is an instruction tool after all). Note: In an analogous way, gpuccio designed his “English language” example to be near impossible to navigate for an evolving population. Hazen et al. are not concerned with such considerations and use different examples, but the principle remains the same.

Mung: I just want to know where you set your threshold that’s all. Then we can try to figure out how many strings are in and how many are out and THEN we can figure out whether the FI is 500 bits or not.

Joe just explained that the threshold does get adjusted every time the number of matches changes, in order to recalculate the functional information. Once the target string is reached, you have witnessed the gradual accumulation of 133 bits of functional information. If you want to see 500 bits, insert a target of at least 106 characters into your copy of WEASEL and run it.
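
A minimal sketch of such a run, for readers who want to watch the numbers change. It is written in the style of WEASEL but is not Dawkins’s original code: the brood size, mutation rate, seed, and the rule of keeping the parent when no offspring improves on it are all assumptions made for illustration. Functional information is computed with the “greater than or equal to” definition, taking the number of matching characters as the degree of function.

```python
import math
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "      # 26 upper-case letters plus the space
TARGET = "METHINKS IT IS LIKE A WEASEL"       # 28 characters


def count_matches(s, target=TARGET):
    """Degree of function in this toy model: number of matching positions."""
    return sum(a == b for a, b in zip(s, target))


def fi_bits(m, length=len(TARGET), k=len(ALPHABET)):
    """FI in bits for the set of strings with m or more matching characters."""
    # Strings with exactly j matches: choose the j matching positions,
    # and give each of the other positions one of the (k - 1) wrong characters.
    count = sum(math.comb(length, j) * (k - 1) ** (length - j)
                for j in range(m, length + 1))
    return length * math.log2(k) - math.log2(count)


def mutate(s, rng, mu):
    """Replace each character, with probability mu, by a random character."""
    return "".join(rng.choice(ALPHABET) if rng.random() < mu else c for c in s)


def weasel(seed=0, offspring=100, mu=0.05, max_generations=100_000):
    """Hypothetical parameters; returns the final string and the improvements seen."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    history = [(0, count_matches(parent))]
    for generation in range(1, max_generations + 1):
        if parent == TARGET:
            break
        brood = [mutate(parent, rng, mu) for _ in range(offspring)]
        best = max(brood, key=count_matches)
        if count_matches(best) > count_matches(parent):   # keep parent otherwise
            parent = best
            history.append((generation, count_matches(parent)))
    return parent, history


final, history = weasel()
for generation, m in history:
    print(generation, m, round(fi_bits(m), 1))

# Shortest target (27-character alphabet) whose perfect match exceeds 500 bits.
print(math.ceil(500 / math.log2(len(ALPHABET))))
```

Each printed row is a generation at which the number of matches went up, together with the recalculated functional information for the new threshold.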
Corneel on May 30, 2018 at 11:21 pm said:

Joe Felsenstein: It’s also funny that nonlin.org made the argument about the nonexistence of fitness and of phenotypes over at Uncommon Descent as well.

I love reading the nonlin discussion over at UD. *snigger*

Rumraket on May 31, 2018 at 9:21 am said:

I have to accept a correction Gpuccio brings up regarding my response to Bill Cole earlier:

I see that they don’t even understand the Hayashi paper. They don’t understand what it says:

Here is the relevant quote from the paper:

“By extrapolation, we estimated that adaptive walking requires a library size of 10^70 with 35 substitutions to reach comparable fitness.”

They are mentioning the size of the starting library, not the number of trials.

That is correct and I made the mistake of using the same words as Bill Cole when I responded to him:

colewd: In the Hayashi case the wild type required 10^70 trials. How is it we are observing the wild type?

Rumraket: It evolved from another location in sequence space. Look at the graph DNA_Jock linked you. Actually I’ve modified that graph by drawing on it and writing an explanation that should clarify what we mean. Please take a look at this picture. In the original the spike which is occupied by the wild-type protein isn’t shown as that peak is much taller than the peaks found by Hayashi et al so I had to draw it on. It’s not to scale.

They’re not “trials”, they’re talking about how many variants with how many mutations would be required to sample further away in sequence space.

But I take issue with Gpuccio’s claim that:

It’s not, as they say at TSZ, a problem of “where you start”. Of course you start random, because you don’t know where the functional island is.

Of course it matters where you start, as that has significant implications for what level of function is in the immediate sequence neighborhood. The probability of finding mutations that sit on a slope that leads to levels of infectivity comparable to the wild-type is directly related to where that random initial location is.

If you start with a completely dissimilar sequence that has zero similarity to the wild type, and if that sequence happens to sit close to the base of an entirely different hill from the one occupied by the wild-type sequence, then there’s no way selection is going to be able to climb to that other WT-hill, as it will obviously just crawl up the local hill rather than go down and start randomly walking through a neutral or deleterious space. In fact, given that Gpuccio accepts DNA_Jock’s “holes” rather than hills metaphor, it should be obvious that the starting position is critical for what hole the protein will fall into. And if that local hill isn’t as tall as the WT hill, then it’s just stuck there.

For the same reasons it also makes sense that the authors suggest that finding the wild-type sequence (or one of comparable fitness) would require an enormous library with a significant fraction of the protein mutated (35 substitutions) to increase the sampling space so much that it becomes likely that a hill leading to comparable fitness to wildtype will be among those sampled.

Gpuccio: Of course. If I start for:
“Methinks it is like a weasel”
from:
“Methinks it is like e weasel”
the transition is easy enough.
I am really tired of this nonsense.

Yes, exactly. Which could very well be the explanation for how the wild-type sequence originally evolved. That, or genetic recombination which would make it possible to sample areas of sequence space much further away from whatever random starting position is chosen.

But as I have stated now like twenty times, the fact that random proteins seem to invariably sit near hills that selection can climb (or “holes” for the protein to “fall” into) implies that the infectivity function is ubiquitous in sequence space. As the authors write:
“The landscape structure has a number of implications for initial functional evolution of proteins and for molecular evolutionary engineering. First, the smooth surface of the mountainous structure from the foot to at least a relative fitness of 0.4 means that it is possible for most random or primordial sequences to evolve with relative ease up to the middle region of the fitness landscape by adaptive walking with only single substitutions. In fact, in addition to infectivity, we have succeeded in evolving esterase activity from ten arbitrarily chosen initial random sequences [17]. Thus, the primordial functional evolution of proteins may have proceeded from a population with only a small degree of sequence diversity.”

The proportion of holes that go as deep as the one occupied by the wild-type protein is actually a diversion. The question is whether evolution can “find” new functions in sequence space that can give beneficial effects to the organism. Hayashi et al 2006 is just another piece of evidence that indicates that yes it can.

dazz on May 31, 2018 at 1:27 pm said:

Rumraket: The proportion of holes that go as deep as the one occupied by the wild-type protein is actually a diversion. The question is whether evolution can “find” new functions in sequence space that can give beneficial effects to the organism. Hayashi et al 2006 is just another piece of evidence that indicates that yes it can.

Puccio goes as far as to claim that they were trying to get to the wild type, from a random starting point…

Puccio: The problem is that the researchers, correctly, started random, because that’s the purpose of the paper: to verify if they could find the wildtype starting random, with the described procedure, including RV + NS.

And they couldn’t.

For that to happen, it seems to me that the fitness landscape would need to be very much like the weasel’s, with a single peak and a more or less smooth upwards slope everywhere else. Why would that need to be the case? Sounds ridiculous.

Rumraket on May 31, 2018 at 1:34 pm said:

He’s just flat out incorrect there. It was not a stated goal of the experiment to reproduce the wild-type or even to reach one with a comparable level of infectivity.

It’s basically just an attempt to shed some light on the overall structure of the fitness landscape of the infectivity function. Which can be gathered from reading the introduction. Nowhere in the article is it stated that they are really trying to evolve the wild-type sequence.

Mung on May 31, 2018 at 3:03 pm said:

Joe Felsenstein: A Weasel-like model with a longer string could easily get to 500 bits of FI.

Mung: Starting from where?

Joe Felsenstein: From any string whatsoever.

To be pedantic, you don’t actually mean any string whatsoever, do you?

What is the FI of that string? If you don’t know that then you don’t know if you have got to 500 bits of FI.

Let’s say, for the sake of discussion, that the string “TO BE OR NOT TO BE THAT IS THE QUESTION WHETHER TIS NOBLER IN THE MIND TO SUFFER THE SLINGS AND ARROWS OF OUTRAGEOUS FORTUNE OR TO TAKE ARMS AGAINST A SEA OF TROUBLES” has 794 bits of FI.

Doesn’t the following string also have 794 bits of FI?

“TO BE OR NOT TO BE THAT IS THE QUESTION WHETHER TIS NOBLER IN THE MIND TO SUFFER THE SLINGS AND ARROWS OF OUTRAGEOUS FORTUNE OR TO FAKE ARMS AGAINST A SEA OF TROUBLES”

So what do you mean by “getting to” 500 bits of FI?

Can you give me a version of that string that has only 793 bits of FI, so that we can see how to get from 793 bits to 794 bits?

Feel free to work with shorter strings and fewer characters (perhaps using 0 and 1) in order to simplify, since it’s the principle of “getting to” some amount of FI that is being discussed, and that is independent of any actual amount of FI.

Mung on May 31, 2018 at 3:20 pm said:

Joe Felsenstein: Or do you think that somehow once it gets past 499 bits a Weasel-like program wouldn’t find the target string?

I don’t know what you mean by “getting past” 499 bits of FI. I am still of the opinion that you don’t understand FI and how it is calculated.

Give me an example string that represents a function and then tell me how many other strings also have a degree of function

Accordingly, we define “functional information,” $I(E(x))$, as a measure of system complexity. For a given system and function, $x$, and degree of function, $E(x)$, $I(E(x)) = -\log_2[F(E(x))]$, where $F(E(x))$ is the fraction of all possible configurations of the system that possess a degree of function $\geq E(x)$.

You have no problem defining a function x, but have yet to define the degree of function E(x). Until you have done so you cannot calculate the FI. You have to know (or estimate) the fraction of all possible configurations of the system that possess a degree of function > or = E(x).
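
The definition quoted above can be turned into a few lines of code once a degree of function has been assigned to every configuration. The following is only a sketch: the table `degrees` and the toy system (all 3-bit strings, with degree of function taken to be the number of 1s) are assumptions chosen so that everything can be enumerated, which is not possible for realistic sequence spaces.

```python
import math


def functional_information(degrees, threshold):
    """FI in bits: -log2 of the fraction of configurations with degree >= threshold.

    `degrees` is a hypothetical table mapping every possible configuration
    to its degree of function.
    """
    at_or_above = sum(1 for e in degrees.values() if e >= threshold)
    if at_or_above == 0:
        raise ValueError("no configuration reaches the threshold")
    # log2(total / count) equals -log2(count / total) and avoids printing -0.0
    return math.log2(len(degrees) / at_or_above)


# Toy system (an assumption): all 3-bit strings, degree of function = number of 1s.
toy = {format(i, "03b"): format(i, "03b").count("1") for i in range(8)}

for threshold in range(4):
    print(threshold, functional_information(toy, threshold))
```

The point of the sketch is that FI is a property of a threshold applied to the whole set of configurations: raising the threshold shrinks the fraction and raises the FI.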

Any such WEASEL program consists of little more than hand-waving until you lock down the numbers.

Back to the original WEASEL. “METHINKS IT IS LIKE A WEASEL.” Do all other strings the program generates have zero FI except for that one?

Mung on May 31, 2018 at 3:32 pm said:

Corneel: Anyway, I am perfectly aware that the target string is being evaluated to find the number of matches, but I don’t see why that would prevent us from calculating the functional information of the resulting strings, which is what the whole exercise is about.

I don’t see why that would prevent us from calculating the functional information of the resulting strings either. I just want to see it in practice.

Take a binary to 7-bit ASCII mapping. 01000001 = ‘A’

Only that one string has the function ‘A’ and there are no strings that have any “degree of function” greater than or equal to that one, so they all get an FI of zero.

So it’s meaningless to talk about calculating the FI of those resulting strings. We already know what it is. Any string generated by a program that does not exactly match that one sequence has an FI of zero. period.

Why am I wrong?

Mung on May 31, 2018 at 3:52 pm said:

Corneel: Joe just explained that the threshold does get adjusted every time the number of matches changes, in order to recalculate the functional information. Once the target string is reached, you have witnessed the gradual accumulation of 133 bits of functional information. If you want to see 500 bits, insert a target of at least 106 characters into your copy of WEASEL and run it.

Well I think Joe is wrong. But I also may be wrong. I like talking to both of you because you do a great job of staying away from insults and name calling, which I really appreciate. I hope we can continue to explore this.

Let’s expand a slight bit beyond my previous example with the ‘A’ and make our string match TO followed by a space. The first word of our TO BE OR NOT …

Our binary string is 010101000100111100100000. We require an exact match.

24 bits of FI?

Now if we create a program that generates strings of 24 characters comprised of 0s and 1s, how do you propose to calculate the FI of those strings so that we can see the FI “accumulate”?
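
One possible way to do that calculation, sketched under an assumption that goes beyond the exact-match requirement stated above: take the degree of function of a 24-character binary string to be the number of positions at which it matches the target. The function names are hypothetical.

```python
import math

TARGET = "010101000100111100100000"   # 'T', 'O', space as 8-bit codes, from the comment


def count_matches(candidate, target=TARGET):
    """Assumed degree of function: number of positions that match the target."""
    return sum(a == b for a, b in zip(candidate, target))


def fi_bits(m, length=len(TARGET)):
    """FI in bits for the set of binary strings with m or more matching positions."""
    # Exactly C(length, j) strings match the target in exactly j positions.
    at_or_above = sum(math.comb(length, j) for j in range(m, length + 1))
    return length - math.log2(at_or_above)


for m in (0, 12, 20, 23, 24):
    print(m, round(fi_bits(m), 3))
```

Under this assumption an exact match gives 24 bits, a string with no constraint gives 0 bits, and the values in between rise as the number of matching positions rises. If only an exact match counts as functional, there are just two levels of function and nothing in between to accumulate.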
dazz on May 31, 2018 at 4:03 pm said:

Mung: Back to the original WEASEL. “METHINKS IT IS LIKE A WEASEL.” Do all other strings the program generates have zero FI except for that one?

Of course not, since fitness is defined as the Hamming distance to “METHINKS IT IS LIKE A WEASEL.”. More similarity, more FI

Mung on May 31, 2018 at 4:16 pm said:

dazz: Of course not, since fitness is defined as the Hamming distance to “METHINKS IT IS LIKE A WEASEL.”. More similarity, more FI

No dazz, more similarity does not mean more FI. Go back and read the papers again. And what does “fitness” have to do with it? Where do they ever appeal to “fitness” in calculating FI?

Mung on May 31, 2018 at 4:24 pm said:

Here’s a new exercise:

Let’s say that there is one single string that has zero function and all others do have function. Does the string with zero function have zero FI. And how much FI do the other strings have?

So back to ASCII. 00000000 is NUL. Say it is the string with no function. All other 8 bit strings have an ASCII function. What is the FI of those strings? Do they all have different FI? Do they all have the same FI?

dazz on May 31, 2018 at 4:25 pm said:

Mung: No dazz, more similarity does not mean more FI. Go back and read the papers again.

In the weasel algo, more similarity means more fitness.

Mung: Where do they ever appeal to “fitness” in calculating FI?

Degree of function… fitness… call it what you want.

(word lawyering in 3…2…1…)

dazz on May 31, 2018 at 4:34 pm said:

Mung: Let’s say that there is one single string that has zero function and all others do have function. Does the string with zero function have zero FI. And how much FI do the other strings have?

the probability that a random sequence will have a greater degree of function than 0 in that case is 1. log2(1) = 0. So FI=0

Seems to me you’re changing the definition midway there. If we define FI for a sequence as the FI of the ensemble taking the sequence’s degree of function as the threshold, then it makes no sense to ask how much FI do the other strings have for threshold=0, since all other strings have a function greater than 0

dazz on May 31, 2018 at 4:47 pm said:

Mung @ UD: “I’m still trying to pin them down on FI and whether WEASEL can generate FI or cause FI to accumulate.”

If you were right, NOTHING would be able to do that, including design. Didn’t you claim the weasel was an example of ID anyway? Does all that mean ID can’t generate FI?

Joe Felsenstein on May 31, 2018 at 4:54 pm said:

Mung:
To be pedantic, you don’t actually mean any string whatsoever, do you?

What is the FI of that string? If you don’t know that then you don’t know if you have got to 500 bits of FI.

Based on these statements and some others you made, let me more properly qualify my statements:

1. We’re talking about strings of characters, where the alphabet is the 26 letters of the upper-case English alphabet, plus one that is a space.
2. We’re talking about strings that are all 167 characters long.
3. We’re counting matches of characters, not matches of their binary encodings in ISO/ASCII.

Those are the counterparts of what Dawkins did in the original Weasel, with a 28-character-long string.

Let’s say, for the sake of discussion, that the string “TO BE OR NOT TO BE THAT IS THE QUESTION WHETHER TIS NOBLER IN THE MIND TO SUFFER THE SLINGS AND ARROWS OF OUTRAGEOUS FORTUNE OR TO TAKE ARMS AGAINST A SEA OF TROUBLES” has 794 bits of FI.

Doesn’t the following string also have 794 bits of FI?

“TO BE OR NOT TO BE THAT IS THE QUESTION WHETHER TIS NOBLER IN THE MIND TO SUFFER THE SLINGS AND ARROWS OF OUTRAGEOUS FORTUNE OR TO FAKE ARMS AGAINST A SEA OF TROUBLES”

Well, first of all, there is a technical tangle that we need to unravel first. Szostak stated his criterion as calculating $-\log_2(P)$ , where $P$ is the fraction of all sequences that have greater than the threshold value of function. However, Hazen et al. instead defined it as the fraction that had function greater than or equal to the threshold value.

Personally, I prefer the latter. The former leads to some silliness, like nonzero amounts of FI when we consider the least functional string and, more alarmingly, infinite amounts of FI when we consider the most functional string. The latter, greater than or equal to, is what makes more sense, leading to zero FI for the least functional string (because then $P = 1$) and in the “TO BE OR NOT TO BE” case a finite value of about 794 bits for a perfect match.
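To make the difference between the two definitions concrete, here is a minimal Python sketch. It assumes that "degree of function" is simply the number of characters matching the target, for 167-character strings over a 27-letter alphabet. The function name `fi_bits` and the `strict` flag are made up for this illustration and do not come from either paper.

```python
from math import comb, inf, log2

ALPHABET = 27   # 26 upper-case letters plus the space
LENGTH = 167    # length of the "TO BE OR NOT TO BE ..." string

def fi_bits(threshold, strict, length=LENGTH, alphabet=ALPHABET):
    """FI in bits, with the number of matches standing in for degree of function.

    strict=True  counts strings with MORE than `threshold` matches.
    strict=False counts strings with AT LEAST `threshold` matches.
    """
    lowest = threshold + 1 if strict else threshold
    # Strings with exactly m matches: C(length, m) * (alphabet - 1)^(length - m)
    count = sum(comb(length, m) * (alphabet - 1) ** (length - m)
                for m in range(lowest, length + 1))
    if count == 0:
        return inf  # nothing exceeds the threshold, so P = 0
    return log2(alphabet ** length) - log2(count)

for label, strict in (("greater than", True), ("greater than or equal", False)):
    print(label, "| least functional:", fi_bits(0, strict),
          "| perfect match:", fi_bits(LENGTH, strict))
```

Under these assumptions the "greater than" version should give a tiny but nonzero value for the zero-match string and infinity for the perfect match, while the "greater than or equal" version gives exactly zero and roughly 794 bits.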

Allow me to accept the Hazen-Szostak correction to greater than or equal to. (I know I used his original formulation upthread).

In the case of the corrected definition, the string that contains the word “FAKE” instead of “TAKE” is one of $26 \times 167$ possible one-letter mismatches. So $P = (1 + (26 \times 167)) / (27^{167})$, the fraction of all strings having one or fewer mismatches. Taking $-\log_2$ of that I get 781.9817 bits.
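For anyone who wants to check that arithmetic, a short Python sketch follows. The helper name `fi_at_most_mismatches` is hypothetical; it just counts the strings with $k$ or fewer mismatches and applies the greater-than-or-equal definition.

```python
from math import comb, log2

ALPHABET = 27   # 26 letters plus space
LENGTH = 167    # characters in the target string

def fi_at_most_mismatches(k, length=LENGTH, alphabet=ALPHABET):
    """FI in bits of the set of strings with k or fewer mismatches to the target."""
    # Strings with exactly j mismatches: choose the j positions, then one of
    # (alphabet - 1) wrong letters at each of them.
    count = sum(comb(length, j) * (alphabet - 1) ** j for j in range(k + 1))
    return log2(alphabet ** length) - log2(count)

for k in range(3):
    print(k, "mismatches or fewer:", round(fi_at_most_mismatches(k), 4), "bits")
```

With `k = 0` this should give about 794.07 bits and with `k = 1` about 781.98 bits, so there is no string sitting at 793 bits in between.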

So what do you mean by “getting to” 500 bits of FI?

Can you give me a version of that string that has only 793 bits of FI, so that we can see how to get from 793 bits to 794 bits?

No, as seen above, the next smaller value is not 793 but about 782.

Feel free to work with shorter strings and fewer characters (perhaps using 0 and 1) in order to simplify, since it’s the principle of “getting to” some amount of FI that is being discussed, and that is independent of any actual amount of FI.

No, I am restricting this simplified model to having strings all of the same length. If we want to consider strings of shorter or longer length we need to define their degree of matching to the target. For these simple Weaseloid teaching models no one bothers to do that.
Mung on May 31, 2018 at 5:03 pm said:

dazz: In the weasel algo, more similarity means more fitness.

So? Your claim was that more similarity meant more FI. You also seem to think that “more fit” means more FI. Neither of those come from the papers, so where are you getting it from? It’s ok to say you’re just making it up.

call it what you want.

Let’s not. Let’s call it what they call it and calculate it the way they do.
dazz on May 31, 2018 at 5:17 pm said:

Mung: So? Your claim was that more similarity meant more FI. You also seem to think that “more fit” means more FI. Neither of those come from the papers, so where are you getting it from? It’s ok to say you’re just making it up.

Let’s not. Let’s call it what they call it and calculate it the way they do.

and … 0

Seriously, if we can’t agree that the fitness function in the weasel algo is analogous to defining degrees of function there’s not much more anyone can say or do
RoyLT on May 31, 2018 at 6:08 pm said:

vjtorley: I’m sorry, but I’m going to have to side with Origenes here. Think of the Monolith On The Moon in 2001: A Space Odyssey. The design inference comes first, and it’s a mathematical inference based on the simple fact that the likelihood of any unguided natural process generating an object like that is astronomically low. Questions about the identity or even the current existence of the designer (who might have lived and died billions of years ago) come second.

I’m in agreement with Entropy and RodW as far as their responses to you. An inference of design without any notion of who, how, or why has no explanatory power.

I am a bit young to be familiar with 2001, so I had to look it up. An interesting example, but I think that it misses the mark. Humans know how to make monoliths. While the location, material of construction, function, and dimensions of those in Clarke’s story might be different, they would still be examples of craftsmanship that we could probably understand. Also, the who, when, and why questions occur simultaneously with the discovery. I cannot personally think of a single example from Archaeology or Forensic science where those questions are secondary.

I would propose a different example from a more recent film. In ‘Interstellar’, a worm-hole is found near Saturn. The worm-hole allows instantaneous travel to different star systems. Ultimately, the conclusion is drawn that humans from the distant future placed the worm-hole as a means for humans to escape Earth.

If such an anomaly were detected in real life, how would we decide whether to draw a design inference? We have no experience of ‘building’ worm-holes and the physics of such things are theoretical in the extreme. Without clear evidence of who/what could have made it and what their motive was, I suspect that we would have no way of determining whether the anomaly was naturally occurring or designed.

We likewise have no experience with designing life, and so I don’t see how drawing an inference of design where we can know nothing about the designer, the methods used, or the motive is anything but absurd.
colewd on May 31, 2018 at 6:56 pm said:

RoyLT,

I’m in agreement with Entropy and RodW as far as their responses to you. An inference of design without any notion of who, how, or why has no explanatory power.

I see this continually repeated here and think it is an empty statement. Design is an argument from analogy and even though you cannot explain the who etc. it gives an explanation of how based on evidence of human capability.

We likewise have no experience with designing life, and so I don’t see how drawing an inference of design where we can know nothing about the designer, the methods used, or the motive is anything but absurd.

We have experience designing functional information and life is full of functional information. A design inference based on the observation of biological functional information seems obvious.
Corneel on May 31, 2018 at 7:02 pm said:

Just for the fun of it (and because it annoys gpuccio so much) I produced the mapping of the bits of functional information to the number of matches to the target string. This is for the classical 28 character string weasel, with zero bits for no matches to a little over 133 bits for the fully matching string.
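A mapping of this kind can be sketched in a few lines of Python. This is not the code that produced the plot; it is a reconstruction under the assumptions that the alphabet has 27 characters, the string has 28, and FI uses the "at least this many matches" definition. The name `fi_by_matches` is made up here.

```python
from math import comb, log2

def fi_by_matches(length=28, alphabet=27):
    """Return (matches, FI in bits) for every possible number of matches."""
    total = alphabet ** length
    table = []
    tail = 0  # strings with at least m matches, accumulated from m = length down
    for m in range(length, -1, -1):
        tail += comb(length, m) * (alphabet - 1) ** (length - m)
        table.append((m, log2(total) - log2(tail)))
    return table[::-1]  # ordered from 0 matches up to a full match

for m, bits in fi_by_matches():
    print(f"{m:2d} matches: {bits:8.3f} bits")
```

The first row should be 0 bits for zero matches and the last row a little over 133 bits for all 28 matches.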
Corneel on May 31, 2018 at 7:04 pm said:

And here is the accumulation of FI in a typical run of the weasel program. This run took 123 generations to reach the target.
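For readers who want to reproduce something like this, here is a minimal sketch in Python. The brood size, mutation rate, and seed are my own assumptions, not the settings of the run above, and unlike some Weasel variants this one keeps the parent in the pool so that FI never decreases.

```python
import random
from math import comb, log2

TARGET = "METHINKS IT IS LIKE A WEASEL"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def matches(s):
    """Number of positions at which s agrees with the target."""
    return sum(a == b for a, b in zip(s, TARGET))

def fi_bits(m, length=len(TARGET), alphabet=len(LETTERS)):
    """FI in bits of having at least m matches."""
    tail = sum(comb(length, j) * (alphabet - 1) ** (length - j)
               for j in range(m, length + 1))
    return log2(alphabet ** length) - log2(tail)

def run_weasel(offspring=100, mutation_rate=0.05, seed=1, max_generations=100_000):
    """Return the FI of the best string in each generation (assumed parameters)."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(LETTERS) for _ in TARGET)
    history = [fi_bits(matches(parent))]
    while matches(parent) < len(TARGET) and len(history) <= max_generations:
        brood = ["".join(rng.choice(LETTERS) if rng.random() < mutation_rate else c
                         for c in parent)
                 for _ in range(offspring)]
        # Keep the parent as a candidate so the best match count cannot drop.
        parent = max(brood + [parent], key=matches)
        history.append(fi_bits(matches(parent)))
    return history

history = run_weasel()
print(len(history) - 1, "generations; final FI:", round(history[-1], 3), "bits")
```

The number of generations depends on the seed and parameters, so it will not in general match the 123 generations of the run shown.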
colewd on May 31, 2018 at 7:27 pm said:

Corneel,

And here is the accumulation of FI in a typical run of the weasel program. This run took 123 generations to reach the target.

So this shows the process of an inefficient algorithm copying over several steps existing information:-)
Corneel on May 31, 2018 at 7:41 pm said:

Mung: Any string generated by a program that does not exactly match that one sequence has an FI of zero. period.

Why am I wrong?

You are not. If the function is “encode an A”, that is precisely correct.

Mung: I like talking to both of you because you do a great job of staying away from insults and name calling, which I really appreciate.

You are easily impressed, aren’t you? 🙂

Mung: Our binary string is 010101000100111100100000. We require an exact match.

24 bits of FI?

That’s an easy one.

Mung: Now if we create a program that generates strings of 24 characters comprised of 0s and 1s, how do you propose to calculate the FI of those strings so that we can see the FI “accumulate”?

What is it supposed to do? Is it a weasel again? Then have the binary string translated to a regular character “phenotype” and count the number of matches. The rest proceeds exactly as described previously. You can even feed it into a regular weasel to see whether it evolves to a higher level of FI. Since it looks like the fitness landscape is a little wonky, I have no idea whether it is going to find the target. Perhaps you can team up with Bill to work on your “impossible fitness landscape” projects together.
RoyLT on May 31, 2018 at 7:45 pm said:

colewd: Design is an argument from analogy and even though you cannot explain the who etc. it gives an explanation of how based on evidence of human capability.

It does not give an explanation of ‘how’. Functional Information is the ‘what’ that you are claiming to identify. The inference contains no hypothesis of ‘who’ designed it, ‘how’ the Functional Information got there, or ‘why’ the designer(s) did it at all.
Corneel on May 31, 2018 at 7:46 pm said:

colewd: So this shows the process of an inefficient algorithm copying over several steps existing information:-)

Since we are repeatedly told that mutation and selection are “blind and mindless” processes, I assume they are incapable of reading the string in my code. That takes INTELLIGENCE, you know.
Mung on May 31, 2018 at 8:00 pm said:

dazz: Seriously, if we can’t agree that the fitness function in the weasel algo is analogous to defining degrees of function there’s not much more anyone can say or do

You need to go back and read what I’ve written about it and take time to try to actually comprehend what I wrote.
Corneel on May 31, 2018 at 8:00 pm said:

colewd: So this shows the process of an inefficient algorithm copying over several steps existing information:-)

All joking aside. I could easily add a few lines of code, telling the program that the string “METHINKS IT IS LIKE A WEASEL” has NO matches whatsoever to the target string. Do you think that the algorithm would suddenly be incapable of reaching any of the 728 1-letter mismatches that now are top of the hill? How do you reckon it does that without me telling it explicitly which strings belong to the target set?
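A rough Python sketch of that thought experiment, with assumed parameters (brood size, mutation rate, seed) and made-up function names, might look like this. The scoring function treats the exact target as having zero matches and scores every other string by its real match count.

```python
import random
from math import log2

TARGET = "METHINKS IT IS LIKE A WEASEL"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def true_matches(s):
    """Number of positions at which s agrees with the target."""
    return sum(a == b for a, b in zip(s, TARGET))

def perverse_score(s):
    # Assumption for illustration: the full target is declared to have NO matches.
    return 0 if s == TARGET else true_matches(s)

def climb(offspring=100, mutation_rate=0.05, seed=2, max_generations=100_000):
    """Hill-climb on perverse_score; stop at any one-letter mismatch."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(LETTERS) for _ in TARGET)
    for _ in range(max_generations):
        if perverse_score(parent) == len(TARGET) - 1:
            break
        brood = ["".join(rng.choice(LETTERS) if rng.random() < mutation_rate else c
                         for c in parent)
                 for _ in range(offspring)]
        parent = max(brood + [parent], key=perverse_score)
    return parent

best = climb()
top_of_hill = (len(LETTERS) - 1) * len(TARGET)   # 26 * 28 = 728 strings
fi_top = len(TARGET) * log2(len(LETTERS)) - log2(top_of_hill)
print(best, "| true matches:", true_matches(best), "| FI of the new top:", round(fi_top, 2))
```

Under these assumptions the climb should end on one of the 728 one-letter mismatches, which together carry roughly 123.6 bits of FI for the modified function.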

gpuccio’s unthinking dismissal of the weasel is really a foolish thing to do.
Mung on May 31, 2018 at 8:01 pm said:

Corneel: That takes INTELLIGENCE, you know.

Anyone else’s code, no. But your code, yes.

😉
colewd on May 31, 2018 at 8:03 pm said:

Corneel,

Since we are repeatedly told that mutation and selection are “blind and mindless” processes, I assume they are incapable of reading the string in my code. That takes INTELLIGENCE, you know.

Blind and unguided are an evolutionist claim. Don’t give this treasure up so easily:-)
Mung on May 31, 2018 at 8:06 pm said:

Corneel: Perhaps you can team up with Bill to work on your “impossible fitness landscape” projects together.

Not sure what an impossible fitness landscape is. So I’ll leave that to Bill.
Mung on May 31, 2018 at 8:13 pm said:

Corneel, how are you calculating the FI for each string/step/whatever? I keep asking and I keep not getting any answer, lol.

So you have some subset of the strings that are below the function threshold. Which strings are those? Above that threshold are strings which have “some given degree of function.” Which strings are those?

see this
colewd on May 31, 2018 at 8:16 pm said:

Corneel,

Do you think that the algorithm would suddenly be incapable of reaching any of the 728 1-letter mismatches that now are top of the hill? How do you reckon it does that without me telling it explicitly which strings belong to the target set?

What do you think this demonstrates?

Weasel shows you can copy existing information through a search algorithm.

The fact that, in order to model this “cumulative selection”, you need to have knowledge of the sequence is a very telling demonstration of the challenges that sequences face when there is any random change involved.

Copying existing information is very different from generating new information.
colewd on May 31, 2018 at 8:24 pm said:

RoyLT,

It does not give an explanation of ‘how’. Functional Information is the ‘what’ that you are claiming to identify. The inference contains no hypothesis of ‘who’ designed it, ‘how’ the Functional Information got there, or ‘why’ the designer(s) did it at all.

Most of this is true. It does however give us an inference of what we are observing and what general mechanism caused it.

If you think about the cell in these terms it starts to make a lot more sense.

A designed system will operate according to certain principles. When you understand these principles you can then predict how unknown parts of the system will behave.
dazz on May 31, 2018 at 8:25 pm said:

colewd: The fact is in order to model this “cumulative selection” you need to have knowledge of the sequence

Demonstrably false

Math Genome Fun 2, UPB and beyond?
Tom English on May 31, 2018 at 8:35 pm said:

Corneel: Anyway, I am perfectly aware that the target string is being evaluated to find the number of matches, but I don’t see why that would prevent us from calculating the functional information of the resulting strings, which is what the whole exercise is about.

It’s crucial not to mistake the internal workings of a computational model for the workings of the modeled system. The simulator is not the simuland.

Perhaps I have missed your statement of the model that is implemented by your program. (I haven’t read the preceding page of comments.) You cannot simply hack a Weasel program, and then say that the model is whatever the hacked program does.
Tom English on May 31, 2018 at 8:42 pm said:

Corneel: All joking aside. I could easily add a few lines of code, telling the program that the string “METHINKS IT IS LIKE A WEASEL” has NO matches whatsoever to the target string. Do you think that the algorithm would suddenly be incapable of reaching any of the 728 1-letter mismatches that now are top of the hill? How do you reckon it does that without me telling it explicitly which strings belong to the target set?

I just happened to see this next. It’s a grave error to focus on the algorithm instead of the modeled system.
Rumraket on May 31, 2018 at 8:43 pm said:

colewd: Copying existing information is very different from generating new information.

What makes new information “new”?
Mung on May 31, 2018 at 8:48 pm said:

Rumraket: What makes new information “new”?

The smell. Haven’t you ever experienced that “new information” smell?
Tom English on May 31, 2018 at 8:58 pm said:

colewd: What do you think this demonstrates?

Weasel shows you can copy existing information through a search algorithm.

The fact that, in order to model this “cumulative selection”, you need to have knowledge of the sequence is a very telling demonstration of the challenges that sequences face when there is any random change involved.

Copying existing information is very different from generating new information.

And there you have an immediate illustration that, as I have said,

Tom English: It’s a grave error to focus on the algorithm instead of the modeled system.
Corneel on May 31, 2018 at 9:09 pm said:

Tom English: Perhaps I have missed your statement of the model that is implemented by your program. (I haven’t read the preceding page of comments.) You cannot simply hack a Weasel program, and then say that the model is whatever the hacked program does.

Not sure what you are trying to tell me. As far as I am concerned, the weasel is an instructional tool to demonstrate the working of cumulative selection, as opposed to “tornado in a junkyard” style random sampling. It does not properly model natural selection, as the algorithm simply picks a single best matching string every generation, instead of having differential survival and reproduction among individuals within a population. This I am aware of, if that is your concern.

Tom English: I just happened to see this next. It’s a grave error to focus on the algorithm instead of the modeled system.

Am I being corrected, or off the hook now? The fitness landscape is the thing we are trying to model, right?
Corneel on May 31, 2018 at 9:13 pm said:

Mung: Corneel, how are you calculating the FI for each string/step/whatever? I keep asking and I keep not getting any answer, lol.

Sorry about that. I just followed Joe’s handy recipe and performed the calculations in the R statistical package. R has a function that gives the tail probabilities for binomial distributions.
Mung on May 31, 2018 at 9:17 pm said:

dazz: the probability that a random sequence will have a greater degree of function than 0 in that case is 1. log2(1) = 0. So FI=0

No dazz, that is incorrect. Go back to the papers and learn how to calculate FI.

Also, there is a non-zero probability that a string with all zeros would be randomly generated.

Try again.
RoyLT on May 31, 2018 at 9:27 pm said:

colewd: Most of this is true. It does however give us an inference of what we are observing and what general mechanism caused it.

Then pick your favorite design candidate and state what general mechanism you think caused it. I can imagine the process of designing a concept for a very small outboard-motor. What I cannot imagine is how I would go about assembling and installing said motor without leaving any forensic evidence behind.
dazz on May 31, 2018 at 9:41 pm said:

Mung: No dazz, that is incorrect. Go back to the papers and learn how to calculate FI.

http://molbio.mgh.harvard.edu/szostakweb/publications/Szostak_pdfs/Szostak_2003.pdf

“By analogy with classical information, functional information is simply -log2 of the probability that a random sequence will encode a molecule with greater than any given degree of function”

Mung:
Also, there is a non-zero probability that a string with all zeros would be randomly generated.

Try again.

Nitpicking much?
Mung on May 31, 2018 at 9:54 pm said:

dazz: Seems to me you’re changing the definition midway there. If we define FI for a sequence as the FI of the ensemble taking the sequence’s degree of function as the threshold, then it makes no sense to ask how much FI do the other strings have for threshold=0, since all other strings have a function greater than 0

I am not redefining anything. I am providing an alternative way to look at things. The way to calculate FI has not changed.

We can define a “target” string as the only member of the ensemble of strings that meet the degree of function threshold. We can calculate the FI for that ensemble.

We can define a string as the only string that is not a member of the ensemble and calculate the FI for the ensemble of all strings that exclude that string. All those other strings would meet the minimum threshold condition. Only one string would not.

Case 1: Only one string meets the minimum threshold, all others do not. We can calculate the FI.

Case 2: Only one string does not meet the minimum threshold, all others do. We can calculate the FI.

The way to calculate the FI hasn’t changed one whit.
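To put numbers on the two cases, here is a small Python sketch. It assumes the ensemble is the set of all 24-character binary strings mentioned earlier in the thread, and the helper name `fi_from_counts` is invented for the example.

```python
from math import log2

def fi_from_counts(n_meeting_threshold, n_total):
    """FI in bits when n_meeting_threshold of n_total strings meet the threshold."""
    return -log2(n_meeting_threshold / n_total)

N = 2 ** 24  # all binary strings of length 24 (an assumed example size)

case1 = fi_from_counts(1, N)       # only one string meets the threshold
case2 = fi_from_counts(N - 1, N)   # every string but one meets the threshold

print("Case 1:", case1, "bits")
print("Case 2:", case2, "bits")
```

On this reading Case 1 gives 24 bits, while Case 2 gives a positive value so small (on the order of $10^{-7}$ bits) that it is nearly zero.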

Anyone not named dazz who is not getting this?
Mung on May 31, 2018 at 10:00 pm said:

Corneel: I just followed Joe’s handy recipe and performed the calculations in the R statistical package.

I bought The Book of R yesterday. I thought it was about pirates.
Mung on May 31, 2018 at 10:04 pm said:

Show of hands.

How many people think that the FI of strings that are below the degree of function threshold is zero?

How many people think that the question, what is the FI of strings that are below the degree of function threshold, is a nonsense question? The FI of those strings is undefined.