Does gpuccio’s argument that 500 bits of Functional Information implies Design work?

Posted on May 21, 2018 by Joe Felsenstein

On Uncommon Descent, poster gpuccio has been discussing “functional information”. Most of gpuccio’s argument is a conventional “islands of function” argument. Not being very knowledgeable about biochemistry, I’ll happily leave that argument to others.

But I have been intrigued by gpuccio’s use of Functional Information, in particular gpuccio’s assertion that observing 500 bits of it is a reliable indicator of Design, as here, at about the 11th sentence of point (a):

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information) for an explicitly defined function (whatever it is) we can safely infer design.

I wonder how this general method works. As far as I can see, it doesn’t work. There seem to be three possible ways of arguing for it, and in the end, two don’t work and one is just plain silly. Which of these is the basis for gpuccio’s statement? Let’s investigate …

A quick summary

Let me list the three ways, briefly.

(1) The first is the argument using William Dembski’s (2002) Law of Conservation of Complex Specified Information. I have argued (2007) that this is formulated in such a way as to compare apples to oranges, and thus is not able to reject normal evolutionary processes as explanations for the “complex” functional information. In any case, I see little sign that gpuccio is using the LCCSI.

(2) The second is the argument that the functional information indicates that only an extremely small fraction of genotypes have the desired function, and the rest are all alike in totally lacking any of this function. This would prevent natural selection from following any path of increasing fitness to the function, and the rareness of the genotypes that have nonzero function would prevent mutational processes from finding them. This is, as far as I can tell, gpuccio’s islands-of-function argument. If such cases can be found, then explaining them by natural evolutionary processes would indeed be difficult. That is gpuccio’s main argument, and I leave it to others to argue with its application in the cases where gpuccio uses it. I am concerned here, not with the islands-of-function argument itself, but with whether the design inference from 500 bits of functional information is generally valid.

We are asking here whether, in general, observing more than 500 bits of functional information is “a reliable indicator of design”. And gpuccio’s definition of functional information is not confined to cases of islands of function, but also includes cases where there is a path along which function increases. In such cases, seeing 500 bits of functional information does not allow us to conclude that it is extremely unlikely to have arisen by normal evolutionary processes. So the general rule that gpuccio gives fails: it is not reliable.
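A toy case shows why graded function matters. The Python sketch below is a hypothetical illustration, not a model of any protein, and its parameters (a random 110-character target over Dawkins’ 27-symbol Weasel alphabet, one mutation per step, degree of function equal to the number of matching characters) are assumptions. At the top threshold only the exact target qualifies, so it carries 110 × log2 27, or about 523 bits, of Szostak-style functional information, yet a simple mutate-and-select loop reaches it.

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase + " "   # 27 symbols, as in Dawkins' Weasel


def hill_climb(target: str, rng: random.Random, max_steps: int = 1_000_000):
    """Mutate one random position per step; keep the mutant unless it has
    fewer matches to the target (the assumed degree of function)."""
    current = [rng.choice(ALPHABET) for _ in target]
    score = sum(a == b for a, b in zip(current, target))
    steps = 0
    while score < len(target) and steps < max_steps:
        steps += 1
        i = rng.randrange(len(target))
        old = current[i]
        current[i] = rng.choice(ALPHABET)
        new_score = score - (old == target[i]) + (current[i] == target[i])
        if new_score >= score:
            score = new_score          # neutral or better: keep it
        else:
            current[i] = old           # worse: selection rejects it
    return "".join(current), steps


def fi_of_exact_target(length: int) -> float:
    """FI when only the exact target meets the top threshold: log2(27**length)."""
    return length * math.log2(len(ALPHABET))


rng = random.Random(2018)
target = "".join(rng.choice(ALPHABET) for _ in range(110))
found, steps = hill_climb(target, rng)
print(found == target, steps, round(fi_of_exact_target(len(target)), 1))
```

The loop never needs to “see” 523 bits at once; it only needs each step along the path to be no worse than the last, which is exactly the situation that an islands-of-function argument has to rule out.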

(3) The third possibility is an additional condition that is added to the design inference. It simply declares that unless the set of genotypes is effectively unreachable by normal evolutionary processes, we don’t call the pattern “complex functional information” (CFI). It does not simply define CFI as a case where we can define a level of function that makes the probability of the set less than $2^{-500}$. That additional condition allows us to safely conclude that normal evolutionary forces can be dismissed, by definition. But it leaves the reader to do the heavy lifting, as the reader has to determine that the set of genotypes has an extremely low probability of being reached. And once they have done that, they will find that the additional step of concluding that the genotypes have CFI adds nothing to our knowledge. CFI becomes a useless add-on that sounds deep and mysterious but actually tells you nothing except what you already know. And there seems to be some indication that gpuccio does use this additional condition.

Let us go over these three possibilities in some detail. First, what is the connection of gpuccio’s “functional information” to Jack Szostak’s quantity of the same name?

Is gpuccio’s Functional Information the same as Szostak’s Functional Information?

gpuccio acknowledges that gpuccio’s definition of Functional Information is closely connected to Jack Szostak’s definition of it. gpuccio notes here:

Please, not[e] the definition of functional information as:

“the fraction of all possible configurations of the system that possess a degree of function ≥ Ex.”

which is identical to my definition, in particular my definition of functional information as the upper tail of the observed function, that was so much criticized by DNA_Jock.

(I have corrected gpuccio’s typo of “not” to “note”, JF)
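To make the quoted definition concrete, here is a brute-force sketch in Python of Szostak’s fraction on a system small enough to enumerate. The toy system is a hypothetical assumption, not anything gpuccio or Szostak used: all binary strings of length 10, with the degree of function taken to be the number of 1s.

```python
import math
from itertools import product
from typing import Callable, Sequence


def functional_information(configs: Sequence[Sequence[int]],
                           degree: Callable[[Sequence[int]], float],
                           threshold: float) -> float:
    """Szostak-style FI: -log2 of the fraction of all configurations whose
    degree of function is at least `threshold`."""
    m = sum(1 for c in configs if degree(c) >= threshold)
    if m == 0:
        raise ValueError("no configuration reaches the threshold")
    # log2(N) - log2(M) equals -log2(M/N) but gives exactly 0.0 when M == N
    return math.log2(len(configs)) - math.log2(m)


# Hypothetical toy system: all 2**10 binary strings, with the degree of
# function taken to be the number of 1s, so function is graded.
space = list(product((0, 1), repeat=10))

for t in (0, 5, 10):
    print(t, round(functional_information(space, sum, t), 4))
```

A threshold of 10 gives 10 bits and a threshold of 0 gives 0 bits, even though nearly every string has some function: nothing in the definition requires sequences below the threshold to have zero function.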

We shall see later that there may be some ways in which gpuccio’s definition is modified from Szostak’s. Jack Szostak and his co-authors never attempted to use their definition to infer Design. Nor did Leslie Orgel, whose Specified Information (in his 1973 book The Origins of Life) preceded Szostak’s. So the part about design inference must come from somewhere else.

gpuccio seems to be making one of three possible arguments:

Possibility #1 That there is some mathematical theorem that proves that ordinary evolutionary processes cannot result in an adaptation that has 500 bits of Functional Information.

Use of such a theorem was attempted by William Dembski with his Law of Conservation of Complex Specified Information, explained in Dembski’s book No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2001). But Dembski’s LCCSI theorem did not do what Dembski needed it to do. I have explained why in my own article on Dembski’s arguments (here). Dembski’s LCCSI used different specifications before and after the evolutionary processes act, and so he was comparing apples to oranges.

In any case, as far as I can see gpuccio has not attempted to derive gpuccio’s argument from Dembski’s, and gpuccio has not directly invoked the LCCSI, or provided a theorem to replace it. gpuccio said in a response to a comment of mine at TSZ,

Look, I will not enter the specifics of your criticism to Dembski. I agre with Dembski in most things, but not in all, and my arguments are however more focused on empirical science and in particular biology.

While thus disclaiming that the argument is Dembski’s, on the other hand gpuccio does associate the argument with Dembski here by saying that

Of course, Dembski, Abel, Durston and many others are the absolute references for any discussion about functional information. I think and hope that my ideas are absolutely derived from theirs. My only purpose is to detail some aspects of the problem.

and by saying elsewhere that

No generation of more than 500 bits has ever been observed to arise in a non design system (as you know, this is the fundamental idea in ID).

As that figure is Dembski’s, it remains unclear whether gpuccio is or is not basing the argument on Dembski’s. But gpuccio does not directly invoke the LCCSI, or try to come up with some mathematical theorem that replaces it.
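The 500-bit figure is usually traced to Dembski’s “universal probability bound” of 1 in 10^150. Assuming that is its origin, the conversion between that probability and bits is quick to check. A minimal Python sketch; the function names are illustrative, not from Dembski or gpuccio:

```python
import math

# Assumption: the 500-bit figure comes from Dembski's universal probability
# bound of 1 in 10**150, expressed in bits and rounded.
UPB = 10.0 ** -150


def probability_to_bits(p: float) -> float:
    """Surprisal in bits: -log2(p)."""
    return -math.log2(p)


def bits_to_probability(bits: float) -> float:
    """Inverse conversion: the probability corresponding to `bits` bits."""
    return 2.0 ** -bits


upb_bits = probability_to_bits(UPB)
p_500 = bits_to_probability(500)

print(f"10^-150 corresponds to {upb_bits:.1f} bits")
print(f"2^-500 is about {p_500:.3g}")
```

So 10^-150 is about 498 bits, and a 500-bit threshold corresponds to a probability of roughly 3 × 10^-151.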

So possibility #1 can be safely ruled out.

Possibility #2. That the target region in the computation of Functional Information consists of all of the sequences that have nonzero function, while all other sequences have zero function. As there is no function elsewhere, natural selection for this function then cannot favor sequences closer and closer to the target region.

Such cases are possible, and usually gpuccio is talking about cases like this. But gpuccio does not require them in order to have Functional Information. gpuccio does not rule out that the region could be defined by a high level of function, with lower levels of function in sequences outside of the region, so that there could be paths allowing evolution to reach the target region of sequences.

An example in which gpuccio recognizes that lower levels of function can exist outside the target region is found here, where gpuccio is discussing natural and artificial selection:

Then you can ask: why have I spent a lot of time discussing how NS (and AS) can in some cases add some functional information to a sequence (see my posts #284, #285 and #287)

There is a very good reason for that, IMO.

I am arguing that:

1) It is possible for NS to add some functional information to a sequence, in a few very specific cases, but:

2) Those cases are extremely rare exceptions, with very specific features, and:

3) If we understand well what are the feature that allow, in those exceptional cases, those limited “successes” of NS, we can easily demonstrate that:

4) Because of those same features that allow the intervention of NS, those scenarios can never, never be steps to complex functional information.

Jack Szostak defined functional information by having us choose a cutoff level of function, which defines the set of sequences whose function is at or above that level, without any condition that the other sequences have zero function. Durston did not impose such a condition either. And as we’ve seen, gpuccio associates his argument with theirs.

So this second possibility could not be the source of gpuccio’s general assertion about 500 bits of functional information being a reliable indicator of design, however much gpuccio concentrates on such cases.

Possibility #3. That there is an additional condition in gpuccio’s Functional Information, one that does not allow us to declare it to be present if there is a way for evolutionary processes to achieve that high a level of function. In short, if we see 500 bits of Szostak’s functional information, and if it can be put into the genome by natural evolutionary processes such as natural selection, then for that reason we declare that it is not really Functional Information. If gpuccio is doing this, then gpuccio’s Functional Information is really a very different animal than Szostak’s functional information.

Is gpuccio doing that? gpuccio does associate his argument with William Dembski’s, at least in some of his statements. And William Dembski has defined his Complex Specified Information in this way, adding the condition that it is not really CSI unless it is sufficiently improbable that it be achieved by natural evolutionary forces (see my discussion of this here in the section on “Dembski’s revised CSI argument”, which refers to Dembski’s statements here). And Dembski’s added condition renders use of his CSI a useless afterthought to the design inference.

gpuccio does seem to be making a similar condition. Dembski’s added condition comes in via the calculation of the “probability” of each genotype. In Szostak’s definition, the probabilities of sequences are simply their frequencies among all possible sequences, with each being counted equally. In Dembski’s CSI calculation, we are instead supposed to compute the probability of the sequence given all evolutionary processes, including natural selection.
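The two notions of “probability” can differ enormously. In this hypothetical Python sketch (the 20-bit toy genome, 400 rounds per run, and 200 runs are assumptions, not anything from Szostak, Dembski, or gpuccio), the degree of function is the number of 1s. Counting every sequence equally, as in Szostak’s definition, the all-ones genome has probability 2^-20, about one in a million; a Monte Carlo estimate of the chance that mutation plus selection reaches it comes out close to 1.

```python
import random

N_BITS = 20      # hypothetical toy genome length
STEPS = 400      # hypothetical mutation/selection rounds per run
RUNS = 200       # number of simulated runs


def uniform_probability(n_bits: int) -> float:
    """Szostak-style: each sequence counted equally, and only the all-ones
    sequence reaches the top level of function."""
    return 2.0 ** -n_bits


def process_probability(rng: random.Random) -> float:
    """Monte Carlo estimate of the chance that mutation plus selection
    reaches all-ones, with degree of function = number of 1s."""
    hits = 0
    for _ in range(RUNS):
        genome = [rng.randint(0, 1) for _ in range(N_BITS)]
        for _ in range(STEPS):
            i = rng.randrange(N_BITS)
            # A mutation flips bit i. A 0 -> 1 flip raises function and is
            # kept; a 1 -> 0 flip lowers it and is rejected by selection.
            if genome[i] == 0:
                genome[i] = 1
        hits += all(genome)
    return hits / RUNS


print(uniform_probability(N_BITS), process_probability(random.Random(1)))
```

Which of these numbers one plugs into the calculation is the whole question: the first is a property of sequence space, the second depends on the evolutionary process assumed.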

gpuccio has a similar condition in the requirements for concluding that complex functional information is present. We can see it at step (6) here:

If our conclusion is yes, we must still do one thing. We observe carefully the object and what we know of the system, and we ask if there is any known and credible algorithmic explanation of the sequence in that system. Usually, that is easily done by excluding regularity, which is easily done for functional specification. However, as in the particular case of functional proteins a special algorithm has been proposed, neo darwininism, which is intended to explain non regular functional sequences by a mix of chance and regularity, for this special case we must show that such an explanation is not credible, and that it is not supported by facts. That is a part which I have not yet discussed in detail here. The necessity part of the algorithm (NS) is not analyzed by dFSCI alone, but by other approaches and considerations. dFSCI is essential to evaluate the random part of the algorithm (RV). However, the short conclusion is that neo darwinism is not a known and credible algorithm which can explain the origin of even one protein superfamily. It is neither known nor credible. And I am not aware of any other algorithm ever proposed to explain (without design) the origin of functional, non regular sequences.

In other words, you, the user of the concept, are on your own. You have to rule out that natural selection (and other evolutionary processes) could reach the target sequences. And once you have ruled it out, you have no real need for the declaration that complex functional information is present.

I have gone on long enough. I conclude that the rule that observing 500 bits of functional information allows us to conclude in favor of Design (or at any rate, to rule out normal evolutionary processes as the source of the adaptation) is simply nonexistent. Or if it does exist, it is a useless add-on to an argument that draws that conclusion for some other reason, leaving the really hard work to the user.

Let’s end by asking gpuccio some questions:
1. Is your “functional information” the same as Szostak’s?
2. Or does it add the requirement that there be no function in sequences that are outside of the target set?
3. Does it also require us to compute the probability that the sequence arises as a result of normal evolutionary processes?

1,971 thoughts on “Does gpuccio’s argument that 500 bits of Functional Information implies Design work?”

dazz on June 1, 2018 at 5:11 pm said:

Mung: Why don’t people get this?

Nobody is saying that the calculation at every step in the weasel gives the information needed to specify each particular sequence found by the algo. Perhaps “the level of FI achieved by the process at each step, based on the degree of function” is more like it?
Mung on June 1, 2018 at 5:30 pm said:

dazz: so it’s *poof*!, 500 bits of protein out of the blue

Yes, that’s what the evidence shows. Deal with it.
Mung on June 1, 2018 at 5:38 pm said:

dazz: Perhaps “the level of FI achieved by the process at each step, based on the degree of function” is more like it?

You have to calculate a proportion dazz. DNA-Jock gets it:

DNA_Jock: the Hazen/Szostak definition of FI, which is (minus log2, whatever) of the proportion of the sequences that have function level X or above.

That gives you a number. It’s a property of the system.

It only becomes a plot if you change something. I’m waiting to hear how the degree of function is changing in WEASEL.
Rumraket on June 1, 2018 at 5:38 pm said:

DNA_Jock: He defends his position by asserting that steps do not exist.

And he asserts steps do not exist because protein X with function Y has been conserved at some level over the history of life.

It’s ad-hoc rulemaking and excuses all the way down.
dazz on June 1, 2018 at 5:48 pm said:

Mung: You have to calculate a proportion dazz. DNA-Jock gets it:

And the proportion is the number of sequences with function at or above that of the current one, over the size of sequence space. Why is that so hard? That’s most definitely NOT the same as the information required to specify a particular sequence.

Mung: That gives you a number. It’s a property of the system.

Yeah, it’s a property of the ensemble: because of the particular distribution of function levels, it takes N bits of FI to reach degree of function N’. So what?

Mung: It only becomes a plot if you change something. I’m waiting to hear how the degree of function is changing in WEASEL.

The degree of function increases because of selection, obviously
colewd on June 1, 2018 at 5:49 pm said:

DNA_Jock,

Wrong. gpuccio states that it need only be an absolute value of 500. He defends his position by asserting that steps do not exist.

If steps exist then over time these steps become visible from AA substitutions. These changes are not part of his 500 bit calculation.
Rumraket on June 1, 2018 at 5:50 pm said:

Mung: I’m waiting to hear how the degree of function is changing in WEASEL.

Well the function of the sentence from Shakespeare is to convey the meaning expressed by the original sentence, right?
The more like the original sequence, the more likely it is to correctly convey the same meaning.

Hazen and Szostak give a relevant example where a string is intended to instigate a response by the fire department. Some strings would be better at it than others, by being more easily interpreted, and that is the string’s degree of function.

“Functional information is determined by identifying the fraction of all sequences that achieve a specified outcome.

Consider, for example, sequences of 10 letters that have a high probability (Ex ≅ 1) of evoking a positive response from the fire department. Such sequences might include “FIREONMAIN,” “MAINSTFIRE,” or “MAPLENMAIN.” Additionally, some messages containing phonetic misspellings (FYRE or MANE), mistakes in grammar or usage (FIREOFMAIN), or typing errors (MAZLE or NAPLE) may also yield a significant but lower probability of response (0 ≪ Ex < 1). Given these variants, on the order of 1,000 combinations of 10 letters might initiate a rapid response to the approximate location of the fire. Thus,

I(1) ≈ -log2[1000/26^10] ≈ 36 bits

Numerous additional 10-letter sequences convey some relevant information but would result in a lower probability of response (0 < Ex < 1): “FIREHELPME,” “DANGERFIRE,” or “BURNINGNOW.” A lower degree of function, Ex, will generally correspond to a larger number of effective letter sequences, M(Ex).”

There are probably hundreds of variants, if not thousands, of the METHINKS… sentence that are capable of conveying the same meaning as the full correct sentence.
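The arithmetic in the quoted fire-department example can be checked directly. A minimal Python sketch using only the two numbers given in the quotation:

```python
import math

# Figures from the quoted fire-department example: roughly 1,000 functional
# 10-letter messages out of 26**10 possible ones.
m_functional = 1_000
n_possible = 26 ** 10

fi_bits = -math.log2(m_functional / n_possible)
print(f"{fi_bits:.2f} bits")
```

Evaluated directly, the expression comes to just over 37 bits, so the quoted “≈ 36 bits” is best read as a rough figure.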
dazz on June 1, 2018 at 5:55 pm said:

Mung: Yes, that’s what the evidence shows

Not really

Mung: Deal with it.

No need to

I asked this question a while back. How come, after 400 million years of non-functionally-constrained engineering, the designer just happened to settle for a sequence with significant similarity to its precursor’s? Why would it take the designer so long to get to the target if he knew the target from the get go? Why do non functional / drifting sequences always evolve at the predicted drift pace with a fixation rate equal to the mutation rate instead of what one would expect from an engineer building a targeted sequence?
Mung on June 1, 2018 at 6:27 pm said:

Rumraket: It’s ad-hoc rulemaking and excuses all the way down.

He’s a good Darwinist.
Mung on June 1, 2018 at 6:38 pm said:

dazz: The degree of function increases because of selection, obviously

It’s the threshold that matters dazz. Go read the paper. Sequences are above it or not. The degree of function just tells us whether it is above or below.

The FI is calculated based on the threshold.

In WEASEL the minimum degree of function is a single letter matching the target. So all strings with at least one letter match are in the ensemble. How many are there and what proportion of all strings does that represent? That’s the FI.

Selection has nothing to do with it. Increasing degree of function has nothing to do with it.
Mung on June 1, 2018 at 6:56 pm said:

dazz: I asked this question a while back. How come, after 400 million years of non-functionally-constrained engineering, the designer just happened to settle for a sequence with significant similarity to its precursor’s?

No one but you knows what you mean by “significant similarity.” Certainly gpuccio doesn’t know what you mean by it. gpuccio is always careful to show the absence of similarity because his reasoning depends on that.
dazz on June 1, 2018 at 6:57 pm said:

Mung: It’s the threshold that matters dazz. Go read the paper. Sequences are above it or not. The degree of function just tells us whether it is above or below.

Because it matters, you pick the degree of function the algo is at at each step. What’s the point of choosing any other one?

Mung: In WEASEL the minimum degree of function is a single letter matching the target. So all strings with at least one letter match are in the ensemble. How many are there and what proportion of all strings does that represent? That’s the FI.

Why wouldn’t sequences with no matches not be part of the ensemble? If the algo starts there, it works just fine. You seem to be quibbling at irrelevant stuff. What difference would it make if the fitness function was “# of matches plus 1”?
None at all.

Mung: Selection has nothing to do with it. Increasing degree of function has nothing to do with it.

It has everything to do with it, obviously
dazz on June 1, 2018 at 7:02 pm said:

Mung: No one but you knows what you mean by “significant similarity.” Certainly gpuccio doesn’t know what you mean by it. gpuccio is always careful to show the absence of similarity because his reasoning depends on that.

I mean statistically significant (over 4-5% for proteins?) similarity. Remember, 400 million years of non-functionally-constrained evolution / engineering is, going by gpuccio, enough to erase any trace of similarity. Are you going to give my questions a shot?
Mung on June 1, 2018 at 7:10 pm said:

Joe Felsenstein: The Weasel got there, without FI being calculated in any part of it.

I know. I want to know the FI.

But I think you meant “how do we get there in our analysis of FI?”

The threshold is changed.

I know that you changed the threshold in your example. But in WEASEL no one is poking around changing the fitness function. The minimum number of required matches for it to be “functional” doesn’t change.

No FI is being gained or lost, you’re simply moving the goalposts (redefining the minimum level of functionality – moving the threshold).
Mung on June 1, 2018 at 7:14 pm said:

dazz: Because it matters, you pick the degree of function the algo is at at each step. What’s the point of choosing any other one?

LoL. You just chose a different one and then ask what is the point of doing it?

I agree, what is the point of choosing any other minimum degree of function? Why are you doing it? You are painting the target around what you hit, you evil IDist you.

#TexasSharpshooterFallacy
Mung on June 1, 2018 at 7:22 pm said:

dazz: Why wouldn’t sequences with no matches not be part of the ensemble? If the algo starts there, it works just fine. You seem to be quibbling at irrelevant stuff.

You’re a funny guy dazz. What is the FI if all sequences are at or above the minimum degree of function. Did you read the paper?

I(Emin) = -log2(N/N) = -log2(1) = 0 bits
dazz on June 1, 2018 at 7:26 pm said:

Turns out it’s you who doesn’t understand FI. There’s no objective “minimum function” property in the ensemble to set the threshold to. If you set a constant threshold you obviously get a single result.
That you don’t see why it’s informative to calculate FI for each level of function at every step of the algo is your problem. Texas sharp shooting? WTF?
dazz on June 1, 2018 at 7:31 pm said:

Mung: You’re a funny guy dazz. What is the FI if all sequences are at or above the minimum degree of function. Did you read the paper?

I(Emin) = -log2(N/N) = -log2(1) = 0 bits

So fucking what? Are you assuming that sequences below the threshold are out of the ensemble?
DNA_Jock on June 1, 2018 at 7:42 pm said:

Mung: Can we start over from the beginning with a simplified example?

Sure, you seem like a nice guy, why not.
Consider the traditional Weasel.
There are 27^28 strings in the ensemble. This is invariant.
Our metric is “Closeness to the quote”, i.e. 28 – the Hamming distance to the quote.

Imagine a landscape of 27^28 pillars, representing all possible strings. All pillars have the same (square) cross-section.
Each pillar has a height (vertical, up) corresponding to our metric.
So, there is one pillar of height 28. All other pillars are shorter. Of these, there are 728 pillars of height 27.
If it helps, imagine that the pillars have been re-arranged to form a nice ziggurat: there’s a ground level at 0 (which constitutes ~35% of the total area), and then 28 steps up to the lone pillar at the top.
Got that visualized? Good.
Now, let’s consider a different way of describing height, let’s call it Area-at-or-above-this-contour (AAOATC). Instead of saying that the penultimate step is at height 27, we could say there are 728+1=729 pillars “this high or taller”, thus AAOATC = 729 at this step. Notice that, by definition, AAOATC cannot increase when height increases. For each altitude, we can list the vertical distance from the ground, or list the AAOATC. There’s a one-to-one correspondence for a given ziggurat.
Finally, to make everyone happy, we log-transform the AAOATC thus:
FI = logbase2(27^28) – logbase2(AAOATC)
Height zero has FI = zero.
Your “minimum function” is at a height of 1 and it has 0.62 bits of FI.
Corneel has plotted the relationship for you already.
Of course one changes the threshold as one goes along. One is merely using an alternative way of describing altitude. And of course that constitutes TSS.
People who avoid the TSS fallacy by NOT equating FI with a real-world probability:
Hazen
Szostak
Joe F
DNA_Jock
Dazz
Corneel
Tom English

People who are equating FI with a real-world probability and therefore committing the TSS fallacy.
gpuccio
colewd

People who understand, but enjoy sowing confusion
Mung

Glad to be of service.
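DNA_Jock’s ziggurat can be checked by direct counting. In the Python sketch below (function names are illustrative; `aaoatc` mirrors the AAOATC quantity defined in the comment), the number of strings with exactly h matches to the 28-character quote is C(28, h)·26^(28−h), and FI follows the formula given above.

```python
import math

LENGTH = 28                       # characters in "METHINKS IT IS LIKE A WEASEL"
ALPHABET_SIZE = 27                # 26 capital letters plus the space
TOTAL = ALPHABET_SIZE ** LENGTH   # 27**28 pillars in all


def pillars_at_height(h: int) -> int:
    """Strings with exactly h characters matching the quote."""
    return math.comb(LENGTH, h) * (ALPHABET_SIZE - 1) ** (LENGTH - h)


def aaoatc(h: int) -> int:
    """Area at or above this contour: strings with at least h matches."""
    return sum(pillars_at_height(k) for k in range(h, LENGTH + 1))


def fi_bits(h: int) -> float:
    """FI = log2(27**28) - log2(AAOATC), as defined in the comment."""
    return math.log2(TOTAL) - math.log2(aaoatc(h))


print(pillars_at_height(27), aaoatc(27))          # 728 729
print(round(pillars_at_height(0) / TOTAL, 3))     # 0.348
for h in (0, 1, 27, 28):
    print(h, round(fi_bits(h), 3))
```

The counts reproduce the 728 pillars of height 27, the roughly 35% ground level, and about 0.616 bits for a one-match threshold, in line with dazz’s corrected figure further down the thread.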
Mung on June 1, 2018 at 7:46 pm said:

dazz: There’s no objective “minimum function” property in the ensemble to set the threshold to.

You don’t get an ensemble without setting the minimum degree of function threshold dazz.

Whether that’s “objective” or not is a red herring and I’ve already pointed out more than once that anyone can decide what minimum degree of function they want to use for the function they want to define. Do keep up.
Mung on June 1, 2018 at 7:48 pm said:

dazz: Are you assuming that sequences below the threshold are out of the ensemble?

LoL. Go read the paper dazz.

http://www.pnas.org/content/104/suppl_1/8574
Mung on June 1, 2018 at 7:49 pm said:

DNA_Jock: People who understand, but enjoy sowing confusion
Mung

Glad to be of service.

🙂

Are you sure you didn’t paint the target around me?
Mung on June 1, 2018 at 7:58 pm said:

DNA_Jock: Consider the traditional Weasel.
There are 27^28 strings in the ensemble. This is invariant.
Our metric is “Closeness to the quote”, i.e. 28 – the Hamming distance to the quote.

I agree there are 27^28 – I disagree that they all meet the minimum degree of function threshold, so I disagree that they are all in the ensemble. But we can discuss.

If there are no matches the hamming distance would be 0? So you would include even those as having a minimum degree of function? Got to run, but as I said to dazz, in this case, if all strings are included, the FI is 0.
Joe Felsenstein on June 1, 2018 at 8:01 pm said:

Mung: My bad. It’s side-trackery!

But Joe, asking how many matches is asking a question, not setting a threshold. You have to say how many matches are required to meet the minimum degree of function threshold.

Is it one match (DAWKINS)? Is it all but one match (FELSENSTEIN)?

Are you going to change the number of matches required at each iteration of the algorithm? That’s not how WEASEL works.

Weasel does its own thing, paying attention to matches. Our use of FI is just an attempt to see how far Weasel has gotten in finding “good” sequences, where “good”ness is the number of matches. The FI is just a way of expressing that in fancy informational language.

What problem does Mung have with that?
DNA_Jock on June 1, 2018 at 8:15 pm said:

Mung: I agree there are 27^28 – I disagree that they all meet the minimum degree of function threshold, so I disagree that they are all in the ensemble. But we can discuss.

If you have read Hazen, and still think that the ensemble is not 27^28, then there is little point in discussion. Check out the use of 4^n for RNAs and 26^10 for ten letter messages. Pretty clear that Hazen includes members with zero function in the denominator…
dazz on June 1, 2018 at 8:23 pm said:

Mung: LoL. Go read the paper dazz.

http://www.pnas.org/content/104/suppl_1/8574

From the paper:

“Typically, a small fraction, F(Ex), of all possible configurations of a system achieves at least the specified degree of function, ≥Ex. Accordingly, we define functional information in terms of F(Ex):
I(Ex) = −log2[F(Ex)]
Thus, in a system with N possible configurations (e.g., a sequence of n RNA nucleotides, which has N = 4^n discrete possible sequences): I(Ex) = −log2[M(Ex)/N], where M(Ex) is the number of different configurations that achieves or exceeds the specified degree of function x, ≥Ex.”

No mention of configurations below the specified degree of function being out of the system. The system, or ensemble, is the totality of sequence space.

You’re obviously trying to wedge some made up “minimum function” into the system that simply isn’t there.

Again, if you want to calculate FI for the weasel, you need to use the weasel’s results, otherwise you’re just re-running the same stupid calculation over and over again:

You set the threshold at 1 match. What’s the FI? 0.5879 bits
The weasel reaches fitness 3!
What’s the FI for threshold 1? 0.5879 bits again! surprise!!
The weasel reaches fitness 13!
What’s the FI for threshold 1? 0.5879 bits again! surprise!!
The weasel reaches fitness 20!
What’s the FI for threshold 1? 0.5879 bits again! surprise!!
The weasel reaches fitness 24!
What’s the FI for threshold 1? 0.5879 bits again! surprise!!

….

I cook dinner, cheese burgers…
What’s the FI for threshold 1? 0.5879 bits again! OMFG! again! What are the odds?

ETA: correction, the FI for a threshold = 1 match in the weasel is 0.6161 bits
petrushka on June 1, 2018 at 8:37 pm said:

dazz: Why do non functional / drifting sequences always evolve at the predicted drift pace with a fixation rate equal to the mutation rate instead of what one would expect from an engineer building a targeted sequence?

Just asking: Don’t functional sequences also drift, even if at a slower rate? I’m curious if there is a list somewhere of sequences and rates of drift.
dazz on June 1, 2018 at 8:46 pm said:

petrushka: Just asking: Don’t functional sequences also drift, even if at a slower rate? I’m curious if there is a list somewhere of sequences and rates of drift.

You’re asking the wrong guy, I’m a clueless hack, LOL
But I’m sure the pros can shed some light on that. I believe regulatory genes are pretty tolerant of mutations and often evolve close to neutrally, and Larry Moran posted some calculations supporting the idea that neutral evolution is the null hypothesis, based on the human-chimp split.
DNA_Jock on June 1, 2018 at 9:25 pm said:

colewd: If steps exist then over time these steps become visible from AA substitutions.

No; if the steps involved a selectable change in function, they won’t “become visible”.
dazz on June 1, 2018 at 9:35 pm said:

More from the paper:

Functional information is defined only in the context of a specific function x. For example, the functional information of a ribozyme may be greater than zero with respect to its ability to catalyze one specific reaction but will be zero with respect to many other reactions. Functional information therefore depends on both the system and on the specific function under consideration.

So much for the “doesn’t make sense to calculate FI for a sequence” trope

And here’s the complete paragraph, part of which Mung quoted here:

It is important to emphasize that functional information, unlike previous complexity measures, is based on a statistical property of an entire system of numerous agent configurations (e.g., sequences of letters, RNA oligonucleotides, or a collection of sand grains) with respect to a specific function. To quantify the functional information of any given configuration, we need to know both the degree of function of that specific configuration and the distribution of function for all possible configurations in the system. This distribution must be derived from the statistical properties of the system as a whole [as opposed, for example, to the statistical properties of populations evolving in a fitness landscape (37)]. Any analysis of the functional information of a specific functional sequence or object, therefore, requires a deep understanding of the system’s agents and their various interactions.

Which is exactly what Joe, Corneel and DNA_Jock have done with the weasel algo: at every step, you get a different configuration, and we calculate FI based on the degree of function of that specific configuration, by setting the threshold right there.
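A minimal Ruby sketch of the procedure dazz describes, under assumptions that are not taken from the thread: a Dawkins-style Weasel with the target "METHINKS IT IS LIKE A WEASEL", an alphabet of the 26 capital letters plus space, a per-character mutation rate of 0.04, 100 offspring per generation, and a parent that is kept when no offspring matches at least as well (not every Weasel version does this). At each generation the threshold is set at the parent's own match count k, and FI(k) is −log2 of the fraction of all 27^28 strings with at least k matches, so strings with zero matches stay in the denominator.

```ruby
# Sketch: a Weasel run that reports the functional information (FI) of the
# current parent at every generation, with the threshold set at that
# parent's own number of matches. Parameters below are assumptions.
TARGET = "METHINKS IT IS LIKE A WEASEL"
CHARS = ("A".."Z").to_a + [" "]
L = TARGET.size            # 28
MUTATION_RATE = 0.04       # assumed per-character mutation probability
BROOD = 100                # assumed offspring per generation

def choose(n, k)
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# FI_TABLE[k] = -log2(fraction of all 27**L strings with >= k matches).
TOTAL = CHARS.size**L
FI_TABLE = (0..L).map do |k|
  at_least = (k..L).sum { |j| choose(L, j) * (CHARS.size - 1)**(L - j) }
  Math.log2(Rational(TOTAL, at_least))
end

def matches(s)
  s.chars.zip(TARGET.chars).count { |a, b| a == b }
end

# Each character is replaced, with probability MUTATION_RATE, by a
# uniformly random symbol (which may be the same one).
def mutate(s, rng)
  s.chars.map { |c| rng.rand < MUTATION_RATE ? CHARS.sample(random: rng) : c }.join
end

rng = Random.new(42)
parent = Array.new(L) { CHARS.sample(random: rng) }.join
history = [FI_TABLE[matches(parent)]]

until parent == TARGET || history.size > 10_000
  best = Array.new(BROOD) { mutate(parent, rng) }.max_by { |s| matches(s) }
  parent = best if matches(best) >= matches(parent)   # keep parent otherwise
  history << FI_TABLE[matches(parent)]
end

history.each_with_index do |bits, gen|
  next unless (gen % 10).zero? || gen == history.size - 1
  puts format("generation %4d: %7.3f bits", gen, bits)
end
```

Because this version never lets the parent get worse, the printed FI never decreases; a version that always replaces the parent with the best offspring can occasionally step back down.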
Mung on June 1, 2018 at 10:11 pm said:

dazz: Which is exactly what Joe, Corneel and DNA_Jock have done with the weasel algo: at every step, you get a different configuration, and we calculate FI based on the degree of function of that specific configuration, by setting the threshold right there.

Texas Sharpshooter.
Joe Felsenstein on June 1, 2018 at 10:20 pm said:

Mung: Texas Sharpshooter.

Not at all. Weaseloid programs’ behavior depends on the degree of matching, no matter where they actually end up going.
dazz on June 1, 2018 at 10:21 pm said:

Mung: Texas Sharpshooter.

The ubiquitous creationist last resort: “I know you are, but what am I?”

Often used after having his ass handed to him
Mung on June 1, 2018 at 10:53 pm said:

DNA_Jock: If you have read Hazen, and still think that the ensemble is not 27^28, then there is little point in discussion. Check out the use of 4^n for RNAs and 26^10 for ten-letter messages. Pretty clear that Hazen includes members with zero function in the denominator…

Huh?

It’s pretty clear that the Hazen paper doesn’t use the term ensemble at all. And I have never stated that all of the possible sequences don’t matter or don’t figure into the calculation. They quite obviously do.

You’re clearly using the term ensemble to refer to something different than what I am using it to refer to. I think I’ve consistently used it to refer to the sequences above the threshold. I’ll go back through previous posts and see if I have somehow used the same term to refer to different things.

Hazen:

Functional information is determined by identifying the fraction of all sequences that achieve a specified outcome.

And the number of sequences that achieve a specified outcome is the numerator, and that’s what I am referring to.

If all your sequences achieve a specified outcome (i.e., they all have a Hamming distance that can be calculated for them), then what is your FI? You didn’t answer. You can answer that question no matter how right or wrong I may be.

Our metric is “Closeness to the quote”, i.e. 28 – the Hamming distance to the quote.

Every one of the sequences meets that metric. Right?
keiths on June 1, 2018 at 11:30 pm said:

DNA_Jock:

Our metric is “Closeness to the quote”, i.e. 28 – the Hamming distance to the quote.

Mung:

Every one of the sequences meets that metric. Right?

The metric is a metric, not a specific outcome. Think, Mung.
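keiths' distinction can be checked by brute force on a toy system small enough to enumerate. In this hypothetical Ruby sketch the ensemble is every 4-letter string over the alphabet A, B, C (81 strings), the metric is the number of positions matching the arbitrarily chosen target "ABCA", and each specific outcome is "metric at least t". Setting t = 0 is the case Mung asks about: every string qualifies, the fraction is 1, and FI is 0 bits.

```ruby
# Toy ensemble: all 3**4 = 81 strings over a 3-letter alphabet.
ALPHABET = %w[A B C]
TARGET = "ABCA".chars   # hypothetical target, chosen arbitrarily
ENSEMBLE = ALPHABET.repeated_permutation(TARGET.size).to_a

# The metric: how many positions match the target (0 to 4).
def closeness(seq)
  seq.zip(TARGET).count { |a, b| a == b }
end

# The outcome "closeness >= t" picks out a subset of the ensemble;
# FI is -log2 of the fraction of the ensemble in that subset.
def functional_information(t)
  hits = ENSEMBLE.count { |seq| closeness(seq) >= t }
  Math.log2(Rational(ENSEMBLE.size, hits))
end

(0..TARGET.size).each do |t|
  hits = ENSEMBLE.count { |seq| closeness(seq) >= t }
  puts format("t = %d: %2d of %d strings, FI = %.3f bits",
              t, hits, ENSEMBLE.size, functional_information(t))
end
```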
colewd on June 1, 2018 at 11:55 pm said:

DNA_Jock,

No; if the steps involved a selectable change in function, they won’t “become visible”.

Then you have a change in function if you can demonstrate your hypothetical. FI is information pointing to a specific function.
Mung on June 2, 2018 at 1:53 am said:

keiths: The metric is a metric, not a specific outcome.

So?
Mung on June 2, 2018 at 2:21 am said:

DNA_Jock: Your “minimum function” is at a height of 1 and it has 0.62 bits of FI.

And how long have I been saying that the “minimum function” is 1?

>ruby -e 'puts -Math.log2(65.0/100)'
0.6214883767462701

So 65% of the possible randomly generated strings are “functional” in WEASEL.
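The 65% figure can be checked two ways. For a uniformly random 28-character string over the 27-symbol Weasel alphabet, each position matches the target with probability 1/27, so the exact fraction with at least one match is 1 − (26/27)^28 ≈ 0.6524, which gives about 0.616 bits rather than 0.621. The Ruby sketch below estimates the same fraction by sampling random strings; the seed and sample size are arbitrary choices.

```ruby
TARGET = "METHINKS IT IS LIKE A WEASEL"
CHARS = ("A".."Z").to_a + [" "]
TRIALS = 20_000   # arbitrary sample size

rng = Random.new(2018)
# Count random strings with at least one position matching the target.
hits = TRIALS.times.count { 
  s = Array.new(TARGET.size) { CHARS.sample(random: rng) }
  s.each_with_index.any? { |c, i| c == TARGET[i] }
}

estimate = hits.to_f / TRIALS
exact = 1 - (26.0 / 27)**TARGET.size

puts format("sampled fraction:  %.4f", estimate)
puts format("exact fraction:    %.4f", exact)
puts format("FI at threshold 1: %.3f bits", -Math.log2(exact))
```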

If it helps, imagine that the pillars have been re-arranged to form a nice ziggurat:

That doesn’t help. That conjures up an image of easy to climb steps from the bottom to the top. I prefer to think of them as tiny islands in a vast sea not connected by any bridges. 😉

While your mental pictures (which I enjoyed) tell us how high these islands are, they do not tell us how wide they are or how far apart they are or how to get from one to the next. Clearly there are gaps because all these islands of the same height share the exact same FI, and what reason do we have to believe that just because they share the same height they are part of the same island?

It’s not like there is a step-by-step path to the top à la Dawkins’ Climbing Mount Impossible.

So do you think that, as each generation comes and goes in a run of WEASEL, FI is being generated or is accumulating?

Thank you for your post.
Mung on June 2, 2018 at 2:23 am said:

dazz: Often used after having his ass handed to him

Learn to read, dazz.

DNA_Jock: Of course one changes the threshold as one goes along. One is merely using an alternative way of describing altitude. And of course that constitutes TSS.
Mung on June 2, 2018 at 2:28 am said:

Joe Felsenstein: Weaseloid programs’ behavior depends on the degree of matching, no matter where they actually end up going.

But I did not say that weaseloid programs in general commit the TSS. I said that dazz was doing so when he used the outcome to paint the target.
dazz on June 2, 2018 at 2:46 am said:

Mung: Learn to read, dazz.

Obvious typo/mishap. Stop being pathetic, dude.
J-Mac on June 2, 2018 at 3:12 am said:

And the speculations continue…
I’m just curious: Who determines who is in the right?
Materialists would never allow for the majority to vote…
On the other hand, ID can’t formulate their theory to the satisfaction of the majority… Maybe Intelligent Design is not an adequate brand?
How about a change to something else, like Purposeful Design? Or something like that….
keiths on June 2, 2018 at 3:29 am said:

Mung, to DNA_Jock:

While your mental pictures (which I enjoyed) tell us how high these islands are, they do not tell us how wide they are or how far apart they are or how to get from one to the next. Clearly there are gaps because all these islands of the same height share the exact same FI, and what reason do we have to believe that just because they share the same height they are part of the same island?

It’s not like there is a step-by-step path to the top à la Dawkins’ Climbing Mount Impossible.

It’s Climbing Mount Improbable, and of course there are step-by-step paths to the top of the Weasel fitness landscape.

Damn, Mung. After all this time, you still don’t understand how Weasel works?
Joe Felsenstein on June 2, 2018 at 6:49 am said:

I sense that maybe Mung’s problem is that Mung thinks that when we say “function” we mean only nonzero function. That a Weasel phrase that has zero matches is somehow to be counted as dramatically different from one that has 1 match.
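Joe's point can be made concrete with exact counts. A short Ruby sketch, assuming the usual Weasel setup (a 28-character target over a 27-symbol alphabet) and taking the degree of function to be the number of matching positions: the number of strings with exactly j matches is C(28, j)·26^(28−j), and FI(k) is −log2 of the fraction of all 27^28 strings with at least k matches. Strings with zero matches are counted in the denominator, and nothing dramatic happens between zero and one match; FI simply rises with the threshold.

```ruby
# Exact functional information for Weasel-style strings, counting over the
# whole ensemble of 27**28 strings. Degree of function = number of matches.
ALPHABET_SIZE = 27   # 26 capital letters plus the space
LENGTH = 28          # length of "METHINKS IT IS LIKE A WEASEL"
TOTAL = ALPHABET_SIZE**LENGTH

def choose(n, k)
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# Strings with exactly j matches: choose the matching positions, then fill
# each remaining position with one of the 26 wrong symbols.
def exactly(j)
  choose(LENGTH, j) * (ALPHABET_SIZE - 1)**(LENGTH - j)
end

# FI(k) = -log2(M(k) / TOTAL), where M(k) counts strings with >= k matches.
def functional_information(k)
  at_least_k = (k..LENGTH).sum { |j| exactly(j) }
  Math.log2(Rational(TOTAL, at_least_k))
end

[0, 1, 2, 5, 10, 28].each do |k|
  puts format("at least %2d matches: %8.3f bits", k, functional_information(k))
end
```

With these assumptions, threshold 0 admits every string and gives 0 bits, threshold 1 gives about 0.616 bits, and a perfect match gives 28·log2 27 ≈ 133.14 bits.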
Tom English on June 2, 2018 at 9:28 am said:

Joe Felsenstein: Weaseloid programs’ behavior depends on the degree of matching, no matter where they actually end up going.

I in fact have Weaseloid programs that do no matching at all internally. They in fact do not store sequences internally. You know why that’s possible: in radical abstraction, there’s a Markov process over states 0, 1, …, L. If you think of those numbers as matches, well, that’s one interpretation. It wasn’t your preferred interpretation in “Wright, Fisher, and the Weasel.” What the program actually does to change state is sampling of binomial distributions, not matching of mutated sequences to a target sequence. You know the formal details. You perhaps have not considered that, given a “target” string at the end of a run, it’s easy to construct a sequence of parent strings that is consistent with the history of the Markov process.

So why am I making a big deal of this? I’ve been hearing for 15 years the inanity we’re hearing from gpuccio now, namely, that the Weasel program already has the target, and might as well just print it. As I said to Corneel earlier, the workings of the simulator are not the workings of the simulated process. I’m saying now that, because matching isn’t the essence of the process of cumulative selection — it’s just an image that Dawkins used in his book — there’s no need for matching of a target string in a Weaseloid program.
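Tom English's description can be sketched in a few lines of Ruby. This is a guess at that kind of program, not his code: the state is just the parent's number of matches m (0 to 28), and each offspring's state is drawn from binomial distributions, with no strings stored and no matching done. It assumes a per-position mutation rate u = 0.04, 100 offspring per generation, Dawkins-style replacement of the parent by the best offspring, and that a mutated position receives a uniformly random symbol from the 27, so a matched position is lost with probability u·26/27 and an unmatched one is gained with probability u/27.

```ruby
# Weaseloid run as a Markov chain on match counts 0..L; no target string
# and no string matching anywhere. Parameters are assumptions.
L = 28
U = 0.04       # assumed per-position mutation rate
BROOD = 100    # assumed offspring per generation

# Draw from Binomial(n, p) by counting Bernoulli successes.
def binomial(n, p, rng)
  n.times.count { rng.rand < p }
end

# Match count of one offspring of a parent with m matches.
def offspring_state(m, rng)
  lost = binomial(m, U * 26.0 / 27, rng)
  gained = binomial(L - m, U / 27.0, rng)
  m - lost + gained
end

rng = Random.new(7)
state = binomial(L, 1.0 / 27, rng)   # matches of a random initial string
path = [state]
until state == L || path.size > 100_000
  # The best offspring replaces the parent (Dawkins-style selection).
  state = Array.new(BROOD) { offspring_state(state, rng) }.max
  path << state
end

puts "reached state #{state} after #{path.size - 1} generations"
```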
DNA_Jock on June 2, 2018 at 2:53 pm said:

Mung: That doesn’t help. That conjures up an image of easy to climb steps from the bottom to the top.

And in the specific case of my simple example, Weasel, that image is accurate.

I prefer to think of them as tiny islands in a vast sea not connected by any bridges.

Yes, you do. And we understand why that is your preference. 😉

While your mental pictures (which I enjoyed) tell us how high these islands are, they do not tell us how wide they are or how far apart they are or how to get from one to the next.

That’s right! We are discussing Functional Information, which is completely unaffected by such considerations!
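DNA_Jock's point, that FI ignores how wide the islands are and how they connect, can be illustrated with a hypothetical toy landscape. In this Ruby sketch the ensemble is all 4-letter strings over A, C, G, U (256 strings), the degree of function is the number of positions matching an arbitrary reference "GACU", and the function values are then shuffled among the strings. Shuffling leaves FI at every threshold unchanged, because FI depends only on how many strings reach the threshold, while the number of above-threshold strings that have an above-threshold one-letter neighbor generally changes.

```ruby
ALPHABET = %w[A C G U]
REFERENCE = "GACU"   # arbitrary, hypothetical reference sequence
ENSEMBLE = ALPHABET.repeated_permutation(REFERENCE.size).map(&:join)

# Original landscape: degree of function = matches to REFERENCE.
original = ENSEMBLE.map { |s| [s, s.chars.zip(REFERENCE.chars).count { |a, b| a == b }] }.to_h

# Same multiset of function values, randomly reassigned to strings.
rng = Random.new(3)
shuffled = ENSEMBLE.zip(original.values.shuffle(random: rng)).to_h

def functional_information(landscape, t)
  hits = landscape.count { |_, v| v >= t }
  Math.log2(Rational(landscape.size, hits))
end

# All strings differing from s at exactly one position.
def neighbors(s)
  (0...s.size).flat_map do |i|
    (ALPHABET - [s[i]]).map { |c| s[0...i] + c + s[(i + 1)..-1] }
  end
end

# How many above-threshold strings touch another above-threshold string.
def connected_count(landscape, t)
  above = landscape.select { |_, v| v >= t }.keys
  above.count { |s| neighbors(s).any? { |n| landscape[n] >= t } }
end

(1..REFERENCE.size).each do |t|
  puts format("t = %d: FI %.3f vs %.3f bits; connected %d vs %d",
              t, functional_information(original, t), functional_information(shuffled, t),
              connected_count(original, t), connected_count(shuffled, t))
end
```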

Clearly there are gaps because all these islands of the same height share the exact same FI, and what reason do we have to believe that just because they share the same height they are part of the same island?

That’s a different topic of conversation, which involves multi-dimensional landscapes. I’m not sure I have the stamina to explain that to you.
vjtorley on June 2, 2018 at 3:58 pm said:

Hi RoyLT.

Unless we had some knowledge of how such things are made, the mathematics would not be very informative no matter how interesting.

Are you kidding me? If we found a signal from interstellar space consisting of a short, repeating sequence of the first 100 primes, you would NOT infer that an intelligent being sent that signal until you found out how it was produced?
newton on June 2, 2018 at 4:50 pm said:

vjtorley:
Hi RoyLT.

Are you kidding me? If we found a signal from interstellar space consisting of a short, repeating sequence of the first 100 primes, you would NOT infer that an intelligent being sent that signal until you found out how it was produced?

If anyone detected such a signal, the first step would be to eliminate all known man-made “hows”. Knowing how something is not made is often the first step in knowing how it is made.
petrushka on June 2, 2018 at 5:40 pm said:

If we found an invisible pink unicorn we might posit magic.

What’s the point? Find it and then ask.
petrushka on June 2, 2018 at 5:46 pm said:

About signals from space. No likely method of transmission is powerful enough to carry information across interstellar space. Some hypothetical methods involve occluding entire stars.

So when meaningful signals have been detected, they have been from Earth.

There are a few weird unknowns being examined.