Does gpuccio’s 150-safe ‘thief’ example validate the 500-bits rule?

At Uncommon Descent, poster gpuccio has expressed interest in what I think of his example of a safecracker trying to open a safe with a 150-digit combination, or open 150 safes, each with its own 1-digit combination. It’s actually a cute teaching example, which helps explain why natural selection cannot find a region of “function” in a sequence space in such a case. The implication is that there is some point of contention that I failed to address in my post, which led to the nearly 2,000-comment-long thread on his argument here at TSZ. He asks:

By the way, has Joe Felsestein answered my argument about the thief?  Has he shown how complex functional information can increase gradually in a genome?

Gpuccio has repeated his call for me to comment on his ‘thief’ scenario a number of times, including here, and UD reader “jawa” has taken up the torch (here and here), wanting to know whether I have yet answered the thief argument, at first dramatically asking

Does anybody else wonder why these professors ran away when the discussions got deep into real evidence territory?

Any thoughts?

and then supplying the “thoughts” definitively (here)

we all know why those distinguished professors ran away from the heat of a serious discussion with gpuccio, it’s obvious: lack of solid arguments.

I’ll re-explain gpuccio’s example below the fold, and then point out that I never contested gpuccio’s safe example, but I certainly do contest gpuccio’s method of showing that “Complex Functional Information” cannot be achieved by natural selection. gpuccio manages to do that by defining “complex functional information” differently from Szostak and Hazen’s definition of functional information, in a way that makes his rule true. But gpuccio never manages to show that, when Functional Information is defined as Hazen and Szostak defined it, 500 bits of it cannot be accumulated by natural selection.

The Thief Scenario

Here is gpuccio’s scenario, as most succinctly stated in comment #65 of his ‘Texas Sharpshooter’ post at UD (gpuccio gives there links to some earlier appearances of the scenario):

In essence, we compare two systems. One is made of one single object (a big safe). the other of 150 smaller safes.

The sum in the big safe is the same as the sums in the 150 smaller safes put together. that ensures that both systems, if solved, increase the fitness of the thief in the same measure.

Let’s say that our functional objects, in each system, are:

a) a single piece of card with the 150 figures of the key to the big safe

b) 150 pieces of card, each containing the one figure key to one of the small safes (correctly labeled, so that the thief can use them directly).

Now, if the thief owns the functional objects, he can easily get the sum, both in the big safe and in the small safes.

But our model is that the keys are not known to the thief, so we want to compute the probability of getting to them in the two different scenarios by a random search.

So, in the first scenario, the thief tries the 10^150 possible solutions, until he finds the right one.

In the second scenario, he tries the ten possible solutions for the first safe, opens it, then passes to the second, and so on.

and gpuccio’s challenge is:

Do you think that the two scenarios are equivalent?

What should the thief do, according to your views?

My answers of course are, to the first question, “No”. To the second question, “Cheat, or hope that in this particular case you have an instance of the second scenario”.

My reaction: This is a nice teaching example showing why, in the first scenario, there is no hope of guessing the correct key, even once in the history of the universe.

In the first scenario there is no path to getting the safe open by successively opening parts of it. In the second scenario one can open a safe with an average of about 5 or 6 guesses, and after doing this 150 times one has all the contents of the safes.
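The arithmetic behind the two scenarios can be sketched in a few lines of Python. This is only an illustration, and it assumes the thief tries each candidate key at most once, in random order; the helper name `expected_guesses` is hypothetical and is not part of gpuccio’s example.

```python
import math
from fractions import Fraction

def expected_guesses(num_keys: int) -> Fraction:
    """Mean number of tries needed to hit the one correct key among num_keys,
    assuming each candidate is tried at most once, in random order."""
    return Fraction(num_keys + 1, 2)

# Second scenario: 150 small safes, each with a 1-digit key (10 possibilities).
per_safe = expected_guesses(10)        # 11/2 guesses per safe on average
total_small = 150 * per_safe           # guesses needed for all 150 safes

# First scenario: one big safe with a 150-digit key (10**150 possibilities).
total_big = expected_guesses(10**150)  # roughly half of 10**150

# 150 decimal digits expressed in bits: 150 * log2(10).
bits = 150 * math.log2(10)

print("guesses per small safe:", float(per_safe))
print("guesses for all small safes:", int(total_small))
print("digits in the guess count for the big safe:", len(str(int(total_big))))
print("150 digits in bits:", round(bits, 1))
```

Under these assumptions the 150 small safes take several hundred guesses in total, while the big safe takes a number of guesses that itself has 150 digits. The last line also shows why 150 decimal digits correspond to roughly 500 bits.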

My question

Why did gpuccio think that I objected to gpuccio’s logic for the case which has nonzero function only in a set of sequences which is $10^{-150}$ of all sequences?  I was very clear in acknowledging that in such a case, natural selection cannot find the sequences that have nonzero function, if we start in a random sequence.

Repeating his argument does not help his case, because my point is not that this part of his argument is wrong. And this was made very clear in my previous post on gpuccio’s argument. But let me repeat, for those who did not happen to read that post.

What gpuccio got wrong

Gpuccio had stated (here) that

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information ) for an explicitly defined function (whatever it is) we can safely infer design.

Functional information was defined by Jack Szostak (2003) and by Hazen, Griffin, Carothers, and Szostak (2007). It assumes that we have a set of molecular sequences (for example, the coding sequence of a single protein, or its amino acid sequence), and with each of them is associated a number, the function: for example, its ability to synthesize ATP if it is an ATP synthase.

In this original definition of FI, there is no assumption that almost all of the sequences have function zero.  Given that, there is no way to rule out the possibility that there are paths from many parts of the sequence space that lead to higher and higher function. And given that, we cannot eliminate the possibility that natural selection and mutation could follow such a path to arrive at the function that we observe in nature.
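A toy calculation may make the point concrete. The sketch below assumes a made-up sequence space of 10-site binary strings and two invented functions, `additive` and `needle`; none of these come from the Szostak or Hazen papers. It computes functional information as minus the base-2 logarithm of the fraction of sequences whose function is at or above a threshold.

```python
import math
from itertools import product

def functional_information(function, alphabet, length, threshold):
    """-log2 of the fraction of all sequences whose function is >= threshold.
    Enumerates the whole space, so it is only usable for tiny toy examples.
    Raises ValueError if no sequence reaches the threshold."""
    seqs = list(product(alphabet, repeat=length))
    hits = sum(1 for s in seqs if function(s) >= threshold)
    return -math.log2(hits / len(seqs))

# Hypothetical function 1: every "1" adds one unit of function,
# so most sequences have some nonzero function.
def additive(seq):
    return sum(seq)

# Hypothetical function 2: function is zero everywhere except at one sequence.
def needle(seq):
    return 1 if all(seq) else 0

fi_additive_top = functional_information(additive, (0, 1), 10, threshold=10)
fi_needle_top = functional_information(needle, (0, 1), 10, threshold=1)
fi_additive_mid = functional_information(additive, (0, 1), 10, threshold=5)

print(fi_additive_top, fi_needle_top, fi_additive_mid)
```

In this toy case the best sequence has the same functional information under both functions, because in each case exactly one sequence out of 1,024 meets the threshold. The number of bits alone therefore does not say whether the rest of the space is flat at zero or slopes upward toward the target.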

How gpuccio eliminates natural selection

Simply by changing the definition of Functional Information for gpuccio’s Complex Functional Information. And only calling it that when the amount of Function is zero for all sequences outside of the target set. That makes gpuccio’s definition of Complex Functional Information very different from Szostak’s and Hazen’s. Gpuccio’s restricted definition rules out all cases where there might be a path among sequences leading to the target sequences, a path which has function rising continually along that path.
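The difference a rising path makes can be illustrated with a deliberately crude sketch. It assumes a single 50-site binary sequence, one-site mutations, and a rule that keeps a mutant only if its function is strictly higher; the functions `additive` and `needle` and the helper `climb` are hypothetical, and this is not a population-genetic model.

```python
import random

def additive(seq):
    """Hypothetical function: each site contributes independently."""
    return sum(seq)

def needle(seq):
    """Hypothetical function: zero everywhere except one target sequence."""
    return 1 if all(seq) else 0

def climb(function, length, proposals, rng):
    """Start from a random sequence, propose single-site changes,
    and keep a change only when it strictly increases the function."""
    seq = [rng.randint(0, 1) for _ in range(length)]
    for _ in range(proposals):
        site = rng.randrange(length)
        mutant = seq.copy()
        mutant[site] ^= 1              # flip one site
        if function(mutant) > function(seq):
            seq = mutant
    return function(seq)

rng = random.Random(0)                 # fixed seed so the run is repeatable
additive_result = climb(additive, 50, 10_000, rng)
needle_result = climb(needle, 50, 10_000, rng)

print("additive function reached:", additive_result, "out of 50")
print("needle function reached:", needle_result, "out of 1")
```

In both cases the target is a single sequence out of $2^{50}$. With the additive function the walk has a continually rising path and climbs to that sequence; with the needle function there is nothing for selection to act on, and the walk stays at zero unless it happens to start next to the target.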

This was explained clearly many times in my post and in the 2,000-comment thread that it generated. Apparently all that gpuccio and jawa can do is repeat their argument, one which does not show that 500 bits of Functional Information, defined as Szostak and Hazen define it, cannot be achieved by normal evolutionary processes such as natural selection.

ETA: corrected “150 bits” to “500 bits” (which is roughly 150 digits).

175 thoughts on “Does gpuccio’s 150-safe ‘thief’ example validate the 500-bits rule?”

  1. I should add a reply to another assertion of error that gpuccio has made: I had argued that when the function was fitness itself, changes in different parts of a genome could occur, each changing fitness relatively independently of the others, so that there is a path upwards on the fitness surface. gpuccio declared that this was wrong because function necessarily involves the properties of one local region of the genome.

    No, it doesn’t, in the case where function is fitness.

  2. Thanks to Joe F for finding time to respond to gpuccio. Let’s hope he notices. I’d pass on a heads-up myself but somehow, if I try to post at Uncommon Descent, my comments disappear without trace.

    I know gpuccio has declined to post here in the past but let me assure him that any comment (or opening post) he cares to make here* will remain visible and unaltered. He should also be reassured that at least two of our admins, Mung and vjtorley**, are, or have been, regular contributors to UD and will, I’m sure, ensure that he will be treated fairly.

    *subject to the rules, legal stuff, spam, porn excepted. Comments that step outside “address the post, not the poster” may be moved by admins to “Guano” but nevertheless will remain visible.

    **Not sure whether Vincent still has posting rights at UD.

  3. I don’t know what any of this means.

    Darwinism is clearly dead logically, but since no one can explain what the third way is, it’s really just rebranding, with a new slogan, and updated logo:

    “Evolution, we love it!” #metoo

  4. phoodoo: I don’t know what any of this means.

    I’m not a mathematician but it seems that gpuccio’s basic argument is that mathematical manipulation of numbers (something to do with counting similarities in nucleotide sequences) proves that evolution doesn’t work. His argument appears to rely on several assumptions such as strict determinism, uniqueness of function and the need to search all sequence space.

  5. Alan Fox: I’m not a mathematician but it seems that gpuccio’s basic argument is that mathematical manipulation of numbers (something to do with counting similarities in nucleotide sequences) proves that evolution doesn’t work. His argument appears to rely on several assumptions such as strict determinism, uniqueness of function and the need to search all sequence space.

    One argument gpuccio uses is that if we see a protein-coding-locus that exists in a large group and its sequence is fairly-well conserved there, that this means that there are no other “islands of function” possible for the function that this locus carries out. Because there has been lots of time for those to be searched for.

    This misunderstands what types of sequences will arise by mutation at that locus. It also misunderstands how modestly worse fitness can hold a mutant to low gene frequency in a population.

  6. phoodoo: I don’t know what any of this means.

    Darwinism is clearly dead logically

    What do you understand by the word “Darwinism”?

  7. Joe Felsenstein:…there are no other “islands of function” possible for the function that this locus carries out.

    Yes, I hate* that analogy for two reasons.

    It overlooks that fitness landscapes are not fixed but dynamic and can change in an instant (bolide, anyone,) diurnally, seasonally, annually, cyclically, continental drift, other organisms…

    The niche is multidimensional, not just three plus time.

    *as in “irritated by”.

  8. Yes, it looks like gpuccio simply repeats himself, and his argument pretty much amounts to a blind declaration that the sequence space of some particular protein can be represented as a spike, or isolated island in a vast sea of nonfunctionality that could not be reached from some other possible function, or smaller protein, or what have you.
    And then he wants to be proven wrong with experiment. So he has this conjecture about what sequence space is like, and he insists we should think this conjecture is true until we can falsify it by experiment, and until we do that we should believe that evolution is impossible.

    Sorry, that’s not how science works.

  9. My interest is not in the islands-of-function argument but in gpuccio’s repeated assertion that the 500-bits-of-FI rule is a general one for detecting Design. Which in effect asserts that fitness surfaces that have paths of gradually-increasing fitness leading to the “island” are impossible. Not just rare, impossible. Impossible for what reason? None, except that gpuccio restricts the term CFI to cases where such paths are impossible. To apply gpuccio’s rule, you have to do all the work.

  10. phoodoo: Can’t be answered, Alan is playing more games.

    Irony?

    No need to be coy, phoodoo. To me, Darwinism is shorthand for adaptive evolution by means of selection. Darwin was unaware of the gene as the vehicle for inheritance and of Mendel’s work and struggled to account for speciation. But his basic idea of adaptive change by selection still holds up.

    Disagree?

  11. Rumraket: But I’m asking you, not Alan.

    ?

    Did you know the word koi means carp in Japanese?

    Small historical fact. Alan once quit as a moderator. He is now an admin, who moderates. Interesting story.

  12. Alan Fox,

    Alan Fox: Irony?

    No need to be coy, phoodoo. To me, Darwinism is shorthand for adaptive evolution by means of selection. Darwin was unaware of the gene as the vehicle for inheritance and of Mendel’s work and struggled to account for speciation. But his basic idea of adaptive change by selection still holds up.

    Disagree?

    I believe you are an interpreter from underneath.

  13. Joe Felsenstein: My interest is not in the islands-of-function argument but in gpuccio’s repeated assertion that the 500-bits-of-FI rule is a general one for detecting Design. Which in effect asserts that fitness surfaces that have paths of gradually-increasing fitness leading to the “island” are impossible. Not just rare, impossible

    I’ll repeat myself too, why the hell not. Puccio doesn’t even understand that one obvious fact follows from that: those 500 bits and their corresponding system must have been poofed into existence in one fell swoop. That’s a weird corollary for someone who believes in common descent.

  14. Alan Fox: It overlooks that fitness landscapes are not fixed but dynamic and can change in an instant (bolide, anyone,) diurnally, seasonally, annually, cyclically, continental drift, other organisms…

    The interesting question is whether it changes in arbitrary ways. Like the content of the genome changes in arbitrary ways.

    Because as we all know arbitrary can’t be falsified. May as well say goddidit.

  15. Mung: The interesting question is whether it changes in arbitrary ways.

    I’d say the niche only changes within the limits of the laws of physics.

    Like the content of the genome changes in arbitrary ways.

    Again, there’s a limit to what change can be in any one step, different sequences of the same four nucleotides, rather than the appearance of a brand-new nucleotide. Too large a change and you entrain lethality or incompatibility (Hopeful Monsters).

  16. Mung: Because as we all know arbitrary can’t be falsified. May as well say goddidit.

    Sure. For those of a religious bent, seems a reasonable way of squaring the circle.

  17. Mung: The interesting question is whether it changes in arbitrary ways. Like the content of the genome changes in arbitrary ways.

    Because as we all know arbitrary can’t be falsified. May as well say goddidit.

    You can say that no matter what happens. How is the interesting question.

  18. @ Mung

    Knowing you can post at UD, I wonder if you might let gpuccio know that Joe F has put this OP up.

  19. Joe Felsenstein,

    Hi Joe
    I just posted your op on gpuccio’s last post. Hopefully he will respond shortly. If you strip away the burden shifts and the straw men I think you guys are actually close to common ground.

  20. phoodoo: Did you know the word koi means carp in Japanese?

    Small historical fact. Alan once quit as a moderator. He is now an admin, who moderates. Interesting story.

    I have no idea what that has to do with my question, and it looks like you’re trying to avoid giving an honest answer.

  21. Moved several comments to guano. Discussion regarding copyright breaches taking place in moderation issues.

  22. OK, so I see gpuccio has commented at UD on Professor Felsenstein’s OP. I was going to copy/paste them here but to avoid copyright issues I’ll just link to it and the previous one.

    Can I again extend an invitation to gpuccio to join us here, which would make dialogue very much more straightforward.

  23. Reading gpuccio’s comment:

    …FI has always been defined, in ID, as the ratio of the target space to the search space…

    Which, I think, misses several points, of which I’ll raise two that seem most damning. First, we are still considering needle-point peaks on a flat unconnected (and static) landscape. There is no reward for being very close with your key; it will only fit or not. Second, you do not have to exhaustively search the whole search space for a perfect fit in biology. You can manage with what you have till you stumble on something a little better that puts you ahead of the competition. There’s nothing of that in gpuccio’s model.

  24. Alan Fox,

    Second, you do not have to exhaustively search the whole search space for a perfect fit in biology. You can manage with what you have till you stumble on something a little better that puts you ahead of the competition.

    This is what the target space is. Sequences that do not have a perfect fit but provide the function.

  25. A lot of old hat at UD. But some things just remain funny:

    gpuccio@UD

    So, let’s say that somewhere in the genome of the bacterium there is some random mutation that contributes to the fitness of the organism, maybe by making it resistant to some ugly penicillin secreted by other species of bacteria. Let’s say it’s a mutation involving two AAs (I feel generous today), IOWs about 8.6 bits of FI.

    […]

    Are we anyway nearer to find our 500 bits of FI for that function? Of course not.

    We have gained 8.6 bits of FI, there is no doubt. But they are not related in any possible way to the 500 bits that we have to find.

    Eh, didn’t we just acquire in one go a complete protein that is capable of performing a function that wasn’t there before (penicillin resistance)? Why didn’t that count again? Ah right, we are “tweaking an existing function” so it is not “new and original information”.

    So what of the next comment

    quoting Joe:

    3. Now, in both gpuccio’s and your comments, the requirement is added that all this occur in one protein, in one change, and that it be “new and original function”.

    gpuccio:
    You say that I have added the requirement that this should happen “in one change”. This is completely false. Wherever have I said such a thing?

    Where indeed. I can’t imagine.

  26. colewd: This is what the target space is.

    No idea what you mean here, Bill. Can you explain? What is the target space?

    Sequences that do not have a perfect fit but provide the function.

    And, again.? What does “not perfect but provides a function” mean? Functionality is effectively a continuum from zero to best, so far. Perfectionism is for Platonists.

  27. Alan Fox: What does “not perfect but provides a function” mean?

    Heh, Bill has had his reboot thing again. He appears to have forgotten that he was denying at length that such a thing existed in the case of the ATP synthase beta subunit.

  28. As has been noted here, gpuccio has answered the post and some comments in this thread. You will find his response at Uncommon Descent, in the thread on “Mechanosensing and Mechanotransduction: How cells touch their world” from comment 67 on, including comments 68, 69, 70, 71, 72. Then a slight break and then comments 75 and 76. Mostly very long comments, so a lot to consider.

    I am busy today and reading all that will take some time.

    In the meantime, as others here read it, things to ask about:
    1. Is gpuccio talking about gpuccio’s definition of FI when CFI is assessed? Or is gpuccio using the original Szostak/Hazen definition?
    2. Do gpuccio’s arguments support the idea that there is a reliable rule for Szostak/Hazen functional information that observation of 500 bits of it implies Design?
    3. Can Szostak/Hazen FI be assessed for fitness? for traits like speed of flying? If so, can changes in multiple parts of the genome contribute to those?

  29. Alan Fox,

    What is the target space?

    The sequences that can fold and perform some level of defined function less than wild type. As you would define them, Alan, they are the needles in the haystack.

  30. Joe Felsenstein:
    No, it doesn’t, in the case where function is fitness.

    Also for more defined functions. Just as there’s plenty of experiments where bacterial genomes “take” different paths towards improved growth, there’s also plenty of experiments where proteins “take” different paths towards improved functions, like breaking down an antibiotic. Recombination after a round of selection works surprisingly well in improving an activity precisely because it mixes somewhat independent beneficial mutations. Many roads lead to Rome.

  31. My two cents

    Joe Felsenstein:
    1. Is gpuccio talking about gpuccio’s definition of FI when CFI is assessed? Or is gpuccio using the original Szostak/Hazen definition?

    Actually, I think that gpuccio is using the Szostak/Hazen definition of FI, but he is doing one of the two following things:
    a) confusing the boolean {x >= Th} with the underlying function x
    b) insisting that the function must only take two values, functional and non-functional
    He justifies this with his “empirical” claim that pathways to high FI do not exist. Hence his confusion at your construing that such pathways are “impossible”; he has shifted the burden of proof.

    2. Do gpuccio’s arguments support the idea that there is a reliable rule for Szostak/Hazen functional information that observation of 500 bits of it implies Design?

    In no way. His “argument” rests on a series of misconceptions. He is clueless.

    3. Can Szostak/Hazen FI be assessed for fitness? for traits like speed of flying? If so, can changes in multiple parts of the genome contribute to those?

    With great difficulty, and yes, respectively.

  32. DNA_Jock,

    3. Can Szostak/Hazen FI be assessed for fitness? for traits like speed of flying? If so, can changes in multiple parts of the genome contribute to those?

    With great difficulty, and yes, respectively.

    Can you explain how changes to multiple parts of the genome can contribute to fitness change?

  33. colewd: What misconceptions?

    The ones I explained to him in 2014.
    If there is anything I wrote in the thread “An attempt at computing dFSCI for English language” that you would like to dispute, go ahead.
    But be very specific.

  34. colewd: Can you explain how changes to multiple parts of the genome can contribute to fitness change?

    Well, by way of example, a knock-out mutation in any essential gene will be homozygous lethal.
    Yeah, I know, not the question you meant to ask.
    Happens a lot.

  35. gpuccio has added a bunch of comments again, but here is my take so far:

    Joe Felsenstein: 1. Is gpuccio talking about gpuccio’s definition of FI when CFI is assessed? Or is gpuccio using the original Szostak/Hazen definition?

    I think he is using the Szostak–Hazen definition, but is unable to conceive of function as continuous. As DNA_Jock noted, he takes traits to be binary “it works”/“it is broken” characters. He explicitly says as much here:

    Moreover, as you clearly understand the thief scenario, I hope that you agree that FI is computed for one object or set of objects that is necessary to implement a well defined function.

    This is very important. The function (which is a particular form of specification) is the criterion that generates a binary partition in the search space. After defining explicitly the function, we can in principle say for each object in the search space if it is part of the subset that we call the target space, or not.

    Anytime when he is confronted with the view that complex traits have some continuous scale, he acknowledges this, but tries to downplay its importance:

    As I have said many times, that’s where NS can act in some measure, because some ladder exists once a function is in place, that can gradually improve it. We have examples in the few documented cases of effective NS: penicillin resistance, chloroquine resistance, nylonase, and similar. But that measure is extremely limited.

    That makes his argument some variant of the irreducible complexity argument.

    2. Do gpuccio’s arguments support the idea that there is a reliable rule for Szostak/Hazen functional information that observation of 500 bits of it implies Design?

    It is just some completely arbitrary very big number. I believe gpuccio has conceded as much.

    3. Can Szostak/Hazen FI be assessed for fitness? for traits like speed of flying? If so, can changes in multiple parts of the genome contribute to those?

    His view of organismal function is simply not open to gradual improvement of continuous characters, which is why he rejects that fitness can be set as a function in the FI calculation. What gpuccio sees are not organisms, but machines or mechanical devices: modules of interlocking parts that need to come together in one go, or otherwise will not function. It is all watchmaker again.

  36. re. “new and original”

    If I want to infer design for some complex and functional protein that appears in vertebrate, I must consider a protein that did not exist before. IOWs, the sequence of the protein must show no significant homology with other proteins that existed in pre-vertebrates.

    Why? Because of course, if a protein already existed, with great part of the necessary sequence, in pre-vertebrates, then there has been no significant addition of FI in the transition to vertebrate.

    See how “weasel” is the concept? It’s so trivial that there should be no need even to say it.

    Of course these are weasel words. What happened to “the function must be explicitly defined” that gpuccio insists on? He just used the escape hatch to deny that enormous increases in the FI of a single predefined function can be attained by co-option of existing sequences, like in the example of the evolution of beta-lactamases he used himself.

  37. Corneel: [quoting gpuccio] I must consider a protein that did not exist before.

    Tornado-in-junkyard scenario that isn’t in any way, shape or form analogous to variation and selection. If the model is wrong…

  38. Gpuccio’s restricted definition rules out all cases where there might be a path among sequences leading to the target sequences, a path which has function rising continually along that path.

    Though gpuccio won’t seem to address this, I think I can guess what he’d say if he were pressed on the issue. I think he would appeal to the work of Doug Axe and others to suggest that sequence space is mostly empty of function, at least empty enough that no continuous path to function exists. He and others might also appeal to the numerous anecdotes we all hear as undergrads of innumerable single aa substitutions knocking out function of a protein.
    This argument would also fail, but it would turn on very detailed knowledge of protein structure/function/evolution and would require an expert with time on their hands. Still, I’m surprised gpuccio hasn’t made it; at least it would move the discussion forward.

  39. Let’s look at Jack Szostak’s 2003 paper that first defined FI.

    A new measure of information — functional information — is required to account for all possible sequences that could potentially carry out an equivalent biochemical function, independent of the structure or mechanism used. By analogy with classical information, functional information is simply $-\log_2$ of the probability that a random sequence will encode a molecule with greater than any given degree of function. For RNA sequences of length n, that fraction could vary from $4^{-n}$ if only a single sequence is active, to 1 if all sequences are active.

    The illustration is a cone, with a plane passing through it partway up. The fraction being discussed is the fraction of all sequences above the plane.

    Imagine a pile of DNA, RNA or protein molecules of all possible sequences, sorted by activity with the most active at the top. A horizontal plane through the pile indicates a given level of activity; as this rises, fewer sequences remain above it. The functional information required to specify that activity is $-\log_2$ of the fraction of sequences above the plane.

    more …

  40. (continuing)

    To clarify the matter even more, let’s look at the 2007 paper by Hazen, Griffin, Carothers and Szostak. They give three examples of FI. One of those involves the probability that a fire department will respond to a given message with a fixed number of letters.

    Given these variants, on the order of 1,000 combinations of 10 letters might initiate a rapid response to the approximate location of the fire. Thus,

    $I(1) \approx -\log_2 \left\{ M(E_x) \big/ \left[\sum_1^n 26^n \right] \right\}$

    Numerous additional 10-letter sequences convey some relevant information but would result in a lower probability of response ($0 < E_x < 1$): “FIREHELPME,” “DANGERFIRE,” or “BURNINGNOW.” A lower degree of function, $E_x$, will generally correspond to a larger number of effective letter sequences, $M(E_x)$.

    No sign there of any lower limit — the probability of response could be nonzero for all sequences. (By the way Hazen et al. slightly correct Szostak’s original statement of FI, without discussion, by having it use the probability that the function is greater than or equal to the given degree of function. This eliminates some absurdities, such as infinite FI for the sequence of highest function, and nonzero FI if the threshold is set at zero.)

    In conclusion, I see nothing in these papers that requires that there be some designated lower limit above zero for the function, which simply is any nonnegative number associated with the sequence. The sequence could be a haploid genome, and the function could be fitness itself.

    So Szostak and Hazen’s FI is not the same as gpuccio’s FI, which adds additional requirements. Hence gpuccio has not given any argument that shows that 500 bits of FI implies Design.

  41. colewd: Can you explain how changes to multiple parts of the genome can contribute to fitness change?

    Sure. Pull out all the protein coding parts and observe.

Leave a Reply