Does gpuccio’s 150-safe ‘thief’ example validate the 500-bits rule?

At Uncommon Descent, poster gpuccio has expressed interest in what I think of his example of a safecracker trying to open a safe with a 150-digit combination, or to open 150 safes, each with its own 1-digit combination. It’s actually a cute teaching example, which helps explain why natural selection cannot find a region of “function” in a sequence space in such a case. The implication is that there is some point of contention that I failed to address in my post, the one which led to the nearly 2,000-comment-long thread on his argument here at TSZ. He asks:

By the way, has Joe Felsestein answered my argument about the thief?  Has he shown how complex functional information can increase gradually in a genome?

Gpuccio has repeated his call for me to comment on his ‘thief’ scenario a number of times, including here, and UD reader “jawa” has taken up the torch (here and here), asking whether I have yet answered the thief argument, at first dramatically asking

Does anybody else wonder why these professors ran away when the discussions got deep into real evidence territory?

Any thoughts?

and then supplying the “thoughts” definitively (here)

we all know why those distinguished professors ran away from the heat of a serious discussion with gpuccio, it’s obvious: lack of solid arguments.

I’ll re-explain gpuccio’s example below the fold, and then point out that I never contested gpuccio’s safe example, but I certainly do contest gpuccio’s method of showing that “Complex Functional Information” cannot be achieved by natural selection. gpuccio manages to do that by defining “complex functional information” differently from Szostak and Hazen’s definition of functional information, in a way that makes his rule true. But gpuccio never manages to show that, when Functional Information is defined as Hazen and Szostak defined it, 500 bits of it cannot be accumulated by natural selection.

The Thief Scenario

Here is gpuccio’s scenario, as most succinctly stated in comment #65 of his ‘Texas Sharpshooter’ post at UD (gpuccio gives there links to some earlier appearances of the scenario):

In essence, we compare two systems. One is made of one single object (a big safe), the other of 150 smaller safes.

The sum in the big safe is the same as the sums in the 150 smaller safes put together. That ensures that both systems, if solved, increase the fitness of the thief in the same measure.

Let’s say that our functional objects, in each system, are:

a) a single piece of card with the 150 figures of the key to the big
safe

b) 150 pieces of card, each containing the one figure key to one of the small safes (correctly labeled, so that the thief can use them directly).

Now, if the thief owns the functional objects, he can easily get the sum, both in the big safe and in the small safes.

But our model is that the keys are not known to the thief, so we want to compute the probability of getting to them in the two different scenarios by a random search.

So, in the first scenario, the thief tries the 10^150 possible solutions, until he finds the right one.

In the second scenario, he tries the ten possible solutions for the first safe, opens it, then passes to the second, and so on.

and gpuccio’s challenge is:

Do you think that the two scenarios are equivalent?

What should the thief do, according to your views?

My answers of course are, to the first question, “No”. To the second question, “Cheat, or hope that in this particular case you have an instance of the second scenario”.

My reaction: This is a nice teaching example showing why, in the first scenario, there is no hope of guessing the correct key, even once in the history of the universe.

In the first scenario there is no path to getting the safe open by successively opening parts of it. In the second case one can open each safe after an average of about five guesses, and when you have done this 150 times (roughly 800 guesses in all) you get all the contents of the safes.
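For rough numbers, here is a minimal simulation of the second scenario (the guess-in-random-order strategy is just an illustrative assumption about how the thief works through each dial):

```python
import random

def guesses_to_open(num_digits=10):
    """Tries needed to open a one-digit safe, testing digits in random order."""
    key = random.randrange(num_digits)
    order = random.sample(range(num_digits), num_digits)
    return order.index(key) + 1

def total_guesses(num_safes=150):
    """Total tries to open all 150 one-digit safes, one after another."""
    return sum(guesses_to_open() for _ in range(num_safes))

random.seed(1)
runs = [total_guesses() for _ in range(1000)]
print(f"average guesses for 150 one-digit safes: {sum(runs) / len(runs):.0f}")
# averages about 825 (150 safes x 5.5 guesses each), versus an expected
# 0.5 x 10^150 guesses for the single safe with a 150-digit combination
```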

My question

Why did gpuccio think that I objected to his logic for the case in which nonzero function occurs only in a set of sequences that is 10^-150 of all sequences? I was very clear in acknowledging that in such a case, natural selection cannot find the sequences that have nonzero function, if we start from a random sequence.

Repeating his argument does not help his case, because my point is not that this part of his argument is wrong. And this was made very clear in my previous post on gpuccio’s argument. But let me repeat, for those who did not happen to read that post.

What gpuccio got wrong

Gpuccio had stated (here) that

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information ) for an explicitly defined function (whatever it is) we can safely infer design.

Functional information was defined by Jack Szostak (2003) and by Hazen, Griffin, Carothers, and Szostak (2007). It assumes that we have a set of molecular sequences (for example, the coding sequences for a single protein, or the corresponding amino acid sequences), and that with each of them is associated a number, the function: for example, its ability to synthesize ATP if it is an ATP synthase.

In this original definition of FI, there is no assumption that almost all of the sequences have function zero.  Given that, there is no way to rule out the possibility that there are paths from many parts of the sequence space that lead to higher and higher function. And given that, we cannot eliminate the possibility that natural selection and mutation could follow such a path to arrive at the function that we observe in nature.
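In Hazen and Szostak’s formulation, the FI associated with a level of function is -log2 of the fraction of all sequences whose function is at least that level. A toy calculation (the two-letter alphabet, the ten sites, and the match-counting “function” here are illustrative assumptions, not anyone’s biological claim) shows how FI depends on the threshold chosen:

```python
from itertools import product
from math import log2

def functional_information(sequences, function, threshold):
    """FI = -log2( fraction of sequences whose function meets the threshold )."""
    fraction = sum(1 for s in sequences if function(s) >= threshold) / len(sequences)
    return -log2(fraction)

# Toy setup: all 2^10 sequences over a two-letter alphabet; the "function"
# counts matches to an arbitrary 10-site target sequence.
target = "ABABABABAB"
seqs = ["".join(s) for s in product("AB", repeat=10)]
match = lambda s: sum(a == b for a, b in zip(s, target))

for threshold in (0, 2, 4, 6, 8, 10):
    fi = functional_information(seqs, match, threshold)
    print(f"threshold {threshold:2d}: FI = {fi:.2f} bits")
# FI runs from 0 bits (any sequence qualifies) up to 10 bits (only the exact
# target qualifies); nothing in the definition forbids intermediate levels.
```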

How gpuccio eliminates natural selection

Simply by changing the definition: gpuccio’s “Complex Functional Information” applies only when the amount of function is zero for all sequences outside of the target set. That makes gpuccio’s definition of Complex Functional Information very different from Szostak’s and Hazen’s. Gpuccio’s restricted definition rules out all cases where there might be a path among sequences leading to the target set, a path along which function rises continually.

This was explained clearly many times in my post and in the 2,000-comment thread that it generated. Apparently all that gpuccio and jawa can do is repeat their argument, one which does not show that 500 bits of Functional Information, defined as Szostak and Hazen define it, cannot be achieved by normal evolutionary processes such as natural selection.

ETA: corrected “150 bits” to “500 bits” (which is roughly 150 digits).

175 thoughts on “Does gpuccio’s 150-safe ‘thief’ example validate the 500-bits rule?”

  1. Joe Felsenstein: …the sequences below the threshold could have some nonzero function, but low enough that “a function below that threshold would be irrelevant in the system”, and not “naturally selectable”.

    He forgets about the godlike powers of natural selection, but at least he’s talking sequences.

  2. Entropy: Also, imagine losing the admiration of all of those idiots who don’t understand what’s going on, but get mesmerized by words like information, amino-acids, blast, NCBI, design, ubiquitin, etc.

    That’s why I signed up. I’m working on an IDiot’s Hymnal and I’ll be sure to work those in. Thanks!

  3. Mung: godlike powers of natural selection

    That would be total, helpless impotence. Natural selection does work, buddy. It can’t do anything and everything, but it’s good enough for the job.

  4. Entropy: Also, imagine losing the admiration of all of those idiots who don’t understand what’s going on, but get mesmerized by words like information, amino-acids, blast, NCBI, design, ubiquitin, etc.

    Yeah, note how every fucking post of Puccio is like cookie cutter IDiot BS: A wall of words with a bunch of pretty pictures describing some complex stuff, with a final dictum “and all that must have been design”… that’s how ’empirical’ science is done, don’t you know?

  5. Corneel,

    Corneel: I hear what you are saying, but I’d like to think that as long as there are some constructive comments as well, the net result will be a “good-ish” experience.

    dazz: Puccio is now calling DNA_Jock and Joe fools, while making a fool of himself with his retarded burden tennis and that crap about artificial selection in the Keefe & Szostak 2001 experiments. Obviously, even if it was artificial selection, the point is that the experiment shows his claims about rare functionality in sequence space wrong. If he was right, this purported ‘artificial selection’ would have failed to produce any results. He’s forced to assume that we have no reason to believe that there would be positive natural selection for better ATP synthase in the wild, so that artificial selection can make better ATP synthase but natural selection somehow can’t. Hilariously stupid.

    Entropy: The guy will reject anything contrary to any parts of his construct. If he allowed just for one mistake to be noted, he’d have to go back to the whiteboard, and he cannot afford it. So, going for something as stupid as “oh, but that was artificial selection” bothers him less than losing the whole construct he worked so hard to build.

    Also, imagine losing the admiration of all of those idiots who don’t understand what’s going on, but get mesmerized by words like information, amino-acids, blast, NCBI, design, ubiquitin, etc.

    The Skeptical Zone in a nutshell.

    Mung gives you two thumbs up.

  6. Mung: The belief appears to be that if it is artificially selectable it must be naturally selectable.

    For example, we would see all the various breeds of dogs even without artificial selection if only the natural environment were to mimic artificial selection.

    We’re talking about levels of function as they pertain to FI. Why would lower levels of function than the ones we currently observe be invisible to natural selection?

  7. dazz,

    We’re talking about levels of function as they pertain to FI. Why would lower levels of function than the ones we currently observe be invisible to natural selection?

    Why would they be visible to natural selection 🙂

  8. colewd:
    dazz,

    Why would they be visible to natural selection

    Consider Lenski’s LTE. When aerobic growth on citrate was achieved, did it start at a low level of function? or at a high level of function worth tons of FI?

    If low (which I would think would be the case), then clearly low levels of function are visible to natural selection as the experiment shows.
    If high, then a bunch of mutations get you tons of FI.

    Either way, Puccio is wrong

  9. dazz,

    If low (which I would think would be the case), then clearly low levels of function are visible to natural selection as the experiment shows.

    In the Lenski experiment the enzyme that could utilize citrate already existed.

  10. Mung: He forgets about the godlike powers of natural selection

    I think you mean the natural selection-like powers of God. xD

  11. colewd:
    dazz,

    In the Lenski experiment the enzyme that could utilize citrate already existed.

    I don’t know if that’s true, but it’s completely irrelevant. A new function arose that wasn’t previously present. Don’t make pathetic excuses Billy, I know that’s asking too much of you.

    From no function to aerobic growth on citrate fixed in the population(s). Low or high function Bill? Pick your poison

  12. Gpuccio@UD:

    (And, of course, imaginary cooption, which, even if it existed, would be random in relation to the function to be found, because a priori anything can be coopted for something, but there is no reason that this should favor a specific new function).

    Imaginary?

    Cooption (or exaptation if you will) happened in the Lenski Experiment. That’s how the citrate mutant evolved. The citrate transporter gene was coopted to transport citrate when it was duplicated downstream of a promoter active with oxygen present. That is a new molecular and functional association between two separate entities (the transporter gene, and the promoter region) that did not previously work together. Nothing imaginary about that. And yes it was random in relation to the function to be found, and yet it happened.

  13. dazz,

    I don’t know if that’s true, but it’s completely irrelevant.

    It is very relevant to the 500 bit rule.

  14. dazz: In the Lenski experiment the enzyme that could utilize citrate already existed.

    Any new protein that evolves will evolve from a DNA sequence that already exists, so your excuse here doesn’t work.

  15. OMagain,

    Normally you’d go on to explain why.

    If you follow gpuccio’s work you will see that he is looking at the appearance of new proteins or multi-protein complexes that are preserved over long periods of time. If Lenski had found a new enzyme with a unique sequence, that would be a legitimate challenge to gpuccio’s claims. Especially an enzyme with 400-plus amino acids such as the citrate enzyme. This would add credence to Joe’s claim of selectable steps from another sequence.

    We must remember that at the end of the day all these experiments are in bacteria, and when you move to eukaryotic cells and multicellular eukaryotes the discussion becomes more complex.

  16. Hahaha, so Billy asks for evidence that low levels of function are selectable, and his response to the evidence is to dismiss it for not being a “a unique sequence” that according to pucci would mean lots of FI (high level of function), no one could make this shit up… except creationists, of course

  17. dazz,

    Hahaha, so Billy asks for evidence that low levels of function are selectable, and his response to the evidence

    What evidence Dizzy 🙂

  18. colewd: If Lenski had found a new enzyme with a unique sequence that would be a legitimate challenge to gpuccio’s claims.

    In fact you have it the wrong way around. Lenski is the one that needs to be challenged, as he is the one who has published.

    gpuccio’s claims are spread out over hundreds of posts on dozens of OP’s at UD. If he ever wants to get serious he knows how to do it. Collecting it all up into a single place would be a start.

    colewd: This would add credence to Joe’s claim of selectable steps from another sequence.

    It’s well known that the space is promiscuous with regard to functionality, which is why most enzymes can be terrible at their job and still be “successful”. Search for “enzyme promiscuity”. Not that it’ll matter. Seems to add weight to the idea of selectable steps, if there are plenty of possible steps.

  19. OMagain,

    It’s well known that the space is promiscuous with regard to functionality, which is why most enzymes can be terrible at their job and still be “successful”. Search for “enzyme promiscuity”. Not that it’ll matter. Seems to add weight to the idea of selectable steps, if there are plenty of possible steps.

    Selectable steps are not the issue at hand. By definition selectable steps exist above the defined functional threshold. The question is where they will drift: toward higher function (where there are degrees of fitness), away from function toward non-function, or toward purifying selection. I suspect all are possible depending on the function selected.

  20. Rumraket: I think you mean the natural selection-like powers of God. xD

    Nah. Natural Selection weeds out the bad. God doesn’t. Thankfully.

  21. DNA_Jock: We go high.

    Indeed! It’s not that we lack evidence or the ability to put it into compelling arguments and refutations. Insults tend to distract and blur the focus.

  22. colewd:
    If you follow gpuccio’s work you will see that he is looking at the appearance of new proteins or multi protein complexes that are preserved over long periods of time.

    Sorry, but no. He’s looking at proteins that he thinks were new appearances (I checked, they were not new; I didn’t bring it up because I thought that was an honest mistake and it doesn’t really matter). But that doesn’t matter here Bill, the issue now is that thing about 500 bits.

    colewd:
    If Lenski had found a new enzyme with a unique sequence that would be a legitimate challenge to gpuccio’s claims.

    Sorry, but even small increases in FI would challenge gpuccio’s claims about those 500 bits. After all, how long had Lenski’s experiments been running, compared to the age of life on the planet? How many organisms, with selection happening all the time? So, say that Lenski’s citrate thing got 6 bits of new information. How much to get 500? 500/6 ≈ 83. So, 83 things the size of the E. coli experiment would easily get 500 bits.

    colewd:
    Especially an enzyme with 400 plus amino acids such as the citrate enzyme. This would add credence to Joe’s claim of selectable steps from another sequence.

    Joe is not merely claiming that there’s selectable steps from other sequences. This has been theoretically calculated, checked and followed in natural environments, witnessed in nature and in the clinic, experimentally confirmed, for example by checking how quickly new antibiotic resistance evolves in the lab, experimentally confirmed from reconstruction of ancestral sequences and checking their activities against different substrates, confirmed by sequence comparison, confirmed from calculation of synonymous and non-synonymous substitution rates, etc.

    ETA: Joe has shown calculations time and again in this forum. I suspect that you don’t accept those calculations because of two main reasons: 1. you don’t understand any of it, and 2. you cannot even imagine that gpuccio might be wrong, despite Joe being an actual expert in sequence analyses, and gpuccio making mistakes often as obvious as confusing random mutations with ordered changes in a sequence.

  23. Alan Fox: Indeed! It’s not that we lack evidence or the ability to put it into compelling arguments and refutations. Insults tend to distract and blur the focus.

    What topic is this?

    Are people allowed to respond (other than you and Jock I mean)?

  24. gpuccio has declared (repeatedly) that gpuccio’s “definition of FI is abso[l]utely the same as Szostak’s”. Yet when I discuss the case where fitness is the function under discussion, and changes in it can occur in many places in the genome, gpuccio agrees that this is one possible function that could be discussed, but then dismisses that as “not a complex function”.

    The last I heard, if you state a general rule, it should apply to all cases, and can be tested by any one of those cases. To be general, it must work in all of them.

    gpuccio seems to be taking Szostak’s example of a cone with a plane passing through it as setting a threshold, below which natural selection is ineffective.

    gpuccio’s views on this are amply on display here, here, and here.

    gpuccio’s reluctance to conclude that differences in fitness are “detectable” by natural selection is also on display there.

    My interpretation of Szostak’s definition is that it can be applied to any sequence, and the FI can be computed for that sequence, no matter how high or low the value of the function is. There is no step in Szostak’s (or Hazen’s) definition that corresponds to setting a threshold below which natural selection will be ineffective. Passing the plane through the cone (in Szostak’s visual analogy) can be done at any height, and is done to obtain a value for the sequences at that height.

    gpuccio’s denial that 500 bits of FI could accumulate in such a case boils down to a simple rejection of natural selection.

  25. Joe Felsenstein: gpuccio’s denial that 500 bits of FI could accumulate in such a case boils down to a simple rejection of natural selection.

    What does natural selection have to do with opening safes?

  26. Hey Mung…

    Joe Felsenstein: Passing the plane through the cone (in Szostak’s visual analogy) can be done at any height, and is done to obtain a value for the sequences at that height.

    wink, wink, nudge nudge 🙂

  27. Joe Felsenstein,

    My interpretation of Szostak’s definition is that it can be applied to any sequence, and the FI can be computed for that sequence, no matter how high or low the value of the function is. There is no step in Szostak’s (or Hazen’s) definition that corresponds to setting a threshold below which natural selection will be ineffective.

    If you set the threshold to the point where below it the protein cannot fold I would say that natural selection would not be ineffective by definition. In the same way if it cannot bind to its target proteins again natural selection would be ineffective.

  28. colewd: If you set the threshold to the point where below it the protein cannot fold I would say that natural selection would not be ineffective [sic] by definition. In the same way if it cannot bind to its target proteins again natural selection would be ineffective.

    Ignoring the double negative, such a threshold does not exist. Here’s why: ALL polypeptides can “fold”. Most of them will “fold” to be highly structurally polymorphic. Many of them will then aggregate. None of this makes them magically invisible to natural selection. For example: maybe, for a given primary sequence, 5% of the polypeptides fold such that their N-terminal half creates a biotin-binding site. As a result, the DNA sequence that codes for this polypeptide is selectable, despite the fact that it produces 95% polymorphic junk.

  29. DNA_Jock,

    None of this makes them magically invisible to natural selection.

    What makes them invisible is when they don’t function. They are invisible until they find function. The problem is when function is 150 bits smaller than the non-function where they currently reside.

  30. DNA_Jock: ALL polypeptides can “fold”.

    Including RNA. So what is in one case a folding problem in another case is an unfolding problem. 🙂

  31. Mung: Including RNA. So what is in one case a folding problem in another case is an unfolding problem.

    The folding problem is a scientific problem, namely, the problem of predicting the protein’s 3D structure from the sequence alone. The proteins, though, have no problem folding.

    In the case of RNA it’s a bit easier to predict the secondary and 3D structures. A bit. Again, RNA has no problem folding.

    If there’s an “official” unfolding problem for RNA, I don’t know.

    By the way, I recently learned that Google has been advancing quite a bit in solving the protein folding problem using machine learning, with promising results.

  32. colewd: What makes them invisible is when they don’t function. They are invisible until they find function. The problem is when function is 150 bits smaller than the non-function where they currently reside.

    They all have “function”. Everything does something.

  33. Mung: What does natural selection have to do with opening safes?

    Based on this description of the two scenarios, I understood Joe to be saying that the first was not at all like evolution, whereas the second captured something important about evolution.

    So, in the first scenario, the thief tries the 10^150 possible solutions, until he finds the right one.

    In the second scenario, he tries the ten possible solutions for the first safe, opens it, then passes to the second, and so on.

    That something important that is captured by the second case is that evolution can be cumulative; whereas a model like case 1 that just multiplies probabilities as if events were independent is a model that is wrong for evolution.

    And to answer your question about math before it is asked, try
    Markovian Models of Evolution, eg page 5.

    Note that my post assumes yours was a serious question about safes, not a setup for a semi-humorous equivocation such as: Human activities affect evolution in that they affect reproductive success since e.g. religion is correlated with use of condoms.

  34. BruceS: Based on this description of the two scenarios, I understood Joe to be saying that the first was not at alike evolution, whereas the second captured something important about evolution.

    I was agreeing with gpuccio, in that opening all the safes is not hard in the second case, but in the first case (one safe) it would take far too long to open the safe by guessing the combination. Combinations do not reward being partially correct. When fitness does, success becomes much more possible.
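    That difference can be sketched with a toy hill-climber: when partial matches are rewarded (each correct digit counts toward a “fitness” score), even a 150-digit combination is found quickly. The mutation scheme and acceptance rule below are illustrative assumptions, an illustration of cumulative selection rather than a model of gpuccio’s scenario, where no partial credit exists:

```python
import random

random.seed(2)
L, DIGITS = 150, "0123456789"
target = [random.choice(DIGITS) for _ in range(L)]

def score(seq):
    """Fitness: how many of the 150 positions match the target combination."""
    return sum(a == b for a, b in zip(seq, target))

# Start from a random guess; mutate one position at a time and keep the
# change whenever the fitness score does not drop.
current = [random.choice(DIGITS) for _ in range(L)]
steps = 0
while score(current) < L:
    steps += 1
    candidate = current[:]
    candidate[random.randrange(L)] = random.choice(DIGITS)
    if score(candidate) >= score(current):
        current = candidate

print(f"full 150-digit combination found after {steps} single-digit mutations")
```

    With partial credit the search finishes in thousands of mutations rather than the 10^150 trials needed when only the complete combination scores.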

  35. Mung: Nah. Natural Selection weeds out the bad. God doesn’t. Thankfully.

    Except for the first born of Egypt, everybody on earth except Noah’s family, and hundreds of other examples.

  36. Entropy: By the way, I recently learned that google has been advancing quite a bit in solving the protein folding problem using machine learning, with promising results.

    Ooh! Do tell!

  37. BruceS: That something important that is captured by the second case is that evolution can be cumulative; whereas a model like case 1 that just multiplies probabilities as if events were independent is a model that is wrong for evolution.

    But surely some events in evolution are independent.

  38. Entropy: Check it here.

    🙂

    Thanks for that.

    Now here is a tool (I assume it can’t yet do it reliably) that could answer the rarity question. First calibrate with known sequences with known functions. Feed these sequences to the algorithm to confirm its reliability and then we’re off. The possibilities are almost endless in that we could try variations of known proteins and check randomly generated novel proteins for tertiary structure without having to synthesise them, but with the opportunity to synthesise and test likely examples.

    This could settle the “islands of function” assertion for good and all.

  39. Looking in the thread where gpuccio was, to some extent, responding to points here, he seems to have given up on getting us Skeptics to see the light. Just about the last comment that addresses Joe and others here is this one from 4th December. A couple of remarks caught my eye.

    6) Biological proteins are complex machines. Most of them are well beyond the proposed threshold of 500 bits, whatever the method we use to evaluate indirectly their FI. That should however be obvious to anyone who is still capable of sound reasoning: how can anyone even imagine that a protein hundreds (or thousands) of AAs long, highly conserved, that folds in a very complex and specific way to generate extremely sophisticated molecular mechanisms, may be realized with less than 500 bits of information? When each specific AA site counts for 4.3 bits?

    I’m still not clear whether gpuccio is counting “all-at-once” rather than stepwise in many rounds of variation and selection.

    7) A single complex protein falsifies the neo-darwinian theory. But of course there is much more than that in the biological worlds. Protein superfamilies are about 2000. Functional non coding RNAs are the true protagonists of the last years of research. Regulatory networks exhibit FI beyond our conceptions. And maybe if our interlocutors took the time to read, say, the OP here, or anything else equivalent, without prejudice and dogma, they could start to consider questions that they have been avoiding in their whole life.

    Still seems the usual “ooh, how complex!” argument. Apparently, gpuccio considers this a badge of honour. He seems to think Paley’s Watch is irrefutable.

    Ah well.

  40. Whatever happened to the notion that any sequence has a significant chance of being “functional”?

  41. gpuccio (and Kirk) are convinced (despite all the evidence to the contrary) that functional landscapes are sufficiently “mesa-like” that they are justified in ignoring slopes altogether. This is why I love referring him to Hayashi et al. 2006, with its sloped foothills
    😮

    Pace Alan, I don’t think that computer folding algorithms will answer the rarity question. Szostak’s initial binders had 5 – 15% “yield”; that is, only 5 – 15% of any pure individual polypeptide was able to bind ATP, and even those may well have bound ATP via an induced-fit mechanism. For any given (crappy) primary sequence, you will generate an enormous mixture of differently folded members. If a few of them can be induced to fit a ligand of interest, then you have function.

  42. DNA_Jock: Pace Alan, I don’t think that computer folding algorithms will answer the rarity question.

    Predictions about the future are bound to fail. I probably didn’t make my point clearly enough. Given* an algorithmic program that can predict tertiary structure from a sequence, in principle we can test any sequence for folding consistency without having to synthesise it.

    I suppose we also need to be at the stage of being able to make predictions of functionality once having established the tertiary structure. Maybe we need an approach of thinking of a potential function, designing a tertiary structure that might fulfil that function, then using the algorithm to establish the sequence that produces the designed tertiary structure. Computer modelling rather than synthesising at random should speed things up considerably. It should also establish that sequence to function is many-to-one rather than one-to-one (or not, but I’m an optimist).

    *well let’s hope soon!

Leave a Reply