Does gpuccio’s 150-safe ‘thief’ example validate the 500-bits rule?

At Uncommon Descent, poster gpuccio has expressed interest in what I think of his example of a safecracker trying to open a safe with a 150-digit combination, or to open 150 safes, each with its own 1-digit combination. It’s actually a cute teaching example, which helps explain why natural selection cannot find a region of “function” in a sequence space in such a case. The implication is that there is some point of contention that I failed to address in my post, which led to the nearly 2,000-comment-long thread on his argument here at TSZ. He asks:

By the way, has Joe Felsenstein answered my argument about the thief?  Has he shown how complex functional information can increase gradually in a genome?

Gpuccio has repeated his call for me to comment on his ‘thief’ scenario a number of times, including here, and UD reader “jawa” has taken up the torch (here and here), asking whether I have yet answered the thief argument, at first dramatically asking

Does anybody else wonder why these professors ran away when the discussions got deep into real evidence territory?

Any thoughts?

and then supplying the “thoughts” definitively (here)

we all know why those distinguished professors ran away from the heat of a serious discussion with gpuccio, it’s obvious: lack of solid arguments.

I’ll re-explain gpuccio’s example below the fold, and then point out that I never contested gpuccio’s safe example, but I certainly do contest gpuccio’s method of showing that “Complex Functional Information” cannot be achieved by natural selection. gpuccio manages to do that by defining “complex functional information” differently from Szostak and Hazen’s definition of functional information, in a way that makes his rule true. But gpuccio never manages to show that, when Functional Information is defined as Hazen and Szostak defined it, 500 bits of it cannot be accumulated by natural selection.

The Thief Scenario

Here is gpuccio’s scenario, as most succinctly stated in comment #65 of his ‘Texas Sharpshooter’ post at UD (gpuccio gives links there to some earlier appearances of the scenario):

In essence, we compare two systems. One is made of one single object (a big safe), the other of 150 smaller safes.

The sum in the big safe is the same as the sums in the 150 smaller safes put together. That ensures that both systems, if solved, increase the fitness of the thief in the same measure.

Let’s say that our functional objects, in each system, are:

a) a single piece of card with the 150 figures of the key to the big safe

b) 150 pieces of card, each containing the one figure key to one of the small safes (correctly labeled, so that the thief can use them directly).

Now, if the thief owns the functional objects, he can easily get the sum, both in the big safe and in the small safes.

But our model is that the keys are not known to the thief, so we want to compute the probability of getting to them in the two different scenarios by a random search.

So, in the first scenario, the thief tries the 10^150 possible solutions, until he finds the right one.

In the second scenario, he tries the ten possible solutions for the first safe, opens it, then passes to the second, and so on.

and gpuccio’s challenge is:

Do you think that the two scenarios are equivalent?

What should the thief do, according to your views?

My answers of course are, to the first question, “No”. To the second question, “Cheat, or hope that in this particular case you have an instance of the second scenario”.

My reaction: This is a nice teaching example showing why, in the first scenario, there is no hope of guessing the correct key, even once in the history of the universe.

In the first scenario there is no path to getting the safe open by successively opening parts of it. In the second scenario one can open a safe in an average of 5.5 guesses (at most 10), and when you have done this 150 times you have all the contents of the safes.
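The arithmetic behind the two scenarios can be checked with a short Python sketch. It assumes, purely for illustration, that the thief never repeats a guess and that every untried key is equally likely; the function name `expected_guesses` is made up for this example.

```python
import math
from fractions import Fraction

def expected_guesses(num_keys: int) -> Fraction:
    """Mean number of tries needed to hit one key among num_keys equally
    likely candidates when no candidate is tried twice: (num_keys + 1) / 2."""
    return Fraction(num_keys + 1, 2)

DIGITS = 150

# Scenario 1: one safe whose key is a single 150-digit number.
big_safe = expected_guesses(10 ** DIGITS)

# Scenario 2: 150 safes opened one after another, one digit each.
small_safes = DIGITS * expected_guesses(10)

# Information content of a 150-digit key, expressed in bits.
key_bits = DIGITS * math.log2(10)

print(f"big safe:    about 10^{math.log10(float(big_safe)):.1f} guesses")
print(f"small safes: {float(small_safes):.0f} guesses")
print(f"150 digits = {key_bits:.1f} bits")
```

Under these assumptions the 150 small safes yield after 825 guesses on average, while the single big safe needs roughly 5 × 10^149. The last line shows why a 150-digit key corresponds to a little under 500 bits.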

My question

Why did gpuccio think that I objected to gpuccio’s logic for the case which has nonzero function only in a set of sequences which is 10^{-150} of all sequences?  I was very clear in acknowledging that in such a case, natural selection cannot find the sequences that have nonzero function, if we start from a random sequence.

Repeating his argument does not help his case, because my point is not that this part of his argument is wrong. And this was made very clear in my previous post on gpuccio’s argument. But let me repeat, for those who did not happen to read that post.

What gpuccio got wrong

Gpuccio had stated (here) that

… the idea is that if we observe any object that exhibits complex functional information (for example, more than 500 bits of functional information ) for an explicitly defined function (whatever it is) we can safely infer design.

Functional information was defined by Jack Szostak (2003) and by Hazen, Griffin, Carothers, and Szostak (2007). The definition assumes that we have a set of molecular sequences (for example, the coding sequences of a single protein, or its amino acid sequences), and that with each of them is associated a number, the function: for example, the ability to synthesize ATP if the protein is an ATP synthase.

In this original definition of FI, there is no assumption that almost all of the sequences have function zero.  Given that, there is no way to rule out the possibility that there are paths from many parts of the sequence space that lead to higher and higher function. And given that, we cannot eliminate the possibility that natural selection and mutation could follow such a path to arrive at the function that we observe in nature.
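To make the definition concrete, here is a minimal Python sketch. It follows the Szostak/Hazen formula, in which the functional information at a threshold level of function is -log2 of the fraction of all sequences whose function is at least that threshold. The toy landscape (8-letter sequences scored by matches to an arbitrary target) and all names in the code are illustrative assumptions, not anything taken from those papers.

```python
import itertools
import math

ALPHABET = "ACGT"
LENGTH = 8
TARGET = "ACGTACGT"  # arbitrary, hypothetical optimum

def function_level(seq) -> int:
    """Toy 'function': number of positions that match TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))

# Function value of every sequence in this small sequence space (4^8 of them).
LEVELS = [function_level(s) for s in itertools.product(ALPHABET, repeat=LENGTH)]

def functional_information(threshold: float) -> float:
    """FI in bits: -log2(fraction of sequences with function >= threshold)."""
    count = sum(level >= threshold for level in LEVELS)
    return math.log2(len(LEVELS) / count)

for threshold in range(LENGTH + 1):
    print(threshold, round(functional_information(threshold), 2))
```

In this toy landscape FI rises gradually from 0 bits at threshold 0 to 16 bits at threshold 8, and most sequences have some nonzero function, which is exactly the kind of situation the original definition allows.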

How gpuccio eliminates natural selection

Simply by replacing Functional Information with gpuccio’s own Complex Functional Information, and only calling it that when the amount of function is zero for all sequences outside of the target set. That makes gpuccio’s definition of Complex Functional Information very different from Szostak’s and Hazen’s. Gpuccio’s restricted definition rules out all cases where there might be a path among sequences leading to the target sequences, a path along which function rises continually.
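The difference between the two definitions can be illustrated with a toy simulation. The Python sketch below is not gpuccio’s model, nor anything from the FI papers. It is a hypothetical hill-climber (one random digit change per step, kept unless function drops) run on two made-up landscapes over 20-digit keys: a graded one, where function counts the correct digits, and an all-or-nothing one, where function is zero everywhere outside the target.

```python
import random

LENGTH = 20         # digits per key (kept small so the demo runs fast)
MAX_STEPS = 20_000  # mutation attempts allowed per run

def graded(key, target):
    """Function rises with every correct digit: a path exists."""
    return sum(k == t for k, t in zip(key, target))

def all_or_nothing(key, target):
    """Function is zero unless the whole key is right: no path."""
    return 1 if key == target else 0

def climb(function, rng):
    """Return the step at which the target is reached, or None if never."""
    target = [rng.randrange(10) for _ in range(LENGTH)]
    key = [rng.randrange(10) for _ in range(LENGTH)]
    for step in range(1, MAX_STEPS + 1):
        mutant = list(key)
        mutant[rng.randrange(LENGTH)] = rng.randrange(10)
        # Selection: keep the mutant unless it lowers function.
        if function(mutant, target) >= function(key, target):
            key = mutant
        if key == target:
            return step
    return None

graded_result = climb(graded, random.Random(1))
needle_result = climb(all_or_nothing, random.Random(1))
print("graded landscape:        ", graded_result)
print("all-or-nothing landscape:", needle_result)
```

With graded function the climber should reach the target well within the step limit (the expected number of steps is in the hundreds). With function zero outside the target, selection has nothing to act on and the run ends without success. Only the second landscape resembles the 150-digit safe.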

This was explained clearly many times in my post and in the 2,000-comment thread that it generated. Apparently all that gpuccio and jawa can do is repeat their argument, one which does not show that 500 bits of Functional Information, defined as Szostak and Hazen define it, cannot be achieved by normal evolutionary processes such as natural selection.

ETA: corrected “150 bits” to “500 bits” (which is roughly 150 digits).

175 thoughts on “Does gpuccio’s 150-safe ‘thief’ example validate the 500-bits rule?”

  1. Joe Felsenstein: Let’s look at Jack Szostak’s 2003 paper that first defined FI.

    I think there are 2 problems with this definition. The first is the arbitrary cut off for activity or nonactivity. I’m sure there are enzyme ‘knockouts’ that have residual activity but it’s not enough to be useful to the cell. In another context it might be enough for survival and in the earliest life very poor catalysts might have been fine since the competition didn’t have anything better. Shouldn’t any and all catalytic ability count as function? If that’s the case functional space just got a lot bigger.

    The second problem concerns the notion of function independent of context. Proteins exert their function by binding specifically to things – substrates, small molecules, other proteins etc. In a sense every random polypeptide is functional in that there is in principle something in the universe it can bind to. Should a polypeptide be considered functional only if that other entity exists in the same compartment in the same cell? The vast majority of the 5 million or so antibodies in our lymph nodes at any one time will never bind to anything. Does that mean none of them are functional?

  2. RodW,

    RodW: In another context it might be enough for survival and in the earliest life very poor catalysts might have been fine since the competition didn’t have anything better.

    Exactly. The first niche had to be devoid of competition.

    ETA other than intra-species, of course.

  3. RodW: In a sense every random polypeptide is functional in that there is in principle something in the universe it can bind to.

    Exactly^2

    You don’t know what function might exist till you test it, everything against everything!

  4. Joe Felsenstein,

    In conclusion, I see nothing in these papers that requires that there be some designated lower limit above zero for the function, which simply is any nonnegative number associated with the sequence. The sequence could be a haploid genome, and the function could be fitness itself.

    It may not require it but is this realistic?

  5. colewd: It may not require it but is this realistic?

    For the very earliest living organisms it is. See RodW’s comment just above.

  6. In a more recent comment gpuccio finally acknowledges that it is possible to choose fitness itself as the “function”.

    gpuccio then goes on to insist that this must be detectable fitness and cites posts where gpuccio argues that most changes won’t be detectable. (I’ll note that I am more aware than gpuccio of what population genetics says about what levels of change in fitness are detectable by natural selection).

    I said that the changes could be anywhere in the genome. OK, that is not quite true — I overstated. Not in the junk DNA that makes up much of our genomes (in spite of what most creationists and ID advocates argue). But, as others have already noted here, the changes could be in any region of the genome that is even moderately conserved, such as any coding sequence or the other regulatory sequences.

    The point is that when the “function” is fitness itself, a particularly relevant function, there are many places in the genome where changes in sequence can affect that. There is no definitive requirement for the changes to all be concentrated in one gene. I’m glad that gpuccio has come around on this point.

  7. colewd:

    Joe Felsenstein,

    In conclusion, I see nothing in these papers that requires that there be some designated lower limit above zero for the function, which simply is any nonnegative number associated with the sequence. The sequence could be a haploid genome, and the function could be fitness itself.

    It may not require it but is this realistic?

    Very. Genomes exist, and they have fitnesses. We get to discuss the genomes and their fitnesses if we want to. You can’t stop me (cackles maniacally).

  8. RodW: Though gpuccio won’t seem to address this I think I can guess what he’d say if he were pressed on the issue. I think he would appeal to the work of Doug Axe and others to suggest that sequence space is mostly empty of function, at least empty enough that no continuous path to function exists. He and others might also appeal to the numerous anecdotes we all hear as undergrads of innumerable single aa substitutions knocking out function of a protein.
    This argument would also fail but it would turn on very detailed knowledge of protein structure/function/evolution and would require an expert with time on their hands. Still, I’m surprised gpuccio hasn’t made it- at least it would move the discussion forward.

    Gpuccio is lazier than you imagine.
    His argument is that it is his empirical observation that such pathways (from low activity to the observed super-duper activity) do not exist.
    This is a burden shift, in that — for his technique to be valid — such pathways must be impossible.
    Finally, he protects himself against data that prove him wrong (Keefe & Szostak 2001, Hayashi et al 2006, etc.) by insisting that these studies were designed and therefore demonstrate the power of, I kid you not, “intelligent selection”. Nothing to do with what natural selection might be capable of. No siree.

  9. RodW [discussing Szostak’s definition of FI – JF]: I think there are 2 problems with this definition. The first is the arbitrary cut off for activity or nonactivity …. Shouldn’t any and all catalytic ability count as function? If that’s the case functional space just got a lot bigger.

    Szostak is discussing all possible cutoff levels you might use. If we are discussing one particular sequence, we set the cutoff level at its level of function to compute its FI.

    The second problem concerns the notion of function independent of context. Proteins exert their function by binding specifically to things – substrates, small molecules, other proteins etc. In a sense every random polypeptide is functional in that there is in principle something in the universe it can bind to. Should a polypeptide be considered functional only if that other entity exists in the same compartment in the same cell? The vast majority of the 5 million or so antibodies in our lymph nodes at any one time will never bind to anything. Does that mean none of them are functional?

    You are bringing realism in, please stop. 😉

    This is an oversimplified model, of course. The issue is whether gpuccio has an argument that, in that framework, shows that 500 bits of FI is a reliable indicator of Design. Which it isn’t.

    If gpuccio could show this generality to be true in the model, that would be worthy of note. But gpuccio has not done so.

  10. Joe Felsenstein,

    Very. Genomes exist, and they have fitnesses. We get to discuss the genomes and their fitnesses if we want to. You can’t stop me (cackles maniacally).

    So your starting point is a living functioning genome?

  11. DNA_Jock,

    This is a burden shift, in that — for his technique to be valid — such pathways must be impossible.

    If your argument is to force your opponent to prove a negative we are all wasting our time.

  12. Joe Felsenstein,

    If gpuccio could show this generality to be true in the model, that would be worthy of note.

    Thanks for returning this discussion to rational thought 🙂

  13. I agree that it is perfectly valid to treat the whole genome as the sequence space, and organism fitness as the function. I have been walking through the changes in FI as a gene gets duplicated, then one copy acquires a point mutation that (say) changes the enzyme’s pH optimum.
    The results are really goofy.
    Oh.
    I’ll get me coat.

  14. Joe Felsenstein,

    Given that, there is no way to rule out the possibility that there are paths from many parts of the sequence space that lead to higher and higher function.

    You have softened this prove a negative position in a recent comment. Gpuccio has explained since your post why natural selection is unlikely to help navigate through sequence space in certain protein configurations. You have not addressed his argument.

  15. colewd:
    Joe Felsenstein,

    I don’t see anything that challenges his claim and he has made comments since then.

    I’m sorry to hear you can’t see that. I was asking if he had any way of showing that 500 bits of (Szostak-Hazen) FI implies Design, as his rule claims. He makes additional assumptions beyond theirs, so the answer is no.

    I haven’t seen anything said by gpuccio since then that would invalidate my argument.

  16. colewd: You have softened this prove a negative position in a recent comment. Gpuccio has explained since your post why natural selection is unlikely to help navigate through sequence space in certain protein configurations. You have not addressed his argument.

    Negatives can be proven. For example we can prove that you cannot trisect an arbitrary angle using only a compass and an unmarked straightedge. gpuccio argued that observation of 500 bits of Szostak/Hazen FI proved a negative, that it could not be achieved by natural selection and mutation, at least not in the time available in our universe.

    Unlikeliness “in certain protein configurations” is not the issue. A general proof of impossibility is the issue.

  17. Joe Felsenstein

    Yes, I have done so. In the OP.

    colewd:I don’t see anything that challenges his claim and he has made comments since then.

    You must have missed it:

    “…Apparently all that gpuccio and jawa can do is repeat their argument, one which does not show that 500 bits of Functional Information, defined as Szostak and Hazen define it, cannot be achieved by normal evolutionary processes such as natural selection…”

    Joe Felsenstein appealed to the neverending story of the unlimited and omnipotent natural selection…

    That’s how 1+1 = 3 because 1+1+ natural selection =3
    gpuccio is apparently aware of Joe’s claim:

    “What are the limits of Natural Selection? ”

    Mechanosensing and Mechanotransduction: how cells touch their world.

  18. Joe Felsenstein,

    Unlikeliness “in certain protein configurations” is not the issue. A general proof of impossibility is the issue.

    Gpuccio is inferring design not proving design. The only one working on a proof is Eric MH.

  19. Joe Felsenstein,

    Not necessarily. Are you starting from the assumption of a tardigrade genome?

    If you start from bacteria then the information jumps are enormous first to single celled eukaryotic cells and then to multicellular life. In each of these cases observable selectable steps are very large.

  20. Puccio on how he sets the threshold for his FI numerology:

    gpuccio: In particular, as I have said many times, and I hope you will notice sooner or later, when discussing the neo-darwinian algorithm the function must be at least as strong as to be naturally selectable.

    Since he uses extant highly conserved sequences to set the threshold, and he proclaims anything below that would be invisible to natural selection, he’s essentially right by definition. Creotard question begging at its best

  21. Gpuccio’s argument is little more than the coin-flip probability argument. Arriving at a pre-defined goal (opening a safe with random spins of the dial) is so improbable as to be almost impossible. But that assumes that there is a specific goal in mind, which there isn’t.

    It’s like arguing that a male and a female getting together and producing a child with a predefined DNA sequence for the entire genome is highly improbable. But that argument assumes that there is only one functional DNA sequence.

    He also plays the search space/islands of function card. What he neglects to understand is that a better analogy for these so called “islands of function” would be “wave-crests of function” within a storm tossed seascape.

  22. colewd:
    DNA_Jock,

    If your argument is to force your opponent to prove a negative we are all wasting our time.

    What you keep missing is that Puccio’s argument relies on such impossibility being true, because it’s nothing more than a negative argument. IOW, your typical creationist bullshit

    The onus is on him, and even if he could show it’s impossible, we would not be justified in inferring “design”, just that natural selection fails. This is basic reasoning that almost invariably escapes the religiously inclined

  23. Joe Felsenstein: The point is that when the “function” is fitness itself, a particularly relevant function, there are many places in the genome where changes in sequence can affect that.

    So how many possible sequences are we talking about? 4^3,000,000,000?

  24. Mung: So how many possible sequences are we talking about? 4^3,000,000,000?

    I don’t see how that is important. We might have, say, 100,000,000 places in the human genome where changes could affect fitness, but we won’t ever have all possible combinations of those variations around at the same time. My point, originally, was that most changes in those places won’t interact with each other strongly and act like a safe with a 200,000,000-bit key.

  25. gpuccio, here, now says that in the cases he considers, the sequences below the threshold could have some nonzero function, but low enough that “a function below that threshold would be irrelevant in the system”, and not “naturally selectable”.

    How low is that? For example, if we have a population of 1,000,000 individuals, which is not unreasonably big for lots of organisms, how small can a selection coefficient be and still have the allele be “naturally selectable”?
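One standard way to put numbers on this question is Kimura’s diffusion approximation for the fixation probability of a new mutation. The Python sketch below assumes a diploid population whose effective size equals its census size N, additive selection with heterozygote advantage s ≥ 0, and a single initial copy; the function name is hypothetical. Under those assumptions the probability is (1 - e^(-2s)) / (1 - e^(-4Ns)), compared with 1/(2N) for a neutral allele.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for one new copy in a diploid population
    of size n (effective size assumed equal to n), for s >= 0 only."""
    if s < 0:
        raise ValueError("this sketch only handles s >= 0")
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    # expm1 keeps precision when s is tiny.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

N = 1_000_000
neutral = fixation_probability(0.0, N)
for s in (1e-8, 1e-7, 1e-6, 1e-5, 1e-3):
    ratio = fixation_probability(s, N) / neutral
    print(f"s = {s:.0e}: {ratio:10.2f} times the neutral probability")
```

On these assumptions selection starts to outweigh drift once 4Ns is around 1 or larger, which for N = 1,000,000 means selection coefficients of a few times 10^-7 and up. Much smaller coefficients behave almost neutrally.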

  26. Alan Fox,

    How do you know this?

    Bacteria- DNA mostly protein coding
    Yeast/Fungus- Introns, Exons, Chromosomes, cell nucleus
    Multicellular- Every thing in yeast fungus plus the ubiquitin system

    These are the observed facts. The information leaps are millions of bits for each step.

  27. Joe Felsenstein,

    I don’t see how that is important.

    You need the right order to build a multicellular vertebrate animal. None of us know how many of the 2 to 3 billion nucleotides are required. From a human perspective 3 billion bits to go from a zygote to a human is very efficient software 🙂

  28. colewd: From a human perspective 3 billion bits to go from a zygote to a human is very efficient software

    From my perspective, pointless stupid analogies are a very inefficient way of arguing

  29. colewd’s argument about jumps in complexity may be relevant to some argument, but certainly not to the issue of whether gpuccio can show that 500 bits of Szostak/Hazen FI implies Design.

    For that, it is way OT.

  30. Joe Felsenstein: gpuccio, here, now says that in the cases he considers, the sequences below the threshold could have some nonzero function, but low enough that “a function below that threshold would be irrelevant in the system”, and not “naturally selectable”.

    Yeah, but how would he know that? That’s just an assertion. Does Gpuccio know for a fact that the fitness landscape of any protein is such that the hill occupied by the extant function could not be reached from some other place in that landscape with a different function, and that the organisms that carry that sequence didn’t in fact evolve it from some other functional sequence?

    And how does he know how broad the base of the hill is, and what the selection coefficients for sequences around the base and lower parts of the slopes look like?

    Unless he’s practically omniscient, how would he know these things? The number of sequences he would have to test experimentally, and the number of different genetic contexts in which he would have to test them, and the number of environments in which the host organism could live in is practically incalculable.

    What looks to us like a nonfunctional sequence in the context of the particular genome of the host organism and its extant environment, could have been a functional sequence 145 million years ago where other genes in that same organism’s genome were also different and therefore had different epistatic interactions with the pertinent sequence, while also interacting with factors in the environment that were also different back then.

    It is just not possible to look at something in an organism today and claim to know it could not have evolved. The 500 bits rule is bunk, it is a question begging assertion that could not even be validated experimentally in 20 times the known age of the universe. The number of sequence, genetic and environmental combinations that would have to be tested is an absurdity, and no amount of pseudo-mathematical jargon is going to be able to substantiate it.

  31. Well, gpuccio doesn’t think that natural selection can do much of anything, so once you have the 500 bits, you don’t need to worry about whether NS or NS+M could get you there, as it can’t get anything much of anywhere, in gpuccio’s opinion.

  32. colewd: Multicellular- Every thing in yeast fungus plus the ubiquitin system

    These are the observed facts.

    You can do your own alcoholic fermentation???

    You must be a happy man.

  33. So gpuccio dismisses the whole genome scenario for not having anything to do with the 500 bit threshold rule. Why? because, according to him, it involves a ‘rather simple’ function. That sounds a bit ambiguous, doesn’t it?
    But how is the function of a whole bacteria much simpler than any of its proteins? Is he making shit up as he goes along? certainly looks like he is

  34. gpuccio@UD

    I noticed that I belong to the select group of commenters that warrant serious attention. That is flattering indeed. Can I repeat Alan’ s invitation to continue the discussion here at TSZ? Other commenters are welcome too of course (is Origenes still there?).

    Some quick remarks:

    The protein not only was not naturally selectable, it was indeed deleterious.

    This is funny. If it was deleterious then the effect was naturally selectable, by definition. It just happened to have a negative fitness effect, not a positive one.

    a) I am just as aware that function is a continuous variable as Szostak and Corneel are.

    The 150-digit safe example does not communicate that 😉
    No, you are definitely trying to interpret continuous variables as dichotomous ones, as you show very clearly with the “no selectable effect” rhetoric.

    Do you believe in cooption? OK, just show how it would work in this specific case. Suggest something real, something verifiable, not just a vague and bad idea like cooption. Answer simple questions, like: what is coopted, and why?

    We discussed some nice examples in the previous thread, for example this one contributed by Rumraket here: evolution of a novel lactate dehydrogenase in Apicomplexa. The main beef I have with your take on examples like these is that you only count the FI of the AA that were mutated, but fail to see that the entire duplicated sequence counts towards the novel biological function.

    But I sense that Joe wants to keep this thread focused on your 500-bit limit rule, so you should focus on this: given the fact that Hazen-Szostak have defined FI for continuous traits with varying degrees of function and given the fact that you acknowledge that natural selection can increase the FI for such traits, how does the observation of 500 bits of functional information establish that a certain configuration cannot have been achieved by natural processes? It doesn’t, right? You need to first prove that these situations never arise in biological systems, right?

  35. Corneel: If it was deleterious then the effect was naturally selectable, by definition.

    The definition of deleterious is “naturally selectable”?

    Why would Gpuccio want to post here? So posters could tell him to fuck off, as long as they also said something else?

  36. phoodoo: The definition of deleterious is “naturally selectable”?

    Yep, in genetics it is: “A harmful, or deleterious, mutation decreases the fitness of the organism.”

    phoodoo: Why would Gpuccio want to post here? So posters could tell him to fuck off, as long as they also said something else?

    I hear what you are saying, but I’d like to think that as long as there are some constructive comments as well, the net result will be a “good-ish” experience.

  37. Puccio is now calling DNA_Jock and Joe fools, while making a fool of himself with his retarded burden tennis and that crap about artificial selection in the Keefe & Szostak 2001 experiments. Obviously, even if it was artificial selection, the point is that the experiment shows his claims about rare functionality in sequence space wrong. If he was right, this purported ‘artificial selection’ would have failed to produce any results. He’s forced to assume that we have no reason to believe that there would be positive natural selection for better ATP synthase in the wild, so that artificial selection can make better ATP synthase but natural selection somehow can’t. Hilariously stupid.

  38. dazz,

    The guy will reject anything contrary to any parts of his construct. If he allowed just for one mistake to be noted, he’d have to go back to the whiteboard, and he cannot afford it. So, going for something as stupid as “oh, but that was artificial selection” bothers him less than losing the whole construct he worked so hard to build.

    Also, imagine losing the admiration of all of those idiots who don’t understand what’s going on, but get mesmerized by words like information, amino-acids, blast, NCBI, design, ubiquitin, etc.

  39. And of course, if Puccio didn’t have a double standard so typical of creotard charlatans, he would realise that if we must show in extreme detail that there’s a gradual path in sequence space to get those complex systems, since he claims such paths don’t exist, he should provide equally convincing experimental evidence that these systems appeared fully formed. But no, he’s content with his IDiotic numerology. Heh

  40. Joe Felsenstein: I don’t see how that is important.

    If the number of sequences is not important then why did you say that the number of sequences is important?

    Take, for example, this post:

    Joe Felsenstein: The illustration is a cone, with a plane passing through it partway up. The fraction being discussed is the fraction of all sequences above the plane.

    Your version of FI seems to have little to nothing to do with the Hazen/Szostak version.
