Gpuccio’s Theory of Intelligent Design

Gpuccio has made a series of comments at Uncommon Descent and I thought they could form the basis of an opening post. The comments following were copied and pasted from Gpuccio’s comments starting here


To onlooker and to all those who have followed this discussion:

I will try to express again the procedure to evaluate dFSCI and infer design, referring specifically to Lizzie’s “experiment”. I will try also to clarify, while I do that, some side aspects that are probably not obvious to all.

Moreover, I will do that a step at a time, in as many posts as necessary.

So, let’s start with Lizzie’s “experiment”:

Creating CSI with NS
Posted on March 14, 2012 by Elizabeth
Imagine a coin-tossing game. On each turn, players toss a fair coin 500 times. As they do so, they record all runs of heads, so that if they toss H T T H H H T H T T H H H H T T T, they will record: 1, 3, 1, 4, representing the number of heads in each run.

At the end of each round, each player computes the product of their runs-of-heads. The person with the highest product wins.

In addition, there is a House jackpot. Any person whose product exceeds 10^60 wins the House jackpot.

There are 2^500 possible runs of coin-tosses. However, I’m not sure exactly how many of that vast number of possible series would give a product exceeding 10^60. However, if some bright mathematician can work it out for me, we can work out whether a series whose product exceeds 10^60 has CSI. My ballpark estimate says it has.

That means, clearly, that if we randomly generate many series of 500 coin-tosses, it is exceedingly unlikely, in the history of the universe, that we will get a product that exceeds 10^60.

However, starting with a randomly generated population of, say 100 series, I propose to subject them to random point mutations and natural selection, whereby I will cull the 50 series with the lowest products, and produce “offspring”, with random point mutations from each of the survivors, and repeat this over many generations.

I’ve already reliably got to products exceeding 10^58, but it’s possible that I may have got stuck in a local maximum.

However, before I go further: would an ID proponent like to tell me whether, if I succeed in hitting the jackpot, I have satisfactorily refuted Dembski’s case? And would a mathematician like to check the jackpot?

I’ve done it in MatLab, and will post the script below. Sorry I don’t speak anything more geek-friendly than MatLab (well, a little Java, but MatLab is way easier for this)
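Lizzie’s MatLab script is not reproduced in this post. As a rough stand-in, here is a minimal Python sketch of the game as she describes it. The function names, the choice of exactly one mutated offspring per survivor, and keeping the parents alongside their offspring are all assumptions made for illustration, not details taken from her code.

```python
import random

def runs_of_heads(tosses):
    """Return the lengths of consecutive runs of heads (1 = heads, 0 = tails)."""
    runs, current = [], 0
    for t in tosses:
        if t:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def product_of_runs(tosses):
    """Score a series: the product of its runs-of-heads lengths."""
    product = 1
    for r in runs_of_heads(tosses):
        product *= r
    return product

def mutate(series, rng):
    """Copy a series and flip one randomly chosen toss (a point mutation)."""
    child = list(series)
    child[rng.randrange(len(child))] ^= 1
    return child

def evolve(generations=200, pop_size=100, length=500, seed=0):
    """Cull the lower-scoring half each generation (assumed: one offspring per survivor)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=product_of_runs, reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(s, rng) for s in survivors]
    return max(product_of_runs(s) for s in pop)

# The worked example from the post: H T T H H H T H T T H H H H T T T
example = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(runs_of_heads(example))    # [1, 3, 1, 4]
print(product_of_runs(example))  # 12
```

Python integers have arbitrary precision, so comparing a product against 10**60 needs no special handling. Whether a run of this sketch reaches the jackpot depends on the number of generations and the seed.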


Now, some premises:

a) dFSI is a very clear concept, but it can be expressed in two different ways: as a numeric value (the ratio between target space and search space, expressed in bits a la Shannon), which we will call simply dFSI; or as a categorical value (present or absent), derived by comparing the value obtained that way with some predefined threshold, which we will call simply dFSCI. I will be especially careful to use the correct acronyms in the following discussion, to avoid confusion.

b) To be able to discuss Lizzie’s example, let’s suppose that we know the ratio of the target space to the search space in this case, and let’s say that the ratio is 2^-180, and therefore the functional complexity for the string as it is would be 180 bits.

c) Let’s say that an algorithm exists that can compute a string whose product exceeds 10^60 in a reasonable time.

If these premises are clear, we can go on.

Now, a very important point. To go on with a realistic process of design inference based on the concept of functionally specified information, we need a few things clearly defined in any particular example:

1) The System

This is very important. We must clearly define the system for which we are making the evaluation. There are different kinds of systems. The whole universe. Our planet. A lab flask. They are different, and we must tailor our reasoning to the system we are considering.

For Lizzie’s experiment, I propose to define the system as a computer or informational system of any kind that can produce random 500-bit strings at a certain rate. For the experiment to be valid to test a design inference, some further properties are needed:

1a) The starting system must be completely “blind” to the specific experiment we will make. IOWs, we must be sure that no added information is present in the system in relation to the specific experiment. That is easily realized by having the system assembled by someone who does not know what kind of experiment we are going to make. IOWs, the programmer of the informational system just needs to know that we need random 500-bit strings, but he must be completely blind to why we need them. So, we are sure that the system generates truly random outputs.

1b) Obviously, an operator must be able to interact with the system, and must be able to do two different things:

– To input his personal solution, derived from his personal intelligent computations, so that it appears to us observers exactly like any other string randomly generated by the system.

– To input in the system any string that works as an executable program, whose existence will not be known to us observers.


2) The Time Span:

That is very important too. There are different Time Spans in different contexts. The whole life of the universe. The life of our planet. The years in Lenski’s experiment.

I will define the Time Span very simply, as the time from Time 0, which is when the System comes into existence, to Time X, which is the time at which we observe for the first time the candidate designed object.

For Lizzie’s experiment, it is the time from Time 0 when the specific informational system is assembled, or started, to time X, when it outputs a valid solution. Let’s say, for instance, that it is 10 days.


3) The specified function

That is easy. It can be any function objectively defined, and objectively assessable in a digital string. For Lizzie’s experiment, the specified function will be:

Any string of 500 bits where the product calculated as described exceeds 10^60


4) The target space / search space ratio, expressed in bits a la Shannon. Here, the search space is 500 bits. I have no idea how big the target space is, and apparently neither does Elizabeth. But we both have faith that a good mathematician can compute it. In the meantime, I am assuming, just for discussion, that the target space is 320 bits big, so that the ratio is 180 bits, as proposed in the premises.

Be careful: this is not yet the final dFSI for the observed string, but it is a first evaluation of its higher threshold. Indeed, a purely random System can generate such a specified string with a probability of 1:2^180. Other considerations can certainly lower that value, but not increase it. IOWs, a string with that specification cannot have more than 180 bits of functional complexity.
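The arithmetic behind that 180-bit figure can be sketched in a few lines of Python. The function name is invented for illustration, and the 320-bit target space is the assumed value from the text, not a computed one.

```python
import math

def dfsi_bits(target_space_size, search_space_size):
    """Functional information in bits: -log2(target space / search space)."""
    return math.log2(search_space_size) - math.log2(target_space_size)

# Assumed sizes from the text: a 500-bit search space and a 320-bit target space.
print(dfsi_bits(2**320, 2**500))  # 180.0
```

A smaller target space gives a higher value, so any correction to the assumed 320 bits moves the dFSI in the opposite direction.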


5) The Observed Object, candidate for a design inference

We must observe, in the System, an Object at time X that was not present, at least in its present arrangement, at time 0.

The Observed Object must comply with the Specified Function. In our experiment, it will be a string with the defined property, that is outputted by the System at time X.

Therefore, we have already assessed that the Observed Object is specified for the function we defined.


6) The Appropriate Threshold

That is necessary to transform our numeric measure of dFSI into a categorical value (present / absent) of dFSCI.

In what sense does the threshold have to be “appropriate”? That will be clear if we consider the purpose of dFSCI, which is to reject the null hypothesis of a random generation of the Observed Object in the System.

As a preliminary, we have to evaluate the Probabilistic Resources of the system, which can be easily defined as the number of random states generated by the System in the Time Span. So, if our System generates 10^20 random strings per day, in 10 days it will generate 10^21 random strings, that is about 70 bits.

The Threshold, to be appropriate, must be many orders of magnitude higher than the probabilistic resources of the System, so that the null hypothesis may be safely rejected. In this particular case, let’s go on with a threshold of 150 bits, certainly too big, just to be on the safe side.
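As a quick check of those numbers, here is a small Python sketch. The function names are hypothetical; the rate, time span, threshold and 180-bit dFSI are the values assumed in the text.

```python
import math

def probabilistic_resources_bits(states_per_day, days):
    """log2 of the number of random states generated in the Time Span."""
    return math.log2(states_per_day * days)

def expected_chance_hits(dfsi_bits, resources_bits):
    """Expected number of target hits from blind sampling alone."""
    return 2.0 ** (resources_bits - dfsi_bits)

resources = probabilistic_resources_bits(10**20, 10)
print(round(resources, 1))                           # 69.8
print(150 - resources > 0)                           # True: threshold exceeds resources
print(expected_chance_hits(180, resources) < 1e-30)  # True
```

The gap between the 150-bit threshold and the roughly 70 bits of resources is about 80 bits, which is the safety margin the text refers to.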

7) The evaluation of known deterministic explanations

That is where most people (on the other side, at TSZ) seem to become “confused”.

First of all, let’s clarify that we have the duty to evaluate any possible deterministic mechanism that is known or proposed.

As a first hypothesis, let’s consider the case in which the mechanism is part of the System, from the start. IOWs the mechanism must be in the System at time 0. If it comes into existence after that time because of the deterministic evolution of the system itself, then we can treat the whole process as a deterministic mechanism present in the System at time 0, and nothing changes.

I will treat separately the case where the mechanism appears in the system as a random result in the System itself.

Now, first of all, have we any reason here to think that a deterministic explanation of the Observed Object can exist? Yes, we have indeed, because the nature itself of the specified function is mathematical and algorithmic (the product of the sequences of heads must exceed 10^60). That is exactly the kind of result that can usually be obtained by a deterministic computation.

But, as we said, our System at time 0 was completely blind to the specific problem and definition posed by Lizzie. Therefore, we can be safely certain that the system in itself contains no special algorithm to compute that specific solution. Arguing that the solution could be generated by the basic laws of physics is not a valid alternative (I know, some darwinist at TSZ will probably argue exactly that, but out of respect for my intelligence I will not discuss that possibility).

So, we can more than reasonably exclude a deterministic explanation of that kind for our Observed Object in our System.

7) The evaluation of known deterministic explanations (part two)

But there is another possibility that we have the duty to evaluate. What if a very simple algorithm arose in the System by random variation? What if that very simple algorithm can output the correct solution deterministically?

That is a possibility, although a very unlikely one. So, let’s consider it.

First of all, let’s find some real algorithm that can compute a solution in reasonable time (let’s say less than the Time Span).

I don’t know if such an algorithm exists. In my premise c) at post #682 I assumed that it exists. Therefore, let’s imagine that we have the algorithm, and that we have done our best to ensure that it is the simplest algorithm that can do the job (it is not important to prove that mathematically: it’s enough that it is the best result of the work of all our mathematician friends or enemies; IOWs, the best empirically known algorithm at present).

Now we have the algorithm, and the algorithm must obviously be in the form of a string of bits that, if present in the System, will compute the solution. IOWs, it must be the string corresponding to an executable program appropriate for the System, and that does the job.

We can obviously compute the dFSI for that string. Why do we do that?

It’s simple. We have now two different scenarios where the Observed Object could have been generated by RV:

7a) The Observed Object was generated by the random variation in the System directly.

7b) The Observed Object was computed deterministically by the algorithm, which was generated by the random variation in the System.

We have no idea of which of the two is true, just as we have no idea if the string was designed. But we can compute probabilities.

So, we compute the dFSI of the algorithm string. Now there are two possibilities:

– The dFSI for the algorithm string is higher than the tentative dFSI we already computed for the solution string (higher than 180 bits). That is by far the most likely scenario, probably the only possible one. In this case, the tentative value of dFSI for the solution string, 180 bits, is also the final dFSI for it. As our threshold is 150 bits, we infer design for the string.

– The dFSI for the algorithm string is lower than the tentative dFSI we already computed for the solution string (lower than 180 bits). There are again two possibilities. If it is however higher than 150 bits, we infer design just the same. If it is lower than 150 bits, we state that it is not possible to infer design for the solution string.

Why? Because a purely random pathway exists (through the random generation of the algorithm) that leads deterministically to the generation of the solution string, and the total probability of the whole process is higher than the one our threshold allows (that is, its complexity is lower than 150 bits).
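The decision rule in step 7 can be summarised as a short Python sketch. Taking the final dFSI to be the smaller of the two values is one reading of the text rather than a formula stated in it, and the function name is invented for illustration.

```python
def infer_design(solution_dfsi, algorithm_dfsi, threshold):
    """Return (final_dfsi, design_inferred) for the Observed Object.

    algorithm_dfsi is None when no generating algorithm is known.
    Assumption: the final dFSI is the lower of the two complexities.
    """
    if algorithm_dfsi is None:
        final = solution_dfsi
    else:
        final = min(solution_dfsi, algorithm_dfsi)
    return final, final > threshold

print(infer_design(180, 400, 150))  # (180, True)
print(infer_design(180, 160, 150))  # (160, True)
print(infer_design(180, 120, 150))  # (120, False)
```

The three calls correspond to the three cases above: an algorithm more complex than the solution, one between the threshold and the solution, and one below the threshold.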


8) Final considerations

So, some simple answers to possible questions:

8a) Was the string designed?

A: We infer design for it, or we infer it not. In science, we never know the final truth.

8b) What if the operator inputted the string directly?

A: Then the string is designed by definition (a conscious intelligent being produced it). If we inferred design, our inference is a true positive. If we did not infer design, our inference is a false negative.

8c) What if the operator inputted the algorithm string, and not the solution string?

A: Nothing changes. The string is designed anyway, because it is the result of the input of a conscious intelligent operator, although an indirect input. Again, if we inferred design, our inference is a true positive. If we did not infer design, our inference is a false negative. IOWs, our inference is completely independent of how the designer designed the string (directly or indirectly).

8d) What if we do not realize that an algorithm exists, and the algorithm exists and is less complex than the string, and less complex than the threshold?

A: As already said, we would infer design, at least until we are made aware of the existence of such an algorithm. If the string really originated randomly through a random emergence of the algorithm, that would be a false positive.

But, for that to really happen, many things must become true, and not only “possible”:

a) We must not recognize the obvious algorithmic nature of that particular specified function.

b) An algorithm must really exist that computes the solution and that, when expressed as an executable program for the System, has a complexity lower than 150 bits.

I am absolutely confident that such a scenario can never be real, and so I believe that our empirical specificity of 100% will be always confirmed.

Anyways, the moment that anyone shows an algorithm with those properties, the design inference for that Object is falsified, and we have to assert that we cannot infer design for it. This new assertion can be either a false negative or a true negative, depending on whether the solution string was really designed (directly or indirectly) or not (randomly generated).

That’s all, for the moment.

AF adds “This was done in haste. Any comments regarding errors and omissions will be appreciated.”




263 thoughts on “Gpuccio’s Theory of Intelligent Design”

  1. Crocodiles managed to find their way to isolated islands in the Pacific. One should be careful with one’s choice of metaphor. In the duplicated gene scenario, the original unmodified gene provides the floatation while the copy drifts. And selection isn’t specific in its demand for new function. It doesn’t have targets.

  2. gpuccio: “4) The ability to positively select and expand each new string according to the calculated product if, and only if, the product is higher than 10^60.

    Let’s remember that, apparently, the probability to get a string whose product is higher than 10^60 is 1:2^394 (information kindly provided by olegt). “

    Why don’t you just replace Elizabeth’s whole simulation with a (2^394) sided die?


  3. those with higher fitness are positively selected and those with lower are negatively selected.

    Just to clarify this, as I can see it being misrepresented, more conventionally, these would be termed beneficial and deleterious.

    Positive and negative are frequently used in relation to the causal implementation of the fitness differential – ‘positive selection’ may be seen as a preservative cause of selection upon a particular allele, ‘negative’ as an eliminatory one. But they are really inseparable, the yin and yang of a fitness differential. You can’t do one without doing the other, because ultimately what counts is the effect on allele frequency – the proportions in the population. If I decide to keep the 60% of my record collection I like best, or throw away the 40% I like least, it amounts to the same thing – a ‘population’ enriched in the qualities preserved and/or removed by this ‘selective agent’ (who happens to be capable of making an active choice, but that’s by no means an essential qualification). Similarly, a bird missing moths due to camouflage would have exactly the same effect as one that can see them all clearly but just ‘prefers’ more brightly coloured ones.

    At any point in an evolutionary series, the terms ‘positive’ and ‘negative’, as I used them above, relate to the gradient caused by present differential fitness, not the causal reason there is a differential at all, which may involve either preservation of the ‘favoured’ fraction or elimination of the ‘disfavoured’, and amount to the same thing.

  4. gpuccio: Not only the Weasel phrase, but the whole text of Hamlet, if that text is already written in the algorithm. Is that new dFSCI? Obviously not.

    In an evolutionary algorithm, Hamlet would represent all the multidimensional mountains and valleys that make up the fitness landscape.  And yes, a map contains a lot of specified information. 

    gpuccio: It was easier to just print the output directly from the program.

    Yes, it would be, but it wouldn’t demonstrate the ability of the evolutionary algorithm to navigate the landscape. Nor does being able to navigate Hamlet demonstrate that the biological realm is such a navigable landscape. 


  5. 909 gpuccio September 30, 2012 at 12:36 am

    So, let’s go back to Lizzie’s algorithm. The important point now is: the algorithm can measure a product for each new string, but it must not be able to select for that product unless and until it is higher than 10^60. Indeed, that product level is our threshold for the new biochemical function to appear. And only if the new biochemical function appears, it can be positively selected and expanded by NS.

    When the FAIL is so strong, one facepalm is not enough.

    I hope this comment will stimulate gpuccio as some sort of negative selection.

  6. gpuccio: “I have clearly shown that Lizzie’s algorithm is an “implementation” of IS, and not a “model” of NS. That is the important point. “

    What you have shown is that you are attempting to redefine both the term “evolution” and the term “algorithm”.

    An algorithm need not have any knowledge of its parameters at all.

    Here is an algorithm to sum two numbers: A + B = C.

    There are no numerical values at all yet it is clear how the algorithm works and it is also clear that no predefined “intelligent” solution is “inside” the algorithm.

    Here’s Elizabeth’s algorithm embedded into a callable function that we invoke:

    Lizzie_MutateAndSelect( Population, Size, ptrFunction_Environment );

    Note that the “Lizzie” “algorithm” itself knows no specifics at all.

    It just processes elements of a population, whatever that population may be.

    We could also use the “Lizzie” algorithm to perform “Methinks it is like a weasel”, without changing the algorithm but simply changing the “environment”.
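    A hypothetical Python rendering of that call might look like the sketch below. The function and parameter names are invented, and passing the mutation operator in alongside the environment is an assumption made so the same loop can handle both bit strings and text.

```python
import random

def mutate_and_select(population, environment, mutate, rng):
    """One generation: keep the half scored highest by `environment`,
    then refill the population with one mutated copy of each survivor."""
    ranked = sorted(population, key=environment, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    return survivors + [mutate(s, rng) for s in survivors]

# One possible "environment": closeness to the Weasel phrase.
TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def weasel_environment(candidate):
    """Count the positions where the candidate matches the target phrase."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate_text(s, rng):
    """Replace one randomly chosen character with a random letter or space."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]

def run(generations=2000, pop_size=100, seed=0):
    """Evolve random strings under the Weasel environment; return the best one."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    for _ in range(generations):
        pop = mutate_and_select(pop, weasel_environment, mutate_text, rng)
    return max(pop, key=weasel_environment)
```

    Nothing inside mutate_and_select refers to the Weasel phrase or to coin tosses. Swapping in a different scoring function changes the "environment" without touching the loop.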




  7. Mung: Here’s a model: Mendel’s Accountant “Mendel’s Accountant (MENDEL) is an advanced numerical simulation program for modeling genetic change over time…” Has anyone on your side of the aisle that you know of created something similar?

    Yes. More particularly, the flaw in Mendel’s Accountant has been determined. The calculation of “working fitness” is broken. From Mendel’s Accountant:

    do i=1,total_offspring
    work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)
    end do

    The effect of ‘divide by random’ is to eliminate the vast majority of the signal from genetic or phylogenetic fitness.
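    To get a feel for that claim, here is a toy Python illustration. It is not a reproduction of Mendel’s Accountant: the fitness spread of plus or minus 5% around 1.0, the uniform random divisor on [0, 1), and all the function names are assumptions chosen for the example.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def ranks(values):
    """Rank of each value (0 = smallest); ties are not expected here."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result

def rank_correlation_after_noise(n=10_000, spread=0.05, seed=0):
    """Rank correlation between true fitness and fitness divided by a random number."""
    rng = random.Random(seed)
    fitness = [1.0 + rng.uniform(-spread, spread) for _ in range(n)]
    working = [f / (rng.random() + 1e-15) for f in fitness]
    return pearson(ranks(fitness), ranks(working))

print(rank_correlation_after_noise())
```

    Under these assumptions the printed rank correlation should come out close to zero, meaning the ordering by working fitness says little about the ordering by true fitness. A wider fitness spread would let more of the signal through.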

    If you look through the Evolutionary Computation thread, there is a great deal of discussion and results from independent tests, including our own Mendel’s Bookkeeper.

  8. What Sanford did with his Mendel’s Accountant was to try and ‘prove’ his preconceived result. He didn’t look at his own work skeptically, and when it gave the results he was looking for, his work was done. 

    What we attempted with Mendel’s Bookkeeper was to create a simulation to help determine under what circumstances evolutionary algorithms run to extinction, and when they don’t, especially an exploration of the critical point between the two. 

  9. GA Models are invalid except when Creationists build them to disprove evolution or promote some End-is-Nigh tooth-gnashing?

    Incidentally, I downloaded Mendel’s Accountant, and it brought with it a parasite, a browser hijacker. That does not invalidate it, just a warning.

  10. As keiths has already noted, gpuccio can’t stake out a position and defend it. Therefore it seems pointless to try and explain the flaws in his current argument because tomorrow he will abandon it and make up another excuse. There is some entertainment value in this chase as his arguments shift away from Dembski’s puffed-up math and toward the good old tornado-in-the-junkyard territory.

    So, strictly for entertainment purposes, I will point out where he is wrong (again). I’m just curious where he will dart next. 

    His latest argument boils down to placing restrictions on fitness function. In his understanding, it should be zero almost everywhere (a sea of junk) except on a small island of functionality. As you cross from the sea to the island, the fitness function jumps discontinuously (a cliff). Once you get to the island, you can walk to its top point by following the gradient, which is smooth. Natural selection will work on the island, but it can’t get you to the island.

    It’s a nice theory, gpuccio, but it doesn’t work in practice. You can see what a fitness landscape looks like when biologists actually try to measure it in some real systems. Here is a figure from a PLoS ONE article depicting such a landscape (on a log vertical scale) inferred from experimental data. It looks nothing like your theory. Here is how the authors summarize it: 

    According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value.

    Y. Hayashi et al., “Experimental rugged fitness landscape in protein sequence space,” PLoS ONE 1, e96 (2006). doi:10.1371/journal.pone.0000096

    There is no level surface of strictly dysfunctional junk on that map. Most proteins are not very functional, but some are less dysfunctional than others, and they win in a competition judged by natural selection. When you start in a random place on the map, you have an easy time getting to islands of proteins that are much better than the starting point. There the landscape gets rugged and it’s not easy to jump from peak to peak.

    Note also that the map has more than one island of high functionality.

    This is a picture that is almost diametrically opposite to your theory. Read the article. Think it through. Get back to the drawing board.
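    The difference between the two pictures can be sketched with a toy hill climber in Python. This is a deliberately crude model under stated assumptions: 64-bit strings, a single best sequence, an "island" shrunk to that one point, and function names invented for the example. It does not model the protein data in the article.

```python
import random

N = 64  # assumed string length for the toy example

def graded(bits):
    """Fitness rises with every correct bit (a landscape with a slope)."""
    return sum(bits)

def island(bits):
    """Fitness is zero everywhere except at the single target (a cliff)."""
    return 1 if sum(bits) == N else 0

def hill_climb(fitness, steps=20_000, seed=0):
    """Flip one random bit per step; keep the change unless fitness drops.
    Returns True if the all-ones target is held at the end."""
    rng = random.Random(seed)
    current = [0] * N
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(N)] ^= 1
        if fitness(candidate) >= fitness(current):
            current = candidate
    return sum(current) == N

print(hill_climb(graded))  # True
print(hill_climb(island))  # False
```

    On the graded landscape selection has something to act on at every step. On the island landscape the climber just drifts, which is the situation gpuccio assumes and the measured landscape contradicts at low fitness.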

  11. gpuccio, once your knee-jerk reaction subsides, let us know what you think of their fitness landscape that rises continuously at low levels of fitness. How does that square with your theory? What are your next steps?

  12. I have had a number of opportunities over the years to watch live examples in which an ID/creationist bends and breaks scientific concepts to agree with sectarian beliefs. It is an agonizing process of word-gaming as every word is reworked repeatedly to mean something completely different. New words are often invented on the spot.

    The end result might turn out to be a statement of a scientific concept that looks superficially similar to the real concept; but to the ID/creationists, the words all have different meanings. So the ID/creationist declares agreement with the science and claims to love science; but the “science” that the ID/creationist loves has nothing to do with reality.

    The website of AiG has many videos in which AiG “scientists” are following exactly this bending and breaking routine. They are simultaneously hilarious and nauseating to watch.

  13. GP

    1) Anything that is selected must be able to influence reproductive success.

    2) New functions that can influence reproductive success in a complex replicator are usually very complex, because they have to integrate themselves in a complex, integrated scenario (the replicator itself and its metabolism).

    3) In biology, this is made worse by the observation that new biochemical functions are usually supported by new sequences and structures, unrelated to what already exists.

    4) Complex functions are not deconstructible into simpler, small, functional selectable steps. There is no logical argument for that to be generally true, and there are not empirical examples that this is generally possible, least of all in biology.

    5) Therefore, NS cannot generate new dFSCI. This is an intrinsic limit of the process.

    On the other end, if we use an implementation of IS that creates selectable intermediates at each single bit variation, and therefore a continuous functional space to some target, we can easily reach that target. Saying that this is a model of NS, and that therefore we can draw conclusions about NS and biology from that, is simply a lie.

    Ah, good old civility! People are being dishonest, now?

    I honestly think that a model of NS is any process that has replication and a consistent differential probability that some forms of the replicated entities will be replicated compared to others in the current population. You are saying it is not a model of NS because biology is a bit complicated, and (you say) novelty is nearly always accompanied by a saltational leap in the dark, because (you say) reality does not possess explorable paths between one function and another, and (you say) complex functions are irreducibly complex, not just now, but at all points in their history. Which is all possible, but a tad vague and waffly. There is plenty that evolution cannot explore, for those and many more reasons. So? It can still trickle through the cracks that it can explore. It boils down to your conviction that the space around every current ‘function’ is absolutely sealed up tight. Nothing could have got there, and it can’t go anywhere else. Or just some current functions. In which case, which? It’s protein baraminology.

    Actually, Lizzie hamstrung herself slightly by excluding some ‘biological reality’: recombination. This process has a mahoosive impact upon the ability of a ‘search’ to explore protein space. One could make it more ‘biological’ yet by introducing a great deal more ‘lethal’ space, and a more rugged landscape, but still the combination of fragments from different strings is a powerful means of widening the net of the ‘search’.

    The proportion of times such leaps in the dark land on functional space is entirely dependent on the density of function, which you cannot intuit by reading Hamlet.
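    For readers unfamiliar with recombination in this setting, a minimal Python sketch of one-point crossover between two strings follows. The function name and the single cut point are assumptions for illustration; nothing like this appears in Lizzie’s experiment as described.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Build a child from the start of one parent and the end of the other."""
    cut = rng.randrange(1, len(parent_a))  # cut strictly inside the string
    return parent_a[:cut] + parent_b[cut:]

rng = random.Random(0)
child = one_point_crossover([1] * 10, [0] * 10, rng)
print(child)  # a block of 1s from the first parent followed by 0s from the second
```

    A child built this way can combine a useful stretch found in one lineage with a useful stretch found in another, which point mutation alone would have to rediscover.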

  14. 4) Complex functions are not deconstructible into simpler, small, functional selectable steps. There is no logical argument for that to be generally true, and there are not empirical examples that this is generally possible, least of all in biology.

    5) Therefore, NS cannot generate new dFSCI. This is an intrinsic limit of the process.

    This is the hill GP has chosen to die on? Really? Really, a bare-faced assertion that unguided natural selection intrinsically is impotent in the real world? Really, no natural selection at all? That’s it? That’s GP’s ID position? Well, that’s logically compatible with certain types of christian (and muslim, I think) theology which claim that every single tiniest happenstance is directed by god as part of god’s plan. In which case, I want to meet that god face to face to spit in its eye for its “intelligent design” of E coli 0157, which tortured our son nearly to death. Fuck that god, or “designer”, or whatever GP wants to imagine it to be.

  15. It’s pitiful, isn’t it? After all the hand-waving about ‘dFSCI’ and ‘intelligent selection’, gpuccio’s ‘theory’ boils down to nothing more than this:

    Natural selection cannot generate complex functions. Why? Because the functional areas of the fitness landscape don’t logically have to be connected. Since they don’t have to be connected, I’m entitled to assume that they never are connected. Natural selection cannot bridge the gaps that I’m assuming are there. QED.

  16. Allan:

    It would be possible to derive a ‘random’ way of deriving the fitness function for a GA, rather than just ‘intelligently’ choosing it. Take the depth of snow in your yard on December 14th last, add the license number of the next vehicle to pass, the number of times the word “obvioulsy” appears in UD comments … then simply evaluate strings according to their approach to this peak, and don’t copy (breed from) the worst.


  17. The travelling salesman problem has a randomly generated fitness landscape. There is no necessity for a connectable fitness gradient.

  18. Hey, that’s cute.  I never would have guessed that anyone could make that typo even once by accident, much less repeatedly …

    Obvioulsy, it’s an “intelligently designed” spelling improvement! 😉 

  19. This is the same claim that KF always makes – that all biological functions are isolated islands surrounded by oceans of non-function. A claim that is not backed up with any evidence, and contradicted by plenty.

    If I understand some of GP’s demands correctly, one of the things he is asking for (apart from an un-navigable landscape!) is an environment with intrinsic selection – in other words differential survival has to be an emergent property of the agents’ behaviour in an environment and not something that is explicitly coded for.

    Perhaps he imagines fitness functions as being analogous to some non-corporeal entity that floats around, using its intelligence to select agents that it likes … ?

    I have a half-baked idea for a minimal simulation with implicit selection; if I get time I’ll try and write some code.

  20. All hail the flagellum! And its relative, the Type III Secretory System! They live on neighbouring islands, surrounded by Irreducible Sea, and enhance bacterial abilities to mess with us. [And chloroquine resistance … plasmids as vectors of antibiotic resistance … ]

  21. I think once you’ve decided that evolution can’t get to here, you can just dispense with all simplifications that try and analyse how evolution could have got to here. It can’t because ‘evolution does not work like that’. Evolution in or towards modern, complex, integrated organisms cannot be deconstructed because those organisms themselves cannot be deconstructed. Which is fallacious reasoning.

    Computational resources do not permit a full simulation of organisms and environments. One might imagine that it should include both a genotype and a phenotype. The genotype should code for proteins and RNA under gene expression control, which should fold according to chemical law and catalyse reactions in a realistic manner, and interact in virtual cells. Perhaps multicellular phenotypes should build up, and interact with a coevolutionary environment of other replicators and …

    Hang on, though .. why? All the issues GP points out relate to phenotype. Phenotype is an arena of complex interaction between different parts of the genotype and with the environment … but still, you don’t have to model it explicitly (unless it is relevant to your study). All that phenotype is, in the evolutionary world, is the proxy for genotype with which selective agents interact. Because the currency of evolution is DNA (even when ‘epigenetics’ is invoked), that is all that passes down multiple generations. All you need is a means of ‘evaluating’ the digital genotype. Selection ‘evaluates’ via the genotype’s expressed tastiness, or ease of capture, or something. There is no doubt that many traits are multigenic, and interaction is complex (you think biologists may have missed that?). So you can include whatever you decide is phenotypically important in the matter of epistatic and pleiotropic interaction – but you can shortcut computing phenotypic expression, and just model gene interactions directly.

    But it still boils down to DNA, which is just a long string of bases. However intricate the interaction of parts of these strings in life, the replication process ‘flattens’ the complexities and simply copies the whole lot. Likewise, death removes the whole lot.

    So perhaps what GP could do, rather than hand-waving concerns, is to build a simple model and then ‘complexify’ it. Build in the biologically real things that he believes model-builders omit unjustifiably, and demonstrate how they sap the system of evolutionary power (if that is what he thinks will happen in complex, more ‘real’ digital organisms).

  22. Behe’s argument boils down to the claim that multiple, co-dependent mutations are extremely unlikely. Which at face value is true. All versions of ID depend on this reasoning, because all versions of ID boil down to gaps arguments, with design as the fall through when evolution can’t be described in pathetic detail.

    As soon as someone demonstrates that four co-dependent mutations can occur in a tightly controlled lab experiment with a tiny population, the argument shifts from the mutation count to the magnitude of the functional gain. Never mind that the original goalpost involved mutation count. I suppose the next edition of Behe’s Edge will correct this. Along with correcting the argument that gain of function is rare.

    My point would be that because ID is based entirely on gaps, its entire argument consists in goalpost shifting.

    The flagellum was once comprised of modules that required all to be present in order to have any selectable function. But there are approximately two dozen different flagella incorporating various subsets of the rotary version’s components.

  23. Zachriel: In an evolutionary algorithm, Hamlet would represent all the multidimensional mountains and valleys that make up the fitness landscape. And yes, a map contains a lot of specified information.

    gpuccio: could you please explain how your concept of multidimensional fitness landscape can help find the right functional sequence for a new protein domain with a new protein structure and a new biochemical function, for instance a new enzymatic activity, from unrelated DNA strings?

    As we said above, the algorithm only demonstrates that *some* complex landscapes are navigable (sometimes in surprising ways), not that the biological realm is navigable. However, it does address a source of specified information. The complexity of the genome can increase due to its interaction with the environment. There’s no inherent barrier as suggested by your comments.


  24. gpuccio: could you please explain how your concept of multidimensional fitness landscape can help find the right functional sequence for a new protein domain with a new protein structure and a new biochemical function, for instance a new enzymatic activity, from unrelated DNA strings?

    The algorithm also allows exploration of how evolutionary algorithms can evolve entirely new sequences by mixing and matching existing components. This also shows why evolutionary search is much, much faster than random assembly: basic structures have multiple uses in completely different contexts (e.g. the “ing” in “king” can also act to turn a verb into a gerund or participle, “saying”).


  25. gpuccio: could you please explain how your concept of multidimensional fitness landscape can help find the right functional sequence for a new protein domain with a new protein structure and a new biochemical function, for instance a new enzymatic activity, from unrelated DNA strings?

    Keep in mind that proteins are far more flexible than words. Many folds are made up of just a few amino acids, with the rest of the protein just a scaffold. Multiple folds can often perform the same function. Many proteins have multiple domains and multiple functions, and rearranging these domains can create novel structures. There is significant phylogenetic evidence of domain evolution including domain rearrangements.

    Yang & Bourne, The Evolutionary History of Protein Domains Viewed by Species Phylogeny, PLOS One 2009.


  26. I do wonder what GP has in mind when he says “new protein domain” or “new biochemical function”? Does he have one that he considers ‘new’, and definitively inaccessible by the probabilistic resources available to any ancestors – something that can be investigated, rather than his personal, very general assumptions about the structure of protein space and its distribution of function? 

    The only proteins we need to concern ourselves with are the ones that exist, and their accessibility from other points in the space that (on the evolutionary assumption) were occupied by ancestral sequences with similar or other functions. So it would help to have a concrete example, rather than this ‘function is universally restricted to tiny, widely-separated islands’ nonsense, which is empirically demonstrated to be untrue.

  27. We went over the origin of protein domains a couple years ago on Mark Frank’s blog. Gpuccio bailed out when it was pointed out that a significant percentage of random folds have useful biological function (something that junk DNA lacks). Useful to the point of being able to rescue otherwise disabled bacteria.

    The key element in this study is that the rescuing sequences have no sequences in common with the original, knocked out sequence. The haystack is jam packed with needles.

    GP has also managed to maintain silence on the Lenski experiment, through an 800 post thread, on how a designer would replicate or improve the first 20,000 generations, in which critical, but non-adaptive, mutations accumulated.

    Now that it has absolutely been demonstrated that neutral mutations can accumulate and enable later functional mutations, and now that it has absolutely been established that new functionality can arise without weakening existing function, I would expect some apologies from those who have been arguing otherwise.

  28. a significant percentage of random folds have useful biological function (something that junk DNA lacks). Useful to the point of being able to rescue otherwise disabled bacteria.

    That would be this I keep blethering on about?

    The essence of proteins is modularity. And most structural modules have many ways of being formed. And because recombination of DNA fragments is an almost inevitable consequence of trying to keep hold of all this fine-yarn knitting, modules get swapped within the same protein and between proteins with great ease. It’s not just whole genes that are duplicated but fragments too (indeed, nothing demarcates a gene boundary to mitosis/meiosis). This also serves to make much wider leaps around protein space than a simplistic point-mutation model, mental or computational, may suggest. If space is widely explored from current position, and the generality of space of a type readily reached from there is significantly function-rich, it’s hard to see where GP’s assumed barriers lie, other than good old IC, which is more to do with the interaction of multiple genes than the ‘exploratory unlikelihood’ of single ones.

  29. Your link is broken due to a point mutation which took you off the island of functionality.

    Oh, bugger! I’ve posted that link about 4 times and screwed it at least twice. I spend most of my life trying not to get washed off the island of functionality. :)

  30. Mung: The components are instances of some protein domain. 

    Some components are simpler motifs or repeated sections. Novel folds can be found in random sequences, and a majority of folds appear to have diverged from a small number of ancestors.

    Zachriel: Keep in mind that proteins are far more flexible than words. 

    Mung: That’s debatable, and probably irrelevant.

    We were referring to fragility. Words are much more likely to be broken by a substitution.

    Mung: Who cares if they can be re-arranged?

    Scientists trying to reconstruct their natural history. And, for the purposes of this discussion, it shows how and why evolutionary processes can be so effective at finding functional proteins.

  31. Mung: Generating CSI with Faulty Logic

    I take a fair coin and toss it 5 times: HTTHT
    That’s one of the 2^5 possible sequences (probability 1/32).
    Now I copy that sequence, and modify one position at random.
    Say the first position: TTTHT
    Is that sequence a member of the original set of 2^5 sequences?
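
    The question in the quoted steps can be checked directly. The following is only a minimal sketch, assuming Python; the helper name `mutate_one` is made up for illustration. It enumerates all 2^5 sequences and confirms that a copy with one position flipped is still one of them:

```python
import itertools

# All 2^5 = 32 possible sequences of five fair-coin tosses.
ALL_SEQUENCES = {"".join(p) for p in itertools.product("HT", repeat=5)}

def mutate_one(seq, position):
    """Return a copy of seq with the toss at `position` flipped (hypothetical helper)."""
    flipped = "T" if seq[position] == "H" else "H"
    return seq[:position] + flipped + seq[position + 1:]

original = "HTTHT"
mutant = mutate_one(original, 0)   # flip the first position

print(len(ALL_SEQUENCES))          # 32
print(mutant)                      # TTTHT
print(mutant in ALL_SEQUENCES)     # True
```

    Any single flip of any five-toss sequence lands on another member of the same set of 32, so the mutated copy is drawn from the same space as the original.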


    Mung: What is the relevance, if any, for calculating CSI?

    Do tell.

  32. Who cares if they can be re-arranged?

    It matters for one’s inferences about the searchability of protein space (it’s not really a ‘search’, but it still finds stuff!). A sequence that can only be amended by point mutation finds more of the wider space closed off to it, since it can only probe very local regions; especially if it sits on an adaptive peak, it is more likely to be surrounded by detrimental sequence and less likely to encounter ‘new function’, however that may be defined. Even if stepwise paths are available, their exploration is slow. Recombinations probe further faster, allowing access to regions that could not be traversed stepwise due to lack of a non-detrimental path, and shaking sequences out of local maxima.

    An important, quite subtle point about recombination is that it enables huge leaps to be taken with some care. If a protein suffered an insertion of 30 completely random bases, it is much more likely to be disabled than if a 30-base fragment of a different, working protein were inserted. The individual elements of a recombination event have been through the filter of natural selection, and will be inherently less disruptive than some ‘purely random’ sequence of the same size, even if they come from a protein with a completely different ‘function’.
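
    The difference in reach between the two operators can be shown in a few lines. This is a rough sketch, assuming Python, a four-letter alphabet and hypothetical helper names (`point_mutation`, `crossover`, `hamming`); it shows only how far each operator moves a sequence, not whether the result is functional:

```python
import random

ALPHABET = "ACGT"

def point_mutation(seq, rng):
    """Replace one randomly chosen base with a different base."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in ALPHABET if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]

def crossover(a, b, rng):
    """One-point recombination: the front of a joined to the back of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)
parent_a = "".join(rng.choice(ALPHABET) for _ in range(60))
parent_b = "".join(rng.choice(ALPHABET) for _ in range(60))

mutant = point_mutation(parent_a, rng)
recombinant = crossover(parent_a, parent_b, rng)

print(hamming(parent_a, mutant))        # 1: a point mutant is always one step away
print(hamming(parent_a, recombinant))   # can be many steps away in a single event
```

    A point mutant sits exactly one step from its parent, whereas a single crossover can differ from that parent at every position contributed by the other parent.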

    Behe’s CCC calculation suffers from a lack of consideration of recombination too, incidentally. It’s a very important, sometimes underappreciated evolutionary force. Which one may argue is itself a Design feature, but one can’t hold that position and simultaneously dismiss its evolutionary capacity.

    Lots more stuff to sneer at if you Google ‘protein domain evolution’.

  33. gpuccio: (Thank you, Zachriel, for providing me a quick reference to one of my favourite papers. I am afraid, however, that you have not really understood what it means. It is not about “evolution” of the domains, but rather about their emergence in natural history. I quote:

    “Notwithstanding, these data suggest that a large proportion of protein domains were invented in the root or after the separation of the three major superkingdoms but before the further differentiation of each lineage. When tracing outward along the tree from the root, the number of novel domains invented at each node decreases”. Emphasis mine.)

    Right. So, according to the paper, 800+ domains were ‘invented’ *after* the divergence of the superkingdoms. It appears to be the usual evolutionary pattern of adaptive radiation followed by increasing specialization.

    gpuccio: What you say about recombination makes sense, and I can agree. But I don’t think it can solve the fundamental problems about completely new information. Anyway, it can be reasonable to try to evaluate the real powers of recombination on some empirical basis, but it is certainly true, as you say, that it is a “sometimes underappreciated evolutionary force”. Underappreciated and not much supported by evidence, although often generically invoked.

    Keep in mind that your “don’t think” encompasses all evolutionary algorithms. Evolutionary algorithms, such as Word Mutagenation, can show you how and why recombination is such a powerful force for novelty. 


  34. Zachriel: And, for the purposes of this discussion, it shows how and why evolutionary processes can be so effective at finding functional proteins.

    Joe: Intelligently designed evolutionary processes or blind watchmaker evolutionary processes? Or is cowardly equivocation still the best you and your ilk can provide?

    In this case, we’re concerned with typical evolutionary algorithms based on random mutation and recombination.


  35. kairosfocus: Are you and/or anyone of your acquaintance taking me up on the offer of an up to 6,000 word essay for UD presenting your view and empirically warranted grounding of the blind watchmaker thesis style account of origins?

    Try Darwin’s Origin of Species (1859). It’s a bit dated and longer than 6,000 words, (the 6th edition is 190,000 words), but Darwin considered it just a long abstract, and it still makes for a powerful argument. 


  36. Joe: Biased towards a goal, which means you are talking about Intelligently designed evolutionary processes.

    Or navigate a fitness landscape (which may or may not be dynamic), which is sufficient to understand certain basics of the process, such as the ability of recombination to create novelty.

    Zachriel: Try Darwin’s Origin of Species (1859). It’s a bit dated and longer than 6,000 words, (the 6th edition is 190,000 words), but Darwin considered it just a long abstract, and it still makes for a powerful argument.

    Joe: But the evidence gathered since then has not borne out his “powerful argument”, which makes it impotent.

    There is considerable evidence, much of it in Origin of Species, that shows that natural selection can be a mechanism of adaptation. Artificial selection shows that there are selectable intermediaries between quite different forms. Peter and Rosemary Grant’s work on finches in the Galápagos Islands shows how this works in nature.


  37. Joe: Artificial selection is NOT natural selection-

    No, it’s not. However, it does show that selectable intermediaries exist, and that selection for a simple trait, such as size, will result in multiple genetic and physiological changes. 

    Joe: NS requires that the change be random/ due to chance and no one has demonstrated that wrt finches

    Natural selection works on existing variations. The change isn’t random, but due to changes in the environment.

    Joe: Please define your use of “fitness” and also how it is you determined that recombination is a blind watchmaker process.

    In an evolutionary algorithm, the fitness landscape is explicitly defined, and recombination is random. 


  38. gpuccio,

    It’s taken me a couple of days to catch up on the discussion that took place over the weekend. Good stuff all around. I don’t intend to restate what’s already been presented so eloquently by many of the TSZ participants, but there are a couple of points I’d like to address. The first is this comment of yours, which you repeated in one or two other forms:

    Any fitness function in any GA is intelligent selection, and in no way it models NS.

    Others here have noted that you are confusing the model and what is being modeled. Labeling all GA fitness functions as “active selection” misses the point of what the fitness function is modeling. We observe differential reproductive success in real biological systems. We further observe that this differential is correlated with the varying ability of individual organisms to utilize the resources of the environment. All the fitness function is modeling is that environment. It’s obviously an extremely simple model, relative to real world environments, with far fewer dimensions, but it serves the purpose of simulating a situation in which phenotypic variation affects reproductive success.

    This is where it is important to avoid mixing levels of abstraction. The specifics of how organisms are rated relative to each other, what threshold is used to stop the run, and the goal of the programmer are immaterial. The essence of the model is to determine if certain observed mechanisms operating in an environment where different genomes result in different rates of reproductive success will result in populations with certain characteristics. It turns out that they do.

    Of course it is important to understand the limitations of a model. It may be that some important aspect of reality isn’t included and that aspect might invalidate conclusions drawn from the model. However, based on simulations like those done for Lizzie’s CSI problem and biological experiments such as Lenski’s, it does appear that the mechanisms of the modern synthesis are quite capable of generating high functional complexity, by your definition.
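
    For reference, the score in Lizzie’s coin-toss game, as described at the top of the post, is the product of the lengths of all runs of heads. A minimal sketch of that scoring rule, assuming Python and hypothetical function names, using the example series from her post:

```python
from itertools import groupby
from math import prod

def runs_of_heads(tosses):
    """Lengths of each consecutive run of 'H' in a string of 'H'/'T' tosses."""
    return [len(list(group)) for face, group in groupby(tosses) if face == "H"]

def product_of_runs(tosses):
    """Product of the run lengths; a series with no heads scores 1 (empty product)."""
    return prod(runs_of_heads(tosses))

example = "HTTHHHTHTTHHHHTTT"       # H T T H H H T H T T H H H H T T T
print(runs_of_heads(example))       # [1, 3, 1, 4]
print(product_of_runs(example))     # 12
```

    In the model this number plays the role of the environment: it is the only thing that makes one series more likely than another to leave offspring.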

    I realize that you don’t call it dFSCI because you require dFSCI to not have a “deterministic” explanation by definition, but that just further demonstrates that dFSCI is a measure of knowledge (or ignorance) not of any objective characteristic.

  39. gpuccio,

    There is only one way to model NS in an informational environment, like a PC or a PC network. I have proposed it many times, but darwinists seem to be horrified by the simple thought of it. However, I will repeat it here for you:

    1) Take an informational environment, set up in a completely blind way in respect to the experiment we are going to make. IOWs, the people who set up the environment must not know how we will use it. However, the environment, like all natural informational environments, must have specific resources, like RAM, mass memory, procedures, and so on.

    2) Introduce a computer virus in the environment, that can replicate and use the resources of the environment. It should also include a mechanism of random variation to its code, like random errors when it copies itself (the mechanism can be tweakable to allow for testing different rates of random variation).

    3) And then, just wait.

    That’s it. No active measurement of function, no active expansion of the measured function. After all, we are modeling NS, not IS.

    It sounds like you would consider an evolvable Corewars style system to be a real model of natural selection, essentially a simulation where, like in the real world, the only goal is survival and reproduction. That might be an interesting programming challenge.
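
    As a very rough illustration of what “implicit” selection could look like in code, here is a toy sketch, assuming Python. It is not Corewars and not gpuccio’s virus experiment; every name and constant in it is invented for illustration. Replicators share a finite food supply, copying costs energy in proportion to genome length, copying is error-prone, and death is blind. Nothing in the code ranks or scores the replicators:

```python
import random

FOOD_PER_TICK = 100.0   # finite resource shared by everyone alive (assumed value)
DEATH_PROB = 0.1        # chance that any replicator is lost each tick (assumed value)

def run(ticks=1000, start_length=20, start_pop=50, seed=0):
    """Return the genome lengths of the survivors after `ticks` steps."""
    rng = random.Random(seed)
    # Each replicator is [genome_length, stored_energy]; copying costs one
    # unit of energy per unit of genome length.
    pop = [[start_length, 0.0] for _ in range(start_pop)]
    for _ in range(ticks):
        if not pop:
            break
        share = FOOD_PER_TICK / len(pop)   # everyone gets the same intake
        offspring = []
        for agent in pop:
            agent[1] += share
            if agent[1] >= agent[0]:       # enough energy stored to copy itself
                agent[1] -= agent[0]
                # Copying is error-prone: length drifts by -1, 0 or +1.
                child_length = max(1, agent[0] + rng.choice((-1, 0, 1)))
                offspring.append([child_length, 0.0])
        pop.extend(offspring)
        # Death is blind: it ignores genome length entirely.
        pop = [a for a in pop if rng.random() >= DEATH_PROB]
    return [a[0] for a in pop]

lengths = run()
print(len(lengths), sum(lengths) / max(1, len(lengths)))
```

    In this toy, genomes that are cheaper to copy reach the copying threshold sooner and so tend to spread, even though no line of the program measures or rewards length. Whether that counts as a model of NS in gpuccio’s sense is exactly the question under discussion.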

    I’m curious, how would you measure functional complexity in such an environment? Would it simply be the length in bits of the digital organisms? If an organism with sufficient functional complexity to meet your dFSCI threshold were to appear, would you consider it to have dFSCI or would the fact that it arose through evolutionary mechanisms, which might even be tracked mutation by mutation, mean that the dFSCI medal could never be earned?

  40. gpuccio,

    NS occurs only in replicators who use environmental resources. It is NS only if all the observed effect is determined by modifications in the fundamental functions of the replicators (metabolism, survival and replication itself) so that differential reproduction is observed.

    Can you see that this is not what is happening in Lizzie’s algorithm?

    a) There are no autonomous replicators, and therefore no use of environmental resources.

    The genomes are modeled as replicators. Are you claiming that only actual running programs would be a valid model? If so, why?

    b) The function is predefined, and has nothing to do with replication or with using environmental resource.

    That’s not the purpose of the fitness function. As I tried to explain in a previous comment, you’re mixing the implementation details with the purpose of the model.

    One could consider the environment to be “predefined” in the real world, in the sense that it exists before each generation of a population. It nonetheless provides the criteria by which organisms are evaluated with respect to each other.

    c) The function is measured, with utmost precision, by the algorithm.

    Well, in most GAs reproduction is stochastic, just as in real environments. I’m not sure what the precision of the fitness function has to do with the usefulness of the model.

    d) The function is actively rewarded (either negatively or positively) by the algorithm, according to its measurement, although it generates no difference in reproducing fitness.

    This demonstrates some confusion about the purpose of the fitness function. The fitness function is part of the model of differential reproductive success. Along with the selection mechanism, it determines reproductive success. The overall process, in most GAs, is stochastic.
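
    To make the stochastic part concrete, here is a minimal sketch of fitness-proportional (“roulette wheel”) selection, assuming Python. The function name `select_parents` and the toy fitness function are illustrative only; Lizzie’s own program, as described in the post, culls the lower half of the population instead:

```python
import random

def select_parents(population, fitness, k, rng):
    """Draw k parents, each with probability proportional to its fitness score."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=k)

def count_heads(series):
    """Toy fitness: number of heads, plus 1 so that every weight is positive."""
    return series.count("H") + 1

rng = random.Random(42)
population = ["HHHT", "HTHT", "TTTT", "HHHH"]
parents = select_parents(population, count_heads, k=1000, rng=rng)

for ind in population:
    print(ind, parents.count(ind))
```

    The fitness function only sets the odds. Which individuals actually reproduce is decided by the random draw, so a low-scoring individual can still leave offspring and a high-scoring one can fail to.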

    So, Lizzie’s algorithm implements Intelligent Selection, and not Natural Selection.

    A distinction without a difference. The model shows that the mechanisms of the modern synthesis are quite capable of generating functional complexity in excess of that required by your dFSCI.

  41. onlooker quotes gpuccio (above) and asks:

    The genomes are modeled as replicators. Are you claiming that only actual running programs would be a valid model? If so, why?


    That quote is from September 29th, when gpuccio still believed that

    as I said, all forms of “fitness functions” are IS. But you can indirectly observe NS in true informational replicators, as I suggested in the computer-virus experiment.

    It’s yet another symptom of his inability to distinguish the model from the thing being modeled. There is an enormous difference between his computer virus experiment, which is a real Darwinian process that happens to take place on a computer, versus a model of a Darwinian process, in which replicators are modeled, variation is modeled, and differential reproduction is modeled (via a fitness function).

    The criticisms seem to have finally sunk in, and gpuccio changed his position on September 30th (without acknowledging that he was doing so, as usual, until I called him on it). He now accepts that fitness functions are okay:

    I will show first why Lizzie’s GA is an example of IS, and not of NS. I will then show how it should be modified to drop IS and generically “model” NS…

    The important point now is: the algorithm can measure a product for each new string, but it must not be able to select for that product unless and until it is higher than 10^60…

    I think we should ask gpuccio to assign version numbers to his arguments. Otherwise it’s impossible to keep track of all the flip-flops.

  42. Joe: NS requires that the change be random/ due to chance

    WTF?? No it doesn’t. NS could operate perfectly well on variations introduced by a designer and biased in any way.

  43. It is pretty clear that ID/creationists over at UD are still trying to word-game their “arguments” against evolution; that word-gaming process has been getting increasingly contorted ever since the 1970s. We are seeing the same thing at AiG and ICR.

    Their “arguments” have indeed evolved – from creation science to cdesign proponentsists to ID – but the most fundamental feature of that evolutionary process has been its strict avoidance of getting the real science right.

  44. Joe: Nature doesn’t select and natural selection could never produce a toy poodle even given selectable intermediates.

    That’s the point, of course. There are selectable intermediaries between wolves and toy poodles, and selection for very general traits (size, curly hair, docility) can result in the evolution of complex genetic and physiological changes.

    Joe: So, yes there needs to be variation and it needs to be random/ ie a chance event.

    There has to be variation for natural selection to work, but it doesn’t have to be the result of a chance event. It may already exist in the population as part of the inherited variation. It could be inserted into the genome by magic, and natural selection would still work. (However, for many characters, they form a standard distribution consistent with random variables.)

    Joe: The variation is the change and according to the modern synthesis is entirely by chance. Changes due to the environment would be directed changes à la Dr Spetner’s “built-in responses to environmental cues”- IOW more evolution by design.

    You seem to be confusing the sources of variation with natural selection, such as when you said “NS requires that the change be random/ due to chance”. Natural selection acts on existing variations, from whatever source. 


  45. Mung: Where did the protein domains come from that are required for recombination?

    Some novel protein domains are available to completely random processes. However, the natural history is not well-documented.


  46. GP:

    What you say about recombination makes sense, and I can agree. But I don’t think it can solve the fundamental problems about completely new information. Anyway, it can be reasonable to try to evaluate the real powers of recombination on some empirical basis, but it is certainly true, as you say, that it is a “sometimes underappreciated evolutionary force”. Underappreciated and not much supported by evidence, although often generically invoked.

    I’m not sure what you think isn’t supported by evidence – nor what you really mean by ‘completely new information’. The front end of gene A attaching to the back end of gene B is ‘completely new information’, even though the partners pre-existed. And the catalytic activity of the new combination is not compelled to be that of either source, still less some halfway hybrid between them.

    Recombination clearly happens, by several different mechanisms throughout the living world, including in us every time we make a gamete, as crossover does not know where the genes are. Where it falls within a gene and alignment is good, we simply swap front and back of the same gene, but nonetheless this introduces a rate change in exploration of space (or it does when modelled in bloody GAs, anyway!***). Behe’s CCC argument relies on serial mutation, 1 then 2 or 2 then 1, with no benefit till both occur in the same individual. Calculations show it to be of low (though not vanishing) probability. But since 1 and 2 must necessarily be at different positions in the gene, recombination can occur between them, increasing the chance substantially, even though recombination will cause occasional loss of 1-2 links.

    Other mechanisms clearly occur that cause fragments to be moved greater distances in the genome. The existence of long areas of sequence identity (or close enough to be revealed by statistical test) in different genes – the very thing that enables us to declare homology of a ‘domain’ – is regarded as evidence of the within-genome common descent of that sequence by duplication, which necessarily involves a recombination event. Other explanations for that homology are pretty ad hoc – one could infer that it was moved (or tooled in situ) by a Designer – but what would distinguish identity from such a source from that caused by known mechanisms of recombination?

    *** If you are right about GAs, it is one hell of a coincidence that a method of exploring certain kinds of digital space using only the biological observables of differentials in birth and death, mutation and (optionally) recombination should have such power that they are popular tools in engineering and maths, as well as biological applications unrelated to modelling evolutionary mechanism (eg phylogeny) … and yet you think the algorithm has NO power in the very realm that inspired it – biology? And despite working fine in other statistical fields, according to some anti-common-descenters in ID their use in tree-building leads inevitably to false phylogenies …! Every time they are applied to the biology that inspired them, they apparently fall to bits. One hell of a coincidence.

  47. Joe: Evolution by design.

    Nevertheless, it demonstrates the existence of selectable intermediaries. 

    Zachriel: There has to be variation for natural selection to work, but it doesn’t have to be the result of a chance event.

    Joe: It cannot be planned/ directed and still be natural selection. 

    The source of variation doesn’t have to be random for natural selection to still occur. Darwin posited a non-random theory of Pangenesis, for instance. You’re confusing two different processes, the sources of variation and selection.

