Natural selection can put Functional Information into the genome

It is quite common for ID commenters to argue that it is not possible for evolutionary forces such as natural selection to put Functional Information (or Specified Information) into the genome. Whether they know it or not, these commenters are relying on William Dembski’s Law of Conservation of Complex Specified Information, which is supposed to show that Complex Specified Information cannot be put into the genome. Many people have argued that this theorem is incorrect. In my 2007 article, I summarized many of these objections and added some of my own.

One section of that article gave a simple computational example of mine, showing natural selection putting nearly 2 bits of specified information into the genome by replacing an equal mixture of A, T, G, and C at one site with 99.9% C.
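The “nearly 2 bits” figure can be checked with a few lines of Python. This sketch assumes (the post does not say how the figure was computed) that the information is measured as the drop in Shannon entropy at the site, and that the remaining 0.1% is split equally among A, T and G. Both are illustrative assumptions, not the method of the 2007 article.

```python
import math

def entropy_bits(freqs):
    """Shannon entropy, in bits, of a distribution over the four nucleotides."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)

# Before selection: an equal mixture of A, T, G and C at the site (2 bits of uncertainty).
before = [0.25, 0.25, 0.25, 0.25]
# After selection: 99.9% C. ASSUMPTION: the other 0.1% is split equally among A, T and G.
after = [0.999, 0.001 / 3, 0.001 / 3, 0.001 / 3]

gain = entropy_bits(before) - entropy_bits(after)
print(f"information gained: {gain:.3f} bits")  # about 1.987 bits, i.e. nearly 2
```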

This post is intended to show a more dramatic example along the same lines.

Suppose that we have a large population of wombats and we are following 100 loci in their genome. We will make the wombats haploid rather than diploid, to make the argument simpler (diploid wombats would give a nearly equivalent result). At each locus there are two possible alleles, which we will call 0 and 1. We start with equal gene frequencies 1/2 and 1/2 of these two alleles at each locus. We also assume no association (no linkage disequilibrium) between alleles at different loci. Initially the haplotypes (haploid genotypes) are all combinations from 00000…000 to 11111…111, all equiprobable.

Let’s assume that the 1 allele is more fit than the 0 allele at each locus: the fitness of 1 is 1.01, and the fitness of 0 is 1. We assume that the fitnesses are multiplicative, so that a haploid genotype with M copies of allele 1 and 100 − M copies of allele 0 has fitness 1.01 raised to the Mth power. Initially the numbers of 1s and 0s will be nearly 50:50 in almost all genotypes. The fraction of genotypes that have 90 or more 1s (a ratio of 90:10 or better) will be very small, in fact less than 0.0000000000000000154 (1.54 × 10⁻¹⁷). So very few individuals will have high fitnesses.
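Because every one of the 2^100 starting haplotypes is equally likely, the tail fraction quoted above can be computed exactly. A minimal Python sketch using exact rational arithmetic; the helper names are hypothetical, not from the post:

```python
from fractions import Fraction
from math import comb

N_LOCI = 100

def fitness(m, s=0.01):
    """Multiplicative fitness: each copy of allele 1 multiplies fitness by (1 + s)."""
    return (1 + s) ** m

def frac_with_at_least(k, n=N_LOCI):
    """Exact fraction of the 2**n equiprobable haplotypes carrying >= k copies of allele 1."""
    return Fraction(sum(comb(n, m) for m in range(k, n + 1)), 2 ** n)

tail = frac_with_at_least(90)
print(float(tail))                # about 1.53e-17, i.e. below 1.54e-17
print(fitness(90) / fitness(50))  # 1.01**40: advantage of a 90:10 haplotype over a 50:50 one
```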

What will happen at these multiple loci? In this case, the gene frequency of the 1 allele rises at each locus. The straightforward equations of theoretical population genetics show that after 214 generations of natural selection, the 1 allele will have gene frequency 0.8937253 at each locus. The fraction of genotypes having 90 or more 1s will then be 0.500711. So the distribution of genotypes has moved far enough toward ones of high fitness that over half of them have 90 or more 1s. If you feel that this is not far enough, consider what happens after 500 generations. The gene frequency at each locus is then 0.99314, and the fraction of the population with more than 90 1s is then more than 0.999999999.

The essence of the notion of Functional Information, or Specified Information, is that it measures how far out on some scale the genotypes have gone. The relevant measure is fitness. Whether or not my discussion (or Dembski’s) is sound information theory, the key question is whether there is some conservation law which shows that natural selection cannot significantly improve fitness by improving adaptation. My paper argued that there is no such law. This numerical example shows a simple model of natural selection doing exactly what Dembski’s LCCSI says it cannot do. I should note that Dembski set the threshold for Complex Specified Information far enough out on the fitness scale that we would have needed to use 500 loci in this example. We could do so; I used 100 loci here only because the calculations gave less trouble with numerical underflow.
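One way to put numbers on “how far out” is the definition of Functional Information (after Hazen et al.) that is spelled out later in this thread: minus the log base 2 of the probability, under the null distribution, of being as far or farther out in the tail. With the null taken as all haplotypes equiprobable, a sketch follows. Summing the binomial tail in log space (the log-sum-exp trick) is one standard way to sidestep underflow; it is an assumption here, not necessarily how the post’s calculations were done.

```python
import math

def log2_tail(n, k, p):
    """log2 of P(M >= k) for M ~ Binomial(n, p), summed in log space so nothing underflows."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
        + m * math.log(p) + (n - m) * math.log1p(-p)
        for m in range(k, n + 1)
    ]
    top = max(logs)  # log-sum-exp: factor out the largest term before exponentiating
    return (top + math.log(sum(math.exp(v - top) for v in logs))) / math.log(2)

def functional_information(n, k):
    """Bits of Functional Information for carrying >= k 1s at n loci,
    under the null distribution 'all haplotypes equiprobable' (p = 0.5)."""
    return -log2_tail(n, k, 0.5)

threshold_bits = 150 * math.log2(10)  # a probability of 10**-150, expressed in bits

print(functional_information(100, 90))   # about 55.9 bits
print(functional_information(100, 100))  # about 100 bits (only the all-1s haplotype)
print(threshold_bits)                    # about 498.3 bits
print(functional_information(500, 500) > threshold_bits)  # True: 500 loci suffice
```

With 100 loci even the all-1s haplotype reaches only 100 bits, which is why about 500 loci would be needed to cross a 10⁻¹⁵⁰ threshold.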

I hope that ID commenters will take examples like this into account and change their tune.

Let me anticipate some objections and quickly answer them:

1. This is an oversimplified model; you were not realistic. Dembski’s theorems were intended to show that even in simple models, Specified (or Functional) Information could not be put into genomes. It is therefore appropriate to test that claim in such simplified models, where we can do the calculation. For if natural selection is in trouble in these simple models, it is in trouble more generally.

2. You have not allowed for genetic drift, which would be present in any finite population. For simplicity I left it out and used a completely deterministic model. Adding genetic drift would complicate the presentation enormously, but would still lead to a population in which all genotypes are 11111…1111 after only a modest number of additional generations.

3. If fitness differences are due to inviability of some genotypes, fitnesses could not exceed 1. Yes, but we could instead give the 0 allele fitness 1/1.01 = 0.9900990099… and the 1 allele fitness 1, and the results would be exactly the same, as long as the ratio of the fitnesses of 0 and 1 is still 1:1.01.

4. You just followed gene frequencies — what about frequencies of haplotypes? This case was set up with multiplicative fitnesses so that there would never be linkage disequilibrium, so only gene frequencies need to be followed.
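Each of the answers to objections 2 to 4 can be checked numerically. The sketch below is illustrative only: the drift part follows a single locus in a haploid Wright-Fisher population whose size (1000) and random seed are arbitrary choices, and the linkage-disequilibrium part uses two loci with an arbitrary unequal starting frequency at the second locus. Exact fractions are used wherever the claim is an exact equality. Recombination is left out of the two-locus part; since it changes haplotype frequencies only in proportion to the linkage disequilibrium D, it would have no effect while D stays at zero.

```python
import random
from fractions import Fraction

def next_freq(p, w1, w0):
    """Frequency of allele 1 after one generation of haploid selection at one locus."""
    return p * w1 / (p * w1 + (1 - p) * w0)

# Objection 2: add genetic drift. Haploid Wright-Fisher model at ONE locus;
# the population size n and the seed are arbitrary illustrative choices.
def wright_fisher(n=1000, p=0.5, w1=1.01, w0=1.0, max_gens=20_000, seed=1):
    rng = random.Random(seed)
    for gen in range(1, max_gens + 1):
        p_sel = next_freq(p, w1, w0)                         # selection
        p = sum(rng.random() < p_sel for _ in range(n)) / n  # binomial sampling (drift)
        if p in (0.0, 1.0):                                  # allele 1 lost or fixed
            return p, gen
    return p, max_gens

# Objection 3: fitnesses 1.01 and 1 versus 1 and 1/1.01, in exact arithmetic.
def run(w1, w0, gens=100):
    p = Fraction(1, 2)
    for _ in range(gens):
        p = next_freq(p, w1, w0)
    return p

original = run(Fraction(101, 100), Fraction(1))
rescaled = run(Fraction(1), Fraction(100, 101))

# Objection 4: two loci, multiplicative fitness, starting in linkage equilibrium.
def select_two_loci(x, w):
    """x maps haplotype (allele at A, allele at B) to frequency; fitness is w[a] * w[b]."""
    weighted = {h: f * w[h[0]] * w[h[1]] for h, f in x.items()}
    total = sum(weighted.values())
    return {h: f / total for h, f in weighted.items()}

w = {0: Fraction(1), 1: Fraction(101, 100)}
pA, pB = Fraction(1, 2), Fraction(3, 10)  # pB differs from pA only to avoid symmetry
x = {(a, b): (pA if a else 1 - pA) * (pB if b else 1 - pB) for a in (0, 1) for b in (0, 1)}
max_abs_D = Fraction(0)
for _ in range(50):
    x = select_two_loci(x, w)
    D = x[(1, 1)] * x[(0, 0)] - x[(1, 0)] * x[(0, 1)]  # linkage disequilibrium
    max_abs_D = max(max_abs_D, abs(D))

print("with drift (final freq, generations):", wright_fisher())
print("rescaled fitnesses give identical results:", original == rescaled)
print("largest |D| over 50 generations:", max_abs_D)
```

With drift, any single run could in principle lose the favoured allele, but with 1000 individuals, a 1% advantage and a starting frequency of 0.5 that outcome is very unlikely.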

I trust also that people will not raise all sorts of other matters (the origin of life, the bacterial flagellum, the origin of the universe, quantum mechanics, etc.). To do so would be to admit that they have no answer to this example, which shows that natural selection can put functional information into the genome.

254 thoughts on “Natural selection can put Functional Information into the genome”

  1. petrushka:
    My point would be that even if we have the technology to construct a ribosome, we do not have the ability to design one from scratch.

    The Intelligent Designer has a tough row to hoe. Without being alive and without ever seeing a living organism, he must conceive of the possibility, navigate through all that impossibly sparse sequence space to those 500 bit islands, and build the stuff out of quarks and such.

    And this is considered more probable than chemical evolution. All I want to know is why design proponents assert that they know it can be done. I’d like to see an example of where it has been done.

  2. petrushka:
    My point would be that even if we have the technology to construct a ribosome, we do not have the ability to design one from scratch.

    So when we develop that ability- to design one from scratch- would that then “prove” ID?

  3. Joe G: So when we develop that ability- to design one from scratch- would that then “prove” ID?

    No. It would be another case of a known material, natural entity designing. You know, like ALL known instances of proven design.

  4. We use the same principles to determine design regardless of the “nature” of the designer.

    In the absence of direct observation, or designer input, the only possible way to make any scientific determination about the design(s) or the specific process(es) used, is by studying the design and all relevant evidence.

  5. It appears that natural selection is just a statistical artifact. And it may be an artifact that is indistinguishable from genetic drift.

    And that means one should be skeptical of any claims of it putting functional information anywhere.

  6. I am out on my own limb here, but I put biological design in the same category as faster than light travel, time travel and perpetual motion. I don’t speak for mainstream science on this. I speak for myself.

    I don’t think there is (or is going to be) any way to design living things and complex components of living things from scratch. I see the issue as one of complexity and emergence and horizons.

    This isn’t magic. It’s simply a statement that you can’t anticipate all the properties of complex dynamic systems. All inventions are extensions of existing inventions. Aside from trivial structures whose properties can be calculated, most new things evolve by extending what is known to work and trying. Call it intelligent evolution.

    The difference between intelligent evolution and what ID advocates call design, is that there is no magic. ID advocates seem to think there is some way to bypass cut and try. They are particularly fond of asserting that biological designers can magically poof into existence 500 bit sequences without cumulative selection.

    I don’t think it can be done, and I’d like to see actual evidence that it is even possible. In this regard, design is less likely than time travel. Physicists can at least produce thought experiments that outline possible methods of time travel, but no one in the ID movement has produced a thought experiment that details how the designer works, how he knows what sequences are functional, how he obtained the list of functional sequences, or where in the universe he stores the database of functional sequences (whose numbers must exceed the number of particles in the universe).

  7. Actually we don’t use the same principles to detect design regardless of the designer. When doing forensics we always consider the motives and capabilities of potential designers.

    Before Galileo and Newton it was commonplace to attribute phenomena to intelligent agents, but I can’t think of a single instance where this has worked out.

    Aside from claims made regarding OOL and evolution, can anyone cite a single example where a phenomenon has been attributed to an unknown agent and where this attribution has been commonly accepted?

    I mean, do the police ever attribute crimes to ghosts or disembodied spirits? Archaeologists? Paleontologists? Geologists? Astronomers? Chemists? Physicists?

  8. It’s functional, even if it’s drift. To be fixed it must be as functional as what it replaced, or nearly so. It’s still natural selection, even if it isn’t uphill.

  9. Joe G:
    It appears that natural selection is just a statistical artifact. And it may be an artifact that is indistinguishable from genetic drift.

    And that means one should be skeptical of any claims of it putting functional information anywhere.

    This will mystify anyone familiar with population genetics theory, which shows clearly the difference between the two forces. It apparently “appears” somewhere in a way that escaped theoreticians for a century.

  10. Elizabeth,

    I posted a comment in the Hazen thread, saying that it seems that in some sense Hazen’s I[Ex] shares some relationship with Shannon’s uncertainty (U): in Hazen’s argument, as the number of possible equal or better solutions approaches 0, the p(x) in Shannon’s uncertainty argument would also likely approach 0. But the difference between these two equations is in the assumptions.
    For Hazen’s equation, trying to imagine and pin down all of the equal or better functions, and then imagine and pin down all of the total possible functions, would require more assumptions and is more cumbersome than simply measuring population frequency and compressibility.

  11. Joe Felsenstein,
    You say:

    “In addition, I am not calculating measures of complexity of the population.”

    This has been made clear from your 2007 paper and this post. However, complexity is the first measurement in the Complex Specified Information argument.

    You also state:

    “So even if natural selection isn’t “putting” the information into the genome, its presence has the dramatic effect of invalidating the argument that the presence of CSI proves that natural processes other than Design could not be responsible for the adaptation.”

    If you were not calculating population frequencies to satisfy or disqualify the complexity signature, then I am not understanding how the presence of CSI was identified. You seem to pivot, bob and weave a little between CSI and the term functional information.

  12. My apologies Joe- I wasn’t referring to “theory”, I was referring to what we actually observe and how it is actually applied.

    Perhaps you can tell us how to determine, just by looking at an existing population, if its members are there due to natural selection, genetic drift, sheer dumb luck, cooperation- or any other possible “mechanism”.

  13. petrushka:
    It’s functional, even if it’s drift. To be fixed it must be as functional as what it replaced, or nearly so. It’s still natural selection, even if it isn’t uphill.

    To be fixed- those magical words- but anyway to be natural selection it has to be differential reproduction DUE TO heritable random generation.

  14. Actually we don’t use the same principles to detect design regardless of the designer. When doing forensics we always consider the motives and capabilities of potential designers.

    But in the absence of direct observation, or designer input, the only possible way to make any scientific determination about the design(s) or the specific process(es) used, is by studying the design and all relevant evidence.

    Many crimes are unsolved and that does not mean they ain’t crimes- and we don’t know the designers of many ancient artifacts, we can only speculate and try to determine if humans had the materials to pull it off- we know the capability of the designer by the evidence it left behind.

  15. Joe G:
    Venter has synthesized a ribosome- NS has never been observed to do such a thing.

    I’d be interested in your evidence of an extra cosmic designer designing biological structures.

  16. junkdnaforlife,

    “However, complexity is the first measurement in the Complex Specified Information argument.”

    Will you please measure the complexity, and the specified information, in a chicken egg, and show your work?

  17. Joe G: But in the absence of direct observation, or designer input, the only possible way to make any scientific determination about the design(s) or the specific process(es) used, is by studying the design and all relevant evidence.

    Many crimes are unsolved and that does not mean they ain’t crimes- and we don’t know the designers of many ancient artifacts, we can only speculate and try to determine if humans had the materials to pull it off- we know the capability of the designer by the evidence it left behind.

    If not humans, then who commits crimes, and who designed and produced ancient artifacts?

    Do you have any evidence that something other than humans and some animals have ever designed anything?

  18. Umm we have to follow the evidence.

    Do you have any evidence that blind and undirected processes can design anything?

  19. I’d be interested in your evidence of blind and undirected processes designing biological structures.

  20. petrushka: I’d be interested in your evidence that an intelligent designer can produce the kind of complex biological structures you refer to.
    I’d like to know your source for the claim that intelligent designers can design complex structures in living things without using some form of evolution.

    Any of the modern attempts to produce self-replicating molecules from the basic amino acids is design. The design is implicit in the selection of the chemical environment, including temperatures, quantities of materials, filtering, cycling frequencies, the ordering of steps. The number of contingent steps spanned by intelligent intervention in order to produce the outcome provides the required injection of CSI.

    Here’s one of my favourite quotes from David Berlinski in response to reports of advances in creating self-replicating RNA:

    If the steps leading to the appearance of the pyrimidines in a pre-biotic environment are not yet plausible, then neither is the appearance of a self-replicating form of RNA. Experiments conducted by Tracey Lincoln and Gerald Joyce at the Scripps Institute have demonstrated the existence of self-replicating RNA by a process of in vitro evolution. They began with what they needed and purified what they got until they got what they wanted.

  21. SCheesman: Here’s one of my favourite quotes from David Berlinski in response to reports of advances in creating self-replicating RNA:

    David Berlinski. Is there anything he does not know?

  22. Joe G:
    Umm we have to follow the evidence.

    Do you have any evidence that blind and undirected processes can design anything?

    You have no evidence then?

  23. SCheesman: Any of the modern attempts to produce self-replicating molecules from the basic amino acids is design. The design is implicit in the selection of the chemical environment, including temperatures, quantities of materials, filtering, cycling frequencies, the ordering of steps. The number of contingent steps spanned by intelligent intervention in order to produce the outcome provides the required injection of CSI.

    The problem here, as I see it, is that by that reasoning, any empirical attempt by Intelligent Scientists to find out whether, under a specific set of circumstances, life could begin/evolve supports design!

    But we do not apply this reasoning to other aspects of science. If we can do something in a lab (find an elementary particle, for instance) we consider that support for the hypothesis that it could occur outside the lab, without Intelligent Scientists accelerating stuff in a cyclotron, for instance.

    The point surely is that if we find circumstances under which life emerges or evolves spontaneously, then we have evidence that there exist circumstances in which life can emerge or evolve. The next step is to find out whether those conditions, or similar, were present on early earth, not to dismiss the findings just because the conditions were selected by Intelligent Scientists, surely?

  24. Creodont: Can you point me to where the complexity, and the specified information, in a chicken egg was measured?

    I hate to have to explain a joke, but since there is also a basic truth involved, I will write out the detailed, mathematical formula in simplified terms:

    Sum (Egg CSI) = Sum (Chicken CSI)*

    (* for derivation of chicken CSI, examine original egg)

  25. YOU don’t have any evidence then?

    YOU don’t even have a way to test your position.

  26. Creodont: Can you point me to where the complexity, and the specified information, in a chicken egg was measured?

    In a more serious vein, a good deal of the answer really is encapsulated in the genome of the chicken, being the instructions encoded for the production of all the required chicken parts necessary in the production of a chicken egg. I say “a good deal” because, plainly, given a single fertilized chicken egg cell a chicken will not spring unbidden, unless the considerable infrastructure of a mother chicken is also present in the proper spatial relationship to nurture its development. So while it might be possible to quantify the amount of information by counting the number of letters in its genetic code, it would seem that that is but a subset of the total required “information” to produce a chicken. That’s the best answer I can give you.

  27. Any of the modern attempts to produce self-replicating molecules from the basic amino acids is design.

    I agree with Elizabeth that the fact that experiments involve controls (and are designed) does not mean that the underlying phenomenon being studied is under the continuous control of demiurges.

    What science does (and the study of OOL is included) is look for regularities. The kind of regularities that allowed Newton to derive the motions of planets from the flight of cannonballs.

    But you ignored my question. What is your response to my question regarding the possibility of design without evolution? How does the designer know what sequences are functional? How does the designer reach the supposedly isolated islands of function? How did he achieve this knowledge? What is the design process?

  28. I’m going to expand a bit.

    ID advocates have posed the problem of sequence design as something comparable to breaking the combination of a lock with a 150-bit combination (sometimes 500 bits): a lock that will open only when the correct combination is entered and that yields no feedback if you get 10 or 100 or 149 bits right. No incremental fitness; no sideways fitness.

    Modern cipher keys are like this, and civilization depends on the inability of intelligent agents to break ciphers. This is precisely the heart of my argument. Our whole financial world is dependent on the inability of intelligent designers to do precisely what ID advocates assert they can do.

    The alternative hypothesis is that biological function is not like this. The mainstream hypothesis is that sequences evolve, that they can be connected by incremental change.

    If functional sequences are isolated like combination locks or cipher keys, then design is impossible. If they are connectable, and incremental change is possible, then evolution is possible and ID is ruled out by Occam’s razor.

  29. junkdnaforlife:
    Joe Felsenstein,
    You say:

    “In addition, I am not calculating measures of complexity of the population.”

    This has been made clear from your 2007 paper and this post. However, complexity is the first measurement in the Complex Specified Information argument.

    “Complexity” may mean different things to different people. In regard to compressibility of the pattern, Dembski made clear in No Free Lunch that compressibility was one, but only one, of the ways to define a Specification. For example, in section 2.4 of that book he describes compressibility (from algorithmic information theory) as “another example of extremal sets … that successfully eliminate chance ….” (p. 58)

    On page 148 he says that

    The specification of organisms can be cashed out in any number of ways. Arno Wouters cashes it out globally in terms of the viability of whole organisms. Michael Behe cashes it out in terms of the minimal function of biochemical systems. Darwinist Richard Dawkins cashes out biological specification in terms of the reproduction of genes.

    I am using a scale of the dosage of alleles that positively affect the fitness. That is similar to Wouters’s and Dawkins’s specifications.

    You also state:

    “So even if natural selection isn’t “putting” the information into the genome, its presence has the dramatic effect of invalidating the argument that the presence of CSI proves that natural processes other than Design could not be responsible for the adaptation.”

    If you were not calculating population frequencies to satisfy or disqualify the complexity signature, then I am not understanding how the presence of CSI was identified. You seem to pivot, bob and weave a little between CSI and the term functional information.

    I am not sure what you are arguing. I am using a scale of fitness (or the dosage of 1s in the genome), with a rejection region on that scale (as Dembski does). He defines Complex Specified Information as being in the top 10-to-the-minus-150th of that scale, under some null distribution. Functional Information (as defined by Hazen et al.) is defined as minus the log-to-the-base-2 of the probability of being as far, or farther, out into that tail. I took the null distribution to be equiprobability of all sequences, which corresponds to gene frequencies of 0.5 at all loci. When Functional Information reaches about 500 bits, that means the genotype has Complex Specified Information.

    I am not trying to “duck and weave”. These are broadly consistent notions. One measures how far out you are, the other tells whether you have reached a preset level.

  30. Joe G:
    YOU don’t have any evidence then?

    YOU don’t even have a way to test your position.

    Since you don’t know what my position is, how do you know if I have a way to test it or not?

    Do you always play childish games and avoid answering questions about your claims? Are my questions making you uncomfortable? I hope you realize that avoiding my questions and the questions from others is demonstrating the vacuity of your claims.

  31. Elizabeth: The problem here, as I see it, is that by that reasoning, any empirical attempt by Intelligent Scientists to find out whether, under a specific set of circumstances, life could begin/evolve supports design!

    No. Scientists can replicate an environment which approximates a natural one. This might include a system around a hydrothermal vent which could include cycling and various concentration gradients etc., and that is fair game. In the same way geochemists regularly duplicate conditions for the formation and evolution of mineral species. What is important is the accounting of how many “decisions” are required along the route to produce the required effect, and whether the probabilistic resources are available to explore them. If abiogenesis really is possible, we should be able to demonstrate it in a lab; if not to “completion”, then at least an important fraction of the way along, and do it under conditions that are “reasonable” given our understanding of early-earth geochemistry.

  32. The “accounting”, as I see it, falls into two parts; one is simply the production of the information-bearing molecules of sufficient size. Every step which requires filtering/purifying, or changes in the application of energy necessary to create a new chemical bond, or the availability of necessary catalysts or constituents, should be accounted for. Then there is the fact that not just any ordering will do in the final self-replicating molecules: are enough of them produced to ensure reasonable odds of finding the “right” ones?

    In geochemistry the material availability and the evolution of conditions as pressures and temperatures change directly feed into the output minerals. So the same sort of analysis should extend to abiogenesis, except the ordering becomes a new issue.

  33. SCheesman: In a more serious vein, a good deal of the answer really is encapsulated in the genome of the chicken, being the instructions encoded for the production of all the required chicken parts necessary in the production of a chicken egg. I say “a good deal” because, plainly, given a single fertilized chicken egg cell a chicken will not spring unbidden, unless the considerable infrastructure of a mother chicken is also present in the proper spatial relationship to nurture its development. So while it might be possible to quantify the amount of information by counting the number of letters in its genetic code, it would seem that that is but a subset of the total required “information” to produce a chicken. That’s the best answer I can give you.

    In other words, neither the complexity nor the specified information (using those terms) has been measured in a chicken egg. Has either been measured in anything else, by a means that doesn’t just count letters that are a labeling system applied by humans? Does a spider have more CSI than a whale because there are more letters in the word spider?

    I would also like to know how information can be called specified if a designer’s pre-production specifications are not known?

  34. SCheesman: No. Scientists can replicate an environment which approximates a natural one. This might include a system around a hydrothermal vent which could include cycling and various concentration gradients etc., and that is fair game. In the same way geochemists regularly duplicate conditions for the formation and evolution of mineral species. What is important is the accounting of how many “decisions” are required along the route to produce the required effect, and whether the probabilistic resources are available to explore them. If abiogenesis really is possible, we should be able to demonstrate it in a lab; if not to “completion”, then at least an important fraction of the way along, and do it under conditions that are “reasonable” given our understanding of early-earth geochemistry.

    Well, the reason that there are ample “probabilistic resources” once self-replication-with-variation has got started is that the decision-nodes themselves are replicated.

    But it is perfectly true that we do not yet have a convincing account of OOL. My hunch is that within my lifetime we will see self-replicating proto-cells emerge from non-self-replicating chemicals in the lab.

    And I’m sixty today!

    Feeling pretty healthy though, and looking forward to cake…..

  35. Happy birthday Elizabeth!

    My own take on the OoL question is that we don’t really know what it is we should be ‘creating’, nor which, out of the many nooks and crannies available on a 12,000-mile globe, we should be creating it in. Catalysis and energy sources are likely to be key ingredients – because of 2LoT – and the possible combinations are vast. And I’m inclined to think that membranes are vital, too, giving us an additional biophysical problem. Membrane gradients are the prime ‘energy converter’ in Life.

    Evidently, it needs to be some kind of replicator, and the prime candidate for that is RNA. But we haven’t got too far with that. Oró in 1961 demonstrated how adenine can be synthesised abiotically from hydrogen cyanide, but getting that into stable polymers is one hell of a task.

    I am interested in the role of ATP (adenosine triphosphate), however. The 3D shape of this molecule is vital for its role as a nucleic acid monomer. It is precisely the fact that you can cleave off two of the phosphates and then attach the remaining one to the ribose of another that gives us both the energy to elongate RNA and the physical ‘stuff’ of the RNA polymer itself. But the fact that the adenine comes off the chain at an angle is vital for that physical role. Plenty of sugar triphosphates with various side-chains could conceivably polymerise, but only ones with this flat side chain sticking out at such an angle can be stacked in the familiar way.

    But there is another constraint – when such a single strand of RNA turns back on itself, it can form a ‘hairpin’ length of double helix. The chains are oppositely oriented (exactly as in DNA), so the bases going one way edge-align with those going the other way ‘upside down’ as it were – like a line of people with right arms extended: if it looped back, fingertips could touch but palms would be facing the opposite way. This restricts what can go the other side to stabilise the helix – polyadenine cannot form hairpins, but a stretch of polyadenine can form a hairpin with a stretch of polyuracil, because the charge distributions of the molecules, when oriented upside down, are complementary and form hydrogen bonds.

    This pairing is carried forward into DNA, which is ALL about pairing – RNA can form all kinds of structures, but DNA is much more tightly restricted. Since modern RNA is all made from DNA, it is constrained to only hold the bases A, U, C and G. The range of RNA is limited by structural constraints in DNA.

    But in ‘RNA world’, there is no such restriction. Where an RNA double helix forms, it must form between some kind of complementary pairs. But where unrestricted, the monomers can be anything – including ones that would disrupt a double helix, because they don’t come off at the ‘magic’ angle that ATP and its relatives do.

    So the problem is, potentially, exacerbated. It may be that there are replicative reasons why RNA was constrained to A, U, C and G. Equally, there may not, and this opens up a vast variety of sequence space in which replicative ability can be found, but which needs bases in ribozymes that are not part of the current restricted RNA base set. And the ‘first replicator’ could be one of them.

    So I’m inclined to be sceptical that we can ever successfully create a ‘proof of concept’ of the (assumed) ‘natural’ OoL.

  36. Creodont: In other words, neither the complexity nor the specified information (using those terms) has been measured in a chicken egg. Has either been measured in anything else, by a means that doesn’t just count letters that are a labeling system applied by humans? Does a spider have more CSI than a whale because there are more letters in the word spider?

    The question is really equivalent to “how many instructions does it take to create a chicken”, so it’s not surprising that such a measurement is not available on Wikipedia. In addition, it’s not so easy to equate a certain length of the genome to a single instruction; sure, some sections code for proteins that might perform a certain function, but long stretches may be involved in such things as ensuring the shape of chromosomes, or any number of other things. Some might no longer have a function (junk DNA, anyone?). Finally, not all “specified information” need be complex-specified-information. It might, in principle, be possible to create some effective measure for individual molecular machines or pathways where just a few instructions are required for their construction. This question should be of interest to anyone interested in evolution. The information had to get there somehow, and it had to be added over time, if we assume we all came from carbon and oxygen and other atoms in the first place.

  37. So I wonder which came first: the instructions for making the chicken, or the reader/interpreter of the instructions.

  38. SCheesman:

    it’s not so easy to equate a certain length of the genome to a single instruction

    I don’t think it’s really that helpful to view the genome as ‘instructions’ at all. There is a ‘computer-like’ quality to it – the multiple sites of activator and repressor sequences, for instance, function like logic gates, the bases amount to a quaternary code comparable to binary, and the genetic translation system looks a bit like ASCII – but it is chemical ‘stuff’. It has linear form, but the phenotypes built from it are decidedly non-linear.
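    The “looks a bit like ASCII” comparison can be made concrete. Translation is essentially a fixed lookup from three-letter codons to amino acids. The sketch below assumes the standard genetic code (some organisms and organelles use variants) and reads a DNA coding strand. `translate` is a hypothetical helper name, not a library function.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG x TCAG x TCAG codon order; '*' marks a stop codon
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):  # ignore any trailing partial codon
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCTTGGTAA"))  # Met-Ala-Trp, then stop
```

    A lookup table says nothing about what the resulting chain will do once it folds, and that is exactly the point about context made in the next paragraph.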

    The problem for any attempt to get a handle on ‘complexity’, or ‘information’, from the linear set of bases in DNA, is that context is everything. Just as a particular binary computer sequence could represent part of a picture of a horse, or a Beethoven sonata in MP3, or a virus, so does any piece of DNA depend upon what is ‘reading’ it as to what it does. And not just that; even a piece of coding sequence is entirely dependent upon where it ends up in the folded protein as to what its ‘function’ is. You can’t tell from looking at the genome, even though the ‘information’ is kind of in there. But it is also in the actual state of activator and repressor molecules in any given cell, it is in the physics of protein folding, it is in the complex chaos of multiple interactions between genes and gene products and the wider environment.

    Genotype is linear. That’s why it is possible to amend it ad lib without having any real effect on its ‘complexity’ – most linear sequences of bases of a given length are just as complex as each other, in a trivial sense. The complexity that interests us – complex organisms – is pulled out of the linear sequence, folded by physics and integrated by better survival of successful integrations. But this is phenotype, and it is not inherited. There is nothing fundamental preventing the genotypes for complex phenotypes being inherited by the sequence-disinterested process of DNA replication – even if the complexity of the phenotypes is on an upward trajectory, the DNA copying process copies AAAAAAAAAAAAAAAAAAA and ATACAGTGACGATAGCAGAT with an entirely even hand, even if the first means nothing and the second is a vital instruction in the making of a heart (it probably isn’t; I made it up). But given that even simple protein folds are the kind of thing that takes a supercomputer (or Folding@Home parallel processing of networked spare PC capacity, à la SETI), trying to determine a serious ‘function’ heuristic, still less a ‘designed function’ one, for anything more complex seems like a non-starter.
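    The “trivial sense” in which two sequences of the same length are equally complex can be checked directly. The sketch below uses a 20-base poly-A run as a stand-in for the A-run above, with the length assumed for a like-for-like comparison. Raw capacity (2 bits per base) is identical for both. A crude composition entropy tells them apart, but neither number says anything about what a sequence does. Both helper names are hypothetical.

```python
import math
from collections import Counter

def raw_capacity_bits(seq: str) -> float:
    """Bits needed to single out one sequence among all 4**len(seq) of that length."""
    return len(seq) * math.log2(4)

def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the base frequencies within the sequence."""
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())

dull = "A" * 20
made_up = "ATACAGTGACGATAGCAGAT"
for s in (dull, made_up):
    print(s, raw_capacity_bits(s), round(composition_entropy(s), 2))
```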

  39. SCheesman: The question is really equivalent to “how many instructions does it take to create a chicken”, so it’s not surprising that such a measurement is not available on Wikipedia. In addition, it’s not so easy to equate a certain length of the genome to a single instruction; sure, some sections code for proteins that might perform a certain function, but long stretches may be involved in such things as ensuring the shape of chromosomes, or any number of other things. Some might no longer have a function (junk DNA, anyone?). Finally, not all “specified information” need be complex-specified-information. It might, in principle, be possible to create some effective measure for individual molecular machines or pathways where just a few instructions are required for their construction. This question should be of interest to anyone interested in evolution. The information had to get there somehow, and it had to be added over time, if we assume we all came from carbon and oxygen and other atoms in the first place.

    ID proponents invented and use the term CSI (complex specified information), and adamantly claim that it can be easily measured. I think it’s reasonable to ask what the measurement of CSI is in easily available things, like chicken eggs for instance. I could have asked for the measurement of CSI in the nervous system or reproductive organs of a gynandromorphic Plebejus anna.

  40. Creodont: ID proponents invented and use the term CSI (complex specified information), and adamantly claim that it can be easily measured.

    Well, I think the concept is useful, but I don’t think it is so easy to measure. The construction of anything surely is reducible to a discrete number of steps or specifications, and if something requires a lot of steps, why not say that is a complex specification? The number of bits in a binary code is not an unreasonable start, since it represents a sequence of “yes” and “no” decisions, but extending that to a genetic code, as Allan Miller notes above, is hardly straightforward, even in the best of circumstances. And I repeat, I do think this sort of analysis is valuable, no matter where you stand w.r.t. ID vs naturalistic origins.
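    The counting half of this is mechanical. Four bases means each one can be written as exactly two yes/no decisions. The sketch below uses one arbitrary bit assignment (any one-to-one mapping works), and `encode`/`decode` are hypothetical names. What the mapping cannot capture is the harder part raised above: the bits only mean something given the machinery that reads them.

```python
# One arbitrary assignment of two yes/no decisions per base (any one-to-one mapping works)
TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
FROM_BITS = {bits: base for base, bits in TO_BITS.items()}

def encode(seq: str) -> str:
    """Write a DNA sequence as a string of binary decisions, two per base."""
    return "".join(TO_BITS[base] for base in seq)

def decode(bits: str) -> str:
    """Recover the DNA sequence from its two-bits-per-base encoding."""
    return "".join(FROM_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

print(encode("GATTACA"))  # 14 bits for 7 bases
```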

  41. Creodont,

    I believe you are conflating time-dependent systems and time-independent systems (Hazen et al. make this distinction, I believe, in their I(Ex) argument).

    To begin this difficult outline, we might choose to isolate at shell level, where the predominant element appears to be Ca (source population: the earth’s crust). We might first consider all possible microstates of Ca that would be considered an egg shell/all possible microstates for Ca and take that value as x, then draw on Shannon, u = -log(p(x)) (however, I’m not sure how to handle Boltzmann’s k in this system). Nonetheless, if correctly calculated, a very high value for u might be an argument that [C]SI is satisfied; it then follows that S is dependent on establishing C, etc.
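    For comparison, here is Hazen et al.’s functional information, I(Ex) = -log2 F(Ex), where F(Ex) is the fraction of all configurations that reach at least degree of function Ex. It is applied here not to egg shells but to the post’s own toy case: 100 loci with alleles 0 and 1, “functional” meaning 90 or more 1s. It is a minimal sketch and the helper names are hypothetical.

```python
import math

def surprisal_bits(p: float) -> float:
    """Shannon surprisal u = -log2(p): the information, in bits, of an outcome of probability p."""
    return -math.log2(p)

def fraction_functional(n_loci: int, min_ones: int) -> float:
    """F(Ex): fraction of all 2**n_loci haplotypes carrying at least `min_ones` '1' alleles."""
    hits = sum(math.comb(n_loci, k) for k in range(min_ones, n_loci + 1))
    return hits / 2 ** n_loci

F = fraction_functional(100, 90)   # the post's "90:10 or more" threshold
I = surprisal_bits(F)              # functional information, I(Ex) = -log2 F(Ex)
print(f"F = {F:.3g}, I = {I:.1f} bits")
```

    F comes out just under the 1.54 × 10^-17 quoted in the post, which is roughly 56 bits. Multiplying by k ln 2 would convert bits into thermodynamic units, but that conversion does not supply p(x) for an egg shell, and p(x) is the hard part.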

    p.s. Happy B-day Dr. Liz Liddle

  42. SCheesman: Well, I think the concept is useful, but I don’t think it is so easy to measure. The construction of anything surely is reducible to a discrete number of steps or specifications, and if something requires a lot of steps, why not say that is a complex specification? The number of bits in a binary code is not an unreasonable start, since it represents a sequence of “yes” and “no” decisions, but extending that to a genetic code, as Allan Miller notes above, is hardly straightforward, even in the best of circumstances. And I repeat, I do think this sort of analysis is valuable, no matter where you stand w.r.t. ID vs naturalistic origins.

    Reducible to a discrete number of steps, or to a number of steps, sounds more like Darwinian theory than ID, and I see no need or reason to add the word specification, if for no other reason than that any so-called specification is unknown unless the assumed designer’s pre-production specifications are known. I think it’s safe to say that no such thing will ever be known.

    The word complex is already used by scientists and adds nothing to ID claims, and also doesn’t add anything new to science. As with other arguments from ID proponents you appear to be suggesting that the terminology used in the study of evolutionary biology should be changed for no reason other than to give credit to an assumed designer that has never been shown to exist. It’s not a change of terminology that is needed from ID proponents, it’s a working hypothesis, definitions that make sense, and testable evidence that supports their claims, if they want to be taken seriously.

  43. junkdnaforlife:
    Creodont,

    I believe you are conflating time-dependent systems and time-independent systems (Hazen et al. make this distinction, I believe, in their I(Ex) argument).

    To begin this difficult outline, we might choose to isolate at shell level, where the predominant element appears to be Ca (source population: the earth’s crust). We might first consider all possible microstates of Ca that would be considered an egg shell/all possible microstates for Ca and take that value as x, then draw on Shannon, u = -log(p(x)) (however, I’m not sure how to handle Boltzmann’s k in this system). Nonetheless, if correctly calculated, a very high value for u might be an argument that [C]SI is satisfied; it then follows that S is dependent on establishing C, etc.

    p.s. Happy B-day Dr. Liz Liddle

    Your response is interesting, although I feel as though you’re just trying to test me. Honestly, the only thing in your response that matters is the way you used the word “satisfied”, in reference to CSI. CSI is a poorly defined term that has been shown to be useless in determining anything that relates to evolution, or biology in general. Before anything in regard to the term CSI can be “satisfied” in a scientific sense, the term CSI must be specifically defined in a manner that makes the term useful, and actual CSI (whatever it is) must be measurable, and the measurement itself must be useful. I think it is very telling that no ID proponent seems to be able to measure the assumed CSI in something as common as a chicken egg, or in any other organism.

  44. junkdnaforlife

    We might first consider all possible microstates of Ca that would be considered an egg shell/all possible microstates for Ca and take that value as x, then draw on Shannon, u = -log(p(x)) (however, I’m not sure how to handle Boltzmann’s k in this system).

    I think you may be confusing two kinds of entropy, and extending Shannon way beyond the ‘informatic’ realm! The observation that there is an interesting distribution of calcium on the surface of the earth, which on further investigation collects surprisingly in thin spherical sheets that turn out to be alternately inside and outside chickens, does not really tell us much about how that distribution came about! :0)

  45. SCheesman:

    The number of bits in a binary code is not an unreasonable start, since it represents a sequence of “yes” and “no” decisions, but extending that to a genetic code …

    I think the vital word is “sequence”. Linear DNA sequence is important for chemical reasons, not informatic ones. You can only copy DNA to DNA or RNA in one direction (adding to the ‘sugar’ end, or 3′), and ribosomes only condense amino acids into protein by adding to the carboxyl (-COOH) end of the growing chain. Both of these constraints arise, in part, from stereochemistry – the simple fact that, as carbon has four bonding directions that arrange tetrahedrally, molecules made from them with different atoms in each direction (including the ribose or deoxyribose sugar of RNA/DNA, and amino acids) have distinct ‘ends’. When you polymerise the monomers, your enzymes/ribozymes that do this, being themselves built from carbon, have to ‘pick’ a direction and stick with it – and there may be energetic reasons why one direction is preferable to the other.
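    For DNA copying, the one-direction rule has a simple consequence. Because the new strand can only be extended at its 3′ end, it runs antiparallel to the template and comes out as the template’s reverse complement. A minimal sketch, assuming a single polymerase working along one template. Primers, Okazaki fragments and proofreading are ignored, and `synthesize_copy` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_copy(template_5to3: str) -> str:
    """Return the new strand, written 5'->3', made against a template given 5'->3'.

    The polymerase reads the template from its 3' end and only ever adds to the
    3' end of the growing strand, so the result is the reverse complement.
    """
    new_strand = []
    for base in reversed(template_5to3):       # walk the template starting at its 3' end
        new_strand.append(COMPLEMENT[base])    # each step extends the new strand at its 3' end
    return "".join(new_strand)

print(synthesize_copy("GATTACA"))
```

    Copying the copy gives back the original. This is why two rounds of replication preserve a sequence even though each round reverses direction.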

    The result – a folded protein or ribozyme – is just a globular chemical catalyst. The fact that it is made from a chain is secondary to the fact that this chain is folded.

    But it is misleading to think that this sequence is anything more than analogous to sequential ‘information’ in comms packets or computer code/files.

    A different meaning of sequence (and another computer analogy) arises when we think of the complex control that is applied to genetic expression during a life. The ordered expression of genes is vital for proper replication, and the ‘serial’ phenotype – the transitions from blastocyst to embryo to juvenile to adult – is just as ‘complex’ as a time-specific ‘slice’ taken through the current stage reached by the genetic ‘program’. And of course it never got to any stage without going through the one before. It always starts (comparatively) simple.

    But really, this is all fixation upon phenotype. In one sense, phenotype hardly matters, so long as it ‘works’. Genes – the linear sequence of DNA – are replicated, life after life. In animals, the germ line genes remain dormant and broadly undifferentiated (except that they move towards generating gametes). The rest is just a massive shell of dead-end cells containing those same genes. Those shells can become enormously elaborate, and they play a vital role in the survival of the germ line. But they don’t have to be complex; they don’t even have to exist. Single-celled organisms get along fine without them. Obviously, since we are such somatic shells, we regard them as very important. But in evolutionary terms, they are just another way of making a living.

    Coming up with a metric for their elaborateness would be interesting if we could do it. I’d go for a ‘pseudo-Kolmogorov’ measure – ‘complexity’ is the length of time it would take to execute a computer program that could determine, without actually needing to build the organism, whether the DNA genotype will be viable. Might be easier to just build the organism!

  46. Creodont: I think it is very telling that no ID proponent seems to be able to measure the assumed CSI in something as common as a chicken egg, or in any other organism.

    What is telling is your misunderstanding of the issues. Forget CSI for a moment. How much information is in a chicken egg? The same as in a chicken. What is that information except the instructions and machinery necessary to create a chicken? This is hardly different than asking to quantify the information required to create life. Give me the instructions to create life from non-life, and I’ll be well along the way telling you the quantity of “Chicken-Specifying-Information”.

    CSI is about origins, as others on this and other threads have pointed out. It is virtually incoherent to ask how much CSI is in a chicken egg. What you can answer are questions like “What is the global increase in CSI in the universe when a chicken lays an egg?” The answer to that question, pretty well without exception, is zero, or negative. Of course “evolution” requires it to be, on average, greater than or equal to zero, or you have stasis or “devolution”.
