# Natural selection can put Functional Information into the genome

It is quite common for ID commenters to argue that it is not possible for evolutionary forces such as natural selection to put Functional Information (or Specified Information) into the genome. Whether they know it or not, these commenters are relying on William Dembski’s Law of Conservation of Complex Specified Information. It is supposed to show that Complex Specified Information cannot be put into the genome. Many people have argued that this theorem is incorrect. In my 2007 article I summarized many of these objections and added some of my own.

One of the sections of that article gave a simple computational example of mine showing natural selection putting nearly 2 bits of specified information into the genome, by replacing an equal mixture of A, T, G, and C at one site with 99.9% C.
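As a rough illustration of where a figure close to 2 bits can come from, here is a minimal Python sketch. It assumes the information at the site is measured as the drop in Shannon entropy of the base frequencies, and that the remaining 0.1% is split equally among A, T, and G. Both are assumptions made for this sketch, and the 2007 article may define the quantity differently.

```python
from math import log2


def entropy_bits(freqs):
    """Shannon entropy (in bits) of a list of base frequencies."""
    return -sum(f * log2(f) for f in freqs if f > 0)


# Equal mixture of A, T, G, C before selection.
before = [0.25, 0.25, 0.25, 0.25]
# 99.9% C afterwards; the equal split of the other 0.1% is an assumption.
after = [0.001 / 3] * 3 + [0.999]

information_gain = entropy_bits(before) - entropy_bits(after)
print(f"{information_gain:.3f} bits")
```

Under these assumptions the gain comes out just under 2 bits, because the starting entropy is exactly 2 bits and very little uncertainty remains once C is at 99.9%.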

This post is intended to show a more dramatic example along the same lines.

Suppose that we have a large population of wombats and we are following 100 loci in their genome. We will make the wombats haploid rather than diploid, to make the argument simpler (diploid wombats would give a nearly equivalent result). At each locus there are two possible alleles, which we will call 0 and 1. We start with equal gene frequencies 1/2 and 1/2 of these two alleles at each locus. We also assume no association (no linkage disequilibrium) between alleles at different loci. Initially the haplotypes (haploid genotypes) are all combinations from 00000…000 to 11111…111, all equiprobable.

Let’s assume that the 1 allele is more fit than the 0 allele at each locus. The fitness of 1 is 1.01, and the fitness of 0 is 1. We assume that the fitnesses are multiplicative, so that a haploid genotype with M alleles 1 and 100-M alleles 0 has fitness 1.01 raised to the Mth power. Initially the ratio of 1s to 0s will be close to 50:50 in almost all genotypes. The fraction of genotypes that have 90:10 or more will be very small, in fact less than 0.0000000000000000154. So very few individuals will have high fitnesses.

What will happen to these multiple loci? The gene frequency of the 1 allele rises at each locus. The straightforward equations of theoretical population genetics show that after 214 generations of natural selection, the gene frequency of the 1 allele at each locus will be 0.8937253. The fraction of genotypes having 90:10 or more will then be 0.500711. So the distribution of genotypes has moved far enough toward ones of high fitness that over half of them have 90 or more 1s. If you feel that this is not far enough, consider what happens after 500 generations. The gene frequencies at each locus are then 0.99314, and the fraction of the population with 90 or more 1s is then more than 0.999999999.
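The numbers above can be checked with a short Python sketch. It assumes the standard deterministic haploid recursion, in which the odds p/(1-p) of the 1 allele are multiplied by the fitness ratio 1.01 each generation, and it treats the 100 loci as independent so that the number of 1s in a genotype is binomial. The function names are made up for this sketch.

```python
from math import comb

FITNESS_RATIO = 1.01  # fitness of allele 1 relative to allele 0
LOCI = 100


def allele_frequency(generations, p0=0.5, ratio=FITNESS_RATIO):
    """Frequency of allele 1 after deterministic haploid selection."""
    odds = (p0 / (1 - p0)) * ratio ** generations
    return odds / (1 + odds)


def fraction_at_least(k, p, n=LOCI):
    """Fraction of genotypes with k or more 1s when loci are independent."""
    return sum(comb(n, m) * p ** m * (1 - p) ** (n - m) for m in range(k, n + 1))


for t in (0, 214, 500):
    p = allele_frequency(t)
    print(t, round(p, 7), fraction_at_least(90, p))
```

Generation 0 gives the tiny starting fraction, generation 214 is where the fraction with 90 or more 1s passes one half, and by generation 500 nearly the whole population is in that set.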

The essence of the notion of Functional Information, or Specified Information, is that it measures how far out on some scale the genotypes have gone. The relevant measure is fitness. Whether or not my discussion (or Dembski’s) is sound information theory, the key question is whether there is some conservation law which shows that natural selection cannot significantly improve fitness by improving adaptation. My paper argued that there is no such law. This numerical example shows a simple model of natural selection doing exactly what Dembski’s LCCSI law said it cannot do. I should note that Dembski set the threshold for Complex Specified Information far enough out on the fitness scale that we would have needed to use 500 loci in this example. We could do so — I used 100 loci here because the calculations gave less trouble with underflows.

I hope that ID commenters will take examples like this into account and change their tune.

Let me anticipate some objections and quickly answer them:

1. This is an oversimplified model; you were not realistic. Dembski’s theorems were intended to show that even in simple models, Specified (or Functional) Information could not be put into genomes. It is therefore appropriate to check that in such simplified models, where we can do the calculation. For if natural selection is in trouble in these simple models, it is in trouble more generally.

2. You have not allowed for genetic drift, which would be present in any finite population. For simplicity I left it out and used a completely deterministic model. Adding in genetic drift would complicate the presentation enormously, but the population would still end up with all 11111…1111 genotypes after only a modestly larger number of generations.

3. If fitness differences are due to inviability of some genotypes, fitnesses could not exceed 1. Yes, but making the 0 allele have fitness 1/1.01 = 0.9900990099… and the 1 allele have fitness 1 could then be used, and the results would be exactly the same, as long as the ratio of fitnesses of 0 and 1 is still 1:1.01.

4. You just followed gene frequencies — what about frequencies of haplotypes? This case was set up with multiplicative fitnesses so that there would never be linkage disequilibrium, so only gene frequencies need to be followed.
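Objections 3 and 4 can be checked numerically on a small case. The sketch below follows all four haplotypes at just two loci (an assumption made to keep it short; the post uses 100) under multiplicative fitnesses, and leaves out recombination, which changes nothing while linkage disequilibrium is zero. It compares fitnesses of 1 and 1.01 with fitnesses of 1/1.01 and 1. The function names are hypothetical.

```python
from itertools import product


def select(freqs, w0, w1, generations):
    """Iterate haploid selection on two-locus haplotype frequencies.

    A haplotype's fitness is the product of its per-locus fitnesses:
    w1 for each 1 allele and w0 for each 0 allele.
    """
    for _ in range(generations):
        weighted = {h: f * w1 ** sum(h) * w0 ** (2 - sum(h)) for h, f in freqs.items()}
        mean_fitness = sum(weighted.values())
        freqs = {h: v / mean_fitness for h, v in weighted.items()}
    return freqs


def linkage_disequilibrium(freqs):
    """D = x11 * x00 - x10 * x01."""
    return freqs[(1, 1)] * freqs[(0, 0)] - freqs[(1, 0)] * freqs[(0, 1)]


# All four haplotypes start equally frequent, so D is zero at the outset.
start = {h: 0.25 for h in product((0, 1), repeat=2)}
a = select(start, 1.0, 1.01, 214)        # fitnesses 1 and 1.01
b = select(start, 1 / 1.01, 1.0, 214)    # fitnesses 1/1.01 and 1

print(linkage_disequilibrium(a))         # zero up to rounding error
print(a[(1, 1)] + a[(1, 0)])             # frequency of allele 1 at the first locus
print(max(abs(a[h] - b[h]) for h in a))  # difference between the two scalings
```

The disequilibrium stays at zero apart from floating-point rounding, the marginal gene frequency matches the single-locus figure for generation 214, and rescaling the fitnesses leaves the haplotype frequencies unchanged because only the ratio enters the recursion.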

I trust also that people will not raise all sorts of other matters (the origin of life, the bacterial flagellum, the origin of the universe, quantum mechanics, etc.). To do so would be to admit that they have no answer to this example, which shows that natural selection can put functional information into the genome.

This entry was posted in Uncategorized by Joe Felsenstein.

Been messing about with phylogenies, coalescents, theoretical population genetics, and stomping bad mathematical arguments by creationists for some years.

## 254 thoughts on “Natural selection can put Functional Information into the genome”

1. I’m being argumentative because I think you have falsely characterized functional sequences as isolated. I’m waiting for the evidence.

2. olegt: So, you are starting with a population that is initially near a peak of its fitness. The fitness landscape changes and the peak shifts away from its previous position. The population will follow the gradient toward the new (local) maximum. Once it reaches the maximum, it will stay there. Right?

Yes.

3. SCheesman: Why do you think I see a problem with that?

Because earlier you wrote this:

SCheesman: OK, then I need to add minor and limited. Eventually the gradient returns to the mean, like a random walk in a large bowl. This is not what evolution requires, in a nutshell.

We’ve agreed that a change in the fitness landscape would lead the population to the new location of the fitness maximum. Why do you think it is at odds with Darwinian evolution?

4. petrushka: I’m being argumentative because I think you have falsely characterized functional sequences as isolated. I’m waiting for the evidence.

I am coming to realize that you see everything as absolutes. I suggest that some functional sequences are isolated, and you accuse me of denying that any are connected. I say that there are things that simple RM+NS cannot accomplish, and you accuse me of ignoring experiments that change the proportions of alleles in a population. Even Elizabeth has offered examples of things she struggles with, but with you there is no difficulty anywhere, not a shred of skepticism or hesitation. Who is the “true believer” here? Why don’t you show me evidence of the power of RM+NS to evolve anything more than trivial changes, ones that demonstrably improve the fitness of a population over their form “in the wild”, or ones that create de novo some new multi-component biological system that confers a new survival benefit? As part of the work, relate (as does Lenski, whose work I greatly admire) changes to specific genomic variations.

5. What about it? No one can demonstrate that such a thing is possible- ya I know that you think there is fossil evidence to support it but you need genetic/ biological evidence, not speculations based on what you think you see in the fossil record.

6. It still doesn’t do anything- it doesn’t construct new, useful multi-protein configurations.

So, sure, you can have that, because with that you still have nothing.

Good work…

7. Tell me how the inner ear is not a complex new function created by incremental change.

8. No, I’ve been answering your questions long enough now. Tell me how RM+NS is sufficient to explain the evolution of the inner ear by incremental change. You’re good at posing questions. How good are you at answering them? What environmental changes led to the change? Why is the current system superior to the previous one, based on fitness? What specific evolutionary mechanisms were responsible? What were the key mutational changes and how did they proceed?

9. Actually you haven’t answered the question I’ve been asking. I really have only one big question, and that is how do you know that function is isolated? What is your evidence?

If you have answered that, you could point me to the answer. Perhaps I missed it.

Now I think on the question of the inner ear you are simply being absurd. We have ample evidence that variation and selection can modify the shape and size of bones, even within the span of a few hundred years. Dog breeds are a prime example.

But anyway, if you actually believe that the inner ear evolution is in question, I at least know something more about your thought processes.

10. petrushka: Actually you haven’t answered the question I’ve been asking. I really have only one big question, and that is how do you know that function is isolated? What is your evidence? If you have answered that, you could point me to the answer. Perhaps I missed it. Now I think on the question of the inner ear you are simply being absurd. We have ample evidence that variation and selection can modify the shape and size of bones, even within the span of a few hundred years. Dog breeds are a prime example. But anyway, if you actually believe that the inner ear evolution is in question, I at least know something more about your thought processes.

The evolution of any system that has been described as irreducibly complex. You might be able to change the size of dog’s bones by breeding, but that is a universe away from creating an entire system of hearing and balance (which is what the inner ear provides, with numerous multi-component systems), which somehow, through a series of lucky accidents, is supposed to have originated from a jaw.

If you don’t like my examples, use Elizabeth’s – OOL and the ribozome. These give you no inner doubts of the efficacy of natural processes?

11. The evolution of any system that has been described as irreducibly complex. You might be able to change the size of dog’s bones by breeding, but that is a universe away from creating an entire system of hearing and balance

What part of the inner ear system is not represented by transitional forms?

12. petrushka: What part of the inner ear system is not represented by transitional forms?

Any part that doesn’t fossilize, I expect. You tell me what is.

13. SCheesman: “They pose the problem differently; in their case you need to specify all the bits to achieve a particular function, like a combination lock, or a series of right/left turn directions to get you to a specific destination. ”

I think we can agree that all of the steps, (i.e. “information”), don’t have to be there for it NOT to work.

All the bit combinations however, are “explored” in “n life-form/generations”.

That’s the take away point, that the “key” we are looking for can be obtained in “n steps”, not “(2^n) steps”.

When that last bit, in the last generation flips, we have our functionality, and we have it in “n” bits, not (2^n) bits.

14. Toronto: This OP is about natural selection leading to Functional Information. I think we can agree that all of the steps, (i.e. “information”), don’t have to be there for it NOT to work. All the bit combinations however, are “explored” in “n life-form/generations”. That’s the take away point, that the “key” we are looking for can be obtained in “n steps”, not “(2^n) steps”. When that last bit, in the last generation flips, we have our functionality, and we have it in “n” bits, not (2^n) bits.

The problem in the OP in no way corresponds to the “key in the lock” problem. As it is laid out it is exactly correct. It starts with a non-zero fitness and there is a gradient each step of the way. For the key in the lock problem there is no gradient, as only if all of the bits match exactly is the fitness non-zero. Personally, I have no problem assenting to the proposition, that in situations where the model as described at the start of this OP faithfully maps onto the real-life genome, then the results will occur as predicted and fitness will increase with mathematical precision. This OP, if I am not mistaken, is pretty well identical to the “Me Thinks It is Like a Weasel” problem, translated into bits.
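The difference between the two situations discussed in this exchange can be made concrete with a toy Python sketch. It is not a model of any real genome: the 16-bit target, the one-pass bit-flip climber, and the function names are all assumptions chosen for illustration. With a graded fitness (number of matching bits) the climber reaches the target in n trial flips; with an all-or-nothing "lock" fitness it gets no signal, and only exhaustive search, up to 2^n trials, finds the key.

```python
from itertools import product


def graded_fitness(bits, target):
    """Number of positions that match the target (a smooth gradient)."""
    return sum(b == t for b, t in zip(bits, target))


def lock_fitness(bits, target):
    """1 only if every position matches, else 0 (no gradient)."""
    return int(bits == target)


def greedy_climb(start, target, fitness):
    """Try flipping each bit once, keeping any flip that raises fitness."""
    current = list(start)
    trials = 0
    for i in range(len(current)):
        trial = current.copy()
        trial[i] ^= 1
        trials += 1
        if fitness(trial, target) > fitness(current, target):
            current = trial
    return current, trials


def exhaustive_search(target, fitness):
    """Enumerate all bit strings in order until the lock opens."""
    for trials, candidate in enumerate(product((0, 1), repeat=len(target)), start=1):
        if fitness(list(candidate), target) == 1:
            return trials


n = 16
target = [1] * n
start = [0] * n

print(greedy_climb(start, target, graded_fitness))  # reaches the target in n trials
print(greedy_climb(start, target, lock_fitness))    # stays at the start
print(exhaustive_search(target, lock_fitness))      # worst case: 2**n trials
```

The all-ones target is the worst case for this particular enumeration order; a different target would be found sooner, but the average over targets still grows like 2^n.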

15. SCheesman: The problem in the OP in no way corresponds to the “key in the lock” problem. As it is laid out it is exactly correct. It starts with a non-zero fitness and there is a gradient each step of the way. For the key in the lock problem there is no gradient, as only if all of the bits match exactly is the fitness non-zero. Personally, I have no problem assenting to the proposition, that in situations where the model as described at the start of this OP faithfully maps onto the real-life genome, then the results will occur as predicted and fitness will increase with mathematical precision. This OP, if I am not mistaken, is pretty well identical to the “Me Thinks It is Like a Weasel” problem, translated into bits.

We’ve gone over this many times already. This thread is about situations where fitness gradients lead a population to a fitness peak. It is emphatically not about completely random landscapes. Can we stop chasing this rabbit trail? If you are interested in defending the idea of random fitness landscapes, why not open a separate thread?

16. SCheesman,

SCheesman: “The problem in the OP in no way corresponds to the “key in the lock” problem.”

1) The only way the “key in the lock” problem CAN be solved is by “information”, according to ID.
2) Can this “information” be generated by natural selection as addressed by this OP?

Imagine a population of (65,535 – 1) where your target window is 16 bits.

There is a very good chance that one out of those individuals is only one bit AWAY from your ID “IC target”.

From that one individual, we can expect the “target” to be reached within 65,535 offspring.

With two surviving offspring from each generational parent along this lineage, we would expect a doubling of the population every generation leading to the target being reached from the base generation within 16 generations providing only one bit changes from the base for each offspring.

Mathematically, IC is not a problem, but at that point, the UPB is meaningless and Dembski’s argument needs biological help, not mathematical.

17. SCheesman,

It’s only 16 offspring at a single bit change when evolution as defined by evos is used.

18. olegt: We’ve gone over this many times already. This thread is about situations where fitness gradients lead a population to a fitness peak. It is emphatically not about completely random landscapes. Can we stop chasing this rabbit trail? If you are interested in defending the idea of random fitness landscapes, why not open a separate thread?

I was not criticizing the thread for the problem it posed, merely trying to explain to Toronto the difference in the two situations. Apparently I am the only one here trying to shine light on the differences.

19. Anyway, do all of you feel this is played out? Let’s start fresh elsewhere. I may or may not jump into the next thread(s).

20. I think this thread is played out; I may come in on a newer one at some point.

21. SCheesman,

SCheesman: “I was not criticizing the thread for the problem it posed, merely trying to explain to Toronto the difference in the two situations. Apparently I am the only one here trying to shine light on the differences.”

The weakness for ID is that “IC” is NOT a problem when it comes to its generation by natural (non-ID), processes as claimed by this post.

Even starting from “random” scratch in a population, if the population is large enough, the “random” start argument means that an n bit window, will statistically have a single population member 1 bit away from “IC” functionality in a population of (2^n) individuals.

Starting with that single member, (i.e. now an island of functionality), a single bit change from that base, can appear statistically, within 16 of its offspring.

22. SCheesman: I was not criticizing the thread for the problem it posed, merely trying to explain to Toronto the difference in the two situations. Apparently I am the only one here trying to shine light on the differences.

You’re the only one here who thinks that random landscapes have any relevance to this thread. They don’t.

23. As Eric Holloway has disinterred Bill Dembski’s “explanatory filter” recently and Joe Felsenstein reminded us of his first OP here at TSZ, I thought it was worth featuring it in case Eric would like to explain to skeptics how Dembski’s filter works and how it can be usefully applied.

Over to you, Dr Holloway!

ETA Eric’s claim:

Well, I guess we’ll all see if the ID tech pans out in the market place. Or not, if it doesn’t 🙂

The technology is a direct derivation from Dembski’s explanatory filter, and it definitely works. I have mathematical proof, empirical verification, and a working implementation. The question is, just how useful is it? Time will tell.

24. “…One of the sections of that article gave a simple computational example of mine showing natural selection putting nearly 2 bits of specified information into the genome, by replacing an equal mixture of A, T, G, and C at one site with 99.9% C.”

How does this even apply to biology???

25. How does this even apply to biology???

Are you familiar with the substance known as DNA?

26. J-Mac: Yes. So?

So, that. If you’re familiar with DNA, you’ll know why polymorphic A, C, T and G at a site becoming all-C is biologically relevant.

27. That’s not what I’m talking about…
I’m talking about real biology, rather than speculative like this:
…Suppose that we have a large population of wombats and we are following 100 loci in their genome…

Why suppose? Why not do the real deal science?

28. Thanks, Alan Fox, for “pinning” my 2012 post. Folks, I didn’t ask anyone to do that, but I continue to agree with what I wrote then. Will await with interest any substantive discussion of that. So far, since the “pinning”, there has been none yet.

29. Yes natural selection can, and what is your point?

You merely shift the CSI to a higher level, i.e. the selection function.

The question still is: where did the CSI come from? Shifting it around does not answer the question.

Additionally, even if the selection function contains the CSI, there are limits on how quickly it can be mined. See Winston’s work on mining oracles.

30. Will NCSE publish a rebuttal from someone with an actual background in Information Theory? Or is that just a playground reserved for like-minded if clueless participants?

Anticipating objections is great if you can actually do it. But real objections are always better than fake ones. Btw, what is your “fitness” function?

And what on earth is “Functional Information”?!? Say of some famous painting? Could it be more than clueless evolutionistas and the confused IDistas sharing a gibberish language?

31. EricMH:
Yes natural selection can, and what is your point?

You merely shift the CSI to a higher level, i.e. the selection function.

The question still is: where did the CSI come from? Shifting it around does not answer the question.

OK, so I’m pleased that we are agreed that William Dembski’s 2002 argument, that used the Law of Conservation of Complex Specified Information to argue that one could not achieve CSI if one didn’t start with it, does not work. (I explained why in my 2007 article linked in the 2012 OP above). So it provides no warrant for the view that a Designer must have intervened in the process of evolution once the fitnesses of the genotypes are there.

Your present argument is basically Dembski and Marks’s “active information” argument. My reply to that will be found in 2009 at Panda’s Thumb. Basically the laws of physics and chemistry enforce a certain degree of smoothness on the fitness surface, owing simply to the weakness of long-range interactions. And that is reflected in the empirical behavior of biological systems: a single mutation has a much smaller effect on the organism than the total chaos that would result from changing every base in its DNA genotype simultaneously. If that is due to Design, then it pushes the Designer back to the setting up of the basic laws of physics. And biologists don’t need to worry about that, since they start by taking our universe, as it is, as the starting point and then ask how adaptations can happen.

As to where the information comes from, my own guess is that it will turn out to be showable that it comes from changes in entropy as energy flows through biological systems.

(By the way, in the most recent post at Panda’s Thumb, commenter “MaryKaye” has a lovely guest post about how the neural net chess program AlphaZero increases its knowledge of how to play chess, simply by playing itself. It is definitely a gain of information, which was not put into the program by its programmers.)

32. Joe Felsenstein: OK, so I’m pleased that we are agreed that William Dembski’s 2002 argument, that used the Law of Conservation of Complex Specified Information to argue that one could not achieve CSI if one didn’t start with it, does not work.

I am pleased to note that you are misreading what I wrote (that rhymes!) 🙂

NEways, the rest of what follows is similarly incoherent. As I told Tom, please submit a manuscript to Bio-C so we can filter for coherency, and then with good conscience I can take time away from my family to respond to all these misunderstandings.

Otherwise, it’s just one constant game of whac-a-strawman on these blogs 😀

33. You are agreed with me that natural selection can take a population from an initial state in which CSI is not present, and move its mix of genotypes into the specified set, and continue to move them further and further into that set.

I don’t see that this is a misinterpretation of you when you wrote “yes it can” as to whether natural selection could put functional information into the genome.

William Dembski 2002 was trying to show, by his indirect Conservation Law argument, that this could not happen.

34. EricMH: Additionally, even if the selection function contains the CSI, there are limits on how quickly it can be mined. See Winston’s work on mining oracles.

You say that as though it was some sort of problem for evolution. If my quick skim of the linked article is right, their “optimal” search algorithms extract about 1 bit of information per query, while more actual-evolution-like searches extract much less — like 0.1 to 0.0001 bits per query. But that seems like plenty.

The bacterial population of Earth is something in the vicinity of 10^30 (see https://www.nature.com/articles/s41579-019-0158-9). Assuming, say, an average of one reproduction per day (corrections from anyone who knows more about this — Hi, Joe! — are welcome), over say 3 billion years, gives about 10^12 generations since life originated. If the population was at current levels that whole time, that’d be 10^42 bacteria over evolutionary history. It was probably actually lower for at least a fair bit of that time, so say (warning: wild guess) 10^40 total. If we consider each of those bacteria a “query”, and assume 0.0001 bits per query, that means evolution could extract 10^36 bits of information from the fitness function.

That’s a lot. More than enough. Way, way, way more than enough.

It’s also clearly an overestimate, for a number of reasons. The actual fitness landscape isn’t as smooth as a Hamming oracle, a lot of the actual extracted information probably had more to do with adapting to specific circumstances (/chasing moving fitness peaks) than making any sort of global progress, etc.

But in order to make an argument that evolution can’t extract information fast enough, you need an analysis that gives a result showing that evolution can’t extract information fast enough and is realistic enough that its results can be expected to have something to do with reality. An analysis that shows evolution can extract information much faster than needed, but is highly unrealistic, completely fails to fit this requirement.
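The back-of-envelope estimate in this comment is easy to reproduce. The sketch below only restates the comment's own round numbers (10^30 bacteria, one reproduction per day, 3 billion years, a guessed total of 10^40 individuals, and 0.0001 bits per query); none of them are independent data, and the variable names are made up for the sketch.

```python
from math import log10

population = 1e30        # bacteria alive at any one time (figure from the comment)
years = 3e9              # time since life originated (figure from the comment)
generations = years * 365  # one reproduction per day

upper_bound = population * generations  # if the population were always at current levels
total_queries = 1e40     # the comment's "wild guess" after discounting
bits_per_query = 1e-4    # low end of the range quoted for evolution-like searches
extracted_bits = total_queries * bits_per_query

print(f"generations ~ 10^{log10(generations):.0f}")
print(f"upper bound on individuals ~ 10^{log10(upper_bound):.0f}")
print(f"extractable information ~ 10^{log10(extracted_bits):.0f} bits")
```

The orders of magnitude come out as stated in the comment: about 10^12 generations, about 10^42 individuals as an upper bound, and about 10^36 bits from the discounted total.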

35. EricMH: You merely shift the CSI to a higher level, i.e. the selection function.

The “selection function” is just the fact that some carriers have higher fitness than others.

So the information comes from the fact that some are better at surviving and reproducing than others. Done, case closed.

36. Gordon Davisson:
But in order to make an argument that evolution can’t extract information fast enough, you need an analysis that gives a result showing that evolution can’t extract information fast enough and is realistic enough that its results can be expected to have something to do with reality. An analysis that shows evolution can extract information much faster than needed, but is highly unrealistic, completely fails to fit this requirement.

I suppose one could pile on constraints until .0001 bits/query begins to fall short of what evolution would need to keep up with changing environments, and then show that the required constraints are not realistic.

As a starting point, we might observe that the mass extinction event people are currently bringing about results from environments changing faster than evolutionary processes can track.

37. Rumraket: The “selection function” is just the fact that some carriers have higher fitness than others.

So the information comes from the fact that some are better at surviving and reproducing than others. Done, case closed.

This.

If Eric wants to argue that the laws of physics were set up just right so that life could evolve by random mutation, genetic drift and natural selection, then fine. Please inform all ID supporters to go pester physicists.

38. Nonlin.org: And what on earth is “Functional Information”?!? Say of some famous painting? Could it be more than clueless evolutionistas and the confused IDistas sharing a gibberish language?

Lol
I have to agree…

Both sides are playing a mathematical game that has no application in biology…
They think that A, C, T and Gs are just pieces of scrabble you can rearrange randomly and, if one day in a billion years, they arrange themselves in an order found in nucleotides found in DNA, new function, or better yet, new body plan is bound to evolve…
They seem to forget that A C T and Gs are actually dynamic chemicals held together by chemical bonds, which are governed by the laws of physics…

This is a perfect example of speculative science that has no application in real biology…

39. J-Mac: They think that A, C, T and Gs are just pieces of scrabble you can rearrange randomly and, if one day in a billion years, they arrange themselves in an order found in nucleotides found in DNA, new function, or better yet, new body plan is bound to evolve…

Thanks, I had no idea I believe that! Can you please let me know what other stupid crap you think we believe?
