by Joe Felsenstein and Michael Lynch
The blogs of creationists and advocates of ID have been abuzz lately about exciting new work by William Basener and John Sanford. In a peer-reviewed paper in the Journal of Mathematical Biology, they have presented a mathematical model of mutation and natural selection in a haploid population, and they find in one realistic case that natural selection is unable to prevent the continual decline of fitness. This is presented as correcting R.A. Fisher’s 1930 “Fundamental Theorem of Natural Selection”, which they argue is the basis for all subsequent theory in population genetics. The blog postings on that will be found here, here, here, here, here, here, and here.
One of us (JF) has argued at The Skeptical Zone that they have misread the literature on population genetics. The theory of mutation and natural selection, developed during the 1920s, was already relatively complete before Fisher’s 1930 book. Fisher’s FTNS has been difficult to understand, and subsequent work has not depended on it. But that still leaves us with the issue of whether the B and S simulations show some startling behavior, with deleterious mutations seemingly unable to be prevented from continually rising in frequency. Let’s take a closer look at their simulations.
Basener and Sanford show equations, mostly taken from a paper by Claus Wilke, for changes in genotype frequencies in a haploid, asexual species experiencing mutation and natural selection. They keep track of the distribution of fitness values on a continuous scale. Genotypes at different values on the fitness scale have different birth rates. There is a distribution of fitness effects of mutations, modeled as displacements along the fitness scale. An important detail is that the genotypes are haploid and asexual: they have no recombination, and they do not mate.
After giving the equations for this model, they present runs of a simulation program. In runs where the distribution of mutations shows equal numbers of beneficial and deleterious mutations, all goes as expected: the genetic variance in the population rises, and as it does the mean fitness rises more and more rapidly. But in their final case, which they argue is more realistic, the mutations are mostly deleterious. The startling outcome in that case is the absence of an equilibrium between mutation and selection. Instead the deleterious mutations go to fixation in the population, and the mean fitness of the population steadily declines.
Why does that happen? For deleterious mutations in large populations, we typically see them come to a low equilibrium frequency reflecting a balance between mutation and selection. But they’re not doing that at high mutation rates!
The key is the absence of recombination in these clonally-reproducing haploid organisms. In effect each haploid organism is passed on whole, as if it were a copy of a single gene. So the frequencies of the mutant alleles should reflect the balance between the selection coefficient against the mutant (which is said to be near 0.001 in their simulation) versus the mutation rate. But they have one mutation per generation per haploid individual. Thus the mutation rate is, in effect, 1000 times the selection coefficient against the mutant allele. The selection coefficient of 0.001 means about a 0.1% decline in the frequency of a deleterious allele per generation, which is overwhelmed when one new mutant per individual comes in each generation.
In the usual calculations of the balance between mutation and selection, the mutation rate is smaller than the selection coefficient against the mutant. With (say) 20,000 loci (genes) the mutation rate per locus would be 1/20,000 = 0.00005. That would predict an equilibrium frequency near 0.00005/0.001, or 0.05, at each locus. But if the mutation rate were 1, we predict no equilibrium, but rather that the mutant allele is driven to fixation because the selection is too weak to counteract that large a rate of mutation. So there is really nothing new here. In fact 91 years ago J.B.S. Haldane, in his 1927 paper on the balance between selection and mutation, wrote that “To sum up, if selection acts against mutation, it is ineffective provided that the rate of mutation is greater than the coefficient of selection.”
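Haldane's point can be checked with a minimal deterministic recursion for a single haploid locus (a sketch of the standard textbook calculation, not the Basener/Sanford program): a mutant with selection coefficient s = 0.001 settles at the familiar equilibrium frequency u/s when the mutation rate u is small, but marches to fixation once u exceeds s (u = 0.01 below stands in for any rate well above s).

```python
def next_freq(q, s, u):
    """One generation: selection against the mutant (fitness 1 - s),
    then one-way mutation from wild type to mutant at rate u."""
    q_sel = q * (1 - s) / (1 - s * q)  # frequency after selection
    return q_sel + u * (1 - q_sel)     # frequency after mutation

def iterate(q, s, u, gens):
    for _ in range(gens):
        q = next_freq(q, s, u)
    return q

s = 0.001
# u << s: the mutant settles near the equilibrium u/s = 0.05
print(iterate(0.0, s, u=0.00005, gens=200_000))  # ~0.05
# u >> s: no equilibrium; the mutant is driven to fixation
print(iterate(0.0, s, u=0.01, gens=200_000))     # ~1.0
```

With this recursion the interior equilibrium works out to exactly u/s, which exists only when u is at most s; for u greater than s the mutant frequency increases every generation, matching Haldane's remark.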
If Basener and Sanford’s simulation allowed recombination between the genes, the outcome would be very different — there would be an equilibrium gene frequency at each locus, with no tendency of the mutant alleles at the individual loci to rise to fixation.
If selection acted individually at each locus, with growth rates for each haploid genotype being added across loci, a similar result would be expected, even without recombination. But in the Basener/Sanford simulation the fitnesses do not add; instead they generate linkage disequilibrium, in this case negative associations that leave selection at the different loci opposing each other. Add in recombination, and there would be a dramatically different, and much more conventional, result.
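A back-of-the-envelope check of that conventional result, under the textbook assumptions of multiplicative fitness across loci and linkage equilibrium maintained by free recombination (an illustrative sketch using the numbers from the text, not their simulation):

```python
import math

L_loci = 20_000   # number of loci
U = 1.0           # genomic deleterious mutation rate (one per generation)
s = 0.001         # selection coefficient against each mutant

u = U / L_loci    # per-locus mutation rate = 0.00005
q_hat = u / s     # per-locus equilibrium frequency = 0.05

# Multiplying the per-locus mean fitness (1 - s*q_hat) = (1 - u) across
# independent loci gives the classical mutational-load result:
mean_fitness = (1 - s * q_hat) ** L_loci
print(q_hat)          # 0.05
print(mean_fitness)   # ~0.368, i.e. about exp(-U)
```

Mean fitness stabilizes near exp(-U) instead of declining without limit, and the load depends on the genomic mutation rate U, not on s.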
Most readers may want to stop there. We add this section for those more familiar with population genetics theory, simply to point out some mysteries connected with the Basener/Sanford simulations:
2. The behavior of their iterations in some cases is, well, weird. In the crucial final simulation, the genetic variance of fitness rises, reaches a limit, bounces sharply off it, and from then on decreases. We’re not sure why, and suspect a program bug, though we have not located one. We have found that if we run the simulation for many more generations, such odd bouncings of the mean and variance off of upper and lower limits are ultimately seen. We don’t think that this has much to do with mutation overwhelming selection, though.
3. We note one mistake in the Basener and Sanford work. The organisms’ death rates are 0.1 per time step. That would suggest a generation time of about 10 time steps. But Basener and Sanford take there to be one generation per unit of time. That is incorrect. However, the mutation rate and the selection coefficient are still 1 and 0.001 per generation, even if the generations are 10 units of time.
Joe Felsenstein, originally trained as a theoretical population geneticist, is an evolutionary biologist who is Professor Emeritus in the Department of Genome Sciences and the Department of Biology at the University of Washington, Seattle. He is the author of the books “Inferring Phylogenies” and “Theoretical Evolutionary Genetics”. He frequently posts and comments here.
Michael Lynch is the director of the Biodesign Center for Mechanisms of Evolution at Arizona State University, and author of “The Origins of Genome Architecture” and, with Bruce Walsh, of “Genetics and Analysis of Quantitative Traits”. Six of his papers are cited in the Basener/Sanford paper.
I had assumed that you understood the meaning of the “|” in P(B | A).
And therefore that is the way out of the “everything depends on carbon chemistry!” objection that I alluded to.
Conditional probabilities. Deep-six Bill’s ID-math.
The theory that claims that life’s diversity comes from small random changes over time, selected in a population when they provide reproductive advantage, defines evolutionary steps as independent events. This step-by-step process is nothing like your analogy, and the individual selection events work well with the “and” probability equation. The problem is that Darwin did not realize many of the changes were happening in a large sequence space (DNA).
This supports my last point, as Dawkins’ Weasel program required pre-existing knowledge of the target sequence in order to execute: a program that mutated a group of sentences, compared them to the target sentence, and selected the closest fit. This process was iterated until the target sentence was reached.
Why is that a problem? Please show that it is a problem. Don’t just claim that it is a problem.
Thanks Joe. Your response was informative and it motivates me to brush up on pop gen, if for nothing else than to sharpen my math skills.
I cannot figure out what “last point” you are referring to here. The claim that you disputed was this:
which Weasel demonstrated rather nicely.
If your “last point” is related to the standard creationist talking point on this subject, I will merely point out that it is entirely unnecessary for the mutation and selection algorithm to have “pre-existing knowledge of the target sequence”; that can be housed in an Oracle which returns fitness values to the M&S algorithm.
Doesn’t really affect the claim
does it now?
Well, people generally model the mutations as independent events, but which mutations get to hang around in future generations to be the substrate for further mutations is absolutely NOT independent. So your math is wrong.
I have a minor request: please stop referring to your P(A) x P(B) as the “and” probability equation. Let’s call it what it is: the wrong probability equation, or, if we’re being really charitable, the elementary school probability equation. My girls have been brought up to be reasonably well-behaved, but …
If you have a program that can reach a target sequence without using the target in the search, you should start an OP and we should cease discussion of Weasel. No one has shown this so far, good luck 🙂
Sure, when you are talking about the same gene, but that’s not what we are comparing, and these types of beneficial mutations are rare. Changes to the spliceosome sequence are independent of changes to a transcription factor sequence.
This is a probability equation for when you don’t have measurable conditions. It’s part of the statistical tool bag and sometimes the right tool for the job. If you can demonstrate it is not in this case, I will cease using it in this discussion.
Weasel is trial and error. It is trial and error with a pre-defined target as an error-correction mechanism.
There’s been a bunch of OPs here on that very subject; here’s one.
Huh? Neither of these points is at all relevant to your math error.
Which ones hang around is NOT independent.
The first two sentences are word salad. Re the final sentence, no, that’s not how math works: if you want to use an approximation to the correct equation, it is incumbent upon you to demonstrate that the approximation is appropriate. Absent that demonstration, you are simply using the WRONG equation. This is High School math. At this point, my daughters are pointing and laughing.
Yes, Weasel is trial and error, using an iterative process with selection feedback and inheritance (feedback is provided by an Oracle that delivers the Hamming distance). It works far, far better than pure random trial and error. It also works with a (slowly) moving target…
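The Oracle arrangement described above is easy to sketch (illustrative Python, not Dawkins’ original code; the population size and mutation rate are made-up parameters). The search loop never reads TARGET; it sees only the score the oracle returns, the negative Hamming distance:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"      # known only to the oracle
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def oracle(candidate):
    """Return a fitness score (negative Hamming distance to the target).
    The mutation-and-selection loop below sees only this number."""
    return -sum(a != b for a, b in zip(candidate, TARGET))

def mutate(parent, rate=0.05):
    """Copy the parent, flipping each character with probability `rate`."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in parent)

def weasel(pop_size=100, max_gens=10_000):
    """Iterated mutation and selection, guided only by oracle scores."""
    parent = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    for gen in range(max_gens):
        if oracle(parent) == 0:               # perfect score reached
            return gen, parent
        offspring = [mutate(parent) for _ in range(pop_size)]
        parent = max(offspring, key=oracle)   # select on oracle score alone
    return max_gens, parent

gens, best = weasel()
print(gens, best)
```

Typically this converges in well under a thousand generations, versus roughly 27^28 expected trials for guessing the whole 28-character string in one shot, which is the contrast the Weasel program was built to illustrate.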
Try to get past your embarrassment on this point.
Yes, which is why Dawkins says real evolution isn’t exactly like his WEASEL program. Though it still demonstrates that with selection, evolution is not at all like randomly guessing the entire sequence in a single event.
There are better examples of programs that manifest evolution by natural selection without any targets. Avida is one such example. Selection still works in Avida.
How does this refute the contention that you can not come up with ANY independent event using your twisted logic?
You tried to claim a meteor and me typing were two independent events, but I showed you that both involve the interaction of molecules, for one, which, according to YOUR weird logic, means they must not be independent events.
So you still have not come up with one.
Saying, E=p?/Q -5 +-5 is not refuting anything DNA.
Regarding the sequence space claim I did not support, let me defer, so as not to derail Joe’s thread any more. I will take a fresh look at Avida. Thanks for the suggestion.
Support that claim? They already supported the claim, multiple times. Rounds and rounds of selection! The 1300 aa protein did not appear from a random assembly of amino acids. Its sequence is the result of the historical accumulation of successful sequences. Each protein contains a history of successes, Bill. Because evolution tends to keep things that work, the process doesn’t have to start from completely random sequences each time. Sequences have already gone through many selection/reproduction stages. Billions of years worth of sequences selected from previously selected sequences from previously selected sequences from previously selected sequences … from previously selected sequences.
Dawkins once said it in rather poetic terms. Something on the lines of “each organism has the history of every success of its ancestors written in their DNA.”
It’s pretty simple to understand. If you actually try understanding.
Oh come on. I think evolutionists never even ponder this problem. What, at one time there were four advantageous sequences? And then five, which were even better? And then six…?
Yea, how would that work?
I understand your passion for the claim that you can create genetic information from random change selected for, or fixed in populations by, reproductive advantage. Respectfully, I disagree, but I would like to table the discussion on this thread, as we should be discussing Sanford’s paper, which is about the Goldilocks theory of natural selection.
Sure you do, but you can’t give any reasons why beyond your own ignorance based personal disbelief.
I don’t know about evolutionists, but I have pondered this problem.
Have you heard about reproduction?
Reproduction, selection, recombination, reproduction, selection, recombination, reproduction, selection, recombination, etc.
Then you are mistaken. There is actually an extensive literature on the origin and diversification of proteins. From how did the first proteins originate? (we don’t know), to how did there come to be so many different proteins? (we know this one, it’s evolution).
Bill Basener and John Sanford responded with Part 1 here:
Part 2 of Bill Basener and John Sanford’s response is now posted: