Richard Dawkins’s computer simulation algorithm explores how long it takes a 28-letter-long phrase to evolve to become the phrase “Methinks it is like a weasel”. The Weasel program has a single example of the phrase, which produces a number of offspring, with each letter subject to mutation, where there are 27 possible letters: the 26 letters A-Z and a space. The offspring that is closest to the target replaces the single parent. The purpose of the program is to show that creationist orators who argue that evolutionary biology explains adaptations by “chance” are misleading their audiences. Pure random mutation without any selection would lead to a random sequence of 28-letter phrases. There are $27^{28}$ possible 28-letter phrases, so it should take about $10^{40}$ different phrases before we found the target. That is without arranging that the phrase that replaces the parent is the one closest to the target. Once that highly nonrandom condition is imposed, the number of generations to success drops dramatically, from $10^{40}$ to mere thousands.
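The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not Dawkins’s own code: the number of offspring per generation (100), the per-letter mutation probability (0.04), the rule that a mutated letter always changes to a different letter, and all function names are assumptions made for the example.

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"      # 28 characters
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "     # 26 letters plus a space


def matches(phrase, target=TARGET):
    """Count the positions at which phrase agrees with the target."""
    return sum(a == b for a, b in zip(phrase, target))


def mutate(phrase, rate, rng):
    """Copy phrase; each letter changes to a different letter with probability rate."""
    out = []
    for ch in phrase:
        if rng.random() < rate:
            ch = rng.choice([c for c in ALPHABET if c != ch])
        out.append(ch)
    return "".join(out)


def weasel(n_offspring=100, rate=0.04, seed=1, max_generations=100_000):
    """Run the Weasel; return (generations used, final parent phrase)."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)  # random start
    generation = 0
    while parent != TARGET and generation < max_generations:
        offspring = [mutate(parent, rate, rng) for _ in range(n_offspring)]
        # The offspring closest to the target replaces the single parent.
        parent = max(offspring, key=matches)
        generation += 1
    return generation, parent


if __name__ == "__main__":
    generations, final = weasel()
    print(generations, final)
    print(f"possible phrases: {27 ** 28:.2e}")
```

Changing `n_offspring` or `rate` changes how many generations are needed, which is why the sketch makes no claim about matching any particular published run.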
Although Dawkins’s Weasel algorithm is a dramatic success at making clear the difference between pure “chance” and selection, it differs from standard evolutionary models. It has only one haploid adult in each generation, and since the offspring that is most fit is always chosen, the strength of selection is in effect infinite. How does this compare to the standard Wright-Fisher model of theoretical population genetics?
The Wright-Fisher model, developed 85 years ago, independently, by two founders of theoretical population genetics, is somewhat similar. But it has an infinite number of newborns, precisely $N$ of whom survive to adulthood. In its simplest form, with asexual haploids, the adult survivors each produce an infinitely large, but equal, number of offspring. In that infinitely large offspring population, mutations occur exactly in the expected frequencies. Each genotype has a fitness, and the proportions of the genotypes change according to them, exactly as expected. Since there are an infinite number of offspring, there is no genetic drift at that stage. Then finally a random sample of $N$ of the survivors is chosen to become the adults of this new generation.
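One generation of this asexual haploid scheme can be sketched as follows. The function name, the representation of genotypes as integer indices, and the `mutation[i][j]` matrix (probability that an offspring of genotype `i` is born as genotype `j`) are assumptions made for illustration, not part of any particular program.

```python
import random


def wright_fisher_generation(adults, fitness, mutation, rng):
    """Return the next generation of adults (same number as in `adults`).

    adults   -- list of genotype indices, one per adult
    fitness  -- fitness[j] is the relative fitness of genotype j
    mutation -- mutation[i][j] is the probability that i mutates to j
    """
    n_types = len(fitness)
    n_adults = len(adults)
    # Genotype frequencies among the adults; each contributes equally.
    freq = [adults.count(g) / n_adults for g in range(n_types)]
    # Infinite offspring pool: mutation acts exactly at expected frequencies.
    after_mutation = [
        sum(freq[i] * mutation[i][j] for i in range(n_types))
        for j in range(n_types)
    ]
    # Selection reweights each genotype by its fitness (no drift so far).
    weights = [after_mutation[j] * fitness[j] for j in range(n_types)]
    # Genetic drift enters only here: a finite random sample of survivors.
    return rng.choices(range(n_types), weights=weights, k=n_adults)


if __name__ == "__main__":
    rng = random.Random(0)
    adults = [0, 0, 0, 1, 1]
    print(wright_fisher_generation(adults, [1.0, 1.5],
                                   [[0.99, 0.01], [0.01, 0.99]], rng))
```

Passing a list with a single adult gives the one-adult case that is closest to the Weasel.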
In one case we get fairly close to the Weasel. That is when $N = 1$, so that there is only one adult individual. Now for some theory, part of which I explained before, in a comment here. We first assume that a match of one letter to the target multiplies the fitness by a factor of $1+s$, and that this holds no matter how many other letters match. This is the fitness scheme in the genetic algorithm program by keiths. A phrase that has $k$ matches to the target has fitness $(1+s)^k$. We also assume that mutation occurs independently in each letter of each offspring with probability $u$, and produces one of the other 26 possibilities, chosen uniformly at random. This too is what happens in keiths’s program. If a letter matches the target, there is a probability $u$ that it will cease to match after mutation. If it does not match, there is a probability $u/26$ that it will match after mutation.
The interesting property of these assumptions is that, in the Wright-Fisher model with a single adult ($N = 1$), each site evolves independently because neither its mutational process nor its selection coefficients depend on the state of the other sites. So we can simply ask about the evolution of one site.
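If the site matches, after mutation a fraction $1-u$ of the offspring will still match and a fraction $u$ will not, where $u$ is the per-letter mutation probability. Selection, which multiplies the fitness of a match by $1+s$, results in a ratio of $(1+s)(1-u) : u$ of matching to nonmatching letters. The probability of ending up with a nonmatch is then $u$ divided by the sum of these, or
$$q = \frac{u}{(1+s)(1-u) + u}.$$
Similarly, if a letter does not match, after mutation the ratio of nonmatches to matches is $(1 - u/26) : u/26$. After selection the ratio is $(1 - u/26) : (1+s)\,u/26$. Dividing by the sum of these, the probability of ending up with a match is
$$p = \frac{(1+s)\,u/26}{(1+s)\,u/26 + 1 - u/26}.$$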
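The two one-generation probabilities can be written as small Python functions. This is a sketch under the assumptions stated in the text: a match multiplies fitness by 1 + s, each letter mutates with probability u to one of the other 26 letters, and there is a single adult. The function names `q_loss` and `p_gain` are made up for this example.

```python
def q_loss(s, u):
    """Probability that a matching site is a nonmatch one generation later."""
    still_match = (1 + s) * (1 - u)   # unmutated matches, weighted by fitness
    lost = u                          # mutated away from the target
    return lost / (still_match + lost)


def p_gain(s, u):
    """Probability that a nonmatching site is a match one generation later."""
    m = u / 26                        # chance of mutating to the target letter
    gained = (1 + s) * m              # new matches, weighted by fitness
    still_nonmatch = 1 - m
    return gained / (gained + still_nonmatch)


if __name__ == "__main__":
    for s in (0.0, 0.5, 2.0):
        print(s, p_gain(s, 0.01), q_loss(s, 0.01))
```

With s = 0 the functions reduce to the pure mutation probabilities u/26 and u, which is a quick sanity check on the algebra.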
Thus, what we see at one position in the 28-letter phrase is that if it is not matched, it has probability $p$ of changing to a match, and if it is matched, it has probability $q$ of changing to not being matched. This is a simple 2-state Markov process. I’m going to leave out the further analysis; I can explain any point in the comments. Suffice it to say that
1. The probability, in the long run, of a position matching is $p/(p+q)$.
2. This is independent in each of the 28 positions. So the number of matches will, in the long run, be a Binomial variate with 28 trials each with probability $p/(p+q)$.
3. The “relaxation time” of this system, the number of generations to approach the long-term equilibrium, will be roughly $1/(p+q)$.
4. At what value of the selection coefficient $s$ do we expect selection to overcome genetic drift? There is no value where drift does not occur, but we can roughly say that selection is having a noticeable effect when the equilibrium frequency is twice as great as the no-selection value of 1/27. The equation for this value is a quadratic equation in $s$ whose coefficients involve the mutation probability $u$ as well. If $u$ is much smaller than $s$, then this value of $s$ is close to 0.44.
5. The probability of a match at a particular position, one which started out not matching, will be after $t$ generations
$$P(t) = \frac{p}{p+q}\left(1 - (1-p-q)^t\right)$$
and the probability of a nonmatch given that one started out matching is
$$Q(t) = \frac{q}{p+q}\left(1 - (1-p-q)^t\right).$$
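The results in this list can be checked numerically with a short Python sketch. It assumes the fitness factor 1 + s per match and the per-letter mutation probability u described earlier, and it recomputes the two transition probabilities so that it stands on its own. All function names are hypothetical, and the threshold is found by bisection rather than by solving the quadratic.

```python
def transition_probs(s, u):
    """Return (p, q): nonmatch-to-match and match-to-nonmatch probabilities."""
    m = u / 26
    p = (1 + s) * m / ((1 + s) * m + 1 - m)
    q = u / ((1 + s) * (1 - u) + u)
    return p, q


def equilibrium_match(s, u):
    """Long-run probability that a position matches (item 1)."""
    p, q = transition_probs(s, u)
    return p / (p + q)


def relaxation_time(s, u):
    """Rough number of generations to approach equilibrium (item 3)."""
    p, q = transition_probs(s, u)
    return 1 / (p + q)


def match_prob_after(t, s, u):
    """Closed form for a match after t generations, starting unmatched (item 5)."""
    p, q = transition_probs(s, u)
    return p / (p + q) * (1 - (1 - p - q) ** t)


def match_prob_iterated(t, s, u):
    """Same quantity, obtained by stepping the 2-state Markov chain t times."""
    p, q = transition_probs(s, u)
    x = 0.0                              # start as a certain nonmatch
    for _ in range(t):
        x = x * (1 - q) + (1 - x) * p    # stay matched, or newly match
    return x


def threshold_s(u, lo=0.0, hi=10.0, tol=1e-12):
    """Bisect for the s at which the equilibrium match probability is 2/27 (item 4)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if equilibrium_match(mid, u) < 2 / 27:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    s, u = 0.5, 0.01
    print("expected matches out of 28:", 28 * equilibrium_match(s, u))
    print("relaxation time:", relaxation_time(s, u))
    print("threshold s for small u:", threshold_s(1e-6))
```

Comparing `match_prob_after` with `match_prob_iterated` checks the closed form against the chain itself; it does not test either against the runs of keiths’s program.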
I don’t have the energy to see whether the runs of keiths’s program verify this, but I hope others will.