Wright, Fisher, and the Weasel

Richard Dawkins’s computer simulation algorithm explores how long it takes a 28-letter phrase to evolve into the phrase “Methinks it is like a weasel”. The Weasel program maintains a single copy of the phrase, which produces a number of offspring, with each letter subject to mutation; there are 27 possible letters, the 26 letters A–Z and a space. The offspring that is closest to the target replaces the single parent. The purpose of the program is to show that creationist orators who argue that evolutionary biology explains adaptations by “chance” are misleading their audiences. Pure random mutation without any selection would produce a random sequence of 28-letter phrases. There are 27^{28} possible 28-letter phrases, so it should take about 10^{40} phrases before we find the target. That is without arranging that the phrase that replaces the parent is the one closest to the target. Once that highly nonrandom condition is imposed, the number of generations to success drops dramatically, from 10^{40} to mere thousands.
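A minimal version of the algorithm can be sketched as follows (the parameter values are illustrative, not Dawkins’s; this sketch also lets a mutated letter be replaced by any of the 27 characters, including itself, a common simplification):

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # 26 letters plus a space

def matches(phrase):
    """Number of positions at which phrase agrees with the target."""
    return sum(a == b for a, b in zip(phrase, TARGET))

def mutate(phrase, u):
    """Each letter is independently replaced, with probability u, by a
    character drawn uniformly from the alphabet (possibly itself)."""
    return "".join(random.choice(ALPHABET) if random.random() < u else c
                   for c in phrase)

def weasel(num_offspring=200, u=0.04):
    """Run until the target is reached; return the generation count."""
    parent = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    generation = 0
    while parent != TARGET:
        offspring = [mutate(parent, u) for _ in range(num_offspring)]
        parent = max(offspring, key=matches)  # best offspring replaces parent
        generation += 1
    return generation
```

With these illustrative settings, runs typically reach the target in on the order of a hundred generations; the exact count depends on the number of offspring and the mutation rate, but it is nowhere near the ~10^{40} trials pure chance would need.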

Although Dawkins’s Weasel algorithm is a dramatic success at making clear the difference between pure “chance” and selection, it differs from standard evolutionary models. It has only one haploid adult in each generation, and since the offspring that is most fit is always chosen, the strength of selection is in effect infinite. How does this compare to the standard Wright-Fisher model of theoretical population genetics?

The Wright-Fisher model, developed 85 years ago, independently, by two founders of theoretical population genetics, is somewhat similar. But it has an infinite number of newborns, precisely N of whom survive to adulthood.  In its simplest form, with asexual haploids, the adult survivors each produce an infinitely large, but equal, number of offspring. In that infinitely large offspring population, mutations occur exactly in the expected frequencies. Each genotype has a fitness, and the proportions of the genotypes change according to them, exactly as expected. Since there are an infinite number of offspring, there is no genetic drift at that stage. Then finally a random sample of N of the survivors is chosen to become the adults of this new generation.
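One generation of this haploid model can be sketched in code (a minimal schematic of my own; the genotype-frequency representation and all names are assumptions, not taken from any particular program):

```python
import numpy as np

def wf_generation(freqs, fitness, mutation_matrix, N, rng):
    """One generation of the haploid Wright-Fisher model with an
    effectively infinite offspring pool.

    freqs           -- genotype frequencies among the current N adults
    fitness         -- fitness of each genotype
    mutation_matrix -- M[i, j] = Pr(offspring of genotype j is genotype i)
    """
    # With infinitely many offspring, mutation and selection change the
    # frequencies deterministically: no genetic drift at this stage.
    offspring = mutation_matrix @ freqs
    weighted = fitness * offspring
    weighted /= weighted.sum()
    # Drift enters only here: a random sample of N survivors becomes
    # the adults of the next generation.
    counts = rng.multinomial(N, weighted)
    return counts / N
```

The design point is that randomness appears in exactly one place, the final sampling of N adults, which is what makes drift weaker as N grows.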

In one case we get fairly close to the Weasel. That is when N = 1, so that there is only one adult individual. Now for some theory, part of which I explained before, in a comment here. We first assume that a match of one letter to the target multiplies the fitness by a factor of 1+s, and that this holds no matter how many other letters match. This is the fitness scheme in the genetic algorithm program by keiths. A phrase that has k matches to the target has fitness (1+s)^k. We also assume that mutation occurs independently in each letter of each offspring with probability u, and produces one of the other 26 possibilities, chosen uniformly at random. This too is what happens in keiths’s program. If a letter matches the target, there is a probability u that it will cease to match after mutation. If it does not match, there is a probability u/26 that it will match after mutation.
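In code, the fitness and mutation schemes just described look like this (a sketch based only on the description above, not keiths’s actual source; names are mine):

```python
import random

def fitness(phrase, target, s):
    """(1+s)^k, where k is the number of positions matching the target."""
    k = sum(a == b for a, b in zip(phrase, target))
    return (1.0 + s) ** k

def mutate_letter(letter, u, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ "):
    """With probability u, replace letter by one of the other 26
    characters, chosen uniformly at random; otherwise leave it alone."""
    if random.random() < u:
        return random.choice([c for c in alphabet if c != letter])
    return letter
```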

The interesting property of these assumptions is that, in the Wright-Fisher model with N = 1, each site evolves independently because neither its mutational process nor its selection coefficients depend on the state of the other sites. So we can simply ask about the evolution of one site.

If the site matches, after mutation 1-u of the offspring will still match, and u of them will not. Selection results in a ratio of (1+s)(1-u)\,:\,u of matching to nonmatching letters. Dividing by the sum of these, the probability of ending up with a nonmatch is then u divided by the sum, or

    \[ q \ = \ u\big/\left((1+s)(1-u)+u\right) \ = \ u \big/ (1 + s(1-u)) \]

Similarly if a letter does not match, after mutation the ratio of matches to nonmatches is \frac{u}{26}\,:\,1-\frac{u}{26}. After selection the ratio is (1+s)\frac{u}{26}\,:\,1-\frac{u}{26}. Dividing by the sum of these, the probability of ending up with a match is

    \[ p \ = \ \frac{u}{26}(1+s) \bigg/\left( \frac{u}{26}(1+s) + \left(1 - \frac{u}{26}\right)\right) \ = \ u(1+s) \big/ (26 + us) \]

Thus, what we see at one position in the 28-letter phrase is that if it is not matched, it has probability p of changing to a match, and if it is matched, it has probability q of changing to not being matched. This is a simple 2-state Markov process. I’m going to leave out the further analysis — I can explain any point in the comments. Suffice it to say that

1. The probability, in the long run, of a position matching is p/(p+q).

2. This is independent in each of the 28 positions. So the number of matches will, in the long run, be a Binomial variate with 28 trials each with probability p/(p+q).

3. The “relaxation time” of this system, the number of generations to approach the long-term equilibrium, will be roughly 1/(p+q).

4. At what value of s do we expect selection to overcome genetic drift? There is no value where drift does not occur, but we can roughly say that when the equilibrium frequency p/(p+q) is twice as great as the no-selection value of 1/27, that selection is then having a noticeable effect. The equation for this value is a quadratic equation in s whose coefficients involve u as well. If u is much smaller than s, then this value of s is close to 0.44.

5. The probability of a match at a particular position, one which started out not matching, will be after t generations

    \[ P(t) \ = \ \frac{p}{p+q} (1\:-\:(1-p-q)^t) \]

and the probability of a nonmatch given that one started out matching is

    \[ Q(t) \ = \ \frac{q}{p+q} (1\:-\:(1-p-q)^t) \]

I don’t have the energy to see whether the runs of keiths’s program verify this, but I hope others will.
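The quantities above are easy to compute directly. Here is a small helper (a sketch; the function and variable names are mine) implementing the formulas for p, q, the equilibrium match probability, and P(t):

```python
def transition_probs(u, s, A=27):
    """Per-site transition probabilities in the N = 1 Wright-Fisher
    Weasel. A is the alphabet size (27 here; A-1 = 26 other letters)."""
    p = u * (1 + s) / (A - 1 + u * s)   # nonmatch -> match
    q = u / (1 + s * (1 - u))           # match -> nonmatch
    return p, q

def equilibrium(u, s, A=27):
    """Long-run probability p/(p+q) that a site matches."""
    p, q = transition_probs(u, s, A)
    return p / (p + q)

def prob_match(t, u, s, A=27):
    """P(t): probability that a site which starts out not matching
    is matched after t generations."""
    p, q = transition_probs(u, s, A)
    return p / (p + q) * (1 - (1 - p - q) ** t)
```

For example, u = 1/27 and s = 20 give p ≈ 0.0291 and q ≈ 0.00183, so an equilibrium match probability of about 0.941, which is about 26.34 expected matches out of 28.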

291 thoughts on “Wright, Fisher, and the Weasel”

  1. Mung,

    Black box huh. I should code a black box so I can learn.

    No, what you should do is sneer at the very idea you could learn anything from evolutionary computing.

  2. keiths: The smart choice is to use the integer closest to the predicted mean as our equilibrium criterion.

    I don’t think it’s very smart to assume the truth of what is in question. In the figure below, the dashed line is the model prediction.

  3. Tom,

    I don’t think it’s very smart to assume the truth of what is in question.

    Who’s assuming?

    We’re dealing with a Markov process. Joe analyzed it as one, and I implemented it as one in my program. The future behavior of a Markov process depends only on its current state, not on its prior history.

    In run after run of my program, the observed mean gravitates to the predicted mean and stays there, even in runs that last for millions of generations. The histograms display the expected symmetry. The model and the program agree.

    Given all of that, I stand by my statement:

    The smart choice is to use the integer closest to the predicted mean as our equilibrium criterion.

  4. If I use a higher number of survivors, keiths’ program quickly goes to maximum fitness.

    Seems to be a bug in my calculation of fitness. Possibly calculating generation zero.

  5. keiths: We’re dealing with a Markov process. Joe analyzed it as one, and I implemented it as one in my program.

    We are dealing with different Markov processes, plural, than the one that Joe analyzed. The question is how well the infinite-offspring analysis applies to finite-offspring cases.

    keiths: In run after run of my program, the observed mean gravitates to the predicted mean and stays there, even in runs that last for millions of generations. The histograms display the expected symmetry. The model and the program agree.

    Then you’ve done little exploration of the parameter space. I provided you with parameter settings that you have not considered. It’s strange that you should ignore them, and reassert your correctness.

  6. petrushka,

    You’re evidently plotting average values, so we need to know how you’re averaging. Also, it’s as important to know the number of offspring as the number of survivors. The curve does stick close to the mean for infinitely many offspring, \approx 26.34. According to my direct implementation of the finite-population model, it does not for

  7. The wallclock time for the run generating the following four plots was 1 minute, 8 seconds. I’m holding everything but the number of offspring constant.

    ETA: Tried to include all four plots at once. Didn’t work.

    Number of offspring: 10
    ************************************************************************
    Number of sites L : 28
    Number of possible alleles A : 27
    Number of offspring : 10
    Mutation rate u : 0.037037037037
    Selection parameter s : 20.0
    p = u * (1 + s) / (A-1 + u * s) : 0.0290858725762
    q = u / (1 + s * (1 - u)) : 0.0018281535649
    Assume equilibrium at generation: 0
    W-F probability of fit allele at site p/(p+q) : 0.940863297567
    Expected num. of fit alleles in parent Lp/(p+q): 26.3441723319
    Mean observed num. of fit alleles in parent : 19.9324506755
    Std. dev. of observed numbers of fit alleles : 2.17372026547
    ************************************************************************
    Number of offspring: 20
    ************************************************************************
    Number of sites L : 28
    Number of possible alleles A : 27
    Number of offspring : 20
    Mutation rate u : 0.037037037037
    Selection parameter s : 20.0
    p = u * (1 + s) / (A-1 + u * s) : 0.0290858725762
    q = u / (1 + s * (1 - u)) : 0.0018281535649
    Assume equilibrium at generation: 0
    W-F probability of fit allele at site p/(p+q) : 0.940863297567
    Expected num. of fit alleles in parent Lp/(p+q): 26.3441723319
    Mean observed num. of fit alleles in parent : 22.800301997
    Std. dev. of observed numbers of fit alleles : 1.98878876606
    ************************************************************************
    Number of offspring: 30
    ************************************************************************
    Number of sites L : 28
    Number of possible alleles A : 27
    Number of offspring : 30
    Mutation rate u : 0.037037037037
    Selection parameter s : 20.0
    p = u * (1 + s) / (A-1 + u * s) : 0.0290858725762
    q = u / (1 + s * (1 - u)) : 0.0018281535649
    Assume equilibrium at generation: 0
    W-F probability of fit allele at site p/(p+q) : 0.940863297567
    Expected num. of fit alleles in parent Lp/(p+q): 26.3441723319
    Mean observed num. of fit alleles in parent : 23.8999310007
    Std. dev. of observed numbers of fit alleles : 1.85098401771
    ************************************************************************
    Number of offspring: 40
    ************************************************************************
    Number of sites L : 28
    Number of possible alleles A : 27
    Number of offspring : 40
    Mutation rate u : 0.037037037037
    Selection parameter s : 20.0
    p = u * (1 + s) / (A-1 + u * s) : 0.0290858725762
    q = u / (1 + s * (1 - u)) : 0.0018281535649
    Assume equilibrium at generation: 0
    W-F probability of fit allele at site p/(p+q) : 0.940863297567
    Expected num. of fit alleles in parent Lp/(p+q): 26.3441723319
    Mean observed num. of fit alleles in parent : 24.4291057089
    Std. dev. of observed numbers of fit alleles : 1.74624338383
    ************************************************************************

  8. Reminder: To minimize the expected number of function evaluations to hit the target in the straight Weasel s=\infty, set the number of offspring to 20, and the mutation rate to a value close to what we’re looking at here (ETA: 1/27 in my last four plots — I normally go with 1/L). See the bottom curve on the second of DiEb’s two plots here (but keep in mind that he’s giving rates of “lazy” mutation, which changes the character with probability 26/27).

    The earliest of my runs were with small numbers of offspring.

  9. Tom,

    We are dealing with different Markov processes, plural, than the one that Joe analyzed. The question is how well the infinite-offspring analysis applies to finite-offspring cases.

    Yes, and I have already investigated that via my program. My equilibrium criterion is based on both theory and observation. It’s not an assumption, contrary to your characterization.

    keiths:

    In run after run of my program, the observed mean gravitates to the predicted mean and stays there, even in runs that last for millions of generations. The histograms display the expected symmetry. The model and the program agree.

    Tom:

    Then you’ve done little exploration of the parameter space. I provided you with parameter settings that you have not considered. It’s strange that you should ignore them, and reassert your correctness.

    Are you talking about the small-population cases? I’ve avoided those quite deliberately, because we already know they will depart from Joe’s model. I’m trying to approximate Joe’s model, not diverge from it.

  10. Because agreement between the predictions of Joe’s model and the performance of my program increases my confidence in the correctness of both.

    Joe’s model assumes an infinitude of offspring. Using a large number of offspring in my program will approximate that better than using a small number, so predictions (such as the mean Hamming distance) are more likely to match well when I use large numbers.

  11. I should add one more prediction. For the Wright-Fisher model in the case of N = 1 the individual positions are independent, so the distribution of matches should be a binomial distribution with 28 trials and probability p/(p+q). This I mentioned before. It is rather annoying to test, but at least the variance (and the standard deviation) can be checked. If P = p/(p+q) the standard deviation should be at equilibrium

        \[ \sigma \ = \ \sqrt{28 P (1 - P)} \]

    If that fits, the distribution will at least have the right width.
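    That width check is easy to script (a sketch; the names are mine, not from keiths’s program):

```python
from math import sqrt

def predicted_sd(p, q, L=28):
    """Equilibrium standard deviation of the number of matches, assuming
    the L sites are independent: Binomial(L, P) with P = p/(p+q)."""
    P = p / (p + q)
    return sqrt(L * P * (1 - P))
```

    With the p and q printed in the runs above (p ≈ 0.02909, q ≈ 0.001828) this gives σ ≈ 1.25, which is the value the observed standard deviation should approach as the number of offspring grows large.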

  12. Joe,

    I’ll update the latest version of my program to display both the predicted and observed standard deviations.

  13. Joe Felsenstein,

    4. At what value of s do we expect selection to overcome genetic drift? There is no value where drift does not occur, but we can roughly say that when the equilibrium frequency p/(p+q) is twice as great as the no-selection value of 1/27, that selection is then having a noticeable effect. The equation for this value is a quadratic equation in s whose coefficients involve u as well. If u is much smaller than s, then this value of s is close to 0.44.

    Let N_a denote the size of the alphabet. It appears that you took the approximate value of N_a/(N_a - 1 + u s) to be 1. With u=.001 and s=0.44, this works fine for N_a of at least 20 or so. It works poorly for N_a = 2.

  14. Tom English,

    It works poorly for N_a = 2.

    I didn’t test it for any value except N_a = 27. You’d expect it to work poorly for N_a = 2 because then twice the value of 1/N_a would be 1, which is as high as the equilibrium value could get. Any higher and you’re in real trouble.

  15. Joe Felsenstein: You’d expect it to work poorly for N_a = 2 because then twice the value of 1/N_a would be 1, which is as high as the equilibrium value could get.

    That’s precisely what I expected. Most of what you’ve said holds for all alphabet sizes N_a \geq 2, even though you refer to 27 throughout. It appears that you’ve provided an asymptotic result that works for N_a = 27 because the curve for p/(p+q), a function of N_a and u, converges on 2/N_a rapidly when u is much smaller than constant s=0.44.

  16. A Weasel run showing predicted vs observed mean Hamming distance and standard deviation after 800,000 generations:

  17. Tom English: It appears that you’ve provided an asymptotic result that works for N_a = 27 because the curve for p/(p+q), a function of N_a and u, converges on 2/N_a rapidly when u is much smaller than constant s=0.44.

    My result is not asymptotic for large alphabets. It is exact for alphabet size N_a = 27 but is asymptotic in the number of offspring. Of course you can replace 25, 26, and 27 by respectively N_a-2, N_a-1, and N_a and you get the theory for any alphabet size.

  18. keiths,

    The latest run is certainly an excellent fit to the predictions of the model, owing to the large number of offspring (5000) and the length of the run. That is reassuring, suggesting that the theory is working.

  19. Joe,

    It’s gratifying to see a good fit between theory and observation, isn’t it?

    I let the run continue overnight. Here are the results after 10 million generations:

  20. petrushka,

    Keiths, have you tested with more than one survivor?

    Joe’s equations in the OP correspond to the case where the number of survivors is 1, so I don’t even print the predictions when num_survivors > 1.

  21. keiths:
    I’ll start another run with 50,000 offspring per generation.

    Why bother? The theoretical results for an infinite number of offspring were very close to those with 5,000 offspring. Why not try N = 2 surviving adults instead?

  22. Joe Felsenstein: My result is not asymptotic for large alphabets. It is exact for alphabet size but is asymptotic in the number of offspring. Of course you can replace 25, 26, and 27 by respectively N_a-2, N_a-1. and N_a and you get the theory for any alphabet size.

    Three in the morning was not exactly the best time for me to mess with the math. I thought I saw that the approximation improves not only with decreases in u, but also with increases in N_a. Perhaps I’m saying no more than that the solution changes little with increases to N_a. That’s not a terribly interesting observation. But I’m infinitely more interested in general theoretical results than in whacking my Weasel in public.

    I’ll check closely, bimeby.

  21. Joe Felsenstein: Why bother? The theoretical results for an infinite number of offspring were very close to those with 5,000 offspring. Why not try N = 2 surviving adults instead?

    Why not do another OP, and tell us about the Wright-Fisher model for N=2? I’ve hoped you would get folks to let go of their Weasels, and focus instead on some basics of population genetics. (Fobbing off the monkey/Shakespeare model of cumulative selection as Dawkins’s model of natural selection is a crucial element of ID rhetoric. I’ve long believed that people aid and abet the ID movement when they treat the Weasel as though it were of any importance to the theory of biological evolution. Now I’m in dread of hearing about the Two-Headed Weasel, or some such shit.)

  24. Tom English,

    With N = 2 the theory is much harder because the distribution of matches in the offspring is no longer independent across positions. The state space is much bigger. Something could probably be done, but not as easily.

    In the meantime we could look at it empirically with the Wright-Fisher Weasel program.

  25. Tom:

    I’ve long believed that people aid and abet the ID movement when they treat the Weasel as though it were of any importance to the theory of biological evolution. Now I’m in dread of hearing about the Two-Headed Weasel, or some such shit.

    Cheer up, Tom.

    The original Weasel demonstrates the power of cumulative selection, and the modified Weasel shows how drift and selection interact. Those are biologically relevant even if the programs themselves aren’t realistic simulations of biological evolution.

    The Wright-Fisher model is also biologically relevant though not realistic (infinite offspring, non-overlapping generations), yet you’re comfortable talking about it.

    If we had to refrain from discussing any subject that might be misrepresented by someone like Mung, we’d be forced into silence.

    The modified Weasel achieves its purpose. It shows that even with a tiny effective population size, drift needn’t dominate selection. It also provides a nice visual demonstration of the population being pushed toward equilibrium, where drift and selection cancel each other out on average, then wandering stochastically above and below that equilibrium point.

  26. keiths:

    I’ll start another run with 50,000 offspring per generation.

    Joe:

    Why bother? The theoretical results for an infinite number of offspring were very close to those with 5,000 offspring.

    I’m interested in whether the remaining small differences between observed and predicted values are purely stochastic, or whether there is a noticeable systematic component due to the finite population size.

    ETA: I’ve also started a run with two survivors to satisfy your curiosity.

  27. Population genetics theory suggests that two cases that have the same values of Ns, the same values of Nu, and are observed on a time scale whose units are 1 per N generations should behave approximately the same.

    So if we double the number of surviving adults N and halve u and halve s we should get a process that achieves the same distribution of matches, but takes twice as long to do it.

    However, this is not exact; it is an asymptotic result. Going from N = 1 to N = 2 might not show a very close fit. But in general these scaling rules work very well once one gets away from just a few individuals, so the rule would work much better going from N = 10 to N = 20.
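    For anyone who wants to probe this scaling empirically, a minimal finite-population Wright-Fisher Weasel might look like the sketch below. This is my own schematic, not keiths’s or petrushka’s program; in particular, it draws the N survivors from the finite offspring pool by fitness-weighted sampling with replacement, which is one way (among several) to make the selection step finite.

```python
import random

L = 28   # phrase length
A = 27   # alphabet size

def wf_weasel_mean_matches(N, offspring_per_parent, u, s,
                           generations=400, burn_in=200, seed=0):
    """Time-averaged mean number of matching sites per adult, sampled
    after burn_in generations. Genomes are stored as per-site match
    flags, which is all the fitness scheme (1+s)^k depends on."""
    rng = random.Random(seed)

    def mutate(genome):
        out = []
        for matched in genome:
            if rng.random() < u:
                # A mutated matching letter always stops matching; a
                # mutated nonmatching letter matches with prob 1/(A-1).
                out.append(False if matched else rng.random() < 1 / (A - 1))
            else:
                out.append(matched)
        return out

    adults = [[False] * L for _ in range(N)]
    total = samples = 0
    for gen in range(generations):
        pool = [mutate(g) for g in adults
                for _ in range(offspring_per_parent)]
        weights = [(1 + s) ** sum(g) for g in pool]
        adults = rng.choices(pool, weights=weights, k=N)
        if gen >= burn_in:
            total += sum(sum(g) for g in adults) / N
            samples += 1
    return total / samples
```

    Comparing, say, wf_weasel_mean_matches(10, 100, 0.01, 0.3) with wf_weasel_mean_matches(20, 100, 0.005, 0.15) is the kind of test the scaling rule suggests.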

  28. keiths:
    Disproportionate to what?

    The mean fitness goes from 16 to 27. See Joe’s post just before this one.

    My mean fitness calc doesn’t work with multiple survivors, but the histogram supposedly is unaffected by my add-ons.

  29. Tom English: I’ve long believed that people aid and abet the ID movement when they treat the Weasel as though it were of any importance to the theory of biological evolution.

    And they do treat it that way, even though it is of no importance to the theory of biological evolution. I wish they would stop. 🙂

  30. keiths: The original Weasel demonstrates the power of cumulative selection…

    You have never actually shown this to be the case, you’ve merely asserted it to be so. When you were asked for an empirical test of your claim you declined to produce one.

  31. I’m not confident that my code for calculating the mean fitness is correct when there are more than two survivors.

  32. Joe Felsenstein: Did you lower u and lower s as well as increase N?

    I get nearly equivalent results with
    selection = 5, mutation = .01, survivors = 1
    or
    selection = 1.55, mutation = .005, survivors = 2

  33. petrushka: I get nearly equivalent results with
    selection = 5, mutation = .01, survivors = 1
    or
    selection = 1.55, mutation = .005, survivors = 2

    Then this shows that the proportional changes do not quite give the same results. I would predict that for larger numbers of survivors the predicted relationship would work better.

  34. Yes. I see a better match between my N=10 and N=20 runs:

    Generation: 1318300
    Total population size: 5000 Number of survivors per generation: 10
    Selection: ON Coefficient: 0.30 Mutation rate: 0.010
    Mean Hamming distance: 13.42 Standard deviation: 2.20

    Generation: 1599300
    Total population size: 5000 Number of survivors per generation: 20
    Selection: ON Coefficient: 0.15 Mutation rate: 0.005
    Mean Hamming distance: 12.76 Standard deviation: 2.17

  35. And as expected, an even better match between N=100 and N=200:

    Generation: 525400
    Total population size: 5000 Number of survivors per generation: 100
    Selection: ON Coefficient: 0.10 Mutation rate: 0.010
    Mean Hamming distance: 11.20 Standard deviation: 2.05

    Generation: 526000
    Total population size: 5000 Number of survivors per generation: 200
    Selection: ON Coefficient: 0.05 Mutation rate: 0.005
    Mean Hamming distance: 11.27 Standard deviation: 2.03

  36. keiths,

    Would you please check this? I list the number of offspring per parent, so with 1000 offspring per generation for each of 10 parents, there are 10,000 offspring per generation. The mean and standard deviation are for the last 1000 generations.

    I’ll run some of yours, a bit later.

    ************************************************************************
    Number of sites L : 10000
    Number of possible alleles A : 27
    Number of parents : 10
    Number of offspring per parent : 1000
    Mutation rate u : 0.0001
    Selection parameter s : 10000.0
    Mean observed num. of fit alleles in parent : 9999.9998
    Std. dev. of observed numbers of fit alleles : 0.0141407213395
    ************************************************************************
    CPU times: user 6min 38s, sys: 966 ms, total: 6min 39s
    Wall time: 6min 40s
