Richard Dawkins’s computer simulation algorithm explores how long it takes a 28-letter-long phrase to evolve to become the phrase “Methinks it is like a weasel”. The Weasel program has a single copy of the phrase, which produces a number of offspring, with each letter subject to mutation; there are 27 possible letters, the 26 letters A-Z and a space. The offspring that is closest to the target replaces the single parent. The purpose of the program is to show that creationist orators who argue that evolutionary biology explains adaptations by “chance” are misleading their audiences. Pure random mutation without any selection would lead to a random sequence of 28-letter phrases. There are $27^{28}$ possible 28-letter phrases, so it should take about $27^{28} \approx 1.2 \times 10^{40}$ different phrases before we found the target. That is without arranging that the phrase that replaces the parent is the one closest to the target. Once that highly nonrandom condition is imposed, the number of generations to success drops dramatically, from about $10^{40}$ to mere thousands.
Although Dawkins’s Weasel algorithm is a dramatic success at making clear the difference between pure “chance” and selection, it differs from standard evolutionary models. It has only one haploid adult in each generation, and since the offspring that is most fit is always chosen, the strength of selection is in effect infinite. How does this compare to the standard Wright-Fisher model of theoretical population genetics?
The Wright-Fisher model, developed 85 years ago, independently, by two founders of theoretical population genetics, is somewhat similar. But it has an infinite number of newborns, precisely $N$ of whom survive to adulthood. In its simplest form, with asexual haploids, the $N$ adult survivors each produce an infinitely large, but equal, number of offspring. In that infinitely large offspring population, mutations occur exactly in the expected frequencies. Each genotype has a fitness, and the proportions of the genotypes change according to them, exactly as expected. Since there are infinitely many offspring, there is no genetic drift at that stage. Then finally a random sample of $N$ of these offspring is chosen to become the adults of the new generation.
In one case we get fairly close to the Weasel. That is when $N = 1$, so that there is only one adult individual. Now for some theory, part of which I explained before, in a comment here. We first assume that a match of one letter to the target multiplies the fitness by a factor of $1+s$, and that this holds no matter how many other letters match. This is the fitness scheme in the genetic algorithm program by keiths. A phrase that has $m$ matches to the target has fitness $(1+s)^m$. We also assume that mutation occurs independently in each letter of each offspring with probability $u$, and produces one of the other 26 possibilities, chosen uniformly at random. This too is what happens in keiths’s program. If a letter matches the target, there is a probability $u$ that it will cease to match after mutation. If it does not match, there is a probability $u/26$ that it will match after mutation.
The interesting property of these assumptions is that, in the Wright-Fisher model with $N = 1$, each site evolves independently, because neither its mutational process nor its selection coefficients depend on the state of the other sites. So we can simply ask about the evolution of one site.
If the site matches, after mutation a fraction $1-u$ of the offspring will still match, and $u$ of them will not. Selection results in a ratio of $(1+s)(1-u) : u$ of matching to nonmatching letters. Dividing by the sum of these, the probability of ending up with a nonmatch is then $u$ divided by the sum, or

$$q = \frac{u}{(1+s)(1-u) + u} = \frac{u}{1 + s(1-u)}.$$

Similarly, if a letter does not match, after mutation the ratio of nonmatches to matches is $(1 - u/26) : u/26$. After selection the ratio is $(1 - u/26) : (u/26)(1+s)$. Dividing by the sum of these, the probability of ending up with a match is

$$p = \frac{(u/26)(1+s)}{(1 - u/26) + (u/26)(1+s)} = \frac{u(1+s)}{26 + us}.$$

Thus, what we see at one position in the 28-letter phrase is that if it is not matched, it has probability $p$ of changing to a match, and if it is matched, it has probability $q$ of changing to not being matched. This is a simple 2-state Markov process. I’m going to leave out the further analysis; I can explain any point in the comments. Suffice it to say that:
1. The probability, in the long run, of a position matching is $p/(p+q)$.
2. This is independent in each of the 28 positions. So the number of matches will, in the long run, be a Binomial variate with 28 trials, each with probability $p/(p+q)$.
3. The “relaxation time” of this system, the number of generations taken to approach the long-term equilibrium, will be roughly $1/(p+q)$.
4. At what value of $s$ do we expect selection to overcome genetic drift? There is no value where drift does not occur, but we can roughly say that when the equilibrium frequency $p/(p+q)$ is twice as great as the no-selection value of $1/27$, selection is then having a noticeable effect. The equation for this value is a quadratic equation in $s$ whose coefficients involve $u$ as well. If $u$ is much smaller than $s$, then this value of $s$ is close to 0.44.
5. The probability of a match at a particular position, one which started out not matching, will be after $t$ generations $\frac{p}{p+q}\left(1 - (1-p-q)^t\right)$, and the probability of a nonmatch, given that one started out matching, is $\frac{q}{p+q}\left(1 - (1-p-q)^t\right)$.
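As a quick illustration of items 1-5, here is a minimal sketch (not part of the analysis above; it just evaluates the formulas for $u = 0.01$ and $s = 0.44$):

# Evaluating the formulas above: 27-letter alphabet, 28 positions.
u, s, A, L = 0.01, 0.44, 27, 28

p = u * (1 + s) / (A - 1 + u * s)  # P(nonmatch -> match) per generation
q = u / (1 + s * (1 - u))          # P(match -> nonmatch) per generation

match_eq = p / (p + q)             # item 1: long-run P(match) at a site
mean_matches = L * match_eq        # item 2: mean of Binomial(28, match_eq)
relaxation = 1 / (p + q)           # item 3: rough generations to equilibrium

t = 100                            # item 5: transient probabilities
p_gain = match_eq * (1 - (1 - p - q) ** t)       # started unmatched
p_loss = (q / (p + q)) * (1 - (1 - p - q) ** t)  # started matched

print(match_eq, mean_matches, relaxation, p_gain, p_loss)

For these values the equilibrium match probability comes out near $2/27$, as item 4 predicts for $s = 0.44$.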
I don’t have the energy to see whether the runs of keiths’s program verify this, but I hope others will.
Here’s the fitness function plotted for s=0.44
For a computer simulation with a survivor population of size $N$, we could use the brute-force method: conceptually easy, but probably slow computationally.

The difficulty is that it is desirable to have an integer number of individuals, not fractional ones.

Say the surviving population is $N = 1000$.

So we have an individual with 27 matches. For $s = 0.44$ we give him $(1+s)^{27} = 1.44^{27} \approx 18{,}870$ kids. We do the same calculation for the others and come up with a fairly sizable birth population. Then we do a random draw out of that birth population to form the surviving population of $N = 1000$.

That’s how I envision the algorithm would work without any tricks; it just may be a little slow.
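Something like this, in Python (a sketch of my reading of the scheme above, not actual code from the thread; placing mutation on each kid before the final draw is an assumption):

import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TARGET = "METHINKS IT IS LIKE A WEASEL"
N, s, u = 1000, 0.44, 0.01

def matches(phrase):
    return sum(a == b for a, b in zip(phrase, TARGET))

def mutate(phrase):
    # each letter mutates with probability u, to one of the other 26 letters
    return "".join(random.choice(ALPHABET.replace(c, "")) if random.random() < u
                   else c for c in phrase)

def next_generation(survivors):
    births = []
    for parent in survivors:
        kids = round((1 + s) ** matches(parent))   # 27 matches -> 18,870 kids
        births.extend(mutate(parent) for _ in range(kids))
    return random.sample(births, N)    # uniform draw of N = 1000 survivors

population = ["".join(random.choice(ALPHABET) for _ in range(28)) for _ in range(N)]
population = next_generation(population)   # one generation

As the population gets fitter, the birth pool runs into the millions of strings, which is where the slowness comes from.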
Choosing a finite number of adults out of the large, but finite, number of offspring is difficult. keiths did it one way. If one weights the individual offspring by their fitnesses, one can choose by sampling proportional to weights. However, the total weight of each genotype then changes somewhat after each individual is drawn.
As the number of offspring gets very large, one approaches the infinite case, where sampling an individual causes no change in the mix of individuals available to sample.
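In Python, the with-replacement version of that limit is essentially a one-liner (a toy sketch, not keiths’s actual method; the genotypes and weights are made up):

import random

# Toy pool: weights are multiplicative fitnesses (1+s)**matches for
# s = 0.44 against the target "ABB" (1, 2, and 3 matches respectively).
offspring = ["AAA", "ABA", "ABB"]
weight = [1.44, 2.0736, 2.985984]

# Sampling WITH replacement: each draw leaves the weights unchanged,
# which is exactly the infinite-offspring limit described above.
survivors = random.choices(offspring, weights=weight, k=2)
print(survivors)

Sampling without replacement, by contrast, requires updating the weights after every draw.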
Sal:
Why not just set NUM_SURVIVORS to N in my program, while making sure that POPULATION_SIZE is much, much greater?
Joe:
I thought about that issue when I was writing the code. You could avoid it by segregating the survivor slots from the offspring slots. Then, when sampling offspring, you would copy them to the survivor slots but leave them in the offspring pool where they could potentially be resampled.
In the end, I decided that it didn’t make sense. For a given memory footprint, I think you’re better off dedicating as much of it to offspring as possible, so that the law of large numbers can work its magic.
I am among the most math deficient in this discussion, but the consistent appearance of pretty bell curves makes me smile. Some regular process is afoot. I leave it to joe f to tell me if the logic is correct.
OneMax. Rather thoroughly discussed here.
It appears that the effect of $s$ is attenuated for non-large samples of offspring.

ETA: The sentence length is $L = 100$.
selection: 0
**************************************************************
u, s = 0.01 0.0
p, q = 0.005 0.01
Long-run probability of match at site is p/(p+q) = 0.333333333333
Long-run expected fitness is Lp/(p+q) = 33.3333333333
Observed long-run fitness is 33.459282302
Number of offspring is 10
**************************************************************
selection: 1
**************************************************************
u, s = 0.01 1.0
p, q = 0.00995024875622 0.00502512562814
Long-run probability of match at site is p/(p+q) = 0.664440734558
Long-run expected fitness is Lp/(p+q) = 66.4440734558
Observed long-run fitness is 62.8474613932
Number of offspring is 10
**************************************************************
selection: 2
**************************************************************
u, s = 0.01 2.0
p, q = 0.0148514851485 0.00335570469799
Long-run probability of match at site is p/(p+q) = 0.815693430657
Long-run expected fitness is Lp/(p+q) = 81.5693430657
Observed long-run fitness is 75.3551827575
Number of offspring is 10
**************************************************************
selection: 4
**************************************************************
u, s = 0.01 4.0
p, q = 0.0245098039216 0.00201612903226
Long-run probability of match at site is p/(p+q) = 0.923994038748
Long-run expected fitness is Lp/(p+q) = 92.3994038748
Observed long-run fitness is 84.9353405177
Number of offspring is 10
**************************************************************
selection: 8
**************************************************************
u, s = 0.01 8.0
p, q = 0.0432692307692 0.00112107623318
Long-run probability of match at site is p/(p+q) = 0.974745021855
Long-run expected fitness is Lp/(p+q) = 97.4745021855
Observed long-run fitness is 91.5322741918
Number of offspring is 10
**************************************************************
Textual output to follow.
Corresponding to the plot I just posted:
selection: 0
**************************************************************
u, s = 0.01 0.0
p, q = 0.005 0.01
Long-run probability of match at site is p/(p+q) = 0.333333333333
Long-run expected fitness is Lp/(p+q) = 33.3333333333
Observed long-run fitness is 33.157204755
Number of offspring is 20
**************************************************************
selection: 1
**************************************************************
u, s = 0.01 1.0
p, q = 0.00995024875622 0.00502512562814
Long-run probability of match at site is p/(p+q) = 0.664440734558
Long-run expected fitness is Lp/(p+q) = 66.4440734558
Observed long-run fitness is 65.4541717587
Number of offspring is 20
**************************************************************
selection: 2
**************************************************************
u, s = 0.01 2.0
p, q = 0.0148514851485 0.00335570469799
Long-run probability of match at site is p/(p+q) = 0.815693430657
Long-run expected fitness is Lp/(p+q) = 81.5693430657
Observed long-run fitness is 78.2395289412
Number of offspring is 20
**************************************************************
selection: 4
**************************************************************
u, s = 0.01 4.0
p, q = 0.0245098039216 0.00201612903226
Long-run probability of match at site is p/(p+q) = 0.923994038748
Long-run expected fitness is Lp/(p+q) = 92.3994038748
Observed long-run fitness is 88.8115764915
Number of offspring is 20
**************************************************************
selection: 8
**************************************************************
u, s = 0.01 8.0
p, q = 0.0432692307692 0.00112107623318
Long-run probability of match at site is p/(p+q) = 0.974745021855
Long-run expected fitness is Lp/(p+q) = 97.4745021855
Observed long-run fitness is 94.9761137651
Number of offspring is 20
**************************************************************
The long-run fitness is just as predicted, for larger numbers of offspring. But here is what happens holding $s = 4$ and letting the number of offspring range over {8, 16, 32}:
n_offspring: 8
**************************************************************
u, s = 0.01 4.0
p, q = 0.0245098039216 0.00201612903226
Long-run probability of match at site is p/(p+q) = 0.923994038748
Long-run expected fitness is Lp/(p+q) = 92.3994038748
Observed long-run fitness is 82.8873458505
Number of offspring is 8
**************************************************************
n_offspring: 16
**************************************************************
u, s = 0.01 4.0
p, q = 0.0245098039216 0.00201612903226
Long-run probability of match at site is p/(p+q) = 0.923994038748
Long-run expected fitness is Lp/(p+q) = 92.3994038748
Observed long-run fitness is 87.9172314187
Number of offspring is 16
**************************************************************
n_offspring: 32
**************************************************************
u, s = 0.01 4.0
p, q = 0.0245098039216 0.00201612903226
Long-run probability of match at site is p/(p+q) = 0.923994038748
Long-run expected fitness is Lp/(p+q) = 92.3994038748
Observed long-run fitness is 89.634596156
Number of offspring is 32
**************************************************************
Here the alphabet size is still 3, and the sentence length is still 100. Assuming I haven’t fouled up the implementation of the model, it’s kind of interesting to see small samples attenuate the effect of increasing $s$.
Some Weasel results.
I’ve held the mutation rate at u = 0.01 (as printed in the runs below), and the number of offspring at 1000. I throw out the first 2000 generations in calculating the mean (“observed long-run”) fitness. I’ve added the corresponding standard deviations at the bottom of this comment.
selection (s): 0
**************************************************************
u, s = 0.01 0.0
p, q = 0.000384615384615 0.01
Long-run probability of match at site is p/(p+q) = 0.037037037037
Long-run expected fitness is Lp/(p+q) = 1.03703703704
Observed long-run fitness is 1.04495872491
Number of offspring is 1000
Sentence length is 28
Alphabet size is 27
**************************************************************
selection (s): 2
**************************************************************
u, s = 0.01 2.0
p, q = 0.00115295926211 0.00335570469799
Long-run probability of match at site is p/(p+q) = 0.255720823799
Long-run expected fitness is Lp/(p+q) = 7.16018306636
Observed long-run fitness is 7.2933745574
Number of offspring is 1000
Sentence length is 28
Alphabet size is 27
**************************************************************
selection (s): 4
**************************************************************
u, s = 0.01 4.0
p, q = 0.00192012288786 0.00201612903226
Long-run probability of match at site is p/(p+q) = 0.487804878049
Long-run expected fitness is Lp/(p+q) = 13.6585365854
Observed long-run fitness is 13.2219977347
Number of offspring is 1000
Sentence length is 28
Alphabet size is 27
**************************************************************
selection (s): 16
**************************************************************
u, s = 0.01 16.0
p, q = 0.00649847094801 0.000593824228029
Long-run probability of match at site is p/(p+q) = 0.916271924209
Long-run expected fitness is Lp/(p+q) = 25.6556138779
Observed long-run fitness is 25.57621861
Number of offspring is 1000
Sentence length is 28
Alphabet size is 27
**************************************************************
Addition. Here are standard deviations of fitness for the four runs, throwing out the first 2000 generations (as I did for the means, i.e., “observed long term fitness,” above).
s=0, std dev=1.0249881097232614,
s=2, std dev=2.2326912542112818,
s=4, std dev=2.8600378859108697,
s=16, std dev=1.5048244544570109
So it converges to the expected fitness within a few % points in just 2000 generations? Damn, blind chance is sort of predictable, huh?
In what way is this blind chance being predictable?
Can someone explain to me, why is all of this not just a model of directed evolution? Surely it is set up to be directed towards a target?
I’m sorry Charlie, we’ve been discussing GAs for months and that nonsense has been dealt with time and again. Do your homework and read the previous threads on GAs.
In much the same way that radioactive decay is predictable in the aggregate while being very much unpredictable as far as individual atoms are concerned, or in much the same way that casinos can reliably make a steady income off of games that are solidly founded on blind chance.
I’ve posted my Python code — all 100 lines, including an example of how to use it — at my blog, Bounded Science. There is still a big hole in my testing. I haven’t yet varied the mutation rate, while holding all other parameters constant. But I’m pretty comfortable sharing the code with you now.
Here are plots of another four runs, with a horizontal line added to each to show the mean fitness (first 5000 generations excluded). I’d add four other lines to show the Wright-Fisher predictions, if not for the fact that three of them are on top of the lines you see. There’s a discernible deviation for the red plot, which should not be surprising, because you can see that there’s greater variability in it than in the others.
Thanks for sharing the code Tom.
Maybe our creationist friends can answer how it can calculate the fitness of each generation without a single reference to the target string in it, LOL
Tom, your testing seems very thorough. One phrase I’d modify is “long-run expected fitness”. The quantity you are computing is the long-run expected number of matches. Since the fitness is a nonlinear function of the number of matches the expectation of the fitness is different from that.
In the theory with an infinite number of offspring and one surviving adult, the matches at one position are independent of the matches at another. At each position the fitness is $1+s$ if there is a match, and $1$ if there is not. So the expected fitness at that position is $1 + s\,\frac{p}{p+q}$. The random variables at each position being independent, the expected value of the product of fitnesses across positions is the product of the expectations. So the expected fitness is $\left(1 + s\,\frac{p}{p+q}\right)^{28}$.

When the number of offspring is not infinite, there will be small random departures from independent distribution across positions. They do not persist from one generation to another. The expected fitness overall will still be the product of the expectations at the individual positions, but those are no longer exactly $1 + s\,\frac{p}{p+q}$.
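As a quick numerical check of this (a minimal sketch, reusing the $p$ and $q$ formulas from the opening post, for $u = 1/28$ and $s = 5$):

u, s, A, L = 1/28, 5.0, 27, 28

p = u * (1 + s) / (A - 1 + u * s)   # 0.00818...
q = u / (1 + s * (1 - u))           # 0.00613...

per_site = 1 + s * p / (p + q)      # expected fitness at one position, ~3.86
print(per_site ** L)                # expected phrase fitness, ~2.6e16

Note how different $2.6 \times 10^{16}$ is from $(1+s)^{16} \approx 2.8 \times 10^{12}$, the fitness of a phrase with the expected number of matches; that is the nonlinearity at issue.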
I also hope that at some point the other prediction of the theory will be tested: that the number of matches will follow a binomial distribution. At long-run equilibrium this will be a binomial distribution with 28 trials and probability of success equal to the long-term expectation. With an infinite number of offspring this probability will be $p/(p+q)$.

However, the outcomes will not be independent from generation to generation. How strongly they are correlated depends on the mutation rate: when $u$ is small, successive generations are autocorrelated. So testing fit of the histograms may be hard to do unless one chooses samples well-separated in time.
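One way to run that test, sketched in Python (a sketch only; match_counts stands in for the per-generation match counts from a run, and the choice of five relaxation times as the spacing is a judgment call):

import numpy as np
from scipy import stats

u, s, A, L = 1/28, 5.0, 27, 28
p = u * (1 + s) / (A - 1 + u * s)
q = u / (1 + s * (1 - u))

# match_counts: observed number of matching sites in each generation (assumed given)
spacing = int(5 / (p + q))                  # samples ~5 relaxation times apart
sample = np.asarray(match_counts[2000:])[::spacing]   # drop burn-in, then thin

observed = np.bincount(sample, minlength=L + 1)
expected = len(sample) * stats.binom.pmf(np.arange(L + 1), L, p / (p + q))
print(stats.chisquare(observed, expected))  # pool sparse bins in practice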
I have an unrelated question for the experts. It’s again about additive vs. geometric fitness functions. A few posts back Prof. Felsenstein pointed out that the correct fitness function for the additive case is $1 + sm$. The question is, with this function, as $s$ tends towards infinity, the relative fitness of two genotypes with $m_1$ and $m_2$ matches tends towards $m_1/m_2$ ($s$ cancels out, as noted by keiths). It seems mathematically impossible to have an additive fitness model accurately deal with a selection coefficient if the best that one can do is $m_1/m_2$. What use is this to model selection?

I’m probably missing something obvious again
Got it. You can see from the label on the y-axis of my last plot that I was trying to keep this straight. I’m accustomed to thinking of this somewhat differently, and I’m muddying the waters here. I’ve got a warning up at my blog, not to copy the code right now. I’ll change some identifiers (hopefully without breaking anything).
CharlieM, the whole point of the Weasel algorithm is to show that evolutionary theory is not a theory of evolution “by chance”. dazz’s remark is sarcastic, which you missed.
The Weasel refutes the statements in many lectures by creationists, statements that assure their audiences that evolutionary biologists explain adaptations as occurring “by chance”. If their audience believes that statement, they will of course immediately reject evolutionary biology as nonsense. After all, random genotypes will have such a small probability of producing exquisitely adapted phenotypes, that we will not ever see anything like the adaptations that we do see. The Weasel is a teaching demonstration showing that the processes evolutionary biologists envision are very, very far from random. That the statements of the creationist debaters are massively misleading.
You seem to have countered with the objection that the Weasel model is of processes that are very, very far from random. Well, um yes, of course.
Yes, that is a problem with the additive model. The multiplicative model does not have that problem, as that ratio is then $(1+s)^{m_1}/(1+s)^{m_2} = (1+s)^{m_1-m_2}$, which becomes infinite as $s$ becomes infinite, if $m_1 > m_2$.
This seems to match keiths’ results. With s=5, he gets a mean of approximately 12.
Thank you, and also thanks to keiths again for putting me on the right track
So keiths’ algo modeling the process matches Tom’s algo running the mathematical model. Sweet.
Now to test the binomial, wouldn’t it be enough to use a large $L$, and keep running keiths’ algo to see if the number of matches converges to $L\,\frac{p}{p+q}$?
Whoops, notational bungle. In my own private notes I use different symbols for these quantities than I have in this thread. To be clear: the long-term expectation of the probability of a match is $p/(p+q)$, and the expectation after time $t$ is $\frac{p}{p+q}\left(1-(1-p-q)^t\right)$. Those are the quantities intended in my comment above.
Hi Tom,
Any restrictions on the use of your code, or do you consider it public domain?
It would be nice if we could all agree on using the same language here, even if it is Python. 😉
No problems compiling Python, R or Ruby on Windows.
I ran with s=5, and with a mutation rate of 1/28. The mean number of matching sites is about 16.
************************************************************************
selection (s): 5
************************************************************************
Sentence length L : 28
Alphabet size A : 27
Number of offspring : 1000
Mutation rate u : 0.0357142857143
Selection parameter s : 5.0
p = u * (1 + s) / (A-1 + u * s) : 0.00818553888131
q = u / (1 + s * (1 - u)) : 0.00613496932515
Assume equilibrium at generation: 5000
Theoretical prob. of match at site p/(p+q) : 0.571595558153
Expected number of matching sites Lp/(p+q) : 16.0046756283
Mean of observed numbers of matching sites : 15.8920642941
Std. dev. of observed nums. of matching sites: 2.61846152285
************************************************************************
CharlieM,
It gets towards the target by means of variation and selection of genotypes in the current population. The programmer does not direct it towards the target. Indeed, there would be no point in writing GAs for problem solution if it were simply a matter of specifying a target and directing the program to find it.
Consider a small modification: instead of distance from target, evaluate fitness by adding up the characters’ ASCII values. Those with the greater sum are fitter than those with a lesser sum. There is no mention of a distant target; although it is clear that the program will converge on a string of all Zs, the program doesn’t know this. It is not drawn by, nor directed towards, that target. It is simply doing generational evaluation of fitter and less fit genotypes in the current population.
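A minimal sketch of that modification (not Allan’s actual code; parameters arbitrary), bolted onto a Weasel-style best-of-brood loop:

import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
L, BROOD, u = 28, 100, 0.01

def fitness(phrase):
    return sum(ord(c) for c in phrase)   # no target string anywhere

def mutate(phrase):
    return "".join(random.choice(ALPHABET.replace(c, "")) if random.random() < u
                   else c for c in phrase)

parent = "".join(random.choice(ALPHABET) for _ in range(L))
for _ in range(5000):
    offspring = [mutate(parent) for _ in range(BROOD)]
    parent = max(offspring, key=fitness)   # fittest offspring replaces parent
print(parent)

Nothing in the loop compares anything to a string of Zs; that endpoint is implicit in the fitness function alone.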
Well, you shouldn’t post it somewhere else. But I intend for you to use it. However, I’m not totally satisfied with my testing yet. It would be good if keiths matches my results with his radically different approach. What I really want to do is some mathematical analysis that covers the small-sample case. I’ll believe in my code and my math when I get them to agree.
Generation: 5090 Number of survivors per generation: 1
Selection: ON Coefficient: 5.00 Mutation rate: 0.03
Organism 0 Fitness 1.31e+010 Hamming distance 15
MEZHJYBSJIT IQDTGWE J OCASGD
Histogram of Hamming distances:
0: 0
1: 0
2: 0
3: 0
4: 0
5: 16 X
6: 32 XX
7: 118 XXXXXXXXX
8: 283 XXXXXXXXXXXXXXXXXXXXXXX
9: 313 XXXXXXXXXXXXXXXXXXXXXXXXXX
10: 411 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
11: 534 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
12: 442 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
13: 488 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
14: 718 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
15: 634 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
16: 354 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
17: 230 XXXXXXXXXXXXXXXXXXX
18: 180 XXXXXXXXXXXXXXX
19: 129 XXXXXXXXXX
20: 74 XXXXXX
21: 37 XXX
22: 6
23: 11
24: 17 X
25: 24 XX
26: 2
27: 5
28: 32 XX
keiths’ program rounds off the mutation rate in its display. It’s either .03 or .04.
He said he was working on a test version that would cycle through the parameters and report only the mean and SD.
Yes, running the numbers
p=0.00818553888130968622100954979536
q=0.00613496932515337423312883435583
P(match) = p/(p+q) = 0.57159555815312682641729982466394
P(match) * 28 = 16.00467562828755113968439509059
I’m not sure that’s the issue. With u=.03 I get 16.04571429
He’s printing Hamming distance, which should be 28 - (number of matches), so 28 - 16 = 12
keiths’ program does not seem particularly sensitive to mutation rate. I think that’s to be expected.
So a Hamming distance of 12 is the same as 16 matches. Does that mean Tom’s and keiths’ programs are reporting the same results?
I’m thinking more along the lines of converting it into another language. I do have Python installed, I should probably just bite the bullet and use it. If I did convert it to another language I can’t see myself posting it anywhere but here. Just thought I’d ask first.
Could be. I think I was missing a key aspect that should have been obvious: even if the long-term average tends towards $p/(p+q)$, the number of matches doesn’t really converge. I guess it’s going to drift and hover around the expected value, so sampling is necessary.

keiths’ algo being at 13, 14 or 15 matches at any point in time is irrelevant; it’s the long-term average that should converge… unless I’m making some other mistake again.
And yes, the mutation rate is almost irrelevant below a certain point, if my spreadsheet doesn’t lie. The number of matches tends to 16.25806452 as $u$ tends to zero. At $u = 0.1$ it’s 15.52.
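For anyone wanting to check the spreadsheet numbers, a few lines of Python reproduce them (a sketch using the $p$ and $q$ formulas from the opening post):

def expected_matches(u, s=5.0, A=27, L=28):
    p = u * (1 + s) / (A - 1 + u * s)
    q = u / (1 + s * (1 - u))
    return L * p / (p + q)

for u in (1e-9, 1/28, 0.1):
    print(u, expected_matches(u))   # ~16.258, ~16.005, ~15.5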
petrushka,
Did I reverse it correctly? What I’ve loaded into the Python interpreter sums to 5090, which matches the generation you gave me. So it’s a problem, I believe, that you’re not discarding the values from before the system reaches equilibrium. That’s how I account for the extra mass on the left (the earlier values are lower than the later ones). I’ve always thrown out at least the first 2000 generations, just to be on the safe side.
I’ll see, in a moment, if I get something that looks like this for the first 5090 data points.
[ 32, 5, 2, 24, 17, 11, 6, 37, 74, 129, 180, 230, 354,
634, 718, 488, 442, 534, 411, 313, 283, 118, 32, 16, 0, 0,
0, 0, 0]
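The check described above amounts to something like this (a reconstruction, not the actual interpreter session):

counts = [32, 5, 2, 24, 17, 11, 6, 37, 74, 129, 180, 230, 354,
          634, 718, 488, 442, 534, 411, 313, 283, 118, 32, 16, 0, 0,
          0, 0, 0]                # indexed by number of matches, 0..28
assert sum(counts) == 5090        # total equals petrushka's generation count
hamming_hist = counts[::-1]       # back to Hamming-distance order, 0..28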
Joe,
I get bell-shaped histograms for intermediate values of s. But I don’t know what to make of the extremes.
ETA: I fouled up the title. It should say u=1/L.
The histograms I just posted correspond to the top and bottom curves here.
Does this mean that the number of matches for s=5 hovers around 16 and doesn’t keep evolving towards the target?
This corresponds to the red curve above.
You can’t see it in the red histogram (s=5), but there are 5 occurrences of 26. In 100 thousand generations, there were always 2 or more non-matching sites among the 28 sites.
Generation: 5070 Number of survivors per generation: 1
Selection: ON Coefficient: 5.00 Mutation rate: 0.036
Mean fitness: 14.93 Standard Deviation: 2.594715
Not throwing away first 2000.
I added the following (plus declarations)
if (generation > 2000) {                      /* skip 2000 burn-in generations */
    X = (double)genome_array[0].matches;
    X_sum += X;
    M = X_sum / ((double)generation - 2000);  /* running mean of matches */
    X -= M;
    X_M_sqr += X * X;
    /* deviations are taken from the current running mean, not the final one,
       so this only approximates the true standard deviation over a long run */
    S_dev = sqrt(X_M_sqr / ((double)generation - 2000));
}
Thanks, I’m trying to install dateutils to run your script and see it in action
Throwing away the first 2000:
Generation: 7010 Number of survivors per generation: 1
Selection: ON Coefficient: 5.00 Mutation rate: 0.036
Mean fitness: 16.04 Standard Deviation: 2.70
Looks pretty close to me.
petrushka,
Here’s the histogram for the first 5090 generations of the red run (s=5) pictured above. The mean is just 15.14, contrasting with 15.97 when I throw out the first 5000 generations. My histogram is smoother, which leads me to ask also how many offspring there are per generation in your run. I’ve gone with 1000 here.