A selection-vs-drift version of Weasel

In his endless pursuit of that wascally Weasel, Mung made the following silly claim:

GAs are often used to demonstrate “the power of cumulative selection.” Given small population sizes drift ought to dominate yet in GAs drift does not dominate.

That is clearly false, but for the benefit of Mung (and his cousin Elmer) I have modified my Weasel program to incorporate both drift and selection.  They can now see for themselves that small population sizes are insufficient to guarantee that drift dominates selection.

The code is here. Compile it under Linux using “gcc -std=gnu99 weasel.c -o weasel -lm”.

Run the program and type ‘h’ to see a list of interactive commands:

c – clear the histogram data
f – change the selection coefficient
h – print this help message
m – change the mutation rate
p – pause until a key is pressed
q – quit the program
s – toggle selection on/off
t – change the target phrase

The program generates and updates a scaled histogram showing the number of generations spent at each possible Hamming distance from the target.
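As a rough illustration of what such a histogram tracks, here is a minimal sketch and not the code from weasel.c. The names `hamming_distance`, `record_generation` and `PHRASE_LEN` are hypothetical, and recording the parent's distance once per generation is an assumption about how the bins are filled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PHRASE_LEN 28  /* length of "METHINKS IT IS LIKE A WEASEL" */

/* Number of positions at which candidate differs from target.
   Both strings are assumed to be exactly PHRASE_LEN characters long. */
static int hamming_distance(const char *candidate, const char *target)
{
    assert(strlen(candidate) == PHRASE_LEN && strlen(target) == PHRASE_LEN);
    int distance = 0;
    for (size_t i = 0; i < PHRASE_LEN; i++) {
        if (candidate[i] != target[i])
            distance++;
    }
    return distance;
}

/* histogram[d] counts generations whose parent sat at distance d.
   d ranges over 0..PHRASE_LEN, so the array needs PHRASE_LEN + 1 bins. */
static void record_generation(unsigned long histogram[PHRASE_LEN + 1],
                              const char *parent, const char *target)
{
    histogram[hamming_distance(parent, target)]++;
}
```

A run dominated by selection would pile its counts into the bins near distance 0, while a run dominated by drift would spread them over the larger distances.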

222 thoughts on “A selection-vs-drift version of Weasel”

  1. Interestingly, a very minor modification would make this locate any 28-letter string from any start point – including a target unknown to the programmer. Which renders the usual guff somewhat moot – ‘it was designed to find that phrase’.

    It would be a simple matter of doing one more iteration of the random initialisation of the population, and using that as the target. All members of the starting population are approximately the same distance from each other as each is from the WEASEL words. It would take about the same time to converge on any combination of characters in a given space of such characters, given that all genomes are viable.
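The modification described in this comment could look something like the sketch below. It is not taken from weasel.c: the 27-symbol alphabet (capital letters plus space), the helper names and the use of `rand()` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PHRASE_LEN 28
/* Assumed alphabet: 26 capital letters plus the space character. */
static const char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";
#define ALPHABET_SIZE ((int)(sizeof ALPHABET - 1))

/* Fill out[0..PHRASE_LEN-1] with randomly drawn symbols and terminate it.
   rand() % n has a slight modulo bias, which this sketch ignores. */
static void random_phrase(char out[PHRASE_LEN + 1])
{
    for (int i = 0; i < PHRASE_LEN; i++)
        out[i] = ALPHABET[rand() % ALPHABET_SIZE];
    out[PHRASE_LEN] = '\0';
    assert(strlen(out) == PHRASE_LEN);
}

/* One extra call to the population initialiser yields a target
   that the programmer never chose or saw. */
static void pick_hidden_target(char target[PHRASE_LEN + 1], unsigned seed)
{
    srand(seed);
    random_phrase(target);
}
```

Because the fitness function only counts matches against whatever string is stored as the target, nothing else in the algorithm would need to change.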

  2. cpp: mb6bfgiq.c:6 Could not find include file sys/select.h
    cpp: mb6bfgiq.c:7 Could not find include file termios.h

    Error mb6bfgiq.c: 271 undeclared identifier ‘TCSANOW’
    Warning mb6bfgiq.c: 271 possible usage of TCSANOW before definition
    Error mb6bfgiq.c: 276 undefined size for ‘incomplete struct termios defined at c:lccexamplesmb6bfgiq.c 267 new_termios’
    Error mb6bfgiq.c: 280 invalid type argument ‘incomplete struct termios defined at c:lccexamplesmb6bfgiq.c 267’ to ‘sizeof’
    Error mb6bfgiq.c: 285 undeclared identifier ‘TCSANOW’
    Error mb6bfgiq.c: 290 cannot initialize undefined ‘incomplete struct timeval defined at c:lccexamplesmb6bfgiq.c 290’
    Error mb6bfgiq.c: 290 skipping `{‘ `0L’ `,’ `0L’ `}’
    Error mb6bfgiq.c: 290 operands of = have illegal types ‘incomplete struct timeval defined at c:lccexamplesmb6bfgiq.c 290’ and ‘incomplete struct timeval defined at c:lccexamplesmb6bfgiq.c 290’
    Error mb6bfgiq.c: 290 undefined size for ‘incomplete struct timeval defined at c:lccexamplesmb6bfgiq.c 290 tv’
    Error mb6bfgiq.c: 291 undeclared identifier ‘fd_set’
    Error mb6bfgiq.c: 291 Syntax error; missing semicolon before `fds’
    Error mb6bfgiq.c: 291 undeclared identifier ‘fds’
    Error mb6bfgiq.c: 527 undeclared identifier ‘TCSANOW’
    Error mb6bfgiq.c: 267 undefined size for ‘incomplete struct termios defined at c:lccexamplesmb6bfgiq.c 267 orig_termios’
    13 errors, 48 warnings

  3. I get that too under windows (MinGW)

    WeaselDriftKeiths.c:6:24: sys\select.h: No such file or directory
    WeaselDriftKeiths.c:7:21: termios.h: No such file or directory

  4. #define POPULATION_SIZE 50000 // total population size

    For an S to be nearly neutral, S should be less than 1/(4 * Population Size). That would be S = 1/200,000 = 0.000005

    but that is merely the maximum threshold, one could have even weaker selection of say
    0.00000005

    For very large genomes this would be likely the case, not for little weasily genomes, but since we have changeable parameters, we can model nearly neutral in light of the above and see if Ohta and Kimura are right.

  5. This code has a large number of (haploid) newborns. At its default settings, only one of them survives to adulthood and is the parent of the next generation. In terms of theoretical population genetics, the effective population size at the default setting is 1, not 50,000. Sorry, keiths, I cannot agree that this program explores the effect of larger effective population sizes, unless you increase the constant for the number of survivors.

    Likewise scordova is wrong about what value of the selection coefficient is needed to have near-neutrality. The approximate calculation is 2Ns = 1 (this model is haploid, so a 2 rather than a 4). So with the number of survivors set to 1, a selection coefficient near 0.5 will be expected to show near-neutrality.

    In any case I believe that the whole point of Weasel experiments is to demonstrate the effect of selection, not the effect of neutrality. They were designed to show that when creationist lecturers tell their audience that evolutionary biologists are saying that adaptations result from purely random processes, they are either mistakenly wrong or deliberately wrong.

    To increase the effective population size in this program, increase the number of individuals that survive to adulthood, the second constant in the initial list of constants. That should result in the N being larger in the calculation of 2Ns, and thus result in smaller values of s being effective.
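The rule of thumb in this comment, 2Ns = 1 for a haploid model, can be turned into a one-line helper. This is only a sketch of the arithmetic; the function name is hypothetical and treating N as the number of survivors follows the comment above, not the program's source.

```c
#include <assert.h>

/* Selection coefficient s at which 2 * N * s = 1 for a haploid model,
   where N is the effective population size (here: number of survivors).
   Values of s well below this are expected to behave as nearly neutral. */
static double near_neutral_threshold(int effective_population_size)
{
    assert(effective_population_size > 0);
    return 1.0 / (2.0 * effective_population_size);
}
```

With one survivor this gives 0.5, matching the figure quoted above; with ten survivors it drops to 0.05, which is why raising the survivor count lets smaller values of s be effective.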

  6. In Dawkins’s monkey/Shakespeare model of cumulative selection, mutation sometimes changes incorrect characters in the parent phrase to other incorrect characters in the progeny phrase, and leaves the correct characters unchanged. When that happens, the mutant matches the target phrase as well as the parent does, though it is not identical to the parent.

    WETHINKS IT IS LIKE A WEASEL [parent]

    HETHINKS IT IS LIKE A WEASEL [mutant]

    Assuming that none of the “mutant progeny” matches the target phrase perfectly, the mutant above may replace the parent. Where I come from, folks call that drift.

    Even in the case of the (1+1) evolutionary algorithm (one parent, one offspring per generation), the offspring replaces the parent when it is greater than or equal to the parent in fitness. Including equality in the condition enables the algorithm to drift across fitness plateaux.
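A minimal sketch of the acceptance rule described here follows. It simplifies the textbook (1+1) algorithm by mutating exactly one randomly chosen position per generation, and the alphabet and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PHRASE_LEN 28
/* Assumed alphabet: 26 capital letters plus the space character. */
static const char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";
#define ALPHABET_SIZE ((int)(sizeof ALPHABET - 1))

/* Fitness: number of positions that agree with the target. */
static int matches(const char *phrase, const char *target)
{
    int m = 0;
    for (int i = 0; i < PHRASE_LEN; i++)
        m += (phrase[i] == target[i]);
    return m;
}

/* One generation: mutate one random position in a copy of the parent.
   The child replaces the parent if it is at least as fit.
   Returns 1 if the parent was replaced, 0 otherwise. */
static int one_plus_one_step(char parent[PHRASE_LEN + 1], const char *target)
{
    char child[PHRASE_LEN + 1];
    memcpy(child, parent, PHRASE_LEN + 1);
    child[rand() % PHRASE_LEN] = ALPHABET[rand() % ALPHABET_SIZE];
    if (matches(child, target) >= matches(parent, target)) {
        /* ">=" rather than ">" is what permits neutral moves */
        memcpy(parent, child, PHRASE_LEN + 1);
        return 1;
    }
    return 0;
}
```

Changing `>=` to `>` would freeze the parent whenever a mutation swaps one wrong character for another, which removes the drift across plateaux mentioned above.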

  7. I think the idea is that people are free to pick their own parameters and explore the consequences for themselves. Although, it is true, the number of survivors does not seem to be part of the interactive list. A simple edit would suffice.

  8. Likewise scordova is wrong about what value of the selection coefficient is needed to have near-neutrality. The approximate calculation is 2Ns = 1 (this model is haploid, so a 2 rather than a 4). So with the number of survivors set to 1, a selection coefficient near 0.5 will be expected to show near-neutrality.

    Wow thanks for the correction!

    This correction was worth the price of admission and Keiths making the new weasel!

    So with a little algebra, S = 1/(2 Ne) is the threshold for near neutral.

    So with the number of survivors set to 1, a selection coefficient near 0.5 will be expected to show near-neutrality.

    That is, of course, presuming the survivor is not chosen by his selection coefficient directly, just a random sampling of 1 individual as the survivor out of a population of 50,000.

  9. All you have to do to get a very good (not optimal) setting of parameters is to identify what works well when only one character in the parent is incorrect.

  10. Joe:

    So with the number of survivors set to 1, a selection coefficient near 0.5 will be expected to show near-neutrality.

    Would we expect a larger population pool from which the survivor is randomly selected to reduce the likelihood that an individual with an S-coefficient of 0.5 will overtake the population?

    If the population pool from which the sole survivor is randomly selected is only 2, then random luck gives a good probability (1.5/2=75%) of being randomly selected, compared with a pool of 50,000 individuals (1.5/50,000).

    Should not population size then be equated to the starting pool (50,000), not the surviving pool (1)?

    For the scenario to reach the desired target, many identical mutations over time have to keep reappearing over and over again since the probability is so low that any one individual with a mutation will be the individual that starts the eventual overtake of the population of 50,000.

  11. stcordova: If the population pool from which the sole survivor is randomly selected is only 2, then random luck gives a good probability (50%) of being randomly selected rather than from a pool of 50,000 individuals (1.5/50,000).

    Should not population size then be equated to the starting pool (50,000), not the surviving pool (1)?

    I’m pretty sure you’re making the same mistake here that you made in your algo. You’re just considering the effect of the selection coefficient of the fittest descendant, and then assuming the rest are equally likely to be selected (selection coefficient = 0 for those, ignoring their fitness). Could be wrong though.

  12. As for exploration of parameter space with just one parent per generation, DiEb did a great job of that in 2009 — not by running a program a whole bunch of times, but instead by doing mathematical analysis.

    Detaching the problem from the program, it’s a special case of what’s long been called the OneMax problem. Just about every evolutionary algorithm that’s ever been analyzed has been analyzed for OneMax. So the “tweak it and see what happens” approach is not the way to settle arguments. It’s at best a way to gain some intuition.

    The particulars of the problem are not very important. What really matters is that the characters in the sentence contribute independently to fitness.

  13. Now, I’ve been talking evolutionary computation. If you really want to talk biological evolution, then I don’t understand why you would say anything about Dawkins’s monkey/Shakespeare model of cumulative selection, which he emphasized was not a model of cumulative natural selection.

  14. Allan,

    Interestingly, a very minor modification would make this locate any 28-letter string from any start point – including a target unknown to the programmer.

    That’s right. The algorithm doesn’t “care” what the specific target phrase is — it just needs to know that it’s getting “warmer” or “colder” as it evaluates variants. In fact, you can use the ‘t’ command to change the target mid-run. The program will start tracking toward the new target, whatever it is.

    Which renders the usual guff somewhat moot – ‘it was designed to find that phrase’.

    Yep.

  15. petrushka,

    It would seem that the code is UNIX specific.

    Yes. It wasn’t clear to me how to do the non-blocking I/O in a Windows environment. If anyone can tell me, and it isn’t too hairy, I’ll implement it with an #ifdef WINDOWS.

  16. Sal,

    For an S to be nearly neutral, S should be less than 1/(4 * Population Size). That would be S = 1/200,000 = 0.000005

    No, what matters is the effective population size, not the total population size. Remember, there is only one survivor per generation with the default settings.

  17. Joe,

    This code has a large number of (haploid) newborns. At its default settings, only one of them survives to adulthood and is the parent of the next generation. In terms of theoretical population genetics, the effective population size at the default setting is 1, not 50,000. Sorry, keiths, I cannot agree that this program explores the effect of larger effective population sizes, unless you increase the constant for the number of survivors.

    I didn’t claim that it explored the effect of larger effective population sizes, though it can. I wrote:

    They [Mung and Elmer] can now see for themselves that small population sizes are insufficient to guarantee that drift dominates selection.

    An effective population size of one is certainly “small”, and with the right parameter settings the program will converge on the target. Drift does not dominate selection.

    Thus Mung’s statement is incorrect:

    GAs are often used to demonstrate “the power of cumulative selection.” Given small population sizes drift ought to dominate yet in GAs drift does not dominate.

  18. petrushka,

    I’m looking for something that the set_conio_terminal_mode(), reset_terminal_mode(), kbhit(), and getch() routines in my code can be easily mapped onto.

  19. keiths:
    Sal,

    No, what matters is the effective population size, not the total population size. Remember, there is only one survivor per generation with the default settings.

    Would this be a somewhat correct explanation of why the number of descendants doesn’t matter?

    Let’s imagine only 2 descendants, one survives (eff. population = 1). One is fitter and at a given selection coefficient is twice as likely to be selected:

    d1: 66%
    d2: 33%

    Now with 4 descendants, and the same average fitness (2 descendants of the same fitness of d1, 2 descendants of the same fitness of d2) and eff population = 1

    d’1: 33%
    d’2: 33%
    d’3: 16%
    d’4: 16%

    Clearly, the probability of picking a descendant of the max fitness is 66% in both cases, so the number of descendants doesn’t matter; the effective population size does.
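The arithmetic in this comment can be checked with a small helper. This is a sketch under the comment's own assumptions: a single survivor is drawn with probability proportional to each offspring's weight, and the proportion of fitter offspring stays the same as the brood grows. The function name is hypothetical.

```c
#include <assert.h>

/* Probability that the single survivor comes from the fitter class when
   each offspring is chosen with probability proportional to its weight.
   n_fit offspring have weight w_fit; n_unfit offspring have weight w_unfit. */
static double prob_survivor_is_fit(int n_fit, double w_fit,
                                   int n_unfit, double w_unfit)
{
    double fit_total = n_fit * w_fit;
    double total = fit_total + n_unfit * w_unfit;
    assert(total > 0.0);
    return fit_total / total;
}
```

Calling it with (1, 2.0, 1, 1.0) and with (2, 2.0, 2, 1.0) gives 2/3 in both cases, because doubling both classes multiplies the numerator and the denominator by the same factor.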

  20. Tom,

    Yes, there is drift in the original Weasel.

    The point of the new version isn’t simply to implement drift. It’s there already, after all.

    The point is to allow the user to vary the parameters and see the effects on the relative strength of selection and drift.

  21. Allan,

    I think the idea is that people are free to pick their own parameters and explore the consequences for themselves.

    Right.

    Although, it is true, the number of survivors does not seem to be part of the interactive list. A simple edit would suffice.

    It’s an easy change. I’ll add it and repost the code.

  22. Sal,

    That is, of course, presuming the survivor is not chosen by his selection coefficient directly, just a random sampling of 1 individual as the survivor out of a population of 50,000.

    The probability of choosing an individual offspring as the survivor is weighted according to its fitness. I’m using the fitness function that Joe suggested: (1 + s)^m, where s is the selection coefficient and m is the number of loci that match the target phrase.
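To make the weighting concrete, here is a sketch of fitness-proportional (roulette-wheel) choice using the (1 + s)^m fitness described above. It is not the code from weasel.c; the function names are hypothetical, and the uniform random number is passed in as a parameter so the behaviour is easy to check.

```c
#include <assert.h>

/* Fitness as described in the comment: (1 + s)^m, where m is the
   number of loci matching the target. Computed by repeated multiplication. */
static double fitness(double s, int matching_loci)
{
    double w = 1.0;
    for (int i = 0; i < matching_loci; i++)
        w *= 1.0 + s;
    return w;
}

/* Roulette-wheel choice: returns index i with probability
   weights[i] / sum(weights). u must be uniform in [0, 1). */
static int weighted_pick(const double *weights, int n, double u)
{
    assert(n > 0 && u >= 0.0 && u < 1.0);
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += weights[i];
    double threshold = u * total;
    double running = 0.0;
    for (int i = 0; i < n; i++) {
        running += weights[i];
        if (threshold < running)
            return i;
    }
    return n - 1;  /* guards against floating-point round-off */
}
```

In a real run `u` could be drawn as `rand() / (RAND_MAX + 1.0)`. With s = 0 every offspring has weight 1 and the choice is uniform, which is the pure-drift case.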

  23. Probably the notion of “near neutral” is inappropriate since the concept worked with a different set of assumptions than the present assumption of having a population of 50,000 that is culled every generation to one sole survivor that has 50,000 kids.

    That said, with a selection coefficient of 1.0 so that the favored individual has 2 offspring instead of 1 — count me skeptical that this one individual has much of a chance of having its offspring overtake the population especially since it would in that case have only a chance of 1 in

    50,000/2 = 25,000

    …that’s a 1 in 25,000 chance of getting to the next generation. Not great odds.

    Of course one is counting on the fact that the same mutation like “M” in the first part of the target string will keep reappearing, so with enough trials on that position (say on average 25,000 attempts on the assumption we get an “M”, which would really mean 25,000 x 26 random attempts), it will finally take.

    We could of course choose to model things differently, but I was arguing that this is a better conception of a slightly biased stochastic process than Dawkins’s original Weasel.

    It teaches the idea of the influence of drift, whereby even an individual with a couple extra kids relative to its peers in a population is likely to have an advantageous trait drift out of the population — the problem of Gambler’s ruin.

  24. Tom,

    So the “tweak it and see what happens” approach is not the way to settle arguments.

    Sure it is, at least in this case.

    Mung says that given the small effective population size, drift should dominate selection. With the right parameter settings, it’s easy to show that he’s wrong.

    And the evidence is cumulative. The longer you let the program run, the stronger the case becomes, as reflected in the histogram.

  25. stcordova: That said, with a selection coefficient of 1.0 so that the favored individual has 2 offspring instead of 1 — count me skeptical that this one individual has much of a chance of having its offspring overtake the population especially since it would in that case have only a chance of 1 in

    50,000/2 = 25,000

    …that’s a 1 in 25,000 chance of getting to the next generation. Not great odds.

    Wait, are you assuming one mutation per generation?

  26. keiths:
    petrushka,
    I’m looking for something that the set_conio_terminal_mode(), reset_terminal_mode(), kbhit(), and getch() routines in my code can be easily mapped onto.

    I’ll look, but don’t have a lot of hope.

    I did a lot of C programming in the late 80s, and we solved the portability problem by purchasing a DOS library of terminal i/o functions. It did not make the code compatible, but it allowed us to write a wrapper.

    There is, apparently, a compiling environment that will allow using unmodified UNIX code, but it’s another major install.

  27. stcordova,

    It teaches the idea of the influence of drift, whereby even an individual with a couple extra kids relative to its peers in a population is likely to have an advantageous trait drift out of the population — the problem of Gambler’s ruin.

    Who’s the ‘gambler’ in this scenario? I don’t really see the parallel.

    And are you suggesting that novel advantageous traits will always drift out of the population?

  28. petrushka,

    I know that kbhit() and getch() were originally DOS functions available in a non-standard include file called <conio.h>.

    You or dazz might want to try the following:

    1. #include <conio.h>
    2. remove the includes of sys/select.h and termios.h
    3. comment out the set_conio_terminal_mode() and reset_terminal_mode() routines, plus the associated calls
    4. comment out this line at the end of main():

    tcsetattr(0, TCSANOW, &orig_termios);

    5. Comment out this line:

    struct termios orig_termios;

    6. Compile.

    I may download Turbo C and try it myself, later.
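One way the keyboard polling might be wrapped for both platforms is sketched below. This is an untested-on-Windows illustration, not the author's code: `key_waiting` and `read_key` are hypothetical names, the `_WIN32` macro is used in place of the `WINDOWS` symbol mentioned earlier, and the POSIX branch deliberately leaves out the termios raw-mode setup, so keys only arrive after Enter is pressed.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <conio.h>   /* _kbhit() and _getch() */

/* Nonzero if a keystroke is waiting in the console buffer. */
static int key_waiting(void) { return _kbhit() != 0; }

/* Read one keystroke without echo. */
static int read_key(void) { return _getch(); }

#else
#include <sys/select.h>
#include <unistd.h>

/* Zero-timeout select(): reports whether stdin has input ready. */
static int key_waiting(void)
{
    struct timeval tv = { 0L, 0L };
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(STDIN_FILENO, &fds);
    return select(STDIN_FILENO + 1, &fds, NULL, NULL, &tv) > 0;
}

/* Read one byte from stdin, or EOF if nothing could be read. */
static int read_key(void)
{
    unsigned char c;
    return read(STDIN_FILENO, &c, 1) == 1 ? (int)c : EOF;
}
#endif
```

The main loop would then call `key_waiting()` once per generation and `read_key()` only when it returns nonzero, so the simulation never blocks on input.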

  29. Tom,

    I’m sure I haven’t paid close enough attention to whatever misunderstanding it is that you’re trying to set straight.

    It’s pretty much summed up by this quote from Elmer Mung:

    GAs are often used to demonstrate “the power of cumulative selection.” Given small population sizes drift ought to dominate yet in GAs drift does not dominate.

    He’ll get that wascally Weasel! Just you wait!

  30. I modified the code to allow the number of survivors to be changed on the fly, using the ‘v’ command.

    The new code is here.

    Compile under Linux using ‘gcc -std=gnu99 weasel.c -o weasel -lm’.

  31. keiths:
    petrushka,

    I know that kbhit() and getch() were originally DOS functions available in a non-standard include file called <conio.h>.

    You or dazz might want to try the following:

    1. #include <conio.h>
    2. remove the includes of sys/select.h and termios.h
    3. comment out the set_conio_terminal_mode() and reset_terminal_mode() routines, plus the associated calls
    4. comment out this line at the end of main():

    tcsetattr(0, TCSANOW, &orig_termios);

    5. Comment out this line:

    struct termios orig_termios;

    6. Compile.

    I may download Turbo C and try it myself, later.

    I will try it, but I can simply fire up Ubuntu in a VM, no problem.

  32. dazz,

    I will try it, but I can simply fire up Ubuntu in a VM, no problem.

    The advantage of getting it to compile under Windows is that we can distribute an executable to Windows users who don’t have access to, or aren’t comfortable with, Linux.

    Mac users are on their own.

  33. I don’t have access to a Mac, so I can’t verify that it’ll work.

    However, I’d encourage a savvy Mac user to try it and to post an executable if it does.

  34. keiths:
    dazz,

    The advantage of getting it to compile under Windows is that we can distribute an executable to Windows users who don’t have access to, or aren’t comfortable with, Linux.

    Mac users are on their own.

    Makes sense. Just wanted to make sure you didn’t waste time to accommodate me.
    Why not python then? Something with a simple GUI maybe, if you want to change params on the fly?

  35. Why not python then? Something with a simple GUI maybe, if you want to change params on the fly?

    Why reinvent the wheel when I already have a complete C implementation?

  36. keiths: With the right parameter settings, it’s easy to show that he’s wrong.

    Congratulations. 🙂

    But we already knew from your original program that drift did not dominate even with an effective population size of 1. Something about the selection coefficient being near infinite, iirc.

    And you apparently didn’t read the material I referenced in my original OP.

    In other words, in small populations, the stochastic effects of random genetic drift overcome the effects of selection.

    I’m sure they will be happy to post a retraction if you ask nicely.

  37. Perhaps someone can enlighten me as to whether selection coefficient remains constant in biology.

    It seems to me that if a deleterious mutation becomes more prevalent in a small population, the relative advantage of non carriers increases. Wouldn’t selection advantage vary?

  38. keiths: However, I’d encourage a savvy Mac user to try it and to post an executable if it does.

    Compiles OK (with three warnings) and runs w/o apparent problems on my Mac. Might need to download Xcode when prompted by the terminal.

  39. This question is for everyone, not just Mung.

    Mung: But we already knew from your original program that drift did not dominate even with an effective population size of 1. Something about the selection coefficient being near infinite, iirc.

    And you apparently didn’t read the material I referenced in my original OP.

    In other words, in small populations, the stochastic effects of random genetic drift overcome the effects of selection.

    I’m sure they will be happy to post a retraction if you ask nicely.

    Why would you expect analysis of a scientific model to apply to a system that was not modeled?

  40. Sal:

    Just wanted to thank you for your hard work on this. I forgot to say so earlier.

    You’re welcome. I hope you’ll take the results to heart, even though they aren’t what you were hoping for.

  41. petrushka:
    Perhaps someone can enlighten me as to whether selection coefficient remains constant in biology.

    It seems to me that if a deleterious mutation becomes more prevalent in a small population, the relative advantage of non carriers increases. Wouldn’t selection advantage vary?

    There are all sorts of complications in real life. But I think you are saying that even with the relative fitnesses staying the same (so that the ratio of the fitnesses of two genotypes remains the same), the effect of selection would be greater if the frequency of a deleterious allele is greater.

    Is that what you are saying? If not, I am not clear on what you are saying.
