A selection-vs-drift version of Weasel

In his endless pursuit of that wascally Weasel, Mung made the following silly claim:

GAs are often used to demonstrate “the power of cumulative selection.” Given small population sizes drift ought to dominate yet in GAs drift does not dominate.

That is clearly false, but for the benefit of Mung (and his cousin Elmer) I have modified my Weasel program to incorporate both drift and selection.  They can now see for themselves that small population sizes are insufficient to guarantee that drift dominates selection.

The code is here. Compile it under Linux using “gcc -std=gnu99 weasel.c -o weasel -lm”.

Run the program and type ‘h’ to see a list of interactive commands:

c – clear the histogram data
f – change the selection coefficient
h – print this help message
m – change the mutation rate
p – pause until a key is pressed
q – quit the program
s – toggle selection on/off
t – change the target phrase

The program generates and updates a scaled histogram showing the number of generations spent at each possible Hamming distance from the target.
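For readers who want to see the bookkeeping spelled out, here is a minimal Python sketch of the Hamming-distance histogram. It is an illustration only, not the weasel.c source; the names `hamming_distance` and `record_generation` are made up for this example.

```python
TARGET = "METHINKS IT IS LIKE A WEASEL"  # default target phrase shown in the program's output


def hamming_distance(phrase, target=TARGET):
    """Count the positions at which phrase and target differ."""
    if len(phrase) != len(target):
        raise ValueError("phrases must have the same length")
    return sum(a != b for a, b in zip(phrase, target))


def record_generation(histogram, parent, target=TARGET):
    """Add one generation to the bin for the parent's current distance."""
    histogram[hamming_distance(parent, target)] += 1


# One bin per possible distance: 0 (perfect match) through 28 (no matches).
histogram = [0] * (len(TARGET) + 1)
record_generation(histogram, "METHINKS IT IS LIKE A WEASEC")  # one mismatch
record_generation(histogram, TARGET)                          # perfect match
print(histogram[:3])
```

Each generation adds exactly one count, so the bins always sum to the number of generations recorded since the histogram was last cleared.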

222 thoughts on “A selection-vs-drift version of Weasel”

  1. keiths:
    petrushka,
    Give this code a try. To enable your DOS/Windows modifications, you’ll need to add a ‘#define WINDOWS’ directive.

    Worked first try. With the default selection coefficient it seems to get stuck, or at least slow down.

    I need a brief description of what the histogram is showing. I find it confusing that it does not self clear.

  2. Now I have a 64-bit executable. Tomorrow I will make a 32-bit exe.

    Is there a place to upload them?

  3. keiths: To approximate a Wright-Fisher scenario. See this comment by Joe.

    Joe F:

    Since the adult of the next generation is simply randomly chosen from the newborn pool once that pool has had its allele frequencies change owing to natural selection. The probability that a non-matching site changes to be matching is then the above quantity.

    In your program the adult of the next generation is not randomly chosen from the newborn pool. Your program has a single adult chosen not randomly but rather specifically, from which the newborn pool is then generated.

    I ask again, what was the point of changing the population size and leaving the effective population size untouched?

  4. petrushka,

    Worked first try.

    Excellent!

    With the default selection coefficient it seems to get stuck, or at least slow down.

    Yes. With that selection coefficient, the equilibrium Hamming distance is far from a perfect match. In my runs, it’s 12.

    I need a brief description of what the histogram is showing. I find it confusing that it does not self clear.

    The Hamming distance is a measure of how close the parent phrase is to the target phrase at any point in time. A Hamming distance of 0 means there is a perfect match. A Hamming distance of 1 means there is a one-character mismatch, and a Hamming distance of 28 means there are no matches.

    Over time, the parent phrase will “wander” closer to the target, then away, then closer again. At each generation, the program notes the Hamming distance of the new parent and increments the corresponding bin in the histogram.

    The peak in the histogram will occur where there is an equilibrium between (average) drift and (average) selection.

    Now I have a 64-bit executable. Tomorrow I will make a 32-bit exe.

    Is there a place to upload them?

    I haven’t done it myself. Some places won’t allow you to directly upload .exe files, so you might need to change the extension before uploading and then give instructions for renaming after download.

  5. Mung:

    If s=1 indicates complete lethality what does s=5 indicate? 5x deader than dead?

    There’s more than one kind of selection coefficient, Mung. Here’s Joe’s description of the relevant kind, from his online PopGen book:

    SELECTION COEFFICIENTS. Rather than representing relative fitnesses as wA : wa, it is frequently convenient to write them as 1+s : 1. The quantity s will be zero when there is no natural selection, and is referred to as the selection coefficient favoring A. It can take any value from -1 to ∞.
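To make the 1+s : 1 convention concrete, here is a small Python sketch. It assumes that each matching character multiplies fitness by (1 + s), so fitness = (1 + s)^matches. That multiplicative form is an assumption rather than something read from weasel.c, but it agrees with the fitness values printed in the runs posted in this thread (for example, s = 0.50 at Hamming distance 2 gives about 3.79e+04). The function name `fitness` is hypothetical.

```python
PHRASE_LENGTH = 28  # length of "METHINKS IT IS LIKE A WEASEL"


def fitness(matches, s):
    """Relative fitness if every matching site multiplies fitness by (1 + s).

    Assumed model, not taken from the weasel.c source.
    """
    if s < -1:
        raise ValueError("a selection coefficient cannot be below -1")
    return (1.0 + s) ** matches


# (selection coefficient, Hamming distance) pairs taken from runs in this thread
for s, distance in [(0.5, 2), (5.0, 12), (5.0, 1), (2.5, 5), (2.5, 0)]:
    value = fitness(PHRASE_LENGTH - distance, s)
    print(f"s={s:.2f} distance={distance:2d} fitness={value:.2e}")
```

Under this reading, s = 5 simply means that a genotype with one more matching character is six times as fit as one without it.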

  6. Mung, describe in your own words what pick_survivor does.

    Keiths, for anyone running on Windows in a DOS box you might need to set the box to a smaller font size. It’s best if you open cmd first, rather than just double clicking on the exe. The box does not automatically stretch to display all the lines.

  7. Mung,

    In your program the adult of the next generation is not randomly chosen from the newborn pool.

    Yes, it is.

    You don’t understand the code, do you?

  8. keiths: The Hamming distance is a measure of how close the parent phrase is to the target phrase at any point in time.

    How does the program know the target? How do you calculate the Hamming distance to an unknown target?

  9. I’ll let Joe F analyze the math. The program behaves well and does what seems reasonable.

  10. Keiths, for anyone running on Windows in a DOS box you might need to set the box to a smaller font size. It’s best if you open cmd first, rather than just double clicking on the exe. The box does not automatically stretch to display all the lines.

    Okay. Mods, could one of you give me edit permissions on the OP so that I can add tips like this? I also want to update the code link to point to the latest version.

  11. Mung:

    How does the program know the target? How do you calculate the Hamming distance to an unknown target?

    petrushka:

    Mung, the code is right there. Take a look.

    Do some work of your own, Mung. We’re not here to spoon-feed you.

  12. keiths: Do some work of your own, Mung. We’re not here to spoon-feed you.

    You are claiming that your revised weasel code demonstrates that my claim was false. It doesn’t.

  13. Mung,

    I’ve already tested my code, and I continue to test it as we go along.

    But here’s your chance to shine. Write your own tests and find a fatal bug that, when fixed, shows that you were right all along and that

    Given an effective population size of one, drift ought to dominate, but it doesn’t.

    Good luck.

    You do understand the program, don’t you?

  14. petrushka: Mung, the code is right there. Take a look.

    I have the before and after versions of the code. I am looking at the code. Both versions of the code. If you have some other version of the code that contradicts anything I have posted about the code, do say so.

  15. keiths: There’s more than one kind of selection coefficient, Mung.

    So? Which kind of selection coefficient does your program use and why did you choose that particular kind of selection coefficient?

  16. Mung, the code itself contradicts your claim regarding pick_survivors.

    If you disagree, simply write out your analysis of how it works.

  17. petrushka,

    Mung, the code itself contradicts your claim regarding pick_survivors.

    If you disagree, simply write out your analysis of how it works.

    Yes, please. I’ll get some popcorn.

  18. petrushka: Mung, the code itself contradicts your claim regarding pick_survivors.

    If this is in fact the case, you can write a test to demonstrate that your claim is true.

  19. I too enjoy popcorn. Eating popcorn isn’t the best way to demonstrate that popcorn doesn’t exist.

  20. keiths: This is an excellent Mung meltdown. Where’s Rich?

    keiths has no evidence to support his claims and calls out to Richardthughes for validation. Excellent!

  21. Mung: If this is in fact the case, you can write a test to demonstrate that your claim is true.

    So you don’t need to do anything to demonstrate your claim, but claims about your claims have to be demonstrated?

    Can’t say I’m totally surprised.

  22. Mung: keiths has no evidence to support his claims and calls out to Richardthughes for validation. Excellent!

    Apart from, as already noted, the code itself.

    As you are unwilling to explain the code, as you see it, the joke is on you.

  23. Mung: If you have some other version of the code that contradicts anything I have posted about the code, do say so.

    Heh. Misdirect the people questioning your understanding to an irrelevant point. Check!

  24. Generation: 25280 Number of survivors per generation: 20

    Selection: ON Coefficient: 0.50 Mutation rate: 0.01

    Organism 0 Fitness 3.79e+004 Hamming distance 2

    METHI KS IT IS LIKJ A WEASEL

    Histogram of Hamming distances:

    0: 1372205943417473106 X
    1: 864712057832082567
    2: 366790486251605232
    3: 201921448
    4: 4672670202244390967
    5: 36
    6: 13
    7: 0

  25. keiths:
    This is an excellent Mung meltdown. Where’s Rich?

    Fighting a couple of fires at work.

    First up, excellent code – job well done.

    I’ve been trying to be less confrontational with Mung; I keep hoping that when we do this sort of thing he’ll engage and embark on a journey of meaningful discovery. Instead we’ve had a Gallienesque “test your code!” again. Work through it, Mung. If it works, that’s okay!

  26. Mung: keiths has no evidence to support his claims and calls out to Richardthughes for validation. Excellent!

    You’re becoming Joe G. Stop it, if only for your own sake! Also, you can call me Rich if you’d like.

  27. petrushka,

    After I complete the CAPTCHA, I get the following message (in a very small font) with no actual download:

    File Details:
    Filename: keiths.exe
    Size: 0 KB, Type: exe

    However, I can download the source code.

  28. Weasel version 02232016

    Press ‘h’ at any time for help

    Target Phrase: METHINKS IT IS LIKE A WEASEL

    Generation: 17690 Number of survivors per generation: 1

    Selection: ON Coefficient: 5.00 Mutation rate: 0.01

    Organism 0 Fitness 2.82e+012 Hamming distance 12

    YQTHIVXH QQ W LIMW AEWEASEL

    Histogram of Hamming distances:

    0: 0
    1: 0
    2: 0
    3: 0
    4: 0
    5: 4
    6: 193 XXXX
    7: 664 XXXXXXXXXXXXXXX
    8: 1260 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    9: 1445 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    10: 1962 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    11: 2400 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    12: 2580 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    13: 2464 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    14: 1810 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    15: 1109 XXXXXXXXXXXXXXXXXXXXXXXXX
    16: 769 XXXXXXXXXXXXXXXXX
    17: 419 XXXXXXXXX
    18: 167 XXX
    19: 154 XXX
    20: 37
    21: 28
    22: 77 X
    23: 12
    24: 9
    25: 31
    26: 8
    27: 49 X
    28: 39

    Weasel version 02232016

    Press ‘h’ at any time for help

    Target Phrase: METHINKS IT IS LIKE A WEASEL

    Generation: 17690 Number of survivors per generation: 2

    Selection: ON Coefficient: 5.00 Mutation rate: 0.01

    Organism 0 Fitness 1.02e+021 Hamming distance 1

    METHINKS IT IS LIKE A WEASEC

    Histogram of Hamming distances:

    0: 3131 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    1: 5048 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    2: 5487 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    3: 2573 XXXXXXXXXXXXXXXXXXXXXXXXXXXX
    4: 613 XXXXXX
    5: 373 XXXX
    6: 143 X
    7: 9
    8: 25
    9: 5
    10: 12
    11: 47
    12: 30
    13: 28
    14: 6
    15: 53
    16: 12
    17: 36
    18: 3
    19: 0
    20: 5
    21: 1
    22: 1
    23: 4
    24: 11
    25: 13
    26: 9
    27: 12
    28: 0

    Weasel version 02232016

    Press ‘h’ at any time for help

    Target Phrase: METHINKS IT IS LIKE A WEASEL

    Generation: 17690 Number of survivors per generation: 2

    Selection: ON Coefficient: 2.50 Mutation rate: 0.01

    Organism 0 Fitness 3.26e+012 Hamming distance 5

    METHISKS IEJIS LIKEGA DEASEL

    Histogram of Hamming distances:

    0: 0
    1: 0
    2: 165 XX
    3: 1373 XXXXXXXXXXXXXXXXXXXXXXXX
    4: 1948 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    5: 3198 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    6: 3408 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    7: 2939 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    8: 1805 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    9: 1180 XXXXXXXXXXXXXXXXXXXX
    10: 846 XXXXXXXXXXXXXX
    11: 389 XXXXXX
    12: 94 X
    13: 25
    14: 7
    15: 28
    16: 9
    17: 22
    18: 13
    19: 68 X
    20: 70 X
    21: 16
    22: 25
    23: 3
    24: 6
    25: 32
    26: 21
    27: 0
    28: 0

    Weasel version 02232016

    Press ‘h’ at any time for help

    Target Phrase: METHINKS IT IS LIKE A WEASEL

    Generation: 17690 Number of survivors per generation: 4

    Selection: ON Coefficient: 2.50 Mutation rate: 0.01

    Organism 0 Fitness 1.71e+015 Hamming distance 0

    METHINKS IT IS LIKE A WEASEL

    Histogram of Hamming distances:

    0: 13026 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    1: 3779 XXXXXXXXXXXXXXXXX
    2: 406 X
    3: 136
    4: 40
    5: 15
    6: 20
    7: 10
    8: 35
    9: 7
    10: 14
    11: 5
    12: 5
    13: 9
    14: 27
    15: 40
    16: 23
    17: 7
    18: 3
    19: 4
    20: 5
    21: 10
    22: 13
    23: 9
    24: 1
    25: 2
    26: 16
    27: 4
    28: 19

  29. petrushka,

    OK, so it appears that the effectiveness of selection in the program now depends on selection coefficient and effective population size. Was that your point?

  30. John Harshman:
    petrushka,
    OK, so it appears that the effectiveness of selection in the program now depends on selection coefficient and effective population size. Was that your point?

    Math is not my strong point. I’m depending on you guys to interpret this.

    The obvious answer to your question is that the program does what I expected it to do when I changed the parameters. Whether it is a useful program is above my pay grade.

  31. petrushka,

    On second thought I’m not so sure that I understand the printout. What is the histogram a histogram of? That is, what are the Hamming distances of? What is organism 0? If that’s the starting point, then none of the runs seem to be showing selection, just random diffusion at a greater or lesser rate, with the mode still close to the start.

  32. John Harshman: On second thought I’m not so sure that I understand the printout.

    I asked keiths about that earlier, and he responded.

    Here’s my understanding.

    The numbers to the left of the colon are distance from the target.

    The numbers to the left of the Xs are the number of times the survivor/parent has been that far from the target. The histogram represents the relative frequency for any given distance from the target.

    If you let it run a long time it resembles a bell curve.

    With a low coefficient, it is supposed to wander. That’s the point. Drift prevents perfect and permanent achievement of the target. By changing the parameters, you can make it behave like Dawkins’ Weasel.
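The printout format described above can be reproduced with a short Python sketch. It assumes the tallest bin is scaled to 60 X marks and that shorter bars are rounded down. Both details are inferred from the printouts posted in this thread, not read from weasel.c, and `render_histogram` is a hypothetical name.

```python
def render_histogram(counts, width=60):
    """Return one text line per bin, scaling the tallest bin to `width` X marks."""
    peak = max(counts) if counts else 0
    lines = []
    for distance, count in enumerate(counts):
        # Integer division rounds down, so small bins may show no X at all.
        bar = "X" * (count * width // peak) if peak > 0 else ""
        lines.append(f"{distance}: {count} {bar}".rstrip())
    return lines


# Bins 0 through 12 from the one-survivor run posted above
counts = [0, 0, 0, 0, 0, 4, 193, 664, 1260, 1445, 1962, 2400, 2580]
print("\n".join(render_histogram(counts)))
```

Because the bars are rescaled against the current peak every time, the shape stays readable however long the program runs, while the raw counts to the left of the Xs keep growing.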

  33. John:

    On second thought I’m not so sure that I understand the printout. What is the histogram a histogram of? That is, what are the Hamming distances of? What is organism 0? If that’s the starting point, then none of the runs seem to be showing selection, just random diffusion at a greater or lesser rate, with the mode still close to the start.

    With the default parameter settings, ‘organism 0’ refers to whatever organism is chosen to be the parent of the next generation. If the number of survivors is changed to something other than one using the ‘v’ command, then organism 0 is just one of the survivors. The other survivors are not displayed, though that can be changed by modifying the GENOMES_TO_DISPLAY parameter and recompiling.

    As petrushka explained, the histogram shows the amount of time (i.e. the cumulative number of generations) spent at a particular distance from the target. It is updated and scaled continuously as the program runs.

    The genotypes are initialized randomly when the program starts, so the initial Hamming distance is high — typically 27. That means the action will initially occur toward the bottom of the histogram. With the default settings, selection will reduce the Hamming distance, pushing the activity upward until it settles into a dynamic equilibrium around 12. The histogram then “fills in” with the peak at the equilibrium distance.

    You can then fiddle with the selection coefficient, the mutation rate, and the number of survivors (i.e. the effective population size) to see how the histogram responds.

    To no one’s surprise except Mung’s, selection overpowers drift at some settings but not at others. That is, you are unlikely to achieve a perfect match (a distance of 0) with some settings, but virtually certain to achieve it with others.
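The dynamics described above can be explored with a toy Python model. This is a rough sketch under stated assumptions, not a port of weasel.c: it keeps a single survivor, gives that parent a finite brood (`offspring` is a made-up parameter), lets each character mutate to a random letter or space with probability u, and picks the next parent at random with weight (1 + s)^matches. All function names are hypothetical.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "
TARGET = "METHINKS IT IS LIKE A WEASEL"


def mutate(genome, u, rng):
    """Replace each character with a random one with probability u (assumed model)."""
    return "".join(rng.choice(ALPHABET) if rng.random() < u else c for c in genome)


def matches(genome):
    """Number of positions that agree with the target."""
    return sum(a == b for a, b in zip(genome, TARGET))


def run(generations, s, u=0.01, offspring=100, seed=0):
    """Return a histogram of the parent's Hamming distance, one count per generation."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)  # random starting genotype
    histogram = [0] * (len(TARGET) + 1)
    for _ in range(generations):
        brood = [mutate(parent, u, rng) for _ in range(offspring)]
        weights = [(1.0 + s) ** matches(g) for g in brood]
        # Fitness-weighted random choice: with s = 0 this is pure drift.
        parent = rng.choices(brood, weights=weights, k=1)[0]
        histogram[len(TARGET) - matches(parent)] += 1
    return histogram


def mean_distance(histogram):
    """Average Hamming distance over all recorded generations."""
    return sum(d * c for d, c in enumerate(histogram)) / sum(histogram)


print("s = 5.0:", round(mean_distance(run(200, s=5.0, offspring=50)), 2))
print("s = 0.0:", round(mean_distance(run(200, s=0.0, offspring=50)), 2))
```

Because the next parent is always drawn at random, the model never latches onto the target. Changing s, u, or the brood size only shifts where the parent spends most of its time.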

  34. petrushka: I asked keiths about that earlier, and he responded.

    Here’s my understanding.

    The numbers to the left of the colon are distance from the target.

    The numbers to the left of the Xs are the number of times the survivor/parent has been that far from the target. The histogram represents the relative frequency for any given distance from the target.

    If you let it run a long time it resembles a bell curve.

    With a low coefficient, it is supposed to wander. That’s the point. Drift prevents perfect and permanent achievement of the target. By changing the parameters, you can make it behave like Dawkins’ Weasel.

    What in each case is the starting position? What is the single organism 0 displayed for each run? It appears to me that organism 0 is your starting point, and what that would mean is that in each case there is a symmetrical binomial distribution around the starting point. If there were directional selection operating we should see a skew in the distribution in the direction of 0, which appears not to be the case. What we see here is selection entirely overcome by drift.

    If I understand this at all.

  35. A quick batch of tests indicates to me that at most settings, there is no meltdown. The genome reaches an equilibrium level of fitness which can be moved up or down, but it never latches and never fixes at either end of the fitness spectrum.

    It’s always wobbling.

  36. This little exercise has convinced me of something that has been percolating for a while.

    It is, perhaps, useful to cease thinking of a target, and think instead of fitness in the abstract. The weasel phrase provides a numeric measure of fitness, but does not need to be thought of as a goal. It is okay to reach an equilibrium level of fitness and hover about it.

    What doesn’t seem to happen is genetic meltdown. I’m sure there are parameters that will result in meltdown, but there seems to be a rather wide window of stability.

  37. petrushka,

    A quick batch of tests indicates to me that at most settings, there is no meltdown. The genome reaches an equilibrium level of fitness which can be moved up or down, but it never latches and never fixes at either end of the fitness spectrum.

    It’s always wobbling.

    Yes, and that’s what’s expected in theory.

    Joe derived an equation for the equilibrium point in terms of u (the mutation rate) and s (the selection coefficient), assuming an effective population size of one and an infinite number of offspring. I plan to see how well the program matches the equation over a range of u and s values.

  38. Keiths, I think it would be fairly easy to calculate a mean and sd for each pair of coefficients and survivor number.

    You could make an outer loop that runs through the permutations and prints out just the parameters and mean fitness. That could be imported into some graphing software.

    Just to make Mung happy.
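The summary suggested here could look something like the following Python sketch. It computes the mean and standard deviation of the Hamming distance (rather than of fitness) directly from a histogram and emits one CSV line per parameter combination. The function names and CSV column names are invented for illustration. The sample histogram is the four-survivor run with coefficient 2.50 posted earlier in this thread.

```python
import math


def histogram_stats(counts):
    """Mean and population standard deviation of the distance, weighted by bin counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    mean = sum(d * c for d, c in enumerate(counts)) / total
    variance = sum(c * (d - mean) ** 2 for d, c in enumerate(counts)) / total
    return mean, math.sqrt(variance)


def summarize(results):
    """Yield CSV lines from (s, u, survivors, counts) tuples, header first."""
    yield "s,u,survivors,mean_distance,sd_distance"
    for s, u, survivors, counts in results:
        mean, sd = histogram_stats(counts)
        yield f"{s},{u},{survivors},{mean:.3f},{sd:.3f}"


# Histogram from the run with 4 survivors, coefficient 2.50, mutation rate 0.01
posted_run = [13026, 3779, 406, 136, 40, 15, 20, 10, 35, 7, 14, 5, 5, 9, 27,
              40, 23, 7, 3, 4, 5, 10, 13, 9, 1, 2, 16, 4, 19]
for line in summarize([(2.5, 0.01, 4, posted_run)]):
    print(line)
```

An outer loop over the parameter combinations would collect one tuple per run, and the resulting lines can be pasted into a spreadsheet or graphing tool.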
