A selection-vs-drift version of Weasel

In his endless pursuit of that wascally Weasel, Mung made the following silly claim:

GAs are often used to demonstrate “the power of cumulative selection.” Given small population sizes drift ought to dominate yet in GAs drift does not dominate.

That is clearly false, but for the benefit of Mung (and his cousin Elmer) I have modified my Weasel program to incorporate both drift and selection.  They can now see for themselves that small population sizes are insufficient to guarantee that drift dominates selection.

The code is here. Compile it under Linux using “gcc -std=gnu99 -lm weasel.c -o weasel”.

Run the program and type ‘h’ to see a list of interactive commands:

c – clear the histogram data
f – change the selection coefficient
h – print this help message
m – change the mutation rate
p – pause until a key is pressed
q – quit the program
s – toggle selection on/off
t – change the target phrase

The program generates and updates a scaled histogram showing the number of generations spent at each possible Hamming distance from the target.

222 thoughts on “A selection-vs-drift version of Weasel

  1. petrushka,

    You could make an outer loop that runs through the permutations and prints out just the parameters and mean fitness.

    Yes, that’s what I plan to do.

    Just to make mung happy.

    I’m not worried about that. We’ve proven him wrong. Any continued denial on his part is a psychological issue, not an empirical one.

  2. keiths,

    Still don’t understand. What is the next generation for the organism 0 that’s printed? The one that happened to be the next one when the histogram is printed?

    It would be clearer if the starting sequence were printed somewhere.

  3. John:

    Still don’t understand. What is the next generation for the organism 0 that’s printed? The one that happened to be the next one when the histogram is printed?

    It would be clearer if the starting sequence were printed somewhere.

    This will all make more sense if you actually run the program. Petrushka posted a Windows executable here, assuming you have access to a Windows machine.

    The entire display gets updated as the program runs. That includes the Organism 0 genome.

    It will start out looking something like this…

    Organism 0 Fitness 1.30e+03 Hamming distance 24

    JJYEISJSHHONVBXKNTVZBJWTSOE

    …and then improve through selection to something like this…

    Organism 0 Fitness 6.09e+14 Hamming distance 9

    MEGOIGKS ITNQS LILE D WRZSEL

    …and, if the parameters are set appropriately, to something like this…

    Organism 0 Fitness 1.85e+40 Hamming distance 1

    METHINKS IT IS LIUE A WEASEL

    or even a perfect match:

    Organism 0 Fitness 5.73e+41 Hamming distance 0

    METHINKS IT IS LIKE A WEASEL

  4. keiths,

    Might I suggest that as an aid to comprehension that the histogram include a symbol labeling the starting sequence’s hamming distance?

    And someone might try to discover what combinations of parameters will end up with a stationary mode at the position of the starting sequence.

  5. I think you can model the behavior of this program (for most parameter settings) reasonably accurately as a Markov Chain: for each Hamming distance, you can calculate the probability that the next generation will be one better p(-1), the same p(0), or one worse p(+1).
    Masochists and pedants can include p(+2), p(-2), etc, etc…
    Then you can use Monte-Carlo to find the equilibrium distribution.
    After the initial burn-in has occurred, the distribution is not sensitive to the starting position, thanks to the simplicity of the fitness surface.
    I think this is what DiEb did for the Weasels. Joe F does this kind of analysis for a living
    ETA: I suspect that, for multiple-survivor situations, running keiths’s model will be easier that figuring out the transition probabilities…

  6. The program is kind of a visual aid for the math impaired. It’s useful to watch it happening. It seems to quiet down the genetic meltdown crowd.

  7. Hardly a major point, but the population takes a short while to settle into equilibrium given its start variance. When there is only one survivor, this takes 1 generation, so it’s even more trivial, but when you bump up the number of survivors, you’d need a short period of drift from a fully random start population (where all are about the same Hamming distance from each other, let alone the target) to get a population that is generating and fixing mutants at a rate consistent with its mutation rate, before releasing the hare … I mean weasel.

    This affects virtually nothing; just an observation.

  8. Allan Miller,

    Alan, for the relevant equation see chapter VII of my online text Theoretical Evolutionary Genetics, section VII.2 on “Drift versis mutation”, the subsection on “Finite numbers of alleles”. The probability called F there is the probability for one position in the phrase that two randomly chosen individuals among the survivors have the same letter, The recurrence equation, the first one given in that subsection, simply needs us to set K=27. (The theory there is for diploids, so the quantity called 2N is the number of survivors).

    It is a simple linear recurrence equation in F, and we can work out from that what is the rate of approach to the equilibrium value of F.

    By the way, I am working on a largish post on the mathematics of the N=1 case and should have it up soon.

  9. Joe,

    By the way, I am working on a largish post on the mathematics of the N=1 case and should have it up soon.

    I look forward to it!

  10. Result: METHINKS IT IS LIKE A WEASEL

    Which of those letters became fixed as a result of genetic drift?

  11. keiths: We’ve proven him wrong. Any continued denial on his part is a psychological issue, not an empirical one.

    You’re not fooling anyone, keiths.

  12. Mung,

    Just like in real biological populations.

    I must be psychic. I suggested above a small possible modification, to permit a population to equilibrate first. I didn’t say, but at the back of my mind was the Creationist carping: “but you have created an unrealistically wide set of start genomes”.

    Give it a whirl.

    Then try setting the start population up as all X’s – all 28 Hamming units away; as far away as you can get, and with no initial variation. Yes, I know, real genomes don’t consist of letters, and are more than 28 bits long.

  13. Even you Allan, in spite of your “skepticism”, must admit that generating genomes at random starts each genome in a randomly selected area of the search space.

    Does that really matter? Does that really change the ability of the program to find the target phrase? Would the result be any different if you started them all out the same or with only a single letter difference between them?

  14. Living things are, by definition, already in an equilibrium state.

    Are you going to OOL us?

    ETA:

    It would probably be useful to think of weasel modelling a duplicated gene — not necessary — but available to acquire a new function.

  15. Mung,

    Even you Allan, in spite of your “skepticism”, must admit that generating genomes at random starts each genome in a randomly selected area of the search space.

    F’God’s sake. I addressed this point, both before you posted and after it.

    Does that really matter? Does that really change the ability of the program to find the target phrase? Would the result be any different if you started them all out the same or with only a single letter difference between them?

    And this.

    Try all X’s. Try anything you like. Why are all you coders so coding-shy?

  16. petrushka,

    Living things are, by definition, already in an equilibrium state.

    True, but I was thinking of a population-genetic equilibrium – allowing a few generations of drift.

  17. I think the question of fixation in weasel might be interesting, but there is a problem.

    There are no neutral alleles in weasel, no synonymous or nearly synonymous substitutions.

    It might be fun to allow upper and lowercase letters.

  18. petrushka, in my equations for the Wright-Fisher model that comes close to being Weasel, there is a selection coefficient s, and one could make that zero.

    Also, at any position where the letter does not match the target, any mutation to another nonmatching letter will behave as if neutral, even though in reality both are deleterious.

  19. I think the word “fix” might be misleading. It implies something permanent or unchangeable.

    But what could it mean in a population that always bottlenecks to one parent?

    I suspect Mung wants to draw attention away from what the program does model, and toward anything it fails to model.

  20. petrushka: Living things are, by definition, already in an equilibrium state.

    If by this you mean that if you calculate the Hamming distance between two members of a population it’s going to be pretty much the same regardless of which two members of the population are chosedn, maybe keiths can answer that.

  21. petrushka: I suspect Mung wants to draw attention away from what the program does model, and toward anything it fails to model.

    You mean like genetic drift?

  22. Here ya go mung.

    20 parents, 10,000 generations, zero selection

    14: 0
    15: 0
    16: 0
    17: 0
    18: 0
    19: 0
    20: 0
    21: 3
    22: 4
    23: 54 X
    24: 113 XXX
    25: 274 XXXXXXXX
    26: 1144 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    27: 1731 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    28: 1960 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

  23. Here’s the population. I see at least one position fixed

    QTSYAGUWJGYROICWTYYMDMTTJDKT
    QTKYAGUPKGYQOOCWTYYMDTTTJWKT
    QTSYAGUWJGLROICWTYYMDMTTJDKT
    PTKYDKUWGGGAOICWTYYIXMTTJTKU
    QTKYVGUWGGYHOICWTNYMXMTKJTOU
    VEKYAPUWGOYUQBCSIXEMWMTTJTKU
    QTKYAGUPKGYQOOCWTYYMDTTTJDKT
    CTSYAGUWJGYROICWTYYMDMTTJDKT
    VEKYAPUWGOYUQBCSIXEMWSTEJTKU
    QTKYAGUWGGKHOICWTNYMXMTKJTKU
    PTKYDKUWGGGAOICWTYYIXMTTVTKU
    QTKYAGUWGGKHOICWTJYMXMTKJTKU
    QTKYAGUWGGKHOICWTJYMXMTKJTKU
    QTKYAGUWGGYQOICWTYYMDMTTBTKT
    QTKYAGUWGGKHOICWTNYMCMTKJTKU
    VEKYAPUWGOYUQBCSIXEMWSTTJTKU
    QTKYAGUWGGKHOICWTNYMCMTKJTKU
    QTKYVGUWGGYHOICWTNYMXMTKJTOU
    QTKYAGUPKGYQOOCWTYYMDTTTJDKT
    PTKYDKUWGGGAOICWTYYIXMTTJTKU

  24. Mung:

    Even you Allan, in spite of your “skepticism”, must admit that generating genomes at random starts each genome in a randomly selected area of the search space.

    Does that really matter? Does that really change the ability of the program to find the target phrase? Would the result be any different if you started them all out the same or with only a single letter difference between them?

    Mung,

    If you understood how the program works, you would already know the answer to your questions.

    And as Allan points out, you could easily modify the program to initialize the population however you desire. You claim to be a coder. Do some coding for a change!

    Third, you can create the desired scenario without even modifying the code, by a judicious application of the commands available to you. See if you can figure out how to do it.

    Less carping and more thinking, please.

  25. petrushka:

    I suspect Mung wants to draw attention away from what the program does model, and toward anything it fails to model.

    Mung:

    You mean like genetic drift?

    petrushka:

    It can model drift just fine if you set selection coefficient to 0.

    It models drift just fine even with nonzero selection coefficients. The equilibrium Hamming distances are where the effects of drift and selection net out to zero on average.

    Poor Mung is going in circles with Weasel. He thinks he’s chasing it, but in reality it is chasing him.

  26. DNA_Jock: I think you can model the behavior of this program (for most parameter settings)reasonably accurately as a Markov Chain: for each Hamming distance, you can calculate the probability that the next generation will be one better p(-1), the same p(0), or one worse p(+1).
    Masochists and pedants can include p(+2), p(-2), etc, etc…
    Then you can use Monte-Carlo to find the equilibrium distribution.
    After the initial burn-in has occurred, the distribution is not sensitive to the starting position, thanks to the simplicity of the fitness surface.
    I think this is what DiEb did for the Weasels. Joe F does this kind of analysis for a living

    DiEb didn’t simulate. He solved for the expected value of the first hitting time. The hard thing is to acquire a genuine understanding of what that means, not to use an environment like R to do the calculation.

    The random number of characters going from correct (incorrect) in the parent to incorrect (correct) in the offspring follows a binomial distribution. That’s the stuff of a first course in probability, so I won’t accept that it’s pedantic.

    DNA_Jock: I suspect that, for multiple-survivor situations, running keiths’s model will be easier that figuring out the transition probabilities…

    Now I’ll come off as pedantic. Keith has not presented a model. He has distributed a program, and made much of the fact that Mung (who’s laughed for years at people who take seriously someone who calls himself “Mung”) cannot infer the intended model from the source code of the program.

    I figured out the model that Keith intended to implement by reading a program that failed to implement it. Keith evidently did not learn the lesson that he should have, because he keeps telling us to run his program to see what he really means. Mung and his fellow UDders are enjoying the spectacle, at After the Milkbar Closes. He’s come through with appropriate software-engineering responses, which I doubt he would have generated on his own.

  27. keiths: The equilibrium Hamming distances are where the effects of drift and selection net out to zero on average.

    Shouldn’t this also depend on the mutation rate?

  28. Tom,

    I figured out the model that Keith intended to implement by reading a program that failed to implement it.

    Then by all means point out the flaws so I can fix them!

    Mung and his fellow UDders are enjoying the spectacle, at After the Milkbar Closes.

    Huh?

  29. keiths:
    John,

    The mutation rate already factors into the magnitude of both drift and selection.See Joe’s new OP.

    Are you saying that the mutation rate cancels out of the calculation of the equilibrium state?

  30. John,

    Are you saying that the mutation rate cancels out of the calculation of the equilibrium state?

    No, I’m saying that at equilibrium, the total rate at which letters are changing from matching to mismatching equals the total rate at which they are changing from mismatching to matching. Both of those rates already depend on the mutation rate.

  31. petrushka: There are no neutral alleles in weasel, no synonymous or nearly synonymous substitutions.

    Joe Felsenstein: petrushka, in my equations for the Wright-Fisher model that comes close to being Weasel, there is a selection coefficient s, and one could make that zero.

    I didn’t follow the previous discussion closely. Would you do me the favor of providing (or pointing me to) a terse summary of the model?

    It’s bothered me that Keith uses the selection coefficient to scale the overall fitness, instead of the fitness contributions of individual alleles. The latter is what I see in the tutorial that Mung brought to the table.

    Joe Felsenstein: Also, at any position where the letter does not match the target, any mutation to another nonmatching letter will behave as if neutral, even though in reality both are deleterious.

    Emphasis added to what I don’t understand. The fitness function as a special case of a pseudo-boolean function. That is, for real-valued weights w_1, w_2, \dotsc, w_n, and target sentence t_1, t_2, \dotsc, t_n,

        \[f_\mathbf{w}(x) = \sum_{i=1}^n w_i \cdot I_{t_i}(x_i),\]

    where

        \[I_{t}(a) =    \begin{cases}        1 & \text{if $a \in \{t\}$} \\       0 & \text{otherwise.}    \end{cases}\]

    is the indicator function for set \{t\}. I use the indicator function because there’s an easy generalization to sets T_i at the n loci. So an allele x_i \in T_i is beneficial if w_i is positive, and deleterious if w_i is negative. An allele x_i \notin T_i is neutral. Weasel is a special case in which all sets are of size 1, and all weights are equal to 1, i.e., alleles are either beneficial or neutral, never deleterious.

    Of course, I’d never argue with you about theoretical biology. I’m drawing on the theory of evolutionary computation (with no claim of expertise). For various evolutionary algorithms, with mutation rate 1/n, the expected number of function evaluations to maximize a pseudo-boolean function is on the order of n \ln n. This is an important result, for folks like me. It happens also to provide another way of shooting down the notion that the fitness function must be carefully designed for the algorithm to hit the target. All that matters is the signs of the (nonzero) weights, not their magnitudes.

  32. If you’re talking about the “UDders” pun, I’m sure everyone got it.

    The question is why Tom would think you had something to laugh about. Your attacks on Weasel have all failed.

  33. keiths: The question is why Tom would think you had something to laugh about. Your attacks on Weasel have all failed.

    Tom is being nice to you. You should be nice to Tom. Taking him seriously would help. You simply cannot be right all the time without fail.

  34. Mung,

    I do take Tom seriously. Taking someone seriously doesn’t mean passively agreeing with everything he says.

    You simply cannot be right all the time without fail.

    Of course not. So?

  35. petrushka,

    I installed lcc-win, but my makes are failing due to undefined symbols (screenshot below). The tool doesn’t seem to be able to figure out which libraries it needs, and the “Add libs” function doesn’t work. Specifying the full path to crtdll.lib didn’t help.

    Are you using the IDE? If so, could you look under Project->Configuration->Linker and tell me which libraries are being used in your setup?

    Or if you’re using the command line, could you give me the specific commands you’re using?

    Thanks.

  36. I tried that too, but it gave me the same results as for the IDE.

    Uninstalling and reinstalling didn’t help.

  37. look for crtdll.exp in the buildlib folder. Open it with notepad or a text editor and have a look.

    crtdll.lib should be in the lib folder

  38. From “LCC-Win32 User’s Manual”, pages 55-58, by Jacob Navia.

    The Linker

    The general format of a link command line is as follows:

    lcclnk [options] object-files library-files

    The linker will default to the following import libraries:

    LIBC.LIB A few C functions are in here
    KERNEL32.LIB Kernel function calls
    COMDLG32.LIB Common dialog boxes library
    USER32.LIB Windows ‘user’ functions
    GDI32.LIB GDI functions
    ADVAPI32.LIB More Windows functions
    COMCTL32.LIB Library for the common controls of Win32
    CRTDLL.LIB Library for the ‘C’ runtime functions

    These libraries do not need to be specified in the command line because the
    linker will search them anyway.

Leave a Reply