In his endless pursuit of that wascally Weasel, Mung made the following silly claim:
GAs are often used to demonstrate “the power of cumulative selection.” Given small population sizes drift ought to dominate yet in GAs drift does not dominate.
That is clearly false, but for the benefit of Mung (and his cousin Elmer) I have modified my Weasel program to incorporate both drift and selection. They can now see for themselves that small population sizes are insufficient to guarantee that drift dominates selection.
The code is here. Compile it under Linux using “gcc -std=gnu99 -lm weasel.c -o weasel”.
Run the program and type ‘h’ to see a list of interactive commands:
c – clear the histogram data
f – change the selection coefficient
h – print this help message
m – change the mutation rate
p – pause until a key is pressed
q – quit the program
s – toggle selection on/off
t – change the target phrase
The program generates and updates a scaled histogram showing the number of generations spent at each possible Hamming distance from the target.
petrushka,
Yes, that’s what I plan to do.
I’m not worried about that. We’ve proven him wrong. Any continued denial on his part is a psychological issue, not an empirical one.
keiths,
Still don’t understand. What is the next generation for the organism 0 that’s printed? The one that happened to be the next one when the histogram is printed?
It would be clearer if the starting sequence were printed somewhere.
John:
This will all make more sense if you actually run the program. Petrushka posted a Windows executable here, assuming you have access to a Windows machine.
The entire display gets updated as the program runs. That includes the Organism 0 genome.
It will start out looking something like this…
…and then improve through selection to something like this…
…and, if the parameters are set appropriately, to something like this…
or even a perfect match:
keiths,
Might I suggest that as an aid to comprehension that the histogram include a symbol labeling the starting sequence’s hamming distance?
And someone might try to discover what combinations of parameters will end up with a stationary mode at the position of the starting sequence.
I think you can model the behavior of this program (for most parameter settings) reasonably accurately as a Markov Chain: for each Hamming distance, you can calculate the probability that the next generation will be one better p(-1), the same p(0), or one worse p(+1).
Masochists and pedants can include p(+2), p(-2), etc, etc…
Then you can use Monte-Carlo to find the equilibrium distribution.
After the initial burn-in has occurred, the distribution is not sensitive to the starting position, thanks to the simplicity of the fitness surface.
I think this is what DiEb did for the Weasels. Joe F does this kind of analysis for a living…
ETA: I suspect that, for multiple-survivor situations, running keiths’s model will be easier that figuring out the transition probabilities…
The program is kind of a visual aid for the math impaired. It’s useful to watch it happening. It seems to quiet down the genetic meltdown crowd.
Hardly a major point, but the population takes a short while to settle into equilibrium given its start variance. When there is only one survivor, this takes 1 generation, so it’s even more trivial, but when you bump up the number of survivors, you’d need a short period of drift from a fully random start population (where all are about the same Hamming distance from each other, let alone the target) to get a population that is generating and fixing mutants at a rate consistent with its mutation rate, before releasing the hare … I mean weasel.
This affects virtually nothing; just an observation.
Allan Miller,
Alan, for the relevant equation see chapter VII of my online text Theoretical Evolutionary Genetics, section VII.2 on “Drift versis mutation”, the subsection on “Finite numbers of alleles”. The probability called F there is the probability for one position in the phrase that two randomly chosen individuals among the survivors have the same letter, The recurrence equation, the first one given in that subsection, simply needs us to set K=27. (The theory there is for diploids, so the quantity called 2N is the number of survivors).
It is a simple linear recurrence equation in F, and we can work out from that what is the rate of approach to the equilibrium value of F.
By the way, I am working on a largish post on the mathematics of the N=1 case and should have it up soon.
Joe,
I look forward to it!
Result: METHINKS IT IS LIKE A WEASEL
Which of those letters became fixed as a result of genetic drift?
Just like in real biological populations.
You’re not fooling anyone, keiths.
Mung,
I must be psychic. I suggested above a small possible modification, to permit a population to equilibrate first. I didn’t say, but at the back of my mind was the Creationist carping: “but you have created an unrealistically wide set of start genomes”.
Give it a whirl.
Then try setting the start population up as all X’s – all 28 Hamming units away; as far away as you can get, and with no initial variation. Yes, I know, real genomes don’t consist of letters, and are more than 28 bits long.
Even you Allan, in spite of your “skepticism”, must admit that generating genomes at random starts each genome in a randomly selected area of the search space.
Does that really matter? Does that really change the ability of the program to find the target phrase? Would the result be any different if you started them all out the same or with only a single letter difference between them?
Living things are, by definition, already in an equilibrium state.
Are you going to OOL us?
ETA:
It would probably be useful to think of weasel modelling a duplicated gene — not necessary — but available to acquire a new function.
Mung,
F’God’s sake. I addressed this point, both before you posted and after it.
And this.
Try all X’s. Try anything you like. Why are all you coders so coding-shy?
petrushka,
True, but I was thinking of a population-genetic equilibrium – allowing a few generations of drift.
I think the question of fixation in weasel might be interesting, but there is a problem.
There are no neutral alleles in weasel, no synonymous or nearly synonymous substitutions.
It might be fun to allow upper and lowercase letters.
petrushka, in my equations for the Wright-Fisher model that comes close to being Weasel, there is a selection coefficient s, and one could make that zero.
Also, at any position where the letter does not match the target, any mutation to another nonmatching letter will behave as if neutral, even though in reality both are deleterious.
I think the word “fix” might be misleading. It implies something permanent or unchangeable.
But what could it mean in a population that always bottlenecks to one parent?
I suspect Mung wants to draw attention away from what the program does model, and toward anything it fails to model.
If by this you mean that if you calculate the Hamming distance between two members of a population it’s going to be pretty much the same regardless of which two members of the population are chosedn, maybe keiths can answer that.
You mean like genetic drift?
Ahah! So you do know what a rhetorical question is!
It can model drift just fine if you set selection coefficient to 0.
Here ya go mung.
20 parents, 10,000 generations, zero selection
14: 0
15: 0
16: 0
17: 0
18: 0
19: 0
20: 0
21: 3
22: 4
23: 54 X
24: 113 XXX
25: 274 XXXXXXXX
26: 1144 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
27: 1731 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
28: 1960 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Here’s the population. I see at least one position fixed
QTSYAGUWJGYROICWTYYMDMTTJDKT
QTKYAGUPKGYQOOCWTYYMDTTTJWKT
QTSYAGUWJGLROICWTYYMDMTTJDKT
PTKYDKUWGGGAOICWTYYIXMTTJTKU
QTKYVGUWGGYHOICWTNYMXMTKJTOU
VEKYAPUWGOYUQBCSIXEMWMTTJTKU
QTKYAGUPKGYQOOCWTYYMDTTTJDKT
CTSYAGUWJGYROICWTYYMDMTTJDKT
VEKYAPUWGOYUQBCSIXEMWSTEJTKU
QTKYAGUWGGKHOICWTNYMXMTKJTKU
PTKYDKUWGGGAOICWTYYIXMTTVTKU
QTKYAGUWGGKHOICWTJYMXMTKJTKU
QTKYAGUWGGKHOICWTJYMXMTKJTKU
QTKYAGUWGGYQOICWTYYMDMTTBTKT
QTKYAGUWGGKHOICWTNYMCMTKJTKU
VEKYAPUWGOYUQBCSIXEMWSTTJTKU
QTKYAGUWGGKHOICWTNYMCMTKJTKU
QTKYVGUWGGYHOICWTNYMXMTKJTOU
QTKYAGUPKGYQOOCWTYYMDTTTJDKT
PTKYDKUWGGGAOICWTYYIXMTTJTKU
Mung:
Mung,
If you understood how the program works, you would already know the answer to your questions.
And as Allan points out, you could easily modify the program to initialize the population however you desire. You claim to be a coder. Do some coding for a change!
Third, you can create the desired scenario without even modifying the code, by a judicious application of the commands available to you. See if you can figure out how to do it.
Less carping and more thinking, please.
petrushka:
Mung:
petrushka:
It models drift just fine even with nonzero selection coefficients. The equilibrium Hamming distances are where the effects of drift and selection net out to zero on average.
Poor Mung is going in circles with Weasel. He thinks he’s chasing it, but in reality it is chasing him.
DiEb didn’t simulate. He solved for the expected value of the first hitting time. The hard thing is to acquire a genuine understanding of what that means, not to use an environment like R to do the calculation.
The random number of characters going from correct (incorrect) in the parent to incorrect (correct) in the offspring follows a binomial distribution. That’s the stuff of a first course in probability, so I won’t accept that it’s pedantic.
Now I’ll come off as pedantic. Keith has not presented a model. He has distributed a program, and made much of the fact that Mung (who’s laughed for years at people who take seriously someone who calls himself “Mung”) cannot infer the intended model from the source code of the program.
I figured out the model that Keith intended to implement by reading a program that failed to implement it. Keith evidently did not learn the lesson that he should have, because he keeps telling us to run his program to see what he really means. Mung and his fellow UDders are enjoying the spectacle, at After the Milkbar Closes. He’s come through with appropriate software-engineering responses, which I doubt he would have generated on his own.
Shouldn’t this also depend on the mutation rate?
John,
The mutation rate already factors into the magnitude of both drift and selection. See Joe’s new OP.
Tom,
Then by all means point out the flaws so I can fix them!
Huh?
Are you saying that the mutation rate cancels out of the calculation of the equilibrium state?
John,
No, I’m saying that at equilibrium, the total rate at which letters are changing from matching to mismatching equals the total rate at which they are changing from mismatching to matching. Both of those rates already depend on the mutation rate.
I didn’t follow the previous discussion closely. Would you do me the favor of providing (or pointing me to) a terse summary of the model?
It’s bothered me that Keith uses the selection coefficient to scale the overall fitness, instead of the fitness contributions of individual alleles. The latter is what I see in the tutorial that Mung brought to the table.
Emphasis added to what I don’t understand. The fitness function as a special case of a pseudo-boolean function. That is, for real-valued weights and target sentence
where
is the indicator function for set I use the indicator function because there’s an easy generalization to sets at the loci. So an allele is beneficial if is positive, and deleterious if is negative. An allele is neutral. Weasel is a special case in which all sets are of size 1, and all weights are equal to 1, i.e., alleles are either beneficial or neutral, never deleterious.
Of course, I’d never argue with you about theoretical biology. I’m drawing on the theory of evolutionary computation (with no claim of expertise). For various evolutionary algorithms, with mutation rate the expected number of function evaluations to maximize a pseudo-boolean function is on the order of This is an important result, for folks like me. It happens also to provide another way of shooting down the notion that the fitness function must be carefully designed for the algorithm to hit the target. All that matters is the signs of the (nonzero) weights, not their magnitudes.
From premier scientist to mind reader.
Tom is wrong, of course, but even I got it.
If you’re talking about the “UDders” pun, I’m sure everyone got it.
The question is why Tom would think you had something to laugh about. Your attacks on Weasel have all failed.
No Tom, no one is coaching me.
Tom is being nice to you. You should be nice to Tom. Taking him seriously would help. You simply cannot be right all the time without fail.
Mung, Tom has moved on to Joe Felsenstein’s thread.
Mung,
I do take Tom seriously. Taking someone seriously doesn’t mean passively agreeing with everything he says.
Of course not. So?
petrushka,
I installed lcc-win, but my makes are failing due to undefined symbols (screenshot below). The tool doesn’t seem to be able to figure out which libraries it needs, and the “Add libs” function doesn’t work. Specifying the full path to crtdll.lib didn’t help.
Are you using the IDE? If so, could you look under Project->Configuration->Linker and tell me which libraries are being used in your setup?
Or if you’re using the command line, could you give me the specific commands you’re using?
Thanks.
I installed in the default directory and use:
\lcc\bin\lc xxx.c
I tried that too, but it gave me the same results as for the IDE.
Uninstalling and reinstalling didn’t help.
It seems to be one bad library. I’ll take a look.
look for crtdll.exp in the buildlib folder. Open it with notepad or a text editor and have a look.
crtdll.lib should be in the lib folder
From “LCC-Win32 User’s Manual”, pages 55-58, by Jacob Navia.
The Linker
The general format of a link command line is as follows:
lcclnk [options] object-files library-files
The linker will default to the following import libraries:
LIBC.LIB A few C functions are in here
KERNEL32.LIB Kernel function calls
COMDLG32.LIB Common dialog boxes library
USER32.LIB Windows ‘user’ functions
GDI32.LIB GDI functions
ADVAPI32.LIB More Windows functions
COMCTL32.LIB Library for the common controls of Win32
CRTDLL.LIB Library for the ‘C’ runtime functions
These libraries do not need to be specified in the command line because the
linker will search them anyway.
my crtdll.lib is 436kb
I moved to an older machine, Win 7 instead of Win 10, and now everything is working.