I often encounter posters here at TSZ who claim that Genetic Algorithms (GAs) either model or simulate evolution. They are never quite clear which it is, nor do they say what it means to model or simulate evolution (what would be required) and how GAs qualify as either one or the other. My position is that GAs neither model nor simulate evolution. In addition to other reasons I’ve given in the past I’d like to present the following argument.
GAs are often used to demonstrate “the power of cumulative selection.” Given small population sizes drift ought to dominate yet in GAs drift does not dominate. Why not?
Three questions:
- How do we determine the effective population size for a GA?
- How do we calculate the value of the selection coefficient?
- How do we determine when genetic drift will overcome the effects of selection?
In a GA written by keiths (a version of the WEASEL program) the default population size is 200.
#define POPULATION_SIZE 200 // total population size
Effective population size is the number of individuals in a population who contribute offspring to the next generation.
Even though the population size is 200, only one is selected to contribute offspring to the next generation.
#define NUM_SURVIVORS 1 // number of survivors per generation (must be less than POP_SIZE)
Given an effective population size of one, drift ought to dominate, but it doesn’t.
Given an effective population size of one, what must the selection coefficient be for drift to not dominate selection?
I’d truly appreciate any assistance with the concepts or the math.
In any event, there is no way that this GA (the keiths WEASEL program) either models or simulates evolution.
Reference:
Neutral Theory: The Null Hypothesis of Molecular Evolution
See the OP.
Mung’s Blunder!
So here’s a potentially interesting exercise.
Is there a way to tell if a WEASEL program is converging on the solution without waiting for it to halt?
IOW, what if we want to play with the parameters without having to [potentially] wait forever to see if it finds the target?
you can simply sample and print the average fitness or max fitness. What’s your point?
I compiled using lcc on Windows, and it seems to run. At least it doesn’t crash.
kid 1=,,3.5AH=”LKI=BT’@#EIS8 @C,DX
kid 2=R&C+L:&-T0-A>S+K1Z6,44P2GTWRWH&DF
kid 9=7V@QPPVV/7( S4X’RC1FL6!A$@Z8
kid 10=KX(VA5N8S3’F”GF0G>X4B (2(U,=
kid 11=’-W&&D@AVDQ5+@J<0(T"9Y#Y&*!3
kid 12=N5$R&OAT;+# &:!=N0).)LQEPD67
kid 13=(&E'971*9Z=UQXLC9T60V:MPSI+@
kid 14=%HAMUD9C.XT-7"BK(07<8=)DQ 65
kid 15=<XIH*SW8U09U,ITZS8ZC%D0&U!:I”O9N1W/W/:X8;A:FJE/323
kid 17=’K9CCC&&23=58)U7ZC%T4MO09 ‘A
kid 18=/>J@G ?LVXGSK827K-3WN;?4G(“0
kid 19=%7NBR3D1%(-(UR2>@8V;”>8=AC>7
kid 20=.9AFBJZ7@T5<XYG6*7U!1=LYNGH@
kid 21=U-+%:,E/.1H1D5:XJ'CTA/JI.TDM
kid 22=GQLK*N<H N*%1#?0L0Z4*SICCO*R
kid 23=Q,/4.%I!5’-EPH;IU
kid 24=,;&UY1<F:X.#DIS=8<* E*,FIW<&
I'm not sure at the moment what this means.
The comments seem to indicate that the strings should be made of uppercase alpha characters. Is that correct?
stcordova,
It’s not where I was headed, it’s where I was coming from. The number of mutations is a function of the per-base mutation rate. Double the number of bases, (ie by doubling the number of individuals) you double the number of mutations produced by the population. This is irrespective of whether they are neutral or not.
Your apparent proposal was that a large population with 1 mutation per generation will behave differently from a small population with 1 mutation per generation . Of course it will, because you must have changed the per-base mutation rate to achieve that difference.
No, because ‘bad’ are selected against and ‘good’ are selected for. For any value of s treated as alternatively added to and subtracted from 1, the detrimental side will be eliminated MUCH more readily than the beneficial. See Kimura.
Even if you produce detrimental at a rate which is a substantial multiple of beneficial, your genomes will become enriched in the beneficial and impoverished in the detrimental, unless you have your mutation rates set very high. You’ll find hitting kittens with a mallet has a negative effect on growth, too.
Thanks for trying it out.
If you run on the suggested compiler in the link provided, you can run it online and watch the string evolve on the last output line. If you set the s_coefficient to 20.0 you’ll see the last line converge on the target. As your run it with lower and lower s_coefficients, the number of generations required for convergence increases, and at some point, when the s_coefficient is low enough, evolution becomes a random walk, just as I predicted earlier.
The first lines that printed out are the random strings that represent each kid.
The final lines, after the run completes and if the target is hit reads:
Thanks for trying out the software. Feel free to modify by removing some print statements if you like.
OK, I have it running. I set the selection coefficient to 2.1 and it hit the target in 36975 generations.
At some point it wandered way off.
Another problem with Sal’s algo I (think) I’ve spotted:
When Sal creates the offspring of the fittest, it goes to random positions of the array of kids with a probability dependent on the selective coefficient:
/* fittest has a chance to be surving parent by random selection out of population enriched by fittest
form having extra kids */
/* add integer number part of s coefficient */
for (i=0; i < s_coefficient_integer_part; i++)
{
strcpy( kid[rand()%NUMBER_KIDS], clone_of_fittest_kid);
}
/* add occasional fractional number part of s coefficient occasionally */
if ( rand()%one_part_out_of == 0 )
{
strcpy( kid[rand()%NUMBER_KIDS], kid[index_of_fittest]);
}
…but when he creates the next generation, he only uses the kid in the first position of the array:
for (i=0; i < NUMBER_KIDS; i++)
{
make_mutant_kid(kid[0],kid[i]);
}
as a result, more often than not, the offspring of the fittest is NOT in the first position of the array, which keeps accumulating mutations with no control whatsoever.
So Sal has coded some sort of "survival of the random" instead of a proper drift model that is supposed to allow only neutral or slightly (occasional) deleterious mutations to fixate: if the fittest descendant is not selected according to the coefficient, a random mutant is picked for the next generation ignoring it's fitness completely
For example, with an offspring of 200 and a coefficient of 1.5, there’s a 1% probability that the fittest is selected for the next round. But if it’s not selected, the coefficient is completely ignored for the rest of the 199 descendants and one with lower fitness is just as likely to be selected as one with the same fitness as the fittest
I changed this line to print the fitness score:
printf(“fittest=%d: %sr”, fittest_score, kid[index_of_fittest]);
I haven’t analyzed what the code is doing, but I’ve made two observations.
The fitness score can drop quite significantly during a run. I haven’t figured out if this is a feature.
The random seed in my implementation is the same on each run. This could actually be useful.
I also commented out the line that prints the initial kids.
I think this version would have been fun if it were available during the latching wars. You can see clearly that no position is latched or protected from mutation.
Dazz,
Thank you for looking at my code and for the criticism and skepticism. I certainly welcome reviews and corrections. Thank you for taking time to look. The code was terse, not really the most readable. There is probably a way to write it that is a little more in line with how we would describe it in English, but I was trying to code quickly — hence a few tricks.
The way the probability of the fittest is given advantage is by having more offspring than each of the not-as-fit.
Yes, once the extra offspring are added, then everything is equiprobable, that’s how the selection coefficient creates advantage, by the number of offspring, not necessarily by how close it is to the target string.
Regarding this:
That is version 2:
http://creationevolutionuniversity.com/forum/viewtopic.php?f=4&t=130
Version 1 does it more in line with the way you apparently prefer by putting 1 survivor in the first postion (that is randomly selected). The fittest occupy that 1st position by accident. Version 1 converges to target more readily. The important thing is that there is still a relation to selection strength to speed of convergence. So I’m OK with version 1 as teaching tool as well.
Here is version 1:
http://creationevolutionuniversity.com/forum/viewtopic.php?f=4&t=129#p648
You can see I implemented it differently:
/* fittest has a chance to be surving parent by random selection out of population enriched by fittest
form having extra kids */
for (i=0; i < s_coefficient_integer_part; i++)
{
strcpy( kid[rand()%NUMBER_KIDS], kid[index_of_fittest]);
}
if ( rand()%one_part_out_of == 0 )
{
strcpy( kid[rand()%NUMBER_KIDS], kid[index_of_fittest]);
}
printf("fittest=%sr", kid[index_of_fittest]);
/* make mutant kids */
for (i=1; i < NUMBER_KIDS; i++)
{
make_mutant_kid(kid[0],kid[i]);
}
If you put the selection coefficient at virtual infinity (say 100.0) you’ll see it converge really fast both in version 1 and version 2. 🙂
I added
srand(time(NULL));
Noticed that too. Maybe Sal can clarify how could that be with one mutation per individual
I find it a bit confusing to think of broods of ‘kids’. You just pick an individual ‘at random’ to sire the next offspring (bearing in mind that ‘at random’ can include some selective increase in chances of being picked for some over others). You add this child to the population. Fitter individuals will persist in the population longer and/or sire offspring more frequently, and that’s how their selection coefficients are manifest, not by giving each one a number of offspring depending how fit it is.
And I think mutation rate should not be an integer. If set to 1, it looks like every individual will mutate once, and that’s not properly stochastic.
You may not be bothered about strict biological realism, but there doesn’t seem much point to the exercise if you avoid it unnecessarily.
stcordova,
I went through your code and this seems to be your algorithm:
1) Initialize 25 “kid” strings of length 28 with random letters drawn from the set of all uppercase letters and space.
2) Find the most fit kid based on the Hamming distance to the target string.
3) Mutate each string in the array by replacing it with the first kid (which is not necessarily, or even likely, the most fit) and changing one random letter.
4) Replace one of the kid strings in the array with a copy of the most fit kid from step 2. (The number to replace is the integer portion of the S_COEFFICIENT constant.)
5) With probability equal to 1 in 1 divided by the fractional part of S_COEFFICIENT, replace another of the kid strings in the array with a copy of the most fit kid from step 2.
6) If the target isn’t found, return to step 2.
What in Eris’ name do you think this models in the real world?
I made another change to make changes friendlier.
int main(int argc, char **argv)
{
target_strlen = strlen(target_string);
int i = 0;
/* srand(time(NULL)); */
double selco;
if(argc > 1) {selco = atof(argv[1]);} else {selco = (double) S_COEFFICIENT;}
Sample runs:
C:lccexamplessc>sc 2.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 2 fractional=0.100000 out_of=9
hit target in 36975 generations
C:lccexamplessc>sc 8.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 8 fractional=0.100000 out_of=10
hit target in 2092 generations
C:lccexamplessc>sc 16.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 16 fractional=0.100000 out_of=9
hit target in 5708 generations
C:lccexamplessc>sc 32.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 32 fractional=0.100000 out_of=9
hit target in 4835 generations
C:lccexamplessc>sc 64.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 64 fractional=0.100000 out_of=10
hit target in 7944 generations
C:lccexamplessc>
But the coefficient should apply to every descendant. The correct model IMO should assign every kid a probability of being picked based on it’s fitness and the selection coefficient
Well, that’s the one without drift. Drift is the fun part probably, I just would modify the drift implementation to give every descendant a proper “weight” or probability of being selected
Whoops, the following is more complete.
double selco;
if(argc > 1) {selco = atof(argv[1]);} else {selco = (double) S_COEFFICIENT;}
/* initialization and memory setup */
s_coefficient_integer_part = (int) selco;
s_coefficient_fractional_part = (double) selco – (double) s_coefficient_integer_part;
one_part_out_of = (int) (1.0/ s_coefficient_fractional_part);
Gentleman,
Thank you all for your reviewing my work and offering criticism and skepticism. I’m not here to say my program is the final word. It was to provide an illustration of how drift could be added to WEASEL so that WEASEL can evolve by selection and drift.
I wrote a non-drifting WEASEL in 105 lines. To make DRIFTING WEASEL V2, it took 146 lines. I did this partly to show the drifting model can be implemented without having to make a gigantic leap in complexity.
Certainly since Keiths and others wrote the traditional weasel, they can probably make a drifting weasel to the liking of the regulars here at TSZ. But I suspect the end results should be comparable to mine, namely, the weaker the s_coefficients, the longer it takes to converge if at all. I really don’t think that result should be controversial.
Is there supposed to be some sort of correlation between selection coefficient and convergence number of generations?
I don’t believe the algorithm you implemented is useful for any pedagogical purpose.
If you want an example that demonstrates drift, you could do worse than start with these posts from OMagain and olegt:
Randomness and Evolution: An Interactive Toy
Counting Generations of M&Ms
MandM revisited
If the algorithm is as I summarized it, your program does not do that.
C:\lcc\examples\sc>sc 100
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 100 fractional=0.000000 out_of=-2147483648
hit target in 26336 generations
C:\lcc\examples\sc>sc 2.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 2 fractional=0.100000 out_of=9
hit target in 36975 generations
C:\lcc\examples\sc>sc 3.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 3 fractional=0.100000 out_of=9
hit target in 12940 generations
C:\lcc\examples\sc>sc 4.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 4 fractional=0.100000 out_of=10
hit target in 8882 generations
C:\lcc\examples\sc>sc 5.1
fittest=28: METHINKS IT IS LIKE A WEASEL
integer = 5 fractional=0.100000 out_of=10
hit target in 10331 generations
Methinks there may be a bug.
Who says that a GA must model the real world?
Mung,
Someone who is critiquing an attempt to use a GA to say something about the real world?
It would be a start if it modeled the WEASEL program. Then we could move on.
Sal asserted that if I set the selection coefficient to 100 it would quickly converge on the target. I do not see any obvious correlation between coefficient and convergence speed.
I have submitted my results.
Sal. He’s saying he’s modeling genetic drift.
You are unbelievable. Seriously, and not in a good way
Isn’t the selection coefficient supposed to be a value between 0 and 1?
https://www.blackwellpublishing.com/ridley/a-z/Selection_coefficient.asp
Patrick, to Sal:
Oh dear.
Well, I’m a dummy but I’m not afraid to learn something new. Here’s a very simple example of how I would model random genetic drift:
Create a population of two organisms. Assume one is twice as “fit” as the other. Create a temporary population with two copies of the more fit genotype and one copy of the less fit genotype. Randomly select two organisms from the temporary population.
1.) is this the wrong way to do it?
2.) is there a simpler way?
Well, it does eventually wander over to WEASEL.
The strings are not at all limited to alpha characters.
Mung,
If you start with a population of two, pick one at random (which does not, incidentally, have to be a 50/50 chance for each; it just has to be probabilistic) and kill it, then breed from the other.
I think he intends them to be, with this line:
int random_ascii_number = rand() % (65+26-32)+32;
Leaving aside the fact that this doesn’t provide a uniform distribution, it doesn’t seem to allow for the space character, either.
Mung,
After all this discussion about Weasel, you still don’t understand my code?
It appears that you didn’t even read the first 20 lines, which include these:
#define DISPLAY_INTERVAL 1 // display results every n generations, where n == DISPLAY_INTERVAL
#define PAUSE_TIME 50000 // pause time in microseconds
#define GENOMES_TO_DISPLAY 1 // in STEP_MODE, number of (the fittest) genomes to display at each step
#define STEP_MODE 1 // if set, program will pause every n generations and display the genomes
Here’s a snapshot of the output just before the target is reached:
Target Phrase: METHINKS IT IS LIKE A WEASEL
Generation: 67
Selection: ON
Organism 0 Fitness 26
MEWHIFKS IT IS LIKE A WEASEL
It would seem more straightforward to define a character string
” ABCDEFGHIJKLMNOPQRSTUVWXYZ”
and use rand() % 27 to get an index to one of the characters.
[edited]
Thanks to all for the very helpful beta testing. I’ll review my code in light of the deficiency reports filed so far.
Other than a few excursions into javascript, I haven’t played with C code in 20 years.
I loved it at the time, but the rust has accumulated.
There are things here I don’t remember from K&R.
I strongly recommend that you write out your algorithm in prose, as I did when summarizing your code, before simply hacking more on it. Make it painfully clear exactly what you are modeling and how.
I also suggest that you switch to a language like Python. Your C would not pass code review on any team I’ve worked with.
Actually I looked at it and rejected it as a reasonable way to accomplish what I was looking for.
I don’t want to have to watch the screen and try to figure out if it’s converging by visually comparing characters. Why should I do manually what I ought to be able to have a computer do for me?
Consider the possibility that the person running the program does not know the target phrase in advance.
Consider the possibility that the program is not converging at all.
Consider the possibility that the program is converging but will take a very long time (assume no latching).
etc. etc.
So your program comes up short again.
Actually it’s kind of cool, which is why Sal did it that way too. It took the wizards at UD months to understand that WEASEL doesn’t latch. You can see “Not latching” in ten seconds if you display the intermediate strings.
Otherwise just an aesthetic choice.
Mung,
The program does exactly what you want. It’s just that you’re too dim to recognize it.
You don’t have to compare characters. The program redraws the screen after each DISPLAY_INTERVAL. You just look at one spot on the screen and watch the phrase converge to the target.
Are you telling me you haven’t even run the program?
See the “Fitness” field in the snapshot I gave you? Do you think you could look at that spot on the screen and determine whether the number was going up or down?
The program displays the target phrase at the top of the screen. Look at my snapshot.
Then the Fitness field will show a value that is not increasing toward 28.
Then you set DISPLAY_INTERVAL to a large value so that I/O and pause time don’t dominate.
I’m expecting the next question to be:
“weasel” is a great example of evolution by intelligent design. Unfortunately people like you will never come to realize that all GAs are examples of evolution by intelligent design.
“Disinherit the Wind” put “weasel” in its place. If Dawkins ever sees that play or reads the transcript it would put him over the edge
And I just told you I don’t want to have to do that.
File a complaint with the moderator.
On the other hand, Robert Frost could have written a poem about your concerns and called it, The Shit Not Given.
Mung:
keiths:
Mung:
Too difficult for you? Exceeds your attention span?