Randomness and evolution

Posted on January 6, 2014 by Allan Miller

Here’s a simple experiment one can actually try. Take a bag of M&M’s, and without peeking reach in and grab one. Eat it. Then grab another and return it to the bag with another one, from a separate bag, of the same colour. Give it a shake. I guarantee (and if you tell me how big your bag is I’ll have a bet on how long it’ll take) that your bag will end up containing only one colour. Every time. I can’t tell you which colour it will be, but fixation will happen.

This models the simple population process of Neutral Drift. Eating is death, duplication is reproduction, and the result is invariably a change in frequencies, right through to extinction of all but one type. You don’t have to alternate death and birth; choose any scheme you like short of peeking in the bag and being influenced by residual frequencies (ie: frequency-dependent Selection), and you will end up with all one colour.
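
For anyone who would rather not eat that many M&M’s, the bag can be sketched in a few lines of Python. This is only an illustration of the eat-one, duplicate-one scheme described above; the function name `run_bag` and the starting mix of colours are my own choices for the example, not part of any published model.

```python
import random


def run_bag(colours, rng):
    """Eat one sweet, duplicate another, and repeat until one colour remains.

    Assumes a non-empty bag. Returns the surviving colour and the number
    of eat/duplicate rounds that were needed.
    """
    bag = list(colours)
    rounds = 0
    while len(set(bag)) > 1:
        # Death: remove a sweet chosen without peeking.
        bag.pop(rng.randrange(len(bag)))
        # Birth: pick one of the remaining sweets and add a same-coloured copy.
        bag.append(rng.choice(bag))
        rounds += 1
    return bag[0], rounds


if __name__ == "__main__":
    start = ["red"] * 10 + ["green"] * 10 + ["blue"] * 10
    for seed in range(5):
        colour, rounds = run_bag(start, random.Random(seed))
        print(f"seed {seed}: all {colour} after {rounds} rounds")
```

Different seeds should give different winners and different waiting times, but every run ends with a single colour.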

Is Chance a cause here? Well … yes, in a sense it is, in the form of sample error. Survival and reproduction are basically a matter of sampling the genes of the previous generation. More random samples are a distortion of the larger population than aren’t, so, inexorably, your future populations will move away from any prior makeup, increasing some at the expense of others till only one variant remains.

Selection is a consistent bias upon this basic process. If different colours also differed a little in weight, say, more of some would be at the bottom of the bag than others, so you’d be more likely to pick one type than another. In more trials, the type more likely to be picked would be picked more often, to express it somewhat tautologously. You’d get a sampling bias.

Both of these processes are random – or stochastic, to use the preferred term. In reality, they are variations of the same process, with continuously varying degrees of bias from zero upwards. It makes no sense to call selection nonrandom, unless by ‘random’ you mean unbiased. Where there is no bias, all is Drift. But turning up the selective heat does not eliminate drift – sample error – and so does not eliminate stochasticity.

With a source of new variation, these processes render evolution inevitable. Even with a brand new mutation, with no selective advantage whatsoever, 1/Nth of the time (where N is the population size) it will become the sole survivor. That’s the baseline. If there is a selective advantage, it will be more likely and quicker to fix, on the average. If at a selective disadvantage, it will be less likely and slower.
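
The 1/N baseline, and the effect of a bias on top of it, can be checked numerically. The sketch below assumes the same eat-one, duplicate-one scheme as the bag, and it models selection in one particular way that I have chosen for illustration: the mutant type gets a relative weight (`fitness`, a hypothetical parameter name) when the parent is picked. Other ways of introducing the bias are equally possible.

```python
import random


def mutant_fixes(n, fitness, rng):
    """Follow one new mutant in a bag of n (n >= 2) until it is lost or fixed.

    fitness (> 0) is the mutant's relative weight when a parent is picked;
    1.0 means no bias, i.e. pure drift. Returns True if the mutant fixes.
    """
    mutants = 1
    while 0 < mutants < n:
        # Death: every individual is equally likely to be removed.
        if rng.random() < mutants / n:
            mutants -= 1
        # Birth: pick a parent among the n - 1 survivors, weighted by fitness.
        others = (n - 1) - mutants
        weight = fitness * mutants
        if rng.random() < weight / (weight + others):
            mutants += 1
    return mutants == n


def fixation_rate(n, fitness, trials, rng):
    """Fraction of independent trials in which the single mutant fixed."""
    return sum(mutant_fixes(n, fitness, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(2014)
    for fitness in (0.5, 1.0, 1.5):
        rate = fixation_rate(10, fitness, 5000, rng)
        print(f"fitness {fitness}: fixed in {rate:.3f} of trials")
```

With `fitness` set to 1.0 and N = 10 the fraction should hover around 0.1; raising the weight should push it up and lowering it should push it down, without ever removing the element of chance.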

Conversely, without a source of new variation, all existing variation would be squeezed out of the population, and evolution would stop.

650 thoughts on “Randomness and evolution”

olegt on January 14, 2014 at 4:12 pm said:

phoodoo: Yes, it’s much closer to reality. And why do you think the Wright Fisher model does it that way anyway? Because I am crazy and I don’t know what I am talking about, but now see once we can give it a name, like say Wright Fisher, well, now it’s a valid point about generation times, so now we can believe it’s true! It has a name after all!

You’ve got it backwards, phoodoo. A model is not good because people gave it a proper name. They gave it a name because it is good.
phoodoo on January 14, 2014 at 4:12 pm said:

olegt: Oh, let’s not go there, IdoP. Phoodoo can’t make heads or tails of the Moran model. Throwing in a new model will only confuse the poor thing even more.

I wonder when your physics classes are going to get to the semester where it explains how four does not equal one?
phoodoo on January 14, 2014 at 4:16 pm said:

olegt: The two counts can easily be off by 10 generations or so. I have a cousin the same age as me.

And the further back you go, the more generations will disperse.

You have a cousin the same age as you?? Come on, get out of here! No way. That’s unbelievable! It’s almost like you are from the same generation or something, God, that is weird.

I recommend you stick with physics. This whole generation thing is too too crazy. A cousin! How could that be?
olegt on January 14, 2014 at 4:17 pm said:

phoodoo: I wonder when your physics classes are going to get to the semester where it explains how four does not equal one?

I have already explained to you, phoodoo, why this comment is inane. You don’t understand the concept of physical units. Take a freshman physics course, that should take care of it.

I never said that 1 = 4. That would be silly. I said that 1 generation = 4 deaths (in a population of 4).

This is completely fine. It’s like saying that 1 foot = 12 inches. It is not the same as saying that 1 = 12.
Alan Fox on January 14, 2014 at 4:24 pm said:

phoodoo: And in fact, we do assume that a similar thing happens with humans. If we trace your ancestry back 100 generations, it would come out to roughly the same time as if you trace my ancestry back 100 generations.

T’ain’t necessarily so!

In that paper I linked to on calculating generation interval in humans, it varied significantly from culture to culture; “primitive” hunter-gatherer societies and third world populations were sampled. There’s also a difference between paternal and maternal generation interval. We might think that, say, old-stone-age populations may have had shorter generation times than currently.
olegt on January 14, 2014 at 4:25 pm said:

phoodoo: You have a cousin the same age as you?? Come on, get out of here! No way. That’s unbelievable! It’s almost like you are from the same generation or something, God, that is weird.

Don’t understand the significance of that, phoodoo? Let me unpack it for you.

My grandmother had four children. My father was younger than his oldest sister by about 15 years. His sister’s grandson is the same age as me.

So in just two generations we have a difference of one generation.

If you go back N generations, you will find typical differences between generation numbers that scale as the square root of N. So tracing your ancestors 100 generations back will give discrepancies of plus or minus 10.

Coexistence of different generations in the Moran model (M&Ms) is a feature, not a bug.
Allan Miller on January 14, 2014 at 4:40 pm said:

phoodoo,

No Allan, in your model, if we began the experiment with a full bag of M&M’s existing at time zero, and then began doing your shuffling as described, without calling it a new generation each time, what we end up with is M&M’s who now all are from completely different time zones.

No. They are all in the same time zone – they all exist at the same time. The tick of the clock is not determined by the generation time; generations occur against the background of a steadily ticking clock. Each individual traces a (slightly) different number of steps back to the original population, and therefore those steps are of different average time length.

Some get changed to six generations later, some to four, some to none, and if it’s a large bag, some can go to 100 generations later

“Yadda yadda yadda. Here are some numbers I pulled out my ass”. In this post, I gave an actual example of a typical variance and mean for an actual run. It does not accord with your intuition. And the reason is quite simple; it is vanishingly improbable that an individual will still be in the bag at the same time as its distant descendants are too. For g+1, quite probable, g+2, a bit less, g+3, a bit less still … it soon gets down to the negligible. Which is why the numbers I gave you – actual numbers from an actual run – do not accord with your intuition on variance.

You want more? I have loads.

-but when we look at the gene distribution, we are counting it as ONE snip of time, even though they have all changed.

The time doesn’t change equally for all the M&M’s because we can’t reset our clock for each new draw. Now they are Jesus Christ and Brad Pitt and George Washington all living together.

The ‘tick’ of the clock is fundamentally the time taken to perform the computer operations to remove a member and replicate another. That is constant per member and constant per N members. Each approximate generation occupies a time period of N ticks, so g generations is a multiple – each ‘population generation’ takes about the same time to complete; each individual generation is a little longer or shorter than the mean. Which is entirely what you’d expect, because time-to-reproduction varies. It must do so, even for one individual – offspring are produced serially. This is, in that respect, just like a ‘real’ population. And certainly insufficiently unlike a ‘real’ one to be invalidated. You really think, if I introduced age-related mortality and a juvenile stage that the behaviour of the model would just disappear? You don’t think someone might have noticed this? Bio-Complexity are gagging for content.
olegt on January 14, 2014 at 4:47 pm said:

Excellent point, Allan Miller!

In that example,

Allan Miller: Population of 1500 after fixation of new mutation:
Births/deaths: 1693584
Generations by approximation (births/deaths divided by N): 1129
Mean generations by counting: 1160
Minimum count: 1112
Maximum count: 1199

Here we have 1160 generations on average. We expect fluctuations in generation count of the order of the square root of that, which is 34. So generations should be roughly between 1160−34 = 1126 and 1160+34 = 1194. Lo and behold, that’s what Allan’s simulation yields.
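
For readers who want to reproduce this kind of tally, here is a minimal sketch. It assumes the same eat-one, duplicate-one bag, gives every founder a generation count of 0 and makes each copy one generation deeper than the M&M it was copied from. It is not Allan’s actual program, and the function name `generation_counts` is made up for the example.

```python
import random


def generation_counts(n, steps, rng):
    """Run `steps` death/birth rounds and return each survivor's generation count.

    Every founder starts at generation 0; a copy is one generation deeper
    than the individual it was copied from. Assumes n >= 2.
    """
    depth = [0] * n
    for _ in range(steps):
        depth.pop(rng.randrange(len(depth)))   # one death
        depth.append(rng.choice(depth) + 1)    # one birth
    return depth


if __name__ == "__main__":
    n, steps = 100, 100 * 200
    counts = generation_counts(n, steps, random.Random(7))
    print("Generations by approximation (births/deaths divided by N):", steps // n)
    print("Mean generations by counting:", sum(counts) / len(counts))
    print("Minimum count:", min(counts))
    print("Maximum count:", max(counts))
```

The counted mean should land near the births/deaths divided by N figure, with the minimum and maximum bracketing it.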
DNA_Jock on January 14, 2014 at 5:19 pm said:

phoodoo: You have a cousin the same age as you?? Come on, get out of here! No way. That’s unbelievable! It’s almost like you are from the same generation or something, God, that is weird.

Well my wife’s brother is younger than her nephew, and my grandmother married her step-uncle, giving my Dad half-cousins who are also his step-cousins once removed…I think.
Anyhoo.
Phoodoo comes up with his unique definition of “generation” = one iteration of Allan’s model by imagining that the other N – 1 M&M’s all die, leaving precisely one offspring. It’s a Wright-Fisher model, but the fecundity per generation is 1 + 1 / N, rather than the traditional 2. The other N – 1 M&M’s have all “aged” by one generation, in his mind. He thinks that one iteration corresponds to N births and N deaths, of which only one can have any effect whatsoever.

Pass the brain-bleach.
JonF on January 14, 2014 at 5:56 pm said:

phoodoo: You have a cousin the same age as you?? Come on, get out of here! No way. That’s unbelievable! It’s almost like you are from the same generation or something, God, that is weird.

Most living cousins are not from the same generation as the people to which they are cousins. A cousin is someone with whom one shares an ancestor, often but far from always a child of an aunt or uncle, so if you go back far enough everyone is your cousin. But it’s common to refer to people as your cousin if that person shares a known ancestor with you, usually a great-great-grandparent or someone more recent. I have a living cousin who is the granddaughter of my great-grandfather, so she is formally my first cousin once removed but I call her my cousin.
olegt on January 14, 2014 at 6:03 pm said:

I have posted a new thread Counting generations of M&Ms. Let’s move the discussion of generation counting there.
BK on January 14, 2014 at 11:03 pm said:

phoodoo: It roughly happens with humans, but it actually does happen with some species. How about salmon? Groups are all from the same generation

You’ll need to find a different species to support your assertion. In any given salmon stream/river you will find precocious spawners (jacks/jennies @ 2 yr old) and you will also find adults in the 3, 4 or 5 yr old class with some rivers having salmon returning in up to 6 or 7 years in addition to the earlier year-classes.
petrushka on January 15, 2014 at 1:05 am said:

JonF: Most living cousins are not from the same generation as the people to which they are cous

My father was the youngest of 12, and he was 35 when I was born. I have had first cousins who were thirty years older than me. Maybe more. Not so many now.
Piltdown2 on January 15, 2014 at 7:17 am said:

keiths,

My issue with the term “unguided macroevolution” is that when you step back and look at the general structure of nested hierarchy, it implies guidance, starting with small single cell organisms and branching up in a fairly orderly fashion into more and more complex organisms. Whereas an unguided random process would sometimes go up, sometimes down, sometimes sideways, sometimes backwards, may split here, may reunite here etc. The “tree of life” may no longer be the most appropriate metaphor for the current version of this nested hierarchy, but it still serves an analogous purpose. Would you go so far as to say the growth of a tree is unguided? Surely there are guiding principles at work.
Also, implying that my statement, that a design mechanism may be built into our cells, is analogous to angels pushing the moon and planets across the sky was unhelpful. Here’s a quote from the abstract of the 2008 article “Facilitated Variation: How Evolution Learns from Past Environments to Generalize to New Environments” by Parter, Kashtan, and Alon: “The key observation is that organisms are designed such that random genetic changes are channeled in phenotypic directions that are potentially useful.” (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2563028/).
I hope this doesn’t come across as ungrateful, because I sincerely appreciate the time you and others, especially Allan Miller for his OP and subsequent comments, here at TSZ have taken to help shape my views and insist on critical thinking backed by research.
Piltdown2 on January 15, 2014 at 7:20 am said:

Neil Rickert: It is the question that directly addresses what biologists mean when they say that mutation is random. Shapiro is clearly agreeing with what they mean, but he does not like their use of the word “random” with that meaning.

Thanks. I’ll reserve comment (I’m sure these ideas will come up on a future post) until I’ve actually had a chance to read the book. And, petrushka, I should have posited my own version of an important question instead of simply questioning yours. It’s off-topic, but I might go along the lines of How did organisms develop the ability to generate novelty?
Zachriel on January 15, 2014 at 2:58 pm said:

Piltdown2: My issue with the term “unguided macroevolution” is that when you step back and look at the general structure of nested hierarchy, it implies guidance, starting with small single cell organisms and branching up in a fairly orderly fashion into more and more complex organisms.

That is a typical conflation of the term guided. The Earth is guided in its orbit by the Sun’s gravity, but nowadays, hardly anyone accounts for the angels who do the actual work. Guided as in a river in its channel, or guided as in the god Hapi causes the Nile to flood.

Piltdown2: Whereas an unguided random process would sometimes go up, sometimes down, sometimes sideways, sometimes backwards, may split here, may reunite here etc.

Integrated complexity at the beginning of life started at the floor, so there was nowhere to go but up. And change in complexity has certainly not been monotonic over evolutionary history. Parasites, for instance, often lose complexity by relying on their hosts for certain functions.

Piltdown2: Here’s a quote from the abstract of the 2008 article “Facilitated Variation: How Evolution Learns from Past Environments to Generalize to New Environments” by Parter, Kashtan, and Alon: “The key observation is that organisms are designed such that random genetic changes are channeled in phenotypic directions that are potentially useful.”

It’s easy to show in simulations that, in an oscillating environment, organisms will develop the ability to adapt to the changing environment. For instance, humans still have hair follicles over their bodies, so it is quite plausible they could evolve a thick fur, if the climate required it (absent clothes, of course).
Blas on January 16, 2014 at 3:32 pm said:

Sorry I’m late here, but only today did I have time to play with Mr. Felsenstein’s program.
I set the population to 100 in one case and 1000 in the other. For the initial frequency of allele A I used .99 and .999, as I supposed “a” to be a neutral random mutation of A, and I used migration rates between the ten independent populations of 0.001 and 0.0001.
I ran each set of conditions twelve times, and in only one run did some populations get close to fixation of allele “a”.
Can anyone explain this to me?
Joe Felsenstein on January 16, 2014 at 3:50 pm said:

Blas,

It sounds like you are using our lab’s teaching program PopG. You have a high initial gene frequency of A, some migration among 10 populations of size 100 or of size 1000, and no mutation or selection.

With no selection, in this situation the probability of fixation of a would be its initial gene frequency. As the 10 populations are connected by migration, they will all fix for A or all fix for a. The probability of all fixing for a would be its initial gene frequency, which is 0.01 or 0.001.

It sounds as if that is what you see. So this is the expected result. Of the initial set of 10 populations, one copy of the gene ends up being the ancestor of every copy. The chance that it happens to be a copy with the a allele is then 0.01 (or 0.001).
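
The rule that a neutral allele fixes with probability equal to its initial frequency can be checked with a stripped-down Wright-Fisher sketch. Note the simplifying assumptions: this is a single population with no migration, and it simply resamples a fixed number of gene copies each generation, so it is not the PopG program itself. The function names are invented for the example.

```python
import random


def allele_a_fixes(copies, start, rng):
    """Wright-Fisher drift in one population holding `copies` gene copies.

    `start` of them carry allele a at the outset. Each generation is a
    fresh sample of `copies` draws from the previous generation's
    frequencies. Returns True if a fixes, False if it is lost.
    """
    count = start
    while 0 < count < copies:
        p = count / copies
        count = sum(rng.random() < p for _ in range(copies))
    return count == copies


def fixation_fraction(copies, start, trials, rng):
    """Fraction of independent runs in which allele a went to fixation."""
    return sum(allele_a_fixes(copies, start, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(16)
    # 100 gene copies, 1 of them allele a: initial frequency 0.01.
    print(fixation_fraction(100, 1, 2000, rng))
```

With an initial frequency of 0.01 only about one run in a hundred should end with a fixed, so a dozen runs in which a never quite gets there is nothing out of the ordinary.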
Blas on January 16, 2014 at 3:54 pm said:

Joe Felsenstein,

Thank you.
phoodoo on January 16, 2014 at 3:58 pm said:

Blas:
Sorry I’m late here, but only today did I have time to play with Mr. Felsenstein’s program.
I set the population to 100 in one case and 1000 in the other. For the initial frequency of allele A I used .99 and .999, as I supposed “a” to be a neutral random mutation of A, and I used migration rates between the ten independent populations of 0.001 and 0.0001.
I ran each set of conditions twelve times, and in only one run did some populations get close to fixation of allele “a”.
Can anyone explain this to me?

Blas,

I proposed a list of at least 7-10 examples of unrealistic findings which could possibly occur in this kind of sampling. Not a single poster here has shown how many times these aberrations specifically occurred in say one round of complete fixation starting from zero and ending up with 100, or one round of 1000 (or how many times do they occur in just a pop. of 10), without any alteration of the data whatsoever, and yet they summarize it by coming back here and saying, we averaged it, see, our program shows it is realistic.

So if you think they are going to acknowledge any deficiencies in any of their claims, I would say don’t hold your breath. I have given up on believing they have any desire to at all debate authentically.
petrushka on January 16, 2014 at 4:08 pm said:

Phoodoo: did you notice that Blas’s question was answered and that Blas acknowledged the answer?
olegt on January 16, 2014 at 4:12 pm said:

phoodoo: I proposed a list of at least 7-10 examples of unrealistic findings which could possibly occur in this kind of sampling. Not a single poster here has shown how many times these aberrations specifically occurred in say one round of complete fixation starting from zero and ending up with 100, or one round of 1000 (or how many times do they occur in just a pop. of 10), without any alteration of the data whatsoever, and yet they summarize it by coming back here and saying, we averaged it, see, our program shows it is realistic.

That is simply not true. Here is Allan Miller’s comment addressing these issues. And that’s only one example. People were more than willing to address the issues you kept throwing up.
OMagain on January 16, 2014 at 4:12 pm said:

phoodoo: Not a single poster here has shown how many times these aberrations specifically occurred in say one round of complete fixation starting from zero and ending up with 100

If you stick around, you can help me build whatever features will clarify this into the v2 of my toy. E.G. I intend to make available the complete history of some/all of the generations.

Tell me what statistics you want to see! A complete history of every change at every time step? Sure!
Allan Miller on January 16, 2014 at 4:17 pm said:

phoodoo,

Just to note that I have been manfully struggling against WordPress’s useless table handling to provide at least some of the stats you requested from the sim. I don’t have to do this, and I am aware that, when and if the stats do appear, they won’t be enough/notwhatyouasked for. The issue is not, as you gracelessly insinuate, secretiveness, bluffing or disinterest, but simply time and my own desire to be thorough. It’s a probabilistic process, and therefore simple statistics from single runs do not convey the distribution adequately. Or, if you prefer, the answer is 42.
petrushka on January 16, 2014 at 4:22 pm said:

I don’t believe this site allows tables, so you will have to host a page somewhere else and link to it.
Allan Miller on January 16, 2014 at 4:27 pm said:

petrushka,

Well, you can generate them, either using the paste-from-word button or using a 3rd party data-to-html converter. They just look crap.
Zachriel on January 16, 2014 at 6:05 pm said:

phoodoo: I proposed a list of at least 7-10 examples of unrealistic findings which could possibly occur in this kind of sampling.

Our simulator is working fairly well at this point, and we can create all sorts of tables of results. For what is phoodoo looking specifically?
JonF on January 16, 2014 at 6:19 pm said:

Allan Miller:
petrushka,

Well, you can generate them, either using the paste-from-word button or using a 3rd party data-to-html converter. They just look crap.

When I pasted clean and simple HTML table code from DreamWeaver’s code window each td appeared on a different line.
cubist on January 16, 2014 at 6:20 pm said:

phoodoo: Blas,
I proposed a list of at least 7-10 examples of unrealistic findings which could possibly occur in this kind of sampling.

Hi, phoodoo! I don’t know if you saw it, but in the “Counting generations of M&Ms” thread, I proposed a model which I thought would not display any of the “unrealistic findings” of which you’ve been complaining. Rather than force you to go searching thru that thread to find it, I’ll just cut-and-paste the thing here:

1: In this model, each M&M has its own unique ID label (which, being unique, cannot ever be associated with more than one M&M in the same run); its own color (which it may or may not share with however-many other M&Ms); and a list of its ancestors.
2: Each time the model is run, the guy who’s doing it sets up the initial conditions:
    2.i: Initial population. This is an integer. If phoodoo has any insights to give regarding the size of the initial population, we can take those insights into account when setting up the model; otherwise, let’s say the initial population can be anything from (picking arbitrary numbers out of my hat) 1 to 10,000.
    2.ii: Number of colors in the initial population. Another integer, whose starting value can be anywhere from 2 to 10, except if phoodoo has anything to say about the number of colors, in which case we again take phoodoo’s insights into account when setting up the model.
    2.iii: The distribution of colors. Are all colors evenly distributed, are there 4 times as many red M&Ms as there are green M&Ms, whatever. If you want to have 1 M&M of one color, and every other M&M be a second color shared in common, go for it.
    2.iv: Birth rate. This is a percentage, from 0.0% up to 100%. The way it works in this model is, we completely ignore the concept of ‘generations’. On each ‘tick’ of the ‘clock’, the model ‘rolls a die’ separately for each individual M&M. With a birth rate of 10%, M&M #1 has a 10% chance of reproducing; M&M #2 has a 10% chance of reproducing; M&M #3 has a 10% chance of reproducing; and so on, for each member of the M&M ‘breeding population’.
    2.v: Death rate. This, too, is a percentage, from 0.0% up to 100%. On each ‘tick’ of the ‘clock’, the model ‘rolls a die’ separately for each individual M&M. With a death rate of 10%, M&M #1 has a 10% chance of dying; M&M #2 has a 10% chance of dying; M&M #3 has a 10% chance of dying; and so on, for each member of the M&M ‘breeding population’.
    2.vi: Number of cycles you want the model to, er, cycle through before it stops running. This is an integer. It’s also an upper limit, not an absolute number; if the entire ‘breeding population’ becomes the same color at some point before this number, that’s when the model stops running.
3: After the initial conditions are defined, the model creates the defined number of M&Ms; assigns each M&M a unique ID label; assigns each M&M a color at random, in accordance with the defined number of colors and the defined distribution; and assigns each M&M a blank ‘list’ of ancestors.
4: Once the initial set of M&Ms has been created/defined, the ‘clock’ starts ‘ticking’. The following things occur on each ‘tick’ of the ‘clock’:
    4.i: The model stores the current list of M&Ms, including all their various ID labels, all their various colors, and all their various ancestries.
    4.ii: The model rolls its ‘birth-rate dice’ for each M&M in the current list. For each M&M whose die-roll said yep, this one reproduced, the model creates a new M&M. This new M&M has a unique ID label. Whatever color the ‘parent’ M&M was, the ‘child’ M&M has that color. Whatever ancestor-list the ‘parent’ M&M had, add the ‘parent’ M&M’s ID label to that list, and the ‘child’ M&M has that extended list. Each newly-created ‘child’ M&M is added to a list of This Year’s Kids.
    4.iii: The model rolls its ‘death-rate dice’ for each M&M in the current list. For each M&M whose die-roll said yep, this one’s dead, the model removes that M&M from the current list.
    4.iv: The list of This Year’s Kids is appended to the current list of M&Ms.
5: After all the births and deaths of the entire ‘breeding population’ of M&Ms have been accounted for, the model checks to see how many different colors exist in said ‘breeding population’; if there’s only one surviving color, the model stops running.
6: If the model has reached the cycle-number that was defined by the user, the model stops running.
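
The steps above can be sketched in Python roughly as follows. Treat it as one possible reading of the description rather than a finished program: the names `Sweet` and `run_model` are invented here, the starting colours are passed in as an explicit list instead of being drawn from a distribution, and the per-tick snapshot of step 4.i is left out to keep it short.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Sweet:
    ident: int          # unique ID label, never reused within a run
    colour: str
    ancestors: tuple    # IDs of ancestors, oldest first


def run_model(initial_colours, birth_rate, death_rate, max_ticks, rng):
    """Tick the clock until one colour (or nothing) is left, or max_ticks is hit.

    Returns the final population and the number of ticks that were run.
    """
    ids = itertools.count(1)
    population = [Sweet(next(ids), c, ()) for c in initial_colours]
    for tick in range(1, max_ticks + 1):
        # 4.ii: roll the birth die for every current member.
        kids = [Sweet(next(ids), p.colour, p.ancestors + (p.ident,))
                for p in population if rng.random() < birth_rate]
        # 4.iii: roll the death die for every current member.
        population = [s for s in population if rng.random() >= death_rate]
        # 4.iv: this year's kids join the population.
        population.extend(kids)
        # 5: stop once at most one colour survives (this covers extinction too).
        if len({s.colour for s in population}) <= 1:
            return population, tick
    return population, max_ticks


if __name__ == "__main__":
    start = ["red"] * 20 + ["green"] * 20
    survivors, ticks = run_model(start, 0.1, 0.1, 2000, random.Random(1))
    print(ticks, "ticks;", len(survivors), "M&Ms left;",
          {s.colour for s in survivors})
```

One design consequence worth knowing: because births and deaths are rolled independently, the population size is not fixed. With equal rates it wanders, and with unequal rates it will tend to grow or die out.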

How does this model satisfy your concerns, phoodoo?
JonF on January 16, 2014 at 6:25 pm said:

Paste from Word:

1A 1B
2A 2B

Paste code from DW:

1A 1B
2A 2B

Interesting, although both of those should have cell borders
Allan Miller on January 16, 2014 at 6:59 pm said:

JonF,

I was actually doing it in an OP, so behaviour may be different (we seem to have 3 different environments with subtle differences, one for OPs, one for comments and one for editing comments!). Managed to get a table of sorts, but it was nowhere near as nice as the Word source, and I ran out of patience fiddling. I’m surprised Word (or Excel) doesn’t have a ‘generate-HTML’ option. But I’m sure there are plenty of ways and means for experienced blogsmiths. I await phoodoo’s effusive thanks for the trouble I’ve gone to! 😉
cubist on January 16, 2014 at 7:18 pm said:

Generating HTML from MSWord and/or MSExcel: This should be doable (look for “Save as web page…” in the File menu, or try “Save as…” and select “Web page (HTML)” from the Format popup menu), but I wouldn’t recommend it, on two grounds. One, MS-generated HTML tends to be bloated with excessive quantities of “class=yada-yada” tags and the like, so the file-size ends up a lot bigger than it needs to be. Two, it’s not clear that TSZ’s software knows what to do with table-related tags in HTML (we shall see what happens with that newfangled TablePress thingie).

I’m not sure what experiments you’ve tried, Alan, but if all else fails, there’s always monospaced fonts. Looks like the TSZ environment uses the code tag to indicate hey, use a monospaced font here! Like so:

+––––+–––––––+
|  1 |     1 |
+––––+–––––––+
|  2 |     8 |
+––––+–––––––+
|  4 |    64 |
+––––+–––––––+
|  8 |   512 |
+––––+–––––––+
| 16 | 4,096 |
+––––+–––––––+

Oddly, I had to use non-breaking spaces (on a Mac, you get them by typing command-space; not sure about Win and/or Linux) and en-dashes (on a Mac, type option-dash; dunno how to get them on other OSes) to get everything lined up nice and neat.
Alan Fox on January 16, 2014 at 7:32 pm said:

Allan Miller,

I’ve just added TablePress as a plugin. I don’t know if it is any use. Anyone with author status can use it in the OP editor.

ETA anyone who wants to use the OP Editor and hasn’t got author status, just ask and I’ll change the status.
OMagain on January 16, 2014 at 7:45 pm said:

It might be easier to take a screenshot and link to that or put it in an OP than format tables in the comment box!
Alan Fox on January 16, 2014 at 7:48 pm said:

cubist: (we shall see what happens with that newfangled TablePress thingie).

It seems to work in OPs and suggests it might work in a comment, too. I’ll see.

[table id=2 /]

ETA Looks as if it will only work in an OP
Alan Fox on January 16, 2014 at 8:29 pm said:

OMagain,

TablePress will import files in CSV, HTML, JSON, XLS and XLSX formats.
JonF on January 16, 2014 at 8:55 pm said:

Allan Miller:
JonF,

I was actually doing it in an OP, so behaviour may be different (we seem to have 3 different environments with subtle differences, one for OPs, one for comments and one for editing comments!). Managed to get a table of sorts, but it was nowhere near as nice as the Word source, and I ran out of patience fiddling. I’m surprised Word (or Excel) doesn’t have a ‘generate-HTML’ option. But I’m sure there are plenty of ways and means for experienced blogsmiths. I await phoodoo’s effusive thanks for the trouble I’ve gone to!

Both can save as (horrible) HTML. That’s why many HTML editors include a “clean Word HTML” function.
Joe Felsenstein on January 16, 2014 at 9:14 pm said:

phoodoo: Blas,

I proposed a list of at least 7-10 examples of unrealistic findings which could possibly occur in this kind of sampling. Not a single poster here has shown how many times these aberrations specifically occurred in say one round of complete fixation starting from zero and ending up with 100, or one round of 1000 (or how many times do they occur in just a pop. of 10), without any alteration of the data whatsoever, and yet they summarize it by coming back here and saying, we averaged it, see, our program shows it is realistic.

So if you think they are going to acknowledge any deficiencies in any of their claims, I would say don’t hold your breath. I have given up on believing they have any desire to at all debate authentically.

Blas’s question was not about the M&M simulation. It was about my population genetics simulation program PopG (which, by the way, simulates a diploid Wright-Fisher discrete-generations model).

Do you have some arguments about the validity of results from PopG?
Allan Miller on January 17, 2014 at 10:19 am said:

cubist,

Just a couple of observations (there is nothing to stop anyone building any model they wish).

1: In this model, each M&M has its own unique ID label (which, being unique, cannot ever be associated with more than one M&M in the same run); its own color (which it may or may not share with however-many other M&Ms); and a list of its ancestors.

The unique labelling is enough to satisfy the basic criterion – ‘can a single ‘black’ M&M fix’? If a unique label fixes, then its ‘colour’ (or whatever else represents phenotype) must do also. Creating sets with the same colour should be less satisfactory for phoodoo – the extreme is a single representative taking over entirely.

2: Each time the model is run, the guy who’s doing it sets up the initial conditions:
2.i: Initial population. This is an integer. If phoodoo has any insights to give regarding the size of the initial population, we can take those insights into account when setting up the model; otherwise, let’s say the initial population can be anything from (picking arbitrary numbers out of my hat) 1 to 10,000.

In some languages (eg VBA – thumbs nose at Patrick!) – you can make your array open-ended (within language limits), and dimension it at runtime based on parameter.

2.ii: Number of colors in the initial population. Another integer, whose starting value can be anywhere from 2 to 10, except if phoodoo has anything to say about the number of colors, in which case we again take phoodoo’s insights into account when setting up the model.

As I say, I’d make the number of ‘colours’ N – the population size – and actually use unique label as a proxy for colour. The population will pass through a stage (many stages) where it has created ‘pools’ of particular labels all by itself, by increasing the representation of some original members at the expense of others. The process essentially runs ‘in the dark’. We (and any potential selective agent) only see the colours – and labels – when we turn the lights on.

2.iii: The distribution of colors. Are all colors evenly distributed, are there 4 times as many red M&Ms as there are green M&Ms, whatever. If you want to have 1 M&M of one color, and every other M&M be a second color shared in common, go for it.

Phoodoo should be satisfied with nothing less than every colour unique, and nominating a single one as the mutant, if the impossibility of fixing a new mutation is a concern. But either way you can separate the colour table from the model – it does not need to keep track. Before you start, you can declare that 1-255 is green, 256-891 is red, etc. When you turn the lights on, and 213 is fixed, it’s green.

2.iv: Birth rate. This is a percentage, from 0.0% up to 100%. The way it works in this model is, we completely ignore the concept of ‘generations’. On each ‘tick’ of the ‘clock’, the model ‘rolls a die’ separately for each individual M&M. With a birth rate of 10%, M&M #1 has a 10% chance of reproducing; M&M #2 has a 10% chance of reproducing; M&M #3 has a 10% chance of reproducing; and so on, for each member of the M&M ‘breeding population’.

I feel that would be process-heavy to no effect. Distributing a birth rate evenly amongst the population resolves to generating the same 1/N chance per member as if you’d just thrown one N-sided die once to get the next breeder, but in a more fiddly manner. Anything else (if I’ve misinterpreted the method) is a kind of bias – individuals examined first have a different chance than those examined later***, because serial positional examination isn’t … ummm … random. It’s either 1/N or it’s biased.

*** meaning that if you always start from the same end, those offspring have entered the population and affect its size as seen by the later ones.
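
The 1/N point can be checked numerically. What follows is only a rough Python sketch (the function name, seed and parameter values are made up for illustration): roll a separate die for every individual on each tick, keep only the ticks on which exactly one individual reproduced, and tally which one it was. If the per-individual scheme is unbiased, the tallies should come out roughly equal, which is what a single N-sided die would give directly. It does not model the positional bias described in the footnote, because nobody enters the population during a tick.

```python
import random


def single_birth_tally(n, birth_rate, ticks, rng):
    """Roll a separate die for each of n individuals on every tick.

    Only ticks on which exactly one individual reproduced are kept;
    the result counts how often each individual was that lone breeder.
    """
    tally = [0] * n
    for _ in range(ticks):
        breeders = [i for i in range(n) if rng.random() < birth_rate]
        if len(breeders) == 1:
            tally[breeders[0]] += 1
    return tally


if __name__ == "__main__":
    # Illustrative values only: 5 individuals, 10% birth rate.
    print(single_birth_tally(5, 0.10, 20000, random.Random(7)))
```

With a 10% birth rate and 5 individuals, roughly a third of ticks produce exactly one birth, so the extra dice mostly buy ticks that have to be thrown away or handled separately.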

2.v: Death rate. This, too, is a percentage, from 0.0% up to 100%. On each ‘tick’ of the ‘clock’, the model ‘rolls a die’ separately for each individual M&M. With a death rate of 10%, M&M #1 has a 10% chance of dying; M&M #2 has a 10% chance of dying; M&M #3 has a 10% chance of dying; and so on, for each member of the M&M ‘breeding population’.

Ditto

2.vi: Number of cycles you want the model to, er, cycle through before it stops running. This is an integer. It’s also an upper limit, not an absolute number; if the entire ‘breeding population’ becomes the same color at some point before this number, that’s when the model stops running.

An evolutionist programmer with the courage of their convictions would not have a ‘halting’ parameter other than fixation of an original singleton! The argument is that fixation (of ancestry, if not of the unmutated version of that ancestor’s allele) is inevitable. Phoodoo seemed to be arguing, among other things, that certain runs should loop indefinitely.
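
For anyone who wants to try the label-only version, here is a minimal sketch in Python rather than VBA. The language, the function names `run_to_fixation` and `colour_of`, and the particular colour table are all assumptions of this example, not anything from cubist's proposal. It uses one death and one birth per step, as in the original M&M scheme, tracks only unique labels, has no cycle limit, and looks up the colour only once the lights are turned on.

```python
import random


def run_to_fixation(n, rng):
    """Neutral drift with one death and one birth per step.

    Individuals start with unique labels 1..n. Returns the label that
    takes over the whole bag and the number of steps that took.
    """
    bag = list(range(1, n + 1))
    counts = [0] + [1] * n      # counts[label] = copies currently in the bag
    distinct = n                # number of labels still present
    steps = 0
    while distinct > 1:         # no cycle limit: stop only at fixation
        dead = rng.randrange(n)           # slot of the M&M that gets eaten
        parent = rng.randrange(n - 1)     # slot drawn from the other n-1
        if parent >= dead:
            parent += 1
        counts[bag[dead]] -= 1
        if counts[bag[dead]] == 0:
            distinct -= 1
        bag[dead] = bag[parent]           # the copy fills the empty slot
        counts[bag[dead]] += 1
        steps += 1
    return bag[0], steps


def colour_of(label, table):
    """Map a label to a colour after the run, using (upper_bound, colour) pairs."""
    for upper, colour in table:
        if label <= upper:
            return colour
    raise ValueError("label outside colour table")


if __name__ == "__main__":
    # Hypothetical colour table: labels 1-25 are green, 26-100 are red.
    table = [(25, "green"), (100, "red")]
    winner, steps = run_to_fixation(100, random.Random(2014))
    print(winner, colour_of(winner, table), steps)
```

To nominate a single mutant, give one label its own entry in the table; the model itself never needs to know.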
Patrick on January 17, 2014 at 3:59 pm said:

In some languages (eg VBA – thumbs nose at Patrick!) – you can make your array open-ended (within language limits), and dimension it at runtime based on parameter.

In some languages *cough*Common Lisp*cough* you can make your array open-ended (within the resources available on your machine) and increase its dimensions on the fly.

😉
Allan Miller on January 17, 2014 at 4:27 pm said:

Patrick,

In that, then, they are on a par! You can issue ReDim when you like, as often as you like. It’s about 10% slower than a fixed-dimension array, though, per operation upon it.

(I’m not, of course, holding vba up as the ideal tool – its random number and clock facilities make it particularly crap for this kind of analysis. I found some strange cyclic behaviour in the pseudorandom function on large iterations (even below the hex ‘ffffff’ at which it repeats the previous cycle precisely), which I shook it out of by some additional reseeding shenanigans. But I don’t have another language at home, and it would have the virtue (if phoodoo didn’t keep just staring at the ball when passed it) of being transferable to other home users.)
DNA_Jock on January 17, 2014 at 5:10 pm said:

Allan Miller:
(I’m not, of course, holding vba up as the ideal tool – its random number and clock facilities make it particularly crap for this kind of analysis. I found some strange cyclic behaviour in the pseudorandom function on large iterations (even below the hex ‘ffffff’ at which it repeats the previous cycle precisely), which I shook it out of by some additional reseeding shenanigans. But I don’t have another language at home, and it would have the virtue (if phoodoo didn’t keep just staring at the ball when passed) of being transferable to other home users.)

I had a very similar experience. When I had two runs-to-fixation come out identical, I immediately concluded design.
No, that’s not right. Rather than conclude that a noodley appendage had inserted itself into my stochastic process, I reckoned “Wow! That sure is one crappy PRNG”, and resorted to re-seeding “shenanigans” too.
I love the mental image I have of phoodoo staring at the ball – it’s like a bowling ball, and he is just sitting, staring intently, waiting for it to do something. Thank you for that. 🙂
Allan Miller on January 17, 2014 at 5:43 pm said:

DNA_Jock,

Yep, my issue was highlighted by a curious over-representation of fixation for members N/4 apart, with a ‘supercycle’ at N/2 – but only for values of N that were divisible by 4, and sufficient trials to reveal the cyclic pattern. The irony being that you can use departure from mathematical expectation to check the program you’ve been running to prove to someone that the maths is solid! So the next accusation would be “circular!”.
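
The kind of check described here can be sketched as follows, assuming Python and its built-in generator (not VBA's, so this shows the method of checking rather than reproducing the cyclic defect). With every label unique and no selection, each label should fix in about 1/N of the runs; a chi-square statistic far above the N-1 degrees of freedom would point at the program or the PRNG. The function names and the run sizes are illustrative choices only.

```python
import random


def fixed_label(n, rng):
    """Run one neutral death/birth process to fixation; return the winning label."""
    bag = list(range(n))
    while any(x != bag[0] for x in bag):
        dead = rng.randrange(n)           # slot that dies
        parent = rng.randrange(n - 1)     # slot drawn from the other n-1
        if parent >= dead:
            parent += 1
        bag[dead] = bag[parent]
    return bag[0]


def fixation_chi_square(n, trials, rng):
    """Tally which label fixes over many runs and compare with trials/n each."""
    tally = [0] * n
    for _ in range(trials):
        tally[fixed_label(n, rng)] += 1
    expected = trials / n
    chi2 = sum((obs - expected) ** 2 / expected for obs in tally)
    return tally, chi2


if __name__ == "__main__":
    tally, chi2 = fixation_chi_square(8, 2000, random.Random(11))
    print(tally)
    print("chi-square:", round(chi2, 2), "on", len(tally) - 1, "degrees of freedom")
```

A pattern such as over-representation of labels N/4 apart would need a more specific test than this overall statistic, for example tallying the gap between winners in successive runs.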
Walter Kloover on January 17, 2014 at 8:19 pm said:

I think that Phoodoo understands how the simulation works. Xe thinks the simulation would be closer to reality if, instead of one candy being chosen to reproduce in each trial, all the candies left in the bag reproduced, with one of them reproducing twice. Then each trial would result in a new generation in each lineage (except for the one that was eliminated). And that would make the definition of generation match the common usage of the word.

The original simulation allows for huge variations in the current generation numbers among the lineages. That is somewhat different from reality. Phoodoo thinks that is so different from reality that it makes the simulation inappropriate.

Xis version of the simulation keeps all the generation numbers the same for each lineage (each trial replaces all 10 candies). That is also different from reality.

I’m no expert, but I think the results of his version would be essentially the same as the results from the version described in the OP, so that it doesn’t really matter which version of the simulation you use (for purposes of demonstrating the inevitability of fixation due to drift).

On a slightly different aspect of the topic, even if a particular neutral mutation does not fix, it may remain in the population for some time, and therefore be available as a potentiating mutation for some later mutation, as in the citrate eating bacteria, right?
Allan Miller on January 18, 2014 at 10:20 am said:

Walter Kloover,

The original simulation allows for huge variations in the current generation numbers among the lineages. That is somewhat different from reality. Phoodoo thinks that is so different from reality that it makes the simulation inappropriate.

I think more accurately it has the intuitive potential for huge variations. I ran 200 replicates of N=10000, counting actual generations, and recording mean, min and max for each run. The maximum variation between lineages was 7% of the mean, the minimum 0.5%. *** Sorting these spreads in ascending order gave a plot with a somewhat sigmoidal shape – the lowest 10 and the highest 30 had a steeper between-run gradient than the bulk (left-hand graph in the link below, plotting spread for each run). Having thus sorted, the narrowest spreads were associated with the longest runs (right-hand graph, mean generation count to fixation for each of those runs). One can debate cause-and-effect – is fixation quicker because the spreads are greater, or are the spreads greater because the runs are shorter? I’d say the latter.

***(Of course the spread of a run is too simple a statistic in itself. There will be a distribution about the mean; the spread is determined entirely by outliers).

[Plots]
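
A rough Python sketch of this kind of generation counting, for anyone wanting to repeat it. It is not the program used for the figures above; the function name is made up, and it assumes that an offspring's generation number is its parent's plus one and that the mean, min and max are taken over the population at the moment of fixation. Pure Python will be slow at N=10000, since a run takes on the order of N squared steps, so try a smaller N first.

```python
import random


def generations_at_fixation(n, rng):
    """Neutral death/birth process from n unique labels, run to fixation.

    Each individual carries a generation number (founders are 0, an
    offspring is its parent's number plus one). Returns the mean, min
    and max generation number in the population when one label has fixed.
    """
    labels = list(range(n))
    gens = [0] * n
    counts = [1] * n            # copies of each label still present
    distinct = n
    while distinct > 1:
        dead = rng.randrange(n)           # slot that dies
        parent = rng.randrange(n - 1)     # slot drawn from the other n-1
        if parent >= dead:
            parent += 1
        old = labels[dead]
        counts[old] -= 1
        if counts[old] == 0:
            distinct -= 1
        labels[dead] = labels[parent]
        counts[labels[dead]] += 1
        gens[dead] = gens[parent] + 1     # offspring is one generation on
    return sum(gens) / n, min(gens), max(gens)


if __name__ == "__main__":
    mean, lo, hi = generations_at_fixation(200, random.Random(18))
    # Spread expressed as (max - min) / mean; that definition is an assumption.
    print(mean, lo, hi, (hi - lo) / mean)
```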

I’m no expert, but I think the results of his version would be essentially the same as the results from the version described in the OP, so that it doesn’t really matter which version of the simulation you use (for purposes of demonstrating the inevitability of fixation due to drift).

I think that’s the case. What is described is the baseline process – you vary N as minimally as possible, 1 in and 1 out, and you apply a completely impartial method to locate both the next death and the next birth. This is the slowest it can run. If there is preference for some alleles over others, fixation occurs more quickly, whichever way you shift that balance. If you allow N to vary more widely, the rate is more conditioned by the minima. If you restrict breeding to a fraction of an individual’s residence time, again you reduce the effective population size (Ne).

On a slightly different aspect of the topic, even if a particular neutral mutation does not fix, it may remain in the population for some time, and therefore be available as a potentiating mutation for some later mutation, as in the citrate eating bacteria, right?

That’s the idea, yes. Although drift and selection act against variation, mutation and simultaneous processes of fixation leave a standing pool of variation in any population. While it is around, it can lead to further change, and/or become adaptively favoured in itself.
Joe Felsenstein on January 18, 2014 at 1:37 pm said:

I agree that there is no real difference between having one M&M replaced at a time, and a version that has that happen but also declares all the other M&Ms to have replaced themselves. It’s still the same Moran model. The transition probabilities are the same either way, so the behavior of the process is the same.
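
To spell that out, here are the transition probabilities under the scheme in the OP, assuming the eaten M&M is drawn uniformly from all N and the duplicated one uniformly from the remaining N-1 (the symbols i and N are just notation for this note). Follow one colour that currently has i copies:

```latex
% Assumes the OP's scheme: death drawn uniformly from all N,
% parent drawn uniformly from the remaining N-1.
% i = current number of M&Ms of the colour being followed.
\begin{align*}
P(i \to i+1) &= \frac{N-i}{N}\cdot\frac{i}{N-1}, \\
P(i \to i-1) &= \frac{i}{N}\cdot\frac{N-i}{N-1}, \\
P(i \to i)   &= 1 - \frac{2\,i\,(N-i)}{N(N-1)}.
\end{align*}
```

Declaring that the untouched M&Ms have "replaced themselves" changes none of the counts, so these probabilities are unaffected. Since the up and down steps are equally likely, the expected count stays at i, and a colour with i copies eventually fixes with probability i/N.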
petrushka on January 18, 2014 at 4:31 pm said:

What would happen if you ran this somewhat like WEASEL, but without selection? On each virtual clock tick, every member of the population undergoes some probability of mutation. There would be no ambiguity about generations, because every member would be replaced (most by unmutated versions).
Joe Felsenstein on January 18, 2014 at 5:46 pm said:

petrushka:
What would happen if you ran this somewhat like WEASEL, but without selection? On each virtual clock tick, every member of the population undergoes some probability of mutation. There would be no ambiguity about generations, because every member would be replaced (most by unmutated versions).

For this sort of Moran model, that would be one standard way of putting mutation into the model. You could alternatively mutate only the newborn M&M (of course with a correspondingly higher mutation rate). These two methods would not be exactly equivalent to each other, but they would be close for large N.
Allan Miller on January 18, 2014 at 5:55 pm said:

petrushka,

I don’t think a ‘neutral WEASEL’ would result in fixation, except trivially every generation. It is effectively a genome with 28 loci. Each ‘generation’ is a set of offspring – with a rather high mutation rate! – from which only one survives. N oscillates from 1 to however-many-offspring to 1 …

It’s an illustration of the power of hill-climbing, rather than population sampling.
Allan Miller on January 18, 2014 at 6:12 pm said:

I don’t think ‘generation’ is particularly ambiguous anyway, unless one thinks that every zygote-zygote interval should be of the same length. It’s a random variable, and hence a population of lineages will experience different counts in the same unit time.