Evo-Info 3: Evolution is not search

Introduction to Evolutionary Informatics, by Robert J. Marks II, the “Charles Darwin of Intelligent Design”; William A. Dembski, the “Isaac Newton of Information Theory”; and Winston Ewert, the “Charles Ingram of Active Information.” World Scientific, 332 pages.
Classification: Engineering mathematics. Engineering analysis. (TA347)
Subjects: Evolutionary computation. Information technology–Mathematics.

Marks, Dembski, and Ewert open Chapter 3 by stating the central fallacy of evolutionary informatics: “Evolution is often modeled by as [sic] a search process.” The long and the short of it is that they do not understand the models, and consequently mistake what a modeler does for what an engineer might do when searching for a solution to a given problem. What I hope to convey in this post, primarily by means of graphics, is that fine-tuning a model of evolution, and thereby obtaining an evolutionary process in which a maximally fit individual emerges rapidly, is nothing like informing evolution to search for the best solution to a problem. We consider, specifically, a simulation model presented by Christian apologist David Glass in a paper challenging evolutionary gradualism à la Dawkins. The behavior on exhibit below is qualitatively similar to that of various biological models of evolution.

Animation 1. Parental populations in the first 2000 generations of a run of the Glass model, with parameters (mutation rate .005, population size 500) tuned to speed the first occurrence of maximum fitness (1857 generations, on average), are shown in orange. Offspring are generated in pairs by recombination and mutation of heritable traits of randomly mated parents. The fitness of an individual in the parental population is, loosely, the number of pairs of offspring it is expected to leave. In each generation, the parental population is replaced by surviving offspring. Which of the offspring die is arbitrary. When the model is modified to begin with a maximally fit population, the long-term regime of the resulting process (blue) is the same as for the original process. Rather than seek out maximum fitness, the two evolutionary processes settle into statistical equilibrium.

Figure 1. The two bar charts, orange (Glass model) and blue (modified Glass model), are the mean frequencies of fitnesses in the parental populations of the 998,000 generations following the 2,000 shown in Animation 1. The mean frequency distributions approximate the equilibrium distribution to which the evolutionary processes converge. In both cases, the mean and standard deviation of the fitnesses are 39.5 and 2.84, respectively, and the average frequency of fitness 50 is 0.0034. Maximum fitness occurs in only 1 of 295 generations, on average.

I should explain immediately that an individual organism is characterized by 50 heritable traits. For each trait, there are several variants. Some variants contribute 1 to the average number offspring pairs left by individuals possessing them, and other variants contribute 0. The expected number of offspring pairs, or fitness, for an individual in the parental population is roughly the sum of the 0-1 contributions of its 50 traits. That is, fitness ranges from 0 to 50. It is irrelevant to the model what the traits and their variants actually are. In other words, there is no target type of organism specified independently of the evolutionary process. Note the circularity in saying that evolution searches for heritable traits that contribute to the propensity to leave offspring, whatever those traits might be.

The two evolutionary processes displayed above are identical, apart from their initial populations, and are statistically equivalent over the long term. Thus a general account of what occurs in one of them must apply to both of them. Surely you are not going to tell me that a search for the “target” of maximum fitness, when placed smack dab on the target, rushes away from the target, and subsequently finds it once in a blue moon. Hopefully you will allow that the occurrence of maximum fitness in an evolutionary process is an event of interest to us, not an event that evolution seeks to produce. Again, fitness is not the purpose of evolution, but instead the propensity of a type of organism to leave offspring. So why is it that, when the population is initially full of maximally fit individuals, the population does not stay that way indefinitely? In each generation, the parental population is replaced with surviving offspring, some of which are different in type (heritable traits) from their parents. The variety in offspring is due to recombination and mutation of parental traits. Even as the failure of parents to leave perfect copies of themselves contributes to the decrease of fitness in the blue process, it contributes also to the increase of fitness in the orange process.

Both of the evolutionary processes in Animation 1 settle into statistical equilibrium. That is, the effects of factors like differential reproduction and mutation on the frequencies of fitnesses in the population gradually come into balance. As the number of generations goes to infinity, the average frequencies of fitnesses cease to change (see “Wright, Fisher, and the Weasel,” by Joe Felsenstein). More precisely, the evolutionary processes converge to an equilibrium distribution, shown in Figure 1. This does not mean that the processes enter a state in which the frequencies of fitnesses in the population stay the same from one generation to the next. The equilibrium distribution is the underlying change­less­ness in a ceaselessly changing population. It is what your eyes would make of the flicker if I were to increase the frame rate of the animation, and show you a million generations in a minute.

Animation 2. As the mutation rate increases, the equilibrium distribution shifts from right to left, which is to say that the long-term mean fitness of the parental population decreases. The variance of the fitnesses (spread of the equilibrium distribution) increases until the mean reaches an intermediate value, and then decreases. Note that the fine-tuned mutation rate .005 ≈ 10–2.3 in Figure 1.

Let’s forget about the blue process now, and consider how the orange (randomly initialized) process settles into statistical equilibrium, moving from left to right in Animation 1. The mutation rate determines

  1. the location and the spread of the equilibrium distribution, and also
  2. the speed of convergence to the equilibrium distribution.

Animation 2 makes the first point clear. In visual terms, an effect of increasing the mutation rate is to move equilibrium distribution from right to left, placing it closer to the distribution of the initial population. The second point is intuitive: the closer the equilibrium distribution is to the frequency distribution of the initial population, the faster the evolutionary process “gets there.” Not only does the evolutionary process have “less far to go” to reach equilibrium, when the mutation rate is higher, but the frequency distribution of fitnesses changes faster. Animation 3 allows you to see the differences in rate of convergence to the equilibrium distribution for evolutionary processes with different mutation rates.

Animation 3. Shown are runs of the Glass model with mutation rate we have focused upon, .005, doubled and halved. That is,  = 2 ⨉ .005 = .01 for the blue process, and  = 1/2 ⨉ .005 = .0025 for the orange process.

An increase in mutation rate speeds convergence to the equilibrium distribution, and reduces the mean frequency of maximum fitness.

I have selected a mutation rate that strikes an optimal balance between the time it takes for the evolutionary process to settle into equilibrium, and the time it takes for maximum fitness to occur when the process is at (or near) equilibrium. With the mutation rate set to .005, the average wait for the first occurrence of maximum fitness, in 1001 runs of the Glass model, is 1857 generations. Over the long term, maximum fitness occurs in about 1 of 295 generations. Although it’s not entirely accurate, it’s not too terribly wrong to think in terms of waiting an average of 1562 generations for the evolutionary process to reach equilibrium, and then waiting an average of 295 generations for a maximally fit individual to emerge. Increasing the mutation rate will decrease the first wait, but the decrease will be more than offset by an increase in the second wait.

Figure 2. Regarding Glass’s algorithm (“Parameter Dependence in Cumulative Selection,” Section 3) as a problem solver, the optimal mutation rate is inversely related to the squared string length (compare to his Figure 3). We focus on the case of string length (number of heritable traits) L = 50, population size N = 500, and mutation rate  = .005, with scaled mutation rate uʹ L2 = 12.5 ≈ 23.64. The actual rate of mutation, commonly denoted u, is 26/27 times the rate reported by Glass. Note that each point on a curve corresponds to an evolutionary process. Setting the parameters does not inform the evolutionary search, as Marks et al. would have you believe, but instead defines an evolutionary process.

Figure 2 provides another perspective on the point at which changes in the two waiting times balance. In each curve, going from left to right, the mutation rate is increasing, the mean fitness at equilibrium is decreasing, and the speed of convergence to the equilibrium distribution is increasing. The middle curve (L = 50) in the middle pane (N = 500) corresponds to Animation 2. As we slide down the curve from the left, the equilibrium distribution in the animation moves to the left. The knee of the curve is the point where the increase in speed of convergence no longer offsets the increase in expected wait for maximum fitness to occur when the process is near equilibrium. The equilibrium distribution at that point is the one shown in Figure 1. Continuing along the curve, we now climb steeply. And it’s easy to see why, looking again at Figure 1. A small shift of the equilibrium distribution to the left, corresponding to a slight increase in mutation rate, greatly reduces the (already low) incidence of maximum fitness. This brings us to an important question, which I’m going to punt into the comments section: why would a biologist care about the expected wait for the first appearance of a type of organism that appears rarely?

You will not make sense of what you’ve seen if you cling to the misconception that evolution searches for the “target” of maximally fit organisms, and that I must have informed the search where to look. What I actually did, by fine-tuning the parameters of the Glass model, was to determine the location and the shape of the equilibrium distribution. For the mutation rate that I selected, the long-term average fitness of the population is only 79 percent of the maximum. So I did not inform the evolutionary process to seek out individuals of maximum fitness. I selected a process that settles far away from the maximum, but not too far away to suit my purpose, which is to observe maximum fitness rapidly. If my objective were to observe maximum fitness often, then I would reduce the mutation rate, and expect to wait longer for the evolutionary process to settle into equilibrium. In any case, my purpose for selecting a process is not the purpose of the process itself. All that the evolutionary process “does” is to settle into statistical equilibrium.

Sanity check of some claims in the book

Unfortunately, the most important thing to know about the Glass model is something that cannot be expressed in pictures: fitness has nothing to do with an objective specified independently of the evolutionary process. Which variants of traits contribute 1 to fitness, and which contribute 0, is irrelevant. The fact of the matter is that I ignore traits entirely in my implementation of the model, and keep track of 1s and 0s instead. Yet I have replicated Glass’s results. You cannot argue that I’ve informed the computer to search for a solution to a given problem when the solution simply does not exist within my program.

Let’s quickly test some assertions by Marks et al. (emphasis added by me) against the reality of the Glass model.

There have been numerous models proposed for Darwinian evolution. […] We show repeatedly that the proposed models all require inclusion of significant knowledge about the problem being solved. If a goal of a model is specified in advance, that’s not Darwinian evolution: it’s intelligent design. So ironically, these models of evolution purported to demonstrate Darwinian evolution necessitate an intelligent designer.

Chapter 1, “Introduction”


[T]he fundamentals of evolutionary models offered by Darwinists and those used by engineers and computer scientists are the same. There is always a teleological goal imposed by an omnipotent programmer, a fitness associated with the goal, a source of active information …, and stochastic updates.

Chapter 6, “Analysis of Some Biologically Motivated Evolutionary Models”


Evolution is often modeled by as [sic] a search process. Mutation, survival of the fittest and repopulation are the components of evolutionary search. Evolutionary search computer programs used by computer scientists for design are typically teleological — they have a goal in mind. This is a significant departure from the off-heard [sic] claim that Darwinian evolution has no goal in mind.

Chapter 3, “Design Search in Evolution and the Requirement of Intelligence”

My implementation of the Glass model tracks only fitnesses, not associated traits, so there cannot be a goal or problem specified independently of the evolutionary process.

Evolutionary models to date point strongly to the necessity of design. Indeed, all current models of evolution require information from an external designer in order to work. All current evolutionary models simply do not work without tapping into an external information source.

Preface to Introduction to Evolutionary Informatics


The sources of information in the fundamental Darwinian evolutionary model include (1) a large population of agents, (2) beneficial mutation, (3) survival of the fittest and (4) initialization.

Chapter 5, “Conservation of Information in Computer Search”

The enumerated items are attributes of an evolutionary process. Change the attributes, and you do not inform the process to search, but instead define a different process. Fitness is the probabilistic propensity of a type of organism to leave offspring, not search guidance coming from an “external information source.” The components of evolution in the Glass model are differential reproduction of individuals as a consequence of their differences in heritable traits, variety in the heritable traits of offspring resulting from recombination and mutation of parental traits, and a greater number of offspring than available resources permit to survive and reproduce. That, and nothing you will find in Introduction to Evolutionary Informatics, is a fundamental Darwinian account.

1,439 thoughts on “Evo-Info 3: Evolution is not search

  1. DNA_Jock: Mung, the black bag and the white bag represent alternative universes in which eye evolution is more or less unlikely.

    One in universe eye evolution is less likely,. in the other universe eye evolution is more likely. How many universes do we need to sample before we know the probabilities associated with eye evolution?

    Is anyone, at any time, going to put some actual numbers [as opposed to imaginary numbers] to eye evolution?

    And I really like how you managed to assume your conclusion, by the way. I’m thinking you didn’t even notice it.

    How did you decide which bag represents our universe? Why, the one with the greater proportion of eyes is our universe, obviously!

    How do you know that our eye sampling actually gives us a relative frequency? Relative to what? Relative to everything not an eye? I say you get more “not an eye” from your bag. Our universe is analogous to your black bag. x% is relatively small.

    You can of course manufacture a scenario, as you have done, in which the proportion favors the conclusion you’re trying to reach. But that does not tell us that it is easy to evolve an eye.

  2. phoodoo: Because you see DNA, what you have actually done, by taking out 90 objects, is to sample .000000000000000000000 0000000000000000000
    0000000000000000000 0000000000000000000 0000000000000000000000000000000000000000000000000000000000000000 0000000000000000000 00000000000000000000000000000001 (I left out 10*486 zeros to save space) % of the entire contents of the bag.

    And how do you know the number of objects in the bag?

  3. phoodoo: Because you see DNA, what you have actually done, by taking out 90 objects…

    I think he’s assuming his sampling is random. But the bag is full of water and the balls with eyes just happen to be such that they float to the top where his short little arms can just manage reach them.

    The proportion of eyes in our universe is greater … compared to what? The proportion of eyes in some other universe?

    DNA_Jock: What we are interested in, in this analogy, is the proportions of different types of balls in the bag, the relative frequencies. Let’s recognize that the bag is very, very big compared with the size of our sample.

    And how do we know that the proportion of balls is greater in one bag than in the other? We declare it to be so! Take that IDiots!

    ok, so DNA-Jock had a “Rumraket moment” too. What is it about eye evolution that makes evolutionists go all loopy? Ever since Darwin.

  4. newton: And how do you know the number of objects in the bag?

    You don’t. You’re making an inference about relative frequencies based upon your sampling.

  5. Just so we’re clear, the issue isn’t over whether or not we can infer relative frequencies or proportions from drawing samples. That’s not the problem with Rumraket’s argument.

  6. Mung: I think he’s assuming his sampling is random.

    Right, but even if it is totally random, if your sample size is too small, it is meaningless. Change his numbers from taking out 90, to only taking out two. If it is still just a minuscule fraction of the entire contents of the bag, how can you draw any conclusions. If there are 10 hundred, trillion, billion gazillion balls in a bag, and you only pull out 90, you have taken out so few, that its the same as only taken out 1-only worse.

  7. Mung: Just so we’re clear, the issue isn’t over whether or not we can infer relative frequencies or proportions from drawing samples.

    But that’s also part of the problem. Until you know what percentage you have sampled, you know nothing.

    In DNA’s example, we never know what percentage we have sampled. The percentage could be essentially zero.

  8. I’ll borrow from DNA_Jock:

    What we are interested in is the proportions of different types of structures in the bag, the relative frequencies. Let’s recognize that the bag is very, very big compared with the size of our sample. Somewhere in the bag are eyes.

    What is the probability that if we reach in and draw from the bag that we’ll get an eye? It’s pretty high really. Because over here to our left, there are lots of eyes. And we drew those from the same bag.

    What else did you draw from the bag, and how many were there of those?

    Erm, well, that’s not really relevant.

    Then how do you calculate the relative frequency, the proportion?

    Well, see, over here, I have this analogy …

    ETA: Great analogy! Now once more, how did you decide that the probability of drawing an eye from the bag is pretty high?

    Just count the number of eyes!

    ok, but don’t you need to count the times you drew something that was not an eye from the bag?

    No, not really. You see, I have this analogy …

  9. I would like to thank DNA_Jock for helping clarify the exact nature of the problem with Rumraket’s claims about eye evolution. He has absolutely no clue what the relative frequencies are.

    He has no idea whether he’s likely to get another eye on the next draw. The sheer number of eyes alone, that he drew in the past, don’t tell him anything about the probabilities associated with his next draw.

    Once again I’ve been vindicated. My claims justified.

    #EyeMockStupidity

  10. Mung: You don’t. You’re making an inference about relative frequencies based upon your sampling.

    Just wondered where phoo came up with his number. Perhaps it was a hypothetical

  11. newton: Just wondered where phoo came up with his number. Perhaps it was a hypothetical

    What do you mean? There is no number in DNA’s example, that’s part of the problem. He has taken out a few balls, the numbers are meaningless, because we don’t know how many balls are inside. What if he only took out one ball? Maybe there was 10 balls inside, all green but one, and we took the only none green one. How could we ever know? Maybe there were two inside, and we took the red one.

    Its meaningless. And this is from our resident so called math expert. This is how our next generation is supposed to be smarter?

  12. phoodoo: But that’s also part of the problem. Until you know what percentage you have sampled, you know nothing.

    So opinion polls are a waste of time?

    Seems to me, if you poll a representative sample, you do learn something.

    ETA, when sampling balls from a bag, shake the bag first.

  13. Mung:
    Just so we’re clear, the issue isn’t over whether or not we can infer relative frequencies or proportions from drawing samples. That’s not the problem with Rumraket’s argument.

    So what is the problem with my argument? Please correctly state my argument, preferrably by quoting it (as opposed to making up a stupid caricature of it), and then explain what the problem with it is.

    I predict you will fail on both accounts.

  14. Mung,

    What is it about eye evolution that makes evolutionists go all loopy? Ever since Darwin.

    Another interesting question: what is it about evolution that makes Creationists get all giddy about comparatively simple matters such as probability?

  15. phoodoo,

    If there are 10 hundred, trillion, billion gazillion balls in a bag, and you only pull out 90, you have taken out so few, that its the same as only taken out 1-only worse.

    I think you should show your working on this. Your conclusion is wrong. Particularly that ‘only worse’ addendum. Sampling 90 is ‘worse’ than sampling 1? For what, or whom?

  16. A Rumraket moment is, apparently, a moment where you say something correct that Mung & Phoodoo utterly fail to find an actual fault with. Not that this is something to be proud about, they couldn’t find their way out of a phone booth with in broad daylight, on a guided tour, with a map and compass.

    It might really be that I need “to be taken down a notch”, but Mung & Puddidoo just aren’t the people for the job.

  17. Mung,

    He has no idea whether he’s likely to get another eye on the next draw. The sheer number of eyes alone, that he drew in the past, don’t tell him anything about the probabilities associated with his next draw.

    So you seem to be arguing that given a bag containing unknown numbers and frequencies, no amount of prior draws will give us any information about the probability of drawing an X? That does not seem correct to me.

  18. Mung: phoodoo: Because you see DNA, what you have actually done, by taking out 90 objects…

    I think he’s assuming his sampling is random. But the bag is full of water and the balls with eyes just happen to be such that they float to the top where his short little arms can just manage reach them.

    The proportion of eyes in our universe is greater … compared to what? The proportion of eyes in some other universe?

    See, Mung, you can understand this. In my “eyes” example, I specifically noted that we know that the sampling is hinky:

    Imagine a second, white, bag. We draw 100 balls from it using the same hinky sampling procedure and get 75 with eyes on. Interesting. We don’t know what x is, but thanks to the fact that the sampling process is identical, we can say that the proportion of eyes is greater in the white bag.

    I made no argument whatsoever about the absolute probability of evolving an eye.

    How did you decide which bag represents our universe? Why, the one with the greater proportion of eyes is our universe, obviously!

    Heavens, no. It doesn’t matter: the only point, which you are desperately avoiding, is that if you keep drawing eyes from the bag, then your estimate of the proportion in the bag rises, and equally, if you keep drawing not-eyes from the bag, your estimate declines. This is true, even if your sampling procedure is somewhat hinky.
    Of course, phoodoo wants numbers, and to satisfy him (and his rather elementary misconception about probability) we have to switch to a different set-up. Phoodoo thinks that, because we have sampled only a small proportion of the bag, we cannot learn anything about the unsampled things. He wrote “What you have pulled out tells you absolutely zero about the frequencies inside remaining”. That’s wrong. So I used phoodoo’s example of pulling 7 red things and 2 blue things out of a bag, but I increased n ten-fold, to make the situation clearer. If we pull 70 red things and 20 blue things out of a bag, we have learnt something about the remainder of the bag, even if the bag is arbitrarily large:

    If we pulled out 70 red things and 20 blue things, and nothing else, we would have gained knowledge about the probable proportions of things remaining in the bag. The bag is unlikely to be 99% green things. And even if the sampling is heavily biased, it is still wrong to state “What you have pulled out tells you absolutely zero about the frequencies inside remaining”.

    You, Mung, understand that to satisfy phoodoo’s craving for numbers over concepts, we have to make an assumption about our sampling procedure.
    He asks:

    Because you see DNA, what you have actually done, by taking out 90 objects, is to sample .000000000000000000000 0000000000000000000
    0000000000000000000 0000000000000000000 0000000000000000000000000000000000000000000000000000000000000000 0000000000000000000 00000000000000000000000000000001 (I left out 10*486 zeros to save space) % of the entire contents of the bag.
    So how unlikely is it that it is 99% green? Maybe its only “Like, oh yea right!” unlikely? Or is it “Ahh, maybe”?

    IF we assume random sampling, then the likelihood that the bag is 99% green is 10^180-fold less than it was before we did the sampling. I’d say we’ve learnt something about the remainder of the bag, despite the fact that we have sampled less than 1% of the bag.
    Would you like to confirm that for him? He might believe you.

  19. phoodoo: What do you mean? There is no number in DNA’s example, that’s part of the problem. He has taken out a few balls, the numbers are meaningless, because we don’t know how many balls are inside. What if he only took out one ball? Maybe there was 10 balls inside, all green but one, and we took the only none green one. How could we ever know? Maybe there were two inside, and we took the red one.

    I meant was there any particular reason you pulled that particular number out the bag of possibilities .

    But it doesn’t matter how many ball are in the bag if you pick out a blue you know the the initial probability of picking a blue is greater than zero, pick two and you know the the initial probability of picking blue was larger when you only picked one.

    That is what I understood Rum’s point to be.

  20. OK, so I did this Excel thing.

    Cell A1 contains =RAND() That gives a random number to 7 decimal places (let’s say we have a bag with 10,000,000 balls, or propose a sample space of 10,000,000 independent coin tosses drawn from an infinite series, take your pick). The random value in A1 corresponds to the ‘true’ probability of picking a ball from a given bag, or getting HEADS on a particular (weighted) coin. We could actually make it a ‘true’ probability in a ball example because we could make exactly that number say TAILS, and 10,000,000 minus that number say HEADS. We can manifest the probability as physical frequencies, rather than have it residing, with a coin, ‘somewhere in the air’.

    We can represent that in the Excel toy. We generate a trial which samples the bag/coin.

    Cell A6 contains =RAND() too.
    Cell B6 contains =IF(A6>=%A%1,”HEADS”,”TAILS”)
    Cell C6 contains =COUNTIF(%B%6:B6,”TAILS”)
    Cell D6 contains =C6/(ROW(A6)-5)

    (eta – replace % with dollar sign; forgot about bloody Latex).

    We’ve picked another random number. If it’s less than A1, it’s TAILS, otherwise HEADS. Then we keep a running count of the frequencies in our sample to date, and compute the mean. We can turn that into a series of trials as long as we want by just selecting those 4 cells, clicking at bottom right and dragging down. You can get a brand new coin/bag, and a new set of trials, by clicking F9.

    Now it’s pretty easy to see that the value in D always approaches (converges on) that in A1 as the list of trials lengthens. And it should be easy to see why – the value in A1 in this case feeds directly into the determination of HEADS/TAILS, and you’re bound to sample more on one side of it than the other (unless it’s exactly 0.5000000) The more you sample, the more info you get about the bag, without getting near to sampling the whole thing, or even knowing how big it is. Did I cheat in some way?

  21. Alan Fox: So opinion polls are a waste of time?

    Seems to me, if you poll a representative sample, you do learn something.

    ETA, when sampling balls from a bag, shake the bag first.

    Alan, maybe think more about your own statement first. How do you know when it is a representative sample?

    Let’s try an example. I want to know how many people who have ever existed on the planet dream of rabbits. I asked one guy if he ever does. He said no.

    Conclusion: People don’t dream about rabbits.

  22. newton: But it doesn’t matter how many ball are in the bag if you pick out a blue you know the the initial probability of picking a blue is greater than zero,

    No, no, of course not. The initial probability is no longer a probability. Its is a certainty, because you have just picked a blue one. Are you saying before you picked it the probability was 100% that you would pick a blue?

  23. Allan Miller: Particularly that ‘only worse’ addendum. Sampling 90 is ‘worse’ than sampling 1? For what, or whom?

    More comprehension problems Allan. This is starting to be a habit with you.

    Taking out 90 balls out of 10*1000 is indeed worse than taking 1 from a bag of ten.

  24. DNA_Jock: IF we assume random sampling, then the likelihood that the bag is 99% green is 10^180-fold less than it was before we did the sampling. I’d say we’ve learnt something about the remainder of the bag, despite the fact that we have sampled less than 1% of the bag.
    Would you like to confirm that for him? He might believe you.

    Wait, what did you try to say here DNA? I stated that you have only sampled 10^-1000 percent of the total content size (I am not sure if that is the right way to write it) and you replied that now “the likelihood that the bag is 99% green is 10^180-fold less than it was before.”

    Do you want to show your math?

  25. phoodoo,

    More comprehension problems Allan. This is starting to be a habit with you.

    Certainly with your posts, which have an unfortunate habit of leaving out vital information. Such as:

    Taking out 90 balls out of 10*1000 is indeed worse than taking 1 from a bag of ten.

    You had only said, originally, that ’90 was worse than 1′. I saw no 90-from-X vs 1-from-Y there.

    Even with the addition of the population size, I’m not sure what or whom it’s ‘worse’ for. It is a smaller proportion of the total, I grant you, and hence less likely to give the expected value.

  26. phoodoo,

    In fact I’ve just checked prior posts as well, and nowhere do you state that your 1 was being drawn from a bag of 10 rather than the same original popuylation as the 90. Don’t blame other people for your failure to communicate.

  27. phoodoo: He has taken out a few balls, the numbers are meaningless, because we don’t know how many balls are inside. What if he only took out one ball? Maybe there was 10 balls inside,

    I can’t help it is you refuse to read, Allan.

  28. phoodoo,

    I can’t help it is you refuse to read, Allan.

    Refuse to read? Can barely be arsed, is a more accurate summary, the more so since I discover how I was supposed to assemble your meaning from multiple posts.

    It is not at all clear that when you used the example draw ‘1’, in a post that does not even mention the population size 10, one is actually supposed to refer to a prior post, in which a draw of 1 from 10 was made. Be sure to let me know when your references to the number 1 no longer demand this implicit “from 10”, so I can reset.

    Here is all I thought I had to go on:

    If there are 10 hundred, trillion, billion gazillion balls in a bag, and you only pull out 90, you have taken out so few, that its the same as only taken out 1-only worse.

    1 from 10. I see. Taking one hypothetical proportion from one hypothetical bag is ‘worse’ (taken to mean that with the smaller proportion is worse) than taking a different hypothetical proportion from a different bag. Got it.

  29. phoodoo,

    Did you miss this, from the comment of mine that Alan ‘helpfully’ guanoed?

    You are focused on the ratio of the sample size to the population size. But when the population size is much larger than the sample size, the variability in a statistic depends far more on the absolute size of the sample than it does on the ratio of sample size to population size.

    Your intuition is incorrect. You’re focusing on the wrong number.

  30. Alan Fox: So opinion polls are a waste of time?

    If we have data, let’s look at the data. If we have opinions, let’s go with mine.

    – Jim Barksdale

  31. Alan Fox: ETA, when sampling balls from a bag, shake the bag first.

    Weren’t you reading? It’s a really big bag. Universe size. And we’re taking our samples from a really, really small area in that universe.

  32. phoodoo: Taking out 90 balls out of 10*1000 is indeed worse than taking 1 from a bag of ten.

    No, actually. Even this statement is wrong. You really don’t understand basic statistics.

    phoodoo: Wait, what did you try to say here DNA? I stated that you have only sampled 10^-1000 percent of the total content size (I am not sure if that is the right way to write it) and you replied that now “the likelihood that the bag is 99% green is 10^180-fold less than it was before.”

    Do you want to show your math?

    Here goes.
    To keep it simple, let’s stick with the frequentist approach.
    Imagine a big bag – there’s somewhere between 1,000 and 10^1,000 balls in it.
    Imagine that 99% of them are green.
    The frequentist asks “What is the probability of getting a result this weird (or weirder), given my assumptions?”
    We draw a ball from the bag at random.
    What is the probability, in this scenario, that we draw a non-green ball? It’s 1%.
    We draw a second ball from the bag at random. Because the bag is large, compared with our sample, we ignore the difference between sampling-with-replacement and sampling-without-replacement, which you would only confuse you.
    What is the probability, in this scenario, that the second ball is non-green ball? It’s 1%. And the probability that balls 1 and 2 are non-green is 1% x 1% = 0.01%.
    We keep doing this; note that if we tried to take into account the depletion of non-green balls, our probabilities would get even smaller. That’s too complicated — let’s stick with the idea that the bag is really, really big, say 10^10 balls, so there is no appreciable depletion.
    After extracting 90 non-green balls, the frequentist notes that
    “The probability of getting a result this weird, given my assumption that the bag is 99% green, is 1% ^ 90, that is (10^-2)^90, that is 10^-180”
    See? High School math.
    Now, someone who doesn’t know what a p-value is might turn that sentence around and say “The probability that the bag is 99% green is 10^-180”. That’s not true. As a nod to Bayes, I said, “the likelihood that the bag is 99% green is 10^180-fold less than it was before.” which is an approximation, but alludes to a key concept here.
    If you are interested in learning something, I have an easier-to-understand example from the world of diagnostic testing. Let me know.

    Alternatively, if you want to confuse yourself, you could think about how your knowledge of the remaining contents of a bag that started with 1,000 balls in it varies as you remove balls.
    n=1 knowledge: not a lot.
    n=500 knowledge: a lot
    n=999 knowledge: not a lot.
    Weird, huh?

  33. phoodoo,

    It boils down to this. You believe that if we hold the sample size constant, then the probability of getting an unrepresentative sample will skyrocket as the population size increases.

    That’s false.

    If the population size is much larger than the sample size, then the probability of getting an unrepresentative sample barely changes at all even when the size of the population is increased exponentially.

  34. phoodoo: No, no, of course not. The initial probability is no longer a probability.

    Sure it is, you have just put a bound on it, greater than zero. The exact probability is still unknown

    Its is a certainty, because you have just picked a blue one

    What is certain is the ball you pick is the ball you pick, which ball you pick before you pick is a matter of probability

    . Are you saying before you picked it the probability was 100% that you would pick a blue?

    Nope, you are.I am saying the initial probability for a blue pick was at least 1 divided by the members of the bag.

  35. Allan Miller: So you seem to be arguing that given a bag containing unknown numbers and frequencies, no amount of prior draws will give us any information about the probability of drawing an X? That does not seem correct to me.

    I’m thinking that you need to go back and re-read what I wrote.

    If you draw out ten white balls and ten red balls in 20 draws I do believe you do have some information.

  36. DNA_Jock: Would you like to confirm that for him? He might believe you.

    I disagree with phoodoo. But it could be that I don’t understand his argument. Maybe he’s channeling Hume.

  37. Allan:

    So you seem to be arguing that given a bag containing unknown numbers and frequencies, no amount of prior draws will give us any information about the probability of drawing an X? That does not seem correct to me.

    Mung:

    I’m thinking that you need to go back and re-read what I wrote.

    Here’s what you wrote, Mung:

    He has no idea whether he’s likely to get another eye on the next draw. The sheer number of eyes alone, that he drew in the past, don’t tell him anything about the probabilities associated with his next draw.

    No idea? Really?

  38. Mung,

    I disagree with phoodoo. But it could be that I don’t understand his argument. Maybe he’s channeling Hume.

    Yeah, and maybe Carrot Top is channeling Socrates.

  39. Mung,

    OK, I re-read and it looks the same. I may have missed the significance of the eyes alone part, but I’m only guessing that’s what you want me to observe.

  40. DNA_Jock: It doesn’t matter: the only point, which you are desperately avoiding, is that if you keep drawing eyes from the bag, then your estimate of the proportion in the bag rises, and equally, if you keep drawing not-eyes from the bag, your estimate declines. This is true, even if your sampling procedure is somewhat hinky.

    This is amusing. I make multiple posts on the specific subject of relative frequency and am somehow avoiding the point.

    How did you decide to that you should draw more “eyes” than “not an eye” out of the white bag?

    Imagine a second, white, bag. We draw 100 balls from it using the same hinky sampling procedure and get 75 with eyes on. Interesting. We don’t know what x is, but thanks to the fact that the sampling process is identical, we can say that the proportion of eyes is greater in the white bag.

    Yes you can say that the proportion of eyes is greater. 75 balls with eyes. 25 balls without eyes. But these are made up numbers and don’t tell us anything about the actual proportion of eyes to not eyes in nature.

    In Rumraket’s argument, what number of “not an eye” did he calculate?

  41. Allan Miller: Certainly with your posts, which have an unfortunate habit of leaving out vital information.

    He’s giving you a real-life opportunity to test your sampling theory. You should be grateful.

  42. Mung, to DNA_Jock:

    How did you decide to that you should draw more “eyes” than “not an eye” out of the white bag?

    He didn’t. Here’s what he wrote:

    …if you keep drawing eyes from the bag, then your estimate of the proportion in the bag rises, and equally, if you keep drawing not-eyes from the bag, your estimate declines.

    That’s correct, and it’s the point that Rumraket was making, which you foolishly disputed.

  43. Mung: Just so we’re clear, the issue isn’t over whether or not we can infer relative frequencies or proportions from drawing samples. That’s not the problem with Rumraket’s argument.

    I quote myself.

  44. Allan Miller: OK, I re-read and it looks the same. I may have missed the significance of the eyes alone part, but I’m only guessing that’s what you want me to observe.

    Yes, I was very precise in what I wrote. Just counting eyes doesn’t tell you anything about relative frequency or proportion.

    Proportional to what?

  45. Mung:

    I quote myself.

    And I quote Rumraket:

    So what is the problem with my argument? Please correctly state my argument, preferrably by quoting it (as opposed to making up a stupid caricature of it), and then explain what the problem with it is.

    I predict you will fail on both accounts.

  46. Rumraket: It might really be that I need “to be taken down a notch”, but Mung & Puddidoo just aren’t the people for the job.

    Don’t blame us for being the only ones with enough guts to point out the oblivious. Any number of other people who are qualified to set you straight could have spoken up. But where’s the fun in that?

  47. Rumraket: So what is the problem with my argument? Please correctly state my argument, preferrably by quoting it (as opposed to making up a stupid caricature of it), and then explain what the problem with it is.

    I explained quite clearly what the problem was. Your argument depends on relative frequencies and proportionality. Yet you leave unsaid the relative and proportional to what. Our sampling of [what are you sampling from – phenotype space?] gives us lots of eyes [compared to what?].

    You also don’t provide any actual numbers. For obvious reasons.

  48. Mung, to Rumraket:

    Your argument depends on relative frequencies and proportionality. Yet you leave unsaid the relative and proportional to what.

    No, he doesn’t:

    Which gets us to the other way to understand the frequency, the one I intended.
    The claim is that, if we re-rolled the tape of the history of life, eyes would evolve again. And do so probably close to every time. This is the sense in which I claim eyes are likely with evolution, as opposed to unlikely without evolution. We see something like eyes even in simple bacterial life. Cyanobacteria are known to control their motility so they move towards the light that powers their photosynthesis, which indicates they have some sort of light-detecting organ that also gives them a sense of direction.

    So the claim is that the frequency of independent life-histories where eyes evolve (supposing, of course, that it is life that lives in an environment with significant light) is high. And I think this claim is justified by the observation that basically anything that lives in an environment with light, has either evolved them independently, or inherited them through common descent and retained them through selection. And that primitive yet highly beneficial light-sensitive mechanisms exist even in bacterial life.

    What a waste of time you are, Mung.

Leave a Reply