The Law of Conservation of Information is defunct

Posted on November 12, 2015 by Tom English

About a year ago, Joe Felsenstein critiqued a seminar presentation by William Dembski, “Conservation of Information in Evolutionary Search.” He subsequently discussed Dembski’s primary source with me, and devised a brilliant response, unlike any that I had considered. This led to an article, due mostly to Felsenstein, though I contributed, at The Panda’s Thumb. Nine days after it appeared, Dembski was asked in a radio interview whether anyone was paying attention to his technical work. Surely a recipient of

the Darwin-Wallace Medal,
the John J. Carty Award for the Advancement of Science, and
the International Prize for Biology

qualifies as a someone. But Dembski changed the topic. And when the question came around again, he again changed the topic. Mind you, this isn’t how I know that Felsenstein blasted conservation of “information,” which is not information, in evolutionary “search,” which does not search. It’s how I know that Dembski knows.

Or, I should say, it’s how I first knew. The Discovery Institute has since employed Dembski’s junior coauthor, Winston Ewert, to quietly replace various claims, including the most sensational of them all (Dembski and Marks, “Life’s Conservation Law: Why Darwinian Evolution Cannot Create Biological Information,” 2010; preprint 2008):

Though not denying Darwinian evolution or even limiting its role in the history of life, the Law of Conservation of Information shows that Darwinian evolution is inherently teleological. Moreover, it shows that this teleology can be measured in precise information-theoretic terms.

Felsenstein realized that we could apply their measure to a simple model of evolution by natural selection, devoid of purpose, and obtain a large quantity. According to the model, evolution begins with a random genotype, and ends with a genotype fitter than all of its neighbors. The neighbors of a genotype are those that can arise from it by mutation in a single point. In each step of the evolutionary process, a genotype is replaced by the fittest of its neighboring genotypes. The overall result of evolution is a sequence of genotypes that is unconstrained in how it begins, and highly constrained in how it ends. Each genotype in the sequence is not only fitter than all of the genotypes that precede it, but also fitter than all of their neighbors. That is, evolution successively constrains the genotype to smaller and smaller subsets of the space of genotypes. The final genotype is at the very least fitter than all of its neighbors. Equivalently, the minimum degree of constraint is the neighborhood size. Dembski and Marks mistake this for the degree of teleology (purpose) in evolution, and refer to it as active information. The gist of “conservation of information” is that teleology comes only from teleology. As Dembski said in his seminar presentation:

If you will, the teleology of evolutionary search is to produce teleology.
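The model summarized above (random start, single-point neighbors, replacement by the fittest neighbor until none is fitter) is concrete enough to sketch in code. What follows is a minimal illustration, not Felsenstein's own model: the four-letter alphabet, the genotype length, and the randomly assigned fitness values are all assumptions chosen to keep the example small.

```python
import itertools
import random

ALPHABET = "ACGU"   # assumption: an RNA-like alphabet
LENGTH = 4          # assumption: short genotypes, so the whole space can be listed

def neighbors(genotype):
    """All genotypes that differ from `genotype` at exactly one position."""
    result = []
    for i, old in enumerate(genotype):
        for new in ALPHABET:
            if new != old:
                result.append(genotype[:i] + new + genotype[i + 1:])
    return result

def evolve(fitness, start):
    """Replace the genotype by its fittest neighbor until no neighbor is fitter."""
    path = [start]
    current = start
    while True:
        best = max(neighbors(current), key=fitness.get)
        if fitness[best] <= fitness[current]:
            return path          # current is fitter than all of its neighbors
        current = best
        path.append(current)

rng = random.Random(2015)        # arbitrary seed, for repeatability
genotypes = ["".join(g) for g in itertools.product(ALPHABET, repeat=LENGTH)]
# An arbitrary landscape: fitnesses are a random permutation, so there are no ties.
fitness = dict(zip(genotypes, rng.sample(range(len(genotypes)), len(genotypes))))

path = evolve(fitness, rng.choice(genotypes))
final = path[-1]
print(path, "neighborhood size:", len(neighbors(final)))
```

Nothing in the sketch names an outcome in advance. The final genotype is identified only after the process has run, and the only guarantee is that it beats every genotype in its neighborhood.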

Considering that the neighborhood size indicates only how many, not at all which, genotypes are eliminated in a single step of evolution, there can be no argument that constraint implies purpose.¹ Ewert does not hazard an overt reply, but in fact responds by downgrading “active information” from a measure of teleology to a measure of bias. The new significance of “conservation of information” is this: if the constraint, er, bias of a natural process is not due to design, then nature itself must be constrained, er, biased.² We have it from Ewert, writing on behalf of Dembski and Marks, that:

Of course, Darwinian evolution is not a teleological process and does not search for a goal [e.g., birds…] Whatever search or process might be in play, … it produces birds much more often than chance would otherwise lead us to predict. It is this bias towards producing a bird that we call active information. […] Having postulated Darwinian evolution, … the fact that birds exist has to be explained in terms of the initial configuration of the universe. The universe must have begun with a large amount of active information with respect to the target of birds.

Although “information” stumbles on, searching for brains to eat, the vital principle has departed from the Law of Conservation of Information (LCI). No more does LCI show what it shows. The credit for dispatching teleology goes entirely to Joe Felsenstein. You should have a look at his latest, “Why ID Advocates Downplay Our Disagreement With Them,” before watching me deliver a round to the frontal lobe of the Conservation of Information Theorem.

The credit for keeping things surreal goes entirely to the Discovery Institute. Replacing Dembski, a full-time senior fellow, with Ewert in an exchange with a renowned evolutionary geneticist is beyond bizarre. But it is perhaps no accident that the authorship of the response serves the same purpose as its rhetorical tactics, namely, to conceal the presence of huge concessions. What Ewert does, avoiding all signs of what he’s doing, is to undertake salvage of Dembski’s treatment of LCI in Being as Communion: A Metaphysics of Information (2014). Rather than identify a source, he speaks from authority. Rather than replace terms that convey precisely the misconceptions in the book, he explains matter-of-factly that they don’t mean what they seem to say. And rather than admit that Felsenstein and I set him and his colleagues straight on the meanings, Ewert proclaims that “These Critics of Intelligent Design Agree with Us More Than They Seem to Realize.” The way he trumps up agreement is to treat a single section of our article, which merely reiterates an old point regarding the smoothness of fitness landscapes, as though it were the whole. We actually focus on an arbitrary, and hence arbitrarily rough, landscape.

LCI, putatively a law of nature, putatively has a mathematical foundation. According to Being as Communion (p. 148):

A precise theoretical justification for the claim that natural selection is inherently teleological comes from certain recent mathematical results known as Conservation of Information (CoI) theorems.

Now the claim is that natural selection is inherently biased, and that something must account for the bias — either design or the initial “configuration” of the Universe (wink wink, nudge nudge) — given that bias is conserved. In short, CoI still applies, with the understanding that I is for bIas. Dembski places his work in the context of earlier analysis of search, and mentions a sometime theorist you’ve heard of before (p. 151):

Computer scientist Thomas English, in a 1996 paper, also used the term “Conservation of Information,” though synonymously with the then recently proved results by Wolpert and Macready about No Free Lunch (NFL). In English’s version of NFL, “the information an optimizer gains about unobserved values is ultimately due to its prior information of value distributions.”

I actually proved an NFL theorem more general than that of Wolpert and Macready, and used the term “conservation of information” to characterize an auxiliary theorem. Although I got the math right, what I wrote about it in plain language was embarrassingly wrong. I happened to emend my online copy of the paper a month before Dembski’s book appeared, adding a preface titled “Sampling Bias Is Not Information.” So, while it definitely was Felsenstein who left Dembski et al. no choice but to abandon teleology, it may be that I had some influence on their choice of a new position. In any case, it falls to me to explain why they are embarrassingly wrong in what they claim about math that they have gotten right.

The right approach, for a general readership, is to address only what is most obviously wrong, and to put as much as possible into pictures. We’ll be looking at broken sticks. We’ll even watch them breaking randomly to pieces. This is how Dembski et al. see the biases of an evolutionary process being determined, in the absence of design. CoI tells us something about the random length of a particular segment, selected before the stick breaks. But Felsenstein and I selected an outcome after modeling the evolutionary process. We targeted an outcome for which the bias was large. The bias was not large because we targeted the outcome. Even if we pretend that a broken stick determined the bias of the evolutionary process, CoI does not apply. The theorem that does apply has no name. It is the solution to Exercises 666-667 in a highly respected text of the 19th Century, Choice and Chance. Given that it bears the Number of the Beast, and comes from the Reverend William Allen Whitworth, I’m tempted to call it the Revelation Theorem. But I’ll avoid giving offense, and refer instead to the Broken Stick Theorem.

Breaking sticks

Dembski et al. believe that CoI applies to all physical events that scientists target for investigation. The gist of their error is easy to understand. A scientist is free to investigate any event whatsoever after observing what actually occurs in nature. But the CoI theorem assumes that a particular event is targeted prior to the existence of a process. This is appropriate when an engineer selects a process in order to generate a prespecified event, i.e., to solve a given problem. It is no coincidence that the peer-reviewed publications of Dembski et al. are all in the engineering literature. The assumption of the theorem does not hold when a scientist works in the opposite direction, investigating an event that tends to occur in a natural process. Put simply, there is a difference between selecting a process to suit a given target and selecting a target to suit a given process. The question, then, is just how big the difference is. How badly wrong is it to say that the CoI theorem characterizes conservation of bias in nature? Fortunately, the error can be conveyed accurately with pictures. What we shall see is not conservation, but instead unbounded growth, of the maximum bias (“active information”).

ETA: Text between the horizontal rules is an improved introduction to the technical material, developed in discussion here at TSZ. It comes verbatim from a comment posted a week ago. I’ve made clear all along my intent to respond to feedback, and improve the post. However, I won’t remove any of the original content, because that’s too easily spun into a retraction.

Dembski et al. represent natural processes abstractly. In their math, they reduce the evolutionary process to nothing but the chances of its possible outcomes. The CoI theorem is indifferent to what the possible outcomes actually are, in physical reality, and how the process actually works, that the outcomes should have the chances of occurrence that they do. Here I assume that there are only 6 possible outcomes, arbitrarily named 1, 2, 3, 4, 5, 6. The possible outcomes could be anything, and their names say nothing about what they really are. Each of the possible outcomes has a chance of occurrence that is no less than 0 (sure not to occur) and no greater than 1 (sure to occur). The chances of the possible outcomes are required to add up to 1.

As far as the CoI theorem is concerned, an evolutionary process is nothing but a list of chances that sum to 1. I’ll refer to the list of chances as the description of the process. The first chance in the description is associated with the possible outcome named 1, the second chance in the description is associated with the possible outcome named 2, and so forth. The list

$.1, \quad .3, \quad .1, \quad .2, \quad .1, \quad .2$

is a valid description because each of the numbers is a valid chance, lying between 0 and 1, and because the total of the chances is 1. We can picture the description of the evolutionary process as a stick of length 1, broken into 6 pieces.

[Need a new figure here.]

Naming the segments 1, 2, 3, 4, 5, 6, from left to right, the length of each segment indicates the chance of the possible outcome with the corresponding name. Consequently, the depiction of the evolutionary process as a broken stick is equivalent to the description of the process as a list of the chances of its possible outcomes.

You perhaps wonder how I would depict the evolutionary process as a broken stick if a “possible” outcome had absolutely no chance of occurring. And the answer is that I could not. There is no segment of length 0. In the CoI theorem, however, chances precisely equal to 0 are effectively impossible. Thus it is not misleading to say that Dembski et al. reduce the evolutionary process to a broken stick.

There are infinitely many ways to break our metaphorical stick into a given number of segments. Averaging over all of them, the lengths of the segments are

$\frac{1}{6}, \quad \frac{1}{6}, \quad \frac{1}{6}, \quad \frac{1}{6}, \quad \frac{1}{6}, \quad \frac{1}{6}.$

That is, in the average description of an evolutionary process, the possible outcomes are uniform in their chances of occurrence. Dembski et al. usually advocate taking uniform chances as the standard of comparison for all processes (though they allow for other standards in the CoI theorem). Dembski and Marks go much further in their metaphysics, claiming that there exist default chances of outcomes in physical reality, and that we can obtain knowledge of the default chances, and that deviation of chances from the defaults is itself a real and objectively measurable phenomenon. Although I want to limit myself to illustrating how they have gone wrong in application of CoI, I must remark that their speculation is empty, and comes nowhere close to providing a foundation for an alternative science. Otherwise, I would seem to allow that they might repair their arguments with something like the Broken Stick Theorem.

Taking uniform chance as the standard to which all evolutionary processes are compared, we naturally arrive at an alternative representation. We begin by writing the standard description a bit differently, multiplying each of the chances by 1.

$1 \times \frac{1}{6}, \quad 1 \times \frac{1}{6}, \quad 1 \times \frac{1}{6}, \quad 1 \times \frac{1}{6}, \quad 1 \times \frac{1}{6}, \quad 1 \times \frac{1}{6}.$

Now we can write any description whatsoever by adjusting the multipliers, while leaving the fractions 1/6 just as they are. The trick is to multiply each of the chances in the description by 1, but with 1 written as 6 $\times$ 1/6. For instance, the description

$\frac{1}{24}, \quad \frac{1}{3}, \quad \frac{1}{12}, \quad \frac{1}{4}, \quad \frac{1}{6}, \quad \frac{1}{8}$

is equivalent to

$\frac{6}{24} \times \frac{1}{6}, \quad \frac{6}{3} \times \frac{1}{6}, \quad \frac{6}{12} \times \frac{1}{6}, \quad \frac{6}{4} \times \frac{1}{6}, \quad \frac{6}{6} \times \frac{1}{6}, \quad \frac{6}{8} \times \frac{1}{6}.$

The multipliers

$\frac{6}{24}, \quad \frac{6}{3}, \quad \frac{6}{12}, \quad \frac{6}{4}, \quad \frac{6}{6}, \quad \frac{6}{8}$

are the biases of the process, relative to the standard in which the chances are uniformly 1/6. The process is biased in favor of an outcome when the bias is greater than 1, and biased against an outcome when the bias is less than 1. For instance, the process is biased in favor of outcome 4 by a factor of 6/4 = 1.5, meaning that the chance of the outcome is 1.5 times as great as in the standard. Similarly, the process is biased against outcome 1 by a factor of 24/6 = 4, meaning that the chance of the outcome is 6/24 = 0.25 times as great as in the standard. The uniform standard is unbiased relative to itself, with all biases equal to 1.

The general rule for obtaining the biases of an evolutionary process, relative to the uniform standard, is to multiply the chances by the number of possible outcomes. With 6 possible outcomes, this is equivalent to scaling the broken stick to a length of 6. We gain some clarity in discussion of CoI by referring to the biases, instead of the chances, of the evolutionary process. The process is metaphorically a broken stick, either way. Whether the segment lengths are biases or chances is just a matter of scale. We shall equate the length of the stick to the number of outcomes, and thus depict the biases of the process, for and against the possible outcomes corresponding to the segments.
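The rule is simple enough to check by machine. The sketch below applies it to the example description given above, using exact fractions; the function name `biases` is just a label chosen for this illustration.

```python
from fractions import Fraction

def biases(chances):
    """Biases relative to the uniform standard: each chance times the number of outcomes."""
    chances = [Fraction(c) for c in chances]
    if any(c <= 0 for c in chances) or sum(chances) != 1:
        raise ValueError("chances must be positive and sum to 1")
    n = len(chances)
    return [n * c for c in chances]

description = [Fraction(1, 24), Fraction(1, 3), Fraction(1, 12),
               Fraction(1, 4), Fraction(1, 6), Fraction(1, 8)]
b = biases(description)
print([str(x) for x in b])   # ['1/4', '2', '1/2', '3/2', '1', '3/4']
print(sum(b))                # 6
```

The biases sum to the number of outcomes, which is why the stick of biases has length 6.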

To make the pictures clear, we assume that the evolutionary process has only 6 possible outcomes. Let’s name the possibilities 1, 2, 3, 4, 5, and 6. The process is unbiased if none of the possibilities has a greater chance of occurring than does any other, in which case the chance of each possible outcome is 1/6. According to Dembski et al., if we deny that the biases of the process are due to design, then we essentially say that a stick of length 6 broke randomly into 6 segments, and that the lengths of the segments determined the biases. Suppose that the length of the 3rd segment of the broken stick is 2. Then the evolutionary process is biased in favor of outcome 3 by a factor of 2. The chance of the outcome is

$2 \times \frac{1}{6} = \frac{1}{3}.$

Suppose that the length of the 5th segment is 1/4. Then the process is biased against outcome 5 by a factor of 4, and the chance of the outcome is

$\frac{1}{4} \times \frac{1}{6} = \frac{1}{24}.$

These biases are what Dembski et al. refer to as active information. The term, in and of itself, begs the question of whether something actively formed the process with bias in favor of a desired outcome.

ETA: Text between the horizontal rules comes from an earlier attempt at improving the introduction to the technical material, developed in discussion here at TSZ. I’ve quoted a comment posted 17 days ago.

Dembski et al. do not allow that such deviations from the supposedly “natural” default of uniform chance might be brute facts of physical reality. There must be a reason for bias. If we do not allow that bias is possibly due to design of the process to serve a purpose, then Dembski et al. force on us the view that bias itself arises by chance. (This is multifariously outrageous, but for reasons that are not clearly tied to their math.) That is, the chances of the possible outcomes of the evolutionary process are determined by an antecedent process, which is also random. Talk about the chances of chances gets very confusing, very fast. So I say instead that the evolutionary process is randomly biased by a process that occurs before it does. The biases of the evolutionary process are just the chances of the 6 possible outcomes of the evolutionary process, multiplied by 6. Setting the chances randomly is equivalent to setting the biases randomly.

The broken stick is a conventional metaphor for probabilities that are themselves set randomly. (I follow Dembski in reserving the word chance for the probability of a physically random outcome.) The random lengths of the segments of the stick are the probabilities. The stick is ordinarily of unit length, because the probabilities must sum to 1. To visualize random biases, instead of random chances, I need only multiply the length of the stick by the number of possible outcomes, 6, and randomly break the stick into 6 pieces. Then the biases sum to 6.

I stipulate that the biasing process, i.e., stick breaking, is uniform, meaning that all possible biases of the evolutionary process are equally likely to arise. A tricky point is that Dembski et al. allow for uniform biasing, but do not require it. The essential justification of my approach is that I need consider only something, not everything, that they allow in order to demonstrate that the theorem does not apply to scientific investigation. What I consider is in fact typical. The uniform biasing process is the average of all biasing processes. Thus there can be no objection to my choice of it.
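Uniform stick breaking is easy to simulate. A minimal sketch, assuming the standard construction in which a stick of length 6 is cut at 5 independent, uniformly distributed points; the seed and the number of trials are arbitrary.

```python
import random

def break_stick(n, rng):
    """Break a stick of length n into n segments, all segmentations equally likely."""
    cuts = sorted(rng.uniform(0, n) for _ in range(n - 1))
    edges = [0.0] + cuts + [float(n)]
    return [right - left for left, right in zip(edges, edges[1:])]

rng = random.Random(0)           # arbitrary seed, for repeatability
n, trials = 6, 100_000
sticks = [break_stick(n, rng) for _ in range(trials)]

# Average length of the segment in each fixed position, and of the longest segment.
mean_by_position = [sum(s[i] for s in sticks) / trials for i in range(n)]
mean_longest = sum(max(s) for s in sticks) / trials
print(mean_by_position)   # each entry should be close to 1
print(mean_longest)       # should be close to 2.45
```

A segment chosen by position averages a bias of 1. The longest segment, chosen after the break, averages about 2.45. That gap between choosing before and choosing after is what the rest of the argument turns on.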

Dembski et al. refer to all random processes as “searches.” The term is nothing but rhetorical assertion of the conclusion they want to draw. The stick-breaking “search” (process), which determines the biases of the evolutionary “search” (process), is a visualization of what they call a “search for a search.” Dembski et al. allow for the biasing process itself to be biased by an antecedent process, in which case there is a “search for a search for a search.” In Being as Communion, Dembski avoids committing to Big Bang cosmology, and indicates that the regress of searches for searches might go back forever in time. Fortunately, we need not enter a quasi-mystical quagmire to get at a glaring error in logic.

Animation 1. In the analysis of Dembski, Ewert, and Marks, the biases of an evolutionary process are like control knobs, either set by design, or set randomly by another process. The random biasing process is like a stick breaking into pieces. The biases of an evolutionary process are the lengths of the segments of a broken stick. Here the number of possible outcomes of the evolutionary process is 6, and a stick of length 6 breaks randomly into 6 segments. No segmentation is more likely than any other. Before the stick starts breaking, we expect any given segment to be of length 1. But when a scientist investigates an evolutionary process, the stick has already broken. The scientist may target the outcome for which the bias is greatest, i.e., the outcome corresponding to the longest segment of a broken stick. With 6 possible outcomes, the expected maximum bias is 2.45. Generalizing to $n$ possible outcomes, the expected maximum bias of a randomly biased evolutionary process is a logarithmic function of $n$. The quantity grows without bound as the number of possible outcomes of evolution increases. The Conservation of Information Theorem of Dembski et al. tells us that the greater the bias in favor of an outcome specified in advance, the less likely the bias is to have arisen by breaking a stick, no matter how many the possible outcomes of the evolutionary process. It depends on an assumption that does not hold in scientific study of evolution.

In the most important case of CoI, all possible segmentations of the stick have equal chances of occurring. Although the segments almost surely turn out to be different in length, they are indistinguishable in their random lengths. That is, the chance that a segment will turn out to be a given length does not depend on which segment we consider. This is far from true, however, if the segment that we consider depends on what the lengths have turned out to be. Dembski et al. neglect the difference in the two circumstances when they treat their theorem as though it were a law of nature. Here’s an example of what CoI tells us: the probability is at most 1/2 that the first segment’s length will turn out to be greater than or equal to 2. More generally, for any given segment, the probability is at most $1/b$ that the segment’s length will turn out to be greater than or equal to $b$. This holds for sticks of all lengths $n$, broken into $n$ segments. Recall that the random segment lengths are the random biases of the evolutionary process. CoI says that the greater the bias in favor of an outcome specified in advance, the less likely the bias is to have arisen by breaking a stick. The result is not useful in implicating design of biological evolution, as it assumes that an outcome was targeted in advance. To apply CoI, one must know not only that an outcome was targeted prior to the formation of the evolutionary process, but also which of the possible outcomes was targeted.³
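For a segment specified in advance, the $1/b$ bound can be compared with the exact probability under uniform stick breaking. The sketch below assumes the closed form $(1 - b/n)^{n-1}$ for the chance that one prespecified segment is of length $b$ or greater. That is a standard property of uniform segmentations, not a formula taken from Dembski et al.

```python
def tail_given_segment(n, b):
    """P(one prespecified segment >= b), stick of length n broken uniformly into n pieces."""
    if b >= n:
        return 0.0
    return (1 - b / n) ** (n - 1)

for n in (6, 60, 600):
    for b in (2, 3, 5):
        exact = tail_given_segment(n, b)
        bound = 1 / b             # the CoI bound as stated in the text
        assert exact <= bound     # the bound holds for a segment named in advance
        print(n, b, round(exact, 4), round(bound, 4))
```

For the first segment of a stick of length 6, the exact probability of a length of 2 or greater is $(2/3)^5 \approx .132$, comfortably under the bound of 1/2. The bound is sound, so long as the segment is named before the stick breaks.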

Figure 2. In this frame from Animation 1, the segments of 20 broken sticks are colored according to their original positions. The expected length of each segment is 1, though the random lengths are highly variable. According to CoI, the probability is at most 1/2 that the length of the blue segment will turn out to be 2 or greater. More generally, for any given segment, the probability is at most $1/b$ that the length of the segment will turn out to be greater than or equal to $b.$ This does not hold if we specify a segment in terms of the outcome of the random segmentation of the stick. In particular, CoI does not apply to the longest segment.

Figure 3. In this frame from Animation 1, the segments of each of the 20 broken sticks have been sorted into ascending order of length, and recolored. The expected length of the longest (red) segment is 2.45. By the Broken Stick Theorem, the probability is .728 that at least one of the segments is of length 2 or greater. By misapplication of CoI, the probability is at most 1/2. For a stick of length $n$, the probability is greater than 1/2 that at least one of the $n$ segments exceeds $\ln n$ in length. There is no limit on the ratio of probability 1/2 to the faux bound of $1/\ln n$.

The Broken Stick Theorem tells us quite a bit about the lengths of segments. What is most important here is that, for any given length, we can calculate the probability that one or more of the segments exceeds that length. For instance, the probability is 1/2 that at least one of the segments is of length 2.338 or greater. If you were to misapply CoI, then you would say that the probability would be no greater than 1/2.338, which is smaller than 1/2. A simple way to measure the discrepancy is to divide the actual probability, 1/2, by the CoI bound, 1/2.338. The result, 1.169, is small only because the illustration is small. There is no limit on how large it can be for longer sticks. Let’s say that the stick is of length $n$, and is broken into $n$ segments. Then the probability is greater than 1/2 that at least one of the segments exceeds $\ln n$ in length. Here $\ln n$ is the natural logarithm of $n$. The details are not important. What matters is that we can drive the faux bound of $1 / \ln n$ arbitrarily close to 0 by making $n$ large, while the correct probability remains greater than 1/2.
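The figures quoted for the 6-segment stick can be reproduced directly. The sketch below assumes the usual inclusion-exclusion statement of the theorem for a stick of unit length, $P(\text{longest} \geq x) = \sum_{k \geq 1} (-1)^{k+1} \binom{n}{k} (1 - kx)^{n-1}$, with the sum running over the terms for which $1 - kx$ is positive. The function name is a label chosen for the illustration.

```python
from math import comb

def prob_longest_at_least(n, length):
    """P(longest of n segments >= length) for a stick of length n broken uniformly."""
    x = length / n                        # rescale to a stick of unit length
    total = 0.0
    for k in range(1, n + 1):
        remainder = 1 - k * x
        if remainder <= 0:
            break
        total += (-1) ** (k + 1) * comb(n, k) * remainder ** (n - 1)
    return total

p2 = prob_longest_at_least(6, 2)              # 177/243, about 0.728
p_median = prob_longest_at_least(6, 2.338)    # close to 1/2
print(round(p2, 3), round(p_median, 3))
print(p_median / (1 / 2.338))                 # ratio of the probability to the misapplied bound
```

The same function, given any other length, returns the probability that at least one segment is that long or longer.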

Cool, but nonessential: The relation of the expected length of the $i$-th shortest segment of a broken stick to the harmonic numbers. Here $E[B_{(i)}]$ is the expected value of $B_{(i)}$, the $i$-th smallest of the random segment lengths (biases), so that $B_{(n)}$ is the longest. As it happens, the notation $E[\cdot]$, widely used in probability and statistics, was introduced by William Allen Whitworth, who derived the Broken Stick Theorem.

$\begin{align*} E[{B}_{(6)}] &= \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} = \mathcal{H}_6\\ E[{B}_{(5)}] &= \phantom{\frac{1}{1} +\;\,} \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} = \mathcal{H}_6 - \mathcal{H}_1\\ E[{B}_{(4)}] &= \phantom{\frac{1}{1} + \frac{1}{2} +\;\, } \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} = \mathcal{H}_6 - \mathcal{H}_2 \\ E[{B}_{(3)}] &= \phantom{\frac{1}{1} + \frac{1}{2} + \frac{1}{3} +\;\, } \frac{1}{4} + \frac{1}{5} + \frac{1}{6} = \mathcal{H}_6 - \mathcal{H}_3 \\ E[{B}_{(2)}] &= \phantom{\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} +\;\, } \frac{1}{5} + \frac{1}{6} = \mathcal{H}_6 - \mathcal{H}_4 \\ E[{B}_{(1)}] &= \phantom{\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} +\;\, } \frac{1}{6} = \mathcal{H}_6 - \mathcal{H}_5 \\ E[{B}_{(1)}] + \cdots + E[{B}_{(6)}] &= \frac{1}{1} + \frac{2}{2} + \frac{3}{3} + \frac{4}{4} + \frac{5}{5} + \frac{6}{6} = 6 \end{align*}$
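The pattern in the display is $E[B_{(i)}] = \mathcal{H}_n - \mathcal{H}_{n-i}$, with $\mathcal{H}_0 = 0$. A short check in exact arithmetic, with function names chosen for the illustration:

```python
from fractions import Fraction

def harmonic(m):
    """The m-th harmonic number, H_m = 1/1 + 1/2 + ... + 1/m, with H_0 = 0."""
    return sum((Fraction(1, j) for j in range(1, m + 1)), Fraction(0))

def expected_order_statistic(n, i):
    """E[B_(i)] in the indexing of the display: i = n is the longest segment."""
    return harmonic(n) - harmonic(n - i)

n = 6
expectations = [expected_order_statistic(n, i) for i in range(1, n + 1)]
print([float(e) for e in expectations])   # shortest first, longest last
print(sum(expectations))                  # 6
```

The last entry is $\mathcal{H}_6 = 49/20 = 2.45$, the expected maximum bias quoted for the 6-outcome illustration.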

For large $n$, $\mathcal{H}_n \approx \ln n + \gamma,$ where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. So the expected maximum bias (“active information”) of a randomly biased process is logarithmic in the number of possible outcomes. For large $n$,

$P(B_{(n)} > \ln n) \approx 1 - \frac{1}{e} \approx .6321.$

The derivation is straightforward, but not brief. I decided that the loose bound

$P(B_{(n)} > \ln n) > \frac{1}{2}$

better serves present purposes.
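Both the loose bound and the limiting value can be checked numerically for particular $n$. The sketch below is a spot check, not a proof. It assumes the inclusion-exclusion form of the Broken Stick Theorem, $\sum_{k \geq 1} (-1)^{k+1} \binom{n}{k} (1 - kx)^{n-1}$ for a unit stick, and evaluates each term through logarithms so that large $n$ does not overflow. The cutoff at which small terms are dropped is an arbitrary choice.

```python
from math import exp, lgamma, log, log1p

def prob_longest_exceeds(n, length):
    """P(longest of n segments > length), stick of length n broken uniformly."""
    x = length / n                        # rescale to a stick of unit length
    total = 0.0
    for k in range(1, n + 1):
        if k * x >= 1:
            break
        # log of C(n, k) * (1 - k x)^(n - 1)
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + (n - 1) * log1p(-k * x))
        term = exp(log_term)
        total += term if k % 2 else -term
        if term < 1e-17:                  # remaining terms are negligible
            break
    return total

for n in (6, 100, 10_000, 1_000_000):
    p = prob_longest_exceeds(n, log(n))
    print(n, round(log(n), 3), round(p, 4))
print(round(1 - exp(-1), 4))              # the limiting value, for comparison
```

For each $n$ tried, the probability stays above 1/2 while the faux bound $1/\ln n$ keeps shrinking.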

Rather than simply argue that the analysis of Dembski et al. does not apply, I have identified a comparable analysis that does apply, and used it to quantify the error in misapplying their analysis. The expected maximum bias (“active information”) for a randomly biased process (“search”) grows without bound as the size of the space of possible outcomes (“search space”) increases. For $n$ possible outcomes, the probability is greater than $1/2$ that the maximum bias exceeds $\ln n$. According to CoI, the probability is at most $1 / \ln n$ that the bias in favor of a given outcome is $\ln n$ or greater. The discrepancy is entirely a matter of whether a possible outcome is targeted in advance of generating the process (“hit this”), or the most probable outcome of the process is targeted after the fact (“this is what it hits”). It should be clear that a scientist is free to do the latter, i.e., to investigate the most probable outcome of a process observed in nature.⁴ In Dembskian terms, the active information measure permits us to inspect the distribution of arrows shot into a wall by a blind archer, and paint a target around the region in which the density of arrows is greatest. There is no requirement that the target have the detachable specification that Dembski emphasized in his earlier writings.

Why a bug is not a weasel

In 1986, Richard Dawkins published The Blind Watchmaker: Why the Evidence of Evolution Reveals a Universe without Design, a response to William Paley’s Natural Theology: or, Evidences of the Existence and Attributes of the Deity; Collected from the Appearances of Nature (1802). Dembski’s career in ID is largely a response to Dawkins. Indeed, the highlights are cooptations of ideas in The Blind Watchmaker. Dawkins characterizes objects that are complicated, and seemingly designed, as “statistically improbable in a direction that is specified not with hindsight.” Dembski elaborates on the self-same property in The Design Inference: Eliminating Chance through Small Probabilities (1998), taking it as information imparted to objects by design. A not-with-hindsight specification is, in his parlance, detachable from the specified event (set of objects), called the target. Dembski usually refers to complicatedness as complex specified information or specified complexity, but sometimes also as specified improbability. The last term gives the best idea of how it contrasts with active information, the elevated probability of an event not required to have a detachable specification. In No Free Lunch: Why Specified Complexity Cannot Be Purchased without Information, he states a Law of Conservation of Information for specified complexity. (As explained below, in Appendix 1, Dembski and Marks have never mentioned this LCI, since stating their LCI for active information.)

[This section is not complete. I expect to add the guts over the next day or so, which means, as Joe can tell you, that you should expect them sometime in December. The gist is that specified complexity does not apply to Dawkins’ Weasel program. Dembski has made much of the meaningfulness of the target sentence. But the fact of the matter is that the model is the same for all target sentences comprising 28 uppercase letters and spaces. The target need not be specified. The measure of active information formalizes Dawkins’ comparison of the model to the proverbial monkeys at typewriters. It does not stipulate that the target have a detachable specification. Dembski and Marks seem to have thought that it was transparently obvious that Dawkins had selected a desired outcome in advance, and had informed (programmed) the evolutionary process to “hit the target.”]

Dembski discusses the Weasel program on pp. 176-180 of Being as Communion. Here is how he describes “search,” in general, and the Weasel program, in particular:

In The Blind Watchmaker, Dawkins purports to show how natural selection creates information. In that book, he gives his famous METHINKS IT IS LIKE A WEASEL computer simulation. A historian or literary scholar, confronted with the phrase METHINKS IT IS LIKE A WEASEL, would look to its human author, William Shakespeare, to explain it (the phrase is from Hamlet). An evolutionary theorist like Dawkins, by contrast, considers what it would take for an evolutionary process, simulated by an algorithm running on a computer, to produce this target phrase. All such algorithms consist of:

an initialization (i.e., a place where the algorithm starts — for Dawkins the starting point is any random string of letters and spaces the same length as METHINKS IT IS LIKE A WEASEL);

a fitness landscape (i.e., a measure of the goodness of candidate solutions — for Dawkins, in this example, fitness measures proximity to the target phrase so that the closer it is to the target, the more fit it becomes);

an update rule (i.e., a rule that says where to go next given where the algorithm is presently — for Dawkins this involves some randomization to existing candidate phrases already searched as well as an evaluation of fitness along with selection of those candidates with the better fitness);

a stop criterion (i.e., a criterion that says when the search has gone on long enough and can reasonably be ended — for Dawkins this occurs when the search has landed on the target phrase METHINKS IT IS LIKE A WEASEL).

Note that in these four steps, natural selection is mirrored in steps (2) and (3).

It is important to note that Dembski addresses algorithms, or designs of computer programs, in engineering terms, and does not address models (implemented by computer programs) in scientific terms. This amounts to a presumption, not a demonstration, that the computational process (running program) is designed to generate a desired outcome.
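The four components that Dembski lists can be sketched as a small Weasel-style program. This is only an illustration: the offspring count, the mutation rate, the generation cap, and the choice to keep the parent among the candidates are assumptions, not details taken from The Blind Watchmaker. The sketch mainly shows that the target enters as an ordinary argument, so the model is the same for every 28-character target.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "  # 27 symbols: uppercase letters and space


def fitness(phrase, target):
    """Number of positions at which phrase agrees with target."""
    return sum(a == b for a, b in zip(phrase, target))


def mutate(phrase, rate, rng):
    """Copy phrase, redrawing each character at random with probability rate."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in phrase)


def weasel(target, offspring=100, rate=0.05, max_generations=10_000, seed=0):
    """Return (final phrase, generations used). Parameter values are assumptions."""
    rng = random.Random(seed)
    # Initialization: a random string of the same length as the target.
    parent = "".join(rng.choice(ALPHABET) for _ in target)
    for generation in range(max_generations):
        # Stop criterion: the current phrase equals the target.
        if parent == target:
            return parent, generation
        # Update rule: mutate, then keep the fittest of parent and offspring.
        candidates = [parent] + [mutate(parent, rate, rng) for _ in range(offspring)]
        parent = max(candidates, key=lambda s: fitness(s, target))
    return parent, max_generations


if __name__ == "__main__":
    print(weasel("METHINKS IT IS LIKE A WEASEL"))
    print(weasel("A" * 28))  # a meaningless target runs through the same model
```

Nothing in the code depends on the target being meaningful; swapping in any other string of the same alphabet changes the output but not the process.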

[What I hope to get across here is why Dembski et al. cannot misconstrue Felsenstein’s model, called the GUC Bug, as he does Dawkins’ model. Those of you who argue with ID proponents should put the tired old Weasel out to pasture, or wherever it is that old Weasels like to go, and give Felsenstein’s Killer Bug a try.]

Figure 4. Felsenstein’s GUC Bug model contrasts starkly with Dembski’s travesty of Dawkins’ Weasel program. There can be no argument that Felsenstein designed the model to hit a target, because we define the target in terms of the modeled process. The model implemented by the Weasel program is not terribly different. But it is terribly easy to brush aside the model, and focus upon the program. Then the claim is that Dawkins designed the program to hit a specified target with its output.

ID the future

It is telling, I believe, that Dembski gave a detachable specification, “teleological system/agent,” of the target for biological evolution in his seminar talk, and that Ewert gives a detachable specification of the event that he targets, birds, in his response to Felsenstein and me. Ewert addressed active information in his master’s thesis (2010), but developed a new flavor of specified complexity for his doctoral dissertation (2013; sequestered until 2018). He, Dembski, and Marks have published several papers on algorithmic specified complexity (one critiqued here, another here). Dembski indicates, in a footnote of Being as Communion, that he and Marks are preparing the second edition of No Free Lunch (presumably without changing the subtitle, Why Specified Complexity Cannot Be Purchased without Intelligence). My best guess as to what to make of this is that Dembski et al. plan to reintroduce specification in LCI Version 3. One thing is sure: ever mindful of the next judicial test of public-school instruction in ID, they will not breathe a hint that their publications on active information are any less weighty than gold. Ewert has demonstrated some of the revisionary tactics to come.

Appendix 1: Contradictory laws on the books

There actually have been two Laws of Conservation of Information. The first, featured in No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (Dembski, 2002), addresses the specified complexity, also known as the specified improbability, of an event. The second, featured in Being as Communion: A Metaphysics of Information (Dembski, 2014), addresses the active information of a process, supposedly necessary for “unnatural” elevation in probability of an event. Specified improbability is loosely the opposite of elevated probability. Dembski and Marks evidently saw better than to claim that both are conserved, as they have said nothing about the first law since coming up with the second. Although Dembski opens Being as Communion by indicating that it is the last book of a trilogy that includes No Free Lunch, his only mention of specified complexity is in a footnote listing examples of “materialist-refuting logic.” He also notes that he and Marks are preparing the second edition of No Free Lunch. To include both specified complexity and active information in the cutlery is to serve up free lunch. It equips the ID theorist to implicate design when an event is too improbable (relative to a probability induced by specification), and also when an event is too probable (relative to a probability asserted a priori).

Appendix 2: Remembrance of information past

Here I give ample evidence that the “search” really was supposed to search for the targeted event, and that “active information” really was supposed to account for its probability of success. I begin with two technical abstracts. If you find yourself getting bogged down, then read just the text I’ve highlighted. The first is for Dembski’s seminar talk (August 2014).

Conservation of Information (CoI) asserts that the amount of information a search outputs can equal but never exceed the amount of information it inputs. Mathematically, CoI sets limits on the information cost incurred when the probability of success of a targeted search gets raised from p to q (p < q), that cost being calculated in terms of the probability p/q. CoI builds on the No Free Lunch (NFL) theorems, which showed that average performance of any search is no better than blind search. CoI shows that when, for a given problem [targeted event], a search outperforms blind search, it does so by incorporating an amount of information determined by the increase in probability with which the search outperforms blind search. CoI applies to evolutionary search, showing that natural selection cannot create the information that enables evolution to be successful, but at best redistributes already existing information. CoI has implications for teleology in nature, consistent with natural teleological laws mooted in Thomas Nagel’s Mind & Cosmos.

Apart from hiding a law of nature under a bushel, this is not much different from the abstract of “Life’s Conservation Law: Why Darwinian Evolution Cannot Create Biological Information” [sic] (Dembski and Marks, 2010; preprint 2008).

LCI characterizes the information costs that searches incur in outperforming blind search. Searches that operate by Darwinian selection, for instance, often significantly outperform blind search. But when they do, it is because they exploit information supplied by a fitness function — information that is unavailable to blind search. Searches that have a greater probability of success than blind search do not just magically materialize. They form by some process. According to LCI, any such search-forming process must build into the search at least as much information as the search displays in raising the probability of success. More formally, LCI states that raising the probability of success of a search by a factor of q/p (> 1) incurs an information cost of at least log(q/p). [… Conservation of information] theorems provide the theoretical underpinnings for the Law of Conservation of Information. Though not denying Darwinian evolution or even limiting its role in the history of life, the Law of Conservation of Information shows that Darwinian evolution is inherently teleological. Moreover, it shows that this teleology can be measured in precise information-theoretic terms.

The putative measure of teleology is log(q/p), the active information of the evolutionary search. Dembski also says in Being as Communion that a search is informed to find a target, not merely biased in favor of it.
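As a small worked example of the quantity log(q/p), here is a sketch that computes it in bits. The function name and the choice of base 2 are assumptions made for illustration; the abstract above does not fix the base of the logarithm.

```python
import math


def active_information_bits(p, q):
    """log2(q / p): p is the baseline probability, q the elevated probability."""
    if not (0.0 < p <= 1.0 and 0.0 < q <= 1.0):
        raise ValueError("p and q must be probabilities in (0, 1]")
    return math.log2(q / p)


if __name__ == "__main__":
    # Raising a probability from 1/1024 to 1/4 is a factor of 256, i.e. 8 bits.
    print(active_information_bits(1 / 1024, 1 / 4))  # 8.0
```

The measure takes only the two probabilities as inputs. It makes no reference to how, or whether, the event in question was specified.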

A precise theoretical justification for the claim that natural selection is inherently teleological comes from certain recent mathematical results known as Conservation of Information (CoI) theorems [p. 148].

Simply put, searches, in finding targets output information. At the same time, to find targets, searches need to input information [p. 152].

CoI shows that successful search (i.e., one that locates a target) requires at least as much input of information as the search by its success outputs [p. 150].

The information that goes into formation of the search, to increase the probability that it finds the target, is active information. Returning to “Life’s Conservation Law” (Section 1, “The Creation of Information”):

Nature is a matrix for expressing already existent information. But the ultimate source of that information resides in an intelligence not reducible to nature. The Law of Conservation of Information, which we explain and justify in this paper, demonstrates that this is the case.

Dembski and Marks hold that the ultimate source of active information, which increases the probability that evolutionary search achieves a purpose, is supernatural intelligence. However, Ewert tells us that “active information,” regarded as bias instead of information, is not necessarily due to design.

The conservation of information does not imply a designer. It is not a fine-tuning argument. It is not our intent to argue that all active information derives from an intelligent source. To do any of those things, we’d have to introduce metaphysical assumptions that our critics would be unlikely to accept. Conservation of information shows only that whatever success evolutionary processes might have, it is due either to the original configuration of the universe or to design.

This reversal is not due to Ewert. He’s obviously adapting arguments in Being as Communion, though without citing a source.

Notes

1. Felsenstein and I give the bound for just the fittest of all genotypes. I’ve extended it to the set of all local maxima of the fitness landscape. We classify haploid organisms into genotypes according to their DNA bases in $L$ positions of the genome. The neighbors of a genotype are the $3L$ genotypes that differ from it in exactly one of the $L$ positions. We require that the fittest genotype of each neighborhood of $K=3L+1$ genotypes be unique. It follows immediately that at most one genotype per neighborhood is a local maximum of the fitness landscape, and that the ratio of the total number of genotypes to the number of local maxima is at least $K$. Evolution begins with a random genotype, proceeds along the path of steepest ascent on the landscape, and ends at a local maximum. The minimum degree of constraint on the final genotype in the process is $K$. This is also the minimum “active information” with respect to (targeting) the set of local maxima. That is, the probability is $q = 1$ that the final genotype is a local maximum. The uniform probability of the set of local maxima is $p \leq 1/K$. Finally, the active information, without conversion to a log scale, is $q / p \geq K$.
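The model in Note 1 is small enough to run exhaustively for short genomes. The sketch below is an illustration under stated assumptions: the landscape is a hypothetical random ranking of genotypes (so all fitness values are distinct), $L = 4$ is chosen only to keep the space at 256 genotypes, and the function names are invented here. It follows steepest ascent from every starting genotype and reports the measured $q$ and $p$ for the set of local maxima, without assuming any bound.

```python
import itertools
import random

BASES = "ACGT"


def neighbors(genotype):
    """The 3L genotypes that differ from genotype in exactly one position."""
    result = []
    for i, base in enumerate(genotype):
        for other in BASES:
            if other != base:
                result.append(genotype[:i] + other + genotype[i + 1:])
    return result


def steepest_ascent(start, fitness):
    """Move to the fittest neighbor until no neighbor is fitter."""
    current = start
    while True:
        best = max(neighbors(current), key=fitness.__getitem__)
        if fitness[best] <= fitness[current]:
            return current
        current = best


def guc_bug_demo(L=4, seed=0):
    rng = random.Random(seed)
    genotypes = ["".join(g) for g in itertools.product(BASES, repeat=L)]
    # Hypothetical landscape: a random ranking, so no two fitnesses are equal.
    ranks = list(range(len(genotypes)))
    rng.shuffle(ranks)
    fitness = dict(zip(genotypes, ranks))
    maxima = {g for g in genotypes
              if all(fitness[g] > fitness[h] for h in neighbors(g))}
    # q: fraction of starting genotypes whose ascent ends at a local maximum.
    q = sum(steepest_ascent(g, fitness) in maxima for g in genotypes) / len(genotypes)
    # p: probability of the set of local maxima under the uniform distribution.
    p = len(maxima) / len(genotypes)
    return {"K": 3 * L + 1, "n": len(genotypes), "maxima": len(maxima), "q": q, "p": p}


if __name__ == "__main__":
    result = guc_bug_demo()
    print(result, "q/p =", result["q"] / result["p"])
```

Because the run is exhaustive, $q$ is computed rather than estimated. The printed value of $q/p$ is for one randomly drawn landscape and can be compared with $K$ directly.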

2. Although the term bias is technically acceptable — indeed, I have used it, and will continue to use it in contexts where constraint is inappropriate — Ewert earns scorn by abusing it in the most predictable of ways. The problem with referring to the bias of a natural process is that the general reader gets the idea that the process “naturally” ought to have behaved in some other way, and deviates only because something biased it. And thus the Designer enters through the back door, not by evidence or reason, but instead by rhetorical device. Usually, the meaning of bias is only that some of the possible outcomes of a process have different chances of occurring than do others. If this were always the case, then I would refer instead to the non-uniformity of the probability distribution on outcomes. By the way, I am not conflating all probabilities in scientific models with physical chances, as Dembski et al. generally do. Much of what is modeled as random in biological evolution is merely uncertain, not attributed to quantum chance. The vitally important topic of interpretations of probability, which Dembski deflects with a false analogy to interpretations of quantum mechanics (Being as Communion, p. 157), will have to wait for another post.

3. CoI applies more generally to events, meaning sets of possible outcomes. But that’s irrelevant to the logic, or lack thereof. For readers familiar with Dembski’s measure of specified complexity, I should mention that the measure of active information permits us to target any event whatsoever. There is no requirement that the event have a detachable specification. Dembski’s arguments to the effect that an event with a detachable specification might as well have been prespecified are irrelevant here.

4. What it means to investigate the most probable outcome of a process observed in nature is highly problematic. In particular, we generally cannot say anything sensible about the chances of possible outcomes of a one-shot process. Complex processes that have occurred once, and cannot be repeated, are what commonly interest evolutionary biologists. I should make it clear that I don’t agree with Dembski et al. that evolutionary biologists should make claims about the chances of this, that, and the other. I’m essentially playing along, to show that their math is not even applicable.

228 thoughts on “The Law of Conservation of Information is defunct”

Mung on December 5, 2015 at 12:45 am said:

Tom English: I felt like shit even before Pooh Bear dumped a honey-glazed load on us.

Are you talking about Winston? I saw him in person just a couple weeks ago and he didn’t look like a Pooh Bear to me, though I did find myself wanting to hug him. Weird that. Shakes self.
Tom English on December 5, 2015 at 6:34 am said:

Mung: I did find myself wanting to hug him. Weird that. Shakes self.

I long wanted to give him a good shaking. Not so weird, that.
Patrick on December 5, 2015 at 11:51 am said:

Mung,

Are you talking about Winston? I saw him in person just a couple weeks ago and he didn’t look like a Pooh Bear to me, though I did find myself wanting to hug him. Weird that. Shakes self.

Yeah, I feel bad for the kid too. He’s fallen in with a bad crowd. Perhaps he still has time to salvage his career, though.
stcordova on December 5, 2015 at 8:33 pm said:

As a long time ID proponent and acquaintance of Bill Dembski, Robert Marks and Winston Ewert, I respect what they’ve done in as much as I respect academic exercises.

But I’ve personally not been able to apply the CoI.

Say I have a 100 bit password. Hypothetically on average 1 out of 2^100 random algorithms could “solve” the password in the first pass because of dumb luck. I could then single out that one lucky 1 in 2^100 algorithm and say, the “teleological information content of this algorithm is 100 bits. Now I’ve measured the teleological capacity above random front loaded into the process that made the algorithm succeed.” Uh, that’s sort of a trivial result. CoI deals with genetic type algorithms, but it strikes me that the teleological information (the fortuitous success of one algorithm for one sample space), is not much different than the “insight” in my toy example. It just takes more math to demonstrate the result. That’s a good academic exercise. But I haven’t been able to connect the CoI result in any meaningful evaluation of evolutionary theories.

Additionally, CoI seems like a misnomer. Most communication engineers I would suppose would wish information were actually conserved when their hard drives crash and their communication channels are scrambled, since if there were a CoI, then maybe they could concoct a way to recover what was lost (much like a pendulum recovering kinetic energy by converting the potential energy because of the law of conservation of energy in a conservative gravitational field). But alas, I don’t see anything resembling what I think a CoI should entail. I would think CoI is really a CoP (conservation of average performance over many sample spaces).

If we assume Darwinian selection succeeded in evolving the features of life, then one might assert, “WOW, selection was front loaded with a buzzilion bits above random”.

There is another hypothetical evolutionary route at least in terms of conception (not necessarily biological evolution). I’ve alluded to it many times in my essays on the net. Namely, why is there a need to search at all? I can write an algorithm that makes targets that it always hits. What do I mean? I can write an algorithm that describes the geometry of a lock, and then it’s a matter of straightforward geometry to describe the geometry of a complementary key to open the lock. There is no absolute need for a genetic algorithm. One can see how this might be a way that biology evolves if its genome has some sort of foresight or goal like an artificial intelligence. That is the view of some like Caporale, namely, the mutations are not purely random with respect to possible novel function.

Or how about a hypothetical Robot that makes a large number of randomly specified Rube Goldberg machines where the specifications are somewhat randomly generated by the Robot, but the Robot can then implement the specifications provided the specifications are physically realizable and resources are available. Hence Specified Complexity (in the Orgel qualitative sense) in the form of more and more Rube Goldberg machines can increase in an unbounded fashion provided there are enough physical resources. The NFL theorems don’t even need to apply since they are irrelevant to such a scenario.

We can even do this in a computer. We can write a variant of a Quine program (call it a quasi-Quine) that will write itself, but then with each generation add a few more extra source code lines of functionally useless computation (with respect to reproduction), like adding a for loop that adds numbers 1 to 10, and just stores the result with no further use. The extra code can be seeded by a random number generator but then constrained to preserve the fundamental goal of the quasi-Quine to keep reproducing and increasing in source code size with each generation. It will grow in Rube Goldberg complexity (more conditional statements and loops increase computation complexity, at least according to typical software metrics).

Each generation makes it do a few more unnecessary computations with the only constraint the next Quine generation is reproductively viable and conveys the policy of making a bigger quasi-Quine with each generation. Specified complexity will grow unbounded since this sort of strategy is exempt from the bounds of NFL since the quasi-Quine builds only targets it knows it can hit. After a few thousand generations, the resulting quasi-Quine contraption might be as revolting to Darwin as a Peacock’s tail.

I think the more fruitful approach for ID as far as critiques of Darwinian evolution has been simply leveraging the work of the population geneticists like Kimura and his successors. With a successful refutation of Darwinian evolution (as I believe Kimura had done), then other modes of evolution (neutral theory) can be also critiqued.

I also have been thinking along some other lines which I hope to write on in the future. I’ve hinted at the approach based on the idea of Multiplicity and analogies to the Law of Large numbers and challenges of biochemistry.

PS
Glad to see Tom citing Mark Toissant. Thought I’d never hear that name again. I hope Mark is well. The guy was pretty bright from what I read of his research. A beautiful mind indeed.
Mung on December 6, 2015 at 3:04 am said:

Part 4: Breaking Sticks
Alan Fox on December 6, 2015 at 11:23 am said:

Nick Matzke at The Panda’s Thumb highlights the key concession from Winston Ewert: that fitness landscapes (the niche environment in reality) impart non-randomness to the population of organisms. And, as Nick indicates, Jason Rosenhouse also points out some problems with Ewert’s latest offerings at EN&V.

Ewert:

I agree that weak long-range interactions should produce a fitness landscape somewhat smoother than random chance and this fitness landscape would thus be a source of some active information.

ETA: link
rob on December 11, 2015 at 9:31 pm said:

Coming back from a long hiatus, I read Tom’s post and Winston Ewert’s response, which surprised me. Ewert is nothing if not intelligent, but he seems to not understand Tom’s illustration, nor the point of his post. I’m guessing that Ewert was in a hurry when he read and responded. Regardless, Ewert deserves a lot of credit for actually engaging his critics, and doing so without a lot of tap dancing and bafflegab.

In response to one of Tom’s simple examples, Ewert says

This calculation is absolutely not active information. Active information is defined as being the logarithm of relative probabilities. English can be forgiven for skipping the logarithm..

I should let that last statement pass, but I can’t resist commenting. Log-transformed probabilities are handy in some fields, particularly those in which the concept of entropy is useful. DEM’s CoI is not one of those fields, as seen by two failed attempts to establish the utility of “active entropy”.

Note that when Dembski is trying to be really clear, he drops the log transformation. Do Dembski and Tom really need forgiveness for omitting something that serves no purpose?

Ewert continues:

but using the observed length of a stick instead of a probability is nonsensical. If English wishes to criticize active information, he should actually follow the definitions of active information.

Maybe if I explain it, then the combination of Tom’s explanation and mine will light the bulb. Here are a few sentences from Tom:

The random lengths of the segments of the stick are the probabilities. The stick is ordinarily of unit length, because the probabilities must sum to 1.

Since the mapping from segment number to segment length is a function whose values sum to 1 and are obviously all positive, the function is a valid probability distribution. If Ewert wants more justification than this, he needs to tell us what he’s looking for.

Active information (without the log transformation) is the factor by which a probability deviates from a baseline probability. In terms of sticks, let’s define our baseline to be a unit-length stick divided into N equal segments, each of length 1/N. Now consider an alternate stick scenario in which the segment lengths deviate from the baseline. If a certain segment has length L, then the factor by which it deviates from the baseline is L/(1/N), or N*L. Now imagine scaling our sticks by N, so the new segment length L’ is N*L, which is the same value as the deviation factor we just calculated. So if we just use sticks of length N rather than 1, the active information is the segment length L’.
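The scaling argument above can be checked numerically. The following is a minimal sketch; the helper names are invented for illustration, and breaking the stick at $N-1$ uniformly random points is an assumption about how the random stick is produced.

```python
import random


def break_stick(n, length=1.0, rng=random):
    """Break a stick of the given length into n segments at n-1 uniform points."""
    cuts = sorted(rng.uniform(0.0, length) for _ in range(n - 1))
    edges = [0.0] + cuts + [length]
    return [b - a for a, b in zip(edges, edges[1:])]


def deviation_factors(probabilities):
    """Factor by which each probability deviates from the baseline 1/N."""
    n = len(probabilities)
    return [q / (1.0 / n) for q in probabilities]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 8
    unit_segments = break_stick(n, rng=rng)       # segment lengths = probabilities
    factors = deviation_factors(unit_segments)    # active information, no log
    scaled_segments = [n * s for s in unit_segments]
    for f, s in zip(factors, scaled_segments):
        print(f"{f:.4f}  {s:.4f}")                # the two columns agree
```

The second column is just the unit stick stretched to length $N$, which is the point of the paragraph: on a stick of length $N$, the segment length is the deviation factor.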

With regards to the point of Tom’s post, the broken sticks illustration shows that DEM’s CoI theorems don’t work if we choose the target post hoc. If we try to apply CoI to evolution’s bias toward birds, we’re choosing the target based on the outcome of the higher-level search (the higher-level search being whatever it is that gave birth to evolution as we know it), which violates an implicit assumption of CoI.
rob on December 12, 2015 at 12:54 am said:

One more thing. Ewert says:

[Tom] argues that we can obtain active information without any bias in the initial process. Since active information is a measurement of bias in a system, this would mean that our use of active information was fundamentally broken.

If by “initial process” Ewert means “higher-level search”, then obtaining active information from an unbiased initial process is commonplace. Every higher-level search in DEM’s CoI theorems yields active info quite readily, even though its average output has zero active info. Furthermore, if the maximally likely outcome is chosen as the target after each higher-level trial, a la Tom’s post, the average will be more than zero.

And even further, consider that each CoI theorem is conditioned on more than just a uniform higher-level distribution. A higher-level search can be uniformly distributed, but not meet the other conditions of any of the CoI theorems, and yield active info on average.

The above facts are fatal to DEM’s attempts to use CoI in their ID arguments. For someone to legitimately apply CoI to a biological process (or any other real world process), they would need to:

1) State which CoI theorem they’re appealing to.

2) Show that the overall process meets the conditions of the CoI theorem, which would entail modeling it mathematically.

3) Determine the average output of the higher-level process, which would probably require multiple trials.

4) Choose the target before the trials take place, or explain why choosing the target post hoc doesn’t invalidate the appeal to CoI. (Ewert may appeal to specification for this one, which I don’t think would work, but that’s another post.)

Obviously nothing like the four steps above has been done, or likely can be done, for biological evolution or any other process of interest to ID. So in the race to support ID, CoI remains in the starting gate.
petrushka on December 12, 2015 at 1:33 am said:

I find Terry Pratchett’s law of conservation of reality much more useful. It applies directly to any putative designer of life.

The effort required to design or create life using poofery must be equal the effort required using evolution.
Tom English on December 12, 2015 at 10:54 am said:

rob gets it!

Rob, I’d appreciate suggestions on how to improve the presentation, and get the general reader to get it. Sorry not to respond to particular points. I’m feeling a mite puny.
rob on December 12, 2015 at 11:52 am said:

Tom, I can’t improve on your presentation. The animation is a great visual aid. Like I said, Ewert seems to have been phoning it in — his failure to understand says nothing about your pedagogical skills or his intelligence.
Tom English on December 12, 2015 at 6:00 pm said:

rob,

Thanks. Now a technical question… Is the “search for a search” anything more than a mixture distribution?
Joe Felsenstein on December 13, 2015 at 5:38 pm said:

This discussion is very helpful to me in understanding Tom’s example. I was missing the explanation that what the breaking of the stick was doing was coming up with probabilities of outcomes. And also the explanation of how the Target was chosen. I am still a little fuzzy on that: does “after the fact” mean after the probabilities are assigned, or after the occurrence of the event that has those probabilities?

I note also that if we break the stick at random into N pieces, and then use their lengths as probabilities, and chose one piece, there is another way to get the same result. That is to choose a point on the original unit stick at random, then break the stick into N pieces at random, and then choose the piece that contains that predesignated point. Does that equivalence somehow help in the argument?
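The equivalence described here can be checked by simulation. This is a sketch under assumptions: the stick is broken at $N-1$ uniformly random points, the function names are made up for the example, and only the average length of the chosen piece is compared. Under that stick-breaking assumption the expected length of a piece chosen in proportion to its length works out to $2/(N+1)$.

```python
import bisect
import random


def break_points(n, rng):
    """n-1 uniform cut points on the unit stick, in increasing order."""
    return sorted(rng.random() for _ in range(n - 1))


def lengths(cuts):
    edges = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def pick_by_length(n, rng):
    """Break the stick, then choose a piece with probability equal to its length."""
    segs = lengths(break_points(n, rng))
    return rng.choices(segs, weights=segs, k=1)[0]


def pick_by_point(n, rng):
    """Designate a uniform point first, break the stick, take the piece containing it."""
    point = rng.random()
    cuts = break_points(n, rng)
    # The number of cuts at or below the point is the index of its piece.
    return lengths(cuts)[bisect.bisect_right(cuts, point)]


def mean_length(picker, n, trials, seed):
    rng = random.Random(seed)
    return sum(picker(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    n = 5
    print(mean_length(pick_by_length, n, 20_000, seed=0))
    print(mean_length(pick_by_point, n, 20_000, seed=0))
    print(2 / (n + 1))
```

Both procedures give a size-biased draw, so the chosen piece is longer on average than the $1/N$ of a piece picked without regard to length.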
Tom English on December 16, 2015 at 1:55 am said:

Joe Felsenstein,

I’m trying to strip away as much as possible of the teleology-begging terminology pushed by Dembski et al. Bias is measured with respect to an event — any event. Lately I’ve come to believe that it’s a bad idea to refer to that event as a target. The problem with saying that we target an event when measuring bias is that it fosters the misimpression that bias is measured on the target prespecified in the CoI theorem, or on a target with a specification that is detachable, in the sense of Dembski, from the event itself.

In the GUC Bug example, we post-specify the fittest genotype, and because it has no property other than fitness, it has no detachable specification. We could have post-specified instead the genotype $g^*$ terminating the greatest number of lineages, which is the most probable outcome of the GUC Bug. Then $g^*$ is the (atomic) event for which the bias, calculated with reference to the uniform distribution on genotypes, is maximal.

If we play along with the notion that the GUC Bug’s probability distribution was drawn uniformly at random from the set of all distributions on the genotypes, and visualize the probabilities of genotypes as random lengths of segments of a broken stick of unit length, then the probability of $g^*$ is the length of the longest segment. To make this somewhat more rigorous, let’s say that the set of all genotypes is indexed

$G = \{g_1, g_2, \ldots, g_n\},$

with $n = 4^{1000}$ for the GUC Bug. Also let

$\mathbf{Q} = (Q_1, Q_2, \ldots, Q_n)$

be uniformly distributed on the unit $(n-1)$-simplex. It is $\mathbf{Q}$ that we visualize as a stick of unit length, broken randomly into $n$ pieces. The probability is $Q_i$ that genotype $g_i$ is the outcome. For a uniform reference distribution, the corresponding biases are

$\mathbf{B} = \frac{1}{1/n} \mathbf{Q} = (n Q_1, n Q_2, \ldots, n Q_n).$

It is $\mathbf{B}$ that we visualize as a broken stick $n$ units in length. The CoI theorem tells us that for prespecified target $\{g_1\},$

$\Pr \left \{B_1 \geq b \right \} \leq \frac{1}{b},$

$b \geq 1.$ The bias in favor of our post-specified event $g^*$ is $B_{(n)},$ the maximum of the $n$ random biases, i.e., the length of the longest segment of a length-$n$ stick broken randomly into $n$ pieces. The expected bias for $g_1$ is $E[B_1] = 1,$ while the expected bias in favor of $g^*$ is

$\begin{align*} E\!\left [ B_{(n)} \right ] &\approx \ln n + \gamma \\ &\approx \ln 4^{1000} + 0.5772 \\ &\approx 1387, \end{align*}$

where $\gamma$ is the Euler-Mascheroni constant. However, it follows from our previous analysis of the GUC Bug that the bias in favor of genotype $g^*$ is at least 3001. Given that 3001 is more than twice the expected value of $B_{(n)},$ it should not be terribly hard to accept that $\Pr\{B_{(n)} \geq 3001\}$ is very close to zero, though I omit the details here.
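The arithmetic behind the figure 1387 can be reproduced in a few lines. The sketch below also compares the approximation $\ln n + \gamma$ with a simulation for a small $n$, using the standard result that the expected longest piece of a unit stick broken uniformly into $n$ pieces is $H_n / n$, where $H_n$ is the $n$th harmonic number. The function names and the choice $n = 50$ for the simulation are assumptions for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def expected_max_bias_approx(log_n):
    """ln n + gamma, taking ln n as input so that n = 4**1000 is never formed."""
    return log_n + EULER_GAMMA


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, the exact expected maximum bias."""
    return sum(1.0 / k for k in range(1, n + 1))


def simulated_max_bias(n, trials, rng):
    """Average length of the longest segment of a length-n stick broken into n pieces."""
    total = 0.0
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(n - 1))
        edges = [0.0] + cuts + [1.0]
        total += n * max(b - a for a, b in zip(edges, edges[1:]))
    return total / trials


if __name__ == "__main__":
    print(round(expected_max_bias_approx(1000 * math.log(4))))  # 1387
    n = 50
    print(harmonic(n), math.log(n) + EULER_GAMMA)
    print(simulated_max_bias(n, 5_000, random.Random(0)))
```

For $n = 50$ the harmonic number and the logarithmic approximation already agree to about two decimal places, and the simulated average should land close to both.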

The upshot is that, even when I patch up the analysis of Dembski et al., the GUC Bug exposes the silliness of the notion that a randomly generated probability distribution on genotypes is somehow relevant to analysis of biological evolution.
Joe Felsenstein on December 16, 2015 at 11:07 pm said:

Tom English,

OK, I think I see how your broken-stick process fits in. It is, in effect, assigning bias to the segments, where the probability of choosing that segment is proportional to that bias. (Right?).

So we don’t have a process that always ends up choosing the longest stick.

I need to get that clear.
Tom English on December 18, 2015 at 2:57 am said:

Joe Felsenstein (and Alan Fox),

You’ve helped me a lot, identifying points that are unclear, and that are hard for me to see are unclear. But the big tip on how to clarify and simplify (enormously) came from Alan.

Alan Fox: Assume a roulette wheel with six sectors. By varying the segments and playing, the chance of the pill dropping into any sector varies as to the size of each sector, while the overall probability of the pill landing in a sector is 1. Is this equivalent to your sticks?

My next animation will randomly slice a big, easy-to-see pie, and straighten arcs of the perimeter into line segments. I won’t display the segments as a broken stick. What I have in mind will allow me to label the sectors with the 8 “genotypes” 000, 001, …, 111.

Here is my new point of departure: In their analysis, Dembski et al. render evolution indistinguishable from a spin of a biased roulette wheel with, say, genotypes labeling the sectors. The wheel is biased in the sense that the sectors differ in width, with some genotypes more likely than others to be the outcome of the spin. If the sector for a genotype is wider than average, then evolution is biased in favor of that genotype. If the sector is narrower than average, then evolution is biased against the genotype.

What Dembski et al. refer to as an evolutionary “search” is a spin of the “wheel of evolution.” What they refer to as the “search for a search” is a process that randomly splits the wheel into sectors, before it is spun, thereby determining for each genotype the chance that it will be the outcome of evolution. The CoI theorem addresses the random formation of the wheel, not the subsequent spin of the wheel. It tells us, for instance, that the probability is at most 1/2 that a given sector will turn out to be at least 2 times as wide as the average sector. More generally, the probability is at most $1/b$ that a given sector will turn out to be at least $b$ times as wide as the average sector. This assumes that all possible splits of the wheel of evolution into sectors, one for each genotype, are equally likely to occur.
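The bound on a given sector can be illustrated numerically. This sketch assumes, as in the paragraph above, that all splits of the wheel are equally likely, so that the width of a given sector has the same law as the smallest of $n-1$ uniform cut points. The function names and the values $n = 8$, $b = 2$ are chosen only for the example.

```python
import random


def tail_probability_exact(n, b):
    """P(n * Q_1 >= b) for a uniformly random split: (1 - b/n) ** (n - 1)."""
    return max(0.0, 1.0 - b / n) ** (n - 1)


def tail_probability_simulated(n, b, trials, rng):
    """Estimate the same probability by forming the wheel at random many times."""
    hits = 0
    for _ in range(trials):
        # Width of the sector that runs from 0 to the first cut point.
        first_sector = min(rng.random() for _ in range(n - 1))
        if n * first_sector >= b:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 8
    for b in (1, 2, 4, 8):
        print(b, tail_probability_exact(n, b), 1 / b)
    print(tail_probability_simulated(n, 2, 20_000, random.Random(0)))
```

In each row the exact probability sits below $1/b$, as the theorem requires. The calculation concerns a sector named before the wheel is formed, and it involves no spin of the wheel at all.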

That’s just a first stab at a better explanation. It’s clearly going much better than the post did.

So we don’t have a process that always ends up choosing the longest stick.

Now we’re talking about the widest sector of the wheel of evolution, corresponding to the genotype that is the most likely outcome of the evolutionary process. The scientist is free to choose the likeliest genotype, precisely because it is the likeliest, and to compare its chance of occurrence (the width of its sector) to the average chance of occurrence of a genotype (the average width of the sectors). The CoI theorem says nothing as to what the width of the widest sector will turn out to be, in random formation of the wheel of evolution.
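To see the contrast between a given sector and the widest sector, here is a rough Monte Carlo sketch. It assumes uniformly random wheels, generated by normalizing independent exponential variates (a standard way to sample uniformly from the simplex). The names `random_wheel` and `widest_tail`, and the parameters $N = 8$ and $b = 2$, are illustrative choices of mine.

```python
import random


def random_wheel(n_sectors, rng):
    """Sector widths summing to 1, uniform on the simplex
    (normalized independent exponential variates)."""
    gaps = [rng.expovariate(1.0) for _ in range(n_sectors)]
    total = sum(gaps)
    return [g / total for g in gaps]


def widest_tail(n_sectors, b, trials, seed=0):
    """Estimate the probability that the WIDEST sector is at least
    b times as wide as the average sector."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if max(random_wheel(n_sectors, rng)) >= b / n_sectors:
            hits += 1
    return hits / trials


estimate = widest_tail(8, 2, 20000)
print(f"Pr(widest >= 2 x average) ~ {estimate:.3f}; bound for a given sector = 0.5")
```

The estimate comes out far above 1/2, which does not contradict the theorem: the bound applies to a sector chosen without regard to its width, not to the sector picked because it is widest.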
Alan Fox on December 18, 2015 at 10:41 am said:

Tom English: But the big tip on how to clarify and simplify (enormously) came from Alan.

*blushes* Glad to have (inadvertently) been of assistance, Tom. Look forward to developments.
Joe Felsenstein on December 18, 2015 at 3:26 pm said:

Tom, the roulette wheel analogy is a good one. I suggest that you not unroll the wheel to make it into a line segment. Just keep it a wheel and spin it. Let’s assume that the final position of the ball on the circular track is uniformly distributed. So the fraction of the track that is in one of the bins is also the probability that the ball ends up there.
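A minimal sketch of the spin itself, assuming a ball position that is uniform on the track. The bin widths in `wheel` are made-up numbers, and `spin` and `landing_frequencies` are hypothetical helper names.

```python
import bisect
import itertools
import random


def spin(widths, rng):
    """Return the index of the bin in which a uniformly placed ball lands."""
    edges = list(itertools.accumulate(widths))  # right edge of each bin
    position = rng.random() * edges[-1]         # uniform point on the track
    # Clamp guards against a floating-point edge case at the last edge.
    return min(bisect.bisect_right(edges, position), len(widths) - 1)


def landing_frequencies(widths, trials, seed=0):
    """Fraction of spins that end in each bin."""
    rng = random.Random(seed)
    counts = [0] * len(widths)
    for _ in range(trials):
        counts[spin(widths, rng)] += 1
    return [c / trials for c in counts]


wheel = [0.4, 0.3, 0.2, 0.1]  # illustrative widths, circumference 1
print(landing_frequencies(wheel, 20000))
```

With enough spins the observed frequencies should approach the widths, which is all that "the fraction of the track is the probability" amounts to.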

To get the CoI theorem to apply, we would need to have a uniform distribution on the space of linear combinations of outcomes. I don’t see where that is here. Then we would show that the probability assigned to a particular bin is on average across all of those combinations equal to 1/N (when there are N bins).

If we pick our Target by picking one of the N bins at random, it is of course true that the probability that the ball lands in it is 1/N. If we pick our Target by picking a bin with probability proportional to its size, then the probability of the ball landing in that Target bin is of course greater than 1/N. And if we pick it by picking always the largest bin, then the math you gave earlier for sticks applies. In those cases we must have violated the CoI conditions somehow. I think we violate it by choosing the bin nonuniformly.
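The three ways of picking the Target can be compared directly for any fixed wheel. The sketch below assumes the Target is picked independently of the spin; the widths in `wheel` are invented for illustration and `hit_probabilities` is a hypothetical name.

```python
def hit_probabilities(widths):
    """Probability that the ball lands in the Target bin, for a fixed wheel
    of total width 1, under three ways of picking the Target."""
    n = len(widths)
    # Target picked uniformly at random: average of the widths, i.e. 1/n.
    uniform_pick = sum(w / n for w in widths)
    # Target picked with probability proportional to its width: sum of squares.
    size_biased_pick = sum(w * w for w in widths)
    # Target is always the largest bin.
    largest_pick = max(widths)
    return uniform_pick, size_biased_pick, largest_pick


wheel = [0.4, 0.3, 0.2, 0.1]
uniform_pick, size_biased_pick, largest_pick = hit_probabilities(wheel)
print(uniform_pick, size_biased_pick, largest_pick)
```

For any wheel that is not perfectly even, the sum of squared widths exceeds $1/N$ (by the Cauchy-Schwarz inequality), and the largest width is at least the sum of squares, so the three values come out in increasing order.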

As for the mathematical conditions you give, they are not, as far as I can see, the CoI. The statement that

the probability is at most 1/b that a given sector will turn out to be at least b times as wide as the average sector

is true, but I don’t see that it is the CoI, but rather Markov’s Inequality for nonnegative random variables.
Tom English on December 18, 2015 at 7:55 pm said:

Joe Felsenstein: I suggest that you not unroll the wheel to make it into a line segment. Just keep it a wheel and spin it.

I’d planned on doing that also — just didn’t mention it.

Joe Felsenstein: If we pick our Target by picking a bin with probability proportional to its size, then the probability of the ball landing in that Target bin is of course greater than 1/N.

Similarly, we can wait to see where the ball lands. Let’s say that the roulette wheel is of circumference 1. For all biased roulette wheels, the expected width (arc length) of the bin in which the ball lands is greater than $1/N.$ Put simply, we expect the process to be biased in favor of the outcome we actually observe.

Joe Felsenstein: In those cases we must have violated the CoI conditions somehow. I think we violate it by choosing the bin nonuniformly.

We can commit absolutely to the first bin, and CoI works fine. We violate a CoI condition when the choice of bin depends in some way on the width of the bin.

Joe Felsenstein: As for the mathematical conditions you give, they are not, as far as I can see, the CoI.

I’ve written $b$ where Dembski et al. write $q/p,$ and $1/b$ where they write $p/q.$ With $p=1/N,$ we have $q/p = N q$ and $p/q = 1/(N q).$ I think it’s clearer to write $b = N q.$

Joe Felsenstein: Markov’s Inequality for nonnegative random variables.

Very nice catch.

I want to triple-check before I crow. But I believe that you, Alan, and I, bouncing ideas off each other in plain sight, have reduced the CoI theorem (uniform case) to an application of Markov’s inequality.
Joe Felsenstein on December 18, 2015 at 11:45 pm said:

I’m not sure that I see all the steps of the reduction of the CoI to Markov’s Inequality. The inequality is of course fairly intuitive: if we have a random variable X that has mean M, and if we consider values of X that are greater than MB, those can’t occur more than 1/B of the time (or else too great a contribution is made to the mean). It’s not one of the deep profound theorems that gives us unexpected insights.

I think that Tom’s work on the CoI and mine now are addressing different issues. His asks whether the method of choice of the target T, in models of “evolutionary search,” makes it invalid to use the Dembski-Ewert-Marks Conservation of Information theorem.

Mine accepts the theorem but questions its application. DEM show that, starting from a gemisch of all possible “searches”, ones that do well in improving adaptation are rare. I point out that the gemisch includes all sorts of silly-ass searches, including ones that actively try to find genotypes of worse fitness. And discarding the silly-ass ones, keeping all the ones that have genotypes which have fitnesses, and whose fitnesses actually affect the reproduction of the individuals, enriches greatly for searches that improve adaptation.

So I think that these are different arguments, though not contradictory.
Tom English on December 19, 2015 at 4:44 am said:

Joe Felsenstein: I think that Tom’s work on the CoI and mine now are addressing different issues.

I’ve tried all along to get that across. In the introduction to the post, I say that you dispatched the notion that bias (“active information”) implies purpose, and that I will be taking on the CoI theorem. Measuring the GUC Bug’s bias with respect to the fittest genotype is not a response to the theorem, because the theorem assumes that the target is specified independently of the randomly generated probability distribution (“roulette wheel”), and the fittest genotype is instead part of our definition of the distribution.

DEM show that, starting from a gemisch of all possible “searches”, ones that do well in improving adaptation are rare. I point out that the gemisch includes all sorts of silly-ass searches, including ones that actively try to find genotypes of worse fitness.

You seem to be thinking of the “search” representation in the first three sections of the paper. Dembski et al. show, i.e., prove, nothing in terms of that representation. The CoI theorem addresses nothing less abstract than randomly generated probability distributions. (That is why it’s hard to explain.) You played along with their preliminary formulation of search, adapting an old model from theoretical biology to suit it. But Dembski et al. throw away the number $m$ of steps in the process. The CoI theorem doesn’t distinguish between a tornado in a junkyard and evolution over a period of four billion years. It’s based on a silly-ass representation (which Ewert silently abandons in “The GUC Bug”).

And discarding the silly-ass ones, keeping all the ones that have genotypes which have fitnesses, and whose fitnesses actually affect the reproduction of the individuals, enriches greatly for searches that improve adaptation.

This makes sense only in terms of the preliminary representation (stochastic process), not the representation in the CoI theorem (probability distribution).

Looking back at our PT post, trying to understand how we’re missing each other, I find this error:

Remember that if we drew at random from a distribution (a “search”) which itself was randomly chosen from the simplex of all probability distributions, we would have only a probability p of hitting the target.

We’ve slipped into using the term target equivocally. (At last I’ve put my finger on what was bothering me a year ago.) We measure bias with respect to a post-specified event, the fittest genotype. So we shouldn’t speak, as we do here, as though the event were a prespecified target. I suggested several comments ago that we avoid confusion by reserving the term target for a prespecified event. Now I see that it’s ourselves, and not just others, that we need to avoid confusing.
Tom English on December 19, 2015 at 6:15 am said:

Joe Felsenstein: I’m not sure that I see all the steps of the reduction of the CoI to Markov’s Inequality. The inequality is of course fairly intuitive: if we have a random variable X that has mean M, and if we consider values of X that are greater than MB, those can’t occur more than 1/B of the time (or else too great a contribution is made to the mean). It’s not one of the deep profound theorems that gives us unexpected insights.

There’s indeed nothing deep in Markov’s inequality. It’s important to show just how little there actually is to the CoI theorem (uniform case, at least). I know that I need to be explicit. It would be a pain, under the best of circumstances. And this is far from the best. Here is what I can easily bash out.

Let non-negative, integrable random variables $Q_1, Q_2, \ldots, Q_N$ be jointly distributed, such that

$Q_1 + Q_2 + \cdots + Q_N = 1.$

[ETA: The random widths of the cups of the unit-circumference roulette wheel sum to 1.] Then the random variable

$Q^\prime = Q_1 + Q_2 + \cdots + Q_K,$

$K \leq N,$ is also non-negative and integrable. [ETA: This combines the first $K$ cups into a target. We can choose any $K$ of the cups, but I’m going with the first $K$ to match the statement of the CoI theorem.] Write $p = E[Q^\prime].$ By Markov’s inequality,

(1) $\Pr\{Q^\prime \geq q\} \leq \frac{p}{q},$

$q > 0.$ That’s all, folks. [ETA: The overall width of the target is on average $p.$ For bias $b=q/p,$ the factor by which the overall width of the target differs from the average, $\Pr\{Q^\prime/p \geq b\} \leq 1/b.$ The probability is at most $1/b$ that the bias is at least $b.$ The uniform case of the CoI theorem is a special case of this elementary result.]

If the random vector $(Q_1, Q_2, \ldots, Q_N)$ is distributed uniformly on the unit $(N-1)$-simplex, then $E[Q_i] = 1/N,$ $i = 1, 2, \ldots, N,$ and

$p = E[Q^\prime] = \frac{K}{N}.$

(I need a fairly rigorous statement that the uniform distribution on the simplex is equivalent to the uniform measure on probability measures. You can see at least that the value of $p$ matches the uniform probability $\mathbf{U}(T)$ of the target

$\begin{align*} T &=\{\omega_1, \omega_2, \ldots, \omega_K\} \\ &\subseteq \{\omega_1, \omega_2, \ldots, \omega_N\} \\ &= \Omega \end{align*}$

specified in the CoI theorem.) From the abstract of Dembski’s seminar talk:

Mathematically, CoI sets limits on the information cost incurred when the probability of success of a targeted search gets raised from $p$ to $q$ ($p < q$), that cost being calculated in terms of the probability $p/q$.

Please, please, please tell me that you can see in Eqn. 1 the probability of the target going from $p$ to $q$ in the left-hand side, and the “probabilistic cost” of $p/q$ in the right-hand side. [ETA: This is clearer to me when I write $\Pr\{Q^\prime/p \geq q/p\} \leq p/q.$ ]
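For anyone who wants to poke at Eqn. 1 numerically, here is a rough Monte Carlo sketch of the uniform case. It assumes the cup widths are uniform on the simplex (sampled by normalizing independent exponential variates) and that the target is the first $K$ cups. The names `random_wheel` and `target_tail`, and the values $N = 8,$ $K = 2,$ are illustrative choices, not anything from the paper.

```python
import random


def random_wheel(n_cups, rng):
    """Cup widths (Q_1, ..., Q_N), uniform on the unit simplex."""
    gaps = [rng.expovariate(1.0) for _ in range(n_cups)]
    total = sum(gaps)
    return [g / total for g in gaps]


def target_tail(n_cups, k, q, trials, seed=0):
    """Estimate Pr{Q' >= q}, where Q' is the total width of the first k cups."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        widths = random_wheel(n_cups, rng)
        if sum(widths[:k]) >= q:
            hits += 1
    return hits / trials


n_cups, k = 8, 2
p = k / n_cups  # E[Q'] in the uniform case
for q in (0.25, 0.5, 0.75):
    estimate = target_tail(n_cups, k, q, 20000)
    print(f"q={q}: Pr(Q' >= q) ~ {estimate:.4f}  Markov bound p/q = {p / q:.4f}")
```

The estimates should fall below $p/q$ in every row, usually by a wide margin, since Markov’s inequality uses nothing about the distribution of $Q^\prime$ beyond its mean.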
Joe Felsenstein on December 19, 2015 at 11:05 pm said:

Tom English: We can choose any of the cups, but I’m going with the first to match the statement of the CoI theorem.] Write By Markov’s inequality,

$\Pr\{Q' \geq q\} \leq \frac{p}{q}$

That’s all, folks.

I hope that’s not all, because it went by me too fast. Up till that point, I didn’t see $p$ and $q$ anywhere. And suddenly they appear out of nowhere. So some more words are needed.

(I need a fairly rigorous statement that the uniform distribution on the simplex is equivalent to the uniform measure on probability measures.

That’s the lifting/lowering machinery of Ewert, Dembski, and Marks. I suspect you need to explain something like that, maybe more simply than they do.
Tom English on December 20, 2015 at 12:20 am said:

Joe Felsenstein: That’s the lifting/lowering machinery of Ewert, Dembski, and Marks. I suspect you need to explain something like that, maybe more simply than they do.

The lifting/lowering plays no role in the uniform case of the CoI theorem. As for the general case, I don’t believe that the lifting is coherent. And neither does DiEb.

Joe Felsenstein: Up till that point, I didn’t see p and q anywhere.

You’ve lost, copying from my comment, the expression that comes in the middle of “Write By Markov’s inequality,” defining $p = E[Q^\prime],$ and the expression that comes immediately after the display equation, stipulating $q > 0.$
Tom English on December 22, 2015 at 12:59 am said:

Joe Felsenstein: DEM show that, starting from a gemisch of all possible “searches”, ones that do well in improving adaptation are rare. I point out that the gemisch includes all sorts of silly-ass searches, including ones that actively try to find genotypes of worse fitness.

What you’re saying here need not be changed much, I think, to be fairly accurate. Here’s a quick hint: The distribution on genotypes “arrived at” in evolution by natural selection is strongly nonuniform because most genotypes are probably supplanted by others that are higher in fitness. (There is no reason to prefer, in the measurement of bias, a nonuniform reference distribution to the uniform distribution on genotypes.)
Tom English on December 22, 2015 at 1:06 am said:

I’m tending away from the River Styx now, perhaps toward the Land of the Living. Apologies to all, and especially to Lizzie, for being so slow in completing the post. I’ve made some progress today.
Joe Felsenstein on December 22, 2015 at 1:21 am said:

Tom English: The distribution on genotypes “arrived at” in evolution by natural selection is strongly nonuniform because most genotypes are probably supplanted by others that are higher in fitness.

The important point, to me, is that evolutionary processes arrive at a much-better-than-uniform distribution even if the fitness surface is a “white noise” surface. And in addition, “because physics”, the fitness surface is expected to be much smoother than that, so one would get much more success than that. So if we see Active Information we aren’t necessarily seeing any evidence of Design of the fitness surface.

I think your (Tom’s) main point is different, and concerns whether it is valid to use the CoI theorem if the target is chosen after the fact, or in the midst of things.
J-Mac on September 12, 2017 at 3:14 am said:

Tom English: “By the way, I am not conflating all probabilities in scientific models with physical chances, as Dembski et al. generally do. Much of what is modeled as random in biological evolution is merely uncertain, not attributed to quantum chance. The vitally important topic of interpretations of probability, which Dembski deflects with a false analogy to interpretations of quantum mechanics (Being as Communion, p. 157), will have to wait for another post.”

Did you keep your word and expand on this idea? I would like to read it and comment on it… especially on the clear implication of quantum coherence and quantum entanglement in cell-differentiation and morphogenesis… I think you and Joe Felsenstein would like that … 😉