LCI or No LCI, Information Can Appear by Chance

(Preamble: I apologize in advance for cluttering TSZ with these three posts. There are very few people on either side of the debate that actually care about the details of this “conservation of information” stuff, but these posts make good on some claims I made at UD.)

To see that active information can easily be created by chance, even when the LCI holds, we’ll return to the Bertrand’s Box example. Recall that the LCI holds for this example, and all choices are strictly random. Recall further that choosing the GG box gives us 1 bit of active information since it doubles our chance of getting a gold coin. If we conduct 100 trials, we expect to get the GG box about 33 times, which means we expect 33 bits of active information to be generated by nothing but chance.

But before we say QED, we should note a potential objection, namely that we also expect to get SS about 33 times, and each such outcome gives us negative infinity bits of active information. So if we include the SS outcomes in our tally of active information, the total is negative infinity. Be that as it may, the fact remains that in 33 of the trials, 1 bit of information was generated. This fact is not rendered false by the outcomes of other trials, so those 33 trials produced 33 bits of information.
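
For readers who want to check the tally, here is a minimal simulation sketch. It assumes the standard three-box setup (GG, GS, SS) with a baseline gold-coin probability of 1/2; the function names and the fixed seed are my own choices for illustration.

```python
import math
import random

# Probability of drawing a gold coin from each box.
P_GOLD = {"GG": 1.0, "GS": 0.5, "SS": 0.0}
BASELINE = 0.5  # chance of gold when the box itself is chosen uniformly at random


def active_info(p_success, baseline=BASELINE):
    """Active information in bits: log2(p_success / baseline)."""
    if p_success == 0:
        return -math.inf  # the SS box can never yield a gold coin
    return math.log2(p_success / baseline)


def tally(trials=100, seed=0):
    """Choose a box uniformly at random `trials` times and count each outcome."""
    rng = random.Random(seed)
    boxes = list(P_GOLD)
    counts = {box: 0 for box in boxes}
    for _ in range(trials):
        counts[rng.choice(boxes)] += 1
    return counts


counts = tally()
# Each GG outcome contributes exactly 1 bit, so the GG total equals the GG count.
gg_bits = counts["GG"] * active_info(P_GOLD["GG"])
print(counts, "bits from GG trials:", gg_bits)
```

The exact GG count varies with the seed, but over many runs it averages one third of the trials.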

A Free Lunch of Active Info, with a Side of LCI Violations

Given a sample space Ω and a target T ⊆ Ω, Dembski defines the following information measures:

Endogenous information: IΩ ≡ -log2( |T|/|Ω| )
Exogenous information: IS ≡ -log2( P(T) )
Active information: I+ ≡ IΩ – IS = log2( P(T) / (|T|/|Ω|) )
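
As a rough sketch, the three measures can be written as small Python functions. The names i_omega, i_s and i_plus are mine, and the 1-in-8 target with a 1/2 success probability is a made-up example, not one of Dembski's.

```python
import math
from fractions import Fraction


def i_omega(target_size, omega_size):
    """I_Omega = -log2(|T| / |Omega|), the uniform-baseline measure."""
    return -math.log2(Fraction(target_size, omega_size))


def i_s(p_target):
    """I_S = -log2(P(T)), where P(T) is the search's actual success probability."""
    return -math.log2(p_target)


def i_plus(p_target, target_size, omega_size):
    """Active information: I+ = I_Omega - I_S."""
    return i_omega(target_size, omega_size) - i_s(p_target)


# Hypothetical search: one target among eight outcomes, hit with probability 1/2.
print(i_omega(1, 8), i_s(Fraction(1, 2)), i_plus(Fraction(1, 2), 1, 8))
```

Note that i_plus takes the size of Ω as an input, which is exactly the dependence on modeling choices discussed in this post.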

Active information is supposed to indicate design, but in fact, the amount of active info attributed to a process depends on how we choose to mathematically model that process. We can get as much free active info as we want simply by making certain modeling choices.

Free Active Info via Individuation of Possibilities

Dembski is in the awkward position of having impugned his own information measures before he even invented them. From his book No Free Lunch:

This requires a measure of information that is independent of the procedure used to individuate the possibilities in a reference class. Otherwise the same possibility can be assigned different amounts of information depending on how the other possibilities in the reference class are individuated (thus making the information measure ill-defined).

He used to make this point often. But two of his new information measures, “endogenous information” and “active information”, depend on the procedure used to individuate the possible outcomes, and are therefore ill-defined according to Dembski’s earlier position.

To see how this fact allows arbitrarily high measures of active information, consider how we model the rolling of a six-sided die. We would typically define Ω as the set {1, 2, 3, 4, 5, 6}. If the goal is to roll a number higher than one, then our target T is {2, 3, 4, 5, 6}. The amount of active information I+ is log2(P(T) / (|T|/|Ω|)) = log2((5/6) / (5/6)) = 0 bits.

But we could, instead, define Ω as {1, higher than 1}. In that case, I+ = log2((5/6) / (1/2)) = .7 bits. What we’re modeling hasn’t changed, but we’ve gained active information by making a different modeling choice.

Furthermore, borrowing an example from Dembski, we could distinguish getting a 1 with the die landing on the table from getting a 1 with the die landing on the floor. That is, Ω = { 1 on table, 1 on floor, higher than 1 }. Now I+ = log2((5/6) / (1/3)) = 1.3 bits. And we could keep changing how we individuate outcomes until we get as much active information as we desire.
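
The three die models above can be compared with a short sketch. It assumes a fair die and the target "higher than 1"; the active_info helper and the loop that splits the "1" outcome into k sub-outcomes are my own illustration of how the number keeps growing.

```python
import math
from fractions import Fraction

P_TARGET = Fraction(5, 6)  # chance that a fair die shows a number higher than 1


def active_info(p_target, target_size, omega_size):
    """I+ = log2(P(T) / (|T| / |Omega|)) in bits."""
    return math.log2(p_target / Fraction(target_size, omega_size))


# Each model maps to (|T|, |Omega|) under that way of individuating outcomes.
models = {
    "{1, 2, 3, 4, 5, 6}": (5, 6),
    "{1, higher than 1}": (1, 2),
    "{1 on table, 1 on floor, higher than 1}": (1, 3),
}
for omega, (t_size, omega_size) in models.items():
    print(f"{omega}: {active_info(P_TARGET, t_size, omega_size):.2f} bits")

# Splitting the "1" outcome into k sub-outcomes gives |Omega| = k + 1.
for k in (10, 100, 1000):
    print(f"k = {k}: {active_info(P_TARGET, 1, k + 1):.2f} bits")
```

The physical die is the same in every row; only the bookkeeping changes.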

This may seem like cheating. Maybe if we stipulate that Ω must always be defined the “right” way, then active information will be well-defined, right? But let’s look into another modeling choice that demonstrates that there is no “right” way to define Ω in the EIL framework.

Free Active Info via Inclusion of Possibilities

Again borrowing an example from Dembski, suppose that we know that there’s a buried treasure on the island of Bora Bora, but we have no idea where on the island it is, so all we can do is randomly choose a site to dig. If we want to model this search, it would be natural to define Ω as the set of all possible dig sites on Bora Bora. Our search, then, has zero active information, since it is no more likely to succeed than randomly selecting from Ω (because randomly selecting from Ω is exactly what we’re doing).

But is this the “right” definition of Ω? Dembski asks the question, “how did we know that of all places on earth where the treasure might be hidden, we needed to look on Bora Bora?” Maybe we should define Ω, as Dembski does, to include all of the dry land on earth. In this case, randomly choosing a site on Bora Bora is a high-active-information search, because it is far more likely to succeed than randomly choosing a site from Ω, i.e. the whole earth. Again, we have changed nothing about what is being modeled, but we have gained an enormous amount of active information simply by redefining Ω.
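
To put a number on this: if we search uniformly within a region that contains the target and that region makes up a fraction f of Ω, the ratio of success probabilities is 1/f, so the active information is log2(1/f) bits. The sketch below assumes exactly that setup; the fractions are purely hypothetical and are not measurements of Bora Bora or of the earth's land area.

```python
import math


def active_info_from_restriction(fraction_of_omega):
    """Bits of active info from searching uniformly inside a region that
    contains the target and covers `fraction_of_omega` of the baseline Omega."""
    return math.log2(1 / fraction_of_omega)


# HYPOTHETICAL fractions, chosen only to show how the number grows with Omega.
for label, fraction in [
    ("Omega is the island itself", 1.0),
    ("island is a millionth of Omega", 1e-6),
    ("island is a trillionth of Omega", 1e-12),
]:
    print(f"{label}: {active_info_from_restriction(fraction):.1f} bits")
```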

We could also take Dembski’s question further by asking, “how did we know that of all places in the universe, we needed to look on Bora Bora?” Now it seems that we’re being ridiculous. Surely we can take for granted the knowledge that the treasure is on the earth, right? No. Dembski is quite insistent that the zero-active-information baseline must involve no prior information whatsoever:

The “absence of any prior knowledge” required for uniformity conceptually parallels the difficulty of understanding the nothing that physics says existed before the Big Bang. It’s common to picture the universe before the Big Bang is a large black void empty space. No. This is a flawed image. Before the Big Bang there was nothing. A large black void empty space is something. So space must be purged from our visualization. Our next impulse is then, mistakenly, to say, “There was nothing. Then, all of a sudden…” No. That doesn’t work either. All of a sudden presupposes there was time and modern cosmology says that time in our universe was also created at the Big Bang. The concept of nothing must exclude conditions involving time and space. Nothing is conceptually difficult because the idea is so divorced from our experience and familiarity zones.

and further:

The “no prior knowledge” cited in Bernoulli’s PrOIR is all or nothing: we have prior knowledge about the search or we don’t. Active information on the other hand, measures the degree to which prior knowledge can contribute to the solution of a search problem.

To define a search with “no prior knowledge”, we must be careful not to constrain Ω. For example, if Ω consists of permutations, it must contain all permutations:

What search space, for instance, allows for all possible permutations? Most don’t. Yet, insofar as they don’t, it’s because they exhibit structures that constrain the permissible permutations. Such constraints, however, bespeak the addition of active information.

But even if we define Ω to include all permutations of a given ordered set, we’re still constraining Ω, as we’re excluding permutations of other ordered sets. We cannot define Ω without excluding something, so it is impossible to define a search without adding active information.

Active information is always measured relative to a baseline, and there is no baseline that we can call “absolute zero”. We therefore can attribute an arbitrarily large amount of active information to any search simply by choosing a baseline with a sufficiently large Ω.

Returning to our six-sided die example, we can take the typical definition of Ω as {1, 2, 3, 4, 5, 6} and add, say, 7 and 8 to the set. Obviously our two additional outcomes each have a probability of zero, but that’s not a problem, since probability distributions often include zero-probability elements. Inclusion of these zero-probability outcomes doesn’t change the mean, median, variance, etc. of the distribution, but it does change the amount of active info from 0 to log2((1/6) / (1/8)) = .4 bits (given a target of rolling, say, a one).
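
Here is a small sketch of the padding trick, assuming a fair die and a target of rolling a one. The function name and the particular numbers of extra outcomes are my own choices.

```python
import math
from fractions import Fraction


def active_info_padded(extra_outcomes):
    """Active info (bits) for rolling a one when Omega is {1..6} plus
    `extra_outcomes` additional outcomes that each have probability zero."""
    p_target = Fraction(1, 6)                   # the die itself is unchanged
    baseline = Fraction(1, 6 + extra_outcomes)  # |T| / |Omega| with padding
    return math.log2(p_target / baseline)


for extra in (0, 2, 6, 90):
    print(f"{extra} extra outcomes: {active_info_padded(extra):.2f} bits")
```

Padding Ω to 96 outcomes already yields 4 bits, and nothing in the framework tells us where to stop.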

Free Violations of the LCI

Given a chain of two searches, the LCI says that the endogenous information of the first search is at least as large as the active information of the second. Since we can model the second search to have arbitrarily large active information, we can always model it such that its active information is larger than the first search’s endogenous information. Thus any chain of searches can be shown to violate the LCI. (We can also model the first search such that its endogenous information is arbitrarily large, so any chain of searches can also be shown to obey the LCI.)

The Law(?) of Conservation of Information

For the past three years Dembski has been promoting his Law of Conservation of Information (LCI), most recently here. The paper he most often promotes is this one, which begins as follows:

Laws of nature are universal in scope, hold with unfailing regularity, and receive support from a wide array of facts and observations. The Law of Conservation of Information (LCI) is such a law.

Dembski hasn’t proven that the LCI is universal, and in fact he claims that it can’t be proven, but he also claims that to date it has always been confirmed. He doesn’t say whether he has actually tried to find counterexamples, but the reality is that they are trivial to come up with. This post demonstrates one very simple counterexample.


First we need to clarify Dembski’s terminology. In his LCI math, a search is described by a probability distribution over a sample space Ω. In other words, a search is nothing more than an Ω-valued random variable. Execution of the search consists of a single query, which is simply a realization of the random variable. The search is deemed successful if the realized outcome resides in target T ⊆ Ω. (We must be careful to not read teleology into the terms search, query, and target, despite the terms’ connotations. Obviously, Dembski’s framework must not presuppose teleology if it is to be used to detect design.)

If a search’s parameters depend on the outcome of a preceding search, then the preceding search is a search for a search. It’s this hierarchy of two searches that is the subject of the LCI, which we can state as follows.

Given a search S, we define:

  • q as the probability of S succeeding
  • p2 as the probability that S would succeed if it were a uniform distribution
  • p1 as the probability that a uniformly distributed search-for-a-search would yield a search at least as good as S

The LCI says that p1 ≤ p2/q.


In thinking of a counterexample to the LCI, we should remember that this two-level search hierarchy is nothing more than a chain of two random variables. (Dembski’s search hierarchy is like a Markov chain, except that each transition is from one state space to another, rather than within the same state space.) One of the simplest examples of a chain of random variables is a one-dimensional random walk. Think of a system that periodically changes state, with each state transition represented by a shift to the left or to the right on a state diagram. If we know at a certain point in time that it is in one of, say, three states, namely n-1 or n or n+1, then after the next transition it will be in n-2, n-1, n, n+1, or n+2, as in the following diagram:

Assume that the system is always equally likely to shift left as to shift right, and let the “target” be defined as the center node n. If the state at time t is, say, n-1, then the probability of success q is 1/2. Of the three original states, two (namely n-1 and n+1) yield this probability of success, so p1 is 2/3. Finally, p2 is 1/5 since the target consists of only one of the final five states. The LCI says that p1 ≤ p2/q. Plugging in our numbers for this example, we get 2/3 ≤ (1/5)/(1/2), which is clearly false.
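
The arithmetic can be checked with exact fractions. This is only a sketch of the example as described: states are written as offsets from n, the walk is unbiased, and the variable names are mine.

```python
from fractions import Fraction

START_STATES = [-1, 0, 1]  # n-1, n, n+1 written as offsets from n
STEPS = (-1, 1)            # shift left or shift right, equally likely
TARGET = 0                 # the center node n


def success_probability(state):
    """Chance of landing on the target after one step from `state`."""
    hits = sum(1 for step in STEPS if state + step == TARGET)
    return Fraction(hits, len(STEPS))


# q: success probability of the search we actually got (starting at n-1).
q = success_probability(-1)

# p1: fraction of starting states that give a search at least as good as q.
good_starts = [s for s in START_STATES if success_probability(s) >= q]
p1 = Fraction(len(good_starts), len(START_STATES))

# p2: chance of hitting the target by picking a final state uniformly.
final_states = sorted({s + step for s in START_STATES for step in STEPS})
p2 = Fraction(1, len(final_states))

lci_holds = p1 <= p2 / q
print(f"q = {q}, p1 = {p1}, p2 = {p2}, p2/q = {p2 / q}, LCI holds: {lci_holds}")
```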

Of course, the LCI does hold under certain conditions. To show that the LCI applies to biological evolution, Dembski needs to show that his mathematical model of evolution meets those conditions. This model would necessarily include the higher-level search that gave rise to the evolutionary process. As will be shown in the next post, the good news for Dembski is that any process can be modeled such that it obeys the LCI. The bad news is that any process can also be modeled such that it violates the LCI.

Apologies to Kairosfocus and Petrushka

It seems I have given great offence to the commenter, Kairosfocus, at Uncommon Descent with my comment:

I see Kairosfocus is reading comments here.


I can’t tell for sure but is KF owning up to or denying banning mphillips? In case he finds time to read more…

Come on over, KF and, so long as you don’t link to porn and can be succinct enough not to overload the software, you will be very welcome, I’m sure!


I would like first to point out to Kairosfocus that he is mistakenly attributing the comment to Petrushka, a fellow commenter here and elsewhere. I would like to say sorry to Petrushka too for apparently initiating the misdirected criticism she has received.

I am sorry it wasn’t as obvious to Kairosfocus as it was to others that my invitation to post here contained a light-hearted reference to the only (as far as I am aware) IP ban ever meted out at The Skeptical Zone,  received by Joe Gallien in response to his linking a graphically pornographic image here. I hope that clears up any misunderstanding and if Kairosfocus changes his mind about commenting here, I am sure he will find the moderation rules will be adhered to fairly.

KF vs Toronto & Petrushka

Over at UD, KF has started a new thread criticizing Toronto.  He had earlier started a thread criticizing Petrushka.

It would have been nicer if KF had joined here to launch his criticism, instead of taking pot shots from UD where it is my understanding that both Toronto and Petrushka have been banned.

In any case, this is where the two accused can set the record straight by explaining what they actually meant.  Others can join in.  I may add something later.

Let’s keep it polite.  No character attacks.  Let’s stick to clear explanations of positions that KF might have misunderstood.  And let’s remember the rules of The Skeptical Zone and keep it civil.

Open for discussion.

Science and quantification

Barry Arrington has started a new thread over at UD, where he argues that it is perfectly okay if CSI (complex specified information) cannot be quantified.

I responded, in a comment, that he seems to be making the case that CSI is not a scientific concept.  It occurred to me that folk here might want to have a discussion on what is required of a concept, for it to be part of a scientific theory.

It seems to me that, at the very least, the concept has to be defined precisely enough that one can form testable hypotheses.  Moreover, these hypotheses need to be testable by independent researchers, and the tests need to provide some kind of reliability (i.e. reasonable agreement between results obtained by independent researchers).  Perhaps that’s weaker than quantifiable.  However, I think even that weaker requirement is a problem for CSI, as it is currently “defined”.

The topic is open for discussion.