A Free Lunch of Active Info, with a Side of LCI Violations

(Preamble: I apologize in advance for cluttering TSZ with these three posts. There are very few people on either side of the debate that actually care about the details of this “conservation of information” stuff, but these posts make good on some claims I made at UD.)

Given a sample space Ω and a target T ⊆ Ω, Dembski defines the following information measures:

Endogenous information: IΩ ≡ -log2( |T|/|Ω| )
Exogenous information: IS ≡ -log2( P(T) )
Active information: I+ ≡ IΩ - IS = log2( P(T) / (|T|/|Ω|) )
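
As a minimal sketch of how these three quantities are computed, here they are in Python. The function names and the choice to represent a search as a dictionary from outcomes to probabilities are illustrative assumptions of mine, not anything from Dembski's papers, and the loaded die is a made-up example.

```python
from math import log2

def i_omega(target, omega):
    """IΩ = -log2(|T|/|Ω|): information under the uniform baseline over Ω."""
    return -log2(len(target) / len(omega))

def i_s(target, prob):
    """IS = -log2(P(T)); prob maps every outcome in Ω to its probability."""
    return -log2(sum(prob[x] for x in target))

def i_plus(target, prob):
    """I+ = IΩ - IS = log2(P(T) / (|T|/|Ω|)); Ω is taken to be the keys of prob."""
    return i_omega(target, prob) - i_s(target, prob)

# Hypothetical loaded die: a six comes up half the time.
loaded = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}
print(i_plus({6}, loaded))  # log2(0.5 / (1/6)) = log2(3), about 1.58 bits
```

Note that Ω enters only through its size, which is what the rest of this post exploits.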

Active information is supposed to indicate design, but in fact, the amount of active info attributed to a process depends on how we choose to mathematically model that process. We can get as much free active info as we want simply by making certain modeling choices.

Free Active Info via Individuation of Possibilities

Dembski is in the awkward position of having impugned his own information measures before he even invented them. From his book No Free Lunch:

This requires a measure of information that is independent of the procedure used to individuate the possibilities in a reference class. Otherwise the same possibility can be assigned different amounts of information depending on how the other possibilities in the reference class are individuated (thus making the information measure ill-defined).

He used to make this point often. But two of his new information measures, “endogenous information” and “active information”, depend on the procedure used to individuate the possible outcomes, and are therefore ill-defined according to Dembski’s earlier position.

To see how this fact allows arbitrarily high measures of active information, consider how we model the rolling of a six-sided die. We would typically define Ω as the set {1, 2, 3, 4, 5, 6}. If the goal is to roll a number higher than one, then our target T is {2, 3, 4, 5, 6}. The amount of active information I+ is log2(P(T) / (|T|/|Ω|)) = log2((5/6) / (5/6)) = 0 bits.

But we could, instead, define Ω as {1, higher than 1}. In that case, I+ = log2((5/6) / (1/2)) = .7 bits. What we’re modeling hasn’t changed, but we’ve gained active information by making a different modeling choice.

Furthermore, borrowing an example from Dembski, we could distinguish getting a 1 with the die landing on the table from getting a 1 with the die landing on the floor. That is, Ω = { 1 on table, 1 on floor, higher than 1 }. Now I+ = log2((5/6) / (1/3)) = 1.3 bits. And we could keep changing how we individuate outcomes until we get as much active information as we desire.
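
The three die models above can be checked with a short Python sketch. The helper name active_info and the way each model is encoded as a (|T|, |Ω|) pair are my own illustrative choices. The physical probability of hitting the target stays at 5/6 throughout; only the bookkeeping changes.

```python
from math import log2

def active_info(p_target, target_size, omega_size):
    """I+ = log2(P(T) / (|T|/|Ω|))."""
    return log2(p_target / (target_size / omega_size))

# Probability of rolling higher than 1 on a fair die; the same in every model.
p_target = 5 / 6

# Each model of the same die: description of Ω -> (|T|, |Ω|).
models = {
    "{1, 2, 3, 4, 5, 6}": (5, 6),
    "{1, higher than 1}": (1, 2),
    "{1 on table, 1 on floor, higher than 1}": (1, 3),
}

for omega, (t_size, omega_size) in models.items():
    bits = active_info(p_target, t_size, omega_size)
    print(f"Ω = {omega}: I+ = {bits:.1f} bits")
```

This reproduces the 0, .7 and 1.3 bit figures to one decimal place. Splitting the outcome "1" into k distinguishable sub-outcomes gives I+ = log2((5/6)(k + 1)), which grows without bound as k increases.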

This may seem like cheating. Maybe if we stipulate that Ω must always be defined the “right” way, then active information will be well-defined, right? But let’s look into another modeling choice that demonstrates that there is no “right” way to define Ω in the EIL framework.

Free Active Info via Inclusion of Possibilities

Again borrowing an example from Dembski, suppose that we know that there’s a buried treasure on the island of Bora Bora, but we have no idea where on the island it is, so all we can do is randomly choose a site to dig. If we want to model this search, it would be natural to define Ω as the set of all possible dig sites on Bora Bora. Our search, then, has zero active information, since it is no more likely to succeed than randomly selecting from Ω (because randomly selecting from Ω is exactly what we’re doing).

But is this the “right” definition of Ω? Dembski asks the question, “how did we know that of all places on earth where the treasure might be hidden, we needed to look on Bora Bora?” Maybe we should define Ω, as Dembski does, to include all of the dry land on earth. In this case, randomly choosing a site on Bora Bora is a high-active-information search, because it is far more likely to succeed than randomly choosing a site from Ω, i.e. the whole earth. Again, we have changed nothing about what is being modeled, but we have gained an enormous amount of active information simply by redefining Ω.
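
To put a rough number on "enormous", here is a sketch in Python. The land areas are round illustrative figures that I am assuming for the sake of the example, not values taken from Dembski, and the function name and the tiny dig-site area are likewise hypothetical.

```python
from math import log2

# Rough, illustrative land areas in square kilometres (assumptions).
BORA_BORA = 30.0
EARTH_LAND = 149_000_000.0

def active_info_of_dig(omega_area, searched_area=BORA_BORA, site_area=0.0001):
    """I+ for digging uniformly at random within searched_area, which is
    assumed to contain the treasure site, measured against a uniform
    baseline over omega_area."""
    p_search = site_area / searched_area   # P(T) for the actual search
    p_baseline = site_area / omega_area    # |T|/|Ω| for the baseline
    return log2(p_search / p_baseline)

print(active_info_of_dig(BORA_BORA))   # Ω = Bora Bora
print(active_info_of_dig(EARTH_LAND))  # Ω = all dry land on earth
```

With Ω set to Bora Bora the result is zero. With Ω set to all dry land it comes out at roughly 22 bits under these assumed figures. The site area cancels, so the answer depends only on the ratio of the two areas, which is to say only on how large we decide to make Ω.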

We could also take Dembski’s question further by asking, “how did we know that of all places in the universe, we needed to look on Bora Bora?” Now it seems that we’re being ridiculous. Surely we can take for granted the knowledge that the treasure is on the earth, right? No. Dembski is quite insistent that the zero-active-information baseline must involve no prior information whatsoever:

The “absence of any prior knowledge” required for uniformity conceptually parallels the difficulty of understanding the nothing that physics says existed before the Big Bang. It’s common to picture the universe before the Big Bang is a large black void empty space. No. This is a flawed image. Before the Big Bang there was nothing. A large black void empty space is something. So space must be purged from our visualization. Our next impulse is then, mistakenly, to say, “There was nothing. Then, all of a sudden…” No. That doesn’t work either. All of a sudden presupposes there was time and modern cosmology says that time in our universe was also created at the Big Bang. The concept of nothing must exclude conditions involving time and space. Nothing is conceptually difficult because the idea is so divorced from our experience and familiarity zones.

and further:

The “no prior knowledge” cited in Bernoulli’s PrOIR is all or nothing: we have prior knowledge about the search or we don’t. Active information on the other hand, measures the degree to which prior knowledge can contribute to the solution of a search problem.

To define a search with “no prior knowledge”, we must be careful not to constrain Ω. For example, if Ω consists of permutations, it must contain all permutations:

What search space, for instance, allows for all possible permutations? Most don’t. Yet, insofar as they don’t, it’s because they exhibit structures that constrain the permissible permutations. Such constraints, however, bespeak the addition of active information.

But even if we define Ω to include all permutations of a given ordered set, we’re still constraining Ω, as we’re excluding permutations of other ordered sets. We cannot define Ω without excluding something, so it is impossible to define a search without adding active information.

Active information is always measured relative to a baseline, and there is no baseline that we can call “absolute zero”. We therefore can attribute an arbitrarily large amount of active information to any search simply by choosing a baseline with a sufficiently large Ω.

Returning to our six-sided die example, we can take the typical definition of Ω as {1, 2, 3, 4, 5, 6} and add, say, 7 and 8 to the set. Obviously our two additional outcomes each have a probability of zero, but that’s not a problem: probability distributions often include zero-probability elements. Inclusion of these zero-probability outcomes doesn’t change the mean, median, variance, etc. of the distribution, but it does change the amount of active info from 0 to log2((1/6) / (1/8)) = .4 bits (given a target of rolling, say, a one).
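
A small Python sketch makes the point concrete. The helper names padded_die, active_info and mean are hypothetical, and the particular amounts of padding are arbitrary. The sketch pads a fair die's Ω with zero-probability faces and reports the mean of the distribution alongside the active information for the target {1}.

```python
from math import log2

def padded_die(extra):
    """Fair six-sided die modelled over Ω = {1, ..., 6 + extra};
    the added faces have probability zero."""
    return {face: (1 / 6 if face <= 6 else 0.0) for face in range(1, 7 + extra)}

def active_info(prob, target):
    """I+ = log2(P(T) / (|T|/|Ω|)), with Ω taken to be the keys of prob."""
    p_target = sum(prob[x] for x in target)
    return log2(p_target / (len(target) / len(prob)))

def mean(prob):
    """Expected face value; zero-probability faces contribute nothing."""
    return sum(face * p for face, p in prob.items())

for extra in (0, 2, 58, 1018):
    die = padded_die(extra)
    print(f"|Ω| = {len(die):4d}  mean = {mean(die):.2f}  "
          f"I+ = {active_info(die, {1}):.2f} bits")
```

The mean stays the same on every row, because the padded faces never occur, while the active information climbs by one bit each time |Ω| doubles.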

Free Violations of the LCI

Given a chain of two searches, the LCI says that the endogenous information of the first search is at least as large as the active information of the second. Since we can model the second search to have arbitrarily large active information, we can always model it such that its active information is larger than the first search’s endogenous information. Thus any chain of searches can be shown to violate the LCI. (We can also model the first search such that its endogenous information is arbitrarily large, so any chain of searches can also be shown to obey the LCI.)
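
As a sketch of how cheap such a violation is, the following Python computes how large Ω must be for the second search's active information to exceed any given bound. The function name is hypothetical, and the 20-bit figure for the first search's endogenous information is an arbitrary stand-in, not a value from any real example.

```python
from math import floor, log2

def omega_size_to_exceed(bound_bits, p_target, target_size=1):
    """Smallest |Ω| for which log2(P(T) / (|T|/|Ω|)) exceeds bound_bits."""
    # Start from the algebraic estimate |Ω| > |T| * 2^bound / P(T) ...
    n = max(target_size, floor(target_size * 2 ** bound_bits / p_target))
    # ... then step upward to guard against floating-point rounding at the boundary.
    while log2(p_target / (target_size / n)) <= bound_bits:
        n += 1
    return n

# Suppose the first search is credited with 20 bits of endogenous information,
# and the second search is a fair die with the target of rolling a one.
first_search_bits = 20
n = omega_size_to_exceed(first_search_bits, p_target=1 / 6)
print(n, log2((1 / 6) / (1 / n)))
```

Modeling the die with that many outcomes, all but six of them with probability zero, gives the second search more than 20 bits of active information. Whatever bound is chosen, a sufficiently padded Ω exceeds it.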

8 thoughts on “A Free Lunch of Active Info, with a Side of LCI Violations”

  1. R0b – I for one am very interested in the details of the LCI – so thanks for these posts. I have spent the morning thinking about them.

    I am trying to imagine Dembski’s response in each case.

    Free Active Info via Individuation of Possibilities

    I would guess that Dembski would respond that whatever rule you use to individuate Ω you should use the same rule for endogenous and exogenous information. In your example you have used Ω as the set {1, 2, 3, 4, 5, 6} to calculate the endogenous information but Ω as {1, higher than 1} to calculate the exogenous information. If you use the same rule for both then the active information remains at 0 bits whichever rule you choose.

    Free Active Info via Inclusion of Possibilities

    It is a real problem for the LCI to try and define what it means to have no information and it is clear that Dembski is uneasy about the wholesale use of the principle of indifference to try and solve it as he and Marks wrote a paper trying to justify it.

    I think there is a somewhat plausible response he could make (although I am pretty sure he never has responded this way and maybe never will).

    He could respond by saying that whatever your views about what it means to have no information, the LCI also applies relative to some predefined state rather than “no information”. So in the case of the treasure on Bora Bora he could duck the problem of what it means to have no information about the treasure and just talk about the gain in information over the initial state of knowing it is somewhere on the island. Then all(!) he is left with is the non-trivial problem of translating that into evolutionary terms, i.e. defining what that initial state is for the evolution of life, what the “target” is, and calculating the probability of the target given the initial state – this would be his exogenous information.

    Another way of looking at it

    I like to express the deep problem with the LCI a different way. As your example of Bertrand’s box rather brilliantly shows, all that the LCI amounts to is that given that 10% of coins are gold and 90% are silver, and they are sorted into boxes in some unknown way, you can’t increase your chances of getting a gold coin by choosing a coin from one box rather than another. Dembski tries to deduce from this that the probability of a mechanism existing that sorts gold coins preferentially into one box must be lower than the gain in probability from that box. The confusion is in identifying a search mechanism with a subset of the search space.

    The LCI argument is essentially that the probability of an outcome is low, therefore the probability of a mechanism which finds it is low – we know that the chances of getting a gold coin are low therefore the probability of finding a box which contains a lot of them must be low. But the whole point is that while the probability of life may appear to be low, in fact it isn’t. And this is because there is a mechanism that preferentially selects features needed for life. To claim that the probability of such a mechanism existing must be low is equivalent to saying that the probability of a mechanism existing that sorts gold coins into one box is dependent on the proportion of gold coins in the population – clear and utter nonsense.

  2. Great posts, sir. I’ll have to agree with Mark on this particular example. Consider the
    probability mass vectors and the SIC (Shannon information content),

    Px = [1/6;1/6;1/6;1/6;1/6;1/6]
    SIC = 2.58

    Expand the (uniform) alphabet by two,

    Px1 = [1/8;1/8;1/8;1/8;1/8;1/8;1/8;1/8]
    SIC = 3

    Then shift the probabilities back on the original six,

    Px2 = [1/6;1/6;1/6;1/6;1/6;1/6;0;0]
    SIC = 2.58

    I’m pretty sure the active info would be 0 (if I have active info correct)

    H(x2) – H(x) = 0

    I’m going to skim the active info paper; I’m interested in discussing this topic this week.

  3. Mark, as always, thanks for your comments.

    I would guess that Dembski would respond that whatever rule you use to individuate Ω you should use the same rule for endogenous and exogenous information. In your example you have used Ω as the set {1, 2, 3, 4, 5, 6} to calculate the endogenous information but Ω as {1, higher than 1} to calculate the exogenous information. If you use the same rule for both then the active information remains at 0 bits whichever rule you choose.

    Actually I was using Ω={1, higher than 1} to calculate both the endogenous and exogenous information, but I didn’t state it very clearly. For a normal die, the probability of the outcome “higher than 1” is 5/6.

    He could respond by saying that whatever your views about what it means to have no information, the LCI also applies relative to some predefined state rather than “no information”.

    I agree that this would be a wise approach, and Dembski’s insistence that we eliminate “familiarity zones” so our baseline is really “nothing” is misguided.

  4. R0b

    For a normal die, the probability of the outcome “higher than 1” is 5/6

    There are two ways you might come up with that – throwing a die lots of times or using the principle of indifference. You don’t have the first option for measuring endogenous information – so you have to use the principle of indifference. I have severe doubts about the principle of indifference but if you are going to apply it then Dembski can reasonably argue that you have to apply it consistently. If it gives p=5/6 for the exogenous information then it should also give p=5/6 for the endogenous information – alternatively if it gives p=1/2 for one it should give p=1/2 for the other.

  5. If it gives p=5/6 for the exogenous information then it should also give p=5/6 for the endogenous information – alternatively if it gives p=1/2 for one it should give p=1/2 for the other.

    My understanding is that exogenous and endogenous information deal with two different searches, a baseline search characterized by a uniform distribution, and an alternate search usually characterized by a non-uniform distribution. If we expected both searches to have the same probabilities, then active information would always be zero.

    In the example of the die, the alternate search is biased toward the outcome “higher than 1”, as we can determine by the geometry of the die or by conducting a lot of trials. But the baseline search is, by definition, unbiased, so both outcomes (“1” and “higher than 1”) have the same probability.

    The baseline search is make-believe, not empirically based, a fact that Dembski seems to lose sight of. Here’s a statement from his recent article:

    But insofar as evolutionary search renders aspects of a biological configuration space probabilistically accessible where previously, under blind search, they were probabilistically inaccessible, conservation of information says that evolutionary search achieves this increase in search performance at an informational cost.

    (Emphasis mine.)

    “Previously?” Previous to what? When was the search ever blind?

  6. R0b

    I agree that exogenous and endogenous information deal with two different searches, but surely both of them use the principle of indifference to come up with their probability/information values. The result need not be the same because the assisted search may change the ratio of results which are targets to results which are possible. However, it seems reasonable to insist that the principle of individuation be the same for both searches.

  7. Mark:

    I agree that exogenous and endogenous information deal with two different searches, but surely both of them use the principle of indifference to come up with their probability/information values. The result need not be the same because the assisted search may change the ratio of results which are targets to results which are possible.

    Certainly the assisted search could be modeled using the principle of indifference, but it could also be determined empirically or by some other means.

    However, it seems reasonable to insist that the principle of individuation be the same for both searches.

    I agree completely. Ω should be defined exactly the same for both searches. To do otherwise would be obvious cheating.

  8. Rob, I think the dice example is the issue. The 5/6 example might go something like this:

    Px = [1/2;1/10;1/10;1/10;1/10;1/10]
    SIC = 2.16

    The sum of the probabilities of 2-6 is 1/2. You have an equal chance of
    rolling a 1 as 2-6. If the goal is to increase the odds of rolling
    specifically a 2,3,4,5 or 6 then narrowing it down from the subset {2,3,4,5,6} to a specific outcome of say 4 (from {1/10;1/10;1/10;1/10;1/10} to {0;0;1/2;0;0}) the active info should be:

    -log2((1/10)/(1/2)) = -log2(1/10) + log2(1/2) = 2.32 = I+

    Px1 = [1/2;1/2;0;0;0;0]
    SIC = 1

    The uniform baseline in this case is the probability mass vector of the subset {2,3,4,5,6}. Adding the extra 7,8 slots to the subset ({1/10;1/10;1/10;1/10;1/10;0;0}, {0;0;1/2;0;0;0;0}), if they have zero probability, shouldn’t alter the active information outcome.
