Evo-Info 4: Non-conservation of algorithmic specified complexity

Introduction to Evolutionary Informatics, by Robert J. Marks II, the “Charles Darwin of Intelligent Design”; William A. Dembski, the “Isaac Newton of Information Theory”; and Winston Ewert, the “Charles Ingram of Active Information.” World Scientific, 332 pages.
Classification: Engineering mathematics. Engineering analysis. (TA347)
Subjects: Evolutionary computation. Information technology–Mathematics.

The greatest story ever told by activists in the intelligent design (ID) socio-political movement was that William Dembski had proved the Law of Conservation of Information, where the information was of a kind called specified complexity. The fact of the matter is that Dembski did not supply a proof, but instead sketched an ostensible proof, in No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2002). He did not go on to publish the proof elsewhere, and the reason is obvious in hindsight: he never had a proof. In “Specification: The Pattern that Signifies Intelligence” (2005), Dembski instead radically altered his definition of specified complexity, and said nothing about conservation. In “Life’s Conservation Law: Why Darwinian Evolution Cannot Create Biological Information” (2010; preprint 2008), Dembski and Marks attached the term Law of Conservation of Information to claims about a newly defined quantity, active information, and gave no indication that Dembski had used the term previously. In Introduction to Evolutionary Informatics, Marks, Dembski, and Ewert address specified complexity only in an isolated chapter, “Measuring Meaning: Algorithmic Specified Complexity,” and do not claim that it is conserved. From the vantage of 2018, it is plain to see that Dembski erred in his claims about conservation of specified complexity, and later neglected to explain that he had abandoned them.

Algorithmic specified complexity is a special form of specified complexity as Dembski defined it in 2005, not as he defined it in earlier years. Yet Marks et al. have cited only Dembski’s earlier publications. They perhaps do not care to draw attention to the fact that Dembski developed his “new and improved” specified complexity as a test statistic, for use in rejecting a null hypothesis of natural causation in favor of an alternative hypothesis of intelligent, nonnatural intervention in nature. That’s obviously quite different from their current presentation of algorithmic specified complexity as a measure of meaning. Ironically, their one and only theorem for algorithmic specified complexity, “The probability of obtaining an object exhibiting \alpha bits of ASC is less then [sic] or equal to 2^{-\alpha},” is a special case of a more general result for hypothesis testing, Theorem 1 in A. Milosavljević’s “Discovering Dependencies via Algorithmic Mutual Information: A Case Study in DNA Sequence Comparisons” (1995). It is odd that Marks et al. have not discovered this result in a literature search, given that they have emphasized the formal similarity of algorithmic specified complexity to algorithmic mutual information.

Lately, a third-wave ID proponent by the name of Eric Holloway has spewed mathematicalistic nonsense about the relationship of specified complexity to algorithmic mutual information. [Note: Eric has since joined us in The Skeptical Zone, and I sincerely hope to see him respond to challenges by making better sense than he has previously.] In particular, he has claimed that a conservation law for the latter applies to the former. Given his confidence in himself, and the arcaneness of the subject matter, most people will find it hard to rule out the possibility that he has found a conservation law for specified complexity. My response, recorded in the next section of this post, is to do some mathematical investigation of how algorithmic specified complexity relates to algorithmic mutual information. It turns out that a demonstration of non-conservation serves also as an illustration of the senselessness of regarding algorithmic specified complexity as a measure of meaning.

Let’s begin by relating some of the main ideas to pictures. The most basic notion of nongrowth of algorithmic information is that if you input data x to computer program p, then the amount of algorithmic information in the output data p(x) is no greater than the amounts of algorithmic information in input data x and program p, added together. That is, the increase of algorithmic information in data processing is limited by the amount of algorithmic information in the processor itself. The following images do not illustrate the sort of conservation just described, but instead show massive increase of algorithmic specified complexity in the processing of a digital image x by a program p that is very low in algorithmic information. At left is the input x, and at right is the output p(x) of the program, which cumulatively sums the RGB values of the input. Loosely speaking, the n-th RGB value of Signature of the Id is the sum of the first n RGB values of Fuji Affects the Weather.

[Image: Fuji Affects the Weather (13 megabits of “meaning”)]
[Image: Signature of the Id (24 megabits of “meaning”)]

What is most remarkable about the increase in “meaning,” i.e., algorithmic specified complexity, is that there actually is loss of information in the data processing. The loss is easy to recognize if you understand that RGB values are 8-bit quantities, and that the sum of two of them is generally a 9-bit quantity, e.g.,

    \begin{align*} \phantom{+}& \phantom{01}11111111 \\ + & \phantom{01}00000001 \\ = & \phantom{0}100000000 \end{align*}

The program p discards the leftmost carry (either 1, as above, or 0) in each addition that it performs, and thus produces a valid RGB value. The discarding of bits is loss of information in the clearest of operational terms: to reconstruct the input image x from the output image p(x), you would have to know the bits that were discarded. Yet the cumulative summation of RGB values produces an 11-megabit increase in algorithmic specified complexity. In short, I have provided a case in which methodical corruption of data produces a huge gain in what Marks et al. regard as meaningful information.

An important detail, which I cannot suppress any longer, is that the algorithmic specified complexity is calculated with respect to binary data called the context. In the calculations above, I have taken the context to be the digital image Fuji. That is, a copy of Fuji has 13 megabits of algorithmic specified complexity in the context of Fuji, and an algorithmically simple corruption of Fuji has 24 megabits of algorithmic specified complexity in the context of Fuji. As in Evo-Info 2, I have used the methods of Ewert, Dembski, and Marks, “Measuring Meaningful Information in Images: Algorithmic Specified Complexity” (2015). My work is recorded in a computational notebook that I will share with you in Evo-Info 5. In the meantime, any programmer who knows how to use an image-processing library can easily replicate my results. (Steps: Convert fuji.png to RGB format; save the cumulative sum of the RGB values, taken in row-major order, as signature.png; then subtract 0.5 Mb from the sizes of fuji.png and signature.png.)
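The transformation itself is easy to sketch in code. What follows is a minimal illustration of mine, not code from the authors: it assumes NumPy and 8-bit RGB data, the function name `signature_of` is my own, and `fuji` is a tiny stand-in array rather than the actual image.

```python
import numpy as np

def signature_of(img: np.ndarray) -> np.ndarray:
    """Cumulatively sum the RGB values of an image in row-major order,
    discarding carries (arithmetic mod 256), so that each value of the
    result remains a valid 8-bit RGB component."""
    flat = img.reshape(-1).astype(np.uint64)
    return (np.cumsum(flat) % 256).astype(np.uint8).reshape(img.shape)

# A 1x2-pixel stand-in for fuji.png; note the discarded carry in the
# first addition, 255 + 1 = 256 = 0 (mod 256).
fuji = np.array([[[255, 1, 2], [3, 4, 5]]], dtype=np.uint8)
print(signature_of(fuji).tolist())  # → [[[255, 0, 2], [5, 9, 14]]]
```

Applied to the real fuji.png (loaded with an image library and converted to RGB), saving the result as signature.png completes the replication steps described above.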

As for the definition of algorithmic specified complexity, it is easiest to understand when expressed as a log-ratio of probabilities,

    \begin{equation*} A(x; c,P) = \log_2 \frac{ M_c(x) }{ P(x) }, \end{equation*}

where x and c are binary strings (finite sequences of 0s and 1s), and P and M_c are distributions of probability over the set of all binary strings. All of the formal details are given in the next section. Speaking loosely, and in terms of the example above, M_c(x) is the probability that a randomly selected computer program converts the context c into the image x, and P(x) is the probability that image x is the outcome of a default image-generating process. The ratio of probabilities is the relative likelihood of x arising by an algorithmic process that depends on the context, and of x arising by a process that does not depend on the context. If, proceeding as in statistical hypothesis testing, we take as the null hypothesis the proposition that x is the outcome of a process with distribution P, and as an alternative hypothesis the proposition that x is the outcome of a process with distribution M_c, then our level of confidence in rejecting the null hypothesis in favor of the alternative hypothesis depends on the value of A(x; c,P). The one and only theorem that Marks et al. have given for algorithmic specified complexity tells us to reject the null hypothesis in favor of the alternative hypothesis at confidence level 2^{-\alpha} when A(x; c,P) = \alpha.

What we should make of the high algorithmic specified complexity of the images above is that they both are more likely to derive from the context than to arise in the default image-generating process. Again, Fuji is just a copy of the context, and Signature is an algorithmically simple corruption of the context. The probability of randomly selecting a program that cumulatively sums the RGB values of the context is much greater than the probability of generating the image Signature directly, i.e., without use of the context. So we confidently reject the null hypothesis that Signature arose directly in favor of the alternative hypothesis that Signature derives from the context.

This embarrassment of Marks et al. is ultimately their own doing, not mine. It is they who buried the true story of Dembski’s (2005) redevelopment of specified complexity in terms of statistical hypothesis testing, and replaced it with a fanciful account of specified complexity as a measure of meaningful information. It is they who neglected to report that their theorem has nothing to do with meaning, and everything to do with hypothesis testing. It is they who sought out examples to support, rather than to refute, their claims about meaningful information.

Algorithmic specified complexity versus algorithmic mutual information

This section assumes familiarity with the basics of algorithmic information theory.

Objective. Reduce the difference in the expressions of algorithmic specified complexity and algorithmic mutual information, and provide some insight into the relation of the former, which is unfamiliar, to the latter, which is well understood.

Here are the definitions of algorithmic specified complexity (ASC) and algorithmic mutual information (AMI):

    \begin{align*} A(x; c, P)  &:=  -\!\log P(x) - K(x|c) &\text{(ASC)} \\ I(x: c)    &:=  K(x) - K(x|c^*),  &\text{(AMI)} \\ \end{align*}

where P is a distribution of probability over the set \{0, 1\}^* of binary strings, x and c are binary strings, and binary string c^* is a shortest program outputting c, on the universal prefix machine in terms of which the algorithmic complexity K(\cdot) and the conditional algorithmic complexity K(\cdot|\cdot) are defined. The base of the logarithm is 2. Marks et al. regard algorithmic specified complexity as a measure of the meaningful information of x in the context of c.

The law of conservation (nongrowth) of algorithmic mutual information is:

    \begin{equation*} I(f(x) : z) \leq I(x:z) + K(f) + O(1) \end{equation*}

for all binary strings x and z, and for all total computable functions f on the binary strings. The analogous relation for algorithmic specified complexity does not hold. For example, let the probability P(\lambda) = 0, where \lambda is the empty string, and let P(x) be positive for all nonempty binary strings x. Also let f(x) = \lambda for all binary strings x. Then for all nonempty binary strings x and for all binary strings c,

    \begin{equation*} A(f(x); c, P) - A(x; c, P) = \infty \end{equation*}

because

    \begin{align*} A(f(x); c, P) &= -\!\log P(f(x)) - K(f(x) | c) \\ &= -\!\log P(\lambda) - K(\lambda | c) \\ &= -\!\log 0 - K(\lambda | c) \\ &= \infty - K(\lambda | c) \\ &= \infty - O(1) \\ \end{align*}

is infinite and A(x; c, P) is finite. (The one bit of algorithmic information theory you need to know, in order to check the proof, is that K(x|c) is finite for all binary strings x and c. The last line of the equation indicates that K(\lambda|c) is not only finite, but also constant.) Note that this argument can be modified by setting P(\lambda) to an arbitrarily small positive number instead of to zero. Then the growth of ASC due to data processing f(x) can be made arbitrarily large, though not infinite.

There is no upper bound on the increase of algorithmic specified complexity due to data processing.
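The finite variant of the argument can be put in rough numbers. The following is a toy of mine, with stipulated probabilities and code lengths standing in for actual Kolmogorov complexities:

```python
import math

def asc_bits(p_x: float, k_cond_bits: float) -> float:
    """A(x; c, P) = -log2 P(x) - K(x|c), with K(x|c) supplied here
    as a stipulated code length in bits."""
    return -math.log2(p_x) - k_cond_bits

# Stipulate P(lambda) = 2^-40 (tiny but positive), K(lambda|c) = 1,
# and P(x) = 2^-10, K(x|c) = 8 for the input x. The processing f
# maps every string to lambda, so K(f) is O(1), yet ASC grows hugely.
before = asc_bits(2.0 ** -10, 8)   # A(x; c, P)    = 2 bits
after  = asc_bits(2.0 ** -40, 1)   # A(f(x); c, P) = 39 bits
print(after - before)  # → 37.0, far exceeding K(f) + O(1)
```

Shrinking the stipulated P(\lambda) further makes the growth as large as you please.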

There’s a simple way to deal with the different subexpressions K(x|c) and K(x|c^*) in the definitions of ASC and AMI, and that is to restrict our attention to the case of A(x; c^*\!, P), in which the context is a shortest program outputting c.

    \begin{align*} A(x; c^*\!, P)  &\phantom{:}=  -\!\log P(x) - K(x|c^*) &\text{(ASC)} \\ I(x: c)    &:=  K(x) - K(x|c^*)  &\text{(AMI)} \\ \end{align*}

This is not an onerous restriction, because there is for every string c a shortest program c^* that outputs c. In fact, there would be no practical difference for Marks et al. if they were to require that the context be given as c^*. We might have gone a different way, replacing the contemporary definition of algorithmic mutual information with an older one,

    \begin{align*} I(x:c) &:= K(x) - K(x|c). &\text{(old AMI)} \\ \end{align*}

However, the older definition comes with some disadvantages, and I don’t see a significant advantage in using it. Perhaps you will see something that I have not.

The difference in algorithmic mutual information and algorithmic specified complexity,

    \begin{equation*} I(x:c) - A(x; c^*\!, P) = K(x) + \log P(x), \end{equation*}

has no lower bound, because K(x) is non-negative for all x, and \log P(x) is negatively infinite for all x such that P(x) is zero. It is helpful, in the following, to keep in mind the possibility that P(x) = 1 for a string x with very high algorithmic complexity, and that P(w) = 0 for all strings w \neq x. Then the absolute difference in I(w:c) and A(w; c^*\!, P) is infinite for all w \neq x, and very large, though finite, for w = x. There is no limit on the difference for x. Were Marks et al. to add a requirement that P(w) be positive for all w, we still would be able, with an appropriate definition of P, to make the absolute difference of AMI and ASC arbitrarily large, though finite, for all w.

There is no upper bound on \min_x |I(x: c) - A(x; c^*\!, P)|. Thus algorithmic mutual information and algorithmic specified complexity are not equivalent in any simple sense.

The remaining difference in the expressions of ASC and AMI is in the terms -\!\log P(x) and K(x). The easy way to reduce the difference is to convert the algorithmic complexity K(x) into an expression of log-improbability, -\!\log M(x), where

    \begin{equation*} M(x) := 2 ^ {-K(x)} \end{equation*}

for all binary strings x. An elementary result of algorithmic information theory is that the probability function M satisfies the definition of a semimeasure, meaning that \sum_x M(x) \leq 1. In fact, M is called the universal semimeasure. A semimeasure with probabilities summing to unity is called a measure. We need to be clear, when writing

    \begin{align*} A(x; c^*\!, P)  &=  -\!\log P(x) - K(x|c^*) &\text{(ASC)} \\ I(x: c)    &=  -\!\log M(x) - K(x|c^*),  &\text{(AMI)} \\ \end{align*}

that P is a measure, and that M is a semimeasure, not a measure. (However, in at least one of the applications of algorithmic specified complexity, Marks et al. have made P a semimeasure, not a measure.) This brings us to a simple characterization of ASC:

The algorithmic specified complexity A(x; c^*\!, P) is the algorithmic mutual information I(x:c) with an arbitrary probability measure P substituted for the universal semimeasure M.

This does nothing but establish clearly a sense in which algorithmic specified complexity is formally similar to algorithmic mutual information. As explained above, there can be arbitrarily large differences in the two quantities. However, if we consider averages of the quantities over all strings x, holding c constant, then we obtain an interesting result. Let random variable X take values in \{0, 1\}^* with probability distribution P. Then the expected difference of AMI and ASC is

    \begin{align*} E[I(X : c) - A(X; c^*\!, P)] &= E[K(X) + \log P(X)]    \\ &= E[K(X)] - E[-\!\log P(X)]    \\ &= E[K(X)] - H(P). \end{align*}

The expected difference in algorithmic mutual information I(X:c) and algorithmic specified complexity A(X; c^*\!, P) is the difference of the expected algorithmic complexity, E[K(X)], and the entropy H(P) of the probability distribution P.

Introducing a requirement that function P be computable, we get lower and upper bounds on the expected difference from Theorem 10 of Grünwald and Vitányi, “Algorithmic Information Theory”:

    \begin{equation*} 0 \leq E_P[K(X)] - H(P) \leq K(P) + O(1). \end{equation*}

Note that the algorithmic complexity of computable probability distribution P is the length of the shortest program that, on input of binary string x and number q (i.e., a binary string interpreted as a non-negative integer), outputs P(x) with q bits of precision (see Example 7 of Grünwald and Vitányi, “Algorithmic Information Theory”).

If the probability distribution P of random variable X is computable, then the expected difference in value of the algorithmic mutual information I(X:c) and the algorithmic specified complexity A(X; c^*\!, P) is at least zero, and is at most K(P) + O(1). Equivalently,

    \begin{align*} E[A(X; c^*\!, P)] &\leq E[I(X:c)] \\ &\leq E[A(X; c^*\!, P)] + K(P) + O(1). \end{align*}

How much the expected value of the AMI may exceed the expected value of the ASC depends on the length of the shortest program for computing P(x).
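The lower bound of zero has an easily checked Shannon-theoretic analogue. In the following toy of mine, code lengths L(x) = -\log Q(x), given by a probability measure Q, stand in loosely for K(x), and Gibbs’ inequality plays the role of the left inequality:

```python
import math

# Toy analogue of 0 <= E_P[K(X)] - H(P): for code lengths
# L(x) = -log2 Q(x) given by any probability measure Q, the expected
# code length E_P[L(X)] is at least the entropy H(P), with equality
# if and only if Q = P (Gibbs' inequality).
P = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
Q = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}

H = -sum(p * math.log2(p) for p in P.values())            # entropy of P
expected_len = -sum(P[x] * math.log2(Q[x]) for x in P)    # E_P[L(X)]
print(H, expected_len)  # → 1.75 2.0; the difference is nonnegative
```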

Next we derive a similar result, but for individual strings instead of expectations, applying a fundamental result in algorithmic information theory, the MDL bound. If P is computable, then for all binary strings x and c,

(MDL)   \begin{align*} K(x) &\leq -\!\log P(x) + K(P) + O(1)  \\ K(x) + \log P(x) &\leq K(P) + O(1)  \\ I(x:c) - A(x; c^*\!, P) &\leq K(P) + O(1).  \\ \end{align*}

Replacing string x in this inequality with a string-valued random variable X, as we did previously, and taking the expected value, we obtain one of the inequalities in the box above. (The cross check increases my confidence, but does not guarantee, that I’ve gotten the derivations right.)

If the probability distribution P is computable, then the difference in the algorithmic mutual information I(x:c) and the algorithmic specified complexity A(x; c^*\!, P) is bounded above by K(P) + O(1).

Finally, we express K(x|c) in terms of the universal conditional semimeasure,

    \begin{equation*} M_c(x) = 2 ^ {-K(x|c)}. \end{equation*}

Now K(x|c) = -\!\log_2 M_c(x), and we can express algorithmic specified complexity as a log-ratio of probabilities, with the caveat that \sum_x M_c(x) < 1.

The algorithmic specified complexity of binary string x in the context of binary string c is

    \begin{equation*} A(x; c, P) = \log_2 \frac{ M_c(x) }{ P(x) }, \end{equation*}

where P is a distribution of probability over the binary strings \{0, 1\}^* and M_c is the universal conditional semimeasure.

An old proof that high ASC is rare

Marks et al. have fostered a great deal of confusion, citing Dembski’s earlier writings containing the term specified complexity — most notably, No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence (2002) — as though algorithmic specified complexity originated in them. The fact of the matter is that Dembski radically changed his approach, but not the term specified complexity, in “Specification: The Pattern that Signifies Intelligence” (2005). Algorithmic specified complexity derives from the 2005 paper, not Dembski’s earlier work. As it happens, the paper has received quite a bit of attention in The Skeptical Zone. See, especially, Elizabeth Liddle’s “The eleP(T|H)ant in the room” (2013).

Marks et al. furthermore have neglected to mention that Dembski developed the 2005 precursor (more general form) of algorithmic specified complexity as a test statistic, for use in rejecting a null hypothesis of natural causation in favor of an alternative hypothesis of intelligent, nonnatural intervention in nature. Amusingly, their one and only theorem for algorithmic specified complexity, “The probability of obtaining an object exhibiting \alpha bits of ASC is less then [sic] or equal to 2^{-\alpha},” is a special case of a more general result for hypothesis testing. The result is Theorem 1 in A. Milosavljević’s “Discovering Dependencies via Algorithmic Mutual Information: A Case Study in DNA Sequence Comparisons” (1995), which Marks et al. do not cite, though they have pointed out that algorithmic specified complexity is formally similar to algorithmic mutual information.

According to the abstract of Milosavljević’s article:

We explore applicability of algorithmic mutual information as a tool for discovering dependencies in biology. In order to determine significance of discovered dependencies, we extend the newly proposed algorithmic significance method. The main theorem of the extended method states that d bits of algorithmic mutual information imply dependency at the significance level 2^{-d+O(1)}.

However, most of the argument — particularly, Lemma 1 and Theorem 1 — applies more generally. It in fact applies to algorithmic specified complexity, which, as we have seen, is defined quite similarly to algorithmic mutual information.

Let P_0 and P_A be probability distributions over sequences (or any other kinds of objects from a countable domain) that correspond to the null and alternative hypotheses respectively; by p_0(t) and p_A(t) we denote the probabilities assigned to a sequence t by the respective distributions.

[…]

We now define the alternative hypothesis P_A in terms of encoding length. Let A denote a decoding algorithm [our universal prefix machine] that can reconstruct the target t [our x] based on its encoding relative to the source s [our context c]. By I_A(t|s) [our K(x|c)] we denote the length of the encoding [for us, this is the length of a shortest program that outputs x on input of c]. We make the standard assumption that encodings are prefix-free, i.e., that none of the encodings represented in binary is a prefix of another (for a detailed discussion of the prefix-free property, see Cover & Thomas, 1991; Li & Vitányi, 1993). We expect that the targets that are similar to the source will have short encodings. The following theorem states that a target t is unlikely to have an encoding much shorter than -\!\log p_0(t).

THEOREM 1 For any distribution of probabilities P_0, decoding algorithm A, and source s,

    \begin{equation*} P_0\{-\!\log p_0(t) - I_A(t|s) \geq d\} \leq 2^{-d}. \end{equation*}

Here is a specialization of Theorem 1 within the framework of this post: Let X be a random variable with distribution P_0 of probability over the set \{0, 1\}^* of binary strings. Let context c be a binary string. Then the probability of algorithmic specified complexity A(X; c, P_0) \geq d is at most 2^{-d}, i.e.,

    \begin{align*} \Pr\{ A(X; c, P_0) \geq d \} &= \Pr\{ -\!\log P_0(X) - K(X|c) \geq d \} \\ &\leq 2^{-d}. \end{align*}
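The bound can be verified exhaustively in a toy setting. This is my own construction, with a uniform null distribution over 10-bit strings and a stipulated prefix-free code standing in for actual conditional complexities:

```python
import math
from itertools import product

n = 10
strings = ["".join(bits) for bits in product("01", repeat=n)]
p0 = 2.0 ** -n                     # uniform null distribution P_0

# Stipulate encoding lengths I_A(t|s): one target is highly
# compressible relative to the source, the rest are not. The Kraft
# inequality confirms that a prefix-free code with these lengths exists.
code_len = {t: 12 for t in strings}
code_len[strings[0]] = 2
assert sum(2.0 ** -l for l in code_len.values()) <= 1.0

def asc(t):                        # -log2 p0(t) - I_A(t|s)
    return -math.log2(p0) - code_len[t]

# Check P_0{ ASC >= d } <= 2^-d over a range of d.
for d in range(0, 9):
    prob = sum(p0 for t in strings if asc(t) >= d)
    assert prob <= 2.0 ** -d
print("Theorem 1 bound verified for d = 0..8")
```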

For a simpler formulation and derivation, along with a snarky translation of the result into IDdish, see my post “Deobfuscating a Theorem of Ewert, Marks, and Dembski” at Bounded Science.

I emphasize that Milosavljević does not narrow his focus to algorithmic mutual information until Theorem 2. The reason that Theorem 1 applies to algorithmic specified complexity is not that ASC is essentially the same as algorithmic mutual information — we established above that it is not — but instead that the theorem is quite general. Indeed, Theorem 1 does not apply directly to the algorithmic mutual information

    \begin{equation*} I(x:c) = -\!\log_2 M(x) - K(x|c^*), \end{equation*}

because M is a semimeasure, not a measure like P_0. Theorem 2 tacitly normalizes the probabilities of the semimeasure M, producing probabilities that sum to unity, before applying Theorem 1. It is this special handling of algorithmic mutual information that leads to the probability bound of 2^{-d+O(1)} stated in the abstract, which is different from the bound of 2^{-d} for algorithmic specified complexity. Thus we have another clear indication that algorithmic specified complexity is not essentially the same as algorithmic mutual information, though the two are formally similar in definition.

Conclusion

In 2005, William Dembski made a radical change in what he called specified complexity, and developed the precursor of algorithmic specified complexity. He presented the “new and improved” specified complexity in terms of statistical hypothesis testing. That much of what he did made good sense. There are no formally established properties of algorithmic specified complexity that justify regarding it as a measure of meaningful information. The one theorem that Marks et al. have published is actually a special case of a 1995 result applied to hypothesis testing. In my own exploration of the properties of algorithmic specified complexity, I’ve seen nothing at all to suggest that it is reasonably regarded as a measure of meaning. Marks et al. have attached “meaning” rhetoric to some example applications of algorithmic specified complexity, but have said nothing about counterexamples, which are quite easy to find. In Evo-Info 5, I will explain how to use the methods of “Measuring Meaningful Information in Images: Algorithmic Specified Complexity” to measure large quantities of “meaningful information” in nonsense images derived from the context.

[Some minor typographical errors in the original post have been corrected.]

220 thoughts on “Evo-Info 4: Non-conservation of algorithmic specified complexity”

  1. phoodoo: Keep does not=create.

    Who or what creates information phoodoo? Presumably you think some sort of deity does it? When does that happen? How does that happen?

  2. Alan Fox:
    Hat-tip to Dan Eastwood at Peaceful Science for linking to this article;
    Computing the origin of life which seems to address some of phoodoo’s concerns.

    A tip of the hat to Dr. Seuss for trying to make sense of some of Alan’s confusion:

    I am Daniel

    I am Sam
    Sam I am

    That Sam-I-am
    That Sam-I-am!
    I do not like
    That Sam-I-am

    Do you like
    Green eggs and ham

    I do not like them,
    Sam-I-am.
    I do not like
    Green eggs and ham.

    Would you like them
    Here or there?

    I would not like them
    Here or there.
    I would not like them
    Anywhere.
    I do not like
    Green eggs and ham.
    I do not like them,
    Sam-I-am

    Would you like them
    In a house?
    Would you like them
    With a mouse?

    I do not like them
    In a house.
    I do not like them
    With a mouse.
    I do not like them
    Here or there.
    I do not like them
    Anywhere.

  3. dazz: Another argument about islands of function, heh.

    No, just a counter example to Corneel’s claim:

    > Since natural selection depends on fitness variation, I don’t see how the target can possibly be independent from the evolutionary process.

    Just because no evolutionary pathway exists from A to B does not imply that no pathway exists. I cannot turn a carriage into a car through Darwinian mechanisms, but I can through the incremental intelligent design of engineers.

    Same with programming. I cannot progress from a simple hello world BASIC program to a 3D Java game through selection and random mutation, but I can through my own incremental learning and engineering.

    And, we can justify this mathematically by the observation that a halting oracle O can create algorithmic mutual information, whereas randomness R and computation C cannot:

    I(O(x):y) > I(x:y) >= E[I(C(R,x):y)]

  4. dazz: Does your argument/proof work regardless of the how smooth or rugged the fitness landscape is?

    If the fitness landscape is a smooth mountain with one peak, then my argument does not work. The more the landscape varies from this scenario the more my argument applies.

  5. EricMH: If the fitness landscape is a smooth mountain with one peak, then my argument does not work. The more the landscape varies from this scenario the more my argument applies.

    Does your math reflect this?
    Also, another way to put it would be, the more the fitness landscape looks like islands of function, or hills that are far and between, the more your argument applies, right?

  6. dazz:

    EricMH: If the fitness landscape is a smooth mountain with one peak, then my argument does not work. The more the landscape varies from this scenario the more my argument applies.

    Does your math reflect this?
    Also, another way to put it would be, the more the fitness landscape looks like islands of function, or hills that are far and between, the more your argument applies, right?

    How does this work? I thought that Eric Holloway’s argument worked for all fitness surfaces. (But then I don’t understand how Holloway’s argument is connected to any fitness surfaces).

  7. EricMH:

    Consider rolling a die. Relabeling the sides doesn’t decrease the probability of a certain side showing up.

    It does if we relabel them all to the same value!
    This may seem frivolous, but it is a very rough idea of what population genetics claims can happen to genomes of populations over time.

  8. EricMH: What this means in more concrete terms I’m still thinking through.

    I hope you mean you are trying to provide the detail needed to justify why your mathematical results have any application to biological evolution.

    I think your overall argument follows this structure:
    P1 E is a process which is consistent with known physics operating on Earth.
    P2 M is a mathematical model of E.
    P3 O are observations purportedly produced by E.
    P4 M says O contradicts E.

    Conclusion: E is false. That is, the process which produced O is not consistent with known physics operating on Earth.

    For your argument as I understand it, E is the mechanism of natural selection operating under evolutionary theory, M is your AI model and Levin’s results, and O is the observed appearance of design in living organisms. Because of this conclusion, you say a halting oracle embodied by intelligence is required to produce O (of course, halting oracles are inconsistent with physics operating on Earth).

    But the real problem with this argument is premise 2. I see no reason to accept it over P1 in the case of your model applying to biological evolution, given the success of P1 throughout science. (It is of course MN, but this thread is not the place to discuss MN).

  9. EricMH,

    Eric:
    I find it somewhat ironic that you have posted separately about the Wigner 1960 paper on the unreasonable effectiveness of math in physics.

    Wigner’s examples, as well as his overall argument, start with scientists observing regularities and then seeking the mathematical equations which could model them. His examples involve Newton modeling observed motions on earth and in the solar system, and Born and others modeling observed results of experiments with quantum entities.

    In both cases, science starts with unexplained empirical observations and then proceeds to find relevant math.

    So far, your approach seems to be the opposite: start with math, then claim observations must fit it.

  10. BruceS: It does if we relabel them all to the same value!
    This may seem frivolous, but it is a very rough idea of what population genetics claims can happen to genomes of populations over time.

    But isn’t evolution a process of constant relabeling?

  11. BruceS: In both cases, science starts with unexplained empirical observations and then proceeds to find relevant math.

    Are you trying to supply the Creationists with yet more reasons to believe that evolutionary theory is not science?

  12. BruceS: So far, your approach seems to be the opposite: start with math, then claim observations must fit it.

    I’ve noticed this sort of response a lot, that somehow the physical world can contradict mathematics. It is Dr. Swamidass’ go-to line over at PS whenever it seems the mathematics is correct. However, the physical world can never contradict mathematics. If our observations do not match our math, then the only possibility is that a mistake has been made somewhere:

    1. Flaw in measurement

    2. Flaw in application of math

    3. Flaw in mathematical proof

    If 1-3 do not apply, then math and the physical world will line up perfectly.

    I believe the disagreement between ID and the TSZ folks is deeper than the biology. The fundamental disagreement is TSZ believes the physical world can truly contradict mathematics.

  13. EricMH: The fundamental disagreement is TSZ believes the physical world can truly contradict mathematics.

    Those of us who make mathematical models of the real world are painfully aware that reality can, and usually does, fail to match our models. I’m unaware of anyone who makes these models trying to explain away that discrepancy by asserting that they have contradicted mathematics itself.

  14. EricMH: I’ve noticed this sort of response a lot, that somehow the physical world can contradict mathematics.

    Just to echo Joe Felsenstein here, Eric. Ask yourself, if a mathematical model doesn’t match reality, which is likely to be incorrect, reality or the math model?

  15. Eric is saying that when a mathematical model conflicts with reality, we here at TSZ go around saying that reality contradicts, not that particular model, but the basic structures of mathematics itself.

    Of course it would be silly to do that. Do you have an example of someone here doing that, Eric?

  16. These latest comments from BruceS, Eric, Alan, and Joe are really getting to the heart of the matter.

  17. EricMH: The fundamental disagreement is TSZ believes the physical world can truly contradict mathematics.

    No. I would in fact state that there is no logically possible world in which mathematics does not obtain. But mathematics does not have to be designed for that to be the case; rather, it must necessarily be so. Insofar as some world is logically coherent, then it must be possible to count, and relations of quantity must follow.

    This is not something which God is required to explain, or create, or for which mathematics has to be designed; rather, it cannot be otherwise. To say anything else is absolute gibberish, nonsense, poppycock. It is literally inconceivable, and that really is the word I intend to employ.

  18. Since Eric conceded that his argument does not apply to all fitness surfaces, doesn’t that mean it fails as a general proof that natural selection cannot increase specified complexity?

    And can we work through the example, please? It has a simple fitness surface with a single peak, so I guess we should be able to establish why the proof fails in this particular scenario.

  19. EricMH:

    I believe the disagreement between ID and the TSZ folks is deeper than the biology. The fundamental disagreement is TSZ believes the physical world can truly contradict mathematics.

    Eric: I notice you say “mathematics” and not “mathematical model”, as others responding to your post here are taking you to say.

    If you did mean to refer to mathematical model, then your list fails to appreciate how and why scientists build models. First, most models deliberately omit features of reality through abstraction or idealization. Second, many scientific models are not mathematical since modes of explanation must be tailored to the explanatory needs of a scientific domain. See this article in SEP if you are interested in learning about scientific modeling and here for scientific explanation.

    But suppose you meant “mathematics” per se. In this case, I take you to be asserting that the world has a fundamental structure which can be captured mathematically (that would include the Tegmarkian position that the world is mathematical). I happen to sympathize with the position that fundamentally the world can be described by mathematical structure, expressed as the philosophical position of structural realism, but even if that controversial view is correct, we still have this key issue:

    How do we limited, fallible humans discover the mathematical structure of the world?

    As I have said in past posts, I see ID theorists as modern day rationalists: rationalism is defined as a methodology or a theory “in which the criterion of the truth is not sensory but intellectual and deductive” (quoted in Wikipedia). Most ID theorists seem to believe that armchair theorizing using the mathematics of information and complexity combined with combinatorial probability is the correct way to understand the nature of reality; they reject modern scientifically-grounded empiricism.

    This is the fundamental difference between science and ID theory.

  20. Corneel:

    And can we work through the example, please? It has a simple fitness surface with a single peak, so I guess we should be able to establish why the proof fails in this particular scenario.

    Eric has a lot of work to do to make explicit how his mathematics says anything at all about biology, let alone fitness surfaces. Any comments on those topics he posts are his personal view of what his theory says. He has admitted in other posts he knows little of biology.

    Until that work is done, working through any examples would be premature.

    There are several threads at Peaceful Science where Eric explains his theory in more detail.
    https://discourse.peacefulscience.org/u/ericmh/summary

  21. BruceS: Until that work is done, working through any examples would be premature.

    There are several threads at Peaceful Science where Eric explains his theory in more detail.

    Thanks. I followed several of those but they fail to illuminate. “X is a biological organism” is not something I can work with; I need to know how to represent that organism in mathematical terms. I suspect working through the example will remain perpetually premature, but at least we get to find out what variables Eric thinks are appropriate to feed into the equations.

    ETA: loved your comment about the failure to distinguish mathematics from mathematical models: spot on.

  22. I think when Corneel says “the mathematical model”, what was meant was my simple population-genetic model of natural selection in an organism, the one in this 2012 post that I referred Eric to. The equations are simple, and they show that the model is, generation by generation, enriching the frequency of the highly-fit haploid genomes, by increasing the gene frequency of the advantageous allele at each locus. Further knowledge of the biology is not needed — I called the beasts “wombats” just for fun, but they could equally easily be jellyfish or warble flies or baobab trees.

    I don’t see what stops EricMH from showing how his ASC argument applies to my model. Or showing how the simple fitness surface it has (one that is smooth and has a single peak) violates some condition in his argument. I don’t see any condition or assumption in his argument that says anything about the smoothness issue. But EricMH is much more familiar with his argument than I am, so if that condition is there, presumably he can show us where in the argument it is.

    I disagree with BruceS that looking in EricMH’s argument for this is premature.

    I can promise you that if we find out how EricMH’s model shows that fitness cannot increase in a simple population-genetic model like this, the effect of this on population genetics would be very large — some of the foundational papers on natural selection would have to be retracted, 94 years of work down the drain.
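    The generation-by-generation enrichment described above is easy to see in a toy calculation. Below is a minimal deterministic sketch of haploid selection at a single locus, assuming the per-locus fitnesses quoted in this thread (1.01 for the advantageous “1” allele versus 1.00 for the “0” allele); the function name is invented here, and this is not the original model code from the 2012 post.

```python
# Minimal deterministic sketch of haploid selection at one locus.
# Assumed fitnesses (from the numbers quoted in this thread):
# allele "1" has fitness 1.01, allele "0" has fitness 1.00.

def next_freq(p, w1=1.01, w0=1.00):
    """Allele frequency after one generation of haploid selection."""
    return p * w1 / (p * w1 + (1 - p) * w0)

p = 0.5
for _ in range(100):
    p = next_freq(p)

# The advantageous allele is steadily enriched, generation by generation.
print(round(p, 3))  # → 0.73
```

    Under the stated assumption that all 100 loci start at the same frequency and evolve independently, this single-locus recursion describes each of them; the odds p/(1-p) simply multiply by 1.01 each generation.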

  23. Joe Felsenstein:

    I don’t see what stops EricMH from showing how his ASC argument applies to my model. Or showing how the simple fitness surface it has —

    I agree and posted some time ago in your thread that Eric start there and then respond to your feedback and that of the other biologists. I understood that Corneel was asking for something much more specific than that as a starting point, but of course that could just reflect my own ignorance of biology.

    ETA: With my limited knowledge in mind: my best understanding of Eric’s MI theory is that he is looking at (Kolmogorov) MI of a given genome to its mutated successor. (And evolved successor too perhaps, though it is unclear to me what that could mean.)

    But what he should be looking at, I think, is
    the MI between the aggregate genome of a population and the environment
    versus
    the MI of the aggregate genome of successor populations and the environment after population genetics mechanisms have operated on the aggregated genome.

    How exactly that aggregate genome/environment MI is modeled using Algorithmic Complexity is a problem I leave for Eric!

  24. BruceS: I understood that Corneel was asking for something much more specific than that as a starting point, but of course that could just reflect my own ignorance of biology.

    Haha, all the talk of fitness surfaces threw you off, I’ll bet. I was indeed referring to Joe’s “haploid wombat” example. It too has a fitness surface associated with it, which can be easily represented for any single locus by plotting the allele frequency on the x-axis and the average population fitness on the y-axis. I believe it is a simple straight line going up with increasing frequency of the “1”-allele.

    Plotting multi-locus genotypes will be messy, but the fitness landscape is straightforward and gently sloping up, with a single peak at the population that is fixed for “1” at all loci.

    ETA: fixed mistake

  25. Corneel: I believe it is a simple straight line going up with increasing frequency of the “1”-allele.

    If the gene frequency at a locus is p, the mean fitness at that locus is 1+0.01 p in the numerical example, which is indeed linear. For 100 loci the mean fitnesses at the 100 loci can be multiplied to get the overall mean fitness. In my example the gene frequencies at the 100 loci are all the same, p, so that the mean fitness of the population is then (1+0.01 p)^{100}, which rises in an exponential curve from 1.6467 when p = 0.5 to 2.7048 when p = 1.
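    The arithmetic above can be checked directly. A quick sketch (the helper name is invented here):

```python
# Multiplicative mean fitness across 100 loci, each contributing
# 1 + 0.01*p when the advantageous allele has frequency p
# (numbers taken from the comment above).

def mean_fitness(p, loci=100, s=0.01):
    return (1 + s * p) ** loci

print(round(mean_fitness(0.5), 4))  # → 1.6467
print(round(mean_fitness(1.0), 4))  # → 2.7048
```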

  26. Ah, if all loci have the same frequency for the “1” allele, then we can actually make a nice visualization of the fitness landscape.

  27. Joe Felsenstein: I think when Corneel says “the mathematical model”, what was meant was my simple population-genetic model of natural selection in an organism, the one in this 2012 post that I referred Eric to.

    According to Alan organisms do not adapt. Is it then only parts of organisms that adapt? Organisms do not adapt but alleles do adapt?

  28. Corneel: the talk of fitness surfaces

    Well, my education (if not vocation) was in math, so I’m OK with the math bit. It’s how much biological basic work needs to be in place to even talk about fitness surfaces that I wondered about.

    I think Eric needs to start with explaining some very basic aspects of how his theory is meant to model biology. Like what strings his mutual information is supposed to be between. Or how relative fitness is captured. Or how mechanisms like NS are captured.

    I understand he thinks his theory can rule out naturalistic explanations of NS, but first I think he needs to explain how all of these concepts are even represented in models based on his theory.

    Based on his latest post, the issue may be that Eric does not agree with the scientific approach to modeling.

  29. BruceS: As I have said in past posts, I see ID theorists as modern day rationalists: “rationalism is defined as a methodology or a theory “in which the criterion of the truth is not sensory but intellectual and deductive” (quoted in Wiki). Most ID theorists seem to believe that armchair theorizing using the mathematics of information and complexity combined with combinatorial probability are the correct ways to understanding the nature of reality; they reject modern scientifically-grounded empiricism.

    This is the fundamental difference between science and ID theory.

    Wouldn’t the proper comparison be between ID and the armchair theorizing of evolutionary biology?

  30. EricMH: The fundamental disagreement is TSZ believes the physical world can truly contradict mathematics.

    F=ma. Except when it doesn’t.

  31. Mung: According to Alan organisms do not adapt. Is it then only parts of organisms that adapt? Organisms do not adapt but alleles do adapt?

    It’s all in the niche.

  32. Mung: Wouldn’t the proper comparison be between ID and the armchair theorizing of evolutionary biology?

    No. Because such theorizing is only part of science, not its whole, and it conforms to norms of scientific modeling if it is to be accepted by the scientific community (ETA: as something more than a useful teaching tool or as part of a popularization).

    For an interesting discussion that involves this point among others, see Daniel Ang’s posts about Dark Matter and about Math and Physics at PS.

    Eric tried to make the same point about thought experiments by, e.g., Einstein at one point, and he was wrong for the same reason.

    Wait, this was a Mung post. ETA 2: But if your post was not meant to be taken seriously, then never mind.

  33. BruceS: I think Eric needs to start with explaining some very basic aspects of how his theory is meant to model biology. Like what strings his mutual information is supposed to be between. Or how relative fitness is captured. Or how mechanisms like NS are captured.

    I understand he thinks his theory can rule out naturalistic explanations of NS, but first I think he needs to explain how all of these concepts are even represented in models based on his theory.

    Unless his goal is to join the DI and sell books to creationists.

  34. Corneel: we can actually make a nice visualization of the fitness landscape.

    Good plot, but it would be more informative if you had the argument ylim=c(0,3) in your R plot command, so as not to give the impression that mean fitness is zero at the left end of the plot.

  35. Mung:

    Joe Felsenstein: I think when Corneel says “the mathematical model”, what was meant was my simple population-genetic model of natural selection in an organism, the one in this 2012 post that I referred Eric to.

    According to Alan organisms do not adapt. Is it then only parts of organisms that adapt? Organisms do not adapt but alleles do adapt?

    OK, I said “in an organism” when I should have said, more clearly, “in a population”.

    Individual organisms can change adaptively (or maladaptively). That is an interesting topic for someone else to discuss somewhere else.

  36. BruceS: Well, my education (if not vocation) was in math, so I’m OK with the math bit. It’s how much biological basic work needs to be in place to even talk about fitness surfaces that I wondered about.

    Sure, and I appreciated the opportunity to demystify fitness landscapes. Nothing fancy needed as you can see; you could make one for any model where genotypes / phenotypes need to be mapped to fitness.

    BruceS: I think Eric needs to start with explaining some very basic aspects of how his theory is meant to model biology. Like what strings his mutual information is supposed to be between. Or how relative fitness is captured. Or how mechanisms like NS are captured.

    I understand he thinks his theory can rule out naturalistic explanations of NS, but first I think he needs to explain how all of these concepts are even represented in models based on his theory.

    100% agree

    BruceS: Based on his latest post, the issue may be that Eric does [not] agree with the scientific approach to modeling.

    The issue may be that he is stalllllling 😉

  37. Joe Felsenstein: Good plot, but would be more informative if you had the argument ylim=c(0,3) in your R plot command so as not to give the impression that mean fitness is zero at the left end of the plot.

    Agreed, and I think that a line looks nicer too. Here comes take two:

    Corneel: The issue may be that he is stalllllling

    I don’t think that this is fair, even in jest. Everyone should have time to think about what they have to say, especially when addressing such a fundamental issue.

  39. Joe Felsenstein:
    Those of us who make mathematical models of the real world are painfully aware that reality can, and usually does, fail to match our models. I’m unaware of anyone who makes these models trying to explain away that discrepancy by asserting that they have contradicted mathematics itself.

    It’s kindergarten stuff learning that the math models the reality, but it’s not the same as the reality. Eric might have missed kindergarten.

  40. dazz: Unless his goal is to join de DI and sell books to creationists

    I understand you are not serious.

    So in that spirit, let me point out (before Mung does!) that the same concern has been expressed about string theory and the new physicists who concentrate on it rather than looking for new ideas in unifying GR/QFT. Namely, string theory is the main game in town for such physicists, and so they have to study it and support it to build a career by publishing papers and to earn a living as a postdoc.

    But this is not to question their sincerity in believing in string theory as a valid approach to the issue. Similarly, from all I know of Eric here and at PS, I believe he is sincere in his belief that ID theory is a valid approach to understanding the appearance of design in nature.

    Corneel:

    The issue may be that he is stalllllling

    Again, I understand you are being humorous (despite me being on the spectrum with respect to reading the emotional import of emojis). But I do want to emphasize that, based on all the posts I have seen from Eric here and at PS, I think he is sincere and honest about his beliefs.

  41. EricMH:
    I believe the disagreement between ID and the TSZ folks is deeper than the biology. The fundamental disagreement is TSZ believes the physical world can truly contradict mathematics.

    Mathematics is used to try to represent observations. It’s the observations that rule the math, not the other way around. So, if this is The Disagreement, then it is one where ID would be making a fundamental cart-before-the-horse philosophical blunder. It’s obvious that the math is used to make representations of phenomena, not phenomena to make representations of the math.

    To be fair, some actual scientists talk, or even write whole books, making the very same cart-before-the-horse blunders. For example, in some interview, Krauss, or some other physicist, talked about “the equations that govern the universe.” I thought, what an idiot, and stopped reading right there.

    We seem to be failing at teaching philosophy properly after kindergarten, and most people forget kindergarten, so we’re left with this mess.

  42. Entropy: Eric might have missed kindergarten.

    I don’t think the abuse is appropriate. I would like to hear more about what Eric thinks about the relationship between modeling and the ID movement.

    Edit: Because I think he is articulating a faulty point of view that underlies the work of Dembski and Marks.

  43. Entropy: For example, in some interview, Kraus, or some other physicist, talked about “the equations that govern the universe.” I thought, what an idiot and stopped reading right there.

    You are correct with regards to mathematics.
    But there is a real issue with the nature of the laws of physics. Namely, are they necessary: do they always apply? If so, how can necessity be a feature of the world? If not, why is physics so successful in explaining and predicting regularities?

    One can try arguing that such regularities are simply things we humans impose on reality, but that view has issues as well.
    https://plato.stanford.edu/entries/laws-of-nature/

    Philosophy for sure, though maybe a bit advanced for kindergarten.

    Stephen Hawking:
    Even if there is only one possible unified theory, it is just a set of rules and equations. What is it that breathes fire into the equations and makes a universe for them to describe?
    (A Brief History of Time, p. 190)
