Algorithmic Specified Complexity and the Game of Life

Ewert, Dembski, and Marks have a forthcoming paper: “Algorithmic Specified Complexity and the Game of Life” – It appears to be behind paywalls though. Can anyone provide a copy?

Please note: any comments not directly addressing the math or mechanics of this post will be moved to Guano (thanks Neil and Alan).

My earlier post:

1. In Conway’s Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life
2. There is the Glider-Producing Switch Engine http://conwaylife.com/wiki/Glider-producing_switch_engine
3. It is coded by 123 “On Cells” but requires a space of 67×60 in a specific configuration.
4. That’s 4,020 bits, which exceeds the universal probability bound (UPB); see the arithmetic note after this list.
5. It contains well-matched parts: 4 blinkers, 3 blocks, 2 beehives, 1 boat, 1 loaf, 1 ship, 1 glider http://wwwhomes.uni-bielefeld.de/achim/moving.html
6. It occurs naturally out of randomly configured dust : http://wwwhomes.uni-bielefeld.de/achim/moving.html
7. It can evolve from a much smaller entity (“time bomb” – 17 active cells): http://conwaylife.appspot.com/pattern/timebomb
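
(A quick check of point 4’s arithmetic, assuming the naive encoding of one bit per cell of the bounding box: 67 × 60 = 4,020 cells, hence 4,020 bits, comfortably above Dembski’s universal probability bound of roughly 500 bits, i.e. odds of about 1 in 10^150.)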

Possible criticisms:

Information is hidden somewhere
This is under “standard” Life rules (B3/S23), which means there is precious little exogenous information (a minimal code sketch of these rules follows the source link below):

1. Any live cell with fewer than two live neighbours dies, as if caused by under-population.
2. Any live cell with two or three live neighbours lives on to the next generation.
3. Any live cell with more than three live neighbours dies, as if by overcrowding.
4. Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

(Source: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules)
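
To make the “precious little exogenous information” point concrete, here is a minimal sketch of one B3/S23 generation (my own illustration, not taken from any particular simulator; the grid is bounded and cells outside it are treated as dead):

    # One generation of Conway's Life under B3/S23 on a bounded grid.
    # Cells outside the grid count as dead; real simulators typically use an
    # effectively unbounded universe, so treat this as a sketch only.
    def step(grid):
        rows, cols = len(grid), len(grid[0])

        def live_neighbours(r, c):
            return sum(grid[rr][cc]
                       for rr in range(max(0, r - 1), min(rows, r + 2))
                       for cc in range(max(0, c - 1), min(cols, c + 2))
                       if (rr, cc) != (r, c))

        new = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                n = live_neighbours(r, c)
                if grid[r][c]:
                    new[r][c] = 1 if n in (2, 3) else 0  # rules 1-3: survive on 2 or 3
                else:
                    new[r][c] = 1 if n == 3 else 0       # rule 4: birth on exactly 3
        return new

    # A blinker oscillates with period 2:
    blinker = [[0] * 5 for _ in range(5)]
    for c in (1, 2, 3):
        blinker[2][c] = 1
    assert step(step(blinker)) == blinker

Everything else discussed in this post (the switch engine, its ash, the gliders it emits) unfolds from repeated application of step() to an initial configuration.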

These are not self-replicating
Self-replication is not actually a requirement of Specified Complexity, and the pattern does send some of its parts off into the Life universe.

Also interesting – some musings on how big a life universe might have to be to support self-replicating life: http://ieet.org/index.php/IEET/more/prisco20140915

106 thoughts on “Algorithmic Specified Complexity and the Game of Life”

  1. Cross-posting some relevant comments from the other thread.

    Tom English on March 19, 2015 at 4:58 am said:

    Richardthughes,

    I’m interested in neither FIASCO nor the present fiasco. I’d like to see you turn your post into a response to Ewert, Dembski, and Marks, “Algorithmic Specified Complexity and the Game of Life” (forthcoming). I’ve studied the paper, and would be interested in discussing it with folks here, provided that someone shovels guano.

  2. And:

    Richardthughes on March 19, 2015 at 5:10 am said:

    Hi Tom!
    Long time fan, hope you’re well.
    Although I think there is something in the notion that the Game of Life can create SC / Information, a paper that’s worthy is likely beyond me; not that I wouldn’t like to contribute in some way.

    I find ‘life’ fascinating and the designed ‘machines’ are creative marvels; I was wondering what the most complex thing that could naturally emerge was.

    I’d be happy to start a more academic thread (I was hoping the tone here would be better but that has sadly slipped). I’m sure that Alan and Neil, our two able moderators, will Guano any off-topic responses (even my own!). Would you care to join me in such a post? I’ll put the kettle on…

  3. And:

    keiths on March 19, 2015 at 6:59 am said:
    Hi Tom,

    Nice to see you.

    I just started reading the Ewert/Dembski/Marks paper, but after two and a half pages my hand-waving detector is already going off.

    Here’s what triggered it:

    In the game, determining the probability of a pattern arising from a random configuration of cells is difficult. The complex interactions of patterns arising from such a random configuration makes it difficult to predict what types of patterns will eventually arise. It would be straightforward to calculate the probability of a pattern arising directly from some sort of random pattern generator. However, once the Game of Life rules are applied, determining what patterns would arise from the initial random pattern is nontrivial. In order to approximate the probabilities, we will assume that the probability of a pattern arising is about the same whether or not the rules of the Game of Life are applied, i.e., the rules of the Game of Life do not make interesting patterns much more probable than they would otherwise be.

    [Emphasis added]

    They’ve run head-on into the problem that has bedeviled [Dembski’s] specified complexity all along: no one can calculate the relevant probabilities. If these guys can’t even estimate the probabilities in a vastly simplified universe like the Game of Life, what hope do they have of applying ASC to the real world?

    Just imagine trying to make this argument about the real world: “The laws of nature make it difficult to predict the probability of meaningful patterns arising from our initial conditions, so we’re just going to assume that they don’t make any difference, and that interesting patterns aren’t much more probable with the laws of nature than without.”

    I hope the paper gets better. This is certainly an inauspicious start.

  4. Can’t promise instant attention to inappropriate comments but I will attempt to apply Dr Liddle’s rules as requested for this thread. That means comments that I perceive as being outside the rules will move to guano and off-topic comments may move elsewhere.

    I have to say I think the idea of being a little less proactive and waiting for someone to flag a rule-breaking comment seems to be working quite well, on the whole.

  5. Alan,

    I have to say I think the idea of being a little less proactive and waiting for someone to flag a rule-breaking comment seems to be working quite well, on the whole.

    Enthusiastically seconded!

  6. I have not yet read the two recent Ewert-Dembski-Marks papers. But in the past William Dembski has used compressibility of the phenotype as one possible scale on which to make his CSI argument. In effect, he argued that if the phenotype could be coded for by a nonrandomly short program, then this could be sufficiently improbable that one could call it CSI.

    All other problems with CSI aside, it is mysterious to me why this is even relevant. A spherical balloon can be described very simply. A hummingbird takes a much longer description. Does the balloon then have more specification than the hummingbird?? Dembski never clarifies why this is relevant, and to what. At times he described algorithmic information as fundamental to his argument, at other times it simply is one of the scales that can be used to “cash out” the complexity of the organism. Even calling it “complexity” is strange because it is the extreme simplicity of the algorithm that is measured.

    In the 2014 and 2015 papers, there seem to be some new steps to the argument, so perhaps these issues are clarified there.

  7. From the paper:

    “We made a simplified assumption about the probabilities of various pattern arising. We have merely calculated the probability of generating the pattern through some simply random process not through the actual Game of Life process.”

    Anyone see a problem with this?

  8. Also, minor typo?

    “Fig 4. Gosper gliding gun” is actually a “Gosper glider gun”.

  9. The Ewert/Dembski/Marks paper is an abysmal mess.

    Details to follow, but I needed to vent. 🙂

  10. I was expecting a Google Scholar alert, and didn’t know that the paper had been published. The PDF is available at the Evolutionary Informatics Lab.

  11. Richardthughes and keiths,

    It’s good to be here with you and the others. I do follow TSZ. But I’ve been trying to focus (not my strong suit) on pinning Dembski down in terms that an ID-sympathetic judge cannot easily ignore.

    Let’s start with the acknowledgement at the end of the paper:

    The authors would like to thank the reviewers’ suggestion of contrasting our paper with the study of temporal emergence properties of cellular automata.

    This means that a reviewer complained that what Ewert, Dembski, and Marks (EDM) regard as the probability of a pattern occurring “naturally” has absolutely nothing to do with the physics of the virtual universe. In the passages that Rich and Keith quoted, EDM are trying to satisfy the associate editor managing the review process that they’ve responded to the criticism. It’s possible that they were required to run a revised draft past the reviewers. But I can’t believe that anyone who objected to the probability distribution would accept the revisions. Here is the “contrasting”:

    Table III shows the calculated values of OASC for the various oscillators. The Pr [X] column derives from experiments on random soups [50]. The missing entries do not appear to have been observed in random trials.

    That is the full extent of the explanation. EDM do not want us to know that the probabilities of patterns present at the end of a Life simulation are much greater than the probabilities that they use. You’re supposed to be “contrasting” the Pr[X] column with the “Complexity” column, which contains the negative base-2 logarithms of the stupid probabilities. In the crude table below, I provide directly comparable values. In all cases where the alternative complexity value -log Pr[X] is available, it is less than the (upper bound on) conditional Kolmogorov complexity K(X|C). This means that the alternative OASC values (not shown) are all negative, and not the least indicative of design.

    Pattern: Complexity(X), -log Pr[X], K(X|C)

    block: 12.68, 1.630, 38
    blinker: 10.68, 1.603, 40
    caterer: 61.68, 33.60, 41
    mazing: 60.83, 27.71, 42
    pseudo-barberpole: —
    unix: 75.96, 30.66, 43
    burloaferimeter: —
    figure eight: 50.91, 24.98, 44
    29p9: —

    (Keep in mind that halving the complexity value corresponds to taking the square root of the probability. The differences are extreme.)

    There’s an unexplained “Bound” column in Table III. One naturally guesses that it is related to Pr[X], because the two columns are formatted identically, and because neither of the columns appears in the analogous Tables IV and V. But I determined by guessing again that the “Bound” values are simply 2 ^ -OASC. There’s no good reason for the column, as the other tables testify. It does, however, serve as a smoke screen for the Pr[X] column. It keeps the untransformed Pr[X] values from standing out. Most readers will simply give up trying to make sense of both columns.

    Now let’s return to the conclusion. EDM say of their physics-ignorant probability assignment,

    This appears to work, with the exception of the unix pattern. However, even that pattern was less than an order of magnitude more probable than the bound suggested.

    So the “bound” pops up in the conclusion, even though it’s explained nowhere in the paper? Going back to Table III, Pr[X] and 2 ^ -OASC (“Bound”) indeed differ by less than a factor of ten for the unix pattern. But there is absolutely no sense in comparing those numbers. The negative base-2 logarithm of Pr[X] should be compared to Complexity(X), as I did in my table. It’s pretty hard to believe that this is just “oopsies” when EDM have so obviously gone out of their way to conceal the truth.

    EDM would put the alternative Pr[X] values into Table III only under duress. The reviewer no doubt expected to see a column of alternative values of OASC, calculated as -log Pr[X] - K(X|C). I’m guessing that the associate editor was supposed to check whether required revisions were actually completed satisfactorily. The authors returned the revised paper with a note listing revisions: “Done. Done. Done.” I can’t say whether the associate editor was lazy, incompetent, or currying favor with a fellow of the IEEE (Marks).

    By the way, EDM’s reference [50], from which the Pr[X] values come, is the link that Rich gives in points 5 and 6 of the post. The page has been on the Web for 20 years. The details of the simulation are given here, following the list of the 100 most common objects in the “ash” of a Life run.

  12. I have just finished reading both of the new Ewert/Dembski/Marks papers, the 2015 one on the Game of Life and the 2014 paper on Algorithmic Specified Complexity (ASC) from the Bartlett et al. multi-author volume. Except for page 136 of the 2015 paper, which was left out of the posted PDF, most likely by accident rather than Design.

    Tom’s points are interesting, and suggest that their methodology does not in fact detect Design in the cases they give in the 2015 paper. But let me raise another issue, more or less the one I raised in my earlier comment in this thread.

    ASC is defined as the difference between the ordinary Shannon information and the Kolmogorov complexity. So it gets large when there is a pattern that consists of many bits, but which can be coded for by a short program. A random bit string has small ASC, while a long bit string coding a known mathematical constant (such as γ = 0.57721566…, or the square root of 1/2) will have a substantial ASC.

    Most of both papers consists of calculating ASC for various examples. No part of either paper explains what this has to do with Design.

    In what sense does that indicate that the bit string is designed? It is hard to believe that it is one of those arguments that mathematics is so beautiful that the Universe must be designed. There is also no argument anywhere in either paper that natural selection cannot bring about high values of ASC.

    (Of course the argument could be on page 136 of the 2014 paper).

  13. Help, please.

    The article claims that “ASC was introduced by Dembski” in The Design Inference (1998). I don’t own a copy of the book, and I can’t see much of it at Google Books. I know that Dembski mentions context, and also Kolmogorov complexity. But I don’t recall that he says anything in his solo work about using conditional Kolmogorov complexity to take context into account. I’d gotten the impression that Ewert was taking credit for the generalization of K(X) to K(X|C), with

    ASC(X, P, C) = -log P(X) - K(X|C).

    Here X is an object, P(X) is the chance of it “naturally” arising, and C is the context.
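
    To make the formula concrete, here is a toy calculation (my own illustration, emphatically not EDM’s procedure): a uniform P over bit strings of a given length stands in for the chance hypothesis, and zlib-compressed length stands in for K(X|C), so what comes out is really an OASC-style lower-bound estimate.

        # Toy illustration of ASC(X, P, C) = -log P(X) - K(X|C).
        # Assumptions (mine, not the paper's): P is uniform over bit strings of
        # the same length, and zlib-compressed length in bits stands in for
        # K(X|C). Compression only upper-bounds K, so this estimates OASC.
        import zlib

        def oasc_estimate(bits: str) -> int:
            neg_log_p = len(bits)                            # -log2 P(X) under uniform P
            k_upper = 8 * len(zlib.compress(bits.encode()))  # crude bound on K(X|C)
            return neg_log_p - k_upper

        print(oasc_estimate("01" * 500))          # repetitive, hence compressible: large and positive
        print(oasc_estimate("0110100110010110"))  # short string: zlib overhead makes it negative

    Swap in a different compressor, a different P, or a different “context,” and the number changes.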

  14. Joe,

    In what sense does that indicate that the bit string is designed?

    It’s the same argument as for “regular” specified complexity. Anything sufficiently specified and complex is designed, according to EDM. In the ASC papers, ‘specified’ means ‘has low Kolmogorov complexity’, and ‘complex’ means ‘has low probability’ (high Shannon information).

    EDM try to mash the two together into a single number, ASC, by subtracting KC from Shannon information.

    There are huge problems with all of this, which I’ll address in later comments.

    There is also no argument anywhere in either paper that natural selection cannot bring about high values of ASC.

    Physics subsumes natural selection, and in the Life paper EDM are trying to show that the Life world’s “physics” prohibits high ASC except via design.

    They fail, of course, because their argument depends on the faulty assumption that we’ve been criticizing throughout the thread:

    In order to approximate the probabilities, we will assume that the probability of a pattern arising is about the same whether or not the rules of the Game of Life are applied, i.e., the rules of the Game of Life do not make interesting patterns much more probable than they would otherwise be.

  15. Tom,

    That’s one of the many things about this paper that raised my eyebrows. They cite The Design Inference but give no page numbers. They do the same in Algorithmic Specified Complexity and in On the Improbability of Algorithmic Specified Complexity.

    I have a copy of The Design Inference. It’s physical, unfortunately, so I can’t search it electronically, but I haven’t found anything in it that resembles ASC, either in the index or in the text. And I certainly don’t remember seeing any such thing when I read it years ago.

    I wonder: Is Ewert brown-nosing his mentor?

  16. Keith:

    I’ve hit upon a simple, substantive, and indisputable error. Please contact me by email (see “About Me” at my blog). I remember your last name, from way back when.

  17. Tom English and keiths,

    I’ll take a look at No Free Lunch and see if I can find all the references to Kolmogorov complexity. It does not help that I have a paper copy rather than an electronic one.

    One rather silly detail of the ASC formula

    ASC(X, P, C) = -log(P(X)) - K(X|C)

    is the inclusion of P as a separate parameter. After all, if P is a function of X then it cannot be varied independently of X. I suppose one can say that defining P is part of stating the problem, so one should include it. Or that it is part of the context C, so that one should not include it. I doubt that this is a fundamental problem, but it is puzzling. But then again, why this formula has anything to do with Design is puzzling too.

  18. Joe Felsenstein,

    Joe, maybe P stands for Puzzling and was put there on purpose to puzzle non-IDiots. :p

    When I read comments by you and others who spot the flaws in the equations/formulas and other ‘mathy’ stuff that IDCs like Ewert, Marks, Dembski, Sewell, etc., put forth I find myself wondering why those guys apparently think that no one is going to notice the flaws. And if those guys aren’t aware of the flaws (that are usually quickly spotted by others) then what does that say about their alleged (and/or self-proclaimed) expertise in math?

    And yeah, what does that formula have to do with ‘design’, and especially ‘design’ (i.e. creation) by the imaginary characters yhwh-jesus-holy-ghost or any other so-called ‘designer’ (i.e. creator) of the universe? How on Earth can Ewert, Dembski, Marks, Sewell, etc., think that no one will notice that they have not and cannot connect their ‘mathy’ stuff or any of their other ‘sciency’ sounding stuff to ‘design’ (i.e. creation) by their imaginary, so-called ‘God’?

    Hmm, I wonder if kairosfocus will add the letter A (for Algorithmic) to FSCO/I? If so, then even he could call it ‘FIASCO’. 🙂

  19. Tom English:

    Keith:

    I’ve hit upon a simple, substantive, and indisputable error. Please contact me by email (see “About Me” at my blog). I remember your last name, from way back when.

    Hi Tom,

    I sent you an email as requested.

  20. Let me start dribbling out my criticisms of the EDM paper as time permits. Here are some comments on the concept of Algorithmic Specified Complexity itself, defined as -log2(P(X)) - K(X|C).

    1. Shannon information and conditional Kolmogorov complexity are not commensurable, so it really makes no sense to subtract one from the other. Shannon information is the negative log of a probability, and Kolmogorov complexity is a program length. EDM seem to have been bamboozled by the fact that both are expressed in bits.

    2. The name doesn’t correspond to the metric. ASC can be negative. How does it make sense to speak of negative complexity?

    3. EDM are reifying the difference -log2(P(X)) - K(X|C) with no justification. If ASC(X, P, C) = ASC(Y, P, C) = 500 bits, one might reasonably expect this to point to some essential commonality between X and Y. They have the same “algorithmic specified complexity”, after all. But -log2(P(X)) might be 1000 bits, with K(X|C) equal to 500 bits, while -log2(P(Y)) is 100,000,000,000 bits, with K(Y|C) equal to 99,999,999,500 bits. Other than the fact that they happen to share the same difference between two incommensurable quantities, what is the essential similarity?

    4. Why settle on -log2(P(X)) - K(X|C), instead of on -k1 * log2(P(X)) - k2 * K(X|C), with k1 and k2 not equal to 1? EDM offer no justification, but the values of k1 and k2 make all the difference in the world to the value of ASC, and hence to the design inference.

  21. All good questions.

    Worth noting is that ASC is a number. Complex Specified Information, as used by Dembski, was not a number but a true/false assessment. Leslie Orgel’s original “specified information” (discussed in his 1973 book The Origins of Life on page 189), Szostak and Hazen’s “functional information” (here, here, and here) and my own less-well-discussed “adaptive information” (here) are all defined on numerical scales.

    Dembski’s Complex Specified Information is a true/false statement that there is more than 500 bits of specified information. So you can’t actually say what the amount of CSI is — it isn’t a number. However in his 2005 or 2006 article Specification: The Pattern that Signifies Intelligence (here) Dembski defines Specified Complexity using a numerical formula, declaring that there is Complex Specified Information when SC > 0.

    ASC is close to Dembski’s SC, except that Kolmogorov Complexity is used. In the Dembski 2005/2006 article, Kolmogorov Complexity is discussed, but is only used as one of the possible ways of showing that a pattern is improbable.

    A major difference between the Orgel-Hazen-Szostak-(me) type of information and Dembski’s specified complexity is that the former has no term that assesses whether or not it can be achieved by evolutionary processes, including natural selection. Dembski’s SC has the infamous term P(T|H) which includes the probability of achieving the pattern, whatever it is, by natural evolutionary processes. Nobody knows how to compute that term.

    Interestingly, ASC does not have P(T|H) in it. Which is why we get to ask how it is known that it signifies Intelligence or Design and cannot be made larger than zero by natural selection.
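
    (For reference, and quoting from memory rather than checking the paper, the 2005/2006 measure has roughly the form SC = -log2[ 10^120 × φ_S(T) × P(T|H) ], where φ_S(T) counts the patterns that the semiotic agent S can describe at least as simply as T; the 10^120 factor is the replicational-resources term. ASC keeps neither that factor nor H.)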

  22. keiths:

    I’m playing along with an error of EDM, which I don’t care to explain to them.

    1. Shannon information and conditional Kolmogorov complexity are not commensurable, so it really makes no sense to subtract one from the other. Shannon information is the negative log of a probability, and Kolmogorov complexity is a program length. EDM seem to have been bamboozled by the fact that both are expressed in bits.

    Dembski has in fact subtracted apple bits from orange bits in some instantiations of specified complexity. I raised this issue in my first (bleary-eyed) response to ASC, back in the summer. After sleeping on it, I realized that K(X|C) corresponds to an algorithmic probability distribution, with

    P'(X) = 2 ^ –K(X|C).

    So ASC is implicitly

    -log P(X) - (-log P’(X)) = log(P’(X) / P(X)),

    which is formally similar to active information in the case of a singleton target X. However, the sum of P'(X) over all objects X is less than 1. The same holds for the P-probabilities in the Life article. So we have a new issue, the difference in normalizing constants for the two distributions.

    2. The name doesn’t correspond to the metric. ASC can be negative. How does it make sense to speak of negative complexity?

    It doesn’t. But given the authors’ interpretation of ASC, I think it’s better to ask how the quantity of meaningful information can be negative. I think that the only way to ensure non-negative values is to let P be the universal distribution, with P(X) = 2 ^ -K(X). Then

    ASC(X, P, C) = K(X) - K(X|C),

    the algorithmic mutual information of X and C. But that would not make EDM happy.

    3. EDM are reifying the difference -log2(P(X)) - K(X|C) with no justification. If ASC(X, P, C) = ASC(Y, P, C) = 500 bits, one might reasonably expect this to point to some essential commonality between X and Y. They have the same “algorithmic specified complexity”, after all. But -log2(P(X)) might be 1000 bits, with K(X|C) equal to 500 bits, while -log2(P(Y)) is 100,000,000,000 bits, with K(Y|C) equal to 99,999,999,500 bits. Other than the fact that they happen to share the same difference between two incommensurable quantities, what is the essential similarity?

    ID is all about reification — intelligence is an abstraction — but I’m not following you here. The example is provocative. But you have to revise what you’re saying about it, in light of the fact that the choice of “context” C and a universal computer is implicitly a choice of an alternative probability distribution P’. I’m sure you’ve noted that C is in practice always the decoder for a coding system tailored to the application.

    By the way, EDM evidently have parted ways with the universal probability bound, not only in their work on ASC, but also in their work on active information.

    4. Why settle on -log2(P(X)) - K(X|C), instead of on -k1 * log2(P(X)) - k2 * K(X|C), with k1 and k2 not equal to 1? EDM offer no justification, but the values of k1 and k2 make all the difference in the world to the value of ASC, and hence to the design inference.

    ASC is a log-ratio of unnormalized probabilities. The normalizing constants are additive under log transformation. I don’t know what to make of the exponentiation implied by your multiplicative constants.

  23. Joe Felsenstein:

    Worth noting is that ASC is a number. Complex Specified Information, as used by Dembski, was not a number but a true/false assessment. Leslie Orgel’s original “specified information” (discussed in his 1973 book The Origins of Life on page 189), Szostak and Hazen’s “functional information” (here, here, and here) and my own less-well-discussed “adaptive information” (here) are all defined on numerical scales.

    Checking Dembski’s claim in his Chicago talk that Dawkins had tacitly admitted to targets in biological evolution, I read Chapter 1 of The Blind Watchmaker for the first time in forever. Dawkins’ characterization of a complicated object matches up perfectly with the conventional explanation of specified complexity. In his 2002 review of No Free Lunch, Richard Wein suggests in footnote 44 that Dembski may have gotten the concept from Dawkins, having ruled out Davies and Orgel in Section 6.7. Given Dembski’s rhetoric in subsequent years, the connection is quite clear.

    Dembski’s Complex Specified Information is a true/false statement that there is more than 500 bits of specified information. So you can’t actually say what the amount of CSI is — it isn’t a number. However in his 2005 or 2006 article Specification: The Pattern that Signifies Intelligence (here) Dembski defines Specified Complexity using a numerical formula, declaring that there is Complex Specified Information when SC > 0.

    He quantified specified complexity in No Free Lunch. His so-called Law of Conservation of [Complex Specified] Information allows for a gain of up to 500 bits of specified complexity by a strictly “naturalistic” process. In “Specification,” he lumps the universal probability bound (reduced to 400 bits) into the measure, treating it as “replicational resources” (maybe “probabilistic resources” — I don’t care to look it up). Formally, you can take it out, and declare design when SC > 400 bits. That gets us a step closer to ASC.

    ASC is close to Dembski’s SC, except that Kolmogorov Complexity is used. In the Dembski 2005/2006 article, Kolmogorov Complexity is discussed, but is only used as one of the possible ways of showing that a pattern is improbable.

    Dembski does discuss Kolmogorov complexity. And you’re right that it is only one approach to measuring the descriptive complexity of a pattern.

    You’ve set me up to put EDM’s “context” into historical context. In “Specification,” the measure of descriptive complexity of events is parameterized by a “semiotic agent.” Semiotic agents assign strings of symbols to events. They have different vocabularies (finite sets of symbols that they use). For a given semiotic agent, the descriptive complexity of an event is determined by the length of the string assigned by the agent, and the size of the agent’s vocabulary. EDM have made the semiotic agent inexplicit, and have limited it to a binary vocabulary. It associates events (atomic, thus far) with binary descriptions. No description is a prefix of any other. The descriptive complexity of an event X is just the length of its binary description, which I’ll express as DL(X). What EDM call the “observed observed [sic] algorithmic specified complexity (OASC)” we can then write as

    OASC = -log P(X) - DL(X) bits.

    Here the P(T|H) of “Specification” has changed to P(X). There are no alternatives to the chance hypothesis H. EDM do not refer to a “target,” so variable T has become a generic X.

    The “context” C is a program that transforms the description of X into object X. K(X|C) is the length of the shortest program that, supplied with C as an input, outputs X. In this measure of descriptive complexity, “context” C corresponds to what Dembski formerly called a semiotic agent. Somebody decides on how to associate objects with binary descriptions.

    A major difference between the Orgel-Hazen-Szostak-(me) type of information and Dembski’s specified complexity is that the former has no term that assesses whether or not it can be achieved by evolutionary processes, including natural selection. Dembski’s SC has the infamous term P(T|H) which includes the probability of achieving the pattern, whatever it is, by natural evolutionary processes. Nobody knows how to compute that term.

    Dembski was interested primarily in biological evolution, to be sure. But P(T|H) — now P(X) — is the chance of the target event T — now X, not referred to as the target — under strictly “naturalistic” processes, not necessarily evolutionary processes. As Keith commented early on, EDM have perfect knowledge of the physics of the Life universe, and still can’t calculate probabilities of objects.

    Interestingly, ASC does not have P(T|H) in it. Which is why we get to ask how it is known that it signifies Intelligence or Design and cannot be made larger than zero by natural selection.

    Elsberry and Shallit came up with examples of unbounded CSI gain that did not involve natural selection. I think we need to take a close look at Rich’s example, the glider-producing switch engine, at some point.

  24. Remotely related – the ‘fine tuning’ of life:

    Does anyone know how sensitive the evolution of these shapes is to the initial seed? I’m presuming that the probability of any cell being alive at the start is 0.5? Are different seeds ever used with different probabilities?

  25. For evolution to work, the only relevant probability is that of finding a non-fatal mutation within the reach of a single mutational step, given 20 or so kinds of mutation, plus recombination.

  26. The next item on my list merits its own comment, I think, because it is one of the most glaring problems with the EDM paper.

    5. Algorithmic specified complexity is a function not only of P, the probability distribution, and X, the pattern in question, but also of C, which EDM refer to as the “context”.

    The problem is that K(X|C) is highly dependent on C. In fact, for any specific pattern X, I can construct a context C1 in which X is compressible to a single bit. I can also construct a C2 in which X is incompressible.

    This leads to hugely different values of ASC depending on the context, and therefore to conflicting design inferences.

    In other words, any sufficiently improbable pattern X will yield a design inference if a suitable context C is chosen.

    What is the “right” C, how do we know, and how do we find it? EDM don’t say. Without that, their claims for ASC are empty.
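
    Here is the problem in miniature (my own illustration: zlib’s preset dictionary plays the role of the context C, and compressed length stands in for K(X|C)):

        # The same object X can look incompressible or nearly free,
        # depending entirely on the chosen context C. zlib's preset
        # dictionary (zdict) stands in for C. Purely illustrative.
        import os
        import zlib

        X = os.urandom(200)   # 200 random bytes: no internal structure at all

        def k_estimate_bits(x: bytes, context: bytes) -> int:
            c = zlib.compressobj(level=9, zdict=context)
            return 8 * len(c.compress(x) + c.flush())

        print(8 * len(zlib.compress(X, 9)))  # no context: over 1600 bits, X looks random
        print(k_estimate_bits(X, X))         # context built around X: a small fraction of that

    Under a uniform P (so -log2 P(X) = 1600 bits), the first case gives negative ASC and the second gives a “design inference” of well over a thousand bits, for exactly the same X. The only thing that changed was C.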

  27. Heh.

    I just skimmed through EDM’s 2012 paper Algorithmic Specified Complexity and found the following. It shows that they were aware of the problem mentioned in my previous comment, but that they tried to brush it off:

    4.2 Context is Subjective

    The ASC of any object will depend on the context chosen. Any object can be made to have high ASC by using a specifically chosen context. But this appears to be the way that information works. If the authors, who do not understand Arabic, look at Arabic text, it appears to be no better then scribbling. The problem is not that Arabic lacks information content, but that we are unable to identify it without the necessary context. As a result, this subjectivity appears to capture something about the way information works in the human experience.

    As with specification, it is important the context be chosen independent of object under investigation. While a specification will rarely be independent of the object under investigation, we believe it is much easier to maintain this independence in the case of a context.

    [All typos in original]

    This just reinforces my point. Any sufficiently improbable X will exhibit high ASC if C is chosen correctly. EDM handwave this away by saying that the context needs to be independent of the object under investigation, but how do they guarantee this? What is the range of ASC (or OASC) values for contexts that are chosen independently, and how can they avoid bogus design inferences?

    Also, it’s obvious that the context C they selected in the Game of Life paper was not at all independent of the objects under investigation. It was specifically designed to compactly capture objects they considered interesting, such as spaceships and oscillators!

  28. Richardthughes,

    So is it fair to say they’ve left the objective empirical domain?

    Well, if ASC tells us that every single object is designed, depending on the choice of C, and EDM give us no trustworthy way to decide which contexts are admissible (and hence which ASC values are admissible and which are not), then the whole project seems pretty hopeless.

  29. On to my next complaint about the paper.

    6. The very first sentence of the abstract reads:

    Algorithmic specified complexity (ASC) measures the degree to which an object is meaningful.

    Interesting claim. How to test it?

    Well, EDM certainly can’t prove it, so the test has to be empirical.

    Okay, let’s collect a bunch of ASC values (or OASC values, which are lower bounds for the true ASC values), then collect the meaningfulness values for the same objects, and test the strength of the correlation.

    Oops, we can’t do that, because no one knows what the “meaningfulness” values are. That’s what ASC is supposed to provide us: the first viable tool for measuring the degree of meaningfulness of an object.

    Okay, then let’s look at a bunch of objects that Life aficionados are interested in. Surely those objects are meaningful. And we may not be able to assign meaningfulness values to them in order to see how well they correlate with the ASC values, but at least we can see if the ASC values exceed some threshold for all of these “meaningful” objects.

    Aw, crap. Two of the most important objects in the Life universe, blocks and blinkers, have -25 and -29 bits of OASC, respectively. They’re meaningful, but they fail the ASC test.

    So much for this:

    Algorithmic specified complexity (ASC) measures the degree to which an object is meaningful.
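
    (Those figures appear to follow directly from the table in Tom’s comment above: OASC(block) = 12.68 - 38 ≈ -25 bits and OASC(blinker) = 10.68 - 40 ≈ -29 bits, i.e. Complexity(X) minus the K(X|C) bound.)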

  30. Richardthughes:

    Thanks Tom. I suspect this opens up a “fine tuning” argument.

    I stumbled across a posting somewhere that initial densities ranging from .11 to .5 on a finite grid gave similar results. However, people interested in the emergence of self-reproducing patterns and evolution stipulate very low densities (I don’t know how low), and ask how large a region is required to observe the phenomena with some given probability.

    Just to add to the list of mysteries: Ewert et al. (2015) assert that the Kolmogorov Complexity of X may differ depending on what kind of computer is executing the program that produces X, but that different computers differ by at most a constant.

    At the same time, it seems to be true that by choosing a suitable computer, K(X) can be made small. In effect, there is no unique compression algorithm. The Kolmogorov (-Chaitin-Solomonoff) argument uses the fact that the distribution of amounts of compression can be known, since there are at most 2^n things that can be losslessly compressed down to n bits. But that argument does not say which things those are.

    That would seem to invalidate the point about the amount of compression differing by at most a constant. No matter what constant you choose, there is some object which looks random under one compression method but is very nonrandom under another. And that difference in its computed value of K(X) under the two methods exceeds your chosen constant.

    EDM 2015 also say that

    However, the true KCS complexity is always equal to or less than any achieved lossless compression.

    How can that be true if by changing computers I can make K(X) for any given X small, or large?

    I must have missed something about Kolmogorov Complexity. Is the at-most-a-constant constant validated there?

  32. Joe Felsenstein: That would seem to invalidate the point about the amount of compression differing by at most a constant.

    Well, no, it doesn’t invalidate that.

    The real point is that Kolmogorov complexity is mainly of interest for its asymptotic value. That is to say, it is a mathematical theory, of interest to mathematicians who care about what happens as you approach infinity. Its use for ordinary non-mathematical objects, outside of asymptotic behavior, is dubious.

  33. On to the next item on my list. It’s a doozy.

    7. EDM get conditional Kolmogorov complexity backwards, confusing program with data and vice-versa.

    K(X|C) is defined as the length of the shortest program that can produce the pattern X on a given machine, with C as input.

    EDM describe a setup in which the machine runs a program called the interpreter, which produces Game of Life patterns from their compressed descriptions. The interpreter is included as part of the context C.

    EDM treat the length of the compressed description as an upper bound on the conditional Kolmogorov complexity of the corresponding object. But the description is not a program. It is data that is fed to the interpreter. And the interpreter is a program, not data, yet it is being fed to the machine as part of the context C, which is data.

    They’re treating data as program, and program as data — and then they’re measuring the length of the compressed description — the data — to estimate the conditional Kolmogorov complexity, when they are supposed to be measuring the length of the program.

  34. Had they been smarter, they would have stuck to regular Kolmogorov complexity instead of trying to use conditional KC. Then the interpreter could have been built into the machine, and the compressed descriptions actually would count as programs.

    They’d still have the problem of justifying their choice of machine, however. It’s equivalent to the problem I described earlier with justifying their choice of C.

  35. Joe,

    EDM 2015 also say that

    However, the true KCS complexity is always equal to or less than any achieved lossless compression.

    How can that be true if by changing computers I can make K(X) for any given X small, or large?

    They’re saying that for a given machine M, the Kolmogorov complexity is less than or equal to any achieved lossless compression on that same machine.
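
    For what it’s worth, the “at most a constant” claim is the standard invariance theorem: for any two universal machines M1 and M2 there is a constant c, depending only on the machines and not on X, such that |K_M1(X) - K_M2(X)| <= c for all X. The catch is that the theorem is about universal machines; a particular compression scheme is not universal, so a compressor tailor-made for one object doesn’t contradict the theorem, it just falls outside it.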

  36. On to the next flaw on my list:

    8. EDM incorrectly cite an upper bound for the ASC of ‘ash’ objects.

    They write:

    We observe that these [ash] objects are fairly small, and thus will not exhibit much complexity. The largest bounding box is 4 × 4 which will require at most 1 + l(4) + l(4) + 16 = 1 + 7 + 7 + 16 = 31 bits. Describing the simplest still life required 26 bits, leaving at most 4 bits of ASC. Consequently, none of these exhibit a large amount of ASC.

    Even ignoring all of the errors pointed out elsewhere in the thread, their reasoning here is incorrect. OASC can only be a lower bound for ASC. They have no basis for claiming an upper bound.

  37. Moving down my list:

    10. EDM determine the bounding box dimensions incorrectly.

    In EDM’s “standard encoding”, a pattern is represented by a length, a width, and a bit for each cell indicating whether it is alive or dead. The problem is that the cells within the pattern interact not just with each other, but also with the cells in the surroundings. As more generations elapse, a given cell is influenced by cells farther and farther away from it. Live cells in the surroundings can influence or destroy the specified pattern.

    Therefore, the bounding box should be drawn larger than the pattern — large enough to cover the period of interest. EDM mention a related problem in their Appendix, probably at the behest of a reviewer, but they fail to realize that it affects their P(X) encodings too.

    A particularly egregious case is the Gosper glider gun, which they use as an encoding example. EDM specify an insufficient bounding box of 36×9, so the length of their standard encoding is l(36) + l(9) + 36 × 9.

    However, the Gosper glider gun has a period of 30(!), so the correct bounding box needs to be much, much bigger than the 36×9 box that they use. Since the bounding box factors into the encoding size, and the encoding size determines I(X), the ASC gets hosed when EDM use the wrong bounding box size.
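
    To put rough numbers on it (my own back-of-envelope, assuming a self-delimiting integer code of 2 × bit_length(n) + 1 bits for l(n), which may differ from the paper’s code):

        # Rough sketch of the "standard encoding" length for a W x H bounding
        # box: an encoded width, an encoded height, and one bit per cell.
        # ASSUMPTION (mine, not the paper's): l(n) = 2 * bit_length(n) + 1,
        # a simple self-delimiting integer code. The exact code barely
        # matters, because the W * H cell term dominates.
        def l(n: int) -> int:
            return 2 * n.bit_length() + 1

        def standard_encoding_bits(width: int, height: int) -> int:
            return l(width) + l(height) + width * height

        print(standard_encoding_bits(36, 9))   # EDM's 36 x 9 box: a few hundred bits
        print(standard_encoding_bits(72, 18))  # a hypothetical box doubled in each direction: ~4x as many

    Since I(X) is read off this encoding, even doubling the box in each direction roughly quadruples the bit count, before we get to the question of how big the box really needs to be for a period-30 gun.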

  38. A comment I regret having withheld:

    K(X|C) is defined only if X and C are binary strings. In the Game of Life, the objects X are 2-dimensional arrays of binary values. The only fix is to introduce an encoding d(X) of all objects X as binary strings, and to write K(d(X)|C) instead of K(X|C). Think of d(X) as raw data registered on observation of object X.

    It happens that EDM explicitly introduce a second code, in terms of which complexity is defined. (The probability distribution P is implicit.) If we were to reuse the code, then we would have

    ASC = ℓ(d(X)) - K(d(X)|C),

    where ℓ(d(X)) is the length of d(X). Intuitively, making sense of the data is a matter of providing a concise account of it. Don’t take that as a glowing endorsement. It’s simply where I think EDM are coming from with their talk of “meaningful information.”

  39. keiths:

    7. EDM get conditional Kolmogorov complexity backwards, confusing program with data and vice-versa.

    I don’t agree with this one.

    EDM treat the length of the compressed description as an upper bound on the conditional Kolmogorov complexity of the corresponding object. But the description is not a program. It is data that is fed to the interpreter. And the interpreter is a program, not data, yet it is being fed to the machine as part of the context C, which is data.

    They’re treating data as program, and program as data — and then they’re measuring the length of the compressed description — the data — to estimate the conditional Kolmogorov complexity, when they are supposed to be measuring the length of the program.

    Programs may be treated as data, and programs may contain data. Embed EDM’s description of X in a program p that does nothing but to run program C with the description as input. Program C outputs d(X) on input of the description, and then halts. The length of p is an upper bound on K(d(X)|C). The shortest program that outputs d(X) on input of C, and then halts, doesn’t necessarily work at all like p. It may even ignore C.

  40. keiths:

    This goes to a significant error in reporting. The “K(X|C)” values used in calculating OASC are way too small.

    Joe,

    They’re saying that for a given machine M, the Kolmogorov complexity is less than or equal to any achieved lossless compression on that same machine.

    From Section II-A of the article:

    However, the true KCS complexity is always equal to or less than any achieved lossless compression. This means that the true ASC is always equal to or more than an estimate. We will refer to the known estimate as the observed observed algorithmic specified complexity (OASC).

    Let’s write c(X) for EDM’s description of object X. They use the description length ℓ(c(X)) in place of K(d(X)|C) in calculating OASC. They are ignoring an additive constant corresponding to the length of the program that contains c(X) and arranges for program C to run on input of c(X). Greg Chaitin developed his Tiny Lisp system in order to get concrete bounds on Kolmogorov complexity, so I naturally think of how to write the program in LISP:

    (lambda (C) (funcall C "011010001"))

    This takes a supplied function C, and calls it on a binary description. Chaitin’s dialect is more concise. But I’m guessing that the program containing the description c(X) is more than 50 bits longer than the description alone. I suppose I’ll have to download Tiny Lisp and see what I get.

    Formally, combining a shortest program p and c(X) in a bigger program (p, c(X)),

    K(p, c(X)) = ℓ(p) + K(c(X)|p) + O(1),

    which is less than or equal to

    ℓ(p) + ℓ(c(X)) + O(1).

    The embedded program p works with c(X) for all X, so ℓ(p) is a constant. I illustrated this with the LISP expression.

  41. Tom,

    Programs may be treated as data, and programs may contain data.

    Sure, but data can’t be treated as programs, at least by my interpretation of conditional KC. The definitions I’ve seen draw a sharp distinction between program and input. For example:

    Definition: The Kolmogorov complexity of a string w with respect to L, denoted K_L(w) is the shortest program written in the language L which produces w as output. The conditional Kolmogorov complexity with respect to a string x, denoted K_L(w | x) (spoken w given x, as in probability theory), is the length of the shortest program which, when given x as input, outputs w.

    In normal usage it makes no difference, because people generally compare KC without using actual numerical values. Whether you include a fixed-length interpreter in C makes no difference as long as K(X|C) and K(Y|C) both benefit from it. However, EDM do depend on the specific numerical value of K(X|C) to compute ASC, so sneaking the interpreter into C makes the KC less than it would be otherwise and increases the OASC.

    Embed EDM’s description of X in a program p that does nothing but to run program C with the description as input.

    Even if you relax the distinction between program and input and allow the machine to directly execute C, EDM are still messing up because they don’t measure the length of p. They measure the length of the description embedded in p.

    Again, it wouldn’t matter in normal usage, but it does matter to EDM.

    Both problems go away if you simply use unconditional KC on a machine that directly executes the EDM description language.
