Fallacy of the Phylogenetic Signal: Nucleotide Level


For the past month or so I’ve been investigating the claim that the phylogenetic signal is evidence that a dataset shares common descent.

Supposedly, the phylogenetic signal is one of, if not the, strongest pieces of evidence for common descent. It is one of the first of the 29+ evidences for evolution offered over at Talk Origins (TO).

Quoting from the article:

The degree to which a given phylogeny displays a unique, well-supported, objective nested hierarchy can be rigorously quantified. Several different statistical tests have been developed for determining whether a phylogeny has a subjective or objective nested hierarchy, or whether a given nested hierarchy could have been generated by a chance process instead of a genealogical process (Swofford 1996, p. 504). These tests measure the degree of “cladistic hierarchical structure” (also known as the “phylogenetic signal”) in a phylogeny, and phylogenies based upon true genealogical processes give high values of hierarchical structure, whereas subjective phylogenies that have only apparent hierarchical structure (like a phylogeny of cars, for example) give low values (Archie 1989; Faith and Cranston 1991; Farris 1989; Felsenstein 1985; Hillis 1991; Hillis and Huelsenbeck 1992; Huelsenbeck et al. 2001; Klassen et al. 1991).


I’ve been skeptical of this claim. A tree is just one kind of directed acyclic graph (DAG), and my hunch is that many kinds of DAGs will also score highly on metrics for phylogenetic signal. I picked one metric, the consistency index (CI), which according to Klassen et al. 1991 is the most widely used. It is also the featured metric in the above TO article, and it is very simple to calculate. So, I’ve focused my efforts on the CI metric.
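To give a feel for how simple the CI is, here is a minimal sketch in Python. It assumes the usual ensemble definition (minimum possible number of state changes divided by the number of changes the tree actually requires) and counts the required changes with Fitch parsimony on a rooted binary tree. The function names, the nested-tuple tree format, and the toy alignment are all my own choices for illustration, not anything taken from the TO article or from Klassen et al.

```python
def fitch_steps(tree, states):
    """Fitch parsimony for one character.

    tree   : a leaf name (str) or a pair (left, right) of subtrees
    states : dict mapping leaf name -> observed state at this site
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, lsteps + rsteps
    # Children disagree: one change is required at this node.
    return lset | rset, lsteps + rsteps + 1


def consistency_index(tree, alignment):
    """Ensemble CI = sum of minimum steps / sum of observed steps over sites."""
    n_sites = len(next(iter(alignment.values())))
    min_steps = 0
    obs_steps = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        # A site with k distinct states needs at least k - 1 changes.
        min_steps += len(set(column.values())) - 1
        obs_steps += fitch_steps(tree, column)[1]
    # Assumption: report 1.0 when no site varies (CI is undefined there).
    return min_steps / obs_steps if obs_steps else 1.0


# Toy data (hypothetical): two sites, four taxa.
alignment = {"A": "AC", "B": "AC", "C": "GT", "D": "GT"}
matching_tree = (("A", "B"), ("C", "D"))
conflicting_tree = (("A", "C"), ("B", "D"))

print(consistency_index(matching_tree, alignment))
print(consistency_index(conflicting_tree, alignment))
```

The tree that groups identical sequences together needs one change per site, so its CI is 1. The tree that splits them needs two changes per site, so its CI drops to 0.5.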

If ID is false, why can we detect human-engineered viruses?

There is a strong suspicion the coronavirus is an escaped specimen from a Chinese lab.


An Indian scientist has purportedly discovered HIV inserted into the coronavirus. If true, this is pretty conclusive evidence the virus is humanly engineered, i.e. intelligently designed.

So, herein lies the conundrum. According to popular imagination, ID is both bad science and false (remember, good science is the falsifiable sort ;). If that view is correct, then it should not be possible to detect intelligent intervention in the genetic code.

Yet, this recent news item purports to be exactly that: identification of intelligent intervention in the genetic code.

Please explain this to me like I am 5: how can ID be both bad science and false, yet at the same time it is possible to identify intelligent intervention in the genetic code? If we can do so for the recent past, why can’t we do the same for the distant past?


Genetic code is a programming language

Specifically, it looks like the genetic code is a LISP dialect.

Operon structure of genetic code:

LISP function structure:

Nested genes:

Nested LISP cons cells:

Correspondences between ID theory and mainstream theories

Per Gregory’s and Dr. Felsenstein’s request, here is an off-the-top-of-my-head list of the major theories I can think of that are related to Dembski’s ID theory. I see many more connections to mainstream theory, but these are the most easily made. I won’t provide links, at least in this draft, but the terms are easily googleable. I may also update this article as I think about it more.

First, fundamental elements of Dembski’s ID theory:

  1. We can distinguish intelligent design from chance and necessity with complex specified information (CSI).
  2. Chance and necessity cannot generate CSI due to the conservation of information (COI).
  3. Intelligent agency can generate CSI.

Things like CSI:

  • Randomness deficiency
  • Martin-Löf test for randomness
  • Shannon mutual information
  • Algorithmic mutual information

Conservation of information theorems that apply to the previous list:

  • Data processing inequality (chance)
  • Chaitin’s incompleteness theorem (necessity)
  • Levin’s law of independence conservation (both chance and necessity addressed)
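For readers who want the statements themselves, here is a rough sketch of the three results in their commonly quoted forms. The notation is my own assumption: I(X;Y) is Shannon mutual information, K is prefix Kolmogorov complexity, I(x : y) = K(x) + K(y) - K(x, y) is algorithmic mutual information, f is a computable function, and F is a consistent, sufficiently strong formal system. The exact additive terms in the last inequality depend on which definitions are used, so treat it as a guide rather than a precise citation.

```latex
% Data processing inequality: if X -> Y -> Z is a Markov chain,
% then processing Y cannot increase the information it carries about X.
\[
  I(X;Z) \le I(X;Y)
\]

% Chaitin's incompleteness theorem: there is a constant L_F such that
% F proves no statement of the form K(x) > L_F for any specific string x.
\[
  \exists L_F \;\; \forall x : \quad F \nvdash \; K(x) > L_F
\]

% Levin's independence conservation (deterministic form):
% computable processing of x cannot add more than about K(f) bits
% of mutual information with an independent y.
\[
  I(f(x) : y) \le I(x : y) + K(f) + O(1)
\]
```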

Theories of things that can violate the previous COI theorems:

  • Libertarian free will
  • Halting oracles
  • Teleological causation

Mutual Algorithmic Information, Information Non-growth, and Allele Frequency

Due to popular demand I will take a quick stab at explaining the applicability of mutual algorithmic information and the information non-growth law to an allele frequency scenario.

First, I’ll outline the allele frequency scenario.

The alleles are 1s and 0s, and the gene G is a bitstring of N bits. A gene’s fitness is the number of 1s it contains, so fitness(G) = sum(G). The population consists of a single gene, and evolution proceeds by randomly flipping one bit: if fitness improves, the new gene is kept, otherwise the original is kept. Once fitness(G) = N, the evolutionary algorithm stops and outputs G, which consists of N 1s. The bitstring of N 1s will be denoted Y. We will denote the evolutionary algorithm E. It is prefixed to an input bitstring X of length N, which it turns into the bitstring of N 1s, so executing the pair on a universal Turing machine outputs the bitstring of 1s: U(E,X) = Y.
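The scenario is small enough to write down directly. What follows is a minimal sketch in Python of the algorithm E as described above. The names `fitness` and `evolve`, the list-of-ints representation, and the seeded random generator are my own assumptions for illustration.

```python
import random


def fitness(gene):
    """Fitness is the number of 1s in the bitstring."""
    return sum(gene)


def evolve(x, rng):
    """Run the single-gene evolutionary algorithm E on input bitstring x.

    x   : list of 0/1 ints of length N
    rng : a random.Random instance (assumed here so runs are repeatable)
    Returns (final gene, number of bit flips tried).
    """
    gene = list(x)
    n = len(gene)
    flips = 0
    while fitness(gene) < n:
        # Mutate: flip one randomly chosen bit.
        i = rng.randrange(n)
        candidate = gene.copy()
        candidate[i] ^= 1
        flips += 1
        # Select: keep the mutant only if fitness improved.
        if fitness(candidate) > fitness(gene):
            gene = candidate
    return gene, flips


rng = random.Random(0)
X = [0, 1, 0, 0, 1, 0, 0, 0]
Y, flips = evolve(X, rng)
print(Y, flips)
```

Whatever X is, the output is always the string of N 1s. Only the number of flips tried depends on X and on the random choices.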

Second, I’ll briefly state the required background knowledge on algorithmic mutual information.


Breaking the law of information non growth

This is part of my broader point that Dembski’s argument is supported by standard information theory.

The article is pretty technical, though not as technical as the paper by Leonid Levin I got the ideas from.  If you skim it, you can get the basic idea.


First, I prove that randomness and computation cannot create mutual information with an independent target, essentially a dumbed-down version of Levin’s proof. Dembski’s CSI is a form of mutual information, and this proof is a more limited version of Dembski’s conservation of information.

Next, I prove that a halting oracle (intelligent agent) can violate the conservation of information and create information.

Finally, I show that there is a great deal of mutual information between the universe and mathematics.  Since mathematics is an independent target, then by the conservation of information there should be zero mutual information.  Therefore, the universe must have been created by something like a halting oracle.

I cannot promise a response to everyone’s comments, but the more technical and focussed, the more likely I will respond.