Winston Ewert, William Dembski, and Robert J. Marks II have made available a paper titled Climbing the Steiner Tree — Sources of Active Information in a Genetic Algorithm for Solving the Euclidean Steiner Tree Problem wherein they claim to have identified sources of “active information” in a genetic algorithm used to find solutions to a Steiner problem. The problem referenced in this paper was originally described by Dave Thomas on the Panda’s Thumb almost six years ago. I developed a GA to solve the problem that Dave posed as a challenge a few weeks later, as did a number of other people.
This paper suffers from numerous flaws, starting with a fundamental mischaracterization of the purpose of Thomas’s solution and including general misunderstanding of genetic algorithms, misapplication of the No Free Lunch theorems, spurious claims about “active information”, and incorrect and unsupported assertions regarding the impact of certain GA implementation details.
Right from the abstract, this paper mischaracterizes Thomas’s intentions:
In particular, the success of such search algorithms is said to show that intelligent design has no scientific value.
Thomas never claimed that GAs showed that ID has no scientific value. His goal was simply to demonstrate the fact that GAs don’t need explicit targets.
(As an aside, it is true that ID has no scientific value but that is because ID proponents explain no empirical observations, articulate no scientific theory, and make no testable predictions. ID is simply a vacuous argument from incredulity.)
The authors of this paper do appear to grudgingly recognize that typical GAs are not like Richard Dawkins’ well known Weasel program:
Nevertheless, many simulations do not contain a fully articulated solution hidden behind lines of code that can be readily reconstructed by inspecting the program without running it.
Speaking of weasels, this phrasing does have the mustelid nature. The authors could have been more forthright and simply noted that most GAs specify only the problem and a fitness function of some sort. The solution is completely unknown to the programmer in most cases.
Even this rhetorical squirming, though, is insufficient for the authors who add a caveat immediately:
Rather, its code is front-loaded with specialized information for how to find that solution. Instead of the explicit answer, the means by which the eventual product can be found is front-loaded.
The authors are inadvertently correct here. The evolutionary mechanisms being modeled by the GA (“the means by which the eventual product can be found”) are indeed useful in finding solutions to some problems.
Properties of Genetic Algorithms
There are several statements in the paper that suggests the authors have some fundamental misunderstandings of genetic algorithms and the observed phenomena on which they are based. As early as page 2, for example, is this:
The Darwinist claim is that no such assistance is required. Rather, natural selection is innately capable of solving any biological problem that it faces.
No “Darwinist” (a term that reflects ID’s creationist roots) claims this. Extinction is known to happen.
Another, from page 4:
This genome is of a fixed length, always containing the coordinates and connections for the maximum number of interchanges. This simplifies the genetic algorithm because it not [sic] necessary to implement deletion or insertion mutations.
While the design of a GA engine is somewhat simplified by limiting genomes to a fixed length, such a constraint doesn’t prevent modeling insertions and deletions. Further, the lack of insertion and deletion mutations actually makes finding a solution more difficult by limiting the number of evolutionary mechanisms that can be brought to bear.
On page 5 we find:
Does the genetic algorithm have the ability to solve the problem given only a description of the same?
This question demonstrates a profound confusion about what is being modeled. The authors almost seem to be suggesting that they expect a GA to find a solution without any feedback whatsoever. That would certainly fail to reflect the biological situation.
In fact, the fitness function is a description of the problem domain, corresponding to the environment of biological evolution. In that sense a GA does have the ability to solve some types of problems “given only a description of the same.”
No Free Lunch Theorems
No paper authored by Dembski would be complete without some mention, however gratuitous, of the No Free Lunch theorems:
Conservation of information theorems, such as the No Free Lunch Theorems, place limitations on the abilities of search algorithms. Any search algorithm uses resources to try to find its target. Different algorithms exist, but as long as they avoid searching the same place multiple times, given the same resources they will have the same average performance. As a result, excluding algorithms that visit the same place multiple times, no algorithm can be claimed to be superior to any other algorithm. At best, it can be superior over some subset of all fitness functions.
First, the NFL theorems have nothing to do with “conservation of information.” Dembski’s purported “Law of Conservation of Information” has been demonstrated to be bogus.
Second, this is a typical gross misapplication of the NFL theorems by Dembski et al. The NFL theorems say that averaged over all fitness landscapes, no search algorithm is superior to random search. Over a specific fitness landscape, however, some algorithms may well be significantly better than others. Given that GAs, like the natural systems on which they are based, operate within a single fitness landscape (albeit one that is constantly changing, particularly in the real world case), the NFL theorems are simply a red herring in this discussion.
The authors define “active information” as:
I+ = -log2(p/q)
where p is probability of a “baseline” search finding a solution and q is the probability of the GA finding a solution.
A baseline search is typically taken to be a single random guess at the answer.
“Active information” is thus simply a comparison of the efficacy of a particular search algorithm to that of a random search using an equiprobable distribution. The term is misleading in that it doesn’t correspond to any commonly used definitions of “information” nor does it demonstrate the “active” involvement of any external actor.
In order to determine how Thomas’s GA compares to a random search algorithm, the authors ran a relatively large number of tests:
In what follows we will be comparing Thomas’s genetic algorithm to that of a random query search. Data on the performance of the search algorithm was obtained by Monte Carlo simulation. In the case of genetic algorithms, we ran 5615 distinct simulations, keeping track of how many times a particular cost was found. In the case of random query searches, we ran 11,228,106,000 random queries to determine the empirical probability distribution for the cost of a random solution. The actual performance of a random query search was calculated by using probability theory and this empirical distribution.
“Probability theory and this empirical distribution”? With over eleven billion data points, why did the authors not simply use the observed values?
The results reported on page 5 of the paper are surprising:
Figure 4 shows the performance of the genetic algorithm presented by Thomas as compared to a random query algorithm. For Thomas’s algorithm, each generation is run for 1000 generations giving a total of 2,000,000 queries or cost evaluations performed. That is, the algorithm considers two million distinct solutions and measures the cost of each solution. At the end of that process the best solution found during those queries is produced. The random query algorithm makes 2,000,000 random queries and reports the best of all of those queries. Figure 4 shows that obtaining a low cost solution to the Steiner tree problem is not as hard as it would at first appear.
The random query algorithm produces a cost of 1246 with probability slightly lower than 0.7.
The authors seem to be claiming that two million random searches will yield a reasonable solution to the Steiner problem nearly 70% of the time. That’s hard to reconcile with further details provided on the very next page:
Baseline for good and adequate thresholds.
We empirically estimated the probability of both the good and adequate thresholds by running Monte Carlo experiments that tested random solutions. 11,228,106,000 random queries were performed to obtain an empirical estimation of the probability of finding the target. The probabilities for finding the good and adequate targets were 1.7812e-09 and 1.9905e-07, respectively. The empirically determined probability for finding the optimal target was zero, because none of the random queries was successful in finding the target.
The 70% chance of success of the random search algorithm seems extremely high. The authors provide no details of their calculations or code, so I have attempted to replicate their findings based on the description provided and my own GA solution to the Steiner problem.
My Steiner solution is based on my GA engine that operates on binary strings. The number of bits required by the genome for the five node Steiner problem is 199. I use a tournement selection mechanism to populate the next generation:
- Select two genomes at random, making the most fit parent one
- Select two more genomes at random, making the most fit parent two
- Mate the two parents by crossing over at a single point to
generate two offspring
- Mutate each of the offspring
My simulation routinely finds a near optimal solution with a fitness cost of 1220 or less using a population size of 1000 within 250 generations.
Each generation of my simulation therefore requires 4000 fitness evaluations, or one million in total for a typical run. Based on the results reported in this paper, I should find that nearly 70% of the time one million random genomes should contain at least one fairly good solution.
Here’s the code to create a million random genomes and return the most fit, using my implementation:
(let* ((genome-length (genome-length *five-point-problem*)) (best-fitness (fitness *five-point-problem* (make-genome genome-length)))) (format t "~A~%" (dotimes (i 10000 best-fitness) (let ((random-fitness (fitness *five-point-problem* (make-genome genome-length)))) (when (< random-fitness best-fitness) (setf best-fitness random-fitness))))))
My simulation adds a penalty of 2000 for solutions that don’t connect all fixed nodes. The results of running this forty times are:
4500.08 4394.203 4466.51 4707.275 4430.624 4690.22 4847.232 4833.6436 4889.9487 5147.8467 4929.9795 4677.0977 5248.128 4940.031 4542.566 4555.242 4993.9985 4707.127 4450.148 4792.4023 4647.6304 4800.055 5224.4985 4616.3096 5000.876 4788.144 4961.3896 4383.1436 4888.9614 4577.821 3791.8147 4739.0806 4846.8315 4572.504 4951.1255 4481.5996 4690.18 4426.3213 5097.7544 4948.6284
The best fitness found was 3791.8147, not remotely close to the optimum of 1212 or the 1220 routinely generated by the GA.
The authors of the paper do not provide a description of the algorithm they used or the code, so it is not possible to determine how to replicate their results. As demonstrated here, though, their claim that a collection of random strings, similar in size to the number of fitness calculations used by a GA, has a 70% likelihood of containing a near optiomal solution is not supported.
On page 6 of the paper, the authors plug this dubious value into some poorly explained forumlae and claim to thereby have derived “the integral cost of the solution produced by a search algorithm making 2,000,000 random queries.” This section could use considerably more justification, to say the least.
Ultimately, aside from demonstrating a distinct lack of rigor on the part of the authors, this topic is as much of a red herring as the No Free Lunch theorems. At most, when calculated carefully, “active information” is nothing more than an indication of how a particular algorithm performs relative to an equiprobable random search in a particular fitness landscape. Despite the misleading name applied to this concept, it does not indicate that any information was provided to the non-random algorithm, actively or otherwise.
Beginning on page 7 the authors discuss implementation details that, they claim, provide information about the solution to Thomas’s GA.
Interestingly, this is very similar to an approach that Dembski used unsuccessfully a few years ago. When discussing Thomas Schneider’s ev simulation, Dembski noticed an implementation detail related to tie breaking. He claimed that:
Schneider is here fine-tuning his evolutionary algorithm to obtain the results he wants. All such fine-tuning amounts to investigator interference smuggling in complex specified information.
Schneider put Dembski’s claim to the test. The result was “For the condition where both organisms survive a tie, the ev program still produces the same graph as previously published, given the same parameters, demonstrating that the code was not significantly altered.”
One might think that Dembski would have learned from that experience to test his claims before making them public. One would be incorrect.
“Effect of count of interchanges”
The authors claim that having an explicit count of the number of non-fixed nodes in the virtual genome “shifts the distribution of solutions.” They also note that Thomas’s code ensures that a minimum number of non-fixed nodes (“interchanges”) are present.
My implementation contains no such explicit count nor does it specify any minimum number of non-fixed nodes, yet it routinely converges on an optimal or near-optimal solution.
“Effect of restricted initialization”
According to the authors, Thomas’s code restricts the initial location of non-fixed nodes to a smaller area than the full 1000×1000 grid allowed by the simulation. Again, my implementation has no such restriction and yet it routinely converges on an optimal or near-optimal solution.
“Effect of selection skew”
The authors begin this subsection with:
As noted, it is easy to obtain a solution with a score of 1246.
In fact, as demonstrated above, the claim that near-optimal genomes are common enough to be frequently present in collections of millions of random genomes is false.
The authors go on to discuss a “skew parameter” used in Thomas’s code. They make the claim, without any logical or empirical support, that:
Without the skew parameter there isn’t enough selective pressure to effectively separate scores of close value.
Not only is this unsupported, it is demonstrably false since my implementation contains no such parameter and yet readily distinguishes between “scores of close value.” This is easily seen when monitoring its progress generation by generation.
“Effects of crossover”
In this section the authors go to some effort to score an own goal. Rather than limiting themselves to the single point crossover implemented by Thomas, they also discuss two point crossover and, bizarrely, what they call “uniform” crossover. This variant, which consists of randomly picking an element from each parent string in order to generate a child string, would not be recognized as crossover by anyone familiar with GAs.
Be that as it may, they then provide a table of crossover related “active information” in Table 3. Precious little detail is provided on how these values were calculated.
The own goal is apparent when one steps back from the detailed, albeit odd, discussion and notes that crossover is observed in nature. The authors are essentially admitting that known evolutionary mechanisms can create “active information.”
On page 10 the authors raise their only pertinent point in the entire paper:
Crossover works well because of the way the genome is structured. The genome is built using a structure consisting of a list of locations as well as a connection map indicating which cities and interchanges are connected. However, the ways in which the genome could be structured are limited only by the developer’s creativity. A number of other possibilities also exist:
- Every city and interchange could have a list of other
cities/interchanges to which it is connected.
- The genome could be a sequence of road segments specified by start and stop points
- The genome could have been represented as a bitmap where each bit indicates whether or not an integral coordinate was on a road.
None of these methods were chosen, and none of them would seem to be useful representations. But they are rejected, not because something is intrinsically wrong with them, but because the developer uses his knowledge of the problem to come up with the best way to represent the solution. The programmer chooses the representation that he believes will be effective and part of that is choosing a representation that works well with crossover.
As with their other claims, the authors provide no suggestion that they implemented a fitness function using any alternative representation. In fact, I suspect that a GA could successfully operate on any of the three alternatives. Despite the authors’ incorrect reasoning, though, this is an interesting point.
When I implemented my solution to the Steiner problem I wasn’t thinking in terms of a fitness function that could easily be traversed by point mutations and crossover. I simply wrote down the first binary format that came to mind — my primary concern was my own ability to decipher it. That aside, the authors make a claim that is almost correct:
The crossover and genome structure work together. As such, their selection by the developer constitutes the use of prior knowledge about the problem in order to help identify better solutions.
Even if the fitness landscape were deliberately chosen to be easily traversed by the subset of known evolutionary mechanisms modeled by the GA, this would not be a way of smuggling in information about any possible solution. No additional information about the problem or the particular solution would be provided. Such a deliberate choice would simply be modeling a landscape that, like what we observe in nature, has characteristics that make known evolutionary mechanisms effective search techniques.
When modeling evolution as a search (which has its own issues outside the scope of this discussion), what the No Free Lunch theorems tell us, contra Dembski, is that the evolutionary mechanisms we implement will be far more effective at finding a solution in some types of fitness landscapes than in others. Because we observe these mechanisms working in the real world, it is a reasonable inference that the incredibly complex and ever changing fitness landscape we call reality is one of those in which those mechanisms are efficacious. If this were not the case, we would either observe other mechanisms or would not be here to observe at all.
Despite themselves, the authors are recognizing that even a small subset of the mechanisms we observe in the real world is more than capable of generating solutions that are highly fit in certain landscapes.
The authors completely fail to support their conclusion:
The claim is not that the solution found by a genetic algorithm is hidden inside the algorithm, but rather that the algorithm contains the necessary information sources in order to find a low probability target. This is because in order to develop a successful genetic algorithm, the developer makes many decisions based on knowledge of the problem he is attempting to solve. As such, information is derived from this prior knowledge of the search problem.
Every implementation detail alleged by the authors to smuggle in information about possible solutions is trivially shown to be unnecessary to the success of the GA. Whether they understand it themselves or not, all the authors have shown is that known evolutionary mechanisms are quite capable of generating solutions to problems that are not amenable to random search.
It is no surprise that this paper was published in BIO-Complexity, the Intelligent Design vanity website masquerading as a journal. Given the overall poor quality, particularly with respect to the authors’ failure to provide sufficient information to replicate their results and their further failure to subject any of their claims to testing, it is unlikely in the extreme that it would pass peer review in an actual scientific journal.