Review: Climbing the Steiner Tree

Posted on April 23, 2012 by Patrick

Overview

Winston Ewert, William Dembski, and Robert J. Marks II have made available a paper titled Climbing the Steiner Tree — Sources of Active Information in a Genetic Algorithm for Solving the Euclidean Steiner Tree Problem wherein they claim to have identified sources of “active information” in a genetic algorithm used to find solutions to a Steiner problem. The problem referenced in this paper was originally described by Dave Thomas on the Panda’s Thumb almost six years ago. I developed a GA to solve the problem that Dave posed as a challenge a few weeks later, as did a number of other people.

This paper suffers from numerous flaws, starting with a fundamental mischaracterization of the purpose of Thomas’s solution and including general misunderstanding of genetic algorithms, misapplication of the No Free Lunch theorems, spurious claims about “active information”, and incorrect and unsupported assertions regarding the impact of certain GA implementation details.

Targets

Right from the abstract, this paper mischaracterizes Thomas’s intentions:

In particular, the success of such search algorithms is said to show that intelligent design has no scientific value.

Thomas never claimed that GAs showed that ID has no scientific value. His goal was simply to demonstrate the fact that GAs don’t need explicit targets.

(As an aside, it is true that ID has no scientific value but that is because ID proponents explain no empirical observations, articulate no scientific theory, and make no testable predictions. ID is simply a vacuous argument from incredulity.)

The authors of this paper do appear to grudgingly recognize that typical GAs are not like Richard Dawkins’ well known Weasel program:

Nevertheless, many simulations do not contain a fully articulated solution hidden behind lines of code that can be readily reconstructed by inspecting the program without running it.

Speaking of weasels, this phrasing does have the mustelid nature. The authors could have been more forthright and simply noted that most GAs specify only the problem and a fitness function of some sort. The solution is completely unknown to the programmer in most cases.

Even this rhetorical squirming, though, is insufficient for the authors who add a caveat immediately:

Rather, its code is front-loaded with specialized information for how to find that solution. Instead of the explicit answer, the means by which the eventual product can be found is front-loaded.

The authors are inadvertently correct here. The evolutionary mechanisms being modeled by the GA (“the means by which the eventual product can be found”) are indeed useful in finding solutions to some problems.

Properties of Genetic Algorithms

There are several statements in the paper that suggests the authors have some fundamental misunderstandings of genetic algorithms and the observed phenomena on which they are based. As early as page 2, for example, is this:

The Darwinist claim is that no such assistance is required. Rather, natural selection is innately capable of solving any biological problem that it faces.

No “Darwinist” (a term that reflects ID’s creationist roots) claims this. Extinction is known to happen.

Another, from page 4:

This genome is of a fixed length, always containing the coordinates and connections for the maximum number of interchanges. This simplifies the genetic algorithm because it not [sic] necessary to implement deletion or insertion mutations.

While the design of a GA engine is somewhat simplified by limiting genomes to a fixed length, such a constraint doesn’t prevent modeling insertions and deletions. Further, the lack of insertion and deletion mutations actually makes finding a solution more difficult by limiting the number of evolutionary mechanisms that can be brought to bear.

On page 5 we find:

Does the genetic algorithm have the ability to solve the problem given only a description of the same?

This question demonstrates a profound confusion about what is being modeled. The authors almost seem to be suggesting that they expect a GA to find a solution without any feedback whatsoever. That would certainly fail to reflect the biological situation.

In fact, the fitness function is a description of the problem domain, corresponding to the environment of biological evolution. In that sense a GA does have the ability to solve some types of problems “given only a description of the same.”

No Free Lunch Theorems

No paper authored by Dembski would be complete without some mention, however gratuitous, of the No Free Lunch theorems:

Conservation of information theorems, such as the No Free Lunch Theorems, place limitations on the abilities of search algorithms. Any search algorithm uses resources to try to find its target. Different algorithms exist, but as long as they avoid searching the same place multiple times, given the same resources they will have the same average performance. As a result, excluding algorithms that visit the same place multiple times, no algorithm can be claimed to be superior to any other algorithm. At best, it can be superior over some subset of all fitness functions.

First, the NFL theorems have nothing to do with “conservation of information.” Dembski’s purported “Law of Conservation of Information” has been demonstrated to be bogus.

Second, this is a typical gross misapplication of the NFL theorems by Dembski et al. The NFL theorems say that averaged over all fitness landscapes, no search algorithm is superior to random search. Over a specific fitness landscape, however, some algorithms may well be significantly better than others. Given that GAs, like the natural systems on which they are based, operate within a single fitness landscape (albeit one that is constantly changing, particularly in the real world case), the NFL theorems are simply a red herring in this discussion.

Active Information

The authors define “active information” as:

I₊ = -log₂(p/q)

where p is probability of a “baseline” search finding a solution and q is the probability of the GA finding a solution.

A baseline search is typically taken to be a single random guess at the answer.

“Active information” is thus simply a comparison of the efficacy of a particular search algorithm to that of a random search using an equiprobable distribution. The term is misleading in that it doesn’t correspond to any commonly used definitions of “information” nor does it demonstrate the “active” involvement of any external actor.

In order to determine how Thomas’s GA compares to a random search algorithm, the authors ran a relatively large number of tests:

In what follows we will be comparing Thomas’s genetic algorithm to that of a random query search. Data on the performance of the search algorithm was obtained by Monte Carlo simulation. In the case of genetic algorithms, we ran 5615 distinct simulations, keeping track of how many times a particular cost was found. In the case of random query searches, we ran 11,228,106,000 random queries to determine the empirical probability distribution for the cost of a random solution. The actual performance of a random query search was calculated by using probability theory and this empirical distribution.

“Probability theory and this empirical distribution”? With over eleven billion data points, why did the authors not simply use the observed values?

The results reported on page 5 of the paper are surprising:

Figure 4 shows the performance of the genetic algorithm presented by Thomas as compared to a random query algorithm. For Thomas’s algorithm, each generation is run for 1000 generations giving a total of 2,000,000 queries or cost evaluations performed. That is, the algorithm considers two million distinct solutions and measures the cost of each solution. At the end of that process the best solution found during those queries is produced. The random query algorithm makes 2,000,000 random queries and reports the best of all of those queries. Figure 4 shows that obtaining a low cost solution to the Steiner tree problem is not as hard as it would at first appear.

The random query algorithm produces a cost of 1246 with probability slightly lower than 0.7.

The authors seem to be claiming that two million random searches will yield a reasonable solution to the Steiner problem nearly 70% of the time. That’s hard to reconcile with further details provided on the very next page:

Baseline for good and adequate thresholds.
We empirically estimated the probability of both the good and adequate thresholds by running Monte Carlo experiments that tested random solutions. 11,228,106,000 random queries were performed to obtain an empirical estimation of the probability of finding the target. The probabilities for finding the good and adequate targets were 1.7812e-09 and 1.9905e-07, respectively. The empirically determined probability for finding the optimal target was zero, because none of the random queries was successful in finding the target.

The 70% chance of success of the random search algorithm seems extremely high. The authors provide no details of their calculations or code, so I have attempted to replicate their findings based on the description provided and my own GA solution to the Steiner problem.

My Steiner solution is based on my GA engine that operates on binary strings. The number of bits required by the genome for the five node Steiner problem is 199. I use a tournement selection mechanism to populate the next generation:

Select two genomes at random, making the most fit parent one
Select two more genomes at random, making the most fit parent two
Mate the two parents by crossing over at a single point to
generate two offspring
Mutate each of the offspring

My simulation routinely finds a near optimal solution with a fitness cost of 1220 or less using a population size of 1000 within 250 generations.

Each generation of my simulation therefore requires 4000 fitness evaluations, or one million in total for a typical run. Based on the results reported in this paper, I should find that nearly 70% of the time one million random genomes should contain at least one fairly good solution.

Here’s the code to create a million random genomes and return the most fit, using my implementation:

(let* ((genome-length (genome-length *five-point-problem*))
       (best-fitness
        (fitness *five-point-problem* (make-genome genome-length))))
  (format t "~A~%"
          (dotimes (i 10000 best-fitness)
            (let ((random-fitness
                   (fitness *five-point-problem* (make-genome genome-length))))
              (when (< random-fitness best-fitness)
                (setf best-fitness random-fitness))))))

My simulation adds a penalty of 2000 for solutions that don’t connect all fixed nodes. The results of running this forty times are:

4500.08   4394.203  4466.51   4707.275
4430.624  4690.22   4847.232  4833.6436
4889.9487 5147.8467 4929.9795 4677.0977
5248.128  4940.031  4542.566  4555.242
4993.9985 4707.127  4450.148  4792.4023
4647.6304 4800.055  5224.4985 4616.3096
5000.876  4788.144  4961.3896 4383.1436
4888.9614 4577.821  3791.8147 4739.0806
4846.8315 4572.504  4951.1255 4481.5996
4690.18   4426.3213 5097.7544 4948.6284

The best fitness found was 3791.8147, not remotely close to the optimum of 1212 or the 1220 routinely generated by the GA.

The authors of the paper do not provide a description of the algorithm they used or the code, so it is not possible to determine how to replicate their results. As demonstrated here, though, their claim that a collection of random strings, similar in size to the number of fitness calculations used by a GA, has a 70% likelihood of containing a near optiomal solution is not supported.

On page 6 of the paper, the authors plug this dubious value into some poorly explained forumlae and claim to thereby have derived “the integral cost of the solution produced by a search algorithm making 2,000,000 random queries.” This section could use considerably more justification, to say the least.

Ultimately, aside from demonstrating a distinct lack of rigor on the part of the authors, this topic is as much of a red herring as the No Free Lunch theorems. At most, when calculated carefully, “active information” is nothing more than an indication of how a particular algorithm performs relative to an equiprobable random search in a particular fitness landscape. Despite the misleading name applied to this concept, it does not indicate that any information was provided to the non-random algorithm, actively or otherwise.

Implementation Details

Beginning on page 7 the authors discuss implementation details that, they claim, provide information about the solution to Thomas’s GA.

Interestingly, this is very similar to an approach that Dembski used unsuccessfully a few years ago. When discussing Thomas Schneider’s ev simulation, Dembski noticed an implementation detail related to tie breaking. He claimed that:

Schneider is here fine-tuning his evolutionary algorithm to obtain the results he wants. All such fine-tuning amounts to investigator interference smuggling in complex specified information.

Schneider put Dembski’s claim to the test. The result was “For the condition where both organisms survive a tie, the ev program still produces the same graph as previously published, given the same parameters, demonstrating that the code was not significantly altered.”

One might think that Dembski would have learned from that experience to test his claims before making them public. One would be incorrect.

“Effect of count of interchanges”

The authors claim that having an explicit count of the number of non-fixed nodes in the virtual genome “shifts the distribution of solutions.” They also note that Thomas’s code ensures that a minimum number of non-fixed nodes (“interchanges”) are present.

My implementation contains no such explicit count nor does it specify any minimum number of non-fixed nodes, yet it routinely converges on an optimal or near-optimal solution.

“Effect of restricted initialization”

According to the authors, Thomas’s code restricts the initial location of non-fixed nodes to a smaller area than the full 1000×1000 grid allowed by the simulation. Again, my implementation has no such restriction and yet it routinely converges on an optimal or near-optimal solution.

“Effect of selection skew”

The authors begin this subsection with:

As noted, it is easy to obtain a solution with a score of 1246.

In fact, as demonstrated above, the claim that near-optimal genomes are common enough to be frequently present in collections of millions of random genomes is false.

The authors go on to discuss a “skew parameter” used in Thomas’s code. They make the claim, without any logical or empirical support, that:

Without the skew parameter there isn’t enough selective pressure to effectively separate scores of close value.

Not only is this unsupported, it is demonstrably false since my implementation contains no such parameter and yet readily distinguishes between “scores of close value.” This is easily seen when monitoring its progress generation by generation.

“Effects of crossover”

In this section the authors go to some effort to score an own goal. Rather than limiting themselves to the single point crossover implemented by Thomas, they also discuss two point crossover and, bizarrely, what they call “uniform” crossover. This variant, which consists of randomly picking an element from each parent string in order to generate a child string, would not be recognized as crossover by anyone familiar with GAs.

Be that as it may, they then provide a table of crossover related “active information” in Table 3. Precious little detail is provided on how these values were calculated.

The own goal is apparent when one steps back from the detailed, albeit odd, discussion and notes that crossover is observed in nature. The authors are essentially admitting that known evolutionary mechanisms can create “active information.”

Alternative Representations

On page 10 the authors raise their only pertinent point in the entire paper:

Crossover works well because of the way the genome is structured. The genome is built using a structure consisting of a list of locations as well as a connection map indicating which cities and interchanges are connected. However, the ways in which the genome could be structured are limited only by the developer’s creativity. A number of other possibilities also exist:

Every city and interchange could have a list of other
cities/interchanges to which it is connected.

The genome could be a sequence of road segments specified by start and stop points

The genome could have been represented as a bitmap where each bit indicates whether or not an integral coordinate was on a road.

None of these methods were chosen, and none of them would seem to be useful representations. But they are rejected, not because something is intrinsically wrong with them, but because the developer uses his knowledge of the problem to come up with the best way to represent the solution. The programmer chooses the representation that he believes will be effective and part of that is choosing a representation that works well with crossover.

As with their other claims, the authors provide no suggestion that they implemented a fitness function using any alternative representation. In fact, I suspect that a GA could successfully operate on any of the three alternatives. Despite the authors’ incorrect reasoning, though, this is an interesting point.

When I implemented my solution to the Steiner problem I wasn’t thinking in terms of a fitness function that could easily be traversed by point mutations and crossover. I simply wrote down the first binary format that came to mind — my primary concern was my own ability to decipher it. That aside, the authors make a claim that is almost correct:

The crossover and genome structure work together. As such, their selection by the developer constitutes the use of prior knowledge about the problem in order to help identify better solutions.

Even if the fitness landscape were deliberately chosen to be easily traversed by the subset of known evolutionary mechanisms modeled by the GA, this would not be a way of smuggling in information about any possible solution. No additional information about the problem or the particular solution would be provided. Such a deliberate choice would simply be modeling a landscape that, like what we observe in nature, has characteristics that make known evolutionary mechanisms effective search techniques.

When modeling evolution as a search (which has its own issues outside the scope of this discussion), what the No Free Lunch theorems tell us, contra Dembski, is that the evolutionary mechanisms we implement will be far more effective at finding a solution in some types of fitness landscapes than in others. Because we observe these mechanisms working in the real world, it is a reasonable inference that the incredibly complex and ever changing fitness landscape we call reality is one of those in which those mechanisms are efficacious. If this were not the case, we would either observe other mechanisms or would not be here to observe at all.

Despite themselves, the authors are recognizing that even a small subset of the mechanisms we observe in the real world is more than capable of generating solutions that are highly fit in certain landscapes.

Summary

The authors completely fail to support their conclusion:

The claim is not that the solution found by a genetic algorithm is hidden inside the algorithm, but rather that the algorithm contains the necessary information sources in order to find a low probability target. This is because in order to develop a successful genetic algorithm, the developer makes many decisions based on knowledge of the problem he is attempting to solve. As such, information is derived from this prior knowledge of the search problem.

Every implementation detail alleged by the authors to smuggle in information about possible solutions is trivially shown to be unnecessary to the success of the GA. Whether they understand it themselves or not, all the authors have shown is that known evolutionary mechanisms are quite capable of generating solutions to problems that are not amenable to random search.

It is no surprise that this paper was published in BIO-Complexity, the Intelligent Design vanity website masquerading as a journal. Given the overall poor quality, particularly with respect to the authors’ failure to provide sufficient information to replicate their results and their further failure to subject any of their claims to testing, it is unlikely in the extreme that it would pass peer review in an actual scientific journal.

118 thoughts on “Review: Climbing the Steiner Tree”

Rich on April 23, 2012 at 8:41 pm said:

Great take-down.
Joe Felsenstein on April 23, 2012 at 8:50 pm said:

In general, genetic algorithms work well unless the fitness surface is very rough and “jaggy”. Dembski and Marks’s “active information” is basically a measure of how smooth the fitness surface is when neighbors of a genotype are defined as those that can be reached by a single mutation.

It is thus no big surprise that unless there is some of this “active information” genetic algorithms will not work well. They intend this to be part of an argument that a Designer put the “active information” there. But (as I and others have noted), the laws of physics seem to mandate a certain amount of smoothness of fitness surfaces. A mutation that affects one part of the organism often has little effect on other parts, and that helps keep the fitness surface smooth. A “white noise” fitness surface would be a big problem for natural selection, but it would require everything to interact strongly with everything else, and physics doesn’t work that way. My typing these words has not caused your roof to leak.
RBH on April 23, 2012 at 9:29 pm said:

Joe wrote “But (as I and others have noted), the laws of physics seem to mandate a certain amount of smoothness of fitness surfaces.”

Yup. The universe is full of (local) gradients.
Elizabeth on April 23, 2012 at 9:29 pm said:

Dembski’s argument simply has no merit. Part of me thinks that he must realise this.

Irreducible Complexity (or, at least, the idea of unevolvable systems) could have some force, given the right data, but Specified Complexity is flawed at its heart.
Patrick on April 23, 2012 at 9:42 pm said:

Joe Felsenstein:
Dembski and Marks’s “active information” is basically a measure of how smooth the fitness surface is when neighbors of a genotype are defined as those that can be reached by a single mutation.

That’s a very clean way of describing it. Nothing active, no information involved, and the more different types of mutations available, the more jagged a fitness landscape can be traversed.
Mike Elzinga on April 23, 2012 at 9:46 pm said:

Interestingly, even simple things such as the various allotropes of carbon are “solutions to a problem,” namely, of finding a local minimum in potential energy of a configuration of condensing atoms or molecules consistent with temperature and pressure in the environment of these carbon atoms.

That these various “solutions” appear at various temperatures and pressures with varying degrees of probability is a common underlying feature of all condensing matter.

I have often suggested that ID/creationists make the egregious errors they do about biological systems because they don’t understand even the most basic high school level physics and chemistry. The misconceptions that Dembski et. al. continue to repeat are all founded in the notion that nothing interacts; that all atoms and molecules just sit around waiting for some intelligent entity to select them and place them in highly improbable arrangements.

There are always matters of physics and chemistry that contribute to the “solutions” found in biological systems undergoing natural selection. And the “analog computer” soap bubble solutions to the Steiner Problem rely on the minimizing of the potential energy contained in the tensions in the spanning soap bubble surfaces.

Soft matter systems, such as those making up biological organisms, exist within a very narrow energy range in which the thermal kinetic energies are comparable to the binding energies of many parts of these systems. This is what allows the systems to respond to changes in the average background forces of the total environment in which the system is immersed. If binding energies were too large relative to the thermal kinetic energies, nothing could change. If the thermal kinetic energies become larger than the binding energies, the systems come apart.

The result of all this is that complex biological systems get to explore all sorts of nearby “solutions” as biasing pressures are placed on them by their environments. Just look at the billions life forms that exist or have existed on this planet and you see that lots of “solutions” have been found (and discarded). Whether or not any of these “solutions” will be repeated in the future may be unlikely, but the fact that “solutions” exist in such abundance means that there is almost always a nearby niche in which something can survive for a while.
petrushka on April 23, 2012 at 9:48 pm said:

Behe’s argument seems you can’t get from AB to XY if AX and BY are fatal. Ignoring the possibility of passing through AC to XC to XY. In the simplest case.
Elizabeth on April 23, 2012 at 9:50 pm said:

Patrick: That’s a very clean way of describing it.Nothing active, no information involved, and the more different types of mutations available, the more jagged a fitness landscape can be traversed.

Or the smoother the fitness landscape becomes 🙂

In other words, isn’t the fitness landscape topology defined in part by the types of mutations available?
Elizabeth on April 23, 2012 at 9:51 pm said:

Elizabeth: Or the smoother the fitness landscape becomes

In other words, isn’t the fitness landscape topography defined in part by the types of mutations available?
Patrick on April 23, 2012 at 9:52 pm said:

Elizabeth: Or the smoother the fitness landscape becomes

In other words, isn’t the fitness landscape topology defined in part by the types of mutations available?

I’m reminded of the optical illusion that sometimes looks like a vase and sometimes like two people facing each other. 😉
Mike Elzinga on April 23, 2012 at 10:11 pm said:

Elizabeth: Patrick: That’s a very clean way of describing it.Nothing active, no information involved, and the more different types of mutations available, the more jagged a fitness landscape can be traversed.
Or the smoother the fitness landscape becomes
In other words, isn’t the fitness landscape topology defined in part by the types of mutations available?

You hit the nail directly on the head.

There is even a well-defined mathematical description of sampling a complex landscape, and it comes from the Fourier analysis of signal and image processing. It is called “dithering.”

In the language of Fourier analysis, if a function that has a Fourier transform is shifted by a little bit x, the only change in its Fourier transform is to multiply it by a phase factor exp(ikx) where k is called a spatial frequency. The shift, x, produces a much larger shift in phase for higher spatial frequencies.

If the shift, x, is a random variable, and the signal is summed, then the higher spatial frequencies are “washed out” because they tend to add more randomly than the lower spatial frequencies. This leaves the lower spatial frequencies pretty much intact, but the fine details in the image are blurred out.

This is often referred to as “poor man’s low-pass filtering” in that it retains the gross, overall features of the image but smoothes out the fine details. It is often used to remove noise from a signal or image.

This applies directly to the sampling of a landscape by random variations of an organism.
Mike Elzinga on April 23, 2012 at 10:21 pm said:

Here is a little example of the “poor man’s low pass filtering” one can easily demonstrate. Look through a hedge at a distant object on the other side of the hedge. It is hard to make out the object.

But if you move your eye along the line of the hedge while peering through it to the distant object, you see the object more clearly. It is the persistence of the distant image on the retina of the eye that makes this Fourier analysis apply. Another equivalent analysis uses what is called the convolution of the moving images of the hedge leaves and branches relative to the much slower movement of the image of the distant object.

I use this technique when looking for oncoming cars as I approach an intersection with hedges blocking the view. I roll slowly by the hedges as I stare through them at the cross street.
Joe Felsenstein on April 23, 2012 at 10:41 pm said:

Elizabeth:
Dembski’s argument simply has no merit.Part of me thinks that he must realise this.

Irreducible Complexity(or, at least, the idea of unevolvable systems) could have some force, given the right data, but Specified Complexity is flawed at its heart.

I hope we can discuss this in a separate thread. You were at one point reaqding your way through Dembski’s No Free Lunch chapter by chapter. When you reach the one about why natural selection cannot put Complex Specified Information into the genome, I’d be pleased to hear your thoughts on his argument. My own take, as I have written elsewhere, is that for simple model evolutionary systems, Complex Specified Information can be defined and is thus at least partly meaningful. However the Law of Conservation of Complex Specified Information (LCCSI) is both unproven, and in the form he presents it, not able to do the job of showing that natural selection cannot put CSI into the genome.

But let’s not discuss it here — it needs a separate thread.
Joe G on April 23, 2012 at 10:45 pm said:

Yes, it is true the GA does not need a specific target. The GA was written to do something. So when it does it, it does it by design.

In “Evolving Inventions” there wasn’t any pre-specified target- meaning no one knew what the GA generated circuits would look like. All they knew is what the circuits had to do.

So again the circuits arose by design.

Another problem is there isn’t any “selection” in nature. Natural slection is a result. There isn’t a goal, just survival- whatever is good enough survives.
OMTWO on April 23, 2012 at 10:58 pm said:

Joe G:
Yes, it is true the GA does not need a specific target. The GA was written to do something. So when it does it, it does it by design.

In “Evolving Inventions” there wasn’t any pre-specified target- meaning no one knew what the GA generated circuits would look like. All they knew is what the circuits had to do.

So again the circuits arose by design.

Another problem is there isn’t any “selection” in nature. Natural slection is a result. There isn’t a goal, just survival- whatever is good enough survives.

blowing my totem
I inflated my crayons
before my money
Joe G on April 23, 2012 at 11:11 pm said:

Scratching your scrotum
You blew your crayon
and your pimp beat you
OMTWO on April 23, 2012 at 11:13 pm said:

Joe G: Scratching your scrotum
You blew your crayon
and your pimp beat you

Oh the breaking taught!
fie! drat! blowing playing suit.
Suzuki cheats soon
Joe G on April 23, 2012 at 11:16 pm said:

Seems to me there is a thread about natural selection in the cue…
rhampton7 on April 23, 2012 at 11:23 pm said:

Patrick, you should send a more formal version of this to BIO-Complexity – perhaps they would publish it in a letters section.
Joe G on April 23, 2012 at 11:28 pm said:

I posted a link to this over on UD…
Patrick on April 24, 2012 at 12:11 am said:

rhampton7:
Patrick, you should send a more formal version of this to BIO-Complexity – perhaps they would publish it in a letters section.

That would require a significant snarkectomy. I do wonder if they’d publish such a thing, though.

Ideally one or more of the authors will come here to discuss the paper (and Dr. Felsenstein’s CSI related topics).
DaveThomas on April 24, 2012 at 1:12 am said:

Patrick,

Some confusion here. When EDM say “The random query algorithm produces a cost of 1246 with probability slightly lower than 0.7.”, they are referring to the simplest solution, 5/6 outer perimeter segments, no internal junctions, etc. Looks like a letter “C” for the 5-city problem. If you look at EDM Figure 4, THEY NEVER ACHIEVE THE ACTUAL STEINER SOLUTION (1212 Length). More later, Dave
damitall2 on April 24, 2012 at 10:43 am said:

Joe G:
I posted a link to this over on UD…

Very kind of you.
No-one there interested in this subject dare come here, though.
Joe G on April 24, 2012 at 12:05 pm said:

Yes, they realize that when a programmer writes a program to do something, and it does it, then it did it by design, not via natural selection.

IOW they realize it is a waste of time trying to reason with people who say otherwise.
Patrick on April 24, 2012 at 12:16 pm said:

DaveThomas:
Patrick,

Some confusion here. When EDM say “The random query algorithm produces a cost of 1246 with probability slightly lower than 0.7.”, they are referring to the simplest solution, 5/6 outer perimeter segments, no internal junctions, etc.Looks like a letter “C” for the 5-city problem.If you look at EDM Figure 4, THEY NEVER ACHIEVE THE ACTUAL STEINER SOLUTION (1212 Length). More later, Dave

Their result with length 1246 is certainly nowhere near the 1212 actual solution or even the 1218 to 1220 my implementation routinely gets within 250 generations.

What I wanted to emphasize in my response, though, is the authors’ claim on page 5 that “The random query algorithm produces a cost of 1246 with probability slightly lower than 0.7.” I read that as saying that almost 70% of the time a set of two million random genomes will contain at least one with a fitness of 1246 or better. That’s certainly not the case with my implementation, as I demonstrated. Have you tried to independently confirm their results with your version?
Elizabeth on April 24, 2012 at 4:45 pm said:

Stickied.

Guys! Ignore the shiny thing!
Flint on April 24, 2012 at 5:31 pm said:

Joe G:
Yes, they realize that when a programmer writes a program to do something, and it does it, then it did it by design, not via natural selection.

IOW they realize it is a waste of time trying to reason with people who say otherwise.

So close! What they realize is the futility of making unsupported claims in an attempt to convert those whose contrary claims are already demonstrated, complete with source code!. They know they’ll be challenged to identify the answer hidden somewhere in the code, they know it’s not there, so they don’t bother to do more than stay in their ghetto making false claims and all agreeing not to test or challenge them.
Joe G on April 24, 2012 at 6:50 pm said:

LoL! YOU are one of the reasons they will stay away- no way to reason with that “logic”…
rhampton7 on April 24, 2012 at 8:12 pm said:

Even if the programmer uses randomness to produce the result?

Because that’s the argument the Catholic Church makes in regards to the flawed assumption behind Intelligent Design and Atheism — that is, random mutations must be purposeless and a force outside of God’s Providence. Instead, the Church teaches that the range of freedom granted by random processes has a God given purpose that, none the less, can not on its own alter the prescribed plan.

Given your hostility towards Evolutionism, it’s ironic that you believe randomness to be outside of God’s domain.
Joe Felsenstein on April 24, 2012 at 9:11 pm said:

Elizabeth, is there some reason that today this site is showing this thread (“Review: Climbing the Steiner Tree”) twice, once before and once after JoeG’s thread? The comments in those two nominally different threads seems to be the same..

I suspect some kind of misconfiguration or WordPress bug.
Rich on April 24, 2012 at 9:12 pm said:

Joe Felsenstein: Elizabeth, is there some reason that today this site is showing this thread (“Review: Climbing the Steiner Tree”) twice, once before and once after JoeG’s thread? The comments in those two nominally different threads seems to be the same..I suspect some kind of misconfiguration or WordPress bug.

I believe it has received a “sticky” to keep it at the top, as it is such a very good thread!
Joe Felsenstein on April 24, 2012 at 9:21 pm said:

Rich: I believe it has received a “sticky” to keep it at the top, as it is such a very good thread!

I think it is not only “sticky”, it has replicated, although without becoming genetically different.
petrushka on April 24, 2012 at 9:25 pm said:

Perhaps one copy will mutate.
Joe G on April 24, 2012 at 9:25 pm said:

Even if the programmer uses randomness to produce the result?

Yes, even though that is not what happens with a GA.
DaveThomas on April 25, 2012 at 7:32 am said:

Patrick,

Holy crap, I did some new runs, and those ID folks are doing some really strange stuff!

Here’s my result just now for the 5-node problem, 5 million random sequences:

And here are 2 million sequences at a pop, three times:

It is clearly floundering badly, not getting close to the optimum Steiner (1212), and not even to the “Round-the-track” C-shaped solution (1246).

I will have to check this out some more, Patrick! Those folks are trying to pull a fast one!
Cheers, Dave

P.S. I am using image tags. Not showing up in Preview. Here are the raw URLs to load the images at NMSR:
http://www.nmsr.org/random5M.jpg
http://www.nmsr.org/random2Ma.jpg
http://www.nmsr.org/random2Mb.jpg
http://www.nmsr.org/random2Mc.jpg
Creodont2 on April 25, 2012 at 11:26 am said:

Joe G:
LoL! YOU are one of the reasons they will stay away- no way to reason with that “logic”…

Yeah joe, the last thing IDiots want is to face their opponents out in the open. It must be a terribly empty feeling to have no evidence and no confidence in their position. It must also be a terribly empty feeling for you that they won’t come here and stand up for you and ID. You poor thing.
Joe G on April 25, 2012 at 11:55 am said:

LoL! Reality says that the last thing you evos want is to face ANY opponent out in the open. But that is because your position has absolutely nothing, and it shows.

Also, unlike you evoTARD cowards, I do not need anyone to stroke me/ stand up for me. And I feel bad for all IDists and creationists that do come here because they have to deal with irrational thugs like you.
Joe G on April 25, 2012 at 12:59 pm said:

Umm whoever wrote the program is the “ID folk”….
Patrick on April 25, 2012 at 1:30 pm said:

DaveThomas:
Patrick,

Holy crap, I did some new runs, and those ID folks are doing some really strange stuff!

Here’s my result just now for the 5-node problem, 5 million random sequences:

And here are 2 million sequences at a pop, three times:

It is clearly floundering badly, not getting close to the optimum Steiner (1212), and not even to the “Round-the-track” C-shaped solution (1246).

I will have to check this out some more, Patrick! Those folks are trying to pull a fast one!

It’s interesting to see that our results agree with each other and differ so significantly with those reported in the paper. I think I have Winston Ewert’s email around somewhere — if I can find it I’ll see if he’d be interested in joining the discussion here. Perhaps he’ll at least be willing to make his code available.

(I’m making the hopefully not unwarranted assumption that the graduate student on the authors’ list did most of the hard work.)
Joe G on April 25, 2012 at 1:37 pm said:

Your OP is full of nonsense, Patrick.

1- Evos coined the term “Darwinism” which includes “Darwinists”

2- Darwinists do make the claim that natural selection can solve just about any issue

3- There isn’t anything in
Does the genetic algorithm have the ability to solve the problem given only a description of the same? that implies there isn’t any feedback

4- The paper says, right in it, that a GA does need a specific target- page 2

5- You have no idea if insertions or deletions are blind and undirected

the list could go on but I am sure you will ignore it just as IDists should ignore your post.
Joe G on April 25, 2012 at 1:48 pm said:

The above quote shows that Thomas is under the misapprehension
that intelligent design advocates claim that the actual answer is encoded into the algorithm. This is not in fact what intelligent design advocates claim, as will be shown later. Rather, we say that success is due to prior knowledge being exploited to produce active information in the search algorithm. Although the code does not include the actual Steiner shape, it does include a tuned algorithm for how to find the Steiner shape.
Patrick on April 25, 2012 at 1:54 pm said:

Joe,

The rule of this venue is that we should assume that other participants are acting in good faith. Based on your behavior here, I cannot rationally make that assumption. Accordingly, in order to honor our gracious hostess’s wishes, I will not be replying to you on this or any other topic.

Sincerely,

Patrick
Patrick on April 25, 2012 at 1:58 pm said:

Joe,

The rule of this venue is that we should assume that other participants are acting in good faith. Based on your behavior here, I cannot rationally make that assumption. Accordingly, in order to honor our gracious hostess’s wishes, I will not be replying to you on this or any other topic.

Sincerely,

Patrick

(Lizzie — I tried to move this to Guano, but don’t seem to have the right privileges. My apologies for imposing on your time to do so.)
Joe G on April 25, 2012 at 3:05 pm said:

LoL! Patrick, YOU are not acting in good faith. Your OP contains lies and errors. But somehow that doesn’t mean anything to you.
Joe G on April 25, 2012 at 3:16 pm said:

As Patrick’s OP contains errors that are easily exposed- I have already exposed some- perhaps Patrick needs some of Elizabeth’s protection because his mother isn’t around to do so.
olegt on April 25, 2012 at 4:59 pm said:

Joe G: Yes, even though that is not what happens with a GA.

What? Genetic algorithms start with randomly generated genomes and add random variations to the genomes as the organisms reproduce.
stcordova on April 25, 2012 at 5:06 pm said:

Here’s the code to create a million random genomes and return the most fit, using my implementation:

Patrick,

You were intelligently designing the fitness function weren’t you?

Dave Thomas,

By the way, since I humored you on his earlier challenge, I have a chanllenge for you that you should accept in reciprocity. 🙂

I will donate $50 to the New Mexicans skeptic organization that you are a member of if you can write a GA which will converge on the complex specified informaiton string (a password if you will) that I have wrtitten explicitly on a sheet of paper. You have 2 months to solve it and publish your results. C'mon dave, I humored you, now it's your tun. You're officially challenged to describe it. You have no feed back mechanism to tell you if you're getting warm. But the$ 50 should help your organization reproduce more Darwinists, so according to Darwinist logic, since it will make your organization more fit financially, it ought to imply a GA will find the solution on behalf of your organization. (I’ve heard such perverse reasoning before from Darwinists, that something necessarily evolved because it will make it more fit!).

Think I’m being too hard. Numerous systems in a biological organism are like lock-and-key, password login systems. They are deeply coordinated. A random password is unlikely to hit a random login. There is no reason a GA would solve this unless it were trivial since there is no feedback mechanism and since password/login’s irreducibly complex.

And since the absence of some proteins, like insulin or other critical proteins, will result in death — one shot at a solution is like what it might be in real life biology. 🙂 So you have one shot at a solution.

I look forward to hearing from you. I humored you, now it’s your turn.
OMTWO on April 25, 2012 at 5:10 pm said:

stcordova: Numerous systems in a biological organism are like lock-and-key, password login systems.

For example?
OMTWO on April 25, 2012 at 5:12 pm said:

stcordova: You were intelligently designing the fitness function weren’t you?

What if the “code” to do that was the environment. Woudl the the fitness function still be intelligently designed?

You can demonstrate this easily. Simply take some bacteria and spread them evenly over a differentiated environment. Some will be less fit in the environment they find themselves in then others.

Where did the fitness function come from?
stcordova on April 25, 2012 at 6:01 pm said:

Patrick wrote:

. Rather, natural selection is innately capable of solving any biological problem that it faces.

No “Darwinist” (a term that reflects ID’s creationist roots) claims this.

So are you saying there ARE complex biological systesm which natural selection can’t create. 🙂

If you say, yes, then well, thanks for helping creationists.

If you say no, then, on what basis do you make that claim except by assumption.