Weasel Wars – Directed Evolution

Posted on February 27, 2016 by Mung

This is just an effort to help keep Joe’s thread focused and to help keep it from being derailed. People can use it or not. I hope they will.

CharlieM: Can someone explain to me, why is all of this not just a model of directed evolution? Surely it is set up to be directed towards a target?

Allan Miller: It gets towards the target by means of variation and selection of genotypes in the current population. The programmer does not direct it towards the target. Indeed, there would be no point in writing GAs for problem solution if it were simply a matter of specifying a target and directing the program to find it.

Consider a small modification – instead of distance from target, evaluate fitness by adding up the ASCII bits. Those with the greater sum are fitter than those with a lesser. There is no mention of a distant target – although it is clear that the program will converge on a string of all Zs, the program doesn’t know this. It is not drawn by, nor directed towards, that target. It is simply doing generational evaluation of fitter and less fit genotypes in the current population.
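
A minimal sketch of the modification described above, for readers who want to run it. It reads "adding up the ASCII bits" as summing the character codes, which is an assumption, and the brood size, mutation rate, generation count and the choice to keep the parent in the pool are hypothetical settings, not taken from any particular weasel program. Note that no target string appears anywhere in the code.

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 26 letters plus space
LENGTH = 28

def ascii_sum(s):
    """Fitness with no target string: add up the character codes (assumed reading)."""
    return sum(ord(c) for c in s)

def mutate(s, rate, rng):
    """Copy s, replacing each character by a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in s)

def evolve(fitness, generations=1000, offspring=100, rate=0.04, seed=0):
    """Variation plus selection: keep the fittest string of each generation."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in range(LENGTH))
    for _ in range(generations):
        brood = [mutate(parent, rate, rng) for _ in range(offspring)]
        # The parent stays in the pool, so fitness never decreases (hypothetical choice).
        parent = max(brood + [parent], key=fitness)
    return parent

result = evolve(ascii_sum)
print(result, ascii_sum(result))
```

Since Z has the highest code in this alphabet, runs of this sketch should climb toward a string of Zs, although the only thing the loop ever does is compare the sums of the strings in the current pool.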

Discuss.

340 thoughts on “Weasel Wars – Directed Evolution”

Mung on February 27, 2016 at 7:44 pm said:

My view is that it is a model of directed evolution, but that is what it is supposed to be.
Flint on February 27, 2016 at 7:56 pm said:

Mung:
My view is that it is a model of directed evolution, but that is what it is supposed to be.

What do you mean by “it” — the original Dawkins program, or some other weasel program, or Allan’s ASCII example (which is ambiguous – what is an “ascii bit”?) or something else?

My understanding is that in practice, GAs are used to find good (not necessarily optimal) solutions to difficult problems of a certain sort. And the way this is done is to create an environment where better solutions are rewarded.

I’d guess this would be handy for NP-complete problems.
Mung on February 27, 2016 at 7:56 pm said:

AM: It gets towards the target by means of variation and selection of genotypes in the current population.

ok. Does that mean it’s directed or it is not directed?

AM: The programmer does not direct it towards the target.

How do we know that? How do we judge whether or not the programmer directs it towards the target?

AM: Indeed, there would be no point in writing GAs for problem solution if it were simply a matter of specifying a target and directing the program to find it.

ok, but that doesn’t mean that’s not what happened in this specific case. That’s not an argument that this is not a model of directed evolution.

Consider a small modification – instead of distance from target, evaluate fitness by adding up the ASCII bits. Those with the greater sum are fitter than those with a lesser. There is no mention of a distant target – although it is clear that the program will converge on a string of all Zs, the program doesn’t know this. It is not drawn by, nor directed towards, that target.

The rather obvious conclusion is that there’s more than one way to specify a target. And again, you have not shown that in this case we are not dealing with a model of directed evolution.
dazz on February 27, 2016 at 7:57 pm said:

Mung:
My view is that it is a model of directed evolution, but that is what it is supposed to be.

What would be the Director in the model / algo?
Mung on February 27, 2016 at 8:04 pm said:

dazz: Maybe our creationist friends can answer how can it calculate the fitness of each generation without a single reference to the target string in it, LOL

What do you mean?

Just reading the comments we see the following:

…where `n_match` is the number of characters matching the target.
Mung on February 27, 2016 at 8:06 pm said:

dazz: What would be the Director in the model / algo?

Her name is Oracle.
dazz on February 27, 2016 at 8:13 pm said:

Mung: What do you mean?

Just reading the comments we see the following:

…where `n_match` is the number of characters matching the target.

And what’s the target in that algo?

Mung: Her name is Oracle.

So Satan is directing evolution? Maybe if it was coded in Java :p
But seriously Mung, if “your view” is that it’s directed, what’s directing it?
TomMueller on February 27, 2016 at 9:29 pm said:

Is it possible that we are rehashing philosophical arguments from long ago?
It appears to me (I may have this wrong) some gainsayers raising the red flag of so-called “directed evolution” may perhaps be reiterating more general misguided suggestions that “Natural Selection” is guilty of circular reasoning or evolution is reducible to some tautology along the lines that “survivors survive”.

Any such contention is of course wrong as addressed by Karl Popper who ascertained that Evolution as Theory is indeed “falsifiable” and therefore fulfills the criteria required of “empirical statements”.

I also remind everybody of Francisco Ayala (the great shibboleth slayer) specifically his insights explaining that almost any biological phenomenon can be DESCRIBED in teleological terms but at the same time CANNOT BE EXPLAINED in teleological terms. Such subtlety of distinction is crucial to understanding.

Like I said, I wonder whether some of us may be rehashing.
dazz on February 27, 2016 at 9:47 pm said:

TomMueller: Is it possible that we are rehashing philosophical arguments from long ago?
It appears to me (I may have this wrong) some gainsayers raising the red flag of so-called “directed evolution” may perhaps be reiterating more general misguided suggestions that “Natural Selection” is guilty of circular reasoning or evolution is reducible to some tautology along the lines that “survivors survive”.

Well said. That’s why I want Mung to identify the director in the algo, and what that models in RL. If evolution is directed because NS “picks” from random variation, fine. So Gravity is also directed by mass, and magnetic fields by moving charges…
Mung on February 27, 2016 at 9:51 pm said:

Guided Local Search

But it’s not really a search, and it’s not really guided, because, you know, we can’t see the name of “The Guide.”
Mung on February 27, 2016 at 9:53 pm said:

TomMueller: Is it possible that we are rehashing philosophical arguments from long ago?

Perhaps. But if someone is going to claim it’s not guided they ought to have an argument that’s more than a huge non-sequitur.
dazz on February 27, 2016 at 10:09 pm said:

Mung:
Guided Local Search

But it’s not really a search, and it’s not really guided, because, you know, we can’t see the name of “The Guide.”

The features from the local optima are evaluated and penalized, the results of which are used in an augmented cost function employed by the Local Search procedure

I would say the augmented cost function is what guides the algo to get the Local Search out of local optima when it gets stuck. Hill climbing algos can’t go downhill, so they will remain in a local optimum unless guided out of it

So what guides GAs Mung? Don’t be irresponsive. It’s a very simple question
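
The point about hill climbing getting stuck can be illustrated with a small sketch. This is plain greedy hill climbing on a made-up one-dimensional landscape, not Guided Local Search and not a GA; the `heights` list and the function name are hypothetical and exist only for the illustration.

```python
def hill_climb(f, x, lo, hi):
    """Greedy integer hill climbing: step to the better neighbour until none is better."""
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=f)
        if f(best) <= f(x):
            return x  # no uphill move left: a local (possibly global) optimum
        x = best

# Hypothetical landscape: a small peak at index 2 and a taller peak at index 8.
heights = [0, 2, 3, 1, 0, 1, 4, 8, 10, 6, 2]

def f(x):
    return heights[x]

stuck = hill_climb(f, 0, 0, 10)    # starts on the slope of the small peak
reached = hill_climb(f, 5, 0, 10)  # starts on the slope of the tall peak
print(stuck, reached)              # prints "2 8"
```

Started at 0, the climber stops on the small peak because both neighbours are lower; some extra mechanism (a penalty term, random restarts, a population) is needed to leave it.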
dazz on February 27, 2016 at 10:11 pm said:

Mung: But if someone is going to claim it’s not guided they ought to have an argument

You’ve been told time and again that GAs aren’t guided in the sense that they follow partially random paths and there’s no way to predict the path they will follow
petrushka on February 27, 2016 at 10:14 pm said:

dazz: I would say the augmented cost function is what guides the algo to get the Local Search out of local optima when it gets stuck. Hill climbing algos can’t go downhill, so they will remain in a local optimum unless guided out of it

Perhaps that’s why the version of weasel that allows drift is important. It should be less likely to get stuck, even if it never reaches the hilltop. Or more precisely, not all members of the population reach the top.

There are two possible interpretations of “directed.”

One might be that the mutations are directed. I think we can safely discard this one.

The other might be that weasel is like artificial selection. If so, so what?
Flint on February 27, 2016 at 10:27 pm said:

dazz:
I would say the augmented cost function is what guides the algo to get the Local Search out of local optima when it gets stuck. Hill climbing algos can’t go downhill, so they will remain in a local optimum unless guided out of it

As I understand it, in Real Life, optimality has countless dimensions. There isn’t really any general local optimum, there are only specific local optima for hundreds of characteristics, and that all these optima are independent. For example, additional height might be an improvement, but greater speed might also, and better camouflage might also, and better teeth, and so on and on and on. And so it’s easily possible for the organism to lose ground on some dimensions but gain enough ground on others to compensate.

So I wonder if GAs implement this in some way.
Mung on February 27, 2016 at 10:58 pm said:

Unlike many methods, GAs use probabilistic transition rules to guide their search.

– David E. Goldberg
petrushka on February 27, 2016 at 11:00 pm said:

Ah, we are going to argue ad dictionarium, again.
Mung on February 27, 2016 at 11:08 pm said:

petrushka: Ah, we are going to argue ad dictionarium, again.

Feel free to ignore any and all evidence that poses a danger to what you believe.
dazz on February 27, 2016 at 11:17 pm said:

We all knew you would do that Mung, you’re so predictable.
Fine, so probabilistic transition rules “guide” GAs. Fair enough.
What are those rules and what do they model in RL? Seems we’re getting somewhere finally
Mung on February 27, 2016 at 11:27 pm said:

dazz: We all knew you would do that Mung, you’re so predictable.

So you all knew all along that I would provide evidence for my claim. Cool.
dazz on February 27, 2016 at 11:29 pm said:

Mung: So you all knew all along that I would provide evidence for my claim. Cool.

Please answer my questions and stop deflecting
Flint on February 27, 2016 at 11:32 pm said:

Mung: Feel free to ignore any and all evidence that poses a danger to what you believe.

There doesn’t seem to be much disagreement as to how GAs work and how they are different from brute force calculations from known formulas. The disagreement seems to be about whether evolution follows the environment by trial and error, or whether someone behind the curtain has a thumb on the scales.
Flint on February 27, 2016 at 11:36 pm said:

Mung: So you all knew all along that I would provide evidence for my claim. Cool.

How is that quote “evidence”? It seems you are combing through the literature for specific word combinations, rather than an understanding of the process. If I claim water has motivations, and find someone saying water “seeks its own level”, is this evidence for my claim?
CharlieM on February 28, 2016 at 12:02 am said:

Can Weasel be used as a model for the maturation of an individual organism?

Suppose we start with 27 letters, each representing a pluripotent stem cell, and the target is a group of letters representing an adult built up of several specialised cells.

Would this be a legitimate way to use the Weasel simulation?
CharlieM on February 28, 2016 at 12:33 am said:

dazz: You’ve been told time and again that GAs aren’t guided in the sense that they follow partially random paths and there’s no way to predict the path they will follow

In snakes and ladders there is no way to predict the path the winner will take but it is known in advance where she or he will end up.
Mung on February 28, 2016 at 12:34 am said:

Flint: If I claim water has motivations, and find someone saying water “seeks its own level”, is this evidence for my claim?

The quote I provided was specifically on point as to the issue under debate.
CharlieM on February 28, 2016 at 12:34 am said:

Mung:
My view is that it is a model of directed evolution, but that is what it is supposed to be.

Sounds about right to me.
Mung on February 28, 2016 at 12:37 am said:

Flint: The disagreement seems to be about whether evolution follows the environment by trial and error, or whether someone behind the curtain has a thumb on the scales.

We were talking about a specific GA not about evolution in general. If people want to insist that GAs are unguided in order to avoid the implication that evolution is guided that’s really not my problem, nor do I see it as being relevant other than as perhaps an explanation for what motivates people to deny the obvious. But I don’t much see the point in trying to figure out the motivation behind the claims of dazz and Allan.
Mung on February 28, 2016 at 1:15 am said:

dazz: Fine, so probabilistic transition rules “guide” GAs. Fair enough.

Where do those transition rules come from?
Mung on February 28, 2016 at 1:21 am said:

Genetic algorithms are a class of general-purpose search methods combining elements of directed and stochastic search which can make a remarkable balance between exploration and exploitation of the search space.

Genetic algorithms perform a multiple directional search by maintaining a population of potential solutions.

– Genetic Algorithms and Engineering Design
Mung on February 28, 2016 at 1:23 am said:

Wasn’t someone claiming that GAs were not searches?
stcordova on February 28, 2016 at 1:43 am said:

There are explicit and implicit targets of GAs. The explicit target is already known in advance and in exact detail like “METHINKS IT IS LIKE A WEASEL”, whereas the implicit target is not known in advance like the shortest route of a travelling salesman for a never-before-encountered set of travelling points.

http://stackoverflow.com/questions/23736889/genetic-algorithm-travelling-salesman

The thing about biological systems is that although they are optimal in some dimensions, they have wasteful Rube Goldbergesque designs in other dimensions — to do tasks in complex ritual rather than simply.

So natural selection should actually select against the complexity of Rube Goldbergesque designs, not for them. Darwin had it backward, he thought natural selection would select for Rube Goldberg complexity which did not exist since he thought that would be the path that nature took.

We see actually the opposite in the lab and field.

Why for example should natural selection select for the regulation of genes by long distance enhancers located in the junk DNA introns of other genes while those introns aren’t transcribing to eRNAs, btw? That is Rube Goldbergesque.

GA’s in general aren’t designed to find such wastefully extravagant Rube Goldbergesque solutions.

Real-life GAs destroy such Rube Goldbergesque solutions in the present day– we call it the 6th great extinction. Birds are a prime example of being selected against by inter species natural selection.
stcordova on February 28, 2016 at 2:05 am said:

DNA are strings, but what sort of Real Life genetic algorithm would arrange DNA to make a set of 3 distant acting “parking lots” on one gene’s introns or an intergenic region to regulate another gene and then let these “parking lots” themselves possibly transcribe to be regulatory eRNAs? It would be a dual-use Rube Goldbergesque design.

The GA would have to have foresight into the molecular machinery and put receptor binding sites in the right place to act as a DNA sign post for the parking lot, but then that parking lot would have to be a usable eRNA regulator when transcribed. GAs will perform poorly on such polyconstrained systems, as demonstrated by a world expert on GAs, Robert Marks II, et al.

It may be worth repeating here since some may have missed this in the Rube Goldberg discussion.

Philosophy and Complexity of Rube Goldberg Machines

==========
Below is a screen capture from about 25 minutes into Wesley Pike’s 2015 ENCODE presentation (youtube link below) which highlights reasons junk DNA isn’t necessarily junk:

The -10k, -20k, -30k bubbles are the enhancer regions that basically serve as parking lots for the regulatory machinery that acts on the Mmp13 gene.

The Mmp13 gene is represented by the black squares of Mmp13 exons and the GTA (general transcription assembly) bubble. The diagram is also in the paywalled article that colewd mentioned.

This diagram is really a 2 dimensional conceptual picture of how 3 dimensional DNA positions the parking lots (enhancers) for regulatory machines like the Vitamin D receptor (VDR) and RUNX2 assembly. The enhancer parking lots are marked “-10k”, “-20k”, “-30k” and are right there near the promoter region (marked “Pro”) of the Mmp13 gene.

So this shows how junk DNA can be used as a parking lot/scaffold for the complex regulatory machines to park themselves near genes. The parking lot has specific signs (junk DNA sequences) on them that help the regulatory machinery like the vitamin D receptor to find the right parking lot. We might call those junk DNA sequences “receptor binding sites”.

As mentioned earlier the -10k parking lot is 427 bases and the -30k parking lot is 482 bases. I did not find the size of the -20k parking lot in Pike’s paper.

But if ENCODE is right that 80-100% of the DNA is transcribed, there is a chance these parking lots may also create functional RNA transcripts which would be eRNA transcripts:

https://en.wikipedia.org/wiki/Enhancer_RNAs

Whether this happens in this gene remains to be seen, but eRNA transcription does happen in other gene enhancers.

If junk DNA enhancer regions exist which act as scaffold/parking lots/receptor binding regions but also provide functional eRNAs when transcribed, then this would be an amazing dual-use Rube Goldbergesque design that should just boggle the mind!
Flint on February 28, 2016 at 2:29 am said:

Mung:
Wasn’t someone claiming that GAs were not searches?

Yes and no. After all, why use a GA to solve a problem in the first place?

I think it’s fairly clear that GAs in general instantiate one particular method of searching when the target is not known in advance. But this requires careful wording. Is the “answer” to the Steiner tree problem known in advance? If so, why bother to search for it? If not, how do you know what you’re looking for and how could you tell if you found it?

I would consider a GA to be a good approach to “solving” the traveling salesman problem: if a salesman must visit N cities where N is larger than 15 or so, what is the shortest distance he must travel to visit them all?

Now, let’s say the GA chews on this problem for a while, and starts spitting out answers. The ACTUAL shortest distance is not known, and for large N is beyond any brute force “try them all” approach. So all the programmer knows is that as the process continues to evolve paths, the paths get shorter.

Perhaps for comparison, we might use a simple algorithm. Like, start at city #1, and always travel to the closest city not yet visited. It’s likely the GA could do MUCH better than this.

So is this a search? Certainly it is. Is it “guided”? Only insofar as the goal — minimize total distance — is used to select paths. Is the target “known?” – no, if the target is a path. Yes, if the target is minimal distance.
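
The simple comparison algorithm mentioned above (start at city #1, always go to the closest unvisited city) might look like the following sketch. The city coordinates are invented, and measuring a closed round trip in `tour_length` is an assumption; a GA for this problem would typically use such a length function as its selection criterion, without ever being told the shortest tour.

```python
import math

def tour_length(cities, order):
    """Total length of the closed tour visiting `cities` in `order` (round trip assumed)."""
    return sum(
        math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def nearest_neighbour_tour(cities, start=0):
    """Greedy baseline: from the current city, always move to the closest unvisited one."""
    unvisited = set(range(len(cities))) - {start}
    order = [start]
    while unvisited:
        here = cities[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, cities[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

# Hypothetical coordinates, chosen only so the example is easy to check by hand.
cities = [(0, 0), (3, 0), (1, 0), (0, 2), (4, 3)]
order = nearest_neighbour_tour(cities)
print(order, round(tour_length(cities, order), 3))
```

The greedy tour gives a length to beat; whether a GA does much better depends on the instance and on how the GA is set up.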
dazz on February 28, 2016 at 7:25 am said:

Mung: Where do those transition rules come from?

Good question. Maybe if you could help us identify those rules we could answer it. What are the rules directing GAs and what are their RL counterpart?

I’m willing to explore the idea that evolution is guided, I just want to know what you mean by guiding, and what guides it. Please stop quoting others and explain in good detail what guides the weasel program. Take keiths’ or Tom’s implementations of Joe’s model for example and be specific: point to a function maybe, some equation from Joe’s model… then explain what that abstraction represents in RL.

I’m also a bit confused now. My car comes from America, but I come from Spain. Not sure which of the two guide it, maybe you can help me figure that out too
Allan Miller on February 28, 2016 at 10:00 am said:

[post removed – I see Mung addressed it]
Allan Miller on February 28, 2016 at 10:07 am said:

Perhaps the gainsayers could provide a model (pseudocode will do) that has some means of implementing differential survival in the GA pool, based on string content, that would qualify as NOT a guided search?

ISTM that all implementations of NS in a GA would be regarded as ‘guided’ in their terms. If that’s the case – if NS and ‘guiding’ are synonyms – the dispute seems terminological.
Allan Miller on February 28, 2016 at 10:20 am said:

Mung,

AM: It gets towards the target by means of variation and selection of genotypes in the current population.

Mung: ok. Does that mean it’s directed or it is not directed?

It depends whether you regard ALL implementations of a fitness function, whether target-based or not, as directing.

AM: The programmer does not direct it towards the target.

Mung: How do we know that? How do we judge whether or not the programmer directs it towards the target?

Clearly, exactly the same implementation could, with a minor modification, locate ANY 28-letter string with exactly the same speed. So get the program to choose its own ‘target’. Clearly, the programmer cannot pre-program the algorithm to find every possible target. The difference is in the parameter, not the code.

After all:

AM: Indeed, there would be no point in writing GAs for problem solution if it were simply a matter of specifying a target and directing the program to find it.

Mung: ok, but that doesn’t mean that’s not what happened in this specific case. That’s not an argument that this is not a model of directed evolution.

So if the programmer picks the target it’s directed, if the program does it’s not? Same program, doing the same thing, from then on.

Allan: Consider a small modification – instead of distance from target, evaluate fitness by adding up the ASCII bits.

Mung: The rather obvious conclusion is that there’s more than one way to specify a target. And again, you have not shown that in this case we are not dealing with a model of directed evolution.

So rather than consider the implications of my suggested change, you retreat to the specific example.

Trying again, here’s another variant of the above. Instead of counting ASCII bits, you have a ‘target’ of all Z’s. You evaluate strings by subtracting their bit sum from the bit sum of this target. Call the simple bit sum comparison Program A, and the subtract-from-target version B.

Which of Programs A and B is directed?
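
A small sketch of the two programs, to make the comparison concrete. It again takes "bit sum" to mean the sum of character codes (an assumption), and the candidate pool is made up. Program B's score is Program A's score minus a constant, so the two always rank any pool of strings identically.

```python
def bit_sum(s):
    """'Bit sum' read as the sum of character codes (an assumed interpretation)."""
    return sum(ord(c) for c in s)

TARGET = "Z" * 28  # used by Program B only

def fitness_a(s):
    """Program A: plain sum, no target mentioned. Higher is fitter."""
    return bit_sum(s)

def fitness_b(s):
    """Program B: distance from the all-Z target, negated so higher is fitter."""
    return -(bit_sum(TARGET) - bit_sum(s))

# Hypothetical pool of 28-character candidates.
pool = [
    "METHINKS IT IS LIKE A WEASEL",
    "A" * 28,
    "Z" * 28,
    "ZZZZ" + " " * 24,
]

rank_a = sorted(pool, key=fitness_a, reverse=True)
rank_b = sorted(pool, key=fitness_b, reverse=True)
print(rank_a == rank_b)  # True: selection behaves identically under A and B
```

Any selection step that depends only on the ordering of fitness values therefore behaves the same under both programs, which is the crux of the question.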
OMagain on February 28, 2016 at 10:24 am said:

Mung: Wasn’t someone claiming that GAs were not searches?

Have you stopped beating your wife yet?

Honestly, what can you possibly be getting out of any of this?
OMagain on February 28, 2016 at 10:26 am said:

Mung: How do we know that? How do we judge whether or not the programmer directs it towards the target?

Who is this “we” you speak of?
Allan Miller on February 28, 2016 at 10:27 am said:

petrushka,

Perhaps that’s why the version of weasel that allows drift is important. It should be less likely to get stuck, even if it never reaches the hilltop. Or more precisely, not all members of the population reach the top.

This version can’t get stuck – the landscape has a single peak, and there are no barriers to progression towards the target. All genotypes are viable.
OMagain on February 28, 2016 at 10:27 am said:

Mung: But if someone is going to claim it’s not guided they ought to have an argument that’s more than a huge non-sequitur.

Presumably you believe that biological evolution is guided by your god. How do you know that and what is it being guided towards? Feel free to answer with huge non-sequiturs.
Allan Miller on February 28, 2016 at 10:30 am said:

CharlieM,

In snakes and ladders there is no way to predict the path the winner will take but it is known in advance where she or he will end up.

And in chess?
Allan Miller on February 28, 2016 at 10:39 am said:

stcordova,

what sort of Real Life genetic algorithm would arrange DNA to make a set of 3 distant acting “parking lots” on one gene’s introns or an intergenic region to regulate another gene and then let these “parking lots” themselves possibly transcribe to be regulatory eRNAs?

I don’t suppose there’s any danger of you actually testing your suppositions with some actual code?
stcordova on February 28, 2016 at 11:15 am said:

I don’t suppose there’s any danger of you actually testing your suppositions with some actual code?

The burden should be on the Weasel advocates, not me to show the relevance of Weasel to actual complex systems.
TomMueller on February 28, 2016 at 11:16 am said:

The contingent randomness of the genetic changes that adaptation requires means that the majority of evolutionary outcomes are in fact maladapted or clumsily adapted, and ultimately quite unpredictable over time. My favorite essay dealing with such important subtlety and parsing of fine distinction is Stephen Jay Gould’s seminal essay The Panda’s Thumb.

“Our textbooks like to illustrate evolution with examples of optimal design—nearly perfect mimicry of a dead leaf by a butterfly or of a poisonous species by a palatable relative. But ideal design is a lousy argument for evolution, for it mimics the postulated action of an omnipotent creator. Odd arrangements and funny solutions are the proof of evolution—paths that a sensible God would never tread but that a natural process, constrained by history, follows perforce. No one understood this better than Darwin.” S J Gould

In my opinion, a realistic algorithm would tolerate a certain load of grammar, syntax and spelling mistakes. A realistic program would also tolerate alternate “targets”.

Something along the lines of
“Methinks the Panda has an opposable pollex” would be one correct “target”

While
“Methinks the Panda has an opposable radial sesamoid” would be another correct “target”

Of course, there are many other conceivable correct targets – validating Petrushka’s contention:

The other might be that weasel is like artificial selection. If so, so what?

In other words, GAs as mathematical models of evolution have “raised the bar” not “lowered the bar” when considering how evolution operates in “real life”. Meaning, in real life evolutionary success would be achieved with far fewer generations than actually modeled here. In other words, the number of generations to success drops dramatically, from 10^40 to mere thousands… to even less!

It seems to me that such obvious considerations are self-evident to Joe Felsenstein and others who dabble in mathematical modeling, but not so self-evident to gainsayers who cannot wrap their heads around this non-issue of “directed”.
Allan Miller on February 28, 2016 at 11:37 am said:

stcordova,

The burden should be on the Weasel advocates, not me to show the relevance of Weasel to actual complex systems.

Weasel advocates are not using it to model genomic interaction. So if you bring genomic interaction into a thread about Weasel, you need to make it relevant. Simply declaring ‘GAs can’t do this’ requires some support. Why can’t they?
CharlieM on February 28, 2016 at 12:33 pm said:

Allan Miller:
CharlieM,

And in chess?

That’s why I chose snakes and ladders and not chess as being analogous to an algorithm such as Weasel. In Weasel each position is seeking a target of one specific letter or space just as in snakes and ladders the target is one specific square. In chess the target is an arrangement whereby the king is trapped and this is not dependent on one specific square. It is dependent on the relative positions of several pieces.
CharlieM on February 28, 2016 at 12:45 pm said:

In my opinion GAs model evolution as if it were equivalent to just one face of Rubik’s cube whereas in actual fact it is the equivalent of trying to solve a multifaceted Rubik’s solid.

I am all for solving problems by simplification but it must be kept in mind that there is a vast difference between the simple model and the real-life situation.
Alan Fox on February 28, 2016 at 12:50 pm said:

stcordova: The burden should be on the Weasel advocates, not me to show the relevance of Weasel to actual complex systems.

Who has ever made that claim? The Blind Watchmaker was published in 1986 (thirty years ago). Dawkins wrote then:

…it [his “weasel” program] is misleading in important ways. One of these is that, in each generation of selective ‘breeding’, the mutant ‘progeny’ phrases were judged according to the criterion of resemblance to a distant ideal target, the phrase METHINKS IT IS LIKE A WEASEL. Life isn’t like that.

Oops ETA correct no of years between 1986 and 2016