Weasel Wars – Directed Evolution

This is just an effort to help keep Joe’s thread focused and to help keep it from being derailed. People can use it or not. I hope they will.

CharlieM: Can someone explain to me, why is all of this not just a model of directed evolution? Surely it is set up to be directed towards a target?

Allan Miller: It gets towards the target by means of variation and selection of genotypes in the current population. The programmer does not direct it towards the target. Indeed, there would be no point in writing GAs for problem solution if it were simply a matter of specifying a target and directing the program to find it.

Consider a small modification – instead of distance from target, evaluate fitness by adding up the ASCII bits. Those with the greater sum are fitter than those with a lesser. There is no mention of a distant target – although it is clear that the program will converge on a string of all Zs, the program doesn’t know this. It is not drawn by, nor directed towards, that target. It is simply doing generational evaluation of fitter and less fit genotypes in the current population.
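The bitsum variant can be sketched in C, the language used for the overflow example later in the thread. This is a minimal sketch, and several of its choices are assumptions rather than anything specified above: an alphabet of capital letters plus space, 100 offspring per generation, a 1-in-28 chance per character of mutating, the parent surviving if no offspring beats it, and rand() as the random source. The names bitsum, mutate, evolve and random_phrase are hypothetical. The important point is that the fitness function only looks at the string being scored. It never compares that string with any other string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LEN 28     /* phrase length, as in the classic Weasel */
#define BROOD 100  /* offspring per generation (assumed) */

/* Assumed alphabet: capital letters plus space. 'Z' has the largest code. */
static const char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";

/* Fitness: the sum of the character codes. No target string is consulted. */
static int bitsum(const char *s) {
    int total = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        total += (unsigned char)s[i];
    return total;
}

static char random_char(void) {
    return ALPHABET[rand() % (int)(sizeof ALPHABET - 1)];
}

/* Fill out (LEN + 1 bytes) with a random starting phrase. */
static void random_phrase(char *out) {
    for (int i = 0; i < LEN; i++)
        out[i] = random_char();
    out[LEN] = '\0';
}

/* Copy parent into child, mutating each position with probability 1/LEN. */
static void mutate(const char *parent, char *child) {
    for (int i = 0; i < LEN; i++)
        child[i] = (rand() % LEN == 0) ? random_char() : parent[i];
    child[LEN] = '\0';
}

/* Run a fixed number of generations on `current`, in place. Each round the
   parent is replaced by its fittest offspring if that offspring scores at
   least as well as the parent; otherwise the parent survives. The loop never
   checks whether it has reached any particular string. */
static void evolve(char *current, int generations) {
    char child[LEN + 1], best[LEN + 1];
    assert(strlen(current) == LEN);
    for (int g = 0; g < generations; g++) {
        strcpy(best, current);
        for (int k = 0; k < BROOD; k++) {
            mutate(current, child);
            if (bitsum(child) >= bitsum(best))
                strcpy(best, child);
        }
        strcpy(current, best);
    }
}
```

A typical run would call srand() once, fill a phrase with random_phrase(s), and then call evolve(s, 5000). That should leave s at or near a string of 28 Z's, because no other string over this alphabet has a higher bitsum. That string appears nowhere in the code.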

Discuss.

340 thoughts on “Weasel Wars – Directed Evolution”

  1. Mung,

    Flint: I pointed out that IF there is an overflow, then the prediction of all Zs will fail for that very reason.

    Mung: That’s what Allan gets for not actually getting in and coding it.

    So a program I didn’t write would fail to get all Z’s if I’d provided too small a summing field. But I’d have to actually write it, complete with bug, then fix the bug, and ‘this is what I get’ for not so doing.

    Another devastating point for the Mung! It’s not about understanding the concept, it’s about the bugs!

  2. Flint,

    So let me try a different approach. The goal is for the program to converge on a list of all Z characters.

    No, my goal was to illustrate a process that did NOT have an explicit target. Therefore, comparing some numerical representation of strings in which they were numerically ‘higher’ or ‘lower’ seemed a fair way to do it. I was trying to divert focus from this ‘target’ notion to emphasise that the convergent result – which may have been intended or not – is something that falls out of an intermediate population-level process. By using a numerical sum (and a certain care in implementation!), all Z’s is the end result only because the program can’t go anywhere else. It is not the ‘goal’. One could flip the operator in the comparison and end up with all spaces. One could pick an intermediate value and have a range of possible fittest genotypes – multiple peaks of equal height. None of these would be goals either, just simple end-of-run consequences of the selection operation.
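The three options just described (higher wins, operator flipped, intermediate value) can be sketched as interchangeable scoring rules that plug into the same selection loop. This is a sketch in C. The function names are hypothetical, and "larger score is fitter" is an assumed convention. None of the rules mentions a string to aim at. What changes is only which strings end up fittest.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of the character codes, as in the bitsum variant. */
static int bitsum(const char *s) {
    int total = 0;
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++)
        total += (unsigned char)s[i];
    return total;
}

/* In each rule below, a larger score means fitter. */

/* Higher sums win: selection drifts towards all Z's. */
static int score_high(const char *s) { return bitsum(s); }

/* Operator flipped, lower sums win: selection drifts towards all spaces. */
static int score_low(const char *s) { return -bitsum(s); }

/* Closeness to an intermediate sum: every string whose sum equals `mid`
   gets the maximum score of 0, so there are many peaks of equal height. */
static int score_near(const char *s, int mid) { return -abs(bitsum(s) - mid); }
```

For example, with mid = 155, "AZ" (65 + 90), its anagram "ZA", and the unrelated "MN" (77 + 78) all score 0. They are equally fit, and no one of them is privileged.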

  3. Using the sum of bits, of course, would also result in multiple peaks – not least, all anagrams of the maximally fit ‘phrase’. So, is it the phrase or the optimal bit sum which is the ‘target’ in this implementation?

  4. Flint: But I’m going to guess that you are using the word “checksum” to indicate any and every possible way of validating the integrity of a data set – it might be a sum, or a CRC, or a bitsum, it might be XORING together all the bytes, it might be an encryption where there is (or maybe isn’t?) a decryption key or password, etc. If this is your definition of a checksum, you aren’t making it clear.

    Yes. I think keiths is using ‘checksum’ to mean a sum (with or without overflow) that is used to check data integrity. It's a definition-by-use. But if the calculation CANNOT overflow, then why not just use the word ‘sum’? IOW I agree with Flint that the word ‘checksum’ implies the result of a calculation that permits (and discards) overflow.
    keiths attempts to justify his idiosyncratic definition with examples of the machine-code meaning of the word ‘sum’. A machine-code (two’s complement) ‘sum’ will produce a mathematical (standard usage of the word) ‘sum’ if and only if there is no overflow. If the possibility of unmanaged overflow exists, then you don’t know whether your ‘sum’ is correct or not; therefore we don’t call it a ‘sum’, which could be misleading, we coin the term ‘checksum’…
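The distinction between a mathematical sum and a fixed-width additive checksum can be shown with two C functions that agree only while no overflow occurs. This is a sketch. The names byte_sum and checksum8 are hypothetical, and the 8-bit width is an assumption chosen to make overflow easy to trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The ordinary arithmetic sum of the bytes. A 64-bit accumulator cannot
   overflow for any block of realistic size. */
static uint64_t byte_sum(const unsigned char *data, size_t n) {
    uint64_t total = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* An 8-bit additive checksum: each addition is truncated to 8 bits, so any
   overflow is silently discarded and the result is the sum modulo 256. */
static uint8_t checksum8(const unsigned char *data, size_t n) {
    uint8_t c = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        c = (uint8_t)(c + data[i]);
    return c;
}
```

For the bytes {1, 2, 3}, both functions return 6. For {200, 100}, byte_sum returns 300 and checksum8 returns 44 (300 mod 256).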

  5. Allan Miller:
    Using the sum of bits, of course, would also result in multiple peaks – not least, all anagrams of the maximally fit ‘phrase’. So, is it the phrase or the optimal bit sum which is the ‘target’ in this implementation?

    A quarter of a billion of them. I was hoping we could persuade Mung to code this bitsum version, so he could “prove” everyone wrong.
    Including himself.
    OOOOOOOOOOWWWWWWOOOOOOWWWWWW

  6. DNA_Jock: So, is it the phrase or the optimal bit sum which is the ‘target’ in this implementation?

    Well might you ask. The same distinction is important in animal or plant breeding. Opponents of evolutionary biology sneer that these are examples of “directed” evolution, and represent the breeder front-loading the information into the breeding program. The breeder is directing the phenotype towards higher values on a scale.

    But the breeder does not know, and cannot control, the exact genotype used to achieve those higher phenotype values. As in the bitsum case, there is no information about the desired genotypes imparted by the breeder, except for the target value of the phenotype.

  7. It would be interesting to see if a GA could converge on a checksum.

    Sum of bits might be one kind of checksum, but there are others, including CRCs, which would be difficult.

    One of the points of contention raised by IDists is the “isolated islands” claim: how likely is it that a mutation will be favorable or neutral?

    The CRC is designed to reject mutations, however small, but there are other checksum algorithms that are more permissive.
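The difference in permissiveness shows up with a "mutation" that swaps two adjacent characters. Below is a sketch in C. The function names are hypothetical, and the CRC shown is the common bitwise CRC-32 with the reflected polynomial 0xEDB88320. An additive checksum treats the swap as neutral. CRC-32 rejects it, because it is guaranteed to detect any error confined to 32 consecutive bits, and an adjacent byte swap touches at most 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit additive checksum: insensitive to the order of the bytes. */
static uint16_t sum16(const unsigned char *data, size_t n) {
    uint16_t s = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s = (uint16_t)(s + data[i]);
    return s;
}

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (the CRC-32 used
   by zip, gzip and Ethernet). Slow, one bit at a time, but short. */
static uint32_t crc32_ieee(const unsigned char *data, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Swapping the last two letters of "METHINKS IT IS LIKE A WEASEL" to get "...WEASLE" leaves sum16 unchanged but changes crc32_ieee. As a sanity check, crc32_ieee over "123456789" should give the standard check value 0xCBF43926.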

  8. keiths: You’re confusing what you want me to have said with what I actually have said.

    Now that is funny. That’s a keiths keeper.

  9. Often it appears that creationists and IDists argue that favorable mutations are impossible (for some obscure not-quite-explained reasons involving information theory and where information comes from).

    So, for example, if we have a functional gene and at position 179 in Exon 1, a C mutates to a G, and renders the protein less functional, that is possible. But can that G later mutate back to a C? They seem to think that, for some information-theoretic reason, that is impossible.

  10. Allan Miller: No, my goal was to illustrate a process that did NOT have an explicit target.

    By replacing it with an implicit target?

    After the following code is executed, how is the target any less explicit?

    char_set = (32..126).map {|i| i.chr}.shuffle.slice(0, 27)
    target = 28.times.map {char_set.sample}

    You can even print it out.

    puts target.join

    “aR4sqZ~abgR4qRq8h[)q)8s))heO”

  11. Is Weasel an example of directed evolution? Perhaps not.

    It could be that it is not a directed search. But it sure acts like one.

    It could be that it is not an evolutionary algorithm. But it was certainly written as if it was meant to be one.

  12. Alan Fox: @ CharlieM

    Catching up, and just to reiterate what Allan Miller has been saying, you can consider the environment as the designer in natural selection. I’m happy if you want to call “natural selection” by the alternative name of “environmental design”. So neither do I disagree with your quoted remarks.

    So we are in agreement that natural selection is the way in which organisms are synchronised to the environment. But I’m not so sure you will agree when I say that I don’t think that natural selection drives the evolution of life as a whole.

    I think that evolution is a process whereby organisms gain more and more independence from environmental control. For example, witness the move from dependence on external heat to internal thermoregulation, from creatures which let the environment deal with their progeny to creatures which keep their offspring separate from the external environment and nurture them as they mature. Look at the progression from fish up to mammals and birds and this becomes an obvious observation.

    The evolution of life is not a case of the environment giving direction to the variable life forms. It is a case of life growing out of and separating itself from the environment. And human lives are prime examples of organisms which have separated furthest from the external environment which they find themselves in.

  13. Joe Felsenstein: Well might you ask. The same distinction is important in animal or plant breeding. Opponents of evolutionary biology sneer that these are examples of “directed” evolution, and represent the breeder front-loading the information into the breeding program. The breeder is directing the phenotype towards higher values on a scale.

    But the breeder does not know, and cannot control, the exact genotype used to achieve those higher phenotype values. As in the bitsum case, there is no information about the desired genotypes imparted by the breeder, except for the target value of the phenotype.

    Humans have been successfully breeding plants and animals for millennia. They begin with certain phenotypes as a target. Genotypes are an abstraction that they knew nothing about, and so genotypes didn’t concern them. So what do you mean by “desired genotype”? When a farmer goes to a mart to choose a bull to mate with his cows, is he seeking a certain genotype, or does he use his experience and decide by the physical qualities of the animal?

  14. Mung, you are adept at two dimensional word lawyering, and little else. Everything is black and white, when it serves your argument.

  15. petrushka,

    Mung, you are adept at two dimensional word lawyering, and little else.

    What else can he do? It’s not like he has an actual case to make against Weasel.

  16. DNA_Jock,

    Your (and Flint’s) position is

    a) that a checksum is not a sum used to check data integrity;

    b) that whoever coined the term ‘checksum’ was therefore wrong to do so;

    c) that a sum used to check data integrity becomes a checksum only when it ceases to be a sum; and

    d) that the same set of bytes at the end of the same data block used for the same data-checking purpose is a checksum in one case but not another.

    Who’s being idiosyncratic here?

  17. Mung,

    Allan: No, my goal was to illustrate a process that did NOT have an explicit target.

    Mung: By replacing it with an implicit target?

    Sigh. No, by removing targets altogether. If you simply convert strings into numbers and differentially reward higher (or lower, if you prefer) values during operation of the algorithm, what does any target have to do with it?

    After the following code is executed, how is the target any less explicit? […]

    Your response missed the point completely. I deliberately removed comparison of population strings with any other string that is not in the population. My earlier-mentioned process would not compare any string with all Z’s, or indeed any other target. Your response is to say “what about this target?”.

  18. The checksum fracas is among the least productive arguments ever to grace these web pages.

    It really is as pointless as mung’s word lawyering.

    How about each side just post the definition it wants to use and let it be?

    I’ll post mine:

    A checksum is the result of an algorithm designed to test data integrity. Period.

    Since there are lots of algorithms and lots of implementations for specialized purposes, nothing else necessarily follows.

  19. petrushka: The checksum fracas is among the least productive arguments ever to grace these web pages.

    I agree. And yet, like you <heh>, I cannot resist the urge to stick my oar in one more time…
    keiths writes:

    Your (and Flint’s) position is

    a) that a checksum is not a sum used to check data integrity;

    Well, a checksum will take the same value as a sum under certain circumstances. My point is that that does not make them equivalent. You appear to be arguing that since, under certain circumstances, sum and checksum take the same value, then they are the same thing. You might as well argue that sum() is equivalent to product()…

    b) that whoever coined the term ‘checksum’ was therefore wrong to do so;

    No, they were right to coin a new term, because the functions are NOT equivalent. Your complaint appears to hinge on the presence of the letters S U and M in the word ‘checksum’. By your logic, the use of the term <x co-ordinate> is wrong, viz: “It’s not an ordinate, it’s an abscissa, dammit!”

    c) that a sum used to check data integrity becomes a checksum only when it ceases to be a sum; and

    It doesn’t become anything. If it may differ from a sum, then it’s a different function, whether or not it actually gives a different result. See sum() and product() above.

    d) that the same set of bytes at the end of the same data block used for the same data-checking purpose is a checksum in one case but not another.

    People are welcome to use a sum to check data integrity. I’ve done it myself. They could use a product, or any other function, as Flint noted. But renaming what is in fact a sum a “checksum” because of how you are using it seems a) superfluous and b) an invitation to confusion.
    Caveat: people talking about machine code have a jargon all of their own.

    Who’s being idiosyncratic here?

    Good question.

  20. I can play mung’s game:

    A checksum or hash sum is a small-size datum from a block of digital data for the purpose of detecting errors which may have been introduced during its transmission or storage.

    CRCs and several other kinds of calculated data are called checksums, even though they stray pretty far from the concept of addition.

    I’ll go further out on the limb and assert that the only thing checksums have in common is their purpose. Aside from being used to validate stored data, they have little or nothing in common. Not size, not overflow, not the algorithms used to calculate them, nor anything else about their implementation.
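Two algorithms that are both commonly called checksums, yet share little beyond their purpose, can be sketched in C. One is a plain XOR of the bytes, which involves no addition at all. The other is Fletcher-16, which keeps two running sums modulo 255. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR of all bytes: no addition at all, and insensitive to byte order. */
static uint8_t xor8(const unsigned char *data, size_t n) {
    uint8_t x = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        x ^= data[i];
    return x;
}

/* Fletcher-16: two running sums modulo 255, the second accumulating the
   first, which makes the result depend on byte order. */
static uint16_t fletcher16(const unsigned char *data, size_t n) {
    uint16_t s1 = 0, s2 = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        s1 = (uint16_t)((s1 + data[i]) % 255);
        s2 = (uint16_t)((s2 + s1) % 255);
    }
    return (uint16_t)((s2 << 8) | s1);
}
```

For "abcde", fletcher16 gives 0xC8F0. Swapping the last two bytes ("abced") changes it to 0xC9F0 but leaves xor8 unchanged.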

  21. There is a meta-sum, if the ASCII encoding has a parity bit and you add that into the bitsum … enough, already! 🙂

  22. petrushka:

    The checksum fracas is among the least productive arguments ever to grace these web pages.

    DNA_Jock:

    I agree. And yet, like you <heh>, I cannot resist the urge to stick my oar in one more time…

    Petrushka likes to make unproductive comments about unproductive comments. 🙂

    But renaming what is in fact a sum a “checksum” because of how you are using it seems a) superfluous and b) an invitation to confusion.

    Let’s do a comparative confusion analysis.

    Your position:

    A checksum isn’t a sum, and a sum used to check data integrity isn’t a checksum. What makes a checksum a checksum is the potential for overflow, not the fact that it is used to check data integrity. ‘Checksum’ is a misnomer.

    My position:

    A checksum is a sum used to check data integrity.

    I would say that (a), not (b) qualifies as “an invitation to confusion.”

  23. keiths: My position:
    A checksum is a sum used to check data integrity.

    My position:

    A checksum is a calculated datum used to check data integrity, and in some applications, repair data. It is calculated, but not always a sum.

    Why are we arguing about this?

  24. Allan Miller:
    Mung,

    Sigh. No, by removing targets altogether. If you simply convert strings into numbers and differentially reward higher (or lower, if you prefer) values during operation of the algorithm, what does any target have to do with it?

    Ah! I get it. A moving target 🙂

  25. petrushka,

    Why are we arguing about this?

    a) Because it’s fun, at least up to a point — a point we may be rapidly approaching; 🙂 and

    b) because Flint and DNA_Jock are defending a silly position that can be summarized thus:

    A checksum isn’t a sum, and a sum used to check data integrity isn’t a checksum. What makes a checksum a checksum is the potential for overflow, not the fact that it is used to check data integrity. ‘Checksum’ is a misnomer.

  26. I agree that it may be a misnomer. It’s just that no alternative to “sum” trips easily off the tongue, and it’s established jargon.

  27. CharlieM: So we are in agreement that natural selection is the way in which organisms are synchronised to the environment.

    You are consistent in stating things poorly. Not sure what “synchronized” means there. Natural selection is the way in which populations of organisms become better adapted to their environments.

    I think that evolution is a process whereby organisms gain more and more independence from environmental control. For example, witness the move from dependence on external heat to internal thermoregulation, from creatures which let the environment deal with their progeny to creatures which keep their offspring separate from the external environment and nurture them as they mature. Look at the progression from fish up to mammals and birds and this becomes an obvious observation.

    And this is your most common technique: using humans as the pinnacle of creation to construct a scala naturae which does not in fact exist. This, like all your other universal goals of evolution — and you’ve had several so far — does not exist. Homeothermy is an adaptation that’s beneficial in certain circumstances and not in others. There is no “progression” of the sort you frequently imagine. Your observations are at once obvious and wrong. Let’s remember that to a first approximation, all organisms are bacteria.

  28. CharlieM,

    Ah! I get it. A moving target

    No! No target at all! All that happens in that model is that the current population is evaluated by some metric that can be determined with reference to a quality of the population members alone. Much like Natural Selection, indeed. Targets are a red herring.

    I may have confused matters by my suggestion that higher-beats-lower and lower-beats-higher could alternatively be used. I did mean alternatively, and not alternately. But hey, I’m the ‘bitsum’ guy; confusion is my thing.

  29. DNA_Jock,

    Caveat: people talking about machine code have a jargon all of their own.

    It isn’t just assembly-language coders that talk about sums overflowing. High-level language programmers do too, because they have to worry about things like this:

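    #include <stdio.h>

    int main(void) {
      short i;

      printf("%zu\n", sizeof(short)); // prints '2' on my machine

      // A 2-byte short cannot hold 100000 (its maximum is 32767), so on
      // typical compilers i wraps around before ever reaching the limit
      for (i = 0; i < 100000; i++) { // loop never completes
      }

      return 0;
    }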

    You see it all the time. These are from a discussion of overflow in C code:

    Why is the sum of two large integers a negative number in the C language?

    And:

    If a sum overflows the minimum or maximum value, it’s often computed modulo 2^b, yielding a result of the opposite sign.
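The quoted behaviour can be reproduced without relying on undefined behaviour. Signed overflow is undefined in C, so this sketch does the addition in an unsigned 16-bit type, where wraparound modulo 2^16 is well defined, and then reads the resulting bit pattern as two's complement by hand. It assumes b = 16, and the names add16 and as_signed16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add as a 16-bit machine would: keep the low 16 bits, i.e. compute the
   sum modulo 2^16. Unsigned wraparound is well defined in C. */
static uint16_t add16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}

/* Interpret a 16-bit pattern as a two's-complement value without relying
   on implementation-defined signed conversions. */
static int32_t as_signed16(uint16_t bits) {
    return bits >= 0x8000u ? (int32_t)bits - 65536 : (int32_t)bits;
}
```

Adding 30000 + 30000 gives the bit pattern 60000. Read as a signed 16-bit number, that is 60000 - 65536 = -5536, so the sum of two large positive numbers comes out negative.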

  30. petrushka,

    I agree that it may be a misnomer. It’s just that no alternative to “sum” trips easily off the tongue, and it’s established jargon.

    Under your broader definition there’s even less reason to insist, as DNA_Jock and Flint do, that a checksum is a checksum only by virtue of the possibility of overflow.

  31. Just to stir the pot, all the actual implementations of checksums I know about (not many) involve overflow or bit shifting or modulo arithmetic. I don’t know of any algorithms in common use that are just sums as third graders would think of sums.

  32. DNA_Jock,

    You appear to be arguing that since, under certain circumstances, sum and checksum take the same value, then they are the same thing. You might as well argue that sum() is equivalent to product()…

    Not at all. ‘Sum’ and ‘checksum’ are distinct terms with distinct but overlapping meanings.

  33. petrushka,

    Just to stir the pot, all the actual implementations of checksums I know about (not many) involve overflow or bit shifting or modulo arithmetic. I don’t know of any algorithms in common use that are just sums as third graders would think of sums.

    Yes, in real life checksums usually have a fixed width, which means that an additive checksum can overflow if the associated data block is long enough with the right data pattern. However, even if overflow is not possible, as in my example of a 2-byte additive checksum at the end of a 28-character Weasel phrase, the checksum is still a checksum.

    Flint insists that it isn’t, which seems goofy to me:

    IF we stipulate that the value of 5A hex (90 decimal) is the largest permitted value, this will produce a 2-byte maximum value as the checksum ONLY because there is no overflow ((28 * 90) < 65536). But then, it’s not a checksum, it’s only a sum. [Emphasis added]
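The overflow bound in this example is simple arithmetic. The largest code in an A-to-Z-plus-space alphabet is 'Z' = 0x5A = 90, so a 28-character phrase sums to at most 28 × 90 = 2520. The following sketch computes how long a phrase an additive checksum field of a given width can take before overflow becomes possible. The function name is hypothetical, and it assumes widths below 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest character code in the Weasel alphabet (A-Z plus space): 'Z'. */
#define MAX_CODE 0x5A  /* 90 decimal */

/* Longest phrase whose additive checksum is guaranteed to fit in a field
   `bits` wide, i.e. the largest n with n * MAX_CODE <= 2^bits - 1. */
static uint32_t max_len_without_overflow(unsigned bits) {
    assert(bits < 32);
    uint32_t field_max = (UINT32_C(1) << bits) - 1u;
    return field_max / MAX_CODE;
}
```

max_len_without_overflow(16) returns 728, so a 2-byte additive checksum over a 28-character Weasel phrase cannot overflow. max_len_without_overflow(8) returns 2, because three Z's (270) already overflow a one-byte field.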

  34. John Harshman: You are consistent in stating things poorly. Not sure what “synchronized” means there. Natural selection is the way in which populations of organisms become better adapted to their environments.

    okay, agreed, it was a poor choice of wording. How about, in tune with the environment?

    And this is your most common technique: using humans as the pinnacle of creation to construct a scala naturae which does not in fact exist. This, like all your other universal goals of evolution — and you’ve had several so far — does not exist. Homeothermy is an adaptation that’s beneficial in certain circumstances and not in others. There is no “progression” of the sort you frequently imagine. Your observations are at once obvious and wrong. Let’s remember that to a first approximation, all organisms are bacteria.

    I don’t think “goals” is a good term to use. I would prefer to consider evolution as having waypoints.

    You are committed to the belief that evolution has no direction, I am not. How can you be so sure of your beliefs?

    You see evolutionary success as a numbers game. I go for qualities such as individual awareness and consciousness over population sizes. Some things we just can’t put numbers on.

  35. Patrick: You beat me to it.

    It’s high compared to machine code and assembly. It’s also much slower in execution. C++ was originally implemented as a front end that translated C++ into C. It’s also portable.

  36. (responding to my comment that in artificial selection breeders do not specify which genotypes they are trying to achieve):

    CharlieM: Humans have been successfully breeding plants and animals for millennia. They begin with certain phenotypes as a target. Genotypes are an abstraction that they knew nothing about, and so genotypes didn’t concern them. So what do you mean by “desired genotype”? When a farmer goes to a mart to choose a bull to mate with his cows, is he seeking a certain genotype, or does he use his experience and decide by the physical qualities of the animal?

    You are right, and in fact that was the point I was making. If someone argues that artificial selection is “directed” they’ll have to explain why the directing did not involve giving the organism any information about what genetic changes to make. At the genetic level, the result does not come from any front-loading.

  37. CharlieM: okay, agreed, it was a poor choice of wording. How about, in tune with the environment?

    Hard to tell. You consistently use words in a way that doesn’t communicate a clear meaning. “In tune” could mean all sorts of things.

    I don’t think “goals” is a good term to use. I would prefer to consider evolution as having waypoints.

    “Waypoints” doesn’t communicate a clear meaning either.

    You are committed to the belief that evolution has no direction, I am not. How can you be so sure of your beliefs?

    If there were a direction of evolution, we would see massively parallel evolution in that direction in all or at least most lineages. We do not. You only find such a direction by picking one lineage (yours).

    You see evolutionary success as a numbers game. I go for qualities such as individual awareness and consciousness over population sizes. Some things we just can’t put numbers on.

    You go for a subjective criterion, chosen because it puts you at the top of a linear scale, and then proceed to ignore the data anyway. This is even worse, since you declare that the criterion you use to measure success is also the goal (or “waypoint”, whatever that means), and then prove that the most successful species match the criterion. That’s as circular as you can get.

  38. Joe Felsenstein: The same distinction is important in animal or plant breeding. Opponents of evolutionary biology sneer that these are examples of “directed” evolution, and represent the breeder front-loading the information into the breeding program. The breeder is directing the phenotype towards higher values on a scale.

    But the breeder does not know, and cannot control, the exact genotype used to achieve those higher phenotype values.

    I’ve often wondered the same thing about creationists’ response to the directed evolution of proteins or RNA (i.e., scientists are using their intelligence to “inject” information into the protein). Historically, directed evolution came up specifically as an alternative to “rational design”, where biochemists tried to pick the best mutations based on their understanding of the chemistry of enzymes. Directed evolution proved much more effective, even though the “evolutionists” often had no idea what the winning mutations were doing to make the enzyme better.

  39. petrushka: It’s high compared to machine code and assembly. It’s also much slower in execution. C++ was originally implemented as a front end that translated C++ into C. It’s also portable.

    Oh, sure, but relative to languages like Lisp and Python it’s very low level.

    That being said, I like hacking C. It takes me back to my misspent youth.

  40. Patrick: Oh, sure, but relative to languages like Lisp and Python it’s very low level.

    But it’s extensible. When I did C for a living, we built libraries. With a few libraries, programming gets pretty easy. The syntax seems to be the basis for things like JavaScript and even Python.

  41. Joe Felsenstein: You are right, and in fact that was the point I was making. If someone argues that artificial selection is “directed” they’ll have to explain why the directing did not involve giving the organism any information about what genetic changes to make. At the genetic level, the result does not come from any front-loading.

    Because breeders know that the organism must be treated as an interconnected whole. Under normal circumstances the organism is in control of its genetic dynamics just as it is in control of its respiration and its metabolism. The breeder selects for the desired traits and the organism does the rest. The breeders learn through experience and a bit of common sense. If flat-faced dogs are the target of selection, then the animals that are produced will no doubt suffer from respiratory problems.

    As Stephen Talbott puts it:

    You can hardly turn around today without hearing from this or that biologist or philosopher that we have gone beyond old, narrow conceptions of genes as the makers of organisms. And ours is indeed a time of great and bracing change — change, even, that portends revolution. Yet genes are still almost universally regarded among evolutionary theorists as the true bearers of destiny within the organism, which is why “genetic” remains an entrenched, if thoroughly improper, synonym for “heritable”. In other words, the gene is still honored as the one intrinsic factor truly definitive for the life of the organism. Full implications of the fact that organisms live their lives as irreducible wholes remain largely ignored and even taboo.

    The taboo is not hard to understand, since we can fully acknowledge an organism’s capacity to be an organism — its integral power to act, its agency — only by abandoning the materialism and the machine models that have captivated biologists for so long. It’s a difficult step. Faced with the question where the behavioral unity of the organism resides (Who is doing the behaving?) few are prepared to acknowledge that the organism’s coherence is a coherence of intention, idea, and reason operating at an organic level.

    A functioning organism can be studied as a self-enclosed distinct unit. The genome cannot be studied like this in any meaningful way because it is far too entangled in the cell, in the tissues and in the whole.

  42. CharlieM,

    Talbott, Noble, and many others taking this line confuse evolutionary ‘genes’ with developmental ones. They aren’t precisely the same thing. Almost all the elaborate genomics and expression that takes place in multicellular organisms does so in the somatic cells – an evolutionary dead end. Evolutionary genes – simply, lengths of covalently linked DNA of a certain generational persistence – sit there wrapped in this shell, waiting …

  43. petrushka: The syntax seems to be the basis for things like JavaScript and even Python.

    Yes, it is responsible for many modern evils, enforced indentation not being one of them though.

  44. Mung: Yes, it is responsible for many modern evils, enforced indentation not being one of them though.

    Any decent program editor can beautify C code. Or you can get a beautifier from SourceForge.
