This is just an effort to help keep Joe’s thread focused and to help keep it from being derailed. People can use it or not. I hope they will.
CharlieM: Can someone explain to me, why is all of this not just a model of directed evolution? Surely it is set up to be directed towards a target?
Allan Miller: It gets towards the target by means of variation and selection of genotypes in the current population. The programmer does not direct it towards the target. Indeed, there would be no point in writing GAs for problem solution if it were simply a matter of specifying a target and directing the program to find it.
Consider a small modification – instead of distance from target, evaluate fitness by adding up the ASCII bits. Those with the greater sum are fitter than those with a lesser. There is no mention of a distant target – although it is clear that the program will converge on a string of all Zs, the program doesn’t know this. It is not drawn by, nor directed towards, that target. It is simply doing generational evaluation of fitter and less fit genotypes in the current population.
Discuss.
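For anyone who wants to try the modification described above, here is a minimal sketch rather than anyone's actual Weasel program. It assumes a character set of A to Z plus space, a 28-character string, a brood of 100 mutated copies per generation, a 5% per-character mutation rate, and that the parent competes with its offspring. All of those are illustrative choices, as are the names `ascii_sum`, `mutate` and `evolve`. Fitness is the sum of the character codes and nothing else; no target string appears anywhere in the code.

```ruby
ALPHABET = ('A'..'Z').to_a + [' ']  # assumed Weasel-style character set
LENGTH   = 28                       # assumed string length

# Fitness is computed from the string alone: the sum of its character codes.
def ascii_sum(str)
  str.bytes.sum
end

# Copy a parent, replacing each character with a random one at a small rate.
def mutate(parent, rate, rng)
  parent.each_char.map { |c| rng.rand < rate ? ALPHABET.sample(random: rng) : c }.join
end

# Run the given number of generations from a random start and return the survivor.
def evolve(generations:, offspring: 100, rate: 0.05, seed: 1)
  rng = Random.new(seed)
  current = Array.new(LENGTH) { ALPHABET.sample(random: rng) }.join
  generations.times do
    brood = Array.new(offspring) { mutate(current, rate, rng) }
    brood << current                              # assumption: parent competes too
    current = brood.max_by { |s| ascii_sum(s) }   # keep the highest sum in this brood
  end
  current
end

start = evolve(generations: 0)   # the random starting string
later = evolve(generations: 50)  # same seed, so same start, after 50 rounds
puts start
puts later
```

Because the parent stays in the brood, the sum can never decrease from one generation to the next. The only comparison made is between members of the current brood.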
Mung,
So a program I didn’t write would fail to get all Z’s if I’d provided too small a summing field. But I’d have to actually write it, complete with bug, then fix the bug, and ‘this is what I get’ for not so doing.
Another devastating point for the Mung! It’s not about understanding the concept, it’s about the bugs!
Flint,
No, my goal was to illustrate a process that did NOT have an explicit target. Therefore, comparing some numerical representation of strings in which they were numerically ‘higher’ or ‘lower’ seemed a fair way to do it. I was trying to divert focus from this ‘target’ notion to emphasise that the convergent result – which may have been intended or not – is something that falls out of intermediate population-level process. By using numerical sum, (and a certain care in implementation!) all Z’s is the end result only because the program can’t go anywhere else. It is not the ‘goal’. One could flip the operator in the comparison and end up with all spaces. One could pick an intermediate value, and have a range of possible fittest genotypes – multiple peaks of equal height. None of these would be goals either, but a simple end-run consequence of the selection operation.
Using the sum of bits, of course, would also result in multiple peaks – not least, all anagrams of the maximally fit ‘phrase’. So, is it the phrase or the optimal bit sum which is the ‘target’ in this implementation?
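The points about flipping the comparison and about multiple peaks can be checked with a few lines. This is a sketch under the assumption that the character set is A to Z plus space and the string is 28 characters long; the tiny population and the names `ascii_sum` and `bit_count` are made up for illustration.

```ruby
ALPHABET = ('A'..'Z').to_a + [' ']  # assumed character set

# Sum of the character codes.
def ascii_sum(str)
  str.bytes.sum
end

# Literal sum of bits: how many 1 bits the character codes contain.
def bit_count(str)
  str.bytes.sum { |b| b.to_s(2).count('1') }
end

population = ['ZEBRA', 'AZURE', 'BRAZE', '     ']

# Same metric, opposite operator: one favours high sums, the other low sums.
highest = population.max_by { |s| ascii_sum(s) }
lowest  = population.min_by { |s| ascii_sum(s) }

# Anagrams always tie, because addition ignores order.
anagrams_tie = ascii_sum('ZEBRA') == ascii_sum('BRAZE')

# Under the bit-count metric, several characters share the maximum.
best_bits = ALPHABET.map { |c| bit_count(c) }.max
top_chars = ALPHABET.select { |c| bit_count(c) == best_bits }
peaks     = top_chars.size**28  # equally fit 28-character strings

puts highest, lowest.inspect, anagrams_tie, top_chars.inspect, peaks
```

With this character set the bit-count maximum is shared by O and W, so every 28-character string built from those two letters is equally and maximally fit.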
Yes. I think keiths is using ‘checksum’ to mean a sum (with or without overflow) that is used to check data integrity. It’s a definition-by-use. But if the calculation CANNOT overflow, then why not just use the word ‘sum’? IOW I agree with Flint that the word ‘checksum’ implies the result of a calculation that permits (and discards) overflow.
keiths attempts to justify his idiosyncratic definition with examples of the machine-code meaning of the word ‘sum’. A machine-code (two’s complement) ‘sum’ will produce a mathematical (standard usage of the word) ‘sum’ if and only if there is no overflow. If the possibility of unmanaged overflow exists, then you don’t know whether your ‘sum’ is correct or not; therefore we don’t call it a ‘sum’, which could be misleading, we coin the term ‘checksum’…
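The distinction being argued over can be made concrete. The sketch below is an illustration only: it uses unsigned masking rather than two's complement, treats the data as bytes in the range 0 to 255, and the names `plain_sum`, `additive_checksum` and `overflow_possible?` are hypothetical. It compares an unbounded sum with the same addition confined to a fixed-width field.

```ruby
# Mathematical sum: Ruby integers are arbitrary precision, so nothing is discarded.
def plain_sum(bytes)
  bytes.sum
end

# Fixed-width additive checksum: the same addition, but only the low
# width_bits bits are kept, so any carry out of the field is discarded.
def additive_checksum(bytes, width_bits)
  bytes.sum & ((1 << width_bits) - 1)
end

# True if some block of `length` bytes (each at most 255) could exceed the field.
def overflow_possible?(length, width_bits)
  length * 255 >= (1 << width_bits)
end

phrase = 'METHINKS IT IS LIKE A WEASEL'.bytes  # the 28-character Weasel phrase

puts plain_sum(phrase)                        # unbounded sum
puts additive_checksum(phrase, 16)            # 2-byte field
puts additive_checksum(phrase, 8)             # 1-byte field
puts overflow_possible?(phrase.length, 16)
puts overflow_possible?(phrase.length, 8)
```

For 28 bytes the 2-byte field can never overflow, so its value always equals the plain sum; the 1-byte field can overflow, and for this phrase it does. Whether the first case deserves the name ‘checksum’ is exactly what the thread disagrees about, and the code does not settle that.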
A quarter of a billion of them. I was hoping we could persuade Mung to code this bitsum version, so he could “prove” everyone wrong.
Including himself.
OOOOOOOOOOWWWWWWOOOOOOWWWWWW
Well might you ask. The same distinction is important in animal or plant breeding. Opponents of evolutionary biology sneer that these are examples of “directed” evolution, and represent the breeder front-loading the information into the breeding program. The breeder is directing the phenotype towards higher values on a scale.
But the breeder does not know, and cannot control, the exact genotype used to achieve those higher phenotype values. As in the bitsum case, there is no information about the desired genotypes imparted by the breeder, except for the target value of the phenotype.
It would be interesting to see if a GA could converge on a checksum.
Sum of bits might be one kind of checksum, but there are others, including CRC that would be difficult.
One of the points of contention by IDists is the “isolated islands” claim. How likely is it that a mutation will be favorable or neutral?
The CRC is designed to reject mutations, however small, but there are other checksum algorithms that are more permissive.
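A quick way to see the difference in permissiveness is to compare an additive checksum with CRC-32 on small changes. This sketch uses `Zlib.crc32` from Ruby's standard library; the 16-bit `additive` helper and the example strings are assumptions for illustration. It shows only how the two functions react to small edits, not how a GA would behave when scored by them.

```ruby
require 'zlib'

# 16-bit additive checksum: sum of the bytes, low 16 bits kept.
def additive(str)
  str.bytes.sum & 0xFFFF
end

original = 'METHINKS IT IS LIKE A WEASEL'
swapped  = 'METHINKS IT IS LIKE A WEASLE'  # last two letters transposed
mutated  = 'METHINKS IT IS LIKE A WEASEK'  # one letter changed

[original, swapped, mutated].each do |s|
  puts format('%-30s sum=%5d crc32=%08x', s, additive(s), Zlib.crc32(s))
end
```

The additive checksum cannot tell the original from the transposed copy, because addition ignores order. CRC-32 distinguishes all three strings.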
Now that is funny. That’s a keiths keeper.
Often it appears that creationists and IDists argue that favorable mutations are impossible (for some obscure not-quite-explained reasons involving information theory and where information comes from).
So, for example, if we have a functional gene and at position 179 in Exon 1, a C mutates to a G, and renders the protein less functional, that is possible. But can that G later mutate back to a C? They seem to think that, for some information-theoretic reason, this is impossible.
By replacing it with an implicit target?
After the following code is executed, how is the target any less explicit?
char_set = (32..126).map {|i| i.chr}.shuffle.slice(0, 27)
target = 28.times.map {char_set.sample}
You can even print it out.
puts target.join
“aR4sqZ~abgR4qRq8h[)q)8s))heO”
Wouldn’t the back mutation be less likely than some arbitrary mutation?
Is Weasel an example of directed evolution? Perhaps not.
It could be that it is not a directed search. But it sure acts like one.
It could be that it is not an evolutionary algorithm. But it was certainly written as if it was meant to be one.
So we are in agreement that natural selection is the way in which organisms are synchronised to the environment. But I’m not so sure you will agree when I say that I don’t think that natural selection drives the evolution of life as a whole.
I think that evolution is a process whereby organisms gain more and more independence from environmental control. For example, witness the move from dependence on external heat to internal thermoregulation, from creatures which let the environment deal with their progeny to creatures which keep their offspring separate from the external environment and nurture them as they mature. Look at the progression from fish up to mammals and birds and this becomes an obvious observation.
The evolution of life is not a case of the environment giving direction to the variable life forms. It is a case of life growing out of and separating itself from the environment. And human lives are prime examples of organisms which have separated furthest from the external environment which they find themselves in.
Humans have been successfully breeding plants and animals for millennia. They begin with certain phenotypes and they have certain phenotypes as a target. Genotypes are an abstraction that they knew nothing about and so it didn’t concern them. So what do you mean by “desired genotype”? When a farmer goes to a mart to choose a bull to mate with his cows, is he desiring a certain genotype, or does he use his experience and decide by the physical qualities of the animal?
Mung, you are adept at two dimensional word lawyering, and little else. Everything is black and white, when it serves your argument.
petrushka,
What else can he do? It’s not like he has an actual case to make against Weasel.
DNA_Jock,
Your (and Flint’s) position is
a) that a checksum is not a sum used to check data integrity;
b) that whoever coined the term ‘checksum’ was therefore wrong to do so;
c) that a sum used to check data integrity becomes a checksum only when it ceases to be a sum; and
d) that the same set of bytes at the end of the same data block used for the same data-checking purpose is a checksum in one case but not another.
Who’s being idiosyncratic here?
Mung,
Sigh. No, by removing targets altogether. If you simply convert strings into numbers and differentially reward higher (or lower, if you prefer) values during operation of the algorithm, what does any target have to do with it?
Your response missed the point completely. I deliberately removed comparison of population strings with any other string that is not in the population. My earlier-mentioned process would not compare any string with all Z’s, or indeed any other target. Your response is to say “what about this target?”.
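The difference between the two kinds of scoring shows up in what each function needs as input. The sketch below is illustrative only; the population, the target string and the names `matches_with` and `ascii_sum` are made up for the example.

```ruby
# Target-based scoring: needs a string from outside the population.
def matches_with(candidate, target)
  candidate.chars.zip(target.chars).count { |a, b| a == b }
end

# Target-free scoring: depends on the candidate alone.
def ascii_sum(candidate)
  candidate.bytes.sum
end

population = ['HELLO WORLD', 'JELLY WORLD', 'ZZZZZ ZZZZZ']
target     = 'HELLO WORLD'  # only the first scoring function ever sees this

best_by_target = population.max_by { |s| matches_with(s, target) }
best_by_sum    = population.max_by { |s| ascii_sum(s) }

puts best_by_target
puts best_by_sum
```

The second function has no parameter through which a target could be supplied. Its ranking comes entirely from comparing members of the population with each other.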
The checksum fracas is among the least productive arguments ever to grace these web pages.
It really is as pointless as mung’s word lawyering.
How about each side just post the definition it wants to use and let it be?
I’ll post mine:
A checksum is the result of an algorithm designed to test data integrity. Period.
Since there are lots of algorithms and lots of implementations for specialized purposes, nothing else necessarily follows.
I agree. And yet, like you <heh>, I cannot resist the urge to stick my oar in one more time…
keiths writes:
Well, a checksum will take the same value as a sum under certain circumstances. My point is that that does not make them equivalent. You appear to be arguing that since, under certain circumstances, sum and checksum take the same value, then they are the same thing. You might as well argue that sum() is equivalent to product()…
No, they were right to coin a new term, because the functions are NOT equivalent. Your complaint appears to hinge on the presence of the letters S U and M in the word ‘checksum’. By your logic, the use of the term <x co-ordinate> is wrong, viz: “It’s not an ordinate, it’s an abscissa, dammit!”
It doesn’t become anything. If it may differ from a sum, then it’s a different function, whether or not it actually gives a different result. See sum() and product() above.
People are welcome to use a sum to check data integrity. I’ve done it myself. They could use a product, or any other function, as Flint noted. But renaming what is in fact a sum a “checksum” because of how you are using it seems a) superfluous and b) an invitation to confusion.
Caveat: people talking about machine code have a jargon all of their own.
Good question.
I can play mung’s game:
CRCs and several other kinds of calculated data are called checksums, even though they stray pretty far from the concept of addition.
I’ll go further out on the limb and assert that the only thing checksums have in common is their purpose. Aside from being used to validate stored data, they have little or nothing in common. Not size, not overflow, not the algorithms used to calculate them, nor anything else about their implementation.
There is a meta-sum, if the ASCII encoding has a parity bit and you add that into the bitsum … enough, already! 🙂
petrushka:
DNA_Jock:
Petrushka likes to make unproductive comments about unproductive comments. 🙂
Let’s do a comparative confusion analysis.
Your position:
My position:
I would say that (a), not (b) qualifies as “an invitation to confusion.”
My position:
A checksum is a calculated datum used to check data integrity, and in some applications, repair data. It is calculated, but not always a sum.
Why are we arguing about this?
Ah! I get it. A moving target 🙂
petrushka,
a) Because it’s fun, at least up to a point — a point we may be rapidly approaching; 🙂 and
b) because Flint and DNA_Jock are defending a silly position that can be summarized thus:
I agree that it may be a misnomer. It’s just that no alternative to “sum” trips easily off the tongue, and it’s established jargon.
You are consistent in stating things poorly. Not sure what “synchronized” means there. Natural selection is the way in which populations of organisms become better adapted to their environments.
And this is your most common technique: using humans as the pinnacle of creation to construct a scala naturae which does not in fact exist. This, like all your other universal goals of evolution — and you’ve had several so far — does not exist. Homeothermy is an adaptation that’s beneficial in certain circumstances and not in others. There is no “progression” of the sort you frequently imagine. Your observations are at once obvious and wrong. Let’s remember that to a first approximation, all organisms are bacteria.
CharlieM,
No! No target at all! All that happens in that model is that the current population is evaluated by some metric that can be determined with reference to a quality of the population members alone. Much like Natural Selection, indeed. Targets are a red herring.
I may have confused matters by my suggestion that higher-beats-lower and lower-beats-higher could alternatively be used. I did mean alternatively, and not alternately. But hey, I’m the ‘bitsum’ guy; confusion is my thing.
DNA_Jock,
It isn’t just assembly-language coders that talk about sums overflowing. High-level language programmers do too, because they have to worry about things like this:
You see it all the time. These are from a discussion of overflow in C code:
And:
petrushka,
Under your broader definition there’s even less reason to insist, as DNA_Jock and Flint do, that a checksum is a checksum only by virtue of the possibility of overflow.
Just to stir the pot, all the actual implementations of checksums I know about (not many) involve overflow or bit shifting or modulo arithmetic. I don’t know of any algorithms in common use that are just sums as third graders would think of sums.
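As one example of a checksum that leans on modulo arithmetic rather than a plain sum, here is a sketch of Fletcher-16. It keeps two running sums, each reduced modulo 255, and packs them into 16 bits. The function name `fletcher16` is chosen for this example.

```ruby
# Fletcher-16: two running sums, each kept modulo 255.
# sum1 is a plain running sum of the bytes; sum2 is a running sum of sum1,
# which makes the result depend on the order of the bytes.
def fletcher16(bytes)
  sum1 = 0
  sum2 = 0
  bytes.each do |b|
    sum1 = (sum1 + b) % 255
    sum2 = (sum2 + sum1) % 255
  end
  (sum2 << 8) | sum1
end

puts format('%04x', fletcher16('abcde'.bytes))  # prints c8f0
puts format('%04x', fletcher16('edcba'.bytes))  # same letters, different order
```

The low byte is the same for both strings, because it is just the byte sum modulo 255. The high byte differs, so unlike a plain additive checksum this one does not treat anagrams as equal.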
Now you’re just word lawyering.
DNA_Jock,
Not at all. ‘Sum’ and ‘checksum’ are distinct terms with distinct but overlapping meanings.
petrushka,
Yes, in real life checksums usually have a fixed width, which means that an additive checksum can overflow if the associated data block is long enough with the right data pattern. However, even if overflow is not possible, as in my example of a 2-byte additive checksum at the end of a 28-character Weasel phrase, the checksum is still a checksum.
Flint insists that it isn’t, which seems goofy to me:
haha. keiths thinks C is a high level language.
You beat me to it.
okay, agreed, it was a poor choice of wording. How about, in tune with the environment?
I don’t think “goals” is a good term to use. I would prefer to consider evolution as having waypoints.
You are committed to the belief that evolution has no direction, I am not. How can you be so sure of your beliefs?
You see evolutionary success as a numbers game. I go for qualities such as individual awareness and consciousness over population sizes. Some things we just can’t put numbers on.
My attempt at humour failed.
It’s high compared to machine code and assembly. It’s also much slower in execution. C++ was originally implemented as a translator that emitted C. It’s also portable.
You are right, and in fact that was the point I was making. If someone argues that artificial selection is “directed” they’ll have to explain why the directing did not involve giving the organism any information about what genetic changes to make. At the genetic level, the result does not come from any front-loading.
Hard to tell. You consistently use words in a way that doesn’t communicate a clear meaning. “In tune” could mean all sorts of things.
“Waypoints” doesn’t communicate a clear meaning either.
If there were a direction of evolution, we would see massively parallel evolution in that direction in all or at least most lineages. We do not. You only find such a direction by picking one lineage (yours).
You go for a subjective criterion, chosen because it puts you at the top of a linear scale, and then proceed to ignore the data anyway. This is even worse, since you declare that the criterion you use to measure success is also the goal (or “waypoint”, whatever that means), and then prove that the most successful species match the criterion. That’s as circular as you can get.
I’ve often wondered the same thing about creationists’ response to the directed evolution of proteins or RNA (i.e., scientists are using their intelligence to “inject” information into the protein). Historically, directed evolution came up specifically as an alternative to “rational design”, where biochemists tried to pick the best mutations based on their understanding of the chemistry of enzymes. Directed evolution proved much more effective, even though the “evolutionists” often had no idea what the winning mutations were doing to make the enzyme better.
CharlieM,
Sorry! 🙂
Oh, sure, but relative to languages like Lisp and Python it’s very low level.
That being said, I like hacking C. It takes me back to my misspent youth.
But it’s extensible. When I did C for a living, we built libraries. With a few libraries, programming gets pretty easy. The syntax seems to be the basis for things like javascript and even python.
Because breeders know that the organism must be treated as an interconnected whole. Under normal circumstances the organism is in control of its genetic dynamics just as it is in control of its respiration and its metabolism. The breeder selects for the desired traits and the organism does the rest. The breeders learn through experience and a bit of common sense. If flat faced dogs are the target of selection then the animals that are produced will no doubt suffer from respiratory problems.
As Stephen Talbott puts it:
A functioning organism can be studied as a self-enclosed distinct unit. The genome cannot be studied like this in any meaningful way because it is far too entangled in the cell, in the tissues and in the whole.
CharlieM,
Talbott, Noble, and many others taking this line confuse evolutionary ‘genes’ with developmental ones. They aren’t precisely the same thing. Almost all the elaborate genomics and expression that takes place in multicellular organisms does so in the somatic cells – an evolutionary dead end. Evolutionary genes – simply, lengths of covalently linked DNA of a certain generational persistence – sit there wrapped in this shell, waiting …
Yes, it is responsible for many modern evils, enforced indentation not being one of them though.
Any decent program editor can beautify C code. Or you can get a beautifier from sourceforge.