Man has created many sophisticated modelling tools, all of which have different strengths and weaknesses. A good ‘solution’ or ‘description’ of a problem should not be overly complex (parsimony) and should also have high descriptive power. Let’s look at how some of these cutting-edge tools compare to Nutonian’s symbolic regression / evolutionary computation product Eureqa:
Looks like we get highly efficient yet incredibly accurate models. Huzzah! Don’t believe me? Try it yourself. They “ran Eureqa on seven test-cases for which data is publically available, and compared performance to four standard machine learning methods. The implementations used were the WEKA codes, with settings optimized for best performance”. Weka is free and the Eureqa trial version is free.
Enjoy the full post here
Unfortunately I think Hod’s interpretation of NFL (No Free Lunch) isn’t quite there, though. 😉
Very interesting. Thanks. I need to check that out.
I was waiting for someone else to comment to see if they spotted the spoof. I can’t tell from Lizzie’s comment whether that is subtle irony or…
This is pretty awesome.
Using it on some data now.
oh. What spoof?
I suppose it isn’t cool simply to say I don’t have a clue.
Okay – some background.
Imagine a complicated equation that describes something. It has a structure whereby the different parts interact in different ways to do different things. It can be long or short. Parts of it may add high or low value. This is very similar to genetics, no?
Eureqa evolves and breeds solutions to problems using genetic structures for equations and culls bad ones based on fitness – how well the equation explains known data.
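That loop — random equations, fitness against known data, cull the worst, vary the survivors — can be sketched in a few dozen lines. This is a toy illustration, not Eureqa’s actual algorithm: the operator set, population size, mutation scheme, and target function are all invented for the example.

```python
import random

# Toy symbolic regression in the spirit of (not identical to) Eureqa:
# candidate equations are expression trees, fitness is how well an
# equation explains known data, and bad equations are culled.

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def random_tree(depth=2):
    """Grow a random expression tree over x and small constants."""
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else random.randint(-3, 3)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Evaluate a tree: leaves are 'x' or constants, nodes are ops."""
    if tree == 'x':
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, data):
    """Mean squared error against the known data (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)

def mutate(tree):
    """Replace a random subtree with a fresh random one."""
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def evolve(data, pop_size=60, generations=40):
    """Keep the fittest third each generation; refill by mutation."""
    pop = [random_tree(3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, data))
        survivors = pop[:pop_size // 3]          # cull the worst
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda t: fitness(t, data))

# Hidden "law": y = x*x + x. Can the population rediscover it?
random.seed(1)
data = [(x, x * x + x) for x in range(-5, 6)]
best = evolve(data)
print(best, fitness(best, data))
```

Because the best survivors are carried over unchanged, the best fitness can only improve generation to generation.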
Ping me offline if you want some help or ideas!
Is this similar to Koza’s GP approach?
I’ve heard of Eureqa, but haven’t the background to fully understand or appreciate it.
Finding equations to fit datasets sounds pretty cool. It’s a bit like an automatic scientific law discoverer.
It would underscore two facets of science that are seldom discussed. One is that science is imaginative and inventive — not simply mapping — and the second is that “laws” are limited in scope and provisional.
Bonus content: Enjoy the video!
Thanks, I will. We have been using SVMs but I’ve been pushing for this in the lab.
I’m deeply ashamed to admit I followed Rich’s link and, as all the jargon completely mesmerised me, I thought it was a Sokal-style spoof.
I shall now follow in the footsteps of Abu Hasan.
Hmmmm. I think we’ll take a look at this as it compares (competes?) with our prop time series modeling technology (GA-evolvable + perceptron-ish memory + secret sauce).
I was wondering if you’d come across this, RBH 🙂
Intelligently Designed computer programs. HUZZAH!
For those who haven’t, in their faux skepticism, actually bothered to read the post you linked to in your OP, there is no mention of evolution. Or Intelligent Design.
Nor is there any pretense to having solved the no free lunch problem. Here’s a direct quote: “There is no free lunch.”
Nor are the data sets used biological (feel free to correct me).
But the method, now that’s interesting: Symbolic regression
What’s your point, really?
Oh, I get it, all those other machine learning algorithms were not using “evolution!”
Color me dense.
But then, they weren’t using “intelligent design,” either. What’s your point?
So SVMs are naturally occurring?
The joys of watching a driveby in a cul-de-sac.
This might help you out a bit, Mung: http://www.mafy.lut.fi/EcmiNL/older/ecmi35/node70.html
Explain again how symbolic regression excludes the alternatives?
Thanks for the link, but it hardly clarifies whatever point it is you’re attempting to make.
A better EA than EA’s?
Data of unknown process. So your point here is that SR can distinguish design from evolution?
Color me stupid.
Richardthughes: So SVMs are naturally occurring?
You’re pretending to respond to an actual statement made by someone in this thread?
To help Mung and Chubs, who is having trouble over on his own blog:
Man is quite a clever problem solver. We use pattern recognition, intelligence and exogenous information to design these classes of models. They do okay, but apparently (empirically) not as well as GAs, which without our intelligence (or hubris, or preconceptions) find novel ways to build explanatory frameworks. Consider this: https://www.youtube.com/watch?v=MSo6eeDsFlE
Would you have got there through experimental data? (yes Joe, we know you would have, using choo-choo math).
Oh, and Joe, pick a point of ignorance and stick to it. If you think evolution is okay but was initially configured, don’t be upset with Lenski or Tiktaalik’s location. If you despise all things evolution (and you do, poster child for ID is simply upset with evolution) then also be upset with Eureqa’s outperformance.
So these solutions are designed by an intelligence.
Well, Mung, of the methods mentioned in the OP, the only one I haven’t come across is “regression trees”. Of the remainder, Eureqa is the most similar to evolution, although they all have some similarities. Even linear regression, if you use Maximum Likelihood Estimation for finding the best fit, rather than Ordinary Least Squares, involves an iterative optimisation method that includes randomly chosen values. The others all involve iterating generations of solutions, with random variation, until the best fit is found.
So I think the point of the OP is that the more similar a fitting algorithm is to real-life evolution, the better it is at finding solutions that are both simple and accurate; the downside is that it takes far more time (because many, many generations are needed).
I’ve just been running Eureqa on some brain imaging data, to find an equation that distinguishes between two groups of people, using the time courses of their brain oscillations. We’ve been using SVMs for this in the past, but this looks potentially much more promising, as it allows for non-linear equations, and if…then statements.
So whatever the implications for the smartness of evolution over intelligent design, it looks like a pretty neat tool.
The reason it more closely resembles real-life evolution than the others is that it is far higher-dimensioned – there are far more ways in which it can vary. It also uses sexual reproduction, which makes it more efficient (as it does in real life, allowing even slowly-reproducing organisms to evolve efficiently).
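Subtree crossover — the genetic-programming analogue of the sexual reproduction mentioned above — can be sketched like this. It is a generic GP operator, not Eureqa’s internals, and the tuple encoding of expression trees is assumed purely for illustration:

```python
import random

# Expression trees are encoded as leaves ('x' or a number) and
# tuples of the form ('op', left, right). Crossover grafts a random
# subtree from one parent into a random position in the other.

def size(tree):
    """Number of nodes in the tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + size(tree[1]) + size(tree[2])

def subtree_at(tree, i):
    """Return the subtree rooted at preorder index i."""
    if i == 0:
        return tree
    op, left, right = tree          # i > 0 implies an internal node
    ls = size(left)
    if i <= ls:
        return subtree_at(left, i - 1)
    return subtree_at(right, i - 1 - ls)

def replace_at(tree, i, donor):
    """Return a copy of tree with the node at preorder index i replaced."""
    if i == 0:
        return donor
    op, left, right = tree
    ls = size(left)
    if i <= ls:
        return (op, replace_at(left, i - 1, donor), right)
    return (op, left, replace_at(right, i - 1 - ls, donor))

def crossover(a, b, rng=random):
    """Child = parent a with one random subtree swapped in from b."""
    donor = subtree_at(b, rng.randrange(size(b)))
    return replace_at(a, rng.randrange(size(a)), donor)

mum = ('+', ('*', 'x', 'x'), 'x')   # x*x + x
dad = ('-', 'x', 2)                 # x - 2
random.seed(0)
print(crossover(mum, dad))
```

A single graft can move a whole working sub-expression between lineages, which is why recombination explores the space so much faster than point mutation alone.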
Of course the algorithm itself is “intelligently designed” – computer programs don’t write themselves. But that is a quite different question, and raises the question of whether an evolutionary system could come about naturally, as opposed to the question of whether evolutionary systems can find solutions to problems that an intelligent designer can’t (without using an evolutionary system).
Which itself raises another question: human designers need to use evolutionary algorithms to solve very complex problems. If they were to attempt to design a living thing, I suggest they would use an evolutionary algorithm. So any argument for ID that is based on analogy with human intelligence (and most are) needs to address the question: would the putative Designer have to use an evolutionary algorithm to do it? Because we have no evidence that human designers can tackle such a task without using evolutionary algorithms.
Iterative fitting, with random variation along a large number of dimensions, keeping the best and discarding the worst, is an extraordinarily creative process. Watching Eureqa find equations for my data is mesmerising.
Now I understand – I did not get the point of Eureqa from the OP.
What is also fascinating is seeing punk eek at work. The algorithm toddles along in a nice little niche, not changing much over many thousands of generations, because most variants are worse than the best of the current variants; then suddenly – dramatic change, and a quite different critter starts to dominate the population.
And what is also neat is that often that new critter is quite clumsy, with a lot of parameters, but as parameters are penalised, it very rapidly slims down. So I might get a new successful equation with, say, 20 parameters. I look again in half an hour, and I have a similarly fit equation with only 14, or even 12.
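That slimming-down is what a complexity penalty produces: once fits are comparable, fewer parameters wins. A toy illustration — the penalty weight and the scores are invented, not Eureqa’s internals, though AIC-style criteria work the same way:

```python
def penalised(error, n_params, weight=0.5):
    """Score = raw error plus a charge per parameter (lower is better)."""
    return error + weight * n_params

clumsy = penalised(error=1.00, n_params=20)  # first successful variant
slim = penalised(error=1.05, n_params=12)    # slightly worse fit, far leaner
print(clumsy, slim)  # 11.0 7.05 -- the leaner equation wins
```

With any positive weight, a variant that keeps nearly all the accuracy while shedding parameters outscores its clumsy ancestor, so the population drifts toward parsimony on its own.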
If you have a big problem, their cloud solution scales really well (basically a re-sell of the Amazon cloud). It’s just a function of cores and time. Also consider limiting the functions that are ‘genomically available’.
I’m not sure that I understand how it deals with time series (which are what I’ve got). I’m just playing right now, but I’d like it to be able to fit functions to time-series for a set of independent series.
I’ll ping you 🙂
How do we determine how similar a “fitting algorithm” is to “real-life evolution”?
You’re implying that evolution finds “solutions” that are both simple and accurate. How do you know this?
And evolution doesn’t?
Eureqa returns a Pareto frontier that trades off complexity for explanatory power. Complexity is a bit like genomic length (although some expressions count for more) and explanatory power is any of the standard accuracy measures (AIC does well).
How does it know what “complexity” is, and what “explanatory power” is and what algorithm does it use to calculate the “tradeoff”?
Keep in mind, of course, that this a better ID than ID.
“Complexity is a bit like genomic length (although some expressions count for more) and explanatory power is any of the standard accuracy measures (AIC does well)”
complexity = length (as far as you’re concerned) and accuracy = how well it fits existing data.
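And the “tradeoff” isn’t collapsed into a single score: the frontier keeps every equation that no rival beats on both counts at once. A sketch of that Pareto idea — the (complexity, error) pairs here are made up for illustration:

```python
def pareto_front(models):
    """Keep each (complexity, error) pair that is not dominated,
    i.e. no rival is at least as simple AND at least as accurate
    (strictly better on one of the two)."""
    front = []
    for m in models:
        dominated = any(o != m and o[0] <= m[0] and o[1] <= m[1]
                        for o in models)
        if not dominated:
            front.append(m)
    return sorted(front)

# (complexity, error): e.g. (3, 4.5) loses to (2, 4.0), which is
# both simpler and more accurate, so it drops off the frontier.
candidates = [(1, 9.0), (2, 4.0), (3, 4.5), (4, 1.0), (6, 0.9), (7, 2.0)]
print(pareto_front(candidates))  # [(1, 9.0), (2, 4.0), (4, 1.0), (6, 0.9)]
```

The user then reads down the frontier and picks where to stop paying complexity for accuracy.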
More to bring Mung up to speed:
It looks at the output of ID and says “it’s not that”.
Remember, Mung, that we are comparing a model of biological optimisation (namely the evolutionary hypothesis) with that model applied to an actual optimisation problem. We know our algorithm resembles the biological model because we can look and see that they are the same model.
The point is that the algorithm does optimise very efficiently. Therefore the charge that it couldn’t, in a biological context, doesn’t fly.
I’m “implying” – actually reporting – that the evolutionary algorithm finds solutions that are both simple and accurate, because I can look at the solutions, and find that they are both simple and accurate – simple, in that they use a very small number of data and parameters, and accurate, because they result in very good predictions, as measured by any number of goodness-of-fit measures, e.g. an R².
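For concreteness, the R² mentioned above is just one minus the ratio of residual to total variance; a minimal sketch (the sample data is invented):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1.0 means a perfect fit,
    0.0 means no better than always predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.0]))
```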
Not sure what you mean. Evolutionary algorithms require a large number of generations to optimise.
I think at least one of us is confused here.
Remember Elizabeth, we are working with data and models. I don’t remember seeing any biological data in the dataset. Maybe I missed it.
No one claimed we were using “biological data”; we make claims about the efficacy of evolutionary methods.
More specifically, we are commenting on Dembski’s mathematical model.
Remember those claims about “information”? About semiosis and such?
When you claim that the behavior of chemistry can be abstracted, you open the door to abstract models. Those models can be tested mathematically.
They do not prove that chemistry works that way, but they prove that to the extent evolution can be abstracted as information, it works.
The end of all ID claims involving probability.