An odd post by “news” at UD raises yet again the issue of Fisherian *p* values – and reveals yet again that many ID proponents don’t understand them.

She (I assume it is Denyse) writes:

Further to “Everyone seems to know now that there’s a problem in science research today” and “At a British Medical Journal blog, a former editor says, medical research is still a scandal,” Ronald Fisher’s p-value measure, a staple of research, is coming under serious scrutiny.

Many will remember Ronald Fisher (1890–1962) as the early twentieth century Darwinian who reconciled Darwinism with Mendelian genetics, hailed by Richard Dawkins as the greatest biologist since Darwin. His original idea of p-values (a measure of whether an observed result can be attributed to chance) was reasonable enough, but over time the dead hand got hold of it:

Many at UD may also “remember” Ronald Fisher as the early twentieth century statistician who inspired William Dembski’s eleP(T|H)ant.

In Specification: The Pattern that Signifies Intelligence, Dembski writes:

In Fisher’s approach to testing the statistical significance of hypotheses, one is justified in rejecting (or eliminating) a chance hypothesis provided that a sample falls within a prespecified rejection region (also known as a critical region).^{6} For example, suppose one’s chance hypothesis is that a coin is fair. To test whether the coin is biased in favor of heads, and thus not fair, one can set a rejection region of ten heads in a row and then flip the coin ten times. In Fisher’s approach, if the coin lands ten heads in a row, then one is justified in rejecting the chance hypothesis.

Fisher’s approach to hypothesis testing is the one most widely used in the applied statistics literature and the first one taught in introductory statistics courses. Nevertheless, in its original formulation, Fisher’s approach is problematic: for a rejection region to warrant rejecting a chance hypothesis, the rejection region must have sufficiently small probability. But how small is small enough? Given a chance hypothesis and a rejection region, how small does the probability of the rejection region have to be so that if a sample falls within it, then the chance hypothesis can legitimately be rejected? Fisher never answered this question. The problem here is to justify what is called a significance level such that whenever the sample falls within the rejection region and the probability of the rejection region given the chance hypothesis is less than the significance level, then the chance hypothesis can be legitimately rejected.

More formally, the problem is to justify a significance level α (always a positive real number less than one) such that whenever the sample (an event we will call E) falls within the rejection region (call it T) and the probability of the rejection region given the chance hypothesis (call it H) is less than α (i.e., P(T|H) < α), then the chance hypothesis H can be rejected as the explanation of the sample. In the applied statistics literature, it is common to see significance levels of .05 and .01. The problem to date has been that any such proposed significance levels have seemed arbitrary, lacking “a rational foundation.”^{7}
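Dembski’s coin example can be run as a minimal Fisherian test: under the chance hypothesis H (a fair coin), the rejection region T (ten heads in a row) has probability 2^-10 ≈ 0.001, which falls below a conventional significance level. A sketch:

```python
# Minimal sketch of the Fisherian test in Dembski's coin example:
# H is "the coin is fair", T is "ten heads in a row", and we check
# whether P(T|H) falls below a conventional significance level alpha.

p_T_given_H = 0.5 ** 10   # probability of ten heads from a fair coin
alpha = 0.01              # a commonly used significance level

print(f"P(T|H) = {p_T_given_H:.6f}")  # ≈ 0.000977
print("reject H" if p_T_given_H < alpha else "fail to reject H")  # reject H
```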

The only problem Dembski sees in Fisher’s approach, which he otherwise adopts, is that of setting a sufficiently conservative rejection threshold that would enable us to reject a hypothesis as “impossible”, and his somewhat bizarre solution was an arbitrarily conservative threshold based on his misunderstanding of a paper by Seth Lloyd. Most scientists are perfectly happy with something much less stringent: 5 sigma, or even 3, rather than Dembski’s 23.
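Those sigma thresholds can be translated into one-sided tail probabilities of a standard normal using the complementary error function from Python’s standard library; a minimal sketch:

```python
import math

def one_sided_p(sigma: float) -> float:
    """One-sided tail probability of a standard normal beyond `sigma`."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

# 3 sigma ≈ 1.3e-3, 5 sigma ≈ 2.9e-7; 23 sigma is vanishingly small.
for s in (3, 5, 23):
    print(f"{s} sigma -> p ≈ {one_sided_p(s):.3e}")
```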

But the problem of the appropriate alpha criterion is not the problem articulated in the Nature article by Regina Nuzzo that Denyse cites, and Dembski’s “solution” would not solve it. What Nuzzo draws attention to is that the *p* value that is the output of Fisherian null hypothesis testing *is not the probability we want to know.* Jacob Cohen’s classic article, The Earth Is Round (p < .05), is 20 years old this year, and IMO should be required reading for all ID proponents (and indeed for anyone who ever uses null hypothesis testing), so Denyse is late to the party.

But Cohen’s point is worth repeating. Nuzzo writes:

One result is an abundance of confusion about what the P value means^{4}. Consider Motyl’s study about political extremists. Most scientists would look at his original P value of 0.01 and say that there was just a 1% chance of his result being a false alarm. But they would be wrong. The P value cannot say this: all it can do is summarize the data assuming a specific null hypothesis. It cannot work backwards and make statements about the underlying reality. That requires another piece of information: the odds that a real effect was there in the first place. To ignore this would be like waking up with a headache and concluding that you have a rare brain tumour — possible, but so unlikely that it requires a lot more evidence to supersede an everyday explanation such as an allergic reaction. The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is.

Exactly. This is precisely the problem that underlies the entire ID project. Both the concept of CSI (and friends) and the concept of Irreducible Complexity depend on the principle that phenomenon X can be demonstrated to be improbable under some null, i.e. to have a low Fisherian *p* value. But *what we want to know*, as Cohen points out, is not how probable our observations (or “Target” as Dembski calls it) are if a hypothesis is true, but how *probable it is that the hypothesis is true*. And Fisherian hypothesis testing simply does not give you that probability. And worse: a Fisherian *p* value is only as good as the definition of your null – it only tells you that your observations are unlikely *under the null that you tested,* which brings us back to the good old eleP(T|H)ant in the room.
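The gap between a *p* value and the probability we actually want can be shown with Bayes’ theorem. The prior and power below are hypothetical numbers chosen purely for illustration; the point is that the false-alarm rate depends on them, and the *p* value alone cannot supply it:

```python
# Hypothetical illustration values: prior, power, and alpha are
# assumptions, not quantities a p value supplies.
prior = 0.05   # prior probability that a real effect exists
power = 0.8    # P(significant result | real effect)
alpha = 0.01   # P(significant result | no effect)

p_sig = power * prior + alpha * (1 - prior)  # P(significant result)
posterior = power * prior / p_sig            # P(real effect | significant)

print(f"P(real effect | significant) ≈ {posterior:.2f}")  # ≈ 0.81
print(f"false-alarm rate ≈ {1 - posterior:.2f}")          # ≈ 0.19, not 0.01
```

So even with a “significant” result at α = .01, roughly one finding in five is a false alarm under these assumed numbers.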

The infinitessimally tiny *p* values that pass the stringent Dembskian alpha criterion of 23 sigma (or 500 bits, or whatever) as a result of CSI (or other alphabet soup) computations are Fisherian *p* values. Those *p* values are not the impossibly low probability that “evolution did it”, nor is 1-P the unfeasibly high probability that the pattern in question was intelligently designed. P is simply the probability that the pattern was the result of the process specified in H, where, to quote Dembski, “**H**, here, is the relevant chance hypothesis that takes into account Darwinian and other material mechanisms”.

Which is why, Barry, on seeing a table laid with 500 coins, all heads up, we do not conclude that they were intelligently laid. What we do is reject the hypothesis that they got there by a process of fair-coin-tossing. And this is not a minor quibble. It is precisely the point of the Nature article that your newsdesk has commended to you.
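The 500-coin point can be made concrete: the observation has a vanishingly small probability under the fair-coin-tossing null, but rejecting that null says nothing about any other chance hypothesis. The heavily biased coin below is a purely hypothetical alternative null, chosen for illustration:

```python
from fractions import Fraction

# P(500 heads | H) under two different chance hypotheses.
p_fair = Fraction(1, 2) ** 500   # H1: fair coin tossed 500 times
p_biased = 0.99 ** 500           # H2: hypothetical coin with P(heads) = 0.99

print(float(p_fair))   # ≈ 3e-151: H1 is soundly rejected
print(p_biased)        # ≈ 0.0066: the same data are unremarkable under H2
```

Rejecting H1 licenses no conclusion about H2, let alone about design.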

Implicit in the typical ID probability analysis is the idea that intelligence (God, really) can do anything. So it’s a low probability, supposedly, of natural processes causing life, complexity, etc., vs. Intelligence can do anything.

It works, but only if you grant their premise that Intelligence can do anything, anywhere, without itself needing a cause (evolution being the only cause of biologic intelligence in our knowledge base). Implicitly, God is always granted a high probability, and this fact becomes explicit in their “conclusions.”

Glen Davidson

Which is why it is important for the Bible to be historically accurate. It is the only evidence of God existing or acting.

Well, no, it still doesn’t “work”, because you can only derive a Fisherian *p* value for your observation given your null if you are clear about your null.

Dembski rejects anything other than the Intelligent Design hypothesis, i.e. he rejects “the relevant chance hypothesis that takes into account Darwinian and other material mechanisms”, if the probability of some special pattern (a functional protein, for instance) under that null hypothesis is below 1 in 10^120.

Leaving aside the wrongness of his assumption that 10^120 represents the number of opportunities that pattern has to pop up under that null, it’s simply not possible to compute the probability itself under a null that somehow embraces the entire set of non-design hypotheses. Or, if it is, nobody has yet demonstrated how.

The really silly thing is that the entire argument is a counter-argument to an argument that nobody has made, scientifically (although they may have made it metaphysically). Nobody has attempted to show that non-Design is probable. All anyone has done is to show that non-Design is possible – that known non-Design processes can account for the kinds of patterns that IDists insist they can’t, and indeed, predict quite a lot of the features we observe that would be somewhat odd if living things were designed in a top-down way.

Certainly nothing in science rules out intervention by an Intelligence that can do anything anywhere without needing a cause, or indeed physical embodiment, although I think it leads to ugly theology – and theodicy.

And absolutely nothing in science rules out the possibility that a deity of some sort considered all possible worlds with self-consistent laws, and picked to actuate one of the few in which those laws resulted in something like us. But in that scenario, there’s no reason to stop looking for mechanisms by which those self-consistent laws did produce something like us, nor any way of inferring, from within that world, that such a deity were responsible, other than revelation.

Most of the post is of course correct. So is Dembski’s discussion of P values. But I don’t think Dembski’s use of P values in his Design Inference uses them to compute the probability that Design Did It.

My reading of Dembski’s tail probability is that he wants to set it small enough that an event that improbable would not occur even once in the history of the universe, if each event in the universe were a trial. At that point he hopes that people will conclude that natural processes are not the explanation for the event. But he does not put prior probabilities on Natural and Non-Natural and compute a posterior probability.
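That reading can be sketched numerically: if each of the universe’s N events is treated as a trial, the expected number of occurrences of an event with tail probability p is N·p, and P(at least one occurrence) is at most N·p. The figures below use the 10^120 operations from Lloyd’s estimate mentioned above and a tail probability of 10^-150 (roughly the 500-bit figure), both as illustration:

```python
# "Would not occur even once" logic: with N independent trials each with
# tail probability p, expected hits = N * p, and P(at least one) <= N * p.
N = 10 ** 120        # Lloyd-style estimate of trials in the universe's history
p = 10 ** -150       # a tail probability at roughly the 500-bit level

expected_hits = N * p
print(expected_hits)  # ≈ 1e-30: effectively never, even universe-wide
```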

The problem with Dembski’s calculation is that he requires us to compute the probability that his event occurs, where the processes include not simply mutation but also natural selection. He makes us do all the difficult work, while he and his supporters do none of it. Then he and his supporters announce that Complex Specified Information has proven that natural processes were not the explanation. Actually, CSI is only declared to be present once it has been shown that natural processes were not the explanation, so the concept of CSI is redundant.

An additional place where statistical methodology comes into ID arguments is when they invoke “inference to best explanation”. My reading of their not-carefully-explained arguments is that “best explanation” means that the probability of the adaptation is high if a Designer Did It, but the probability is a lot lower if Nature Did It.

This, however, is unconvincing for a couple of reasons:

The Designer could easily do it, but we have no information about what the designer would really want to do. The Designer could make elephants big, slow, and gray, but the Designer could also make them small, rapid, pink, and airborne. So there is no way of calculating the probability that the Designer would in fact make the features we see.

No prior probabilities are available as to whether there is a Designer around.

There is some Bayesian argument there, but they do not clarify what it is.

No, and I shouldn’t have implied he did use it that way. That was me talking – making the same point as Nuzzo makes, namely that a Fisherian p value doesn’t tell you what you want to know i.e. the probability that your hypothesis (null or H1) is correct.

It only tells you something quite trivial – that your observation is highly improbable under some specific null hypothesis. It can’t tell you that your observation is improbable under a whole class of hypotheses, which is what Dembski wants it to do.

What he is actually doing, as has been explained to us, is requiring *us* to rule out a class of hypotheses (all hypotheses using the forces of natural selection, mutation, migration, and genetic drift). Then he declares Complex Specified Information to be present. He doesn’t actually tell us how to rule out the observation under that class of hypotheses. It’s us that has to come up with the rejection region.

I’ve argued that ID is unscientific — not because it is wrong — but because it asks a useless question.

The task of science is to discover regular processes and to see if complex phenomena can be accounted for by regular processes.

If you ask whether a phenomenon could have been produced by unseeable actors, the answer is always yes. You can’t rule out what can’t be seen. But asserting magic isn’t useful.

You have to give Behe credit for making a testable assertion of probability — namely that you can’t bridge from one configuration to another by small steps. Unfortunately the testing part doesn’t seem to be going his way.

Lizzie, why have you never learnt to spell “infinitesimally”?

lol

I’m quite a good speller, but as a result, there are some words I assume I’ve got right! Thanks!