Barry gets it wrong again….

Here.

 

The null hypothesis in a drug trial is not that the drug is efficacious. The null hypothesis is that the difference between the groups is due to chance.

No, Barry.  Check any stats textbook.  The null hypothesis is certainly not that the drug is efficacious (which is not what Neil said), but more importantly, it is not that “the difference between the groups is due to chance”.

It is that “there is no difference in effects between treatment A and treatment B”.

When in a hole, stop digging, Barry!

If the drug is in fact efficacious, there will be a “real” difference between the two groups.

Wrong again! If the drug is in fact efficacious, you may or may not get a difference between the groups, but if your study has enough power to detect an effect the size of the “real” effect, you probably will.

However, whether or not your drug is efficacious you will get a perfectly “real” difference between your groups.  It just won’t necessarily be due to the drug.
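To illustrate the power point (a toy Python simulation, with an arbitrary “real” effect of half a standard deviation and 64 participants per group – nothing to do with any actual trial): a single study of a genuinely efficacious drug may or may not come out significant, but a reasonably powered one usually will:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Simulate many trials of a drug with a real effect of 0.5 SD,
# 64 participants per group, and count how often p < .05.
effect, n_per_group, n_trials, alpha = 0.5, 64, 5_000, 0.05
significant = 0
for _ in range(n_trials):
    treatment = rng.normal(loc=effect, scale=1.0, size=n_per_group)
    placebo = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    if stats.ttest_ind(treatment, placebo).pvalue < alpha:
        significant += 1

print(f"proportion of trials with p < .05 (power): {significant / n_trials:.2f}")
```

With these made-up settings roughly 80% of the simulated trials come out significant – “probably”, but by no means always.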

How do you know if there is a real difference? By ruling out the chance explanation, as Professor Thisted says.

But he’s wrong, or rather, sloppy.  What you do is reject the null hypothesis that there is no effect, and therefore that any between-group difference is simply due to random sampling variance.

Fourth, the “chance” at issue is not the noise in the sampling. I mean, this statement is absurd on its face. If group A takes the treatment and group B takes the placebo, what is being measured when they report back different results?

Yes, the “chance” at issue is indeed noise in the sampling.  That’s why we call it “sampling error”.  And what is being measured when different results are reported are changes in whatever symptoms or signs are supposed to be affected by the drug. What is being tested is whether the differences between the mean changes in each group are probable under the null that there is no effect of the drug.
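To make that concrete, here is a minimal sketch (in Python, with made-up symptom-change scores – nothing from any real trial) of asking whether an observed difference in mean changes is probable under the null of no drug effect, using a simple permutation test:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symptom-change scores for two groups (invented numbers).
treatment = np.array([2.1, 3.4, 1.8, 2.9, 3.1, 2.5, 1.9, 3.0])
placebo = np.array([1.2, 2.0, 1.5, 2.2, 1.1, 1.8, 2.4, 1.6])

observed_diff = treatment.mean() - placebo.mean()

# Under the null (the drug has no effect), the group labels are arbitrary,
# so shuffle them many times and see how often sampling variation alone
# produces a difference at least as large as the one observed.
pooled = np.concatenate([treatment, placebo])
n_treat = len(treatment)
n_perm = 10_000
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    diff = shuffled[:n_treat].mean() - shuffled[n_treat:].mean()
    if abs(diff) >= abs(observed_diff):
        count += 1

p_value = count / n_perm
print(f"observed difference: {observed_diff:.2f}, permutation p-value: {p_value:.4f}")
```

A small p-value here says only that a difference this large is improbable if the drug does nothing; it is not, by itself, “the chance explanation” or any other explanation.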

Ask yourself this question. If the treatment is not effective, what difference would you expect between the two groups?

The answer to this question is a probability distribution based on the postulated effect size (difference divided by variability – zero, if the treatment is not effective) and the size of the sample.

Of course, you would expect their response to be roughly equal.

You would expect the differences to vary each time you did the study, and for those differences to have a probability distribution with a mean of zero.
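To illustrate (a toy Python simulation with arbitrary numbers, not anyone’s data): run the “study” many times with a drug that does nothing, and the between-group differences scatter around zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate many repeats of a study in which the drug does nothing:
# both groups are drawn from the same population.
n_per_group = 30
n_studies = 10_000
diffs = np.empty(n_studies)
for i in range(n_studies):
    group_a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    group_b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    diffs[i] = group_a.mean() - group_b.mean()

print(f"mean of the differences: {diffs.mean():.3f}")       # close to zero
print(f"spread (SD) of the differences: {diffs.std():.3f}")  # about sqrt(2/30), ~0.26
```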

But no two groups are ever going to be exactly equal. Random differences between the groups will result in some difference. A statistical test starts with this assumption (the null hypothesis): There is no difference between the two groups and any difference that is reported is due to chance (i.e., the “chance explanation”).

Sloppily worded, yes.  Better worded: that any difference between the two is due to unmodelled variance.  But your “chance explanation” is still not your null.  It’s the sloppy wording for the explanation for the difference between the groups.  These two things are not the same.  Your null – what you reject if your p value is low enough – is the hypothesis that there is no difference between the treatments.

The statistical analysis then determines whether that null hypothesis is rejected.

Yep.  Well, “the null”.  You’ve still got the null wrong.  It isn’t “the chance explanation”.

 

In other words, if you reject the chance explanation, you are left with the conclusion that the best explanation for the data is that the drug is efficacious.

Nope. What you reject is the null hypothesis, which is that there is no difference in the effects of the treatments.  You are still confusing the null hypothesis (the hypothesis that you reject or retain on the basis of your test) with an explanation for the difference you observed between your samples (for which “chance” is shorthand for “unmodelled variance between the samples”).

Finally, while Neil is wrong about the “sampling noise” being the “chance” that is tested, there is such a thing as sampling noise.

Neil is absolutely correct, not surprisingly, being a mathematician and a teacher of statistics, and of course there is such a thing as “sampling noise”.

There is a chance that the sampled population does not truly reflect the real population. Generally, the larger your sample size, the smaller this risk is but it cannot be eliminated completely. In other words, there is a “chance” that the “chance explanation” is correct even though your test says it should be rejected.

No, you have got this garbled.  Yes, indeed, “there is a chance that the sampled population does not reflect the real population”, although the correct terminology is “there is a chance that the sample mean (or other summary statistic) will not reflect the mean in the population from which the sample is randomly drawn”, and yes, the larger the sample, the closer the sample mean is likely to be to the population mean.  But you have garbled your “other words”, because you still have not understood what the null hypothesis is.  It is not “the chance explanation” for the difference between your two means (in this case).  It’s the hypothesis that the means of the populations from which the samples are drawn are not different.
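As a quick numerical illustration of that last point (a Python sketch with an arbitrary population – mean 100, SD 15): the bigger the sample, the closer its mean tends to sit to the population mean:

```python
import numpy as np

rng = np.random.default_rng(2)

population_mean, population_sd = 100.0, 15.0

# For each sample size, draw many random samples and see how far the
# sample means typically fall from the population mean.
for n in (10, 100, 1000):
    sample_means = rng.normal(population_mean, population_sd, size=(5_000, n)).mean(axis=1)
    typical_error = np.abs(sample_means - population_mean).mean()
    print(f"n = {n:4d}: typical |sample mean - population mean| = {typical_error:.2f}")
```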

That risk is measured by the “p-value” Professor Thisted is discussing in his paper. A low p-value means the chance of your analyses being wrong is low. How low is low enough to rely on the test? There is no universally accepted answer to that question. Generally, however, a p-value of 0.05 or less is said to be “statistically significant,” which means that for practical purposes the sampled group can be assumed to be reflective of the population as a whole.

No, this is completely wrong.  If your p-value is below your criterion “alpha” of .05 or whatever, that means that you can reject your null, whatever that was.  It doesn’t mean that your sample mean is close to the population mean.  It means that your sample mean is a long way from the mean you postulated your sample to be drawn from under your null hypothesis.
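To put numbers on that (a toy Python sketch, not Thisted’s example): a one-sample t-test where the null postulates a population mean of zero, but the sample is actually drawn from a population with mean 0.5:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Null hypothesis: the sample is drawn from a population with mean 0.
# Here the "true" population mean is 0.5, so the sample mean tends to
# land a long way, in standard-error terms, from the postulated mean.
sample = rng.normal(loc=0.5, scale=1.0, size=50)

result = stats.ttest_1samp(sample, popmean=0.0)
alpha = 0.05
print(f"sample mean = {sample.mean():.2f}, t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
print("reject the null" if result.pvalue < alpha else "retain the null")
```

A low p-value here says the sample mean is far from the postulated mean of zero; it says nothing about the sample mean being close to the population mean.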

Barry, you won’t understand this until you know what a null hypothesis is, and you won’t find that out until you stop thinking that it is either “chance” or “the explanation for why your sample is different from your population”.  It isn’t either of those things.

However, you are not the only IDist to misunderstand null hypothesis testing. I know a man with two PhDs who also fails to understand it properly (though he gets it a little less wrong than you do).

60 thoughts on “Barry gets it wrong again….”

  1. Sal has come up with yet another addition to the IDiot big bowl of alphabet soup acronyms. He says we should no longer use Complex Specified Information (CSI). The new meaningless buzz-phrase is Specified Improbability (SI).

    I guess now we’ll never get those CSI calculations the IDiots have promised for years but never could deliver. Of course Sal offers no formulas or ways to calculate SI either, but it will probably help the DI sell a few more tee-shirts.

  2. Somebody at UD proposed the example of a battleship (!) and the question was: How can we demonstrate that it is designed? The result was a lot of bad language but little light. A battleship! Maybe it evolved.

  3. thorton:

    Sal has come up with yet another addition to the IDiot big bowl of alphabet soup acronyms. He says we should no longer use Complex Specified Information (CSI). The new meaningless buzz-phrase is Specified Improbability (SI).

    I guess now we’ll never get those CSI calculations the IDiots have promised for years but never could deliver. Of course Sal offers no formulas or ways to calculate SI either, but it will probably help the DI sell a few more tee-shirts.

    CSI calculations are apparently at the extreme upper limit of the mathematical ability of most ID/creationists. The mathematics of coin flips may be a little simpler for them.

    If they can get their mathematical theory down to the level of elementary school arithmetic, maybe they will be able to come up with a calculation they will be able to do.

  4. graham2:
    Somebody at UD proposed the example of a battleship (!) and the question was: How can we demonstrate that it is designed? The result was a lot of bad language but little light. A battleship! Maybe it evolved.

    We compare it and its components to other known-to-be-designed pieces and make hundreds of thousands of identical matches.

    The ID problem has always been how do we tell something is designed when we have seen nothing known-to-be-designed even remotely like it before, such as biological life.

  5. I think the question was suggesting they apply CSI/FCSI/DFCSI/xyz to the battleship (!) and demonstrate design, sort of like an exam question. Surely it would be a doddle. A battleship! But no, it was all too hard.

  6. graham2:
    I think the question was suggesting they apply CSI/FCSI/DFCSI/xyz to the battleship (!) and demonstrate design, sort of like an exam question. Surely it would be a doddle. A battleship! But no, it was all too hard.

    We’ve been asking them to calculate the CSI of Mt. Rushmore, probably their most used example of “you can tell design when you see it!”, for almost a decade now. Not one IDiot anywhere anytime has ever calculated the CSI of anything, save their one fatally flawed protein example done by Dembski way back in 2004.

    Sal wants to get rid of CSI but the stench of its failure will linger a long time.

  7. cubist,

    We both agree that us humans do tend to attach more significance to an all-the-same run of coin-tosses than we do to a randomly-distributed run of coin-tosses—but outside of the relative degrees of significance which we humans attach to them, what is the difference between the coin-toss runs?

    If the coins are fair, the tosses are independent, and Pr(H) = Pr(T) = 0.5, then there is no difference in the significance of the sequences. But that’s the rub — we don’t know that the coins are fair, the tosses are independent, etc. Indeed, we suspect that they aren’t in the case of the all-heads sequence.

    What we’re actually doing is assessing the probability that the coins and tosses are fair and independent, given that we observe a particular sequence S.

    If S is all heads, it decreases our confidence in the fairness of the coins and tosses. If S doesn’t seem special, our confidence is undiminished.

    More on this tomorrow, including an explanation of why our suspicions are not merely cognitive illusions.

  8. keiths: while you are gathering your loins, explain to us if the 500 coins represented the value of PI (correct to 500 bits), should we be suspicious.

  9. I think Barry’s underlying point is essentially correct: there are sets of patterns that are quite probable under the null hypothesis of “tossed fair coins” and sets of patterns that are highly improbable, and I think it’s perfectly possible to tease them apart – to look at a sequence and say how probable it is that it came from a process of fair-coin-tossing. And if it’s highly improbable, we can reject that null.

    My point is, and always has been, far simpler: that “chance” is not the null – the null is “fair coins, fairly tossed”.

    In other words, coins with a head on one side, a tail on the other, projected on to the table by a blind sample of tosses from a near-infinite set of possible tosses, out of which about half result in the coin landing heads up and the other half result in it landing tails up.

    An important word in the way I have phrased it here is “blind” – the hypothesis that the tosser (who may or may not be an intelligent agent) did not know what the outcome of the toss she chose would be before she chose it.

    Because, as always, chance refers to what we don’t know, not to what we do.

    And until IDers figure out how to define their null hypotheses properly, no amount of their eleP(T|H)ants are going to stampede The Academy.

  10. graham2: while you are gathering your loins, explain to us if the 500 coins represented the value of PI (correct to 500 bits), should we be suspicious.

    Unless we were explicitly checking, we would never notice. It would look like a random string. Of course, it is exactly as improbable as 500 heads, and if we did notice we would be suspicious.
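    Here’s a quick sketch of what I mean (hypothetical Python, with a made-up heads count standing in for the pi-bits string): any particular 500-flip sequence is equally improbable under the “fair coins, fairly tossed” null, but a test on the number of heads only flags the all-heads one:

```python
from scipy import stats  # binomtest needs scipy >= 1.7

n = 500

# Any *specific* 500-flip sequence has the same probability under the
# fair-coin null: 0.5 ** 500.
print(f"P(any specific sequence) = {0.5 ** n:.3e}")  # ~3.1e-151

# A test on the *count* of heads treats the sequences very differently.
all_heads = n          # 500 heads
typical_looking = 253  # made-up heads count for a "nothing special" string

for label, heads in (("500 heads", all_heads), ("typical-looking", typical_looking)):
    p = stats.binomtest(heads, n=n, p=0.5).pvalue
    print(f"{label}: two-sided p = {p:.3g}")
```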
