Statistical Hypothesis Testing

Wikipedia:

https://en.wikipedia.org/wiki/Statistical_hypothesis_testing

The page is quite good, but I’d like the Math Gods who write here to give their thoughts on the relative strengths and applicability of the various tests (t-test, chi-squared test and F-test/ANOVA).

I’d also welcome discussion of the null hypothesis and of what rejecting it does and doesn’t entail.

Questions welcome; Design detectives, feel free to contribute.

54 thoughts on “Statistical Hypothesis Testing”

  1. For Design discussions, sometimes all I put on the table is pointing out Darwinists don’t even have a statistical model of any relevance.

    Me: what’s the probability insulin evolved?

    Darwinist: 100%

    Me: based on?

    Darwinist: Phylogenetic comparisons and evolutionary theory, and the statistics involved in constructing phylogenetic trees, and that evolutionary theory is FACT FACT FACT.

    Me: I was asking for statistics based on biochemistry and plausible physiology, laboratory evidence, not evolutionary theory. So you don’t have a statistical model based on these considerations? On what basis do you claim a metabolism not regulated by insulin becomes regulated by insulin? Can you make a relevant probability calculation?

    I mean, you need systems to generate insulin, systems to regulate insulin– a rather complicated cascade of well-matched interacting parts that should reasonably be in place lest the system kills the organism. How do you arrive at the 100% figure that it evolved, that such a well-matched system of comparable interdependent complexity is within expectation?

    Insulin isn’t the only way to implement a metabolic regulation system, but how do you establish from chemistry, physics, etc. that any metabolic regulation system of comparable complexity is within believable expectation?

    Darwinist: the null hypothesis is that it evolved, you have to show me the Designer for that null hypothesis to be falsified, doesn’t matter that I can’t make it agree with what we know about chemistry, physics or whatever. Unless I see a Designer creating stuff, the null hypothesis is that ordinary processes evolved creatures.

    Me: so you believe an insulin-regulated metabolism or any comparably complex system is within expectation, not because you can actually demonstrate from first principles that it can arise out of ordinary processes, but because you assume evolution as a null, right?

    Darwinist: right

    Me: how do you know your assumption is correct

    Darwinist: absence of Designer, “absence of evidence is evidence of absence”

    Me: Absence of a naturalistic mechanism is evidence of absence of a naturalistic mechanism. 🙂 Seems the insulin-regulated metabolism emerged from a non-ordinary process and sequence of events.

  2. stcordova: Me: what’s the probability insulin evolved?

    Darwinist: 100%

    Me: based on?

    Surely nobody actually said anything like that, Sal? For a start, we don’t have a hypothesis-testing methodology that will tell us whether a theory is true. Null hypothesis testing doesn’t give us that, not even the best kind, in which you specify an effect size.

    And while a Bayesian probability estimate will tell you how much the evidence increases the support for the hypothesis, it won’t give you an absolute probability, as that will depend on your personal priors.

    But it is absolutely untrue that “Darwinists” don’t have statistical models to test. Phylogenetics employs standard null hypothesis testing, for instance. In fact, evolutionary biologists employ null hypothesis testing as much as any other branch of science.

    So your objection appears to be based on a misunderstanding.

  3. stcordova: Darwinist: the null hypothesis is that it evolved, you have to show me the Designer for that null hypothesis to be falsified, doesn’t matter that I can’t make it agree with what we know about chemistry, physics or whatever. Unless I see a Designer creating stuff, the null hypothesis is that ordinary processes evolved creatures.

    This is not a null hypothesis. A null hypothesis is a specific testable hypothesis based on a study hypothesis you want to test, which would, typically, be a hypothesis derived from some theory, like Darwinian theory.

    For instance you might say: if insulin evolved, we would see A, and then work out how likely it is that you would see A under some null, e.g. neutral drift.

    You could also test between two study hypotheses, e.g. evolution vs special creation, but you’d need to make differential predictions for each.

    There is no rule that says that evolution is always the null. It doesn’t even make any sense, statistically.

  4. stcordova: Seems the insulin-regulated metabolism emerged from a non-ordinary process and sequence of events.

    Presumably that is the sum total of all you have to say on the matter regarding its actual origin then? The Intelligent Design “explanation” is “well, it did not evolve”?

    And no, this conversation never happened; it’s what you imagine happened, but it did not.

    Darwinist: right

    Me: how do you know your assumption is correct

    Darwinist: absence of Designer, “absence of evidence is evidence of absence”

    Unless you can, you know, provide some actual evidence? But you don’t do evidence do you, you just do empty claims…

  5. stcordova: Me: so you believe an insulin-regulated metabolism or any comparably complex system is within expectation, not because you can actually demonstrate from first principles that it can arise out of ordinary processes, but because you assume evolution as a null, right?

    Sounds to me that is the assumption you are actually making. You can’t actually demonstrate that any such system was designed. So if the inability to demonstrate is a problem, it’s doubly a problem for you as you literally have nothing. But you don’t see it like that, do you Sal?

  6. Let’s get clear right from the get-go:

    Null hypothesis testing does not give you a probability that your null is false NOR the probability that your study hypothesis is true.

    Trying to teach that to 120 graduate students was what kept me from TSZ for most of this academic year. Please don’t make me do this all over again, Sal!

    What a p value actually is, is the probability of observations at least as extreme as yours, given the null.
    It is NOT the probability of your null given your observations.
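To make the distinction concrete, here is a minimal Python sketch (not from the thread; the coin-flip data and the name `binomial_p_value` are hypothetical) that computes an exact one-sided p value for 16 heads in 20 flips under the null of a fair coin. The number it returns is a tail area calculated by assuming the null is true; nothing in the calculation produces P(null | data).

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Exact one-sided p value: P(X >= heads) when X ~ Binomial(flips, p_null).

    Every term assumes the null (heads-probability p_null) is true.
    """
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical data: 16 heads in 20 flips of a coin that is fair under the null.
p = binomial_p_value(16, 20)
print(f"P(16 or more heads | fair coin) = {p:.5f}")
```

A small value says the data would be surprising if the null were true; turning that into a probability that the null is false needs extra ingredients (priors and an alternative) that the test itself does not supply.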

    You could use Bayesian statistics instead, but it still only lets you compare the evidence for two hypotheses – it won’t tell you the probability that either is true.
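A minimal sketch of that comparative point, assuming two hypothetical simple hypotheses about a coin (H0: heads-probability 0.5, H1: 0.8) and hypothetical data of 16 heads in 20 flips. The Bayes factor is fixed by the data, but the "posterior probability" of H1 depends on the prior and is only meaningful if H0 and H1 are the only options on the table.

```python
from math import comb


def likelihood(heads: int, flips: int, p: float) -> float:
    """Binomial probability of exactly `heads` in `flips` with heads-probability p."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


def posterior_prob_h1(heads, flips, p_h0, p_h1, prior_h1):
    """P(H1 | data), assuming H0 and H1 are the ONLY hypotheses considered."""
    bayes_factor = likelihood(heads, flips, p_h1) / likelihood(heads, flips, p_h0)
    posterior_odds = (prior_h1 / (1 - prior_h1)) * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# Hypothetical: H0 = fair coin (0.5), H1 = coin biased to 0.8; 16 heads in 20 flips.
bf = likelihood(16, 20, 0.8) / likelihood(16, 20, 0.5)  # binomial coefficient cancels
print(f"Bayes factor H1:H0 = {bf:.1f}")  # same for everyone who accepts the model
for prior in (0.5, 0.01):  # two different personal priors for H1
    post = posterior_prob_h1(16, 20, 0.5, 0.8, prior)
    print(f"prior P(H1) = {prior}: posterior P(H1) = {post:.3f}")
```

The same evidence moves both observers in the same direction by the same factor, yet they end with different posteriors, and neither number says anything about hypotheses left out of the comparison.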

  7. stcordova: Darwinist: the null hypothesis is that it evolved

    Dembski, of course, actually does treat “the relevant chance hypothesis that takes into account Darwinian and other material mechanisms” as the null.

    He’s the only person I know that does so. And it’s absurd because you can’t calculate the probability of your observations under that null. That is the eleP(T|H)ant I used to talk about!

    So far from it being a Darwinian error, it’s the error made by the most prominent ID statistician!

  8. stcordova:
    For Design discussions, sometimes all I put on the table is pointing out Darwinists don’t even have a statistical model of any relevance.

    Me: what’s the probability insulin evolved?

    Darwinist: 100%

    Me: based on?

    Darwinist: Phylogenetic comparisons and evolutionary theory, and the statistics involved in constructing phylogenetic trees, and that evolutionary theory is FACT FACT FACT.

    Me: I was asking for statistics based on biochemistry and plausible physiology, laboratory evidence, not evolutionary theory. So you don’t have a statistical model based on these considerations? On what basis do you claim a metabolism not regulated by insulin becomes regulated by insulin? Can you make a relevant probability calculation?

    I mean, you need systems to generate insulin, systems to regulate insulin– a rather complicated cascade of well-matched interacting parts that should reasonably be in place lest the system kills the organism. How do you arrive at the 100% figure that it evolved, that such a well-matched system of comparable interdependent complexity is within expectation?

    Insulin isn’t the only way to implement a metabolic regulation system, but how do you establish from chemistry, physics, etc. that any metabolic regulation system of comparable complexity is within believable expectation?

    Darwinist: the null hypothesis is that it evolved, you have to show me the Designer for that null hypothesis to be falsified, doesn’t matter that I can’t make it agree with what we know about chemistry, physics or whatever. Unless I see a Designer creating stuff, the null hypothesis is that ordinary processes evolved creatures.

    Me: so you believe an insulin-regulated metabolism or any comparably complex system is within expectation, not because you can actually demonstrate from first principles that it can arise out of ordinary processes, but because you assume evolution as a null, right?

    Darwinist: right

    Me: how do you know your assumption is correct

    Darwinist: absence of Designer, “absence of evidence is evidence of absence”

    Me: Absence of a naturalistic mechanism is evidence of absence of a naturalistic mechanism. Seems the insulin-regulated metabolism emerged from a non-ordinary process and sequence of events.

    Lots of unanswered questions and debunkings waiting at the YEC 2 thread.

  9. Elizabeth: Surely nobody actually said anything like that, Sal?

    I’ve seen it happen several times, more than two dozen at a guess. Yeah, people on our side sometimes pontificate on things they don’t understand. Therefore Jesus!

  10. stcordova: Me: what’s the probability insulin evolved?

    Darwinist: 100%

    You asked for an “off the top of my head” opinion. You got such an opinion.

    I don’t see a problem.

    If I want to treat the question as a serious one, then I would be asking you for your statistical model such as would even make this a meaningful question.

  11. The probability that insulin evolved is approximately the same as the probability the earth orbits the sun. High, but less than one.

  12. The probability that insulin evolved is approximately the same as the probability the earth orbits the sun. High, but less than one.

    Me: Based on?

  13. What are you basing your claim that it did not evolve on?

    The fact you guys don’t have a relevant answer to my question, which was:

    Me: Based on?

    Predicted Darwinist Response: ….sequence comparisons….. (aka phylogenetic trees)…..

  14. stcordova: The fact you guys don’t have a relevant answer to my question, which was:

    If you want ID to be something other than something to slot into a gap you perceive in evolution, then you need to get over this childlike game you are playing.

    The ID “explanation” for the origin of insulin does not need to reference or talk about the equivalent explanation from the evolution side of the fence. Yet you seem to be saying that if I cannot provide that explanation then it defaults to design.

    Can you give a single other example of an unknown that became known in biology and the answer turned out to be “a designer did it”? No, of course you can’t. But that’s what you are saying here.

    If you want to get up to speed on the current ideas about how insulin evolved then a cursory search will get you reading material for the next couple of weeks. I know, I’ve done that search already.

    If ID is ever going to be something other than a gap filler, then it’s not going to be because of your efforts, is it Sal?

  15. What is the intelligent design explanation for the origin of insulin Sal?

    Predicted Sal Response “it was designed”.

  16. stcordova: The fact you guys don’t have a relevant answer to my question, which was:

    Me: Based on?

    Predicted Darwinist Response:….sequence comparisons…..(aka phylogenetic trees)…..

    That’s right, actual evidence of heredity. Which creationists of all stripes accept as evidence of “natural processes,” until they don’t.

    The evidence for design is, well, nothing at all.

    Glen Davidson

  17. Sal, asking someone how certain they are that something is true is not the same as doing a statistical test.

    I am just about 100% certain that insulin evolved.

    That is NOT a statistical test!

    Please do not confuse the two.

  18. petrushka:
    The probability that insulin evolved is approximately the same as the probability the earth orbits the sun. High, but less than one.

    The probability that insulin exists is exactly one.

    The probability of insulin having evolved is impossible to calculate from first principles without lots and lots of data that we don’t have. We do, of course, have plenty of data indicating that all earthly life evolved, but when one rejects all of that arbitrarily the calculated probability changes to near zero. So it’s useless to argue that with a YEC.

  19. But I want to reiterate the important point: that statistical hypothesis testing does not give you a probability that your hypothesis is true.

    At best, using Bayesian methods, it can give you an estimate of the amount of support the evidence provides for a model, and for the relative probability of competing models.

    But it does not give you a measure of the probability that a hypothesis is true.

    Thinking that it does is a frequent error among ID proponents (despite Dembski’s attempts to explain it). I’ve often seen people on UD and other places challenging “Darwinists” to estimate the “probability of evolution” (just as I’m seeing it here from Sal).

    It can’t be done. I can give you my hunch, and Sal can give me his, but neither is the result of a statistical hypothesis test (the topic of the OP).

  20. stcordova: The fact you guys don’t have a relevant answer to my question, which was:

    Me: Based on?

    Predicted Darwinist Response:….sequence comparisons…..(aka phylogenetic trees)…..

    Sal: do you now understand why the answer you get when you ask a Darwinist what the probability of evolution is is NOT the result of a statistical hypothesis test?

    If so, can we return to the subject of the OP?

    If not, shall I explain it again?

  21. Do I misunderstand or is Salvador’s argument something like this:

    S: See this green Volvo station wagon here? What is the probability that it was made in a factory?

    me: Nearly 100%

    S: So if a car is made in a factory, you’re claiming that it is nearly 100% likely that it will turn out to be a green Volvo station wagon!

    me: [incoherent splutterings]

  22. Hi Joe. Please don’t let the Sal sideshow stop you from educating me on my shortcomings highlighted in the OP.

  23. (Cordova’s assertions aside) I don’t think I have the energy for a comprehensive taking of positions on statistical methods. A few points:

    1. I tend to waver somewhere between being a likelihoodist and a frequentist.

    2. I strongly suspect that the differences between these schools reflect, not technical issues in mathematical statistics, but different philosophical positions on scientific inference.

    3. Bayesian inference is optimal (maybe Bayesian inference with a cost function for errors is even better). If … if you have a prior in which you can believe, and if it is shared by your audience. That latter requirement is a real killer.

    4. When the prior is something like segregation probabilities in Mendelian inheritance, all statisticians will happily use Bayes’ Theorem. So being a real Bayesian means, not being willing to use a prior, but being willing to use a dubious prior.

    5. Likelihood inference is very good, but it is hard to report the whole likelihood function. Also, the only theorems we have to set cutoffs are actually justified only asymptotically, so they hold only in Asymptopia.

    6. Use of P values is recognized more and more as often misleading. It seems best justified when we have a prior probability of the null hypothesis that is fairly high, and we set an extreme P value to not generate too many false rejections of the null hypothesis. Which is really being Bayesian.

    7. Exploratory Data Analysis is fine, but the cutoffs used to indicate a real signal are, alas, arbitrary.
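Point 6 can be illustrated with a short Bayes' theorem calculation. This is a minimal sketch, not anyone's published method: the fraction of tested nulls that are true (`prior_null`), the power, and the function name are all hypothetical assumptions, and power is held fixed even though a stricter alpha would normally lower it for a given sample size.

```python
def prob_null_given_rejection(prior_null: float, alpha: float, power: float) -> float:
    """P(H0 true | H0 rejected), by Bayes' theorem.

    prior_null: fraction of tested nulls that are actually true (an assumption)
    alpha:      P(reject | H0 true), the significance threshold
    power:      P(reject | H0 false), assumed known and held fixed here
    """
    false_rejections = prior_null * alpha
    true_rejections = (1 - prior_null) * power
    return false_rejections / (false_rejections + true_rejections)


# Hypothetical scenario: 90% of the nulls we test are true, and power is 0.8.
for alpha in (0.05, 0.001):
    share = prob_null_given_rejection(prior_null=0.9, alpha=alpha, power=0.8)
    print(f"alpha = {alpha}: P(null true | rejected) = {share:.3f}")
```

With these made-up numbers, about 36% of rejections at alpha = 0.05 are rejections of a true null, versus roughly 1% at alpha = 0.001. The answer hinges on `prior_null`, which is exactly the Bayesian ingredient a bare P value leaves out.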

  24. Asymptopia. Haha. I’ll have to remember that one.

    In a near perfect world …

    Or should I say, as we get closer and closer to a perfect world?

  25. Joe Felsenstein: Use of P values is recognized more and more as often misleading. It seems best justified when we have a prior probability of the null hypothesis that is fairly high, and we set an extreme P value to not generate too many false rejections of the null hypothesis. Which is really being Bayesian.

    I had a stats instructor who actually wouldn’t allow us to use p values. If you put them in an assignment, he’d cross them out in red biro, and give you zero marks until you resubmitted the assignment without them.

  26. Questions welcome; Design detectives, feel free to contribute.

    For ID there is usually an absence of statistical hypothesis testing and experimentation because much of the data in question is already widely published and accepted.

    In fact, the phrase “Null hypothesis” is almost entirely absent from ID literature as far as I can tell!

    For OOL, the statistics are not so hard. I simply will say, “do you think random chemical interactions can create something as complex as a DNA-RNA-protein replication cycle?” Usually the answer is…..vague or evaded.

    Take a soup of 20 amino acids of life in optimal concentrations comparable to any set of functional proteins you desire. There is a hypothetical X,Y,Z,t coordinate we can associate with each and a probability of bonding (usually low without a polymerization mechanism!). Even on the generous assumption we have spontaneous alpha-peptide bonding (I mean really, really generous assumption), the amino acids are unlikely to form a functioning cascade on the assumption the X,Y,Z,t coordinates of each amino acid in a pre-biotic soup are fairly random.

    Dean Kenyon realized even if there are some favored bonding preferences in amino acids, they would be insufficient to constrain random X,Y,Z,t coordinates of amino acids to make functional proteins. Shortly thereafter he became an IDist.

    It really doesn’t make sense that one can make a haemoglobin protein unless there is need for it. Functional protein means functional within an interacting cascade or protein interactome.

    What’s the expectation that some cascade of a certain complexity will emerge via random process? Astronomically remote based on the numbers I’ve seen. We can then reject random process as a mechanism.

    Some have then argued a gradualistic Darwinian type chemical evolution for OOL to overcome the odds. That’s fine if one believes it, the problem is that it makes little chemical sense. Chemicals are governed by chemical laws, not the need to make functional proteins from a pre-biotic soup according to a Darwinian type scenario.

    So although statistical hypothesis testing is interesting, I’ve not seen it much if at all in typical ID literature, at least testing framed in terms of “null hypothesis”.

    Rejection of the chance hypothesis has already been described in professional literature. Most OOLers ignore it. They believe there could be some chemical or self-organizational principle that circumvents random X,Y,Z,t positioning of molecules and somehow channels outcomes toward functionality. The problem is, experiments don’t agree with such Quixotic viewpoints; equilibrium is away from functionality and toward non-life and dead Humpty Dumpty.

    Statistical expectation from random interaction is for non-life, not life.

    Therefore, everyone pretty much agrees for OOL, chance can be rejected as a mechanism, so there is not much need for statistical hypothesis testing. Everyone agrees there must have been some non-random agency. At issue is the nature of that non-random agency.

  27. stcordova: For ID there is usually an absence of statistical hypothesis testing and experimentation because much of the data in question is already widely published and accepted.

    In fact, the phrase “Null hypothesis” is almost entirely absent from ID literature as far as I can tell!

    You need reminding. William Dembski, in his CSI arguments, cites Fisher and sets up a rejection region. I don’t recall whether he uses the phrase “null hypothesis”, but he is certainly using the same statistical machinery.

  28. Joe Felsenstein,

    Also:

    Lee Spetner (1998). Not by Chance!: Shattering the Modern Theory of Evolution. Judaica Press. ISBN 1-880582-24-4

    “Not by Chance”..

  29. Joe,

    I don’t recall whether he [Dembski] uses the phrase “null hypothesis”…

    He does:

    Within Fisher’s approach, the “null” hypothesis (i.e., the chance hypothesis most naturally associated with this probabilistic set-up) is a chance hypothesis H according to which the die is fair (i.e., each face has probability 1/6) and the die rolls are stochastically independent (i.e., one roll does not affect the others).
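Since the OP asked about chi-squared tests, here is a minimal sketch of a Fisherian test of the "fair die" part of that null using Pearson's chi-squared goodness-of-fit statistic. It does not test the independence part of Dembski's null. The roll counts are hypothetical, and the chi-squared tail probability is computed from scratch with an incomplete-gamma series to stay dependency-free; in practice `scipy.stats.chisquare` does the same job.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) for a chi-squared variable with df degrees of freedom,
    via the power series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:  # terms are positive and eventually shrink
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def die_goodness_of_fit(counts):
    """Pearson chi-squared test of the null 'each face has probability 1/6'."""
    n = sum(counts)
    expected = n / len(counts)
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    return stat, chi2_sf(stat, df=len(counts) - 1)


# Hypothetical counts of faces 1..6 from 60 rolls.
stat, p = die_goodness_of_fit([5, 8, 9, 8, 10, 20])
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
```

Here the statistic is 13.4 on 5 degrees of freedom, which falls between the usual 5% and 1% critical values, so the fair-die null would be rejected at alpha = 0.05 but not at 0.01. As with any p value, that is a statement about the data under the null, not the probability that the die is loaded.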

  30. Yes he certainly does. He spends an inordinate amount of time, in Specification: the Pattern that Signifies Intelligence, on explaining Fisher’s approach, and actually explicitly rejects a Bayesian approach as “inadequate for drawing a design inference”:

    The approach to design detection that I propose eliminates chance hypotheses when the probability of matching a suitable pattern (i.e. specification) on the basis of these chance hypotheses is small and yet an event matching the pattern still happens (i.e., the arrow lands in a small target). This eliminative approach to statistical rationality, as epitomized in Fisher’s approach to significance testing is the one most widely employed in scientific applications.

    Nevertheless, there is an alternative approach to statistical rationality that is at odds with this eliminative approach. This is the Bayesian approach, which is essentially comparative rather than eliminative, comparing the probability of an event conditional on a chance hypothesis to its probability conditional on a design hypothesis, and preferring the hypothesis that confers the greater probability. I’ve argued at length elsewhere that Bayesian methods are inadequate for drawing design inferences. Among the reasons I’ve given is the need to assess prior probabilities in employing these methods, the concomitant problem of rationally grounding these priors, and the lack of empirical grounding in estimating probabilities conditional on design hypotheses.

    His discontent with Fisherian hypothesis testing was the arbitrariness of the alpha cut-off, which is why he sets it so absurdly low – so we can reject the null with utter confidence.

    What he doesn’t tell us is how to compute the null for “everything except Design”, which is what you would need to do to draw a Design inference by his method.

    If you could do that, I’d be perfectly happy with a far less stringent alpha (not that the “Seth Lloyd” criterion he uses makes any sense anyway).

  31. From Richard Hughes’s Wikipedia link:

    A statistical hypothesis is a scientific hypothesis that is testable on the basis of observing a process that is modeled via a set of random variables.[1]

    If you all are claiming ID literature asserts statistical hypotheses, then you are saying they are asserting scientific hypotheses. Is that right?

    If an IDist argues random process cannot account for evolutionary convergence, that is a scientific claim.

    If an IDist further asserts that, for evolutionary convergence to happen, the selection pressures would also have to be non-random with respect to some outcome, such as the evolution of flight (in bats, birds and insects) or the convergence of features in archaea and eukarya, that is a scientific claim.

    If an IDist asserts that the hierarchical patterns in molecular data between species are not consistent with random process acting on radiation from a common ancestor, but would require constraints above and beyond those of simple inheritance, that is a scientific claim.

    If an IDist asserts that chemical and physiological considerations suggest selection would act against certain intermediates (such as in irreducible systems), and therefore that if selection happened, the required selection pressure would have been highly improbable, that is a scientific claim.

  32. Sal, all you have to do is produce an irreducible system that has a history that includes a strongly detrimental intermediate stage. Behe tried and failed. Malaria is now double drug resistant.

    Only an IDiot would conclude that a blocked path means there are no other paths. An IDiot or a dishonest person.

  33. If I argue that intelligent design cannot account for evolutionary convergence, is that a scientific claim?

  34. Note that “convergence” is ambiguous. If by convergence I mean that two species converge in all respects, that is profoundly unlikely under any evolutionary model. Note that this would mean that they would converge not only in some set of body shape characters (like shark and dolphin), but in their kidney structure and function, brain microanatomy, ribosomal RNA structure, and so on, all at the same time.

    But if I just mean that their external body shapes get similar, or just mean that their color patterns get similar, that is no problem at all. Shark and dolphin may have generally similar body shape, but their reproductive physiology is very different (uterus and lactation for example) and their behavior is very different. Not to mention their ribosomes and their kidney structures.

    Throwing around the word “convergence” without discussing this issue is useless.

  35. stcordova: If you all are claiming ID literature asserts statistical hypotheses, then you are saying they are asserting scientific hypotheses. Is that right?

    Only if it is testable.

    stcordova: If an IDist argues random process cannot account for evolutionary convergence, that is a scientific claim.

    Not as such. It would depend on the argument, and on what the IDist is presenting as falsifying evidence.

    stcordova: If an IDist further asserts that, for evolutionary convergence to happen, the selection pressures would also have to be non-random with respect to some outcome, such as the evolution of flight (in bats, birds and insects) or the convergence of features in archaea and eukarya, that is a scientific claim.

    No, it would be an assertion. To be a scientific claim it would have to be backed up with actual evidence.

    stcordova: If an IDist asserts that the hierarchical patterns in molecular data between species are not consistent with random process acting on radiation from a common ancestor, but would require constraints above and beyond those of simple inheritance, that is a scientific claim.

    They would, as with your other examples, have to back it up with evidence. In this case, of course we do have evidence – that there are some horizontal transfer mechanisms as well as vertical. And when scientists discovered this, they went about finding out the mechanisms. And found some. Oddly enough, far from falsifying common descent, those traces of horizontal transfer leave nice trackers that allow us to refine the mapping of lineages far more precisely, because once a stretch of DNA is inserted into an individual’s germline DNA by horizontal means (a virus, for instance), it is thenceforth inherited vertically, and so that individual’s descendants have a neat tracking device thereafter.

    stcordova: If an IDist asserts that chemical and physiological considerations suggest selection would act against certain intermediates (such as in irreducible systems), and therefore that if selection happened, the required selection pressure would have been highly improbable, that is a scientific claim.

    No, it’s a speculation, falsified by actual experimental data.

  36. Probably worth pointing out that t-tests and F tests/ANOVA are all basically the same model, as are linear regression and Pearson’s correlation.

  37. Part of my confusion (I don’t have the example at hand) came from a t-test and an ANOVA giving different ‘significance’ (probably not the right word) numbers on the same data.
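A minimal sketch of the shared-model point for the two-group case, using hypothetical measurements: Student's pooled-variance t statistic and the one-way ANOVA F statistic are computed independently from their textbook formulas, and F comes out equal to t squared (with 1 and n − 2 degrees of freedom for F). The function names are illustrative, not from any library.

```python
import math
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic, assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def one_way_anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.2, 6.8, 5.9, 7.1, 6.5]
t = pooled_t(group_a, group_b)
f = one_way_anova_f(group_a, group_b)
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")  # t^2 and F agree
```

The agreement only holds for two groups and the pooled-variance t-test; Welch's unequal-variance t-test is a different statistic and will not match the ANOVA F exactly.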

  38. Elizabeth:
    Shouldn’t do.

    Should be identical. And the F value should be the square of the t value.

    I think this is for the case of two groups. The tests would then yield the same P value if the test was two-sided. That is what the F test tests — in the two-sample case it is inherently a two-sided test. But if the t-test is one-sided, it is not the same as the F test.

  39. Need to think about that…

    I agree that the F distribution only has one tail. Not sure it makes it a different test though – wouldn’t you just move the alpha criterion to .1?
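A minimal sketch of the one-sided versus two-sided relationship, for a hypothetical t = 2.0 on 10 degrees of freedom. The t tail is computed by Simpson's-rule integration of the density to stay dependency-free (in practice `scipy.stats.t.sf` would be used). The two-sided p equals the two-group F-test p for F = t² = 4, and the one-sided p is half of it when the effect is in the predicted direction.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), via Simpson's rule on the density over [0, |t|] (steps even)."""
    a = abs(t)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = s * h / 3          # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central_half   # symmetry: both tails together


def t_one_sided_p(t: float, df: int) -> float:
    """P(T >= t) for the directional alternative 'difference > 0'."""
    p_two = t_two_sided_p(t, df)
    return p_two / 2 if t > 0 else 1 - p_two / 2


# Hypothetical result: t = 2.0 on 10 df, in the predicted direction.
p_two = t_two_sided_p(2.0, 10)  # same as the F-test p value for F = 4 on (1, 10) df
p_one = t_one_sided_p(2.0, 10)
print(f"two-sided / F-test p = {p_two:.4f}, one-sided p = {p_one:.4f}")
print("F test rejects at 0.10:", p_two < 0.10, "| one-sided t rejects at 0.05:", p_one < 0.05)
```

So moving the F-test criterion to 0.10 reproduces the one-sided t-test at 0.05 only for effects in the predicted direction: the F test at 0.10 also rejects for large effects in the opposite direction, where the one-sided test never rejects.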
