(slightly edited version of a comment I made at UD)
Barry Arrington has a rather extraordinary thread at UD right now, ar
It arose from a Sal’s post, here at TSZ, Siding with Mathgrrl on a point,and offering an alternative to CSI v2.0
Below is what I posted in the UD thread.
I don’t think I’ve ever seen a thread generate so much heat with so little actual fundamental disagreement!
Almost everyone (including Sal, Eigenstate, Neil, Shallit, Jerad, and Barry) is correct. It’s just that massive and inadvertent equivocation is going on regarding the word “probability”.
The compressibility thing is irrelevant. Where we all agree is that “special” sequences are vastly outnumbered by “non-special” sequences, however we define “special”, whether it’s the sequence I just generated yesterday in Excel, or highly compressible sequences, or sequences with extreme ratios of H:T, or whatever. It doesn’t matter in what way a sequence is “special” as long as it was either deemed special before you started, or is in a clear class of “special” numbers that anyone would agree was cool. The definition of “special” (the Specification) is not the problem.
The problem is that “probability” under a frequentist interpretation means something different than under a Bayesian interpretation, and we are sliding from frequentist interpretation (“how likely is this event?”) which we start with, to a Bayesian interpretation (“what caused this event?”) , which is what we want, but without noticing that we are doing so.
Under the frequentist interpretation of probability, a probability distribution is simply a normalised frequency distribution – if you toss enough sequences, you can plot the frequency of each sequence, and get a nice histogram which you then normalise by dividing by the total number of observations to generate a “probability distribution”. You can also compute it theoretically, but it still just gives you a normalised frequency distribution albeit a theoretical one. In other words, a frequentist probability distribution, when applied to future events, simply tells you how frequently you can expect to observe that event. It therefore tells you how confident you can be (how probable it is) that that the event will happen on your next try.
The problem is arises when we try to turn frequentist probabilities about future events into a measure of confidence about the cause of a past event. We are asking a frequency probability distribution to do a job it isn’t built for. We are trying to turn a normalised frequency, which tells us the how much confidence we can have of a future event, given some hypothesis, into a measure of confidence in some hypothesis concerning a past event. These are NOT THE SAME THING.
So how do we convert our confidence about whether a future event will occur into a measure of confidence that a past event had a particular cause? To do so, we have to look beyond the reported event itself (the tossing of 500 heads), and include more data.
Sal has told us that the coin was fair. How great is his confidence that the coin is fair? Has Sal used the coin himself many times, and always previously got non-special sequences? If not, perhaps we should not place too much confidence in Sal’s confidence! And even if he tells us he has, do we trust his honesty? Probably, but not absolutely. In fact, is there any way we can be <absolutelysure that Sal tossed a fair coin, fairly? No, there is no way. We can test the coin subsequently; we can subject Sal to a polygraph test; but we have no way of knowing, for sure, a priori, whether Sal tossed a fair coin fairly or not.
So, let’s say I set the prior probability that Sal is not honest, at something really very low (after all, in my experience, he seems to be a decent guy): let’s say, p=.0001. And I put the probability of getting a “special” sequence at something fairly generous – let’s say there are 1000 sequences of 500 coin tosses that I would seriously blink at, making the probability of getting one of them 1000/2^500. I’ll call the observed sequence of heads S, and the hypothesis that Sal was dishonest, D. From Bayes theorem we have:
P(D|S)=[P(S|D)*P(D)]/[ P(S|D)*P(D) + P(T|~D)*P(~D)]
where P(D|S) is what we actually want to know, which is the probability of Sal being Dishonest, given the observed Sequence.
We can set the probability of P(S|D) (i.e. the probability of a Special sequence given the hypothesis that Sal was Dishonest) as 1 (there’s a tiny possibility he meant to be Dishonest, but forgot, and tossed honestly by mistake, but we can discount that for simplicity). We have already set the probability of D (Sal being Dishonest) as .0001. So we have:
Which is, as near as dammit, 1. In other words, despite the very low prior probability of Sal being dishonest, now that we have observed him claiming that he tossed 500 heads with a fair coin, the probability that he was being Dishonest, is now a virtual certainty, even though throwing 500 Heads honestly is perfectly possible, entirely consistent with the Laws of Physics, and, indeed, the Laws of Statistics. Because the parameter (P(T|~D) (the probability of the Target given not-Dishonesty) is so tiny, any realistic evaluation of P(~D) (the probability that Sal was not Dishonest) , however great, is still going to make the term on the denominator, P(T|~W)]P(~W), negligible, and the denominator always only very slightly larger than the numerator. Only if our confidence in Sal’s integrity exceeds 500 bits will we be forced to conclude that the sequence could just or more easily have been Just One Of Those Crazy Things that occasionally happen when a person tosses 500 fair coins honestly.
In other words, the reason we know with near certainty that if we see 500 Heads tossed, the Tosser must have been Dishonest, is simply that Dishonest people are more common (frequent!) than tossing 500 Heads. It’s so obvious, a child can see it, as indeed we all could. It’s just that we don’t notice the intuitive Bayesian reasoning we do to get there – which involves not only computing the prior probability of 500 Heads under the null of Fair Coin, Fairly Tossed, but also the prior probability of Honest Sal. Both of which we can do using Frequentist statistics, because they tell us about the future (hence “prior”). But to get the Posterior (the probability that a past event had one cause rather than another) we need to plug them into Bayes.
The possibly unwelcome implication of this, for any inference about past events, is that when we try to estimate our confidence that a particular past event had a particular cause (whether it is a bacterial flagellum or a sequence of coin-tosses), we cannot simply estimate it from observed frequency distribution of the data. We also need to factor in our degree of confidence in various causal hypotheses.
And that degree of confidence will depend on all kinds of things, including our personal experience, for example, of an unseen Designer altering our lives in apparently meaningful and physical ways (increasing our priors for the existence of Unseen Designers), our confidence in expertise, our confidence in witness reports, our experience of running phylogenetic analyses, or writing evolutionary algorithms. In other words, it’s subjective. That doesn’t mean it isn’t valid, but it does mean that we should be wary (on all sides!) of making over-confident claims based on voodoo statistics in which frequentist predictions are transmogrified into Bayesian inferences without visible priors.