Well, should scientists be legally liable for deceiving the public and manipulating the evidence to support their OWN beliefs based on untrue claims and unsupported by scientific evidence?


colewd: Allan Miller, Where did I make this assertion?

colewd: John Harshman, How did Theobald come up with this number? Why is this number so incredibly large?

Allan Miller: colewd, OK, I overstated it.

You evidently think something is amiss in the presentation of UCD, given the thread title and the statements made.

John Harshman: OK, I’ll try one more time. Promise to think about it this time? The number is the difference in likelihoods between one tree and three trees. The likelihood is the conditional probability of observing the data (the actual protein sequences of the species examined) given the tree. The difference is so large because one tree explains the protein sequence data that much better than three trees do. The probabilities arise from a reversible model of protein sequence transformation. There is a particular probability, given amino acid X at one end of a branch, of observing amino acid Y at the other end. The likelihood of a tree is the product of all these probabilities over all possible states at each internal node of the tree. The product of a whole bunch of tiny numbers is very likely to be a very, very tiny number. This is not the probability of abiogenesis (or even of evolution), but the probability of evolution following the path that leads to exactly the observed sequences. Evolution could have led to a huge number of other sequences. Does that explain your misuse of the numbers?
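The point about multiplying many tiny numbers can be sketched with a toy calculation. This is only an illustration, not Theobald's model: it assumes 1,000 independent sites that all share one made-up probability under each of two hypothetical models (`N_SITES`, `P_SITE_MODEL_1`, and `P_SITE_MODEL_2` are invented values), and it works in log10 so the tiny products stay representable.

```python
import math

# All three values are invented for illustration; none comes from the paper.
N_SITES = 1000
P_SITE_MODEL_1 = 0.05
P_SITE_MODEL_2 = 0.04


def log10_likelihood(p_site: float, n_sites: int) -> float:
    """log10 of the product of n_sites identical per-site probabilities."""
    return n_sites * math.log10(p_site)


log10_l1 = log10_likelihood(P_SITE_MODEL_1, N_SITES)
log10_l2 = log10_likelihood(P_SITE_MODEL_2, N_SITES)
log10_ratio = log10_l1 - log10_l2

print(f"log10 likelihood, model 1: {log10_l1:.1f}")
print(f"log10 likelihood, model 2: {log10_l2:.1f}")
print(f"log10 likelihood ratio:    {log10_ratio:.1f}")
```

Both toy likelihoods fall below 10^-1300, yet the data are still many orders of magnitude more probable under the first model than the second. A tiny likelihood says little by itself; the comparison between models is what carries the information.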

colewd: John Harshman, What is the probability of one tree occurring?

John Harshman: Sigh. Theobald doesn’t calculate any such probability. The paper has nothing to do with any such probability. I have no idea how you would calculate any such probability. But it’s the hypothesis that maximizes the probability of observing the data, which seems pretty good to me.

OMagain: Is it the sound of one hand clapping?

newton: What is the probability of a supernatural tree?

colewd: John Harshman, If I want to calculate the probability difference between two events, don’t I need to know the probability of each event?

John Harshman: Yes indeed. And here Theobald calculates the difference between the probability that the observed data would result given a single tree and the probability that the observed data would result given three trees. You have consistently misunderstood what the probabilities (conditional probabilities or likelihoods) that Theobald computes actually are. As I have tried many, many times to tell you. In vain.

keiths: This dog ain’t never gonna learn calculus.

keiths: Bill,

See if you can understand this analogous example.

Suppose there are two decks of cards, A and B. Deck A is a standard 52-card deck. Deck B is a standard deck from which all cards except the 4’s and the 5’s have been removed.

Now suppose that on Saturday morning, Griselda will do exactly one of the following three things. You have no idea which she will choose or how likely she is to choose each one:

1) deal five cards from deck A and place them on the kitchen table; or

2) deal five cards from deck B and place them on the kitchen table; or

3) deal no cards and go to the farmers’ market.

No one else has access to the cards or the kitchen.

What is the probability that she will choose (1)? You don’t know.

What is the probability that she will choose (2)? You don’t know.

What is the probability that she will choose (3)? You don’t know.

Late Saturday morning you enter the kitchen and see the following cards on the kitchen table:

Given those cards, how much more likely is it that they came from deck B than from deck A, all else being equal?

That you can compute.

John Harshman: keiths, Note, by the way, that even given deck 2, the hand dealt is fairly unlikely, even more so if we count 4 4 4 5 4 as different from 4 5 4 4 4, as we would when considering protein sequences. I get P=.014. Clearly this means that deck 2 is highly unlikely, right? (Hint: wrong.)

Oh, I forgot to consider the suits. Much lower probability if we consider the suits: P=.00014. Now, given deck 1, P=.000000003. Can you do the likelihood ratio yourself, Bill?
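These three figures can be checked with a few lines of Python. This is a sketch under an assumption: the image of the dealt hand did not survive in this copy of the thread, so the code takes the hand to be four 4's and one 5 in the order 4 4 4 5 4, as the comment above implies. The names `ordered_hand_likelihood` and `p_ranks_given_b` are illustrative.

```python
from fractions import Fraction
from math import perm

HAND_SIZE = 5  # cards dealt onto the table


def ordered_hand_likelihood(deck_size: int, hand_size: int = HAND_SIZE) -> Fraction:
    """P(one specific ordered hand, suits included | deck).

    Assumes every card in the hand is present in the deck.
    """
    return Fraction(1, perm(deck_size, hand_size))


# Ranks only, in the order 4 4 4 5 4, from deck B (four 4's and four 5's).
p_ranks_given_b = (
    Fraction(4, 8) * Fraction(3, 7) * Fraction(2, 6) * Fraction(4, 5) * Fraction(1, 4)
)

p_hand_given_b = ordered_hand_likelihood(8)   # deck B holds 8 cards
p_hand_given_a = ordered_hand_likelihood(52)  # deck A holds 52 cards
likelihood_ratio = p_hand_given_b / p_hand_given_a

print(f"ranks only, deck B: {float(p_ranks_given_b):.3g}")
print(f"with suits, deck B: {float(p_hand_given_b):.3g}")
print(f"with suits, deck A: {float(p_hand_given_a):.3g}")
print(f"likelihood ratio B:A = {likelihood_ratio}")
```

Under those assumptions the ratio comes out at 46,410 to 1 in favor of the reduced deck, even though the hand is improbable under either deck.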

dazz: keiths, The answer is: you’re assuming Griselda deals cards in her kitchen on any particular Saturday morning. The probability of such event is 1/10^77, therefore sweet jeebuz

keiths: John,

Now you’re just trying to confuse poor Bill. 🙂

ETA: I see you’ve added the deck A probability. That should make it better.

John Harshman: I should have said “excellent example”. It has the necessary feature that the likelihood of observing the data given either deck is very low, which has nothing to do with the probability that the deck was #2. And in fact we don’t know the prior probability that the deck was #2, which would be necessary to compute what everyone really wants to know, the probability of the deck given the hand.

keiths: If Bill still doesn’t get it, I’m throwing in the towel.

colewd: John Harshman, So then the probability of a single tree should be known. Do you think this probability is less than 10^100? This seems unlikely to me given the differential probability is so large, greater than 10^2800.

John Harshman: What you claim doesn’t follow from anything in the paper. The probability of a single tree does not appear in the paper. It’s impossible to calculate that probability. Please just accept that you have no idea what you’re talking about and either give up this line of argument or read the paper carefully enough to understand what it says. Once again: the paper calculates the probability of observing the data given a single tree. This is not the same as the probability of there being a single tree given the data. Look at the example keiths gave you and think about it carefully.

I should also note that no probability can possibly be 10^100, as a probability must be less than or equal to 1.

colewd: keiths, Let’s see if you can make an argument that it is indeed analogous.

I would expect this from a supernatural keiths 🙂

colewd: John Harshman, Again. If you don’t know the probability of each event, how do you calculate the differential probability?

Are you making the argument that the paper is nonsense because he pulled his calculations out of his ass since a key element of the equation is impossible to calculate?

dazz: LMFAO

GlenDavidson: You know, if you can’t make sense of what that paper is discussing, you have no business discussing it or science. No one can stop you from doing so, but it’s clear that you don’t get even the most basic statistical issues.

Surely you could do something to boost your reading and science comprehension. Why don’t you?

Glen Davidson

John Harshman: No, I’m making the argument that you have no clue what the paper says. Once more: Theobald calculates the probability of the data given various hypothesized trees, and then calculates the difference between those probabilities as a test of which hypothesis, one tree or three trees, to prefer.

This is not the prior probability of the tree, which you would need to compute the probability of the tree given the data. We cannot compute the prior probability of either one tree or three trees. However, in order to prefer three trees to one tree, the prior probability of three trees would have to be greater than that of one tree by an amount that any sane person would consider ridiculous. Are you at all acquainted with Bayes’ Theorem?

John Harshman: Bill,

Let’s use keiths’ analogy. If you know that before choosing a deck, Griselda rolls a 10-billion-sided die, and uses deck #1 unless she rolls a 1, then you have good reason to conclude, even given the hand you see, that Griselda used deck #1. If, on the other hand, you know that Griselda only flipped a 2-sided coin when choosing a deck, you have good reason to conclude that she used deck #2. Is this in any way sinking in?
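The two scenarios can be run through Bayes' theorem directly. A sketch, with assumptions: the hand is taken to be five specific cards that both decks contain (so the likelihoods are 1 in 8·7·6·5·4 and 1 in 52·51·50·49·48), only the two decks are in play because cards are on the table, and `posterior_deck2` is a made-up helper name.

```python
from fractions import Fraction
from math import perm

# Likelihood of one specific ordered 5-card hand under each deck (assumed hand).
P_HAND_GIVEN_DECK1 = Fraction(1, perm(52, 5))  # full 52-card deck
P_HAND_GIVEN_DECK2 = Fraction(1, perm(8, 5))   # only the 4's and 5's


def posterior_deck2(prior_deck2: Fraction) -> Fraction:
    """P(deck 2 | hand) from the prior P(deck 2), via Bayes' theorem."""
    joint_deck2 = prior_deck2 * P_HAND_GIVEN_DECK2
    joint_deck1 = (1 - prior_deck2) * P_HAND_GIVEN_DECK1
    return joint_deck2 / (joint_deck1 + joint_deck2)


die_posterior = posterior_deck2(Fraction(1, 10**10))  # deck 2 only on a roll of 1
coin_posterior = posterior_deck2(Fraction(1, 2))      # fair coin flip

print(f"10-billion-sided die: P(deck 2 | hand) = {float(die_posterior):.3g}")
print(f"coin flip:            P(deck 2 | hand) = {float(coin_posterior):.6f}")
```

With the die, deck #2 ends up with a posterior below 1 in 100,000 despite the favorable likelihood ratio; with the coin it is above 0.9999. The likelihoods are the same in both cases, and only the prior changes.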

colewd: John Harshman, Can you cite his raw calculations?

OMagain: Did you read the paper, out of interest? If so, what do you make of it as-is without raw calculations? Objections?

colewd: John Harshman, How is this analogous to Theobald’s work? Let’s start again with the question of why the differential probabilities are so large?

My hypothesis is that in order to form new complex biological molecules to build the three types of cells, he needs to work through a sequence randomly based on the mechanism he chose. If this is the case then he would have a probability estimate of building a bacterium or the simplest form of life we have observed.

colewd: OMagain, I did, a while ago, but this time it was paywalled so I was only able to review the abstract. If you have a non-paywalled link I would appreciate it. As I remember, the basis of his calculations was not mentioned. I remember he listed probabilities for various scenarios.

All of the numbers were so small, they disqualified his proposed mechanism as a reasonable hypothesis. I don’t think this is terribly unlikely because his paper had little following since it was published other than criticism, as far as I can find.

dazz: Bill learning stuff, graphic description

OMagain: How do you suppose it passed review then?

keiths: This dog ain’t never gonna learn calculus. Or addition.

keiths: colewd,

LMGTFY

Dave Carlson: As various people are trying to explain to you and have explained to you in the past (I’ve done so once over at the Sandwalk, though I don’t think you replied so I don’t know if you saw it), you have a fundamental misunderstanding regarding the nature of likelihood calculations that is causing you to misinterpret Theobald’s results. The likelihood scores do not mean what you think they mean. It’s as simple as that.

It’s fine to not understand likelihood, but in that case you should probably refrain from discussions which require at least a small amount of knowledge (which is really all that I have) about how likelihood statistics works. Until you wrap your head around the difference between P(data | hypothesis), and P(hypothesis | data), and understand which one of these two things is actually being discussed, you are going to continue to misunderstand Theobald (and any other scenario in which the likelihoods of different models are compared).

Please do yourself a favor and try to understand what people are saying in this discussion. The example keiths gave is a pretty good starting point.

John Harshman: Not without re-reading the paper. Why? Are you not just throwing anything you can think of at me in an attempt not to have to think about what I said?

Your hypothesis is incorrect, and in fact bears no relation at all to the paper. There are no “new complex biological molecules” being formed. There are only changes in sequence of existing proteins. The differential probabilities are large because the data are much, much more likely to have evolved on a single tree than on three trees.

This statement shows you still have no idea at all what the numbers represent. Let me try an even simpler example than keiths’: suppose I shuffle a deck of cards and deal out a bridge hand. If I tell you that the probability of getting this particular hand is 1 in 635013559600, will you tell me that, because the probability is so small, I didn’t really have a deck of cards and didn’t deal from the top of it? That’s the exact equivalent of what you’re doing here. As I have explained many, many, many times, Theobald calculated the probability, given the tree, of seeing the particular protein sequences we see. Astronomically many different sequences would have been possible given the tree, just as astronomically many bridge hands would have been possible given a deck of cards. That probability has nothing to do with the probability of the tree or of the deck of cards.
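The bridge figure is the number of ways to choose 13 cards from 52, which is easy to confirm. A minimal check, assuming hands are counted without regard to the order in which the cards were dealt:

```python
from math import comb

bridge_hands = comb(52, 13)         # distinct 13-card hands from a 52-card deck
p_specific_hand = 1 / bridge_hands  # chance of any one named hand

print(bridge_hands)
print(f"{p_specific_hand:.3g}")
```

Every hand that is actually dealt has this same tiny probability, so its smallness cannot count against the existence of the deck.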

keiths: colewd:

Bill,

We can see your mistake plain as day. You can’t.

Since you have no idea where your mistake lies, you are in no position to uncover and correct it. We are.

If you want to have even a prayer of understanding this stuff, you’re going to have to follow our lead instead of setting off on your own goose chases.

Do you follow what John is saying in the preceding comment?

colewd: keiths, Thanks. I will look at the paper over the next couple of days.

keiths: Bill,

Don’t just read the paper.

Reread the comments as well. If you don’t grasp what we’re saying about the Griselda example, you’re unlikely to see your mistake regarding Theobald.

Allan Miller: colewd, That has a familiar ring.

Rumraket: It’s not even calculus. I don’t know calculus, but I still understand the logic of what Theobald did. You don’t even have to be able to do the math, to be able to understand the underlying logic of comparative hypothesis testing. Which makes it all the more amazing that Bill doesn’t get it, and is asking for calculations, when he fails long before the actual calculations become important.

He fails to understand what is being calculated, and why that is even interesting and important to estimate. He doesn’t understand that we already have the data, and that Theobald is trying to calculate what model most likely produces (as in predicts) a dataset like the one we have.

Allan Miller: I’m going to chuck something else into the mix, secure in the knowledge that it won’t confuse Bill further…

The analysis is picking up the best tree for the protein sequences analysed. To the extent that identity implies common descent of the protein gene, that assumption itself rests upon the proteins in common coming from the same system of genetic encoding. That fact alone argues for UCD. 55 of 64 positions are invariant. Only 1 of the variant positions is not a STOP in one species or another – and in that one, the variant is a Methionine, a Start signal. That’s not a random pattern. The code is as near universal as makes no difference. Even if the gene trees favoured 3 lineages over 1, behind those lineages rests a common ancestor of the protein system they all use, on the evidence.

Theobald’s analysis places a root for the proteins actually synthesised on this common system at an earlier point than it might have if the sequence were protein coding, then divergence, then separate evolution of the actual protein repertoire – ie 3 non-convergent protein-coding gene trees. It would not, indeed could not, fix 3 separate origins. Nor does it preclude separate origins behind that protein-coding LCA. Nodes aren’t origins.

And no, I don’t know what the prior probability of protein coding was!

keiths: Allan,

It must have been rilly, rilly low. Look at the size of those denominators!

Allan Miller: keiths, If you wrote each zero on a quantum particle, there wouldn’t be enough in the universe to … ach, if only the little buggers would stay still!

colewd: John Harshman, The probability of getting the hand you dealt in this case is one once the hand is dealt.

John Harshman: Sigh.

colewd: Allan Miller, After reading Koonin’s critique of the paper last night it appears that the null hypothesis to UCD is convergent evolution. From your description above it appears you would agree with this.

Allan Miller: Chortle!

Allan Miller: colewd, Well, it isn’t. Not sure you should be getting into critiques of a paper you don’t understand though. What is your principal reason for favouring Koonin over Theobald?

My point has absolutely zip to do with any dispute between Theobald and Koonin on Theobald’s analysis, nor a null hypothesis, a concept I’m doubtful that you grasp.

colewd: Allan Miller, Why in your opinion is convergent evolution so much more difficult i.e. 10^2860 than UCD?

colewd: Allan Miller, Reset. What is the null hypothesis?