Should scientists be legally accountable for deceiving the public?

Well, should scientists be legally liable for deceiving the public and manipulating the evidence to support their OWN beliefs, based on untrue claims and unsupported by scientific evidence?

 

175 thoughts on “Should scientists be legally accountable for deceiving the public?”

  1. Allan Miller,

    No it doesn’t. What I’m more interested in, though, is your assertion that scientists have somehow deceived the public on UCD.

    Where did I make this assertion?

  2. John Harshman,

    Second, note that Theobald is comparing the probabilities of various hypotheses given the data. In other words, Theobald’s conclusion is that universal common ancestry fits the evidence 10^2860 times better than the best competing hypothesis.

    How did Theobald come up with this number? Why is this number so incredibly large?

  3. colewd,

    Where did I make this assertion?

    OK, I overstated it.

    J-Mac: I believe, that there has to be accountability set for science that is not really science, but it is presented as such….

    colewd: I have read several papers […] The only issue I have seen is that universal common descent is treated as an a priori assumption […]

    You evidently think something is amiss in the presentation of UCD, given the thread title and the statements made.

  4. colewd:
    John Harshman,
    How did Theobald come up with this number? Why is this number so incredibly large?

    OK, I’ll try one more time. Promise to think about it this time? The number is the difference in likelihoods between one tree and three trees. The likelihood is the conditional probability of observing the data (the actual protein sequences of the species examined) given the tree. The difference is so large because one tree explains the protein sequence data that much better than three trees do. The probabilities arise from a reversible model of protein sequence transformation. There is a particular probability, given amino acid X at one end of a branch, of observing amino acid Y at the other end. The likelihood of a tree is the product of all these probabilities over all possible states at each internal node of the tree. The product of a whole bunch of tiny numbers is very likely to be a very, very tiny number. This is not the probability of abiogenesis (or even of evolution), but the probability of evolution following the path that leads to exactly the observed sequences. Evolution could have led to a huge number of other sequences. Does that explain your misuse of the numbers?
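    A toy sketch of the kind of comparison described above, and not Theobald's actual model or data. It assumes a made-up equal-rates substitution model over 20 amino acids, two hypothetical short sequences, and an arbitrary branch length. It compares the likelihood of the sequences on one shared tree with their likelihood on two unrelated trees.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids
N = len(ALPHABET)
FREQ = 1 / N  # equal stationary frequencies (toy assumption)


def p_change(start: str, end: str, t: float) -> float:
    """P(end | start) along a branch of length t under an equal-rates model."""
    decay = math.exp(-N / (N - 1) * t)
    if start == end:
        return FREQ + (1 - FREQ) * decay
    return FREQ * (1 - decay)


def log10_lik_one_tree(seq_a: str, seq_b: str, t: float) -> float:
    """log10 P(data | one tree): both leaves sit at distance t from a shared root."""
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        # Sum over every possible state x at the unobserved root.
        site = sum(FREQ * p_change(x, a, t) * p_change(x, b, t) for x in ALPHABET)
        total += math.log10(site)  # the product over sites becomes a sum of logs
    return total


def log10_lik_two_trees(seq_a: str, seq_b: str) -> float:
    """log10 P(data | two unrelated trees): every residue is an independent draw."""
    return (len(seq_a) + len(seq_b)) * math.log10(FREQ)


# Hypothetical sequences that differ at 3 of 30 positions.
SEQ_A = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
SEQ_B = "MKSAYIAKQRQISFIKSHFSRQLEDRLGLI"

one_tree = log10_lik_one_tree(SEQ_A, SEQ_B, t=0.25)
two_trees = log10_lik_two_trees(SEQ_A, SEQ_B)
print(f"log10 P(data | one tree)  = {one_tree:.1f}")
print(f"log10 P(data | two trees) = {two_trees:.1f}")
print(f"log10 likelihood ratio    = {one_tree - two_trees:.1f}")
```

    Both likelihoods are tiny, because each is the probability of one exact outcome among a huge number of possible ones. Only their ratio says which hypothesis explains the data better, and that ratio grows with every additional site.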

  5. John Harshman,

    The difference is so large because one tree explains the protein sequence data that much better than three trees do.

    What is the probability of one tree occurring?

  6. colewd: What is the probability of one tree occurring?

    Sigh. Theobald doesn’t calculate any such probability. The paper has nothing to do with any such probability. I have no idea how you would calculate any such probability. But it’s the hypothesis that maximizes the probability of observing the data, which seems pretty good to me.

  7. John Harshman,

    Sigh. Theobald doesn’t calculate any such probability. The paper has nothing to do with any such probability. I have no idea how you would calculate any such probability. But it’s the hypothesis that maximizes the probability of observing the data, which seems pretty good to me.

    If I want to calculate the probability difference between two events, don’t I need to know the probability of each event?

  8. colewd: If I want to calculate the probability difference between two events, don’t I need to know the probability of each event?

    Yes indeed. And here Theobald calculates the difference between the probability that the observed data would result given a single tree and the probability that the observed data would result given three trees. You have consistently misunderstood what the probabilities (conditional probabilities or likelihoods) that Theobald computes actually are. As I have tried many, many times to tell you. In vain.

  9. Bill,

    See if you can understand this analogous example.

    Suppose there are two decks of cards, A and B. Deck A is a standard 52-card deck. Deck B is a standard deck from which all cards except the 4’s and the 5’s have been removed.

    Now suppose that on Saturday morning, Griselda will do exactly one of the following three things. You have no idea which she will choose or how likely she is to choose each one:

    1) deal five cards from deck A and place them on the kitchen table; or
    2) deal five cards from deck B and place them on the kitchen table; or
    3) deal no cards and go to the farmers’ market.

    No one else has access to the cards or the kitchen.

    What is the probability that she will choose (1)? You don’t know.
    What is the probability that she will choose (2)? You don’t know.
    What is the probability that she will choose (3)? You don’t know.

    Late Saturday morning you enter the kitchen and see the following cards on the kitchen table:

    4♠ 4♦ 4♥ 5♥ 4♣

    Given those cards, how much more likely is it that they came from deck B than from deck A, all else being equal? That you can compute.
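    A minimal sketch of that computation, assuming the hand is treated as an unordered set of five specific cards and that every five-card hand from a given deck is equally likely. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

DECK_A_SIZE = 52  # standard deck
DECK_B_SIZE = 8   # only the four 4s and the four 5s remain
HAND_SIZE = 5

# Any one specific unordered hand is one of C(n, 5) equally likely hands.
p_hand_given_a = Fraction(1, comb(DECK_A_SIZE, HAND_SIZE))
p_hand_given_b = Fraction(1, comb(DECK_B_SIZE, HAND_SIZE))

# How much better deck B explains the observed hand than deck A does.
likelihood_ratio = p_hand_given_b / p_hand_given_a

print(f"P(hand | deck A) = {float(p_hand_given_a):.3g}")
print(f"P(hand | deck B) = {float(p_hand_given_b):.3g}")
print(f"likelihood ratio B:A = {likelihood_ratio}")
```

    Nothing here uses the probability that Griselda chose deck A or deck B. The ratio depends only on how probable the observed hand is under each deck.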

  10. keiths,

    Note, by the way, that even given deck 2, the hand dealt is fairly unlikely, even more so if we count 4 4 4 5 4 as different from 4 5 4 4 4, as we would when considering protein sequences. I get P=.014. Clearly this means that deck 2 is highly unlikely, right? (Hint: wrong.)

    Oh, I forgot to consider the suits. Much lower probability if we consider the suits: P=.00014. Now, given deck 1, P=.000000003. Can you do the likelihood ratio yourself, Bill?
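    A quick check of those figures, assuming the five cards are dealt in the order shown and that order matters. The rank-only case multiplies the chance of each successive rank; the suited case counts ordered five-card deals.

```python
from fractions import Fraction
from math import perm

# Deck of 8 cards (four 4s, four 5s): chance of dealing ranks 4, 4, 4, 5, 4 in that order.
p_ranks_small_deck = (
    Fraction(4, 8) * Fraction(3, 7) * Fraction(2, 6) * Fraction(4, 5) * Fraction(1, 4)
)

# With suits, the exact ordered deal is one of P(n, 5) equally likely ordered deals.
p_suited_small_deck = Fraction(1, perm(8, 5))
p_suited_full_deck = Fraction(1, perm(52, 5))

ratio = p_suited_small_deck / p_suited_full_deck

print(f"ranks only, 8-card deck:  {float(p_ranks_small_deck):.3g}")
print(f"with suits, 8-card deck:  {float(p_suited_small_deck):.3g}")
print(f"with suits, 52-card deck: {float(p_suited_full_deck):.3g}")
print(f"likelihood ratio:         {ratio}")
```

    Every one of these probabilities is small, but the ratio between the two decks is what carries the information.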

  11. keiths,

    The answer is: you’re assuming Griselda deals cards in her kitchen on any particular Saturday morning. The probability of such event is 1/10^77, therefore sweet jeebuz

  12. John,

    Now you’re just trying to confuse poor Bill. 🙂

    ETA: I see you’ve added the deck A probability. That should make it better.

  13. keiths:
    John,

    Now you’re just trying to confuse poor Bill. 🙂

    ETA: I see you’ve added the deck A probability. That should make it better.

    I should have said “excellent example”. It has the necessary feature that the likelihood of observing the data given either deck is very low, which has nothing to do with the probability that the deck was #2. And in fact we don’t know the prior probability that the deck was #2, which would be necessary to compute what everyone really wants to know, the probability of the deck given the hand.

  14. John Harshman,

    Yes indeed. And here Theobald calculates the difference between the probability that the observed data would result given a single tree and the probability that the observed data would result given three trees.

    So then the probability of a single tree should be known. Do you think this probability is less than 10^100? This seems unlikely to me, given that the differential probability is so large (greater than 10^2800).

  15. colewd:
    John Harshman,

    So then the probability of a single tree should be known. Do you think this probability is less than 10^100? This seems unlikely to me, given that the differential probability is so large (greater than 10^2800).

    What you claim doesn’t follow from anything in the paper. The probability of a single tree does not appear in the paper. It’s impossible to calculate that probability. Please just accept that you have no idea what you’re talking about and either give up this line of argument or read the paper carefully enough to understand what it says. Once again: the paper calculates the probability of observing the data given a single tree. This is not the same as the probability of there being a single tree given the data. Look at the example keiths gave you and think about it carefully.

    I should also note that no probability can possibly be 10^100, as a probability must be less than or equal to 1.

  16. keiths,

    See if you can understand this analogous example.

    Let’s see if you can make an argument that it is indeed analogous.

    I would expect this from a supernatural keiths 🙂

  17. John Harshman,

    The probability of a single tree does not appear in the paper. It’s impossible to calculate that probability.

    Again. If you don’t know the probability of each event, how do you calculate the differential probability?

    Are you making the argument that the paper is nonsense because he pulled his calculations out of his ass since a key element of the equation is impossible to calculate?

  18. colewd:
    John Harshman,

    Again. If you don’t know the probability of each event, how do you calculate the differential probability?

    Are you making the argument that the paper is nonsense because he pulled his calculations out of his ass since a key element of the equation is impossible to calculate?

    You know, if you can’t make sense of what that paper is discussing, you have no business discussing it or science. No one can stop you from doing so, but it’s clear that you don’t get even the most basic statistical issues.

    Surely you could do something to boost your reading and science comprehension. Why don’t you?

    Glen Davidson

  19. colewd:
    John Harshman,

    Again. If you don’t know the probability of each event, how do you calculate the differential probability?

    Are you making the argument that the paper is nonsense because he pulled his calculations out of his ass since a key element of the equation is impossible to calculate?

    No, I’m making the argument that you have no clue what the paper says. Once more: Theobald calculates the probability of the data given various hypothesized trees, and then calculates the difference between those probabilities as a test of which hypothesis, one tree or three trees, to prefer.

    This is not the prior probability of the tree, which you would need to compute the probability of the tree given the data. We cannot compute the prior probability of either one tree or three trees. However, in order to prefer three trees to one tree, the prior probability of three trees would have to be greater than that of one tree by an amount that any sane person would consider ridiculous. Are you at all acquainted with Bayes’ Theorem?
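    The odds form of Bayes' theorem makes the size of that required prior concrete. This sketch works in log10 units and treats the 10^2860 figure quoted earlier in the thread as a likelihood ratio, which is a simplifying assumption. The prior odds below are hypothetical.

```python
# Figure quoted in the thread, treated here as log10 of a likelihood ratio
# for one tree against the best competing hypothesis (an assumption).
LOG10_LIKELIHOOD_RATIO = 2860


def log10_posterior_odds(log10_prior_odds: float) -> float:
    """Posterior odds = prior odds * likelihood ratio, expressed in log10 units."""
    return log10_prior_odds + LOG10_LIKELIHOOD_RATIO


# Hypothetical prior odds of one tree against three trees, as powers of ten.
for log10_prior in (0, -100, -2859, -2861):
    posterior = log10_posterior_odds(log10_prior)
    favoured = "one tree" if posterior > 0 else "three trees"
    print(f"prior odds 10^{log10_prior}: posterior odds 10^{posterior} -> {favoured}")
```

    Under these assumptions, three trees come out ahead only if the prior odds against one tree are worse than 1 in 10^2860.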

  20. Bill,

    Let’s use keiths’ analogy. If you know that before choosing a deck, Griselda rolls a 10-billion-sided die, and uses deck #1 unless she rolls a 1, then you have good reason to conclude, even given the hand you see, that Griselda used deck #1. If, on the other hand, you know that Griselda only flipped a 2-sided coin when choosing a deck, you have good reason to conclude that she used deck #2. Is this in any way sinking in?
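    A sketch of both scenarios, assuming only the two decks are in play and using the likelihood ratio C(52,5)/C(8,5) = 46410 for the hand in keiths' example. The function name is illustrative.

```python
from fractions import Fraction

# P(hand | deck #2) / P(hand | deck #1) for the hand in keiths' example.
LIKELIHOOD_RATIO = Fraction(46410)


def posterior_deck2(prior_deck2: Fraction) -> Fraction:
    """P(deck #2 | hand) from the prior P(deck #2), via the odds form of Bayes' theorem."""
    prior_odds = prior_deck2 / (1 - prior_deck2)
    posterior_odds = prior_odds * LIKELIHOOD_RATIO
    return posterior_odds / (1 + posterior_odds)


# Deck #2 is used only if a 10-billion-sided die shows a 1.
after_die = posterior_deck2(Fraction(1, 10_000_000_000))
# Deck #2 is used if a fair coin lands one particular way.
after_coin = posterior_deck2(Fraction(1, 2))

print(f"die:  P(deck #2 | hand) = {float(after_die):.3g}")
print(f"coin: P(deck #2 | hand) = {float(after_coin):.6f}")
```

    The likelihood ratio is identical in both runs. Only the prior changes, and with it the conclusion about which deck was used.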

  21. John Harshman,

    No, I’m making the argument that you have no clue what the paper says. Once more: Theobald calculates the probability of the data given various hypothesized trees, and then calculates the difference between those probabilities as a test of which hypothesis, one tree or three trees, to prefer.

    Can you cite his raw calculations?

  22. colewd: Can you cite his raw calculations?

    Did you read the paper, out of interest? If so, what do you make of it as-is without raw calculations? Objections?

  23. John Harshman,

    Let’s use keiths’ analogy. If you know that before choosing a deck, Griselda rolls a 10-billion-sided die, and uses deck #1 unless she rolls a 1, then you have good reason to conclude, even given the hand you see, that Griselda used deck #1. If, on the other hand, you know that Griselda only flipped a 2-sided coin when choosing a deck, you have good reason to conclude that she used deck #2. Is this in any way sinking in?

    How is this analogous to Theobald’s work? Let’s start again with the question of why the differential probabilities are so large?

    My hypothesis is that in order to form new complex biological molecules to build the three types of cells, he needs to work through a sequence randomly based on the mechanism he chose. If this is the case then he would have a probability estimate of building a bacterium or the simplest form of life we have observed.

  24. OMagain,

    Did you read the paper, out of interest? If so, what do you make of it as-is without raw calculations? Objections?

    I did, a while ago, but this time it was paywalled so I was only able to review the abstract. If you have a non-paywalled link I would appreciate it. As I remember, the basis of his calculations was not mentioned. I remember he listed probabilities for various scenarios.

    All of the numbers were so small, they disqualified his proposed mechanism as a reasonable hypothesis. I don’t think this is terribly unlikely because his paper had little following since it was published other than criticism as far as I can find.

  25. colewd: All of the numbers were so small, they disqualified his proposed mechanism as a reasonable hypothesis.

    How do you suppose it passed review then?

  26. colewd,

    I did, a while ago, but this time it was paywalled so I was only able to review the abstract. If you have a non-paywalled link I would appreciate it.

    LMGTFY

  27. colewd:
    OMagain,

    I did, a while ago, but this time it was paywalled so I was only able to review the abstract. If you have a non-paywalled link I would appreciate it. As I remember, the basis of his calculations was not mentioned. I remember he listed probabilities for various scenarios.

    All of the numbers were so small, they disqualified his proposed mechanism as a reasonable hypothesis. I don’t think this is terribly unlikely because his paper had little following since it was published other than criticism as far as I can find.

    As various people are trying to explain to you and have explained to you in the past (I’ve done so once over at the Sandwalk, though I don’t think you replied so I don’t know if you saw it), you have a fundamental misunderstanding regarding the nature of likelihood calculations that is causing you to misinterpret Theobald’s results. The likelihood scores do not mean what you think they mean. It’s as simple as that.

    It’s fine to not understand likelihood, but in that case you should probably refrain from discussions which require at least a small amount of knowledge (which is really all that I have) about how likelihood statistics works. Until you wrap your head around the difference between P(data | hypothesis), and P(hypothesis | data), and understand which one of these two things is actually being discussed, you are going to continue to misunderstand Theobald (and any other scenario in which the likelihoods of different models are compared).

    Please do yourself a favor and try to understand what people are saying in this discussion. keiths’ example is a pretty good starting point.

  28. colewd:
    John Harshman,

    Can you cite his raw calculations?

    Not without re-reading the paper. Why? Are you not just throwing anything you can think of at me in an attempt not to have to think about what I said?

    colewd: How is this analogous to Theobald’s work? Let’s start again with the question of why the differential probabilities are so large?

    My hypothesis is that in order to form new complex biological molecules to build the three types of cells, he needs to work through a sequence randomly based on the mechanism he chose. If this is the case then he would have a probability estimate of building a bacterium or the simplest form of life we have observed.

    Your hypothesis is incorrect, and in fact bears no relation at all to the paper. There are no “new complex biological molecules” being formed. There are only changes in sequence of existing proteins. The differential probabilities are large because the data are much, much more likely to have evolved on a single tree than on three trees.

    colewd: All of the numbers were so small, they disqualified his proposed mechanism as a reasonable hypothesis. I don’t think this is terribly unlikely because his paper had little following since it was published other than criticism as far as I can find.

    This statement shows you still have no idea at all what the numbers represent. Let me try an even simpler example than keiths’: suppose I shuffle a deck of cards and deal out a bridge hand. If I tell you that the probability of getting this particular hand is 1 in 635013559600, will you tell me that, because the probability is so small, I didn’t really have a deck of cards and didn’t deal from the top of it? That’s the exact equivalent of what you’re doing here. As I have explained many, many, many times, Theobald calculated the probability, given the tree, of seeing the particular protein sequences we see. Astronomically many different sequences would have been possible given the tree, just as astronomically many bridge hands would have been possible given a deck of cards. That probability has nothing to do with the probability of the tree or of the deck of cards.
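    The bridge-hand figure can be checked directly, assuming a hand is an unordered set of 13 cards from a 52-card deck:

```python
from math import comb

# Number of distinct 13-card hands that can be dealt from a 52-card deck.
n_bridge_hands = comb(52, 13)

# Probability of any one specific hand, given that a shuffled deck was dealt.
p_specific_hand = 1 / n_bridge_hands

print(f"possible bridge hands: {n_bridge_hands}")
print(f"P(this exact hand | shuffled deck) = {p_specific_hand:.3g}")
```

    Every hand that could be dealt has this same small probability, so its smallness says nothing about whether a deck was used.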

  29. colewd:

    How is this analogous to Theobald’s work?

    Bill,

    We can see your mistake plain as day. You can’t.

    Since you have no idea where your mistake lies, you are in no position to uncover and correct it. We are.

    If you want to have even a prayer of understanding this stuff, you’re going to have to follow our lead instead of setting off on your own goose chases.

    Do you follow what John is saying in the preceding comment?

  30. Bill,

    Don’t just read the paper.

    Reread the comments as well. If you don’t grasp what we’re saying about the Griselda example, you’re unlikely to see your mistake regarding Theobald.

  31. keiths: This dog ain’t never gonna learn calculus.

    It’s not even calculus. I don’t know calculus, but I still understand the logic of what Theobald did. You don’t even have to be able to do the math, to be able to understand the underlying logic of comparative hypothesis testing. Which makes it all the more amazing that Bill doesn’t get it, and is asking for calculations, when he fails long before the actual calculations become important.

    He fails to understand what is being calculated, and why that is even interesting and important to estimate. He doesn’t understand that we already have the data, and that Theobald is trying to calculate what model most likely produces (as in predicts) a dataset like the one we have.

  32. I’m going to chuck something else into the mix, secure in the knowledge that it won’t confuse Bill further.

    The analysis is picking up the best tree for the protein sequences analysed. To the extent that identity implies common descent of the protein gene, that assumption itself rests upon the proteins in common coming from the same system of genetic encoding. That fact alone argues for UCD. 55 of 64 positions are invariant. Only 1 of the variant positions is not a STOP in one species or another – and in that one, the variant is a Methionine, a Start signal. That’s not a random pattern. The code is as near universal as makes no difference. Even if the gene trees favoured 3 lineages over 1, behind those lineages rests a common ancestor of the protein system they all use, on the evidence.

    Theobald’s analysis places a root for the proteins actually synthesised on this common system at an earlier point than it might have if the sequence were protein coding, then divergence, then separate evolution of the actual protein repertoire – i.e. 3 non-convergent protein-coding gene trees. It would not, indeed could not, fix 3 separate origins. Nor does it preclude separate origins behind that protein-coding LCA. Nodes aren’t origins.

    And no, I don’t know what the prior probability of protein coding was!

  33. Allan,

    And no, I don’t know what the prior probability of protein coding was!

    It must have been rilly, rilly low. Look at the size of those denominators!

  34. keiths,

    If you wrote each zero on a quantum particle, there wouldn’t be enough in the universe to … ach, if only the little buggers would stay still!

  35. John Harshman,

    This statement shows you still have no idea at all what the numbers represent. Let me try an even simpler example than keiths’: suppose I shuffle a deck of cards and deal out a bridge hand. If I tell you that the probability of getting this particular hand is 1 in 635013559600, will you tell me that, because the probability is so small, I didn’t really have a deck of cards and didn’t deal from the top of it? That’s the exact equivalent of what you’re doing here. As I have explained many, many, many times, Theobald calculated the probability, given the tree, of seeing the particular protein sequences we see. Astronomically many different sequences would have been possible given the tree, just as astronomically many bridge hands would have been possible given a deck of cards. That probability has nothing to do with the probability of the tree or of the deck of cards.

    The probability of getting the hand you dealt in this case is one once the hand is dealt.

  36. Allan Miller,

    Theobald’s analysis places a root for the proteins actually synthesised on this common system at an earlier point than it might have if the sequence were protein coding, then divergence, then separate evolution of the actual protein repertoire – i.e. 3 non-convergent protein-coding gene trees. It would not, indeed could not, fix 3 separate origins. Nor does it preclude separate origins behind that protein-coding LCA. Nodes aren’t origins.

    After reading Koonin’s critique of the paper last night it appears that the null hypothesis to UCD is convergent evolution. From your description above it appears you would agree with this.

  37. colewd,

    After reading Koonin’s critique of the paper last night it appears that the null hypothesis to UCD is convergent evolution.

    Well, it isn’t. Not sure you should be getting into critiques of a paper you don’t understand though. What is your principal reason for favouring Koonin over Theobald?

    From your description above it appears you would agree with this.

    My point has absolutely zip to do with any dispute between Theobald and Koonin on Theobald’s analysis, nor a null hypothesis, a concept I’m doubtful that you grasp.
