Evolution and Probability

Posted on December 9, 2018 by Mung

Probabilistic thinking is pervasive in evolutionary theory. It’s not a bad thing, just something that needs to be acknowledged and appropriately handled.

Denial

Some go so far as to deny it, but in my experience these people are ideologues. These are critics of ID who complain about the lack of any numbers being attached to the probability arguments of ID proponents, and their denial is perhaps rooted in their fear of a tu quoque.

Where are their own probability calculations?

Incredulity

Another reason for their denial could be that they also love to accuse ID proponents of making arguments from incredulity, while being unwilling to face up to the fact that they are guilty of the same thing. Does evolutionary theory depend on arguments from incredulity? Almost certainly.

Take for example the idea that all extant life shares a common ancestor. It is based upon the idea that it is simply too implausible that life should arise more than once and yet share common features such as the genetic code.

We can be very sure there really is a single concestor of all surviving life forms on this planet. The evidence is that all that have ever been examined share (exactly in most cases, almost exactly in the rest) the same genetic code; and the genetic code is too detailed, in arbitrary aspects of its complexity, to have been invented twice.
Dawkins, Richard. The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution

An argument from incredulity.

Probabilities are Important

The importance of probability in evolutionary thinking might best be seen in the following text:

If there are versions of the evolution theory that deny slow gradualism, and deny the central role of natural selection, they may be true in particular cases. But they cannot be the whole truth, for they deny the very heart of the evolution theory, which gives it the power to dissolve astronomical improbabilities and explain prodigies of apparent miracle.
Dawkins, Richard. The Blind Watchmaker: Why the Evidence of Evolution Reveals a Universe without Design

Independence of Events

Having said all this, I’d like to focus on the idea that evolutionary events are not independent and that this somehow rescues evolutionary theory from being guilty of appealing to vastly improbable outcomes, aka miracles.

Consider a toss of the dice in a game of craps. The probability of a double six is 1/36. Sure, we can roll a single die twice, and the probability of a six on each roll is now 1/6, vastly more likely to occur by chance (not really): 1/6 x 1/6 is still 1/36. The probabilities are multiplicative because the events are independent. Suppose you have two dice, and you roll the first die until you get a six and keep it (by cumulative selection), then roll the second die until you get the second six, so that you now have two sixes. That doesn’t change the probabilities one whit. Doesn’t that demonstrate that cumulative selection is helpless in reducing probabilities?

Well, you might say, you need to roll BOTH dice until only one of them shows a six and then keep that one and then roll the second die. But what in evolution is analogous to that?

Sure, if you roll two dice trying to roll a six you have a better chance of a six showing on one of the two dice than if you roll just one. It’s like rolling one die twice in an attempt to get a six rather than just once. Of course the probability of the second six would still be 1/6. But why aren’t we justified in adding a third die after our first six is rolled so that once again we are trying to get a six from two dice and not just one? And doesn’t this again demonstrate that it is not cumulative selection at all that is responsible for the reduction in probability but rather the number of trials we allot each attempt to roll a six?
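One way to make the two dice procedures concrete is to count how many attempts each is expected to need. The sketch below is only an illustration under stated assumptions: fair six-sided dice, and function names that are made up for this example. The per-roll probability of a six is 1/6 in both procedures; what the sketch compares is the expected number of throws of the whole set (waiting for all sixes at once) against the expected number of single-die rolls (keeping each six and moving on to the next die).

```python
import random
from fractions import Fraction

P_SIX = Fraction(1, 6)  # assumption: fair six-sided dice


def expected_throws_all_at_once(n_dice: int) -> Fraction:
    """Expected throws of n dice together until all show six on the same throw."""
    return 1 / (P_SIX ** n_dice)


def expected_rolls_keep_sixes(n_dice: int) -> Fraction:
    """Expected single-die rolls when each six is kept before rolling the next die."""
    return n_dice * (1 / P_SIX)


def simulate_keep_sixes(n_dice: int, rng: random.Random) -> int:
    """Simulate the keep-each-six procedure once and return the rolls used."""
    rolls = 0
    for _ in range(n_dice):
        while True:
            rolls += 1
            if rng.randint(1, 6) == 6:
                break  # keep this six and move on to the next die
    return rolls


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 20000
    mean = sum(simulate_keep_sixes(2, rng) for _ in range(trials)) / trials
    print("all at once, expected throws:", expected_throws_all_at_once(2))
    print("keep sixes, expected rolls:  ", expected_rolls_keep_sixes(2))
    print("keep sixes, simulated mean:  ", mean)
```

For two dice the exact values are 36 throws of the pair versus 12 single-die rolls. Whether either procedure has an analogue in evolution is the question the post is asking; the sketch does not settle that.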

Closing

The fundamental question is why aren’t evolutionary events independent and thus multiplicative?

The secondary question is what is the true role of cumulative selection in reduction from the miraculous to the mere appearance of the miraculous?

291 thoughts on “Evolution and Probability”

colewd on December 12, 2018 at 1:11 am said:

DNA_Jock,

Bill, You appear to be conceding that you did not have an argument and that you are pretty much innumerate. Are you really unwilling to put a range on the proportion of random 80-mers that are capable of a consistent fold?
I mean, you could say “Between 10% and one in 100 million”. That would be kinda lame, covering 7 orders of magnitude, but would show some honesty.

I honestly have no idea.

Sorry, I have no clue what you are trying to say here. I have no idea what point is valid. Could you re-phrase?

Just explaining there are cases where conditions don’t matter and we are better steering away from cases where there is little workable empirical data such as the probability of a secondary fold.

if you are going to argue that RM+NS cannot explain the proteins we observe, you are going to have to calculate conditional probabilities.

You can argue this with the range of probabilities in Hunt’s 2004 PT article combined with observations like the flagellum and we haven’t even moved past bacteria.

I will gladly move on to a new topic of conversation, if you wish. Recognize that any audience will assume you are conceding defeat. Do you now wish to discuss protein-protein interactions, or regulatory networks, or something else?

Yes, you won this one as you demonstrated that there is a condition where the probability of both events may be less than A x B given event B starts with a fold.

Where your argument that all observed events require conditional probability calculations breaks down is where proteins A and B bind together to perform function C. Protein A must fold successfully and bind to protein B that also must fold properly and bind to protein A and both need to provide an active site for molecule C. This is the case where Axe measured a probability of 10^-77. 30 plus orders of magnitude greater than the available evolutionary resources since first life. Art Hunt claimed in his 2004 paper that this was in the ball park of the range of evidence he had collected.

Per Art Hunt

10^-10 -> 10^-63 (or thereabout): this is the range of estimates of the density of functional sequences in sequence space that can be found in the scientific literature. The caveats given in Section 2 notwithstanding, Axe’s work does not extend or narrow the range.

Proteins observed at the low end of the range (10^-63) pretty much rule out RMNS as the cause of their origin without evidence of interim selectable steps. The combinations of 30 of these proteins to make a flagellum compounds the problem. I concede for the sake of argument that some conditions may exist; however, the numbers are so large this point is moot. The reason Art did not say Axe was out of the range is his protein was 50AA longer than the proteins in the range.
DNA_Jock on December 12, 2018 at 1:32 am said:

I am sorry Bill. But there are only two sentences in your comment that I can understand, viz:

Yes, you won this one as you demonstrated that there is a condition where the probability of both events may be less than A x B given event B starts with a fold.

No. I was about to demonstrate that the probability of A and B was greater than P(A) x P(B). You really don’t seem to understand probability at all. If anyone else is interested, I will happily explain.

This is the case where Axe measured a probability of 10^-77

He ‘measured’ no such thing. We have made fun of this claim before, calculating the mass of 10^77 anythings.

The rest of your comment is incoherent, I am afraid.
OMagain on December 12, 2018 at 9:02 am said:

colewd,
You keep referring to “a flagellum” or “the flagellum”.

But there is not just one.

colewd: The combinations of 30 of these proteins to make a flagellum compounds the problem.

And yet there seem to be several ways to put a flagellum together. Does that uncompound the problem?
OMagain on December 12, 2018 at 12:30 pm said:

Also if there is not just one flagellum then there are potentially other configurations that will provide motility but which just have not been discovered.

colewd: The combinations of 30 of these proteins to make a flagellum compounds the problem.

If there are many solutions to the problem of motility then perhaps the problem is simpler than you think.

Out of interest, is the combination of 30 of those proteins forming a flagellum more or less likely than your birth, given the number of possible ancestors and spermatozoa?
Allan Miller on December 12, 2018 at 12:58 pm said:

The 30 (or so) proteins in the flagellum are related to each other, one should note. Another example where you can’t just ignore inheritance, and assume random picks from sequence space.
phoodoo on December 12, 2018 at 1:59 pm said:

Allan Miller: The 30 (or so) proteins in the flagellum are related to each other, one should note.

What does it mean to say proteins are related to each other?
colewd on December 12, 2018 at 2:19 pm said:

DNA_Jock,

No. I was about to demonstrate that the probability of A and B was greater than P(A) x P(B). You really don’t seem to understand probability at all. If anyone else is interested, I will happily explain.

I stand corrected, thanks. Go ahead and work this up assuming that 50% of AA’s will work in every spot of a 80AA protein in order to get a secondary fold. So given 20^80 total sequence space you have 2^80 sequence space capable of a secondary fold.

He ‘measured’ no such thing. We have made fun of this claim before, calculating the mass of 10^77 anythings.

This is the same comment Art Hunt made at peaceful science yet in his critique of Axe’s paper he again said:

10^-10 -> 10^-63 (or thereabout): this is the range of estimates of the density of functional sequences in sequence space that can be found in the scientific literature. The caveats given in Section 2 notwithstanding, Axe’s work does not extend or narrow the range.

In addition you have Hayashi saying that to find the wild type in his experiment he would need a library of 10^70 for a 100AA segment which got him to question RMNS as a viable mechanism for what he observed.

By extrapolation, we estimated that adaptive walking requires a library size of 10^70 with 35 substitutions to reach comparable fitness. Such a huge search is impractical and implies that evolution of the wild-type phage must have involved not only random substitutions but also other mechanisms, such as homologous recombination.
colewd on December 12, 2018 at 2:22 pm said:

OMagain,

And yet there seem to be several ways to put a flagellum together. Does that uncompound the problem?

It increases the case against RMNS being the mechanism that built it.
OMagain on December 12, 2018 at 2:34 pm said:

colewd: It increases the case against RMNS being the mechanism that built it.

Why?
OMagain on December 12, 2018 at 2:36 pm said:

phoodoo: What does it mean to say proteins are related to each other?

Really? Is doing your own research not a thing any more?
Corneel on December 12, 2018 at 2:37 pm said:

colewd: which got him to question RMNS as a viable mechanism for what he observed.

Where does it say that in the quote? Are you labouring under the misapprehension that recombination cannot result in random genetic mutations perchance?
OMagain on December 12, 2018 at 2:37 pm said:

colewd: It increases the case against RMNS being the mechanism that built it.

What other mechanisms are potential candidates for the origin of the flagellum then?
phoodoo on December 12, 2018 at 2:44 pm said:

OMagain: Really? Is doing your own research not a thing any more?

I am asking Allan what he means by this.

Are some proteins unrelated?
Rumraket on December 12, 2018 at 2:51 pm said:

phoodoo: What does it mean to say proteins are related to each other?

That they share a common ancestor. If today we have two proteins B and C, and they are related, then at some period in the past only one protein A existed, which was then duplicated so two copies of A existed. These then each mutated into the proteins we recognize as proteins B and C today.

Many of the proteins that make up the flagellum are known to exhibit these kinds of relationships as they are clearly derived from common ancestors. If I remember correctly the proteins exported from the Type-III secretory system–like component of the flagellum are among these proteins, such as the filament proteins (the ones that make up the flexible “tail”), and the axle and rod components too.
OMagain on December 12, 2018 at 2:55 pm said:

phoodoo: Are some proteins unrelated?

What’s your opinion?
colewd on December 12, 2018 at 3:36 pm said:

OMagain,

What other mechanisms are potential candidates for the origin of the flagellum.

Shapiro’s NGE
HGT
Rum’s gene duplication
genetic recombination
This is a partial list, but in no case can we build a model from these that will form the sequences inside an organism that can build this motor every time a cell divides.
Rumraket on December 12, 2018 at 3:42 pm said:

colewd: This is a partial list, but in no case can we build a model from these that will form the sequences inside an organism that can build this motor every time a cell divides.

Why not? It seems to me Nick Matzke did that very thing in 2003: Evolution in (Brownian) space: a model for the origin of the bacterial flagellum.

You can say you don’t find this model compelling, and you can point out issues you have with it, but to say that such a model does not even exist is just false.
colewd on December 12, 2018 at 3:42 pm said:

Corneel,

Where does it say that in the quote? Are you labouring under the misapprehension that recombination cannot result in random genetic mutations perchance?

I agree with Hayashi that with the wild type requiring a library of 10^70 that you are looking at a partially deterministic mechanism.
colewd on December 12, 2018 at 3:46 pm said:

Rumraket,

You can say you don’t find this model compelling, and you can point out issues you have with it, but to say that such a model does not even exist is just false.

I am talking about a model that generates DNA sequences that build the motor. That model does not exist.
Rumraket on December 12, 2018 at 3:59 pm said:

colewd: I am talking about a model that generates DNA sequences that build the motor. That model does not exist.

That’s what mutations are Bill. This is implicitly included in Matzke’s model.
dazz on December 12, 2018 at 4:16 pm said:

colewd: I am talking about a model that generates DNA sequences that build the motor. That model does not exist.

Bummer. I’m sure you’ll come up with a model based on the design paradigm, after you’ve found the cure for cancer. Or maybe you can work on both, should be a piece of cake for a genius like you.
colewd on December 12, 2018 at 4:20 pm said:

Rumraket,

That’s what mutations are Bill. This is implicitly included in Matzke’s model.

There is no model that shows the generation of the FI that builds the motor. Dawkins tried to build one with Weasel but for selection to be effective the target sequence had to be incorporated in the model. There may be substance to Eric MH’s claims.
Corneel on December 12, 2018 at 4:22 pm said:

colewd: I agree with Hayashi that with the wild type requiring a library of 10^70 that you are looking at a partially deterministic mechanism.

And two potatoes for miss Whimples. Hooray!

But to return to the subject: I doubt that Hayashi and colleagues question that natural selection on random genetic mutations was the mechanism that resulted in the wildtype sequence. In the paragraph you quoted they clearly say that evolution of the wildtype involved other known mechanisms than the substitutions they have tested in their experimental set up.
Rumraket on December 12, 2018 at 4:22 pm said:

colewd: I agree with Hayashi that with the wild type requiring a library of 10^70

With 35 substitutions, from the particular protein they started with.

Starting from a completely different location it is entirely plausible it could climb to the top of a hill with an optimum comparable to the wild type. Christ, Bill I think I tried to explain this to you like 20 times.

Where you start has implications for where you can go from there. You can’t walk to Russia from Australia, but you can from Germany, or Turkey.

So to get from Australia to Russia, you need to be able to take “giant steps” (the 35 substitutions) to cross vast oceans, and if the surface of the planet is ginormous oceans with comparatively much smaller continents, and you are walking randomly and blindly, you also need to take a lot (the 10^70 library) of such “giant steps” to hit the correct continent.
Rumraket on December 12, 2018 at 4:27 pm said:

colewd: There is no model that shows the generation of the FI that builds the motor. Dawkins tried to build one with Weasel but for selection to be effective the target sequence had to be incorporated in the model. There may be substance to Eric MH’s claims.

I don’t give a crap about this FI nonsense you’re now bringing up after the fact.

The question is whether there is a model that shows how the genes that code for the proteins that make up the flagellum, and its construction and regulation, evolved.

However much FI you can then subsequently calculate there “is” in the flagellum genes is completely irrelevant. It’s just useless technobabble masturbation of no value or consequence. And you obviously only now thought about bringing FI into the question when your previous claim was shown false. And what does that even accomplish? Why should anyone care to show how much FI the flagellum’s evolution would generate? Whether you think it’s a lot or not is completely meaningless. It tells us nothing about whether the flagellum could or did evolve.
Neil Rickert on December 12, 2018 at 4:46 pm said:

colewd: I am talking about a model that generates DNA sequences that build the motor. That model does not exist.

That’s the kind of model we should expect from Intelligent Design theory (if the theory worked). It isn’t the kind of model we would expect from Evolutionary theory.

Here’s what’s amusing here. When the ID proponents are not criticizing evolution, they are criticizing materialism and mechanism. But here we see colewd demanding a mechanistic materialistic account.
Rumraket on December 12, 2018 at 4:49 pm said:

colewd: Dawkins tried to build one with Weasel but for selection to be effective the target sequence had to be incorporated in the model.

First of all Dawkins’ Weasel example has nothing to do with FI or the flagellum. Nevertheless I certainly agree that because it has a pre-defined target it is a bad analogy to evolution.

But it’s trivial to make a simulation that doesn’t have pre-specified targets, where sequences evolve due to their phenotypic effects in an environment.

In such simulations “anything that works” evolves, rather than specific targets. And once such a “thing that works” has evolved, we can sit back after the fact and declare that out of all the possible evolutionary trajectories that COULD have happened, the simulation ended up with some particular specific result, and because that result was incredibly unlikely out of the total space, it must have been the target, directed by invisible magical pixies pushing electrons around inside the computer wires; otherwise how could it have evolved?

Why isn’t this textbook sharpshooter fallacy obvious to you?

Go to BoxCar2D and watch cars evolve. Each car has a genome, which is actually a sequence of symbols that can be several hundred letters long, and the symbols determine the properties of the car.

Here’s an example of a BoxCar2D car’s genome:

eNqzv1/BUZ0oKODAfpOnflfHfPvnYnPuqd69Z/9j8/JlK2fOsr+ zYuqbtrQ0+wecAgYNDAz279v3Nd5NDXdg5hU3sQDyryeyvXUA0l/mOno2G xvbvz/Fs9VP0s3+g/9+sT1nztqvmvtb57vkEfvP26eog9SvPudfdVxgjQMz+0 WTBwwM/4HA/plA1YcuMRv7b2sbEi8f5GVgYGC1f51Remgd81n774wBBcvS 0oBi7A4C76yMnVfcs/9me+dmiIsLSK+DRAzPw0L22faPr53fNHPmTKA6ZvtX 2+e/zl70wP7tduYSoF6wOtYXm0QniiXav8+Zu3rnzJlgMUnF3yEuuhL2bxP7w o2NTYB6GezPnzvRd+eXnv2rrodth5SUGAo3FDDMsD0NxjeEXzGscytmuMUS zfDKX59hmcchBh3+XwzcvfEMJgKVDNYbAxmOZb5hYNm+iUGuyIfh4eKHIH
cDAExKnSs=

It allows all the letters of the English alphabet and the numbers, and it is 476 symbols in length. The total space for car genomes of a similar length and alphabet is ~10^745.
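For anyone who wants to see where a figure of that order comes from, here is a rough sketch of the arithmetic. The length of 476 symbols is taken from the comment; the alphabet sizes are assumptions made for illustration (36 if letters are case-insensitive plus digits, 62 if upper and lower case are counted separately), and the function name is invented for this example. The exact exponent depends on which alphabet size is assumed.

```python
import math


def log10_sequence_space(alphabet_size: int, length: int) -> float:
    """Return log10 of alphabet_size ** length without building the huge integer."""
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    length = 476  # symbols, as stated in the comment
    # Assumed alphabet sizes, for illustration only.
    for alphabet_size in (36, 62):
        exponent = log10_sequence_space(alphabet_size, length)
        print(f"alphabet {alphabet_size}: about 10^{exponent:.0f} sequences")
```

Either way the space is hundreds of orders of magnitude larger than anything a simulation could sample exhaustively, which is the point being made.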

Allow the simulation to run for 30 generations and now copy out the genome sequence of the car you get. Given that the sequence is hundreds of symbols long, the total space of possible car-genomes is unimaginably vast, and the vast vast majority of them are nonfunctional blocks of triangles with ineffectually placed wheels, comparable to sequence space for very large proteins. So how did this particular sequence manage to evolve, when it is such a tiny target in such a vast space?

Well it wasn’t a target at all, and yet that particular sequence still evolved.
DNA_Jock on December 12, 2018 at 5:45 pm said:

colewd: DNA_Jock,

No. I was about to demonstrate that the probability of A and B was greater than P(A) x P(B). You really don’t seem to understand probability at all. If anyone else is interested, I will happily explain.

I stand corrected, thanks. Go ahead and work this up assuming that 50% of AA’s will work in every spot of a 80AA protein in order to get a secondary fold. So given 20^80 total sequence space you have 2^80 sequence space capable of a secondary fold.

You mean to write that 1 in 2^80 can fold, or 2^-80 of the sequence space. The sequence space is not THAT sparsely populated with sequences capable of a secondary fold. I will explain how I know this for certain at the end of this comment.
Instead, I will “Go ahead and work this up assuming that” 1 in a million sequences can fold. That’s a popular number with IDists.
Here’s the calculation.
P(folds) = 10^-6
P(binds ATP) = 10^-11 (observed by Keefe & Szostak)
P(binds ATP | folds) = 10^-5.
Here’s the same calculation, expressed using the numbers of different 80mers instead
#folds = 10^98 (total space = 10^104)
#binds ATP = 10^93 (10^-11 of the total space)
fraction of folders that bind ATP : 10^-5
fraction of folders that bind [your comparable ligand here] = 10^-5

Because (per the IDists) only 1 in a million 80mers can fold, this means that the prevalence of peptides that can bind ATP and bind ubiquitin is not 10^-11 x 10^-11, but rather 10^-11 x 10^-5. That’s a million times more likely than your “let’s pretend they are independent and multiply probabilities” fallacious approach.
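The calculation above can be written out as a short sketch. It works in log10 exponents and relies on the assumptions stated in the thread: anything that binds a ligand also folds (so binders are a subset of folders), the second ligand is bound at the same 10^-11 frequency as ATP, and the two binding functions are treated as independent once folding is given. The variable and function names are made up for this illustration.

```python
# log10 probabilities taken from the comment above
LOG_P_FOLDS = -6         # 1 in a million 80-mers fold (the figure granted for argument)
LOG_P_BINDS_ATP = -11    # Keefe & Szostak
LOG_P_BINDS_OTHER = -11  # assumption: a comparable ligand at the same frequency


def log_p_given_folds(log_p_function: int, log_p_folds: int) -> int:
    """log10 P(function | folds), assuming every sequence with the function folds.

    P(function | folds) = P(function and folds) / P(folds) = P(function) / P(folds)
    """
    return log_p_function - log_p_folds


def log_p_both_naive() -> int:
    """Multiply the two binding probabilities as if they were independent."""
    return LOG_P_BINDS_ATP + LOG_P_BINDS_OTHER


def log_p_both_conditional() -> int:
    """P(binds ATP) x P(binds other | folds): an ATP binder is already a folder."""
    return LOG_P_BINDS_ATP + log_p_given_folds(LOG_P_BINDS_OTHER, LOG_P_FOLDS)


if __name__ == "__main__":
    print("P(binds ATP | folds)   = 10^%d" % log_p_given_folds(LOG_P_BINDS_ATP, LOG_P_FOLDS))
    print("naive product          = 10^%d" % log_p_both_naive())
    print("conditional product    = 10^%d" % log_p_both_conditional())
    print("ratio                  = 10^%d" % (log_p_both_conditional() - log_p_both_naive()))
```

With these inputs the naive product is 10^-22 and the conditional product is 10^-16, a factor of a million.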

He ‘measured’ no such thing. We have made fun of this claim before, calculating the mass of 10^77 anythings.

This is the same comment Art Hunt made at peaceful science yet in his critique of Axe’s paper he again said:

10^-10 -> 10^-63 (or thereabout): this is the range of estimates of the density of functional sequences in sequence space that can be found in the scientific literature. The caveats given in Section 2 notwithstanding, Axe’s work does not extend or narrow the range.

I suspect that Art Hunt is being “charitable” (or extremely conservative), or he is not saying what you think he is saying. Without a proper citation, I cannot tell.

In addition you have Hayashi saying that to find the wild type in his experiment he would need a library of 10^70 for a 100AA segment which got him to question RMNS as a viable mechanism for what he observed.

By extrapolation, we estimated that adaptive walking requires a library size of 10^70 with 35 substitutions to reach comparable fitness. Such a huge search is impractical and implies that evolution of the wild-type phage must have involved not only random substitutions but also other mechanisms, such as homologous recombination.

You misunderstand both the calculation and the conclusion. The calculation is an estimate of how many phage they would have to start with in order to “find” the well of attraction of the wild-type phage, without recombination. See Rumraket’s ‘continents’ explanation. Remember, this is the same paper where the authors reckon that MOST random sequences lead to paths to higher fitness.

As promised, the reason why colewd’s “50% of AA’s will work in every spot of a 80AA protein in order to get a secondary fold” is a dramatic underestimate:
If only 1 in 2^80 can fold, that’s 1 in 10^24 that can fold, per colewd. Never mind “folding”, there are (at 1 in 10^11) about ten million million times more 80mers than that that can bind ATP!
IDists have a habit of not checking their calculations for sanity.

ETA: for clarity : colewd reckons only 10^80 80mers can fold, yet we know that 10^93 of them can bind ATP…
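The cross-check is easy to reproduce. The sketch below uses only numbers already given in the thread (20 amino acids, length 80, a 1 in 2^80 folding fraction, and ATP binders at 1 in 10^11) and works in log10 to avoid enormous integers; the helper name is invented for this example.

```python
import math


def log10_power(base: float, exponent: float) -> float:
    """Return log10(base ** exponent)."""
    return exponent * math.log10(base)


# Total number of 80-mers over a 20-letter alphabet
log_total = log10_power(20, 80)

# Folders, if only 1 in 2^80 sequences can fold
log_folders = log_total + log10_power(2, -80)

# ATP binders, at the observed frequency of about 1 in 10^11
log_atp_binders = log_total - 11

if __name__ == "__main__":
    print(f"total 80-mers : about 10^{log_total:.0f}")
    print(f"folders       : about 10^{log_folders:.0f}")
    print(f"ATP binders   : about 10^{log_atp_binders:.0f}")
    print(f"binders exceed folders by about 10^{log_atp_binders - log_folders:.0f}")
```

The folding estimate gives about 10^80 folders against about 10^93 ATP binders, so the estimate implies far more binders than sequences able to fold.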
Allan Miller on December 12, 2018 at 6:40 pm said:

phoodoo: What does it mean to say proteins are related to each other?

It means they share a significant degree of sequence commonality. While this may not prove that they commonly descended from fewer ancestral proteins, the possibility can’t simply be dismissed out of hand. If one says that they are 30 random picks all of which happened (for no reason) to mine the same region of protein space, one is ignoring a possible cause for no better reason than that it makes the numbers big. It is, at best, a strawman view of the evolutionary scenario.
Rumraket on December 12, 2018 at 7:14 pm said:

All proteins for which significant sequence similarity can be shown can also be neatly arranged into nesting hierarchies. That means the sequence data shows significant levels of nesting hierarchical structure, a fact for which the only sensible explanation is that they share a common genealogical relationship. As in, they evolved, rather than were “designed” to be similar.

It is not merely the fact that different protein sequences are similar in their sequences that attest that they are evolved entities and share common descent.

And then there’s the fact that the differences between similar proteins are more frequently of the type that only take single mutations due to the structure of the genetic code.

And then there’s the fact that the mutations that are responsible for coding differences in protein coding genes, are also more frequently the types of mutations that are more frequent due to transition bias.

All this testifies to the fact that the differences between different similar proteins got there due to mutation.

No reason to understate the case for the common descent of proteins. 🙂
colewd on December 13, 2018 at 4:41 am said:

DNA_Jock,

ETA: for clarity : colewd reckons only 10^80 80mers can fold, yet we know that 10^93 of them can bind ATP…

Interesting problem: Either more than 50% of AA’s can work per site for a secondary fold or ATP is sticking to non-properly folded proteins in Szostak’s experiment.
colewd on December 13, 2018 at 4:44 am said:

Rumraket,

So how did this particular sequence manage to evolve, when it is such a tiny target in such a vast space?

What did it evolve to? What is the function of the sequence?
Allan Miller on December 13, 2018 at 6:23 am said:

Rumraket,

No need to understate the case […]

Yes, excellent points!
DNA_Jock on December 13, 2018 at 12:01 pm said:

colewd: Interesting problem: Either more than 50% of AA’s can work per site for a secondary fold or ATP is sticking to non-properly folded proteins in Szostak’s experiment.

Very good.
I phrased my question thus

of the 20^80 possible 80-mers, what proportion do YOU think will consistently produce a reasonably stable secondary structure? You can give a range if you like.
[emphasis in original]

in order to include all ATP binders that would show up in Keefe’s experiment.
Now, my expectation was that you would go for typical IDist number in the range 10^-6 to 10^-8, given that ATP binders are seen at 10^-11, but no, you plumped for the deranged 10^-24. (Your “at every position, 50% of amino acids lead to no fold, irrespective of the rest of the chain”. It’s frikking nuts!)
Your intuitions about proteins (and about multiplication) are leading you astray. The interesting thing (from my perspective) is that you are unwilling or unable to cross-check your estimates for plausibility.
So to answer your “Interesting problem” either/or, your 50% number is WAAAAY off. Proteins, including the ones in you, are a lot floppier than you imagine.
BruceS on December 13, 2018 at 12:09 pm said:

DNA_Jock: The interesting thing (from my perspective) is that you are unwilling or unable to cross-check your estimates for plausibility.

Thanks for this exchange, which I enjoy and learn from.

I think your quoted sentence summarizes the basic attitude of many IDers: math+intuition are prioritized over biology+empirical investigation.
Rumraket on December 13, 2018 at 2:09 pm said:

colewd: What did it evolve to? What is the function of the sequence?

It specifies all the properties of the car.
dazz on December 13, 2018 at 2:38 pm said:

DNA_Jock:
P(folds) = 10^-6
P(binds ATP) = 10^-11 (observed by Keefe & Szostak)
P(binds ATP | folds) = 10^-5.

Because (per the IDists) only 1 in a million 80mers can fold, this means that the prevalence of peptides that can bind ATP and bind ubiquitin is not 10^-11 x 10^-11, but rather 10^-11 x 10^-5. That’s a million times more likely than your “let’s pretend they are independent and multiply probabilities” fallacious approach.

Just to make sure I understand this, we’re assuming that P(binds ubiquitin) is also 10^-11, hence P(binds ubiquitin | folds) = 10^-5 too, *and also that “binds ATP” and “binds ubiquitin” are independent events.

So P(binds ATP AND binds ubiquitin) = P(binds ATP) x P(binds ubiquitin | folds) because if it binds ATP, we know that it folds so it’s = 10^-11 x 10^-5 instead of P(binds ATP) x P(binds ubiquitin) = 10^-11 x 10^-11

Is that correct?

*ETA: Obviously “binds ATP” and “binds ubiquitin” are NOT independent events, that’s the entire point of this proof, what I meant to say is… something else that I’m trying to figure out

ETA2: Maybe “binds ATP | folds” and “binds ubiquitin | folds” are independent events?
DNA_Jock on December 13, 2018 at 3:31 pm said:

dazz: ETA2: Maybe “binds ATP | folds” and “binds ubiquitin | folds” are independent events?

Yes, your read on my argument is correct, and yes, that’s the approximation here. The point being, the effect of “given it folds” on P(random seq has the function of interest).
dazz on December 13, 2018 at 4:27 pm said:

DNA_Jock,

Thanks, DNA_Jock.
colewd on December 13, 2018 at 5:03 pm said:

Rumraket,

It specifies all the properties of the car.

How does the code translate to the property of the car? Can you give an example?
colewd on December 13, 2018 at 5:16 pm said:

DNA_Jock,

So to answer your “Interesting problem” either/or, your 50% number is WAAAAY off. Proteins, including the ones in you, are a lot floppier than you imagine.

Again you are assuming that Szostak’s proteins are all getting proper secondary folds, yet I show you that 50% substitutability, which is a very low bar, blows away your assumptions. You simply move back to assertion. Why? Are you asserting that ATP binding in Szostak’s experiment requires stable secondary folds? Have you studied the sequence requirements of alpha helices and beta sheets? How do you think hydrophilic amino acids do with these structures?

It appears some rug sweeping of contradictory evidence is happening 🙂
colewd on December 13, 2018 at 5:19 pm said:

BruceS,

I think your quoted sentence summarizes the basic attitude of many IDers: math+intuition are prioritized over biology+empirical investigation.

Or maybe with eyes wide open you may be observing contradictory evidence being ignored. Yes, that includes empirical investigation. Have you not observed it is Jock doing the math and making assumptions?
Rumraket on December 13, 2018 at 5:22 pm said:

colewd: How does the code translate to the property of the car? Can you give an example?

This section describes how the car chromosome works in determining the properties of the car:
http://boxcar2d.com/about.html

As you can see, all the symbols in the chromosome correspond to attributes of the elements that make up the car. The size, angles, and position of the triangles, where wheels are located, how big the wheels are, how fast they spin, and so on. There is an incredible, absolutely mindboggling number of possible ways these attributes can be varied. The sequence space for a car chromosome is like the sequence space for very large proteins.

But there are no targets, and yet very unlikely sequences describing high performance cars reliably evolve.
colewd on December 13, 2018 at 6:00 pm said:

DNA_Jock,

Secondary structure prediction: This math looks like something in Eric’s wheelhouse 🙂

Chou-Fasman method. One of the first approaches for predicting protein secondary structure, due to Chou and Fasman [CF74], uses a combination of statistical and heuristic rules. First, using a set of solved protein structures, “propensities” are calculated for each amino acid ai in each structural conformation sj, by taking the frequency of ai in each structural conformation, and then normalizing by the frequency of this amino acid in all structural conformations. That is, if a residue is drawn at random from the space of protein sequences, and its amino acid identity A and structural class S are considered, propensities are computed as Pr(A = ai|S = sj)/Pr(A = ai). [2] These propensities capture the most basic concept in predicting protein secondary structure: different amino acids occur preferentially in different secondary structure elements.
Once the propensities are calculated, they are used to categorize each amino acid as either a helix-former, a helix-breaker, or helix-indifferent. Each amino acid is also categorized as either a sheet-former, a sheet-breaker, or sheet-indifferent. For example, as expected, glycine and proline have low helical propensities and are thus categorized as helix-breakers. Then, when a sequence is input, “nucleation sites” are identified as short subsequences with a high concentration of helix-formers (or sheet-formers). These sites are found with heuristic rules (e.g., “a sequence of six amino acids with at least four helix-formers, and no helix-breakers”), and then extended by adding residues at each end, while maintaining an average propensity greater than some threshold. Finally, overlaps between conflicting predictions are resolved using heuristic rules.
[2] Sometimes propensities are defined by considering the frequency of a particular structural conformation given an amino acid, and normalizing by the frequency of that structural conformation. These two formulations are equivalent since Pr(A = ai|S = sj)/Pr(A = ai) = Pr(S = sj|A = ai)/Pr(S = sj).
GOR method. The GOR method [GOR78] formalizes the secondary structure prediction problem within an information-theoretic framework. If x and y are any two events, the definition of the information that y carries on the occurrence of event x is [Fan61]:
I(x;y) = log( Pr(x|y) / Pr(x) )
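
To make the propensity definition in the quoted passage concrete, here is a minimal Python sketch. The residue/structure observations are invented toy data (H = helix, E = sheet, C = coil), not taken from any solved structure set, and the function names are hypothetical. It computes the propensity both ways described in the footnote and shows that the two formulations agree.

```python
# Toy data, purely hypothetical: (amino acid, structural class) pairs.
OBSERVATIONS = [
    ("A", "H"), ("A", "H"), ("A", "C"), ("A", "E"),
    ("G", "C"), ("G", "C"), ("G", "H"),
    ("V", "E"), ("V", "E"), ("V", "H"),
]


def counts(obs, aa, ss):
    """Return (total, count of aa, count of ss, count of aa in ss)."""
    n = len(obs)
    n_aa = sum(1 for a, _ in obs if a == aa)
    n_ss = sum(1 for _, s in obs if s == ss)
    n_both = sum(1 for a, s in obs if a == aa and s == ss)
    return n, n_aa, n_ss, n_both


def propensity_by_residue(obs, aa, ss):
    """Pr(A = aa | S = ss) / Pr(A = aa)."""
    n, n_aa, n_ss, n_both = counts(obs, aa, ss)
    return (n_both / n_ss) / (n_aa / n)


def propensity_by_structure(obs, aa, ss):
    """Pr(S = ss | A = aa) / Pr(S = ss), the footnote's alternative form."""
    n, n_aa, n_ss, n_both = counts(obs, aa, ss)
    return (n_both / n_aa) / (n_ss / n)


for aa in "AGV":
    for ss in "HEC":
        p1 = propensity_by_residue(OBSERVATIONS, aa, ss)
        p2 = propensity_by_structure(OBSERVATIONS, aa, ss)
        print(f"{aa} in {ss}: {p1:.3f} vs {p2:.3f}")
```

A value above 1 means the amino acid is over-represented in that structural class relative to its overall frequency, which is what would make it a "former" in the Chou-Fasman scheme.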
Mung on December 13, 2018 at 6:39 pm said:

Rumraket: But there are no targets, and yet very unlikely sequences describing high performance cars reliably evolve.

It’s like a box of legos. Just shake vigorously.
Mung on December 13, 2018 at 6:42 pm said:

Rumraket: It specifies all the properties of the car.

And there are sequences that don’t specify anything? And the properties of the car are known in advance? And you were not surprised in the least when it produced an airplane?
Mung on December 13, 2018 at 6:47 pm said:

phoodoo: Are some proteins unrelated.

Some people think proteins evolved from other proteins. Perhaps there is a LUCA for all extant proteins.

Like how eyes evolved from other eyes. And the LUCA of all extant eyes is something like a light-sensitive spot.
Rumraket on December 13, 2018 at 7:01 pm said:

Mung: It’s like a box of legos. Just shake vigorously.

It’s not like a box of legos, but neither is actual biology.
DNA_Jock on December 13, 2018 at 7:02 pm said:

colewd: Again you are assuming that Szostak’s proteins are all getting proper secondary folds yet I show you that 50% substitutability which is a very low bar blows away your assumptions. You simply move back to assertion. Why? Are you asserting that ATP binding in Szostak’s experiment requires stable secondary folds?

I am assuming that Keefe’s ATP binders are able to form a reasonably stable secondary structure. Specifically, we know that 5 – 15% of the synthesized polypeptide was able to bind ATP.

Have you studied the sequence requirements of Alpha helixes and Beta sheets?

Yes, yes I have.

How do you think hydrophilic amino acids do with these structures?

Just fine actually. Amphipathic helices were a research interest of mine in 1988.

It appears some rug sweeping of contradictory evidence is happening

I’ll say 😀

I think it is only fair to warn you, Bill, that if you are going to claim that specific ATP binding does not require any stable secondary structure, then you have utterly destroyed every IDist’s ‘islands of function’ argument ever made, and your 50% criterion is utterly irrelevant (in addition to being wrong).
You do not seem to have thought this through at all. You have evidence and theory backwards.
You, Bill, invented a rule: “In order to fold, an 80mer is restricted to 10 out of 20 possible amino acids at every site.”
Thus, per Bill, only 10^80 80mers can fold. My best explanation for this loopy claim is that you don’t understand how multiplication works.
Which is a pretty weird claim, since you are claiming that for every single 80mer that can fold, there are 10 million million more 80mers that can specifically bind ATP. Hey, you are also claiming that for every single 80mer that can fold, there are 100 more 80mers that can specifically bind BOTH ubiquitin and ATP. WTF?
You have 100 times more “double needles” in your haystack than you have proteins that can fold.
I point out just how deranged this is, and your response is to defend your rule as “a very low bar”.
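
The counting behind that comparison can be checked with a short Python sketch. Two numbers here are assumptions, not stated in this comment: a frequency of roughly 1 in 10^11 random sequences binding ATP (the figure usually attributed to the Keefe and Szostak experiment), and the same frequency for ubiquitin binding, treated as independent of ATP binding as in the approximation discussed upthread.

```python
from fractions import Fraction

SEQ_LENGTH = 80

# All possible 80mers over 20 amino acids.
total_sequences = 20 ** SEQ_LENGTH

# The "10 out of 20 at every site" rule allows only this many folders.
foldable_per_rule = 10 ** SEQ_LENGTH

# ASSUMPTION: about 1 in 10^11 random sequences binds ATP.
P_BINDS_ATP = Fraction(1, 10**11)
# ASSUMPTION: same frequency for ubiquitin, independent of ATP binding.
P_BINDS_UBIQUITIN = Fraction(1, 10**11)

atp_binders = total_sequences * P_BINDS_ATP
double_binders = total_sequences * P_BINDS_ATP * P_BINDS_UBIQUITIN

# How many binders exist per sequence the rule allows to fold.
ratio_atp = float(atp_binders / foldable_per_rule)
ratio_double = float(double_binders / foldable_per_rule)

print(f"ATP binders per 'foldable' 80mer:    {ratio_atp:.2e}")
print(f"double binders per 'foldable' 80mer: {ratio_double:.0f}")
```

Under these assumptions the ratios come out at roughly 10^13 and roughly 100, which is where the "10 million million" and "100 more" figures come from: the rule leaves fewer folders than binders.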
Let’s do a demo with the Chou Fasman rules:
We will cut Bill a break and be really conservative by calculating the proportion of random 6mers that will form alpha-helix nucleation sites. [Note, once you have a nucleation site, the criteria for helix extension are LESS stringent].

Chou-Fasman rule is we need 4 or more helix formers, and no helix breakers.
Bill’s rule is 1 in 2^6, i.e. only 1.5% of six-mers will be able to form a stable fold. Any stable fold.
Since 11/20 (EHWKAMVILQF) are helix formers, and only 4 (NGY and, of course, P) are helix breakers, we calculate that 31% of six-mers will be alpha helix nucleation sites. Gee, only a 20-fold difference, but raise it to the 13th power…
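
For anyone who wants to redo this arithmetic, here is a minimal Python sketch of the nucleation-site count. It assumes each residue is drawn independently and uniformly from the 20 amino acids, uses the class sizes given above (11 formers, 4 breakers, so 5 indifferent), and applies the rule as stated: at least 4 formers in a window of 6, no breakers. The function name and parameters are hypothetical.

```python
from math import comb


def nucleation_fraction(n_formers=11, n_breakers=4, n_total=20,
                        window=6, min_formers=4):
    """Fraction of random windows with >= min_formers helix formers
    and zero helix breakers, assuming independent uniform residues."""
    p_former = n_formers / n_total
    p_indifferent = (n_total - n_formers - n_breakers) / n_total
    # Every non-former position must be indifferent (no breakers allowed).
    return sum(
        comb(window, k) * p_former**k * p_indifferent**(window - k)
        for k in range(min_formers, window + 1)
    )


fraction = nucleation_fraction()
bills_rule = (10 / 20) ** 6  # "10 out of 20 at every site" over six sites

print(f"nucleation-site fraction: {fraction:.3f}")
print(f"Bill's rule:              {bills_rule:.4f}")
print(f"ratio:                    {fraction / bills_rule:.1f}")
```

Under exactly these assumptions the sketch gives about 19% and a ratio of about 12 rather than the 31% and 20-fold quoted above, so the quoted figure presumably rests on a somewhat different count. Either way the nucleation-site fraction is an order of magnitude above 1 in 64, and that gap compounds when raised to the 13th power.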
Rumraket on December 13, 2018 at 7:03 pm said:

Mung: And there are sequences that don’t specify anything?

No, the car chromosomes don’t have junk regions. They all specify properties of the car. In that way the car chromosomes are more like the coding regions of protein coding genes, where all the nucleotides constitute codons that specify amino acids.

And the properties of the car are known in advance?

I don’t know the properties of cars in advance.

And you were not surprised in the least when it produced an airplane?

I have no idea what you’re even trying to suggest here.