How to calculate amino acid sequence space

Posted on November 28, 2015 by Alan Fox

I see that Mung, a long-time commenter at Uncommon Descent, asks in a thread entitled “Backwards eye wiring? Lee Spetner comments”:

How do you calculate the size of amino acid sequence space?

As this seems somewhat off-topic there, I thought I’d attempt to answer Mung’s question. I’ll try to be brief. The two most fascinating biochemicals are nucleic acids (RNA and DNA) and proteins. Proteins seem ubiquitous in cellular systems; they function as catalysts (enzymes), structural elements (keratin, collagen), signal molecules (hormones, pheromones) and binding agents (antibodies). Proteins are linear sequences of amino acids joined by condensation reactions (so called because a molecule of water is lost), each forming a peptide bond. There are twenty-one amino acids found in eukaryotes, and twenty of them are directly represented in the genetic code. The special case is selenocysteine, which is coded indirectly, and I’ll leave it out of the calculation for the sake of simplicity.

So how many different amino acid sequences could theoretically exist, given twenty possibilities for each amino acid (aa) in the polymer? I guess we shouldn’t count the twenty monomers. For dimers, there are 400 possibilities. For trimers, we have 8,000, and so on. The general formula for the number of theoretically possible different protein sequences of length $n$ is $20^n$. So the answer for all possible sequences is the sum of this quantity from $n=2$ to, well, what? There are some very large proteins, titin being the largest known at around 30,000 aa’s. So I guess we should sum at least to that number.

This is a very big number indeed! I leave it as an exercise for the reader to try representing the number that results when taking the upper limit of $n$ as 30,000. 🙂
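For readers who want to try the exercise, here is a minimal Python sketch. It assumes the twenty-letter alphabet and the limits used above ($n=2$ to $n=30{,}000$); the function name `sequence_space_total` is only an illustrative choice. It uses the closed form of the geometric series, $\sum_{n=a}^{b} 20^n = (20^{b+1} - 20^a)/19$, and reports only the order of magnitude rather than the full number.

```python
import math

ALPHABET = 20  # standard amino acids, selenocysteine left out as in the post


def sequence_space_total(min_len: int, max_len: int, alphabet: int = ALPHABET) -> int:
    """Exact sum of alphabet**n for n from min_len to max_len (geometric series)."""
    return (alphabet ** (max_len + 1) - alphabet ** min_len) // (alphabet - 1)


if __name__ == "__main__":
    # Sanity check against the post: dimers plus trimers = 400 + 8,000.
    print(sequence_space_total(2, 3))  # 8400

    total = sequence_space_total(2, 30_000)
    # math.log10 accepts arbitrarily large Python ints, so this avoids
    # having to print the whole number.
    print(f"total is roughly 10^{math.log10(total):.1f}")
```

The result is a number a little over 39,000 digits long, which is why the sketch prints only its base-10 logarithm.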

Now I’ve answered Mung’s question, would he like to enlarge on what it signifies?

ETA categories and remove tautology

ETA 2 correction $20^n$ not $n^{20}$ (hat tip Joe Felsenstein)

182 thoughts on “How to calculate amino acid sequence space”

stcordova on December 3, 2015 at 4:52 pm said:

Allan,

I read the Bernhardt piece, I even said he had responses to the criticisms! But I pointed out those responses had to appeal to more specialized therefore more exceptional conditions.

I also have access (behind the paywall) to Kurland’s piece. Here is an interesting side note about the problem of proteolysis:

Systematic destruction of aberrant proteins by proteasomes and their homologs rids cellular proteomes of potential seeds for aggregation and precipitation, which otherwise would be lethal to cells. Indeed, the maintenance of the high protein densities characteristic of all cellular proteomes has had a profound influence on the evolution of cells 38, 26. Human degenerative diseases that arise from mutations affecting protein folding or from defective proteasome function underscore the impact of proteolytic surveillance systems 39–42. The metabolic consequence of such proteolytic surveillance is that protein degradation is a significant catabolic flow in healthy cells. As much as thirty percent of newly synthesized proteins are destroyed by proteolysis 43, while a steady-state background of protein turnover of about two percent per hour is observed in growing cells 44.

In the open, supposing there is a soup of proteins, how is proteolysis managed? How are the bad targeted and the good preserved? In a cell this is apparently done methodically, lest the whole cell be degraded. This is fine tuning of the machines to work. Fine tuning is an exceptional event.

Additionally, the RNAs seem to be replenished at an equilibrium rate since they have half-lives. This is also fine tuning, like walking a tightrope.

One might argue natural selection will make such fine tuning happen, but that only works if the fine tuning isn’t necessary for life, otherwise the organism would be dead and its descendants evolve no more.

I appreciate your responses, but complex life emerging up from non-life doesn’t seem a typical event to me, it seems exceptional.
OMagain on December 3, 2015 at 5:05 pm said:

stcordova: This is fine tuning of the machines to work.

Sigh. You can point at anything at all and say that.

stcordova: One might argue natural selection will make such fine tuning happen, but that only works if the fine tuning isn’t necessary for life, otherwise the organism would be dead and its descendants evolve no more.

Case closed! Unless you are missing something crucial….
OMagain on December 3, 2015 at 5:05 pm said:

stcordova: I appreciate your responses, but complex life emerging up from non-life doesn’t seem a typical event to me, it seems exceptional.

Given that, how much more unlikely is the designer arising from non-designer?
petrushka on December 3, 2015 at 5:14 pm said:

OMagain: Given that, how much more unlikely is the designer arising from non-designer?

How stupid can you get? The designer is defined as existing.
stcordova on December 3, 2015 at 5:22 pm said:

The Designer’s existence may be predicted by physics:

From a 2005 essay in Nature

“The ultimate cause of atheism, Newton asserted, is ‘this notion of bodies having, as it were, a complete, absolute and independent reality in themselves.’”
…
The 1925 discovery of quantum mechanics solved the problem of the Universe’s nature. Bright physicists were again led to believe the unbelievable — this time, that the Universe is mental.
…
According to Sir James Jeans: “the stream of knowledge is heading towards a non-mechanical reality; the Universe begins to look more like a great thought than like a great machine. Mind no longer appears to be an accidental intruder into the realm of matter…we ought rather hail it as the creator and governor of the realm of matter.”
….
The Universe is immaterial — mental and spiritual.

Richard Conn Henry
The Mental Universe: Nature Volume 436

The Quantum Enigma of Consciousness and the Identity of the Designer

additionally

Now we are beginning to see that quantum mechanics might actually exclude any possibility of mind-independent reality….

Why do people cling with such ferocity to belief in a mind-independent reality? It is surely because if there is no such reality, then ultimately (as far as we can know) mind alone exists. And if mind is not a product of real matter, but rather is the creator of the illusion of material reality (which has, in fact, despite the materialists, been known to be the case, since the discovery of quantum mechanics in 1925), then a theistic view of our existence becomes the only rational alternative to solipsism.

Richard Conn Henry and Stephen R. Palmquist
Journal of Scientific Exploration Issue 21-3

So it is plausible the Designer exists, and there are exceptional phenomena in the universe. Seems to me then, the Designer likely made certain exceptional phenomena happen.

PS
Ironically I found the above essays while reading up on special relativity and Richard Conn Henry’s works. I walked by Dr. Henry’s office one day while at school. Never met him, but he does hold a respectable academic post.
Rumraket on December 3, 2015 at 5:45 pm said:

stcordova:
Rumraket,
Exceptional means something in many scientific discourses. In the field of mathematics it means sigmas from expectation.

Yeah I know the word means something, but it means something different depending on context. You seem to be merely using it as a synonym for ‘unlikely’.

Unlikely things don’t happen very often. Great, where does that get us? Nowhere. I’d like to see you actually try to answer the previous post I wrote.

stcordova: In regards to physical structures exceptional means components (bricks, atoms, whatever) are arranged in a way that is not consistent with physical expectation in the face of uncertainty, i.e. 500 fair coins 100% heads, 100 dominos standing on their edge

This is all dependent on circumstances. You only get these kinds of open-ended odds if you remove all the constraints. It is not at all clear how this is relevant to the origin of biological polymers. We still don’t know much about what kind of environment life originated in, but you seem to want to conclude there’s no environment that could narrow down the outcomes.

stcordova: RNAs floating around after millions of years even though some have half-lives as short as hours in a cytoplasmic type context, DNAs floating around after millions of years even though they have half-lives of 521 years give or take, homochiral amino acids after millions of years even though they have half lives of hundreds to thousands of years.

What the hell is this “floating around” crap you’re blathering about? Half-life is dependent on circumstances. Leave your DNA in direct summer sunlight and its half-life drops to minutes. Get it down around freezing, remove all the oxygen and water and physically enclose it in something so bacteria or radiation can’t normally get to it and it could potentially last indefinitely.

stcordova: The 3 chemical classes I listed (RNA, DNA, amino acids) have half lives that point to expected configurations (namely unsuitability for life). If that is the case, life is not expected in general to spontaneously assemble from the non-living state. For that to happen it would be an exceptional event. We know this experimentally since non-living things stay non-living without outside intervention. No known exceptions.

That’s a great inference. We also know experimentally that god doesn’t create living things. No known exceptions.

stcordova: So for life to emerge from an initial condition of non-life

That depends on which particular “non-life” condition we are talking about. That’s a rather broad term to apply to an almost infinite set of different possible environments. Going from something bordering on physically impossible, I don’t believe life is likely to originate in the core of the sun – to something just extremely extremely unlikely – on the surface of a silicate rock sitting somewhere exposed to direct sunlight on the surface of Mars, or here on Earth. Those would be two “non-life” options that would be among some of the most improbable to bring about life.

But there’s still a lot of unexplored physical circumstances we know very little about. How likely is life to emerge in some of those? Alkaline hydrothermal vents perhaps? I have no idea, because I don’t know how those circumstances might constrain the outcomes of what can happen. It’s the subject of ongoing investigations.

stcordova: it would be an exceptional vs. ordinary event.

I fully agree that the unlikely happens rarely. How unlikely the origin of life is, though, I cannot say. Because I just don’t know. What I do know is that we cannot extrapolate from what little we know into a general rule about all possible unknown and unexplored environments.
Rumraket on December 3, 2015 at 6:00 pm said:

stcordova: How exceptional will something have to be before one accepts it is a miracle?

I already asked you this question and you didn’t answer. What totally arbitrary border are you going to draw? Is something having a probability of 1 in 10^60 a miracle? 1 in 10^80 ? 1 in 10^100? Where does it become miraculous, and why is THAT the line and not before or after?

Since you are the one saying there is something like a miracle, perhaps you should, yourself, try to come up with some good way to determine whether something qualifies. On the face of it, merely having low odds of happening seems rather unmiraculous to me.
Rumraket on December 3, 2015 at 6:05 pm said:

stcordova: It is a relatively known fact we shouldn’t expect DNAs to float around forever. RNA is a brittle molecule. If you need special environmental conditions to preserve RNAs then, that leans more toward an exceptional set of circumstances.

Here are some basics, for starters the temperatures have to be in a relatively narrow range relative to the range of temperatures found in the universe. Most of the matter in the universe (say 99%) is in the plasma state, not solid or liquid state. One needs a few basic chemicals to get things going and that in the right sustained temperature range.

So just for starters, life with a DNA-RNA-protein replication cycle needs an exceptional environment. We don’t expect algorithmically controlled metabolisms to emerge in plasmas in space or in stars.

What I said is, I think, well within the scientific mainstream: life is an exceptional phenomenon from typical initial conditions. Maybe there is life throughout the universe, just like there are many buildings in a city. Abundance is not evidence against life’s exceptional properties. It could well be testament of some extraordinary sets of events.

Mt. Everest is starting to look rather miraculous. We only know of one Mt. Everest. Only one set of very special conditions, presumably only having happened once in the entire universe, could have led to the exact and particular Mt. Everest we have.
Rumraket on December 3, 2015 at 6:16 pm said:

stcordova: If anything, natural selection selects against more complexity.

No. It selects against higher energetic cost when resources are scarce. When they aren’t, starting from 0 complexity there’s only one way to go: Up.

stcordova: Even Michael Lynch points out complex replicators (aka multicellular animals) don’t compete well against simpler ones.

There’s a world of difference between single and multi-cellular life, and simpler vs more complex but still single-cellular life. And it’s probably even more so if you merely compare self-replicating molecules themselves.

stcordova: That principle is extensible to simple replicators, and the Spiegelman experiment should be informative that the claim you asserted:

is a little suspect as far as evolving more replicators, and is dubious whether the replication can be self-sustaining since one needs a special environment to make it feasible.

Heat flux across an open pore enables the continuous replication and selection of oligonucleotides towards increasing length.

I should note that heat flux across an open pore is what you find in an alkaline hydrothermal vent.

stcordova: Finally, I think many underestimate the importance of the cytoplasmic chemistry in the origin of life. We barely understand the significance of glycoprotein complexes and the difficulty of evolving interactome cascades. When Venter made artificial life, he had to use pre-existing cytoplasmic machinery. Neither Venter nor anyone on the planet is able to synthesize a cytoplasm from scratch and put it in the right interactome state. I think the problem of a DNA-RNA-protein replicator requires much more machinery in place simultaneously than you think.

I think you’re wrong, because otherwise PCR wouldn’t work with cycling temperatures, primers, template, a polymerase and dNTPs.

stcordova: A mere RNA autocatalytic replicator has to have the capacity to evolve. The Spiegelman monster highlights the problem that simpler replicators are favored in the short term over more complex ones.

See the above reference.
Rumraket on December 3, 2015 at 6:25 pm said:

Frankie:
Rumraket,
Then you don’t have a point. You lose.

That is only an opinion.

If you say so. I guess with you responding like this there’s not much use continuing discussing this with you. You could make a claim and I could respond in the same juvenile and defiant way for the heck of it. “You lose” and “That’s just an opinion”. Then you could quote a reference and then I could just repeat myself “That’s just the author’s opinion”. Great, what a silly way to argue.

You win, Joe. I don’t know how to respond to that way of argumentation. I’m sorry for having thought we could, maybe, discuss this like adults. My mistake and I’m sorry, I won’t make it again.
petrushka on December 3, 2015 at 6:27 pm said:

The only sensible response to persistent refusal to learn anything is, “That’s nice, dear. Now go back to the children’s table.”

Not that this applies to anyone here.
Rumraket on December 3, 2015 at 6:29 pm said:

stcordova: I read the Bernhardt piece, I even said he had responses to the criticisms! But I pointed out those responses had to appeal to more specialized therefore more exceptional conditions.

What a terrible objection. Of course the conditions are going to be rather rare, after all life doesn’t spring up all around us. Perhaps those conditions only exist on 1 in every 100,000 planets? Perhaps 1 in every million? Perhaps 1 in every 60 trillion planets? Would those count as miracles then?
Rumraket on December 3, 2015 at 6:33 pm said:

stcordova: I appreciate your responses, but complex life emerging up from non-life doesn’t seem a typical event to me, it seems exceptional.

I agree, complex life emerging up from non-life doesn’t seem like a frequent event to me either, it seems like a rare event. How rare? I don’t know. Maybe only once on every 100,000 planets, at some small interval of time (say, a window of 50 million years) around the early life of the planet? Who knows?

I’ve only been alive for about 35 years. Modern science has only existed for a couple of hundred, and we’ve only got evidence of the existence of planets outside our solar system within the last couple of decades. I think it is rather premature to declare we must relegate the origin of life to divine intervention.
Allan Miller on December 3, 2015 at 8:09 pm said:

stcordova,

In the open, supposing there is a soup of proteins, how is proteolysis managed?

I don’t suppose a ‘soup’ of anything. Certainly not proteins.

I appreciate your responses, but complex life emerging up from non-life doesn’t seem a typical event to me, it seems exceptional.

One can call it ‘exceptional’ if one likes – it probably doesn’t happen every day, and probably takes a while even when conditions are favourable. And on a planet already furnished with Life, it is probably even harder to get going due to the niche already being occupied.

But you simply lack the information to make a valid, non-hand-wavy probabilistic case. Let’s say the probability of Life arising is once every p planets every y years, or 1 in z in the universe as a whole. If you know p, z or y you might be able to make some progress, plugging in real values for total planets and years available. But you don’t know any of it. All we can say is that, if there is a nonzero value for z and Creation didn’t occur, we are one of the places it happened naturally. If z is low enough, maybe the only place. But we cannot say that z was so low it could not have occurred ‘naturally’, therefore Creation must have occurred, which is where you want the reasoning to take you.

All you do is hunt for problems with any given scenario. You don’t seem genuinely interested in any possible ways in which they may be circumvented by real systems. Keep an open mind; it’s a work in progress.
petrushka on December 3, 2015 at 8:15 pm said:

Allan Miller: Keep an open mind; it’s a work in progress.

I don’t recall ever getting a response for my question, How has ID thinking worked out historically, when applied to problems like planetary orbits, or cause of disease, or cause of volcanos?

I did get banned from UD for asking.
Allan Miller on December 3, 2015 at 8:22 pm said:

stcordova,

Spiegelman’s monster […]

I forgot to mention, Spiegelman’s Monster shrinks because it has no function. It doesn’t actually do anything; it’s a string of RNA that is polymerised by the introduced RNA polymerase. As I said, it is akin to parasites, which do indeed move toward simplicity. So, serial rounds of selection tend to select for all parts of the string to be lost except for RNA polymerase binding sites. If Spiegelman’s Monster incorporated the RNA polymerase, as part of the string replicated, we would not see a drive to loss of that function, or streamlining without limit. I guarantee it.
Rumraket on December 3, 2015 at 8:24 pm said:

I have to note the rather obvious double standard at work for the people who try to argue for a divine origin of life. They make all these arguments about improbable odds. Then turn around and argue that because the improbable, but strictly speaking still physically possible, doesn’t seem to have happened yet while they were around to witness it, then surely the physically IMpossible must have done it. That makes zero sense.

That whole “we don’t see this happen in nature. No exceptions.” thing: the exact same reasoning applies just as well against divine intervention. It doesn’t seem to happen either.
petrushka on December 3, 2015 at 8:30 pm said:

Divine intervention happened because it is defined as having happened.
Robin on December 3, 2015 at 9:05 pm said:

Rumraket:
I have to note the rather obvious double standard at work for the people who try to argue for a divine origin of life. They make all these arguments about improbable odds. Then turn around and argue that because the improbable, but strictly speaking still physically possible, doesn’t seem to have happened yet while they were around to witness it, then surely the physically IMpossible must have done it. That makes zero sense.

That whole “we don’t see this happen in nature. No exceptions.” thing: the exact same reasoning applies just as well against divine intervention. It doesn’t seem to happen either.

http://tvtropes.org/pmwiki/pmwiki.php/Main/RuleOfCool

Trumps rational…always.
stcordova on December 3, 2015 at 9:25 pm said:

But we cannot say that z was so low it could not have occurred ‘naturally’, therefore Creation must have occurred, which is where you want the reasoning to take you.

I agree we don’t know for sure. Unlike many of my ID colleagues, I’m willing to say my inference has a formal chance of being wrong.

I’ve said, at a personal level, it’s a wager I’m willing to make. What will I really have to lose a million years from now if there is no Designer? You and I won’t even care who was right a million years from now if there is no Designer.

I’ve come here to TSZ to see how much some of those astronomical numbers might possibly be reduced. Maybe at most we can agree, a phenomenon on the order of OOL is not an every day thing on a planet, maybe not even something that happens in a century even given all the Earth’s resources.

Thanks for taking the time to read and respond. Much appreciated.
Rumraket on December 3, 2015 at 9:42 pm said:

stcordova: Maybe at most we can agree, a phenomenon on the order of OOL is not an every day thing on a planet, maybe not even something that happens in a century even given all the Earth’s resources.

I definitely agree with that. I don’t think life is just something that happens anywhere in a warm little pond. I would go so far as to say I think the vast majority of planets in the universe have never brought forth life. Our sample size is still extremely small though, so it’s just not possible to extract a frequency currently.
Frankie on December 3, 2015 at 9:48 pm said:

Rumraket:
I have to note the rather obvious double standard at work for the people who try to argue for a divine origin of life. They make all these arguments about improbable odds. Then turn around and argue that because the improbable, but strictly speaking still physically possible, doesn’t seem to have happened yet while they were around to witness it, then surely the physically IMpossible must have done it. That makes zero sense.

That whole “we don’t see this happen in nature. No exceptions.” thing: the exact same reasoning applies just as well against divine intervention. It doesn’t seem to happen either.

The reason for the improbable odds argument is because you don’t have anything else. And truthfully you don’t even deserve a seat at the probabilistic table either.

Design is a top-down approach. We infer design because of the physical evidence. If we actually observed it happening then we wouldn’t have to infer via science. Yours is the bottom-up approach and that is why we ask you for some details.
Rumraket on December 4, 2015 at 8:48 am said:

Frankie: The reason for the improbable odds argument is because you don’t have anything else.

That’s fine, we know that we don’t know how life originated. That is not a very good excuse for trying to put probability estimates on an unknown process and a sample size of 1.

Another issue is, in order for that probability number to mean something you need another probability to compare it to. It might incidentally be correct that life is unfathomably improbable, but how probable is design? I don’t see a way of estimating the relative probability that a designer will design life (in general), or our kind of life specifically.
It could be even lower, we just don’t know. This is not the time to start declaring the matter settled.

Frankie: And truthfully you don’t even deserve a seat at the probabilistic table either.

I don’t know what it means to have a seat at the probabilistic table.

Frankie: Design is a top-down approach.

I don’t know what you mean by a top-down approach in this sense. I have seen the term used in science regarding different approaches to trying to solve the origin of life.
In this sense, the bottom-up approach denotes attempts to synthesize the biomolecules of life under prebiotically plausible conditions and then to see how far they can get with that. And the top-down approach is to try and use comparative genetics and phylogenetic reconstructions from known life to see how far back in time we can go and perhaps get clues to how the earliest stages of life might have looked.
It isn’t clear to me in what way ID fits into either of those approaches.

Frankie: We infer design because of the physical evidence.

With regards to the origin of life, what evidence would that be?

Frankie: Yours is the bottom-up approach and that is why we ask you for some details.

I’m not aware of a design having taken place anywhere that did not have some detail to it. If I ask Microsoft how they designed Windows 10, I’m going to get a lot of explanations about what they were trying to achieve, who wrote the software, for what purpose. What computers they used to write it, talks about iterations of alpha and beta testing and so on. There will be an actual explanation there.

If I ask a car manufacturer how they designed their cars, the same will be true. I will be told about when it was designed, who came up with the ideas, who made the drawings, who fabricated the parts, pieced them together, test-drove a prototype and how, and so on.

You don’t get to just declare “design” and then think your job is done, that would betray a huge double standard. You have to give details too.
Allan Miller on December 4, 2015 at 9:05 am said:

stcordova,

I’ve said, at a personal level, it’s a wager I’m willing to make. What will I really have to lose a million years from now if there is no Designer? You and I won’t even care who was right a million years from now if there is no Designer.

True enough – though you clearly care right now, whereas I don’t. Or rather, I don’t think anything hinges on me getting it wrong, now or in the future. I would like to know the actual state-of-affairs. If there is a designer, I’d like to know about it, for curiosity’s sake rather than eternity’s. I just don’t see the arguments presented as pointing that way.

Maybe at most we can agree, a phenomenon on the order of OOL is not an every day thing on a planet, maybe not even something that happens in a century even given all the Earth’s resources.

I’d be happy to concede much more than that. I’d be happy to say it could readily be something that may happen on a randomly-chosen planet just once in 500 million years, maybe less often still. I also think that, once it has happened, the chances of it happening again drop very sharply indeed, in a non-sterile environment, for both ecological and chemical reasons (mainly down to biogenic free oxygen and other redox species).
Allan Miller on December 4, 2015 at 9:10 am said:

Rumraket,

I don’t know what it means to have a seat at the probabilistic table.

I tried to eat at a quantum restaurant once. Nightmare.
Allan Miller on December 4, 2015 at 11:13 am said:

Allan Miller,

I guarantee it.

Surprised no-one mentioned Muller’s Ratchet. Phew!
Rumraket on December 4, 2015 at 9:16 pm said:

This video just popped up in my youtube feed regarding exoplanets and the search for “Earthlike” planets:

Planets Everywhere: The 7th Kepler Planet Catalog – Fergal Mullally (SETI Talks).
Mung on December 5, 2015 at 6:08 am said:

And still no evidence that Nick Matzke knows how to calculate the size of amino acid sequence space.
Adapa on December 5, 2015 at 6:17 am said:

Mung:
And still no evidence that Nick Matzke knows how to calculate the size of amino acid sequence space.

Since you’re such a self-proclaimed expert in ID why don’t you tell us how ID does it? Or is making snide remarks all you know?
Allan Miller on December 5, 2015 at 10:11 am said:

Mung,

And still no evidence that Nick Matzke knows how to calculate the size of amino acid sequence space.

Wha? You have reason to think he doesn’t? v^n is not the most difficult of algebraic functions. Even I know it, and I’m shit at maths.
Allan Miller on December 5, 2015 at 10:14 am said:

Frankie,

The reason for the improbable odds argument is because you don’t have anything else.

The improbable odds argument does not come from evolutionistas … if not having anything else is the reason for it, you might prefer to keep that reasoning under your hat.
Alan Fox on December 6, 2015 at 11:33 am said:

Mung:
And still no evidence that Nick Matzke knows how to calculate the size of amino acid sequence space.

I suspect Nick Matzke is very aware of this issue, judging by this comment at Pandas Thumb.

A snippet for Mung:

Protein folds are the emergency backup-backup-backup argument after claims about specified complexity, “evolution can’t produce new information”, “evolution can’t produce new genes”, etc., have been debunked and tacitly abandoned.
Alan Fox on December 6, 2015 at 12:03 pm said:

@ Mung

The links Nick included in that comment are worth having a look at, too.

This paper by Michael Buratovich looks at Stephen Meyer’s claims in Darwin’s Doubt

He also suggests the work of Nick Grishin. There’s lots of material available. Here is just one paper of many discussing the evolutionary pathways of proteins.
Mung on December 6, 2015 at 8:43 pm said:

Alan Fox: Protein folds are the emergency backup-backup-backup argument after claims about specified complexity

I can go one better than that, so you’re already behind the curve. 😉

What good are folded proteins if there’s no active site(s)?
Mung on December 6, 2015 at 9:20 pm said:

ok Alan, I did some work for you and found the initial claim made by Nick and my responses to his claim. Feel free to point out where he ever responded.

That article shows just how bogus it is to take the (length of sequence)^20 as a measure of sequence space.

here

As I said above Nick corrected his initial mistake, but never answered the questions.

Nick:

Correcting a previous post: I said (length of sequence)^20, I meant 20^(length of sequence)

If it is “bogus” to take 20^n as a measure of [amino acid] sequence space, what is the correct measure?
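To make the difference between the two expressions concrete, here is a small Python sketch. The function names are illustrative, and the length of 100 residues is an arbitrary example rather than a figure from the thread. It compares $n^{20}$, the mistaken form, with $20^n$, the number of distinct sequences of length $n$ over a twenty-letter alphabet.

```python
def mistaken_count(n: int) -> int:
    """The erroneous expression: (length of sequence)^20."""
    return n ** 20


def sequence_count(n: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of length n over the given alphabet."""
    return alphabet ** n


if __name__ == "__main__":
    n = 100  # arbitrary example length
    print(f"n^20 has {len(str(mistaken_count(n)))} digits")  # 41 digits
    print(f"20^n has {len(str(sequence_count(n)))} digits")  # 131 digits
```

For short sequences the two expressions can look deceptively similar in size, but $20^n$ grows exponentially with length while $n^{20}$ grows only polynomially.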
Rumraket on December 7, 2015 at 8:09 am said:

Mung:
Alan Fox: Protein folds are the emergency backup-backup-backup argument after claims about specified complexity

I can go one better than that, so you’re already behind the curve.

What good are folded proteins if there’s no active site(s)?

An active site usually denotes an enzyme. There are structural proteins that don’t have “active sites”. They just sit somewhere and form a structure, they don’t catalyze chemical reactions.

There are also proteins known that are chemical catalysts despite not having any “folds” (they’re “wobbly”). Also, even tiny peptides only 2-3 amino acids long are known to have catalytic activity. The dipeptide of glycine is a catalyst for peptide bond formation and there are dipeptides that catalyze RNA oligomerization.
This means that this idea you have that in order for a protein to be functional, it has to be this huge folding entity with an active site, is just wrong. It is a picture you have been sold by reading apologetics instead of the primary literature.
Rumraket on December 7, 2015 at 8:29 am said:

Mung: If it is “bogus” to take 20^n as a measure of [amino acid] sequence space, what is the correct measure?

He’s clearly made a minor mistake there when he says sequence space isn’t 20^n. The size of sequence space is 20^n for a 20 amino acid alphabet. But he’s right about it being bogus to take that number as the space evolution has to search.

Reading the publication he references, he seems to be talking about the ability of evolution to “probe” sequence space without having to actually produce every literal sequence, since (for the reasons well stated in the paper) the space you need to search to get a *hit* for something functioning and/or folding is much, much smaller than the totality of sequence space, because many many dissimilar sequences will still produce similar folds and related functions.

In this sense, it is a mistake to use 20^n as the size of the space evolution has to search, instead of just a small subset of 20^n (which in some cases might be as small as 4^n, possibly even reducible to 3^n or 2^n (polar vs non-polar)). Do you understand?
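To put rough numbers on that comparison, here is a minimal sketch that prints the order of magnitude of the full 20^n space next to the reduced-alphabet spaces mentioned above (4^n, 3^n, 2^n). The length n = 100 and the function name `log10_space` are assumptions chosen purely for illustration; they do not come from the paper under discussion.

```python
import math

def log10_space(alphabet_size: int, length: int) -> float:
    """Return log10(alphabet_size ** length), the order of magnitude of the space."""
    return length * math.log10(alphabet_size)

# Hypothetical protein length, chosen only for illustration.
n = 100

# 20 = full amino acid alphabet; 4, 3, 2 = the reduced alphabets mentioned above
# (2 being the polar vs non-polar grouping).
for v in (20, 4, 3, 2):
    print(f"{v}^{n} is roughly 10^{log10_space(v, n):.1f}")
```

Working in logarithms keeps the output readable; Python integers could hold 20**100 exactly, but the exponent is the part that matters for the argument.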
Allan Miller on December 7, 2015 at 9:27 am said:

Mung,

What good are folded proteins if there’s no active site(s)?

As Rumraket says, there are non-enzymatic functional proteins. And the active site in enzymes (along with binding sites for cofactors etc) is due to folding. What good is an active site if there is no folding? What would it even mean?

Proteins that don’t do anything at all are not hypothesised to be significant. Leaving the rest.
Allan Miller on December 7, 2015 at 9:32 am said:

Mung,

If it is “bogus” to take 20^n as a measure of [amino acid] sequence space, what is the correct measure?

While I might take issue with the exact expression of his comment, the answer is indeed in the paper. The specific digital sequence is 20^n. But since there is much grouping of property, and complete substitutability at many sites, 20^n is a vast overestimate of functional protein space.

You think Nick doesn’t know biology or summink, based on a brief comment in a blog somewhere?
DNA_Jock on December 7, 2015 at 12:03 pm said:

Rumraket: In this sense, it is a mistake to use 20^n as the size of the space evolution has to search, instead of just a small subset of 20^n (which in some cases might be as small as 4^n, possibly even reducible to 3^n or 2^n (polar vs non-polar)). Do you understand?

Absolutely.
What IDists evidently fail to appreciate in their “ooh-err, really big number!” argument is that the size of the amino acid space is completely and utterly irrelevant — “bogus”, even. What matters is how sparsely populated it is with proteins that have some function, and how interconnected these minimally functional regions are.
Here’s a thought experiment for any honest IDist: imagine there were only 4 amino acids in proteins, or imagine there were forty. Does this make the navigation of sequence space and the evolution of proteins easier, or harder?
Any naive “p = v^-n” calculation produces obviously bogus results.
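For anyone who wants to try the thought experiment, here is a minimal sketch of that naive calculation, where v is the alphabet size and n the sequence length. The function name and the length n = 100 are assumptions for illustration only. Note what the formula leaves out: it assumes exactly one target sequence and says nothing about how densely the space is populated with functional proteins or how connected they are.

```python
import math
from fractions import Fraction

def naive_probability(v: int, n: int) -> Fraction:
    """Naive chance of hitting one specific sequence: p = v^-n."""
    return Fraction(1, v ** n)

def log10_naive_probability(v: int, n: int) -> float:
    """log10 of the same quantity, easier to read for large n."""
    return -n * math.log10(v)

# Hypothetical length; the alphabet sizes are those of the thought experiment.
n = 100
for v in (4, 20, 40):
    print(f"v = {v}: p is roughly 10^{log10_naive_probability(v, n):.1f}")
```

On this formula alone a four-letter alphabet always looks "easier" and a forty-letter one "harder", whatever the actual distribution of function, which is the reason the calculation cannot answer the question posed.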
petrushka on December 7, 2015 at 1:49 pm said:

The only space that is relevant is the sequence that is one character different from the current sequence.
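A minimal sketch of how small that neighbourhood is: for a sequence of length n over 20 amino acids there are 19n sequences exactly one residue away. The three-residue peptide and the function name below are made up for illustration, and the sketch works at the amino acid level only; at the DNA level a single nucleotide change cannot reach every one of these substitutions.

```python
# One-letter codes for the twenty standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def point_neighbors(seq: str):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield seq[:i] + aa + seq[i + 1:]

# Hypothetical tripeptide, used only as an example.
peptide = "MKT"
neighbors = list(point_neighbors(peptide))
print(len(neighbors))          # 19 * len(peptide)
print(20 ** len(peptide))      # size of the whole space for this length
```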
DNA_Jock on December 7, 2015 at 2:20 pm said:

petrushka,

True iff you ignore recombination, which allows one to “hop”.
petrushka on December 7, 2015 at 2:26 pm said:

DNA_Jock:
petrushka,
True iff you ignore recombination, which allows one to “hop”.

There are lots of kinds of mutation. I don’t know which are most common, but I’d guess simple point mutations provide most of the new sequences.
Allan Miller on December 7, 2015 at 2:46 pm said:

petrushka,

There are lots of kinds of mutation. I don’t know which are most common, but I’d guess simple point mutations provide most of the new sequences.

It depends how ‘new’ is new. Point mutations explore local space, and provide more in the way of refinement than novelty. Recombinational methods (which include slippage, ectopic pairing and full-fledged migration to another locus) must provide a substantial fraction of the genuinely ‘new’ functions – domain transfer, etc.
petrushka on December 7, 2015 at 6:34 pm said:

Allan Miller:
petrushka,
It depends how ‘new’ is new. Point mutations explore local space, and provide more in the way of refinement than novelty. Recombinational methods (which include slippage, ectopic pairing and full-fledged migration to another locus) must provide a substantial fraction of the genuinely ‘new’ functions – domain transfer, etc.

I’d be curious about the details. Are there new protein domains created that way?

How about regulation? I’m just curious what processes lead to what.
Allan Miller on December 8, 2015 at 1:22 pm said:

petrushka,

I’d be curious about the details.

There are numerous mechanisms – a brief summary. They take place during repair and during meiosis (a related process). LGT is relevant too – uptake of foreign gene fragments can cause a significant change in a protein.

Are there new protein domains created that way?

Probably. There is little information available to cellular process as to the start and end points of domains, so it’s not likely to be just a question of swapping discrete bounded domains (though this does have a significant role). There is no particular lower or upper bound on the amount of a gene that can be swapped with another, so chimeric domains could readily arise and then be copied subsequently as a composite block.

Selection would tend to restrict ‘successful’ swaps to a lower level than some (but not all) point-mutational methods, so it’s probably infrequent but highly significant. Simple swapping etc can produce sequences that, on a simplistic ‘Hoyle’ view of space, would be deemed ‘impossible’ after the fact. Especially if the precursors die out. But they can bridge islands.

How about regulation? I’m just curious what processes lead to what.

Yes, again; because the mutational processes have little sight of gene boundaries, they also have little sight of the regulatory regions around them. These can readily be swapped, or copied. Lenski’s cit+ derived from such a copy. The ability of a single regulatory protein to control multiple genes is almost certainly due to copy-paste of a single regulatory region in an ancestor, rather than separate origins of each instance.

Transposons occasionally fall into genes or regulatory regions, with beneficial results. This excites Creationists, who think that the few fragments falling into a gene and affecting its operation give an explanation for the entirety (> 1 million for some) of the transposon copies in a genome. They find themselves in the cleft stick of denying that evolution can proceed by such gross rearrangements, except when it attacks the junk DNA paradigm.
petrushka on December 8, 2015 at 1:35 pm said:

This seems to support my belief that chemistry is a unique kind of phase space, and that biological evolution cannot be modeled or emulated unless we can model chemistry. This makes design rather difficult, to put it mildly.
Allan Miller on December 8, 2015 at 2:01 pm said:

petrushka,

This seems to support my belief that chemistry is a unique kind of phase space, and that biological evolution cannot be modeled or emulated unless we can model chemistry. This makes design rather difficult, to put it mildly.

Well, certainly the ‘front-loading’ kind. Hard to see how you could front-load a particular transposition, let alone get it through rarity towards fixation if you’re an absentee kind of God.

I don’t think what I’m talking of quite reduces to chemistry, however. There is a space of primary sequences (even that is a gross over-egging of functional space, since there is high redundancy even at the amino acid level, let alone the genetic). But it is highly interconnected.

Folding and multimer binding remain in the chemical level, but you soon depart it when talking of utility for the cell, multicellular interaction and, ultimately, to ecology (which is really the arena where much of selection takes place). But yes, hard to design for. Unintended consequences. Like all the mess we made every time we mixed New and Old World species. Super-Duper Designers can handle it all, of course. Not us Sorcerer’s Apprentices – which weakens the argument from analogy.
Mung on December 10, 2015 at 1:29 am said:

Rumraket: But he’s right about it being bogus to take that number as the space evolution has to search.

But that’s not what he said. That was MY argument.
Mung on December 10, 2015 at 1:36 am said:

Allan Miller: The specific digital sequence is 20^n. But since there is much grouping of property, and complete substitutability at many sites, 20^n is a vast overestimate of functional protein space.

But that was not Nick’s claim. In fact, that was my response to Nick’s claim.

Mung @UD:

Nick, it seems to me that they are confusing the amino acid sequence space, which is the space of all possible amino acid sequences, with sequences which are functional, or which can make a protein, or how much of that space has been searched.

Why am I wrong?