Axe, EN&W and protein sequence space (again, again, again)

Posted on October 12, 2016 by Rumraket

So I wanted to write a post on the whole “density of functional proteins in amino acid sequence space” because, it seems to me, most ID proponents have some completely unwarranted beliefs about it. Recently it seems the main champion of the idea that evolving new protein functions would be so unlikely as to be practically impossible for evolution, even given several billion years, planet-spanning population sizes and natural selection, is Douglas Axe. He produced a paper in an (actual) peer reviewed journal back in 2004, and ID creationists have been spinning it ever since.

In 2007 a review of Axe 2004 appeared on Panda’s Thumb, by Arthur Hunt: Axe (2004) and the evolution of enzyme function. It recaps what Axe actually did and what he concluded from it, and I will add a bit about how it’s been pushed in ID circles since (see for example how it is pushed here: Imagine: 60 Million Proteins in One Cell Working Together).

This paper is interesting because it relates to the work of Douglas Axe that resulted in a paper in the Journal of Molecular Biology in 2004. Axe answered questions about this paper earlier this year, and also mentioned it in his recent book Undeniable (p. 54). In the paper, Axe estimated the prevalence of sequences that could fold into a functional shape by random combinations. It was already known that the functional space was a small fraction of sequence space, but Axe put a number on it based on his experience with random changes to an enzyme. He estimated that one in 10^74 sequences of 150 amino acids could fold and thereby perform some function — any function.

Notice the last part I highlighted in bold. This is apparently what most ID-creationists believe, because this is what the higher-ups in the ID movement claim that Axe’s research shows. It does not, as I will attempt to demonstrate pretty conclusively.
To still believe that Axe’s number* is an estimate (never mind a correct estimate) of the number of “sequences of 150 amino acids [that] could fold and thereby perform some function — any function” will be a form of blind faith after you’ve read this post through, provided you understand it. That isn’t too much to ask, because it’s really not that complicated.

*(The number seems to fluctuate a bit in the various writings, between 10^64 and 10^90; I’ll just go by 10^77, as it was used back in 2004.)

Back to Arthur Hunt on Panda’s Thumb:

Axe (2004) has performed site directed mutagenesis experiments on a 150-residue protein-folding domain within a B-lactamase enzyme. His experimental method improves upon earlier mutagenesis techniques and corrects for several sources of possible estimation error inherent in them. On the basis of these experiments, Axe has estimated the ratio of (a) proteins of typical size (150 residues) that perform a specified function via any folded structure to (b) the whole set of possible amino acids sequences of that size. Based on his experiments, Axe has estimated his ratio to be 1 to 10^77. Thus, the probability of finding a functional protein among the possible amino acid sequences corresponding to a 150-residue protein is similarly 1 in 10^77.

Notice the last sentence. That’s the key claim on which much of the ID argument against the evolution of new proteins, new enzymes, new protein domains and folds rests: that the probability of finding a functional protein 150 amino acids in length is 1 in 10^77. That would of course imply that, if you just randomly generated sequences, you would on average have to go through 10^77 of them before you found a functional one.

Put another way, if you wanted to find some arbitrary function in the sequence space of 150 amino acid-long proteins, you’d have to generate about 10^77 sequences, on average, before you found it. And if you then wanted to find one more function, a different arbitrarily picked one, you’d have to generate another 10^77 sequences.
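To make the “on average” part concrete, here is a minimal Python sketch. It assumes every randomly generated sequence is an independent draw with the same chance p of being functional, which is my simplification for illustration and not something taken from Axe’s paper; the function names are made up for this example.

```python
import math

def expected_draws(p: float) -> float:
    """Mean number of independent random draws until the first hit (geometric distribution)."""
    return 1.0 / p

def chance_of_at_least_one_hit(p: float, n_draws: float) -> float:
    """P(at least one hit in n_draws) = 1 - (1 - p)^n_draws.

    Written with log1p/expm1 so that a tiny p is not rounded away.
    """
    return -math.expm1(n_draws * math.log1p(-p))

claimed_p = 1e-77  # the "1 in 10^77" figure discussed in the text

print(f"expected draws until first hit: {expected_draws(claimed_p):.1e}")
for n in (1e40, 1e77):  # arbitrary illustrative numbers of random sequences
    print(f"draws = {n:.0e}: chance of at least one hit = "
          f"{chance_of_at_least_one_hit(claimed_p, n):.3g}")
```

Under that assumption, even after generating 10^77 sequences there would only be roughly a 63% chance of having hit a single functional one.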

To put that number into perspective, a quick back-of-the-envelope calculation suggests Axe is saying that you could literally fill a sphere the size of a small galaxy, around ten thousand lightyears in diameter, completely with random protein sequences 150 amino acids in length, and in this unfathomably vast sphere packed densely with random polymers you’d only find A SINGLE FUNCTIONAL SEQUENCE.

By the way, the crap they’re pushing on EN&W now, as vjtorley brought to our attention, has become so absurd that they’re suggesting functional proteins exist at a rate in sequence space so low that you could fill the entire observable universe with randomly generated proteins, and in this universe-sized sphere of proteins there’d be a good chance you still wouldn’t find a single functional one. C’mon guys?

I honestly think the person who wrote the EN&W article doesn’t understand how exponents work. A number like 10^240 is, well I can’t draw an analogy that would make such a quantity intuitively comprehensible. The whole thing is ridiculous in the extreme. I’m a sucker for a bit of hyperbole I readily concede, but in this particular instance, I have a hard time finding a way to exaggerate how utterly ridiculous their numbers are. Someone over there has lost his/her mind, or wasn’t around when exponents were taught in school.

Anyway, in Axe’s own words from his paper:

Combined with the estimated prevalence of plausible hydropathic patterns (for any fold) and of relevant folds for particular functions, this implies the overall prevalence of sequences performing a specific function by any domain-sized fold may be as low as 1 in 10(77), adding to the body of evidence that functional folds require highly extraordinary sequences.

This seems to imply even Axe himself really believes that is the true density of functional proteins in 150 amino acid sequence space. This is also what your average ID proponent today believes. This is the number they read in books like Darwin’s Doubt and see pushed on EN&W, and they believe this number corresponds to the odds of finding anything functional in 150 amino acid-sized sequence space. I don’t see how you can believe this isn’t what Axe, Meyer and colleagues want them to believe, because they’ve never taken even a single step to correct that misapprehension. Indeed, they’re actively pushing it as the correct way to understand Axe 2004.

My contention will be that this number simply cannot be correct because it is directly contradicted by experiment. In the words of Richard Feynman:
“In general, we look for a new law by the following process. First, we guess it (audience laughter), no, don’t laugh, that’s really true. Then we compute the consequences of the guess, to see what, if this is right, if this law we guess is right, to see what it would imply and then we compare the computation results to nature, or we say compare to experiment or experience, compare it directly with observations to see if it works.
If it disagrees with experiment, it’s wrong. In that simple statement is the key to science. It doesn’t make any difference how beautiful your guess is, it doesn’t matter how smart you are who made the guess, or what his name is… If it disagrees with experiment, it’s wrong. That’s all there is to it.”

Instead of arguing much about the technicalities of whether such an inference can be extracted from Axe’s experiment (though I will argue briefly against that too), I will primarily bring experimental results that contradict it. Because it should be pretty obvious that, regardless of technical and logical arguments, if actual laboratory experiments consistently and massively contradict Axe’s number, “It’s wrong. That’s all there is to it.”

I will give evidence that it is wrong by at least 60 orders of magnitude.

Okay, so the first thing to note is that one can make a single factual statement that immediately throws what ID-creationists believe about Axe’s number into very serious doubt:
1. Axe only tested for one function, not all possible functions.
There, done. At a basic level, this would be enough. I could stop here and you should immediately disbelieve Axe’s conclusion. It cannot possibly be extracted from the experiment he did.

He could have tested for hundreds of additional catalytic functions, protein-DNA/protein-RNA interactions, protein-protein binding, small molecule binding and so on. There are quite literally millions of functions one can test for, besides the single function the enzyme originally had. Yet all he did was test how many mutations it took for the original function to drop below detectable levels, then try to derive the total sequence space that corresponds to the fold, and then in turn calculate the proportion of those sequences that retain the one tested function.

Ian Musgrave also pointed this out back on Panda’s Thumb in the comments section with this very informative post:

Ian Musgrave | January 15, 2007 12:40 AM
Katerina Wrote: “Does this mean that if the media included antibiotics besides penicillin, a structure associated with one of those peaks may have chanced to work on them?”
Yes. Axe only looked at ampicillin resistance (a relative of penicillin). It is entirely possible that other functional beta lactamases activities were conserved, or new ones generated. Mutations can indeed generate novel functions, and if you only look at one function, you may miss the big picture.

If you look at the data of Peimbert M, Segovia L. Evolutionary engineering of a beta-Lactamase activity on a D-Ala D-Ala transpeptidase fold. Protein Eng. 2003 Jan;16(1):27-35, where they mutated the D-Ala D-Ala transpeptidase (one of the family of enzymes from which the beta-lactamases first evolved) and looked only at ampicillin, you would conclude that those mutations had no effect, while missing out on enormous changes in cefotaxime resistance (and changes in quite a few other antibiotics as well). Even with related substrates, some showed big changes, others little, so without a good panel of beta-lactams to check the function of the crippled enzyme, the functional space represented is under sampled.

Speaking of undersampling, it would have been good to look at the breakdown of common substrates of DD-peptidyl ligases and beta lactamases, such as m-[[(phenylacetyl)-glycyl]oxy]benzoic acid and see what the mutations did to these activities. Beta lactamases have kept these activities, despite structural changes which have resulted in losing DD-peptidyl ligase activity; if these activities are kept, then the enzyme is more robust than Axe contends (but to be fair this is harder to monitor in the kinds of experiments Axe did).

Stephen Matheson makes a similar point on this blog back in 2011 too: Exploring the protein universe: a response to Doug Axe

6. In my opinion, Axe significantly overstates his findings on the topic of “function.” So for example, in both the 2004 paper and the new BIO-Complexity paper, the experiments involve measuring a single function for each enzyme. It seems to me (and I could be wrong) that when the authors see that a particular variant (mutant) of the protein stops performing that one function, they conclude that the protein “has no function.” (In the BIO-Complexity paper, it’s two proteins and two functions, but the point is the same.) But of course we don’t know that, and evolutionary explanations would propose that new functions frequently arise when an enzyme has more than one function (or is broad-based in its function, or is modular in its structure and function).

How in the hell can Axe then conclude that his results extend to all functions for all sequences of that size? He can’t and it’s obvious. Yet that is what ID proponents now, today, believe his number represents. That is what Stephen Meyer writes in Darwin’s Doubt that Axe has shown. This is what they’re pushing on EN&W.

It should be obvious that you can’t just take an enzyme, throw mutations into it and then when it stops working, say “this means there’s no other function out there”. That obviously doesn’t follow.
For those reasons, Axe’s number just can’t be trusted as an estimate of that. It is a number that represents an estimate of a single very particular thing: the frequency of sequences 150 amino acids in length that adopt the known beta-lactamase fold and catalyze the breakdown of ampicillin. For that reason it is NOT an estimate of the frequency of all functional proteins in all of 150 amino acid sequence space.
Arthur Hunt points out other issues that also suggest that even for the particular thing the number is an estimate of, it’s probably still an underestimate (meaning the frequency is estimated too low). I don’t need that for what I will show, however.
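The gap between “one particular function” and “any function at all” can be illustrated with a toy calculation in Python. This is only a sketch under assumptions of my own: that there are K distinct functions one could test for, that each has the same per-function prevalence p, and that they occur independently of each other. Both numbers below are hypothetical placeholders, not measurements.

```python
import math

def any_function_prevalence(per_function_prevalence: float, n_functions: int) -> float:
    """Fraction of sequences with at least one of n_functions.

    Assumes every function has the same prevalence and that functions are
    independent of each other (a simplifying assumption for illustration).
    Computes 1 - (1 - p)^K with log1p/expm1 so that tiny p is not rounded away.
    """
    return -math.expm1(n_functions * math.log1p(-per_function_prevalence))

p = 1e-11  # hypothetical per-function prevalence (placeholder value)
for k in (1, 1_000, 1_000_000):  # hypothetical numbers of distinct testable functions
    print(f"{k:>9} functions: prevalence of 'any function' = "
          f"{any_function_prevalence(p, k):.1e}")
```

Whatever the real values are, a prevalence measured for one function only sets a floor on the prevalence of sequences that do something; it cannot be read as the prevalence for all functions combined.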

The experimental evidence will consist of a sample of papers that correspond to three types of experiments. I’ll just call them types A, B and C.
Type A experiments will be references to papers wherein the researchers generated either totally random protein sequences of lengths over 50 amino acids with no bias for structure, function or folding and then subsequently screened large libraries for functions they decided on.
Or they will be experiments where the researchers randomly assembled larger proteins from smaller fragments of already functional proteins in a type of fragment-shuffling and recombination, which is incidentally how the evolutionary process is thought to have actually made most of the proteins in extant life (from insertions and duplications of smaller parts of already existing proteins into others).

Type B will be references to papers wherein the researchers generated protein sequences of various lengths, but this time with the random sequences constituting only a smaller portion of a larger fixed structure. In these experiments the “sequence neighborhood” of these fixed folded structures is then probed for functions by generating variations of these structures with mutations in a subset of the proteins. These types of experiments demonstrate that if you have a functional protein already, there’s usually another function close by in sequence space, meaning most of the functionality is arranged in interconnected networks that can be navigated by relatively few substitutions. Which basically makes Axe’s claim doubly absurd, since there’s probably another function nearby.

Type C will be references to papers that are similar to types A and B, but the sizes of the proteins are less than 50 amino acids, typically in the 7-22 amino acid range. These types of experiments demonstrate that cellular functions don’t need to be carried out by huge, multi-domain proteins and that such large extant proteins could be generated by the stepwise accretion of smaller fragments over the course of evolution (and probably even a random process at or around the origin of life).

Type A experiments:

First I will note that subsequent experiments by the Szostak lab reproduced the results of the Keefe & Szostak paper (which I have mentioned around here several times before) for a different polymer and under different conditions. Among other things, they also selected and evolved an ATP-binding RNA structure from a pool of randomized polymers, under “physiological pH and salt conditions”.

The numbers were about the same as in the experiment with random protein polymers: approximately 1 in 10^11 had the desired function. That’s 66 orders of magnitude more frequent than what ID creationists believe Axe 2004 has shown.

But it gets better: Another lab subsequently decided to elucidate the crystal structure of the originally evolved ATP binding protein and found out, by accident, that it actually catalyzes ATP hydrolysis to ADP. The Szostak lab never tested for ATP hydrolysis so had no idea their randomly generated protein sequence, originally selected just for ATP binding, also had catalytic activity. It was an enzyme: (How would you know if you don’t test for it? Think about it)
A synthetic protein selected for ligand binding affinity mediates ATP hydrolysis.

Perhaps even more striking than the fold was the observation that each of these protein crystal structures contained electron density that was consistent with the presence of ADP in the ligand-binding pocket even though 1 molar equiv of ATP was present in the crystallization liquor (11, 12). Since the half-life of ATP for spontaneous hydrolysis is no shorter than 12 months (13) and protein crystals were obtained after 3 days of growth, it was unclear how ADP became bound to the protein. We reasoned that the presence of ADP in the protein crystal structure could be explained by one of three possible scenarios:
(i) the γ-phosphate of the ATP molecule is highly disordered and therefore the third phosphate of ATP is simply not visible in any of the electron density maps;
(ii) the Family B protein binds preferentially to ADP, which is present as a minor contaminant from the disproportionation of ATP to ADP and adenosine tetraphosphate (14); or
(iii) the Family B protein binds the ATP ligand in such a way that it is able to lower the activation barrier (.5 kcal mol^-1) and allow ATP hydrolysis to proceed at a rate that is faster than the uncatalyzed reaction.
Intrigued by the possibility that a protein selected entirely on the basis of its ability to bind ATP might also function as a primitive catalyst, we initiated a series of structural and biochemical experiments designed to explain the presence of ADP in our protein crystal structures.

They go on to find that yes, in fact, it’s an enzyme that catalyzes ATP hydrolysis. This is so funny. This team had not intended to test for catalytic function either, they were interested in the structure of the protein, and quite by accident they found out the protein was catalyzing the conversion of ATP to ADP.

How would you know whether your protein has some unknown function without testing for it? You can’t. You simply can’t. Axe has literally no idea whether his mutated Beta-Lactamases have lost any and all possible functions. Because he never bothered testing for anything but the single function the enzyme was known to have to begin with.

Subsequent experiments designed to elucidate whether such ATP binding and catalytic enzymes also existed in sequence space with smaller amino acid alphabets, were also successful: ATP selection in a random peptide library consisting of prebiotic amino acids.

Abstract
Based upon many theoretical findings on protein evolution, we proposed a ligand-selection model for the origin of proteins, in which the most ancient proteins originated from ATP selection in a pool of random peptides. To test this ligand-selection model, we constructed a random peptide library consisting of 15 types of prebiotic amino acids and then used cDNA display to perform six rounds of in vitro selection with ATP. By means of next-generation sequencing, the most prevalent sequence was defined. Biochemical and biophysical characterization of the selected peptide showed that it was stable and foldable and had ATP-hydrolysis activity as well.

These proteins were 137 amino acids in length (synthesized by joining together four random-sequence fragments of 25 amino acids with some spacer amino acids between, owing to the method of synthesis):

A DNA library with 108 sequential random codons was constructed by quadruplicating a DNA cassette consisting of 25 contiguous random VNM codons which was synthesized by standard phosphoramidite chemistry. The full length random-sequence library was assembled through digestion and ligation of DNA cassettes, as described previously [23,24]. We cloned and sequenced 30 individual library members from the final products. The results showed that each sequence was different. Given that each random DNA sequence was unique and the complexity of the library was essentially determined by the total number of molecules generated in the final ligation step [33,34], we can infer that the diversity of a 5 nmol DNA library is approximately 3 x 10^13. The random region (108 residues) of the resultant peptides (137 residues) is sufficiently long to form structured protein domains.

In this experiment they find many more ATP binding proteins in those 3×10^13 random sequences. The exact number isn’t stated in the paper, but they end up picking out 30 individual sequences.
That’s a function more prevalent in amino acid sequence space by at least 64 orders of magnitude compared to Axe’s estimate. And they only tested for one function. There could be others not tested for.

Here’s another: Evolvability of random polypeptides through functional selection within a small library.

A directed evolution with phage-displayed random polypeptides of about 140 amino acid residues was followed until the sixth generation under a selection based on affinity to a transition state analog for an esterase reaction. The experimental design deliberately limits the observation to only 10 clones per generation. The first generation consists of three soluble random polypeptides and seven arbitrarily chosen clones from a previously constructed library. The clone showing the highest affinity in a generation was selected and subjected to random mutagenesis to generate variants for the next generation. Even within only 10 arbitrarily chosen polypeptides in each of the generations, there are enough variants in accord to capacity of binding affinity. In addition, the binding capacity of the selected polypeptides showed a gradual continuous increase over the generation. Furthermore, the purified selected random polypeptides exhibited a gradual but significant increase in esterase activity. The ease of the functional development within a small sequence variety implies that enzyme evolution is prompted even within a small population of random polypeptides. …

How many random sequences need to be searched to permit the observance of evolving polypeptides towards acquiring higher functions? Is it feasible to decipher evolution from observing only a few variants per generation out of the vast number of available sequences, especially at the primitive stage? To address these issues, we deliberately limit our evolutionary studies on populations of arbitrarily chosen mutant random polypeptides to a maximum of 10 for each generation, although a phage display system with selection using transition state analogs (TSAs) has the capability of yielding highly diversified mutants of 10^8–10^9 rendering it an efficient tool for generating a protein catalyst (Patten et al., 1996; Fujii et al., 1998; Forrer et al., 1999). Here, we show that even examining only 10 arbitrarily chosen polypeptides with random sequences per generation could guarantee the observance of evolution, in which there is a continuous gradual increase in a function of the polypeptides via iterating mutation and selection.

Here the frequency of enzymes with esterase function is apparently ONE IN 1000 random sequences 140 amino acids in length (the initial library is 1000 members, the subsequent selection steps are from populations of only 10). Let that sink in. 1 in 1000. Not 1 in 10 to the seventy-seventh power. Just one in one thousand. That means Axe’s 1 in 10^77 estimate is wrong by 74 orders of magnitude.

Next up we have this: Effective selection system for experimental evolution of random polypeptides towards DNA-binding protein.

Abstract
An experimental evolution with selection based on binding affinity to DNA was carried out on a library of phage-displayed random polypeptides of about 140 amino acid residues. First, we constructed a system to artificially evolve phage-displayed random polypeptides toward binding to a target DNA containing a restriction enzyme site, in which random polypeptides capable of binding the DNA were recovered as complexes with the target DNA by digestion with the restriction enzyme. The experimental evolution cycle, including the above selection system and random mutagenesis for generating the next mutant library, was repeated until the fourth generation. The ability to bind to the DNA was enhanced per generation. In the fourth generation, convergence of the selected clones to a dominant sequence was observed. These results indicate that the newly constructed selection system is effective for exploring the evolvability of random polypeptides towards DNA-binding proteins.

In this paper they find a function at a rate of about 1 in 10^6 (a million) random amino acid sequences 140 residues in length. That’s 71 orders of magnitude more prevalent than the ID-creationist belief about what Axe 2004 showed.

In searching for publications for this post I also discovered the second kind of type A experiment, where the method was randomized recombination of smaller fragments of proteins (which can also be generated by alternative splicing): In vitro selection of GTP-binding proteins by block shuffling of estrogen-receptor fragments

Previously we have developed random multi-recombinant PCR (RM-PCR), which enables the combination of different DNA fragments without homologous sequences [9] and [10]. We have demonstrated that artificial alternative-splicing libraries of the human estrogen receptor α ligand binding domain (hERα LBD) [11] can be successfully constructed by RM-PCR (Fig. 1A). We were interested in whether affinity binders for a ligand structurally unrelated to estrogen could be obtained from such a library. Here we describe the isolation of GTP-binding proteins from an artificial alternative-splicing library of hERα LBD, using an mRNA display technique [12] and [13] combined with a GTP-affinity selection procedure.

Here they generate about 600 total variants to begin with, by recombining protein fragments (shuffling exons around), and eventually select two GTP-binding proteins from this library:

Discussion

The most significant insight obtained from this study is that a protein enriched from a combinatorial protein library based on building blocks from a single protein domain showed binding specificity for a ligand unrelated to that of the parent domain. This indicates that appropriate combinations of building blocks obtained from a parent domain can reorganize to form an environment that can interact specifically with another ligand. Previously, Keefe and Szostak [20] obtained novel ATP-binding proteins from an artificial random-sequence library, and estimated the probability of their formation as less than 10^-11. By contrast, in this study, we demonstrated that our combinatorial protein library of about 600 members contains a protein with a specific binding activity that was not exhibited by the parent domain. This suggests that the probability of obtaining such a protein by artificial splicing could be unexpectedly high (10^-2 to 10^-3). Thus, peptide blocks derived from natural proteins may have greater functional plasticity than has been believed.

That’s two functional proteins in 600, or 1 in 300. That would make Axe wrong by roughly 75 orders of magnitude.
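The “orders of magnitude” comparisons in this section are just differences of base-10 logarithms. Here is a minimal Python sketch of that arithmetic, using the frequencies as I have read them off the papers above; the function name and the labels are mine, and the 3×10^13 entry conservatively assumes only a single hit in the whole library.

```python
import math

AXE_ONE_IN = 1e77  # the "1 in 10^77" figure discussed in this post

def orders_of_magnitude_gap(observed_one_in: float,
                            claimed_one_in: float = AXE_ONE_IN) -> float:
    """How many powers of ten separate a claimed '1 in N' from an observed '1 in M'."""
    return math.log10(claimed_one_in) - math.log10(observed_one_in)

# Frequencies as summarized in the text above ("1 in X" stored as X).
reported = {
    "ATP binding, random polymers (1 in 10^11)": 1e11,
    "ATP binding, prebiotic alphabet (1 in 3x10^13, assuming one hit)": 3e13,
    "esterase activity (1 in 1000)": 1e3,
    "DNA binding (1 in 10^6)": 1e6,
    "GTP binding, block shuffling (1 in 300)": 300,
}

for label, one_in in reported.items():
    gap = orders_of_magnitude_gap(one_in)
    print(f"{label}: {gap:.1f} orders of magnitude")
```

The figures quoted in the text are these gaps rounded to the nearest whole number.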

Type B experiments:
Can an Arbitrary Sequence Evolve Towards Acquiring a Biological Function?

Abstract
To explore the possibility that an arbitrary sequence can evolve towards acquiring functional role when fused with other pre-existing protein modules, we replaced the D2 domain of the fd-tet phage genome with the soluble random polypeptide RP3-42. The replacement yielded an fd-RP defective phage that is six-order magnitude lower infectivity than the wild-type fd-tet phage. The evolvability of RP3-42 was investigated through iterative mutation and selection. Each generation consists of a maximum of ten arbitrarily chosen clones, whereby the clone with highest infectivity was selected to be the parent clone of the generation that followed. The experimental evolution attested that, from an initial single random sequence, there will be selectable variation in a property of interest and that the property in question was able to improve over several generations. fd-7, the clone with highest infectivity at the end of the experimental evolution, showed a 240-fold increase in infectivity as compared to its origin, fd-RP. Analysis by phage ELISA using anti-M13 antibody and anti-T7 antibody revealed that about 37-fold increase in the infectivity of fd-7 was attributed to the changes in the molecular property of the single polypeptide that replaced the D2 domain of the g3p protein. This study therefore exemplifies the process of a random polypeptide generating a functional role in rejuvenating the infectivity of a defective bacteriophage when fused to some preexisting protein modules, indicating that an arbitrary sequence can evolve toward acquiring a functional role. Overall, this study could herald the conception of new perspective regarding primordial polypeptides in the field of molecular evolution…
The first generation was prepared by subjecting the random sequence in fd-RP to three consecutive rounds of error-prone PCR using primers P6 and P7. Among the variants generated, eight clones were arbitrarily chosen to be the member of the first generation. The clone with the highest infectivity, as evaluated by phage infectivity assay described below, was then selected to be the parent clone for the second generation.

They generate EIGHT random clones from a starting random sequence 144 amino acids in length. Of these 8 clones, 4 have higher infectivity than the starting protein. That’s 4 in 8 of the proteins 144 amino acids in length that have some degree of function, which comes out as one in two. That would make Axe wrong by an incredible 77 orders of magnitude.

Later on, in 2007, a similar experiment is carried out: Experimental Rugged Fitness Landscape in Protein Sequence Space

In this paper they basically repeat the experiment with a slightly shorter, 139 amino acid protein, but increase the number of generated mutants at each selection step to 1 million. The results are basically the same; they just manage to further improve the infectivity levels, from the 240-fold increase achieved in the 2002 experiment to a 1.7×10^4-fold (about seventeen thousand) increase in the 2007 experiment.

Besides these, there are multiple such studies that sample functional proteins in longer sequences (various methods are used to screen peptides between 50 and 800 amino acids in length in most of the papers I read). Basically you can go through the references given in the papers I cite here and find many more.

Type C experiments:

For smaller sequences, peptides <50 amino acids in length, there is an even more extensive literature. Going through all of it is more work than I can be bothered with, so here are a few handpicked examples:

First up, Phage display screening identifies a novel peptide to suppress ovarian cancer cells in vitro and in vivo in mouse models

In this paper, a library of 1.28×10^9 unique peptide sequences, each only 7 random amino acids in length (short enough that those 1.28×10^9 sequences constitute the entire space of possible sequences), was linked to the phage minor coat protein pIII and used to screen for the ability to bind and inhibit cancer cells in vitro AND in vivo in mice.
They write:

Phage libraries and biopanning

The Ph.D.-c7c phage display peptide library was purchased from New England Biolabs (Ipswich, MA, USA). This library contained approximately 1 × 10^13 pfu/mL phages with a diversity of 1.28 × 10^9 unique peptide sequences for up to 70 copies of each. Biopanning is an affinity selection technique for identifying peptides that bind to a given target [23]. In this study, the biopanning began with the incubation of the phage display peptide library with both normal and tumor cells, in which normal cells were used to deplete peptides that only bind to normal cells and then further incubated with tumor cells for identifying the peptides that only specifically bind to tumor cells.

My highlight in bold.
They find a peptide with the amino acid sequence SWQIGGN which successfully binds HO8910 ovarian cancer cells, yet does not bind normal healthy cells, and it also inhibits tumor growth in vitro. Technically they find others, but this is the best one from the selection protocols used.

Then they tested it in vivo:

As shown in Table 3, the volume of ascites, the number of tumors, the total number of disseminated tumor nodules and the tumor weight were all significantly smaller in the Peptide 1 treated cell group than those of the HO8910-negative control peptide and HO8910 parental cell groups (p < 0.05). These results suggested that Peptide 1 was able to inhibit the growth and dissemination of ovarian cancer cells in this mouse transplantation model.

Here’s another one using the same kit, which identifies a completely different peptide (NPMIRRQ), also able to bind HO8910 ovarian cancer cells: Identification of a peptide specifically targeting ovarian cancer by the screening of a phage display peptide library.

Basically the same procedure and numbers, indicating the above-mentioned result is not some sort of statistical fluke. So this particular function is known to exist in at least 2 of the 1.28×10^9 possible sequences, or roughly 1 in 6×10^8.

Next one, mRNA display selection of a high-affinity, Bcl-XL-specific binding peptide.

Abstract
Bcl-X(L), an antiapoptotic member of the Bcl-2 family, is a mitochondrial protein that inhibits activation of Bax and Bak, which commit the cell to apoptosis, and it therefore represents a potential target for drug discovery. Peptides have potential as therapeutic molecules because they can be designed to engage a larger portion of the target protein with higher specificity. In the present study, we selected 16-mer peptides that interact with Bcl-X(L) from random and degenerate peptide libraries using mRNA display. The selected peptides have sequence similarity with the Bcl-2 family BH3 domains, and one of them has higher affinity (IC(50)=0.9 microM) than Bak BH3 (IC(50)=11.8 microM) for Bcl-X(L) in vitro. We also found that GFP fusions of the selected peptides specifically interact with Bcl-X(L), localize in mitochondria, and induce cell death. Further, a chimeric molecule, in which the BH3 domain of Bak protein was replaced with a selected peptide, retained the ability to bind specifically to Bcl-X(L). These results demonstrate that this selected peptide specifically antagonizes the function of Bcl-X(L) and overcomes the effects of Bcl-X(L) in intact cells. We suggest that mRNA display is a powerful technique to identify peptide inhibitors with high affinity and specificity for disease-related proteins.

The paper is a bit unclear on the exact numbers here, but they do manage to find 19 functional peptides that bind Bcl-X(L) and induce apoptosis, in a library of at least 10^12 sequences (the total space for 16 amino acid peptides is roughly 6.6×10^20).

Next up is: Identification of Calmodulin Isoform-specific Binding Peptides from a Phage-displayed Random 22-mer Peptide Library

In this one they manage to isolate 80 unique, 22 amino acid-long peptides from a pool of about 4.8×10^8. That comes out at 1 in 6 million peptides having the particular function searched for, sampled from a sequence space with a total size of about 4×10^28.
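
As a rough check on the figures quoted for the peptide studies above, the short Python sketch below recomputes the size of each sequence space, the fraction of it that each library covered, and the resulting hit frequency. It assumes 20 possible amino acids at every position. The names STUDIES, sequence_space and summarize are made up for illustration, and the inputs are simply the numbers mentioned above (two ovarian-cancer-binding 7-mers across the two phage display studies, 19 Bcl-X(L) binders in a library taken as 10^12, and 80 calmodulin binders in 4.8×10^8).

```python
AMINO_ACIDS = 20  # assumption: the 20 standard amino acids at every position

def sequence_space(length):
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length

# (label, peptide length, sequences screened, functional peptides reported)
STUDIES = [
    ("ovarian cancer 7-mers", 7, 1.28e9, 2),
    ("Bcl-X(L) 16-mers", 16, 1e12, 19),
    ("calmodulin 22-mers", 22, 4.8e8, 80),
]

def summarize(length, screened, hits):
    """Return (space size, fraction of space screened, sequences screened per hit)."""
    space = sequence_space(length)
    return space, screened / space, screened / hits

for label, length, screened, hits in STUDIES:
    space, fraction, per_hit = summarize(length, screened, hits)
    print(f"{label}: space {space:.2e}, screened fraction {fraction:.1e}, "
          f"about 1 hit per {per_hit:.1e} sequences")
```

The 7-mer result lands in the same order of magnitude as the figure quoted for the ovarian cancer screens. The 16-mer and 22-mer libraries cover only around 10^-9 and 10^-20 of their respective spaces, which is why the hit frequency within the sample, and not the absolute number of hits, is the quantity that matters.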

Much more can be said about the possibility of evolving larger proteins from smaller fragments (and smaller fragments from dimers and trimers) but that is beyond the scope of what I wanted to achieve in this post. Any criticism of my case here is welcome.

152 thoughts on “Axe, EN&W and protein sequence space (again, again, again)”

Rumraket on October 12, 2016 at 1:48 pm said:

Is there really no way to edit the post after it’s been posted? I found several annoying typos and formatting issues after I pressed publish.
Patrick on October 12, 2016 at 4:19 pm said:

Rumraket:
Is there really no way to edit the post after it’s been posted? I found several annoying typos and formatting issues after I pressed publish.

If you send me the updates I’ll incorporate them. I’ll also look into how to allow authors to do that themselves.
REW on October 12, 2016 at 6:49 pm said:

A point you allude to is worth highlighting and stressing: a fold is not required for function.

Axe implies that folds are necessary for function and all his calculations are based on evolving the big, complex folds found in existing proteins in a single step. But if folds are not necessary for function his argument falls apart.
OMagain on October 12, 2016 at 7:06 pm said:

Genuine truth seekers can sometimes find truth. Everyone else finds exactly what they are looking for.
OMagain on October 12, 2016 at 7:07 pm said:

Patrick: I’ll also look into how to allow authors to do that themselves.

IIRC that was turned off as some authors abused the privilege, deleting OPs wholesale.
REW on October 12, 2016 at 7:20 pm said:

Thanks for the wonderful article. This should go in the archives of ‘best anti-ID arguments’

I’ve often thought of writing something up on the plausibility of protein evolution but it’s not my field and tracking down all the disparate refs I’ve come across over the years would be a pain.

One topic that I think might convince a reasonable non-scientist not to get taken in by Axe’s claim that proteins are impossibly unlikely is catalytic nanoparticles. This is how I would approach it:

Computers have many complex functions. If you look inside a computer you see that it’s extremely complex at a microscopic level. It’s reasonable to think this complexity is related to its ability to perform complex functions, and so it is.
But what if you found out that you could carve a piece of wood to look superficially like a computer and it would perform most of the functions of a computer, albeit poorly? You’d have to conclude that the complexity inside isn’t directly related to its ability to perform those marvelous functions; perhaps it just improves efficiency.
Catalytic nanoparticles are inert spheres roughly the size of a protein that are coated with simple chemical groups. And they can perform many of the same reactions performed by big complex enzymes, at lower efficiency. This should make it obvious that the combinatorial complexity of any particular protein isn’t necessary for it to perform its function.
Mung on October 13, 2016 at 12:19 am said:

Dude, could you at least write a summary?
Mung on October 13, 2016 at 12:25 am said:

Aren’t we getting to the point now that we can put claims, such as those made by Rumraket in his OP, to the test? You know, actual experimental science.

Thoughts?
hrun0815 on October 13, 2016 at 1:47 am said:

Mung,

I guess you stopped reading before you saw the “actual experimental science” that put Axe’s claims to the test.
Mung on October 13, 2016 at 1:53 am said:

hrun0815: I guess you stopped reading before you saw the “actual experimental science” that put Axe’s claims to the test.

Pay attention. I said nothing about putting the claims of Axe to the test. If Rumraket is not putting forth any testable claims, so be it.
Mung on October 13, 2016 at 1:58 am said:

Create a computer program that randomly generates a DNA/RNA sequence which just might code for an existing protein of 150 AA. Not too hard, right?
hrun0815 on October 13, 2016 at 2:01 am said:

Mung,

Rumraket’s claim is that Axe’s number is wrong and contradicted by experiment. What experiments, other than the ones that show Axe’s number to be wrong, would you like to see?
Mung on October 13, 2016 at 2:24 am said:

hrun0815: Rumraket’s claim is that Axe’s number is wrong and contradicted by experiment.

How can we test Rumraket’s claim? Can we generate all the possible 150AA sequences with a computer and compare them to all known functional proteins of length 150AA?
Mung on October 13, 2016 at 2:32 am said:

Q1: How long would it take a modern computer to generate all possible DNA/RNA sequences which could possibly code for a 150 amino acid polypeptide?

Lots of computer geniuses here at TSZ.
Mung on October 13, 2016 at 3:11 am said:

Q2: How long would it take all modern computers to generate all possible DNA/RNA sequences which could possibly code for a 150 amino acid polypeptide?

Lots of computer geniuses here at TSZ.
Rumraket on October 13, 2016 at 7:18 am said:

Mung, are you saying we have to test all possible sequences for all possible functions to have experimental evidence that contradicts Axe’s claim? Because if you are, you are wrong.

Statistically speaking the results obtained by experiment would be unbelievably unlikely if Axe’s number was correct. This is falsification by experiment.

The numbers used here are of course much much greater (and the uncertainties are colossal, I’m not saying we have found the true frequency of function in protein sequence space with great certainty), but an analogy would be polling around political elections. You don’t have to know what every person will vote to have good evidence about the relative support of candidates.

And more analogously to what I’m saying, we don’t have to poll everyone to have very strong evidence against someone saying “Trump will win with 99.99% of the vote”.
Rumraket on October 13, 2016 at 8:04 am said:

Mung: Q1: How long would it take a modern computer to generate all possible DNA/RNA sequences which could possibly code for a 150 amino acid polypeptide?

What do you mean “generate”? You couldn’t store them all in memory, but even if you ignored that, just going through all of them would take more than the age of the universe if you generated 10^20 every second (you’d need approximately 1.4×10^175 seconds).
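
For anyone who wants to check that figure, here is a minimal Python sketch of the arithmetic. It counts amino acid sequences (20 options at each of 150 positions) and uses the 10^20 per second rate from above. The age of the universe in seconds and the count of 61 sense codons for the DNA version of the question are my own added assumptions, not something from the thread.

```python
from math import log10

AMINO_ACIDS = 20
LENGTH = 150
RATE_PER_SECOND = 10**20            # generation rate assumed in the comment above
AGE_OF_UNIVERSE_SECONDS = 4.35e17   # assumption: roughly 13.8 billion years

# Exact count of 150-residue amino acid sequences (Python integers are unbounded).
total_sequences = AMINO_ACIDS ** LENGTH
seconds_needed = total_sequences // RATE_PER_SECOND
universe_ages = seconds_needed / AGE_OF_UNIVERSE_SECONDS

# Counting coding DNA instead: 61 sense codons per position (standard genetic code).
coding_dna_sequences = 61 ** LENGTH

print(f"amino acid sequences: about 10^{log10(total_sequences):.1f}")
print(f"seconds at 10^20 per second: about 10^{log10(seconds_needed):.1f}")
print(f"ages of the universe: about 10^{log10(universe_ages):.1f}")
print(f"coding DNA sequences: about 10^{log10(coding_dna_sequences):.1f}")
```

Counting DNA sequences rather than amino acid sequences only makes the total larger, so the conclusion does not change.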

Computers are no help here. The best we can do is experiment, and that will still not get us close to the “true number”, for reasons such as function being context specific*. But by randomly sampling big libraries of randomly generated sequences, we can amass evidence for or against claims about the true numbers.

You can make statistical statements about it. For example:
Assuming function is spread equally over all of sequence space (and to be sure it probably isn’t), if the true frequency of function is 1 in 10^77, what is the probability that by random sampling you find it to be 1 in 10^11? That’s going to be unbelievably unlikely.

Assuming it is still 1 in 10^77, but NOT spread equally over all of sequence space, and is instead spread out in clusters of higher and lower density (which is probably more correct), what is the probability that by random sampling you happen upon an area where the relative density is 1 in 10^11? That’s also going to be unbelievably unlikely.

In other words, the results obtained by the above detailed experiments are unbelievably unlikely if Axe’s number is correct. So as Richard Feynman would say, “if it disagrees with experiment, it’s wrong”. That’s all there is to it.
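
To put a rough number on “unbelievably unlikely” for the first, evenly spread case, here is a small Python sketch. It treats every sampled sequence as an independent draw with the same chance of being functional, and it assumes a library of 10^12 sequences purely for illustration. The function name prob_at_least_one_hit is made up for this example.

```python
from math import expm1, log1p

def prob_at_least_one_hit(frequency, samples):
    """P(at least one functional sequence among `samples` independent random draws).

    Computes 1 - (1 - frequency)**samples via log1p/expm1, because
    1 - 1e-77 rounds to exactly 1.0 in ordinary floating point.
    """
    return -expm1(samples * log1p(-frequency))

LIBRARY_SIZE = 1e12  # assumption: illustrative library size

p_if_axe = prob_at_least_one_hit(1e-77, LIBRARY_SIZE)       # frequency of 1 in 10^77
p_if_observed = prob_at_least_one_hit(1e-11, LIBRARY_SIZE)  # frequency of 1 in 10^11

print(f"chance of even one hit at 1 in 10^77: {p_if_axe:.1e}")
print(f"chance of at least one hit at 1 in 10^11: {p_if_observed:.5f}")
```

Under these assumptions, finding even a single functional sequence in such a library has a probability of about 10^-65 if the frequency is 1 in 10^77, while at 1 in 10^11 it is close to certain. The clustered case needs more assumptions about how function is distributed and is not modeled here.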

* What might be a critical metabolic enzyme in an E. coli in your gut will probably be a misfolding and catalytically inactive piece of junk for a hyperthermophilic bacterium living at 90 degrees C in a hydrothermal vent, meaning the same protein will be functional under one circumstance and nonfunctional in another. All function is contextually determined.
Rumraket on October 13, 2016 at 8:07 am said:

Mung: If Rumraket is not putting forth any testable claims, so be it.

I did put forward testable claims, and they were tested (though prior to my making them; I had read the tests, which is why I disbelieved Axe’s number and made the claim in the first place). Specifically, I said:
“I will give evidence that it is wrong by at least 60 orders of magnitude.”

Which I then did. The evidence I brought is why I make my claim in the first place.
Rumraket on October 13, 2016 at 8:08 am said:

Mung: Create a computer program that randomly generates a DNA/RNA sequence which just might code for an existing protein of 150 AA. Not too hard, right?

What would that tell us?
Rumraket on October 13, 2016 at 1:58 pm said:

Mung: Can we generate all the possible 150AA sequences with a computer and compare them to all known functional proteins of length 150AA?

Would that tell us that there are no other functions out there in sequence space?

Why would we even need to generate them on a computer, what would having lots of sequences stored on a computer tell us?
vjtorley on October 13, 2016 at 4:20 pm said:

Hi Rumraket,

Congratulations on your meaty, substantive OP. For my part, I think you’ve made quite a strong case against Dr. Axe’s figure of 1 in 10^77.

I should warn you, however, that ID proponents are familiar with the work of Keefe and Szostak. Dr. Cornelius Hunter (who is a graduate of the University of Illinois where he earned a Ph.D. in Biophysics and Computational Biology) sent me a message a couple of years ago, in which he discussed their experiment. I’d like to quote a brief excerpt:

“First, Keefe and Szostak is not relevant as they were not seeking functional proteins, but merely mild ATP binding. Second, Taylor, et. al. deals with a simple, helix only, protein (homodimeric AroQ), biased the sequence toward helix forming amino acids and sequence patterns, did not fully randomize the sequence but only randomized regions, and is vague about how they arrive at their 10^24 tries required. Even if their calculation of 10^24 is reasonable, you’re dealing with a pretty simple protein… AroQ is toward the simple end of the spectrum… And finally there are several studies on slightly more complex, challenging proteins, all of which come in at around 10^60 – 10^80 attempts required.”

He also wrote:

“Proteins are by no means created equal. They occupy a wide spectrum of size and complexity… Nor is there reason to think that evolution could live with the shorter, simpler ones at first, and then later somehow the larger, more complex ones would evolve. The larger ones appear to be needed, and there are not obvious gradual pathways to forming them… We’re still not close to the more complex proteins.”

That’s the ID response. Is there anything you’d like to say in reply?

I take it you’ve read this post of Axe’s: http://www.biologicinstitute.org/post/19310918874/correcting-four-misconceptions-about-my-2004

By the way, has anyone seen this book: https://www.amazon.com/dp/9814436364 ? It looks interesting.

Thanks once again for writing this OP.
petrushka on October 13, 2016 at 4:22 pm said:

vjtorley: The larger ones appear to be needed, and there are not obvious gradual pathways to forming them…

Aside from being factually untrue, is there any other point?
REW on October 13, 2016 at 5:18 pm said:

vjtorley: “First, Keefe and Szostak is not relevant as they were not seeking functional proteins, but merely mild ATP binding.

He also wrote:

“Proteins are by no means created equal. They occupy a wide spectrum of size and complexity… Nor is there reason to think that evolution could live with the shorter, simpler ones at first, and then later somehow the larger, more complex ones would evolve. The larger ones appear to be needed, and there are not obvious gradual pathways to forming them… We’re still not close to the more complex proteins.”

That’s the ID response. Is there anything you’d like to say in reply?

For most proteins specific binding is all that’s required for function. Most of the 3D complexity of the cell is a result of specific interactions between proteins. When more is required for function, such as catalysis, that usually involves tweaking the binding such that the substrate is bent in a way that facilitates the making or breaking of bonds. But the authors actually show that just selecting for binding can also give catalysis. In addition, the Bartel lab has shown with RNA enzymes that once one selects for binding it’s a short step to catalysis.

In many cases there are obvious pathways to the more complex proteins since the large complex proteins are clearly built up from modules. But more to the point, billions of years ago when small inefficient proteins were the only game in town there’s no reason to think they would not have sufficed for early life and quasi-life.

( feel free to ignore this post )
colewd on October 13, 2016 at 6:27 pm said:

REW,

In many cases there are obvious pathways to the more complex proteins since the large complex proteins are clearly built up from modules. But more to the point, billions of years ago when small inefficient proteins were the only game in town there’s no reason to think they would not have sufficed for early life and quasi-life.

This is an interesting point. Are there any known knockout experiments that can support this claim? How much can you mutate a ribosome or ATP synthase or critical enzymes in bacteria and still maintain life?
Mung on October 13, 2016 at 6:42 pm said:

vjtorley: By the way, has anyone seen this book: https://www.amazon.com/dp/9814436364 ? It looks interesting.

Yes, I have it.

Here’s another that I do not have.

Myths and Verities in Protein Folding Theories

This book tells the story of a whole field of research which went disastrously wrong in many different directions when the wrong tools were used and the wrong targets aimed at, to solve a problem.

It all started from the misinterpretation of Schellman’s experiments which were carried out in the mid-1950s, and which were later encapsulated by the so-called “Hydrogen-bond inventory argument.” As a result, hydrogen bonding was ignored in protein folding; along with that a whole repertoire of other possible hydrophilic effects were not even considered. Against this background the hydrophobic effects, as suggested by Kauzmann, became the single most important effects in protein folding.

In addition, as a result of the misinterpretation of Anfinsen’s hypothesis, as well as misunderstanding of the Second Law, people made futile attempts to search for the structure of proteins in the global minimum of either the Energy, or the Gibbs Energy Landscape.
Rumraket on October 13, 2016 at 7:14 pm said:

vjtorley: Hi Rumraket,

Congratulations on your meaty, substantive OP. For my part, I think you’ve made quite a strong case against Dr. Axe’s figure of 1 in 10^77.

Thank you.

Dr. Cornelius Hunter (who is a graduate of the University of Illinois where he earned a Ph.D. in Biophysics and Computational Biology) sent me a message a couple of years ago, in which he discussed their experiment. I’d like to quote a brief excerpt:

“First, Keefe and Szostak is not relevant as they were not seeking functional proteins, but merely mild ATP binding.

Why is ATP binding not a function? How is this even a coherent response by Hunter?

Torley, as someone who can clearly think for yourself, what goes through your head when you read Hunter there? Why is a protein that binds ATP not a functional protein?

Second, Taylor, et. al. deals with a simple, helix only, protein (homodimeric AroQ), biased the sequence toward helix forming amino acids and sequence patterns, did not fully randomize the sequence but only randomized regions, and is vague about how they arrive at their 10^24 tries required. Even if their calculation of 10^24 is reasonable, you’re dealing with a pretty simple protein… AroQ is toward the simple end of the spectrum

I don’t know this particular reference but I didn’t use it myself so it’s not relevant.

And finally there are several studies on slightly more complex, challenging proteins, all of which come in at around 10^60 – 10^80 attempts required.”

No, there aren’t. When I read this it looks to me like Hunter is making shit up here. Where does he get these numbers from? I scoured the literature while writing the post above and I have not found a single reference that comes up with these values. What the hell is he talking about? There are no studies that demonstrate it requires 10^60 to 10^80 attempts to find something functional. Anywhere.

He also wrote:

“Proteins are by no means created equal. They occupy a wide spectrum of size and complexity… Nor is there reason to think that evolution could live with the shorter, simpler ones at first, and then later somehow the larger, more complex ones would evolve.

Except I document that above. So Hunter is another one of those people whose ideas disagree with experiment.

The larger ones appear to be needed, and there are not obvious gradual pathways to forming them… We’re still not close to the more complex proteins.”

He apparently just says this, contrary as it is to the amassed evidence. That just gives me reason to put Hunter in there with Axe among people who say things that are contradicted by evidence and experiment.

That’s the ID response. Is there anything you’d like to say in reply?

Hunter’s words don’t dictate reality. I researched these matters myself and found out he was wrong.

I take it you’ve read this post of Axe’s: http://www.biologicinstitute.org/post/19310918874/correcting-four-misconceptions-about-my-2004

Thank you for bringing that to my attention. There is nothing in there that constitutes a response to the points I bring up here, namely whether his mutated proteins can do something else, such as work on related substrates. He never tested for that, yet it is known that beta-lactamases are functional on several related substrates.
Until Axe goes back into a laboratory and starts doing experimental work and actually tests his proteins for functions besides the single one he probed to begin with, he can say almost nothing about the relative frequency of functional proteins in sequence space.

He tries to dismiss it with an analogy to whether meaningful sentences are “mutationally isolated”, but that’s pure bluster. I don’t buy the semantics analogy as an accurate reflection of functional amino acid sequence space. Sentences don’t work like physics does.

By the way, has anyone seen this book: It looks interesting.

How proteins manage to fold is an interesting (and very important) question in its own right, but it doesn’t seem to have any direct relevance to the subject here.
John Harshman on October 13, 2016 at 8:16 pm said:

vjtorley: Dr. Cornelius Hunter (who is a graduate of the University of Illinois where he earned a Ph.D. in Biophysics and Computational Biology) sent me a message a couple of years ago, in which he discussed their experiment.

In the future, if anyone addresses or refers to me, I demand to be called “Dr. John Harshman (who is a graduate of the University of Chicago where he earned a Ph.D. in Evolutionary Biology)”. I realize that creationists use such language only for people whose claims they approve of, so this is unlikely. But one can dream.

The argument from authority breaks down quickly. Hunter, sorry, Dr. Cornelius Hunter, Ph.D., doesn’t bother to cite any support for most of his claims, vaguely waving his hands in the direction of “several studies”, or sometimes empty space. Not good.
Mung on October 13, 2016 at 9:39 pm said:

If someone is waving their hands in empty space, that space is obviously not empty, Dr. John.
John Harshman on October 13, 2016 at 10:04 pm said:

Mung:
If someone is waving their hands in empty space, that space is obviously not empty, Dr. John.

Closer, but not close enough: Dr. John Harshman (who is a graduate of the University of Chicago where he earned a Ph.D. in Evolutionary Biology).

And I didn’t say he was waving his hands in empty space. Perhaps you could read more carefully.
Rumraket on October 13, 2016 at 10:04 pm said:

Mung:
If someone is waving their hands in empty space, that space is obviously not empty, Dr. John.

You’re a funny guy Mung, that’s why I’m going to kill you last.
vjtorley on October 13, 2016 at 11:53 pm said:

Hi Rumraket,

Thank you for your latest response. I think you’ve managed to effectively counter Dr. Cornelius Hunter’s points, so the ball is definitely back in Dr. Axe’s court. It will be interesting to see if he responds to your post. Cheers.
DNA_Jock on October 14, 2016 at 12:21 am said:

vjtorley (quoting Dr. Hunter): And finally there are several studies on slightly more complex, challenging proteins, all of which come in at around 10^60 – 10^80 attempts required.

Wow! If someone had done such an experiment, I think we would have noticed:

Let’s be generous, and suppose that Hunter’s “slightly more complex” protein is actually much smaller, say only 650 Daltons (that’s the size of a single DNA base pair).
A library of 10^60 such oligopeptides would weigh
650 x 1.67 x 10^-24 grams x 10^60.
That’s 10^36 kg, or the weight of 500,000 suns.

I think I can safely say that nobody has done that experiment.
Sampling from a library of 10^12 – 10^14 peptides is the only practicable approach, unless the Biologic Institute has a lot more resources than they are letting on about…
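
For anyone who wants to redo the sum, a quick Python sketch follows. The 650 Dalton size and the 1.67 x 10^-24 grams per Dalton are the values used above. The solar mass figure and the 10^13 “realistic” library size are my own added assumptions for comparison.

```python
DALTON_IN_GRAMS = 1.67e-24     # mass of one Dalton, as used above
PEPTIDE_MASS_DALTONS = 650     # the deliberately generous size chosen above
SOLAR_MASS_KG = 1.989e30       # assumption: standard approximate value

def library_mass_grams(library_size):
    """Mass of a library holding one molecule of each of `library_size` peptides."""
    return PEPTIDE_MASS_DALTONS * DALTON_IN_GRAMS * library_size

hunter_library_kg = library_mass_grams(1e60) / 1000
solar_masses = hunter_library_kg / SOLAR_MASS_KG
realistic_library_grams = library_mass_grams(1e13)  # assumption: mid-range practicable library

print(f"10^60 library: {hunter_library_kg:.2e} kg, about {solar_masses:.0f} solar masses")
print(f"10^13 library: {realistic_library_grams:.2e} grams")
```

At one molecule per sequence, the practicable library comes to roughly ten nanograms, which is the kind of quantity that fits in a test tube.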
P.S. They call me Doctor Jock.
</Pumba>
otangelo on October 14, 2016 at 3:24 am said:

Rumraket

what about you tell your audience here how the first proteome set required to make the first living cell came to be ?

Mikkel wrote:

“Axe only tested for one function, not all possible functions. There, done. At a basic level, this would be enough.”

On pre dna -replication level, there would be no function for ANY protein, unless a minimal set of proteins were present, correctly interconnected through the metabolic network, and having the specific energy supply granted at the right place.

See, it’s utterly irrelevant to muse about protein evolution probabilities, if you cannot make a reasonable case of how the first gene and protein set came to be in the first place.
And when you dig deeper to understand the enormous, unfathomable task, you get the picture about how irrational naturalistic, unguided, non-intelligent proposals are.
There is NO WAY the first interdependent proteome set to make life possible to come about by chance or physical necessity. NO WAY !!

But atheists/proponents of naturalism are desperate and try to resurrect their dead patient by all means, and at all costs……
phoodoo on October 14, 2016 at 5:49 am said:

John Harshman,

Yea, John, the same guy who thinks that the Nilsson-Pelger fake computer model fairytale about eye evolution is science. THAT John.

I will just call you John. But if you start performing some open heart surgeries, let me know.
Rumraket on October 14, 2016 at 7:18 am said:

otangelo:
Rumraket

what about you tell your audience here how the first proteome set required to make the first living cell came to be ?

I wasn’t aware that there was a “first proteome set required to make the first living cell” and I’m not required to know or tell about such a thing in this thread because it’s not the subject matter or something I’ve claimed to know.

By the way, you really should get over this obsession you have with me.
OMagain on October 14, 2016 at 8:31 am said:

otangelo: what about you tell your audience here how the first proteome set required to make the first living cell came to be ?

I can do that!

However you’ll have to do your bit by detailing what the first living cell contained.

That’s the deal I’m afraid, as otherwise I may provide the wrong explanation.
OMagain on October 14, 2016 at 8:32 am said:

Rumraket: By the way, you really should get over this obsession you have with me.

I’ll take this one. I have all the explanations available and am willing to provide them, as long as the person asking for one starts the ball rolling by explaining in detail what they want explained.

I won’t be busy.
OMagain on October 14, 2016 at 8:34 am said:

otangelo: There is NO WAY the first interdependent proteome set to make life possible to come about by chance or physical necessity. NO WAY !!

My god is more powerful than your god. My god can make a universe where life is an emergent property, whereas yours has to constantly poke and twist. My god can take a well-earned break after making a universe, your god has to nursemaid it for its entire existence.

Your god is weak.
Rumraket on October 14, 2016 at 11:00 am said:

otangelo: There is NO WAY the first interdependent proteome set to make life possible to come about by chance or physical necessity. NO WAY !!

If only you could write NO WAY in even BIGGER letters, maybe that would make it true.
otangelo on October 14, 2016 at 11:27 am said:

Rumraket wrote:

“I wasn’t aware that there was a “first proteome set required to make the first living cell”

at least now you are. And of course you were aware, since you read my posts.

“and I’m not required to know or tell about such a thing in this thread because it’s not the subject matter or something I’ve claimed to know.”

I know your usual attempt of escape…………..nothing new under the sun.

” By the way, you really should get over this obsession you have with me.”

Calm down. It’s not sexual…..
Hum… not having fun, actually feeling frustrated, seeing your views not only being challenged, but being exposed over and over for what they are ? Irrational nonsense ?? kk…
Rumraket on October 14, 2016 at 11:37 am said:

otangelo: at least now you are. And of course you were aware, since you read my posts.

Right.

I know your usual attempt of escape…………..nothing new under the sun.

Hum… not having fun, actually feeling frustrated, seeing your views not only being challenged, but being exposed over and over for what they are ? Irrational nonsense ?? kk…

You got me.

Welcome to ignore.
otangelo on October 14, 2016 at 11:38 am said:

OMagain: I can do that!

However you’ll have to do your bit by detailing what the first living cell contained.

That’s the deal I’m afraid, as otherwise I may provide the wrong explanation.

good question. I was waiting for it.

What Might Be a Protocell’s Minimal requirement of parts ?

http://reasonandscience.heavenforum.org/t2110-what-might-be-a-protocells-minimal-requirement-of-parts#3797

Proteins are essential building blocks of living cells; indeed, life can be viewed as resulting substantially from the chemical activity of proteins. Because of their importance, it is hardly surprising that ancestors for most proteins observed today were already present at the time of the ‘last common ancestor’, a primordial organism from which all life on Earth is descended. How did the first proteins arise? How can we bring a taxonomic order to the diversity of forms that evolved from them? These two questions are at the center of our scientific efforts, on which we bring to bear methods in bioinformatics, protein biochemistry and structural biology.

Based on the conjoint analysis of several computational and experimental strategies designed to define the minimal set of protein-coding genes that are necessary to maintain a functional bacterial cell, we propose a minimal gene set composed of 206 genes. Such a gene set will be able to sustain the main vital functions of a hypothetical simplest bacterial cell with the following features.

(i) A virtually complete DNA replication machinery, composed of one nucleoid DNA binding protein, SSB, DNA helicase, primase, gyrase, polymerase III, and ligase. No initiation and recruiting proteins seem to be essential, and the DNA gyrase is the only topoisomerase included, which should perform both replication and chromosome segregation functions.

(ii) A very rudimentary system for DNA repair, including only one endonuclease, one exonuclease, and a uracil-DNA glycosylase.

(iii) A virtually complete transcriptional machinery, including the three subunits of the RNA polymerase, a σ factor, an RNA helicase, and four transcriptional factors (with elongation, antitermination, and transcription-translation coupling functions). Regulation of transcription does not appear to be essential in bacteria with reduced genomes, and therefore the minimal gene set does not contain any transcriptional regulators.

(iv) A nearly complete translational system. It contains the 20 aminoacyl-tRNA synthases, a methionyl-tRNA formyltransferase, five enzymes involved in tRNA maturation and modification, 50 ribosomal proteins (31 proteins for the large ribosomal subunit and 19 proteins for the small one), six proteins necessary for ribosome function and maturation (four of which are GTP binding proteins whose specific function is not well known), 12 translation factors, and 2 RNases involved in RNA degradation.

(v) Protein-processing, -folding, secretion, and degradation functions are performed by at least three proteins for posttranslational modification, two molecular chaperone systems (GroEL/S and DnaK/DnaJ/GrpE), six components of the translocase machinery (including the signal recognition particle, its receptor, the three essential components of the translocase channel, and a signal peptidase), one endopeptidase, and two proteases.

(vi) Cell division can be driven by FtsZ only, considering that, in a protected environment, the cell wall might not be necessary for cellular structure.

(vii) A basic substrate transport machinery cannot be clearly defined, based on our current knowledge. Although it appears that several cation and ABC transporters are always present in all analyzed bacteria, we have included in the minimal set only a PTS for glucose transport and a phosphate transporter. Further analysis should be performed to define a more complete set of transporters.

(viii) The energetic metabolism is based on ATP synthesis by glycolytic substrate-level phosphorylation.

(ix) The nonoxidative branch of the pentose pathway contains three enzymes (ribulose-phosphate epimerase, ribose-phosphate isomerase, and transketolase), allowing the synthesis of pentoses (PRPP) from trioses or hexoses.

(x) No biosynthetic pathways for amino acids, since we suppose that they can be provided by the environment.

(xi) Lipid biosynthesis is reduced to the biosynthesis of phosphatidylethanolamine from the glycolytic intermediate dihydroxyacetone phosphate and activated fatty acids provided by the environment.

(xii) Nucleotide biosynthesis proceeds through the salvage pathways, from PRPP and the free bases adenine, guanine, and uracil, which are obtained from the environment.

(xiii) Most cofactor precursors (i.e., vitamins) are provided by the environment. Our proposed minimal cell performs only the steps for the syntheses of the strictly necessary coenzymes tetrahydrofolate, NAD, flavin adenine dinucleotide, thiamine diphosphate, pyridoxal phosphate, and CoA.
Mung on October 14, 2016 at 2:58 pm said:

OMagain: My god is more powerful than your god. My god can make a universe where life is an emergent property, whereas yours has to constantly poke and twist. My god can take a well earned break after making a universe, your god has to nursemaid it for its entire existence.

Oh goody. You want to talk theology. 🙂

Your God is like a man who doesn’t care about the child he created and abandons it.
Mung on October 14, 2016 at 3:02 pm said:

Rumraket: …and I’m not required to know or tell about such a thing in this thread because it’s not the subject matter or something I’ve claimed to know.

I agree with Rumraket. otangelo’s post is an attempt to derail the thread and under the latest site rule ought to be sent to Guano.

Or do we have to wait and see if it actually derails the thread? I’m still not clear on that part of the new rule.
phoodoo on October 14, 2016 at 3:21 pm said:

Mung,

Are you attempting to derail this thread?

Or wait, has this thread not already been derailed by Rumraket’s discussion about what this thread is about? Because this thread is most certainly not about discussing what this thread is or isn’t about.

On a sidenote on this thread Rumraket has derailed: It seems Otangelo has given Rumraket many substantive replies, so isn’t it a bit undignified for Rumraket to say he is going to ignore Otangelo? Rumraket after all has made several insulting posts towards Otangelo, so for Rumraket to now say he is going to block Otangelo, well, it’s just not very spiriting. It seems an admission of defeat really.
Rumraket on October 14, 2016 at 5:47 pm said:

The minimum number of required genes for a free-living bacterial cell that is responsible for powering its own cell cycle (growth and division) and biosynthesizing its own components is completely irrelevant to the subject of this thread. If it’s something Otangelo or Phoodoo wishes to discuss, they can start a new thread.
otangelo on October 14, 2016 at 6:19 pm said:

Rumraket:
The minimum number of required genes for a free-living bacterial cell that is responsible for powering its own cell cycle (growth and division) and biosynthesizing its own components is completely irrelevant to the subject of this thread. If it’s something Otangelo or Phoodoo wishes to discuss, they can start a new thread.

Mikkel

Yes, you wish it would be irrelevant. But it’s actually highly relevant. Musings about probability of the evolution of proteins in order to infer that it could happen naturally, is irrelevant, if you cannot explain in a meaningful manner how the proteins to start life emerged.

Btw. I am new here. I didn’t know anyone can start a new topic.
Mung on October 14, 2016 at 7:53 pm said:

phoodoo: Are you attempting to derail this thread?

Would I do that? Really?
Mung on October 14, 2016 at 7:56 pm said:

otangelo: Btw. I am new here. I didn’t know anyone can start a new topic.

At the top of the page select New -> Post.

Then post something in the Moderation Issues thread asking an admin to release it.

ETA: this was not an attempt to derail the thread, but you can never tell where these things are going to go.
Rumraket on October 14, 2016 at 9:42 pm said:

otangelo: Musings about probability of the evolution of proteins in order to infer that it could happen naturally, is irrelevant, if you cannot explain in a meaningful manner how the proteins to start life emerged.

*Sigh*

I’m sorry but it is completely irrelevant how life started to the question of whether new protein functions can evolve. God could have created first life and let the rest evolve. As usual your logic is completely messed up.