Axe, EN&W and protein sequence space (again, again, again)

So I wanted to write a post on the whole “density of functional proteins in amino acid sequence space” issue because, it seems to me, most ID proponents have some completely unwarranted beliefs about it. Recently the main champion of the idea that evolving new protein functions would be so unlikely as to be practically impossible, even given several billion years, planet-spanning population sizes and natural selection, seems to be Douglas Axe. He published a paper in an (actual) peer-reviewed journal back in 2004, and ID creationists have been spinning it ever since.

In 2007 Arthur Hunt published a review of Axe 2004 on Panda’s Thumb: Axe (2004) and the evolution of enzyme function. It recaps what Axe actually did and what he concluded from it. I will add a bit about how the paper has been pushed in ID circles since (see, for example, how it is pushed here: Imagine: 60 Million Proteins in One Cell Working Together).

This paper is interesting because it relates to the work of Douglas Axe that resulted in a paper in the Journal of Molecular Biology in 2004. Axe answered questions about this paper earlier this year, and also mentioned it in his recent book Undeniable (p. 54). In the paper, Axe estimated the prevalence of sequences that could fold into a functional shape by random combinations. It was already known that the functional space was a small fraction of sequence space, but Axe put a number on it based on his experience with random changes to an enzyme. He estimated that one in 10^74 sequences of 150 amino acids could fold and thereby perform some function — any function.

Notice the last part of that passage: “any function.” This is apparently what most ID creationists believe, because this is what the higher-ups in the ID movement claim Axe’s research shows. It does not, as I will attempt to demonstrate pretty conclusively.
To still believe that Axe’s number* is an estimate (never mind a correct estimate) of the number of “sequences of 150 amino acids [that] could fold and thereby perform some function — any function” will be a form of blind faith after you’ve read this post through. Provided you understand it, which isn’t too much to ask, because it’s really not that complicated.

*(The number seems to fluctuate a bit in the various writings, between 10^64 and 10^90; I’ll just go by 10^77, as it was used back in 2004.)

Back to Arthur Hunt on Panda’s Thumb:

Axe (2004) has performed site directed mutagenesis experiments on a 150-residue protein-folding domain within a β-lactamase enzyme. His experimental method improves upon earlier mutagenesis techniques and corrects for several sources of possible estimation error inherent in them. On the basis of these experiments, Axe has estimated the ratio of (a) proteins of typical size (150 residues) that perform a specified function via any folded structure to (b) the whole set of possible amino acid sequences of that size. Based on his experiments, Axe has estimated his ratio to be 1 to 10^77. Thus, the probability of finding a functional protein among the possible amino acid sequences corresponding to a 150-residue protein is similarly 1 in 10^77.

Notice the last sentence: that’s the key claim on which much of the ID argument against the evolution of new proteins, new enzymes, new protein domains and folds rests. The claim is that the probability of finding a functional protein 150 amino acids in length is 1 in 10^77. That would of course imply that, if you just generated sequences at random, it would take on average 10^77 of them before you found a functional one.

Put another way, say you wanted to find some arbitrary function in the sequence space of 150-amino-acid proteins: you’d have to generate, on average, around 10^77 random sequences before you found it. And if you then wanted to find one more function, a different arbitrarily picked one, you’d have to generate another 10^77 sequences.
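The “on average” here is the mean of a geometric waiting time: if each random draw is functional with probability p, the expected number of draws up to and including the first hit is 1/p. A minimal simulation sketch, using a toy p of 1 in 100 because a frequency of 1 in 10^77 is far too rare to simulate; the function names are hypothetical, chosen for illustration.

```python
import random

def draws_until_first_hit(p, rng):
    """Draw random 'sequences' until one is functional (probability p each); return the count."""
    draws = 1
    while rng.random() >= p:  # rng.random() is uniform on [0, 1), so a hit has probability p
        draws += 1
    return draws

def mean_draws(p, runs=10_000, seed=42):
    """Average number of draws needed over many independent searches."""
    rng = random.Random(seed)
    return sum(draws_until_first_hit(p, rng) for _ in range(runs)) / runs

p = 0.01  # toy frequency, not Axe's figure
print(f"simulated: {mean_draws(p):.1f} draws, theory 1/p = {1 / p:.0f}")
```

The simulated mean lands close to 100; the same 1/p rule is what turns a frequency of 1 in 10^77 into an expected search of about 10^77 sequences, and each new, independently chosen function restarts the count.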

To put that number into perspective, a quick back-of-the-envelope calculation suggests Axe is saying that you could pack 10^77 random protein sequences 150 amino acids in length into a solid ball roughly 16 light-years across (its radius would reach well past Alpha Centauri), and in this unfathomably vast ball of densely packed random polymers you’d only find A SINGLE FUNCTIONAL SEQUENCE.
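The size of that ball can be checked with a few lines of arithmetic. A minimal sketch, assuming an average residue mass of about 110 Da, a protein density of about 1.35 g/cm^3 and perfect, gap-free packing (all three are my assumptions, not figures from Axe’s paper):

```python
import math

DALTON_KG = 1.66053906660e-27   # mass of one dalton in kilograms
LIGHT_YEAR_M = 9.4607e15        # metres per light-year

def protein_ball_diameter_ly(n_proteins, residues=150,
                             da_per_residue=110.0, density_kg_m3=1350.0):
    """Diameter in light-years of a solid sphere of n_proteins packed with no gaps (assumed)."""
    mass_per_protein = residues * da_per_residue * DALTON_KG   # kg per molecule
    volume_per_protein = mass_per_protein / density_kg_m3      # m^3 per molecule
    total_volume = n_proteins * volume_per_protein             # m^3
    diameter_m = (6 * total_volume / math.pi) ** (1 / 3)       # from V = pi * d^3 / 6
    return diameter_m / LIGHT_YEAR_M

print(f"{protein_ball_diameter_ly(1e77):.1f} light-years across")
```

Under these assumptions each 150-residue protein occupies about 2×10^-26 m^3, and 10^77 of them make a ball about 16 to 17 light-years in diameter. Looser packing would make it bigger, but the diameter only grows with the cube root of the volume.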

By the way, the crap they’re pushing on EN&W now, as vjtorley brought to our attention, has become so absurd that they’re suggesting functional proteins exist at a rate in sequence space so low that you could fill the entire observable universe with randomly generated proteins, and in this universe-sized sphere of proteins there’d be a good chance you still wouldn’t find a single functional one. C’mon guys?

I honestly think the person who wrote the EN&W article doesn’t understand how exponents work. A number like 10^240 is, well, I can’t draw an analogy that would make such a quantity intuitively comprehensible. The whole thing is ridiculous in the extreme. I’m a sucker for a bit of hyperbole, I readily concede, but in this particular instance I have a hard time finding a way to exaggerate how utterly ridiculous their numbers are. Someone over there has lost his/her mind, or wasn’t around when exponents were taught in school.
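To see how far 10^240 overshoots even the observable universe, one can estimate how many tightly packed 150-residue proteins would fit inside it. A minimal sketch, assuming an observable-universe radius of about 46.5 billion light-years plus the same per-protein assumptions as before (about 110 Da per residue, about 1.35 g/cm^3, no gaps):

```python
import math

DALTON_KG = 1.66053906660e-27   # mass of one dalton in kilograms
LIGHT_YEAR_M = 9.4607e15        # metres per light-year

def proteins_that_fit_log10(radius_ly, residues=150, da_per_residue=110.0,
                            density_kg_m3=1350.0):
    """log10 of how many gap-free packed proteins fit in a sphere of the given radius."""
    radius_m = radius_ly * LIGHT_YEAR_M
    sphere_volume = 4 / 3 * math.pi * radius_m ** 3                             # m^3
    protein_volume = residues * da_per_residue * DALTON_KG / density_kg_m3    # m^3
    return math.log10(sphere_volume / protein_volume)

capacity = proteins_that_fit_log10(46.5e9)  # assumed observable-universe radius in light-years
print(f"about 10^{capacity:.0f} proteins fit; 10^240 is 10^{240 - capacity:.0f} times more")
```

Under these assumptions the whole observable universe holds only about 10^106 such proteins, so a frequency of 1 in 10^240 would leave you short by a factor of roughly 10^134. “A good chance you still wouldn’t find one” is, if anything, an understatement of what that number claims.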

Anyway, in Axe’s own words from his paper:

Combined with the estimated prevalence of plausible hydropathic patterns (for any fold) and of relevant folds for particular functions, this implies the overall prevalence of sequences performing a specific function by any domain-sized fold may be as low as 1 in 10^77, adding to the body of evidence that functional folds require highly extraordinary sequences.

This seems to imply that even Axe himself really believes that is the true density of functional proteins in 150-amino-acid sequence space. This is also what your average ID proponent today believes. This is the number they read in books like Darwin’s Doubt and the number pushed on EN&W, and they believe it corresponds to the odds of finding anything functional in 150-amino-acid sequence space. I don’t see how you can believe this isn’t what Axe, Meyer and colleagues want them to believe, because they’ve never taken even a single step to correct that misapprehension. Indeed, they’re actively pushing it as the correct way to understand Axe 2004.

My contention will be that this number simply cannot be correct because it is directly contradicted by experiment. In the words of Richard Feynman:
“In general, we look for a new law by the following process. First, we guess it (audience laughter), no, don’t laugh, that’s really true. Then we compute the consequences of the guess, to see what, if this is right, if this law we guess is right, to see what it would imply and then we compare the computation results to nature, or we say compare to experiment or experience, compare it directly with observations to see if it works.
If it disagrees with experiment, it’s wrong. In that simple statement is the key to science. It doesn’t make any difference how beautiful your guess is, it doesn’t matter how smart you are who made the guess, or what his name is… If it disagrees with experiment, it’s wrong. That’s all there is to it.”

Instead of arguing much about the technicalities of whether such an inference can be extracted from Axe’s experiment (though I will argue briefly against that too), I will primarily bring experimental results that contradict it. Because it should be pretty obvious that, regardless of technical and logical arguments, if actual laboratory experiments consistently and massively contradict Axe’s number, “It’s wrong. That’s all there is to it.”

I will give evidence that it is wrong by at least 60 orders of magnitude.

Okay, so the first thing to note is that one can make a single factual statement that immediately throws what ID-creationists believe about Axe’s number into very serious doubt:
1. Axe only tested for one function, not all possible functions.
There, done. At a basic level, this would be enough. I could stop here and you should immediately disbelieve Axe’s conclusion. It cannot possibly be extracted from the experiment he did.

He could have tested for hundreds of additional catalytic functions, protein-DNA/protein-RNA interactions, protein-protein binding, small molecule binding and so on. There are quite literally millions of functions one can test for, besides the single function the enzyme originally had. Yet all he did was test how many mutations it took for the original function to disappear below detectable levels, then try to derive the total sequence space that corresponds to the fold, and then calculate the proportion of those sequences that retain the one tested function.

Ian Musgrave also pointed this out back on Panda’s Thumb in the comments section with this very informative post:

Ian Musgrave | January 15, 2007 12:40 AM
Katerina Wrote: “Does this mean that if the media included antibiotics besides penicillin, a structure associated with one of those peaks may have chanced to work on them?”
Yes. Axe only looked at ampicillin resistance (a relative of penicillin). It is entirely possible that other functional beta-lactamase activities were conserved, or new ones generated. Mutations can indeed generate novel functions, and if you only look at one function, you may miss the big picture.

If you look at the data of Peimbert M, Segovia L. Evolutionary engineering of a beta-Lactamase activity on a D-Ala D-Ala transpeptidase fold. Protein Eng. 2003 Jan;16(1):27-35, where they mutated the D-Ala D-Ala transpeptidase (one of the family of enzymes from which the beta-lactamases first evolved) and looked only at ampicillin, you would conclude that those mutations had no effect, while missing out on enormous changes in cefotaxime resistance (and changes in quite a few other antibiotics as well). Even with related substrates, some showed big changes and others little, so without a good panel of beta-lactams to check the function of the crippled enzyme, the functional space represented is undersampled.

Speaking of undersampling, it would have been good to look at the breakdown of common substrates of DD-peptidyl ligases and beta-lactamases, such as m-[[(phenylacetyl)-glycyl]oxy]benzoic acid, and see what the mutations did to these activities. Beta-lactamases have kept these activities despite structural changes that have resulted in losing DD-peptidyl ligase activity. If these activities are kept, then the enzyme is more robust than Axe contends (but to be fair, this is harder to monitor in the kinds of experiments Axe did).

Stephen Matheson made a similar point in a blog post back in 2011: Exploring the protein universe: a response to Doug Axe

6. In my opinion, Axe significantly overstates his findings on the topic of “function.” So for example, in both the 2004 paper and the new BIO-Complexity paper, the experiments involve measuring a single function for each enzyme. It seems to me (and I could be wrong) that when the authors see that a particular variant (mutant) of the protein stops performing that one function, they conclude that the protein “has no function.” (In the BIO-Complexity paper, it’s two proteins and two functions, but the point is the same.) But of course we don’t know that, and evolutionary explanations would propose that new functions frequently arise when an enzyme has more than one function (or is broad-based in its function, or is modular in its structure and function).

How in the hell can Axe then conclude that his results extend to all functions for all sequences of that size? He can’t and it’s obvious. Yet that is what ID proponents now, today, believe his number represents. That is what Stephen Meyer writes in Darwin’s Doubt that Axe has shown. This is what they’re pushing on EN&W.

It should be obvious that you can’t just take an enzyme, throw mutations into it and then when it stops working, say “this means there’s no other function out there”. That obviously doesn’t follow.
For those reasons, Axe’s number just can’t be trusted as an estimate of that. It is a number that represents an estimate of a single, very particular thing: the frequency of sequences 150 amino acids in length that adopt the known beta-lactamase fold and catalyze the breakdown of ampicillin. For that reason it is NOT an estimate of the frequency of all functional proteins in all of 150-amino-acid sequence space.
Arthur Hunt points out other issues that also suggest that, even for the particular thing the number is an estimate of, it’s probably still an underestimate (meaning the frequency is estimated too low). I don’t need that for what I will show, however.

The experimental evidence will consist of a sample of papers that correspond to three types of experiments. I’ll just call them types A, B and C.
Type A experiments will be references to papers wherein the researchers generated totally random protein sequences longer than 50 amino acids, with no bias for structure, function or folding, and then screened large libraries of them for functions chosen in advance.
Or they will be experiments where the researchers randomly assembled larger proteins from smaller fragments of already functional proteins in a type of fragment-shuffling and recombination, which is incidentally how the evolutionary process is thought to have actually made most of the proteins in extant life (from insertions and duplications of smaller parts of already existing proteins into others).

Type B will be references to papers wherein the researchers generated protein sequences of various lengths, but this time with the random sequences constituting only a smaller portion of a larger fixed structure. In these experiments the “sequence neighborhood” of the fixed folded structure is then probed for functions by generating variants with mutations in part of the protein. These types of experiments demonstrate that if you already have a functional protein, there’s usually another function close by in sequence space, meaning most of the functionality is arranged in interconnected networks that can be navigated by relatively few substitutions. That basically makes Axe’s claim doubly absurd, since there’s probably another function nearby.

Type C will be references to papers that are similar to types A and B, but the proteins are shorter than 50 amino acids, typically in the 7-22 amino acid range. These types of experiments demonstrate that cellular functions don’t need to be carried out by huge, multi-domain proteins and that such large extant proteins could be generated by the stepwise accretion of smaller fragments over the course of evolution (and probably even by a random process at or around the origin of life).

Type A experiments:

First I will note that subsequent experiments by the Szostak lab reproduced the results of the Keefe & Szostak paper (which I have mentioned around here several times before) with a different polymer and under different conditions. Among other things, they selected and evolved an ATP-binding RNA structure from a pool of randomized polymers under “physiological pH and salt conditions”.

The numbers were about the same as in the experiment with random protein polymers: approximately 1 in 10^11 had the desired function. That’s 66 orders of magnitude more frequent than what ID creationists believe Axe 2004 has shown.

But it gets better: another lab subsequently decided to elucidate the crystal structure of the originally evolved ATP-binding protein and found out, by accident, that it actually catalyzes ATP hydrolysis to ADP. The Szostak lab never tested for ATP hydrolysis, so they had no idea their randomly generated protein sequence, originally selected just for ATP binding, also had catalytic activity. It was an enzyme (how would you know if you don’t test for it? Think about it):
A synthetic protein selected for ligand binding affinity mediates ATP hydrolysis.

Perhaps even more striking than the fold was the observation that each of these protein crystal structures contained electron density that was consistent with the presence of ADP in the ligand-binding pocket even though 1 molar equiv of ATP was present in the crystallization liquor (11, 12). Since the half-life of ATP for spontaneous hydrolysis is no shorter than 12 months (13) and protein crystals were obtained after 3 days of growth, it was unclear how ADP became bound to the protein. We reasoned that the presence of ADP in the protein crystal structure could be explained by one of three possible scenarios:
(i) the γ-phosphate of the ATP molecule is highly disordered and therefore the third phosphate of ATP is simply not visible in any of the electron density maps;
(ii) the Family B protein binds preferentially to ADP, which is present as a minor contaminant from the disproportionation of ATP to ADP and adenosine tetraphosphate (14); or
(iii) the Family B protein binds the ATP ligand in such a way that it is able to lower the activation barrier (.5 kcal mol1) and allow ATP hydrolysis to proceed at a rate that is faster than the uncatalyzed reaction.
Intrigued by the possibility that a protein selected entirely on the basis of its ability to bind ATP might also function as a primitive catalyst, we initiated a series of structural and biochemical experiments designed to explain the presence of ADP in our protein crystal structures.

They go on to find that yes, in fact, it’s an enzyme that catalyzes ATP hydrolysis. This is so funny. This team had not intended to test for catalytic function either, they were interested in the structure of the protein, and quite by accident they found out the protein was catalyzing the conversion of ATP to ADP.

How would you know whether your protein has some unknown function without testing for it? You can’t. You simply can’t. Axe has literally no idea whether his mutated Beta-Lactamases have lost any and all possible functions. Because he never bothered testing for anything but the single function the enzyme was known to have to begin with.

Subsequent experiments designed to find out whether such ATP-binding and catalytic proteins also exist in sequence space with smaller amino acid alphabets were also successful: ATP selection in a random peptide library consisting of prebiotic amino acids.

Based upon many theoretical findings on protein evolution, we proposed a ligand-selection model for the origin of proteins, in which the most ancient proteins originated from ATP selection in a pool of random peptides. To test this ligand-selection model, we constructed a random peptide library consisting of 15 types of prebiotic amino acids and then used cDNA display to perform six rounds of in vitro selection with ATP. By means of next-generation sequencing, the most prevalent sequence was defined. Biochemical and biophysical characterization of the selected peptide showed that it was stable and foldable and had ATP-hydrolysis activity as well.

These proteins were 137 amino acids in length (synthesized by joining together four random-sequence fragments of 25 amino acids with some spacer amino acids between, owing to the method of synthesis):

A DNA library with 108 sequential random codons was constructed by quadruplicating a DNA cassette consisting of 25 contiguous random VNM codons which was synthesized by standard phosphoramidite chemistry. The full length random-sequence library was assembled through digestion and ligation of DNA cassettes, as described previously [23,24]. We cloned and sequenced 30 individual library members from the final products. The results showed that each sequence was different. Given that each random DNA sequence was unique and the complexity of the library was essentially determined by the total number of molecules generated in the final ligation step [33,34], we can infer that the diversity of a 5 nmol DNA library is approximately 3 x 10^13. The random region (108 residues) of the resultant peptides (137 residues) is sufficiently long to form structured protein domains.

In this experiment they find ATP-binding proteins among those 3×10^13 random sequences. The exact number of binders isn’t stated in the paper (the 30 sequences mentioned in the quote are unselected library members sequenced to check diversity), but even counting only the single most prevalent selected sequence, the function occurs at a frequency of at least 1 in 3×10^13.
That’s a function more prevalent in amino acid sequence space by more than 63 orders of magnitude compared to Axe’s estimate. And they only tested for one function. There could be others not tested for.

Here’s another: Evolvability of random polypeptides through functional selection within a small library.

A directed evolution with phage-displayed random polypeptides of about 140 amino acid residues was followed until the sixth generation under a selection based on affinity to a transition state analog for an esterase reaction. The experimental design deliberately limits the observation to only 10 clones per generation. The first generation consists of three soluble random polypeptides and seven arbitrarily chosen clones from a previously constructed library. The clone showing the highest affinity in a generation was selected and subjected to random mutagenesis to generate variants for the next generation. Even within only 10 arbitrarily chosen polypeptides in each of the generations, there are enough variants in accord to capacity of binding affinity. In addition, the binding capacity of the selected polypeptides showed a gradual continuous increase over the generation. Furthermore, the purified selected random polypeptides exhibited a gradual but significant increase in esterase activity. The ease of the functional development within a small sequence variety implies that enzyme evolution is prompted even within a small population of random polypeptides. …

How many random sequences need to be searched to permit the observance of evolving polypeptides towards acquiring higher functions? Is it feasible to decipher evolution from observing only a few variants per generation out of the vast number of available sequences, especially at the primitive stage? To address these issues, we deliberately limit our evolutionary studies on populations of arbitrarily chosen mutant random polypeptides to a maximum of 10 for each generation, although a phage display system with selection using transition state analogs (TSAs) has the capability of yielding highly diversified mutants of 10^8–10^9 rendering it an efficient tool for generating a protein catalyst (Patten et al., 1996; Fujii et al., 1998; Forrer et al., 1999). Here, we show that even examining only 10 arbitrarily chosen polypeptides with random sequences per generation could guarantee the observance of evolution, in which there is a continuous gradual increase in a function of the polypeptides via iterating mutation and selection.

Here the frequency of enzymes with esterase function is apparently ONE IN 1000 random sequences 140 amino acids in length (the initial library is 1000 members; the subsequent selection steps are from populations of only 10). Let that sink in. 1 in 1000. Not 1 in 10 to the seventy-seventh power. Just one in one thousand. That means Axe’s 1 in 10^77 estimate is wrong by 74 orders of magnitude.

Next up we have this: Effective selection system for experimental evolution of random polypeptides towards DNA-binding protein.

An experimental evolution with selection based on binding affinity to DNA was carried out on a library of phage-displayed random polypeptides of about 140 amino acid residues. First, we constructed a system to artificially evolve phage-displayed random polypeptides toward binding to a target DNA containing a restriction enzyme site, in which random polypeptides capable of binding the DNA were recovered as complexes with the target DNA by digestion with the restriction enzyme. The experimental evolution cycle, including the above selection system and random mutagenesis for generating the next mutant library, was repeated until the fourth generation. The ability to bind to the DNA was enhanced per generation. In the fourth generation, convergence of the selected clones to a dominant sequence was observed. These results indicate that the newly constructed selection system is effective for exploring the evolvability of random polypeptides towards DNA-binding proteins.

In this paper they find a function at a rate of about 1 in 10^6 (a million) random amino acid sequences 140 residues long. That’s 71 orders of magnitude more prevalent than what ID creationists believe Axe 2004 showed.

In searching for publications for this post I also found an example of the second kind of type A experiment, where the method was randomized recombination of smaller fragments of proteins (which can also be generated by alternative splicing): In vitro selection of GTP-binding proteins by block shuffling of estrogen-receptor fragments

Previously we have developed random multi-recombinant PCR (RM-PCR), which enables the combination of different DNA fragments without homologous sequences [9] and [10]. We have demonstrated that artificial alternative-splicing libraries of the human estrogen receptor α ligand binding domain (hERα LBD) [11] can be successfully constructed by RM-PCR (Fig. 1A). We were interested in whether affinity binders for a ligand structurally unrelated to estrogen could be obtained from such a library. Here we describe the isolation of GTP-binding proteins from an artificial alternative-splicing library of hERα LBD, using an mRNA display technique [12] and [13] combined with a GTP-affinity selection procedure.

Here they generate about 600 total variants to begin with, by recombining protein fragments through shuffling around exons, and eventually select two GTP-binding proteins from this library:


The most significant insight obtained from this study is that a protein enriched from a combinatorial protein library based on building blocks from a single protein domain showed binding specificity for a ligand unrelated to that of the parent domain. This indicates that appropriate combinations of building blocks obtained from a parent domain can reorganize to form an environment that can interact specifically with another ligand. Previously, Keefe and Szostak [20] obtained novel ATP-binding proteins from an artificial random-sequence library, and estimated the probability of their formation as less than 10^-11. By contrast, in this study, we demonstrated that our combinatorial protein library of about 600 members contains a protein with a specific binding activity that was not exhibited by the parent domain. This suggests that the probability of obtaining such a protein by artificial splicing could be unexpectedly high (10^-2 to 10^-3). Thus, peptide blocks derived from natural proteins may have greater functional plasticity than has been believed.

That’s two functional proteins in 600, or 1 in 300. That would make Axe wrong by about 75 orders of magnitude.

Type B experiments:
Can an Arbitrary Sequence Evolve Towards Acquiring a Biological Function?

To explore the possibility that an arbitrary sequence can evolve towards acquiring functional role when fused with other pre-existing protein modules, we replaced the D2 domain of the fd-tet phage genome with the soluble random polypeptide RP3-42. The replacement yielded an fd-RP defective phage that is six-order magnitude lower infectivity than the wild-type fd-tet phage. The evolvability of RP3-42 was investigated through iterative mutation and selection. Each generation consists of a maximum of ten arbitrarily chosen clones, whereby the clone with highest infectivity was selected to be the parent clone of the generation that followed. The experimental evolution attested that, from an initial single random sequence, there will be selectable variation in a property of interest and that the property in question was able to improve over several generations. fd-7, the clone with highest infectivity at the end of the experimental evolution, showed a 240-fold increase in infectivity as compared to its origin, fd-RP. Analysis by phage ELISA using anti-M13 antibody and anti-T7 antibody revealed that about 37-fold increase in the infectivity of fd-7 was attributed to the changes in the molecular property of the single polypeptide that replaced the D2 domain of the g3p protein. This study therefore exemplifies the process of a random polypeptide generating a functional role in rejuvenating the infectivity of a defective bacteriophage when fused to some preexisting protein modules, indicating that an arbitrary sequence can evolve toward acquiring a functional role. Overall, this study could herald the conception of new perspective regarding primordial polypeptides in the field of molecular evolution…
The first generation was prepared by subjecting the random sequence in fd-RP to three consecutive rounds of error-prone PCR using primers P6 and P7. Among the variants generated, eight clones were arbitrarily chosen to be the member of the first generation. The clone with the highest infectivity, as evaluated by phage infectivity assay described below, was then selected to be the parent clone for the second generation.

They generate EIGHT mutant clones from a starting random sequence 144 amino acids in length. Of these 8 clones, 4 have higher infectivity than the starting protein. That’s 4 in 8 of these 144-amino-acid proteins having some degree of function, which comes out as one in two. That would make Axe wrong by an incredible 77 orders of magnitude.

Later on, in 2007, a similar experiment was carried out: Experimental Rugged Fitness Landscape in Protein Sequence Space

In this paper they basically repeat the experiment with a slightly shorter, 139-amino-acid protein, but increase the number of generated mutants at each selection step to 1 million. The results are basically the same; they just manage to further improve the infectivity, from the 240-fold increase achieved in the 2002 experiment to a 1.7×10^4-fold (roughly seventeen thousand-fold) increase in the 2007 experiment.

There are multiple other such studies that sample functional proteins in longer sequences (most of the papers I read use various methods to screen peptides between 50 and 800 amino acids in length); you can go through the references given in the papers I cite here and find many more.
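Every orders-of-magnitude comparison in the type A and type B examples comes from the same arithmetic: take the observed hit rate (functional sequences found divided by sequences screened) and see how many powers of ten it sits above 1 in 10^77. A minimal sketch tabulating the long-protein examples above, using the counts and library sizes as reported in this post; the labels are my own shorthand, and the prebiotic-alphabet library is counted conservatively as a single hit because the number of binders isn’t given.

```python
import math

AXE_LOG10 = -77  # log10 of the 1-in-10^77 frequency attributed to Axe (2004)

# (shorthand label, functional sequences found, sequences screened), as reported above
EXPERIMENTS = [
    ("ATP binding (Keefe & Szostak)", 1, 1e11),
    ("ATP binding, prebiotic alphabet", 1, 3e13),
    ("Esterase TSA binding", 1, 1e3),
    ("DNA binding", 1, 1e6),
    ("GTP binding, block shuffling", 2, 600),
    ("Phage infectivity, first generation", 4, 8),
]

def gap_vs_axe(hits, screened):
    """Orders of magnitude by which hits/screened exceeds a 1-in-10^77 frequency."""
    return math.log10(hits / screened) - AXE_LOG10

for label, hits, screened in EXPERIMENTS:
    print(f"{label:38s} 1 in {screened / hits:<8.3g} {gap_vs_axe(hits, screened):5.1f} orders")
```

Every entry clears the 60-orders-of-magnitude bar stated at the start; the smallest gap, about 63.5, belongs to the conservatively counted prebiotic library.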

Type C experiments:

For smaller sequences, peptides under 50 amino acids in length, there is an even more extensive literature. Going through all of it is more work than I can be bothered with, so here are a few handpicked examples:

First up, Phage display screening identifies a novel peptide to suppress ovarian cancer cells in vitro and in vivo in mouse models

In this paper, a library of 1.28 × 10^9 unique peptide sequences, each only 7 random amino acids long (short enough that those 1.28×10^9 sequences constitute the entire space of possible sequences), fused to the phage minor coat protein pIII, was used to screen for the ability to bind and inhibit cancer cells in vitro AND in vivo in mice.
They write:

Phage libraries and biopanning

The Ph.D.-c7c phage display peptide library was purchased from New England Biolabs (Ipswich, MA, USA). This library contained approximately 1 × 10^13 pfu/mL phages with a diversity of 1.28 × 10^9 unique peptide sequences for up to 70 copies of each. Biopanning is an affinity selection technique for identifying peptides that bind to a given target [23]. In this study, the biopanning began with the incubation of the phage display peptide library with both normal and tumor cells, in which normal cells were used to deplete peptides that only bind to normal cells and then further incubated with tumor cells for identifying the peptides that only specifically bind to tumor cells.

My highlight in bold.
They find a peptide with the amino acid sequence SWQIGGN which successfully binds HO8910 ovarian cancer cells, yet does not bind normal healthy cells, and it also inhibits tumor growth in vitro. Technically they find others, but this is the best one from the selection protocols used.

Then they tested it in vivo:

As shown in Table 3, the volume of ascites, the number of tumors, the total number of disseminated tumor nodules and the tumor weight were all significantly smaller in the Peptide 1 treated cell group than those of the HO8910-negative control peptide and HO8910 parental cell groups (p < 0.05). These results suggested that Peptide 1 was able to inhibit the growth and dissemination of ovarian cancer cells in this mouse transplantation model.

Here’s another one using the same kit, which identifies a completely different peptide (NPMIRRQ), also able to bind HO8910 ovarian cancer cells: Identification of a peptide specifically targeting ovarian cancer by the screening of a phage display peptide library.

Basically the same procedure and numbers, indicating the above-mentioned result is not some sort of statistical fluke. So this particular function is known to exist in at least two sequences out of a total space of 1.28×10^9, or roughly 1 in 6×10^8.

Next one, mRNA display selection of a high-affinity, Bcl-XL-specific binding peptide.

Bcl-X(L), an antiapoptotic member of the Bcl-2 family, is a mitochondrial protein that inhibits activation of Bax and Bak, which commit the cell to apoptosis, and it therefore represents a potential target for drug discovery. Peptides have potential as therapeutic molecules because they can be designed to engage a larger portion of the target protein with higher specificity. In the present study, we selected 16-mer peptides that interact with Bcl-X(L) from random and degenerate peptide libraries using mRNA display. The selected peptides have sequence similarity with the Bcl-2 family BH3 domains, and one of them has higher affinity (IC(50)=0.9 microM) than Bak BH3 (IC(50)=11.8 microM) for Bcl-X(L) in vitro. We also found that GFP fusions of the selected peptides specifically interact with Bcl-X(L), localize in mitochondria, and induce cell death. Further, a chimeric molecule, in which the BH3 domain of Bak protein was replaced with a selected peptide, retained the ability to bind specifically to Bcl-X(L). These results demonstrate that this selected peptide specifically antagonizes the function of Bcl-X(L) and overcomes the effects of Bcl-X(L) in intact cells. We suggest that mRNA display is a powerful technique to identify peptide inhibitors with high affinity and specificity for disease-related proteins.

The paper is a bit unclear on the exact numbers here, but they do manage to find 19 functional peptides that bind Bcl-X(L) and induce apoptosis, in a library of at least 10^12 sequences (the total space for 16 amino acid peptides is roughly 6.6×10^20).

Next up is: Identification of Calmodulin Isoform-specific Binding Peptides from a Phage-displayed Random 22-mer Peptide Library

In this one they manage to isolate 80 unique 22-amino-acid peptides from a pool of about 4.8×10^8. That comes out at 1 in 6 million peptides having the particular function searched for, in a sequence space with a total size of about 4×10^28.
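The sequence-space sizes quoted for the 7-, 16- and 22-residue libraries follow from 20^n, assuming only the 20 standard amino acids. A minimal sketch reproducing them together with the observed hit rates, using the counts reported in this section; for the ovarian-cancer peptides, the two distinct peptides from the two studies are counted against the full 7-mer space, and the Bcl-X(L) library is taken at its stated lower bound of 10^12.

```python
AMINO_ACIDS = 20  # standard amino acids only (assumption: no non-canonical residues)

def sequence_space(length):
    """Number of distinct peptides of the given length over the 20 standard amino acids."""
    return AMINO_ACIDS ** length

# (label, peptide length, functional peptides found, library size), as reported above
LIBRARIES = [
    ("Ovarian cancer binding (two studies)", 7, 2, 1.28e9),
    ("Bcl-XL binding", 16, 19, 1e12),
    ("Calmodulin binding", 22, 80, 4.8e8),
]

for label, length, hits, library in LIBRARIES:
    print(f"{label}: space = {sequence_space(length):.3g}, "
          f"hit rate ~ 1 in {library / hits:.2g}")
```

Python integers are exact, so 20^7 comes out as exactly 1,280,000,000, matching the library size, while 20^16 and 20^22 are about 6.6×10^20 and 4.2×10^28.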

Much more can be said about the possibility of evolving larger proteins from smaller fragments (and smaller fragments from dimers and trimers) but that is beyond the scope of what I wanted to achieve in this post. Any criticism of my case here is welcome.

152 thoughts on “Axe, EN&W and protein sequence space (again, again, again)”

  1. AhmedKiaan:
    Glutathione is made up of 3 AAs and it does stuff.

    Thanks Ahmed. I checked at uniprot and found tons of short proteins there.
    Miracles. All of them
