Axe, EN&W and protein sequence space (again, again, again)

So I wanted to write a post on the whole “density of functional proteins in amino acid sequence space” because, it seems to me, most ID proponents have some completely unwarranted beliefs about it. Recently it seems the main champion of the idea that evolving new protein functions would be so unlikely as to be practically impossible for evolution, even given several billion years, planet-spanning population sizes and natural selection, is Douglas Axe. He produced a paper in an (actual) peer reviewed journal back in 2004, and ID creationists have been spinning it ever since.

In 2007 a review of Axe 2004 appeared on Panda’s Thumb, by Arthur Hunt: Axe (2004) and the evolution of enzyme function. It recaps what Axe actually did and what he concluded from it, and I will add a bit about how it’s been pushed in ID circles since. See for example how it is pushed here, in Imagine: 60 Million Proteins in One Cell Working Together:

This paper is interesting because it relates to the work of Douglas Axe that resulted in a paper in the Journal of Molecular Biology in 2004. Axe answered questions about this paper earlier this year, and also mentioned it in his recent book Undeniable (p. 54). In the paper, Axe estimated the prevalence of sequences that could fold into a functional shape by random combinations. It was already known that the functional space was a small fraction of sequence space, but Axe put a number on it based on his experience with random changes to an enzyme. He estimated that one in 10^74 sequences of 150 amino acids could fold and thereby perform some function — any function.

Notice the last part I highlighted in bold. This is apparently what most ID-creationists believe, because this is what the higher-ups in the ID movement claim that Axe’s research shows. It does not, as I will attempt to demonstrate pretty conclusively.
To still believe that Axe’s number* is an estimate (never mind a correct estimate) of the number of “sequences of 150 amino acids [that] could fold and thereby perform some function”, meaning “any function”, will be a form of blind faith after you’ve read this post through, provided you understand it. That isn’t too much to ask, because it’s really not that complicated.

*(The number seems to fluctuate a bit in the various writings, between 10^64 and 10^90, I’ll just go by 10^77 as it was used back in 2004)

Back to Arthur Hunt on Panda’s Thumb:

Axe (2004) has performed site directed mutagenesis experiments on a 150-residue protein-folding domain within a β-lactamase enzyme. His experimental method improves upon earlier mutagenesis techniques and corrects for several sources of possible estimation error inherent in them. On the basis of these experiments, Axe has estimated the ratio of (a) proteins of typical size (150 residues) that perform a specified function via any folded structure to (b) the whole set of possible amino acids sequences of that size. Based on his experiments, Axe has estimated his ratio to be 1 to 10^77. Thus, the probability of finding a functional protein among the possible amino acid sequences corresponding to a 150-residue protein is similarly 1 in 10^77.

Notice the last sentence. That’s the key claim on which many ID arguments against the evolution of new proteins, new enzymes, new protein domains and folds rest: that the probability of finding a functional protein 150 amino acids in length is 1 in 10^77. That would of course imply it would take, on average, 10^77 sequences before you found a functional one if you just randomly generated them.

Put another way, say you wanted to find some arbitrary function in the sequence space of 150 amino acid-long proteins: you’d have to generate about 10^77 sequences before you found it, at least on average. And if you then wanted to find one more function, a different arbitrarily picked one, you’d have to generate another 10^77 sequences.
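For anyone who wants to check the “on average” part, here is a minimal Python sketch. It assumes each randomly generated sequence is an independent draw with the same chance of being functional, so the waiting time follows a geometric distribution with mean 1/p. The function names are mine, purely for illustration.

```python
from fractions import Fraction

AMINO_ACIDS = 20   # standard amino acid alphabet
LENGTH = 150       # residues per sequence, as in the claim above


def sequence_space_size(alphabet: int, length: int) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def expected_draws(p_functional: Fraction) -> Fraction:
    """Mean number of independent random draws until the first hit
    (mean of a geometric distribution with success probability p)."""
    return 1 / p_functional


space = sequence_space_size(AMINO_ACIDS, LENGTH)
claimed = Fraction(1, 10 ** 77)   # the "1 in 10^77" figure quoted above

# Report sizes as powers of ten by counting decimal digits.
print(f"sequence space: about 10^{len(str(space)) - 1}")
print(f"expected draws: 10^{len(str(expected_draws(claimed).numerator)) - 1}")
```

Exact integers and fractions are used because numbers this size are awkward to reason about in floating point.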

To put that number into perspective, a quick back-of-the-envelope calculation suggests Axe is saying that you could literally fill a sphere the size of a small galaxy, around ten thousand lightyears in diameter, completely with random protein sequences 150 amino acids in length, and in this unfathomably vast sphere packed densely with random polymers you’d only find A SINGLE FUNCTIONAL SEQUENCE.

By the way, the crap they’re pushing on EN&W now, as vjtorley brought to our attention, has become so absurd that they’re suggesting functional proteins exist at a rate in sequence space so low that you could fill the entire observable universe with randomly generated proteins, and in this universe-sized sphere of proteins there’d be a good chance you still wouldn’t find a single functional one. C’mon guys?

I honestly think the person who wrote the EN&W article doesn’t understand how exponents work. A number like 10^240 is, well I can’t draw an analogy that would make such a quantity intuitively comprehensible. The whole thing is ridiculous in the extreme. I’m a sucker for a bit of hyperbole I readily concede, but in this particular instance, I have a hard time finding a way to exaggerate how utterly ridiculous their numbers are. Someone over there has lost his/her mind, or wasn’t around when exponents were taught in school.

Anyway, in Axe’s own words from his paper:

Combined with the estimated prevalence of plausible hydropathic patterns (for any fold) and of relevant folds for particular functions, this implies the overall prevalence of sequences performing a specific function by any domain-sized fold may be as low as 1 in 10^77, adding to the body of evidence that functional folds require highly extraordinary sequences.

This seems to imply even Axe himself really believes that is the true density of functional proteins in 150 amino acid sequence space. It is also what your average ID proponent believes today. This is the number they read in books like Darwin’s Doubt and see pushed on EN&W, and they believe it corresponds to the odds of finding anything functional in 150 amino acid-sized sequence space. I don’t see how you can believe this isn’t what Axe, Meyer and colleagues want them to believe, because they’ve never taken even a single step to correct that misapprehension. Indeed, they’re actively pushing it as the correct way to understand Axe 2004.

My contention will be that this number simply cannot be correct because it is directly contradicted by experiment. In the words of Richard Feynman:
“In general, we look for a new law by the following process. First, we guess it (audience laughter), no, don’t laugh, that’s really true. Then we compute the consequences of the guess, to see what, if this is right, if this law we guess is right, to see what it would imply and then we compare the computation results to nature, or we say compare to experiment or experience, compare it directly with observations to see if it works.
If it disagrees with experiment, it’s wrong. In that simple statement is the key to science. It doesn’t make any difference how beautiful your guess is, it doesn’t matter how smart you are who made the guess, or what his name is… If it disagrees with experiment, it’s wrong. That’s all there is to it.”

Instead of arguing much about the technicalities of whether such an inference can be drawn from Axe’s experiment (though I will argue briefly against that too), I will primarily bring experimental results that contradict it. Because it should be pretty obvious that, regardless of technical and logical arguments, if actual laboratory experiments consistently and massively contradict Axe’s number, “It’s wrong. That’s all there is to it.”

I will give evidence that it is wrong by at least 60 orders of magnitude.

Okay, so the first thing to note is that one can make a single factual statement that immediately throws what ID-creationists believe about Axe’s number into very serious doubt:
1. Axe only tested for one function, not all possible functions.
There, done. At a basic level, this would be enough. I could stop here and you should immediately disbelieve Axe’s conclusion. It cannot possibly be extracted from the experiment he did.

He could have tested for hundreds of additional catalytic functions, protein-DNA/protein-RNA interactions, protein-protein binding, small molecule binding and so on. There are quite literally millions of functions one can test for, besides the single function the enzyme originally had. Yet all he did was test how many mutations it took for the original function to disappear below detectable levels, then try to derive the total sequence space that corresponds to the fold, and then calculate the proportion of those sequences that retain the one tested function.

Ian Musgrave also pointed this out back on Panda’s Thumb in the comments section with this very informative post:

Ian Musgrave | January 15, 2007 12:40 AM
Katerina Wrote: “Does this mean that if the media included antibiotics besides penicillin, a structure associated with one of those peaks may have chanced to work on them?”
Yes. Axe only looked at ampicillin resistance (a relative of penicillin). It is entirely possible that other functional beta lactamases activities were conserved, or new ones generated. Mutations can indeed generate novel functions, and if you only look at one function, you may miss the big picture.

If you look at the data of Peimbert M, Segovia L. Evolutionary engineering of a beta-Lactamase activity on a D-Ala D-Ala transpeptidase fold. Protein Eng. 2003 Jan;16(1):27-35, where they mutated the D-Ala D-Ala transpeptidase (one of the family of enzymes from which the beta-lactamases first evolved) and looked only at ampicillin, you would conclude that those mutations had no effect, while missing out on enormous changes in cefotaxime resistance (and changes in quite a few other antibiotics as well). Even with related substrates, some showed big changes, others little, so without a good panel of beta-lactams to check the function of the crippled enzyme, the functional space represented is undersampled.

Speaking of undersampling, it would have been good to look at the breakdown of common substrates of DD-peptidyl ligases and beta lactamases, such as m-[[(phenylacetyl)-glycyl]oxy]benzoic acid and see what the mutations did to these activities. Beta lactamases have kept these activities, despite structural changes which have resulted in losing DD-peptidyl ligase activity. If these activities are kept, then the enzyme is more robust than Axe contends (but to be fair this is harder to monitor in the kinds of experiments Axe did).

Stephen Matheson makes a similar point on this blog back in 2011 too: Exploring the protein universe: a response to Doug Axe

6. In my opinion, Axe significantly overstates his findings on the topic of “function.” So for example, in both the 2004 paper and the new BIO-Complexity paper, the experiments involve measuring a single function for each enzyme. It seems to me (and I could be wrong) that when the authors see that a particular variant (mutant) of the protein stops performing that one function, they conclude that the protein “has no function.” (In the BIO-Complexity paper, it’s two proteins and two functions, but the point is the same.) But of course we don’t know that, and evolutionary explanations would propose that new functions frequently arise when an enzyme has more than one function (or is broad-based in its function, or is modular in its structure and function).

How in the hell can Axe then conclude that his results extend to all functions for all sequences of that size? He can’t and it’s obvious. Yet that is what ID proponents now, today, believe his number represents. That is what Stephen Meyer writes in Darwin’s Doubt that Axe has shown. This is what they’re pushing on EN&W.

It should be obvious that you can’t just take an enzyme, throw mutations into it and then when it stops working, say “this means there’s no other function out there”. That obviously doesn’t follow.
For those reasons, Axe’s number just can’t be trusted as an estimate of that. It is a number that represents an estimate of a single very particular thing, which is the frequency of sequences 150 amino acids in length that adopt the known Beta-lactamase fold and catalyze the breakdown of ampicillin. For that reason it is NOT an estimate of the frequency of all functional proteins in all of 150 amino acid sequence space.
Arthur Hunt points out other issues that also suggest that even for the particular thing the number is an estimate of, it’s probably still an underestimate (meaning the frequency is estimated too low). I don’t need that for what I will show, however.

The experimental evidence will consist of a sample of papers that correspond to three types of experiments. I’ll just call them types A, B and C.
Type A experiments come in two kinds. In the first, the researchers generated totally random protein sequences of lengths over 50 amino acids, with no bias for structure, function or folding, and then screened large libraries for functions they decided on.
In the second, the researchers randomly assembled larger proteins from smaller fragments of already functional proteins in a type of fragment-shuffling and recombination, which is incidentally how the evolutionary process is thought to have actually made most of the proteins in extant life (from insertions and duplications of smaller parts of already existing proteins into others).

Type B will be references to papers wherein the researchers generated protein sequences of various lengths, but this time with the random sequences constituting only a smaller portion of a larger fixed structure. In these experiments the “sequence neighborhood” of these fixed folded structures is then probed for functions by generating variations of these structures with mutations in a subset of the proteins. These types of experiments demonstrate that if you have a functional protein already, there’s usually another function close by in sequence space, meaning most of the functionality is arranged in interconnected networks that can be navigated by relatively few substitutions. Which basically makes Axe’s claim doubly absurd, since there’s probably another function nearby.

Type C will be references to papers that are similar to types A and B, but the sizes of the proteins are less than 50 amino acids, typically in the 7-22 amino acid range. These types of experiments demonstrate that cellular functions don’t need to be carried out by huge, multi-domain proteins and that such large extant proteins could be generated by the stepwise accretion of smaller fragments over the course of evolution (and probably even a random process at or around the origin of life).

Type A experiments:

First I will note that subsequent experiments by the Szostak lab repeated the results (from the Keefe & Szostak paper I have mentioned around here several times) for a different polymer under different conditions. Among other things, they selected and evolved an ATP-binding RNA structure from a pool of randomized polymers, under “physiological pH and salt conditions”.

The numbers were about the same as in the experiment with random protein polymers: approximately 1 in 10^11 had the desired function. That’s 66 orders of magnitude more frequent than what ID creationists believe Axe 2004 has shown.
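To see why a result like this is a problem for the 1 in 10^77 figure, here is a rough Python sketch of the Feynman test: compute what each frequency predicts for a screening experiment and compare. The library size of 10^13 is an illustrative assumption of mine, picked to be on the scale of the libraries discussed in this post, and the sketch treats every library member as an independent draw.

```python
import math


def prob_at_least_one_hit(p: float, library_size: float) -> float:
    """P(at least one functional sequence) = 1 - (1 - p)^N.
    Computed with log1p/expm1 so a tiny p does not round away to zero."""
    return -math.expm1(library_size * math.log1p(-p))


LIBRARY_SIZE = 1e13   # assumption: illustrative library size
P_CLAIMED = 1e-77     # frequency attributed to Axe 2004
P_OBSERVED = 1e-11    # frequency reported for ATP binders

print("chance of any hit at 1 in 10^77:",
      prob_at_least_one_hit(P_CLAIMED, LIBRARY_SIZE))
print("chance of any hit at 1 in 10^11:",
      prob_at_least_one_hit(P_OBSERVED, LIBRARY_SIZE))
```

If the claimed frequency were right, screens of this size should essentially never find anything. They do, which is the whole point.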

But it gets better: Another lab subsequently decided to elucidate the crystal structure of the originally evolved ATP binding protein and found out, by accident, that it actually catalyzes ATP hydrolysis to ADP. The Szostak lab never tested for ATP hydrolysis so had no idea their randomly generated protein sequence, originally selected just for ATP binding, also had catalytic activity. It was an enzyme: (How would you know if you don’t test for it? Think about it)
A synthetic protein selected for ligand binding affinity mediates ATP hydrolysis.

Perhaps even more striking than the fold was the observation that each of these protein crystal structures contained electron density that was consistent with the presence of ADP in the ligand-binding pocket even though 1 molar equiv of ATP was present in the crystallization liquor (11, 12). Since the half-life of ATP for spontaneous hydrolysis is no shorter than 12 months (13) and protein crystals were obtained after 3 days of growth, it was unclear how ADP became bound to the protein. We reasoned that the presence of ADP in the protein crystal structure could be explained by one of three possible scenarios:
(i) the γ-phosphate of the ATP molecule is highly disordered and therefore the third phosphate of ATP is simply not visible in any of the electron density maps;
(ii) the Family B protein binds preferentially to ADP, which is present as a minor contaminant from the disproportionation of ATP to ADP and adenosine tetraphosphate (14); or
(iii) the Family B protein binds the ATP ligand in such a way that it is able to lower the activation barrier (.5 kcal mol^-1) and allow ATP hydrolysis to proceed at a rate that is faster than the uncatalyzed reaction.
Intrigued by the possibility that a protein selected entirely on the basis of its ability to bind ATP might also function as a primitive catalyst, we initiated a series of structural and biochemical experiments designed to explain the presence of ADP in our protein crystal structures.

They go on to find that yes, in fact, it’s an enzyme that catalyzes ATP hydrolysis. This is so funny. This team had not intended to test for catalytic function either, they were interested in the structure of the protein, and quite by accident they found out the protein was catalyzing the conversion of ATP to ADP.

How would you know whether your protein has some unknown function without testing for it? You can’t. You simply can’t. Axe has literally no idea whether his mutated Beta-Lactamases have lost any and all possible functions. Because he never bothered testing for anything but the single function the enzyme was known to have to begin with.

Subsequent experiments designed to elucidate whether such ATP binding and catalytic enzymes also exist in sequence space with smaller amino acid alphabets were also successful: ATP selection in a random peptide library consisting of prebiotic amino acids.

Abstract
Based upon many theoretical findings on protein evolution, we proposed a ligand-selection model for the origin of proteins, in which the most ancient proteins originated from ATP selection in a pool of random peptides. To test this ligand-selection model, we constructed a random peptide library consisting of 15 types of prebiotic amino acids and then used cDNA display to perform six rounds of in vitro selection with ATP. By means of next-generation sequencing, the most prevalent sequence was defined. Biochemical and biophysical characterization of the selected peptide showed that it was stable and foldable and had ATP-hydrolysis activity as well.

These proteins were 137 amino acids in length (synthesized by joining together four random-sequence fragments of 25 amino acids with some spacer amino acids between, owing to the method of synthesis):

A DNA library with 108 sequential random codons was constructed by quadruplicating a DNA cassette consisting of 25 contiguous random VNM codons which was synthesized by standard phosphoramidite chemistry. The full length random-sequence library was assembled through digestion and ligation of DNA cassettes, as described previously [23,24]. We cloned and sequenced 30 individual library members from the final products. The results showed that each sequence was different. Given that each random DNA sequence was unique and the complexity of the library was essentially determined by the total number of molecules generated in the final ligation step [33,34], we can infer that the diversity of a 5 nmol DNA library is approximately 3 x 10^13. The random region (108 residues) of the resultant peptides (137 residues) is sufficiently long to form structured protein domains.

In this experiment they find many more ATP binding proteins in those 3×10^13 random sequences. The exact number isn’t stated in the paper, but they end up picking out 30 individual sequences.
That’s a function more prevalent in amino acid sequence space by at least 64 orders of magnitude compared to Axe’s estimate. And they only tested for one function. There could be others not tested for.

Here’s another: Evolvability of random polypeptides through functional selection within a small library.

A directed evolution with phage-displayed random polypeptides of about 140 amino acid residues was followed until the sixth generation under a selection based on affinity to a transition state analog for an esterase reaction. The experimental design deliberately limits the observation to only 10 clones per generation. The first generation consists of three soluble random polypeptides and seven arbitrarily chosen clones from a previously constructed library. The clone showing the highest affinity in a generation was selected and subjected to random mutagenesis to generate variants for the next generation. Even within only 10 arbitrarily chosen polypeptides in each of the generations, there are enough variants in accord to capacity of binding affinity. In addition, the binding capacity of the selected polypeptides showed a gradual continuous increase over the generation. Furthermore, the purified selected random polypeptides exhibited a gradual but significant increase in esterase activity. The ease of the functional development within a small sequence variety implies that enzyme evolution is prompted even within a small population of random polypeptides. …

How many random sequences need to be searched to permit the observance of evolving polypeptides towards acquiring higher functions? Is it feasible to decipher evolution from observing only a few variants per generation out of the vast number of available sequences, especially at the primitive stage? To address these issues, we deliberately limit our evolutionary studies on populations of arbitrarily chosen mutant random polypeptides to a maximum of 10 for each generation, although a phage display system with selection using transition state analogs (TSAs) has the capability of yielding highly diversified mutants of 10^8–10^9 rendering it an efficient tool for generating a protein catalyst (Patten et al., 1996; Fujii et al., 1998; Forrer et al., 1999). Here, we show that even examining only 10 arbitrarily chosen polypeptides with random sequences per generation could guarantee the observance of evolution, in which there is a continuous gradual increase in a function of the polypeptides via iterating mutation and selection.

Here the frequency of enzymes with esterase function is apparently ONE IN 1000 random sequences 140 amino acids in length (the initial library is 1000 members, the subsequent selection steps are from populations of only 10). Let that sink in. 1 in 1000. Not 1 in 10 to the seventy-seventh power. Just one in one thousand. That means Axe’s 1 in 10^77 estimate is wrong by 74 orders of magnitude.

Next up we have this: Effective selection system for experimental evolution of random polypeptides towards DNA-binding protein.

Abstract
An experimental evolution with selection based on binding affinity to DNA was carried out on a library of phage-displayed random polypeptides of about 140 amino acid residues. First, we constructed a system to artificially evolve phage-displayed random polypeptides toward binding to a target DNA containing a restriction enzyme site, in which random polypeptides capable of binding the DNA were recovered as complexes with the target DNA by digestion with the restriction enzyme. The experimental evolution cycle, including the above selection system and random mutagenesis for generating the next mutant library, was repeated until the fourth generation. The ability to bind to the DNA was enhanced per generation. In the fourth generation, convergence of the selected clones to a dominant sequence was observed. These results indicate that the newly constructed selection system is effective for exploring the evolvability of random polypeptides towards DNA-binding proteins.

In this paper they find a function at a rate of about 1 in 10^6 (a million) random amino acid sequences 140 residues in length. That’s 71 orders of magnitude more prevalent than the ID-creationist belief about what Axe 2004 showed.

In searching for publications for this post I also discovered the second kind of type A experiment, where the method was randomized recombination of smaller fragments of proteins (which can also be generated by alternative splicing): In vitro selection of GTP-binding proteins by block shuffling of estrogen-receptor fragments

Previously we have developed random multi-recombinant PCR (RM-PCR), which enables the combination of different DNA fragments without homologous sequences [9] and [10]. We have demonstrated that artificial alternative-splicing libraries of the human estrogen receptor α ligand binding domain (hERα LBD) [11] can be successfully constructed by RM-PCR (Fig. 1A). We were interested in whether affinity binders for a ligand structurally unrelated to estrogen could be obtained from such a library. Here we describe the isolation of GTP-binding proteins from an artificial alternative-splicing library of hERα LBD, using an mRNA display technique [12] and [13] combined with a GTP-affinity selection procedure.

Here they generate about 600 total variants to begin with, by recombining protein fragments (shuffling exons around), and eventually select two GTP-binding proteins from this library:

Discussion

The most significant insight obtained from this study is that a protein enriched from a combinatorial protein library based on building blocks from a single protein domain showed binding specificity for a ligand unrelated to that of the parent domain. This indicates that appropriate combinations of building blocks obtained from a parent domain can reorganize to form an environment that can interact specifically with another ligand. Previously, Keefe and Szostak [20] obtained novel ATP-binding proteins from an artificial random-sequence library, and estimated the probability of their formation as less than 10^-11. By contrast, in this study, we demonstrated that our combinatorial protein library of about 600 members contains a protein with a specific binding activity that was not exhibited by the parent domain. This suggests that the probability of obtaining such a protein by artificial splicing could be unexpectedly high (10^-2 to 10^-3). Thus, peptide blocks derived from natural proteins may have greater functional plasticity than has been believed.

That’s two functions in 600, or 1 in 300. That would make Axe wrong by 75 orders of magnitude.
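Since the “orders of magnitude” figures are piling up, here is a short Python sketch that recomputes them for the type A results, taking the “1 in X” hit rates exactly as quoted in this post and comparing each against 1 in 10^77. The labels and the helper name are mine, for illustration only.

```python
import math

CLAIMED_ONE_IN = 1e77   # the figure attributed to Axe 2004 in this post

# "1 in X" hit rates as quoted in this post for the type A experiments.
TYPE_A_RESULTS = {
    "ATP binding, random sequences": 1e11,
    "esterase activity, ~140 residues": 1e3,
    "DNA binding, ~140 residues": 1e6,
    "GTP binding, shuffled fragments": 300,
}


def gap(one_in: float) -> float:
    """Orders of magnitude between an observed '1 in X' and the claim."""
    return math.log10(CLAIMED_ONE_IN / one_in)


for label, one_in in TYPE_A_RESULTS.items():
    print(f"{label:34s} 1 in {one_in:.0e}   gap = {gap(one_in):.1f}")
```

The gaps are just differences of base-10 logarithms, so the 1 in 300 case lands a little under 75 and is rounded up in the text.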

Type B experiments:
Can an Arbitrary Sequence Evolve Towards Acquiring a Biological Function?

Abstract
To explore the possibility that an arbitrary sequence can evolve towards acquiring functional role when fused with other pre-existing protein modules, we replaced the D2 domain of the fd-tet phage genome with the soluble random polypeptide RP3-42. The replacement yielded an fd-RP defective phage that is six-order magnitude lower infectivity than the wild-type fd-tet phage. The evolvability of RP3-42 was investigated through iterative mutation and selection. Each generation consists of a maximum of ten arbitrarily chosen clones, whereby the clone with highest infectivity was selected to be the parent clone of the generation that followed. The experimental evolution attested that, from an initial single random sequence, there will be selectable variation in a property of interest and that the property in question was able to improve over several generations. fd-7, the clone with highest infectivity at the end of the experimental evolution, showed a 240-fold increase in infectivity as compared to its origin, fd-RP. Analysis by phage ELISA using anti-M13 antibody and anti-T7 antibody revealed that about 37-fold increase in the infectivity of fd-7 was attributed to the changes in the molecular property of the single polypeptide that replaced the D2 domain of the g3p protein. This study therefore exemplifies the process of a random polypeptide generating a functional role in rejuvenating the infectivity of a defective bacteriophage when fused to some preexisting protein modules, indicating that an arbitrary sequence can evolve toward acquiring a functional role. Overall, this study could herald the conception of new perspective regarding primordial polypeptides in the field of molecular evolution…
The first generation was prepared by subjecting the random sequence in fd-RP to three consecutive rounds of error-prone PCR using primers P6 and P7. Among the variants generated, eight clones were arbitrarily chosen to be the member of the first generation. The clone with the highest infectivity, as evaluated by phage infectivity assay described below, was then selected to be the parent clone for the second generation.

They generate EIGHT random clones from a beginning random sequence 144 amino acids in length. Of these 8 clones, 4 have higher infectivity than the starting protein. That’s 4 in 8 of the proteins 144 amino acids in length that have some degree of function, which comes out as one in two. That would make Axe wrong by an incredible 77 orders of magnitude, near enough.

Later on, in 2007, a similar experiment is carried out: Experimental Rugged Fitness Landscape in Protein Sequence Space

In this paper they basically repeat the experiment with a slightly shorter, 139 amino acid protein, but increase the number of generated mutants at each selection step to 1 million. The results are basically the same; they just manage to further improve the infectivity levels, from the 240-fold increase achieved in the 2002 experiment to a 1.7×10^4-fold (almost twenty thousand) increase in the 2007 experiment.

Besides these, there are multiple such studies that sample functional proteins in longer sequences (various methods are used to screen peptides between 50 and 800 amino acids in length in most of the papers I read). Basically you can go through the references given in the papers I cite here and find many more.

For smaller sequences, peptides <50 amino acids in length, there is an even more extensive literature. Going through all of it is more work than I can be bothered with, so here’s a few handpicked examples:

First up, Phage display screening identifies a novel peptide to suppress ovarian cancer cells in vitro and in vivo in mouse models

In this paper, a library of 1.28×10^9 unique peptide sequences, each only 7 random amino acids in length (which is short enough that those 1.28×10^9 sequences constitute the entire space of possible sequences), linked to the phage minor coat protein pIII, was used to screen for the ability to bind and inhibit cancer cells in vitro AND in vivo in mice.
They write:

Phage libraries and biopanning

The Ph.D.-c7c phage display peptide library was purchased from New England Biolabs (Ipswich, MA, USA). This library contained approximately 1 × 10^13 pfu/mL phages with a diversity of 1.28 × 10^9 unique peptide sequences for up to 70 copies of each. Biopanning is an affinity selection technique for identifying peptides that bind to a given target [23]. In this study, the biopanning began with the incubation of the phage display peptide library with both normal and tumor cells, in which normal cells were used to deplete peptides that only bind to normal cells and then further incubated with tumor cells for identifying the peptides that only specifically bind to tumor cells.

My highlight in bold.
They find a peptide with the amino acid sequence SWQIGGN which successfully binds HO8910 ovarian cancer cells, yet does not bind normal healthy cells, and it also inhibits tumor growth in vitro. Technically they find others, but this is the best one from the selection protocols used.

Then they tested it in vivo:

As shown in Table 3, the volume of ascites, the number of tumors, the total number of disseminated tumor nodules and the tumor weight were all significantly smaller in the Peptide 1 treated cell group than those of the HO8910-negative control peptide and HO8910 parental cell groups (p < 0.05). These results suggested that Peptide 1 was able to inhibit the growth and dissemination of ovarian cancer cells in this mouse transplantation model.

Here’s another one using the same kit, which identifies a completely different peptide (NPMIRRQ), also able to bind HO8910 ovarian cancer cells: Identification of a peptide specifically targeting ovarian cancer by the screening of a phage display peptide library.

Basically the same procedure and numbers, indicating that the above-mentioned result is not some sort of statistical fluke. So this particular function is known to exist in at least 2 of the 1.28×10^9 possible sequences, or roughly 1 in 6.4×10^8.
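For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the library uses the 20 standard amino acids at each of the 7 random positions and counts only the two binders named above; the variable names are mine, not anything from the papers.

```python
# Back-of-the-envelope check of the 7-mer numbers.
# Assumption: 20 standard amino acids at each random position.
AMINO_ACIDS = 20


def sequence_space(length: int) -> int:
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length


space_7mer = sequence_space(7)   # 1_280_000_000, the library's stated diversity
known_binders = 2                # SWQIGGN and NPMIRRQ
one_in = space_7mer / known_binders

print(f"7-mer sequence space: {space_7mer:.3g}")
print(f"at least 1 binder per {one_in:.3g} sequences")
```

Since the papers report further, weaker binders, the true frequency of this one function can only be higher than this count suggests.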

Next one, mRNA display selection of a high-affinity, Bcl-XL-specific binding peptide.

Abstract
Bcl-X(L), an antiapoptotic member of the Bcl-2 family, is a mitochondrial protein that inhibits activation of Bax and Bak, which commit the cell to apoptosis, and it therefore represents a potential target for drug discovery. Peptides have potential as therapeutic molecules because they can be designed to engage a larger portion of the target protein with higher specificity. In the present study, we selected 16-mer peptides that interact with Bcl-X(L) from random and degenerate peptide libraries using mRNA display. The selected peptides have sequence similarity with the Bcl-2 family BH3 domains, and one of them has higher affinity (IC(50)=0.9 microM) than Bak BH3 (IC(50)=11.8 microM) for Bcl-X(L) in vitro. We also found that GFP fusions of the selected peptides specifically interact with Bcl-X(L), localize in mitochondria, and induce cell death. Further, a chimeric molecule, in which the BH3 domain of Bak protein was replaced with a selected peptide, retained the ability to bind specifically to Bcl-X(L). These results demonstrate that this selected peptide specifically antagonizes the function of Bcl-X(L) and overcomes the effects of Bcl-X(L) in intact cells. We suggest that mRNA display is a powerful technique to identify peptide inhibitors with high affinity and specificity for disease-related proteins.

The paper is a bit unclear on the exact numbers here, but they do manage to find 19 functional peptides that bind Bcl-X(L) and induce apoptosis, in a library of at least 10^12 sequences (the total space for 16 amino acid peptides is roughly 6.6×10^20).
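To put those numbers side by side, here is a rough sketch of how little of the 16-mer space such a library covers. It takes the 10^12 lower bound at face value and assumes 20 amino acids per position. The abstract describes the libraries as "random and degenerate", so treat the hit rate as illustrative rather than as the density of this function among fully random sequences.

```python
from fractions import Fraction

# Assumptions: 20 amino acids per position; library size taken as exactly 10^12.
space_16mer = 20 ** 16      # all possible 16-residue peptides
library_size = 10 ** 12     # lower bound on the library size
hits = 19                   # functional peptides reported

coverage = Fraction(library_size, space_16mer)   # share of the space sampled
hit_rate = Fraction(hits, library_size)          # hits per library member

print(f"16-mer sequence space: {float(space_16mer):.2e}")
print(f"fraction of space sampled: {float(coverage):.2e}")
print(f"about 1 hit per {float(1 / hit_rate):.2e} library members")
```

The point of the comparison is that the hits turned up even though the library samples only around a billionth of the possible 16-mers.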

Next up is: Identification of Calmodulin Isoform-specific Binding Peptides from a Phage-displayed Random 22-mer Peptide Library

In this one they manage to isolate 80 unique, 22 amino acid-long peptides from a pool of about 4.8×10^8. That comes out at 1 in 6 million peptides that have the particular function searched for, out of a total sequence space of about 4×10^28 for 22-mers.
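The same kind of check works here. This sketch uses the rounded figures quoted above (4.8×10^8 screened, 80 unique binders) and again assumes 20 amino acids per position; it is only meant to show where the "1 in 6 million" and "4×10^28" come from.

```python
# Assumptions: rounded figures from the text, 20 amino acids per position.
space_22mer = 20 ** 22      # all possible 22-residue peptides
screened = 480_000_000      # about 4.8×10^8 peptides in the pool
binders = 80                # unique calmodulin-binding peptides isolated

one_in = screened // binders                 # 6_000_000
fraction_screened = screened / space_22mer   # share of the space actually tested

print(f"1 binder per {one_in:,} screened peptides")
print(f"22-mer sequence space: {float(space_22mer):.1e}")
print(f"fraction of space screened: {fraction_screened:.1e}")
```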

Much more can be said about the possibility of evolving larger proteins from smaller fragments (and smaller fragments from dimers and trimers) but that is beyond the scope of what I wanted to achieve in this post. Any criticism of my case here is welcome.

152 thoughts on “Axe, EN&W and protein sequence space (again, again, again)”

  1. phoodoo: It seems Otangelo has given Rumraket many substantive replies, so isn’t it a bit undignified for Rumraket to say he is going to ignore Otangelo?

    HAHa, that’s a good one.
    I took you off ignore earlier today, thank you for reminding me why you ended up there in the first place.

    I could not ask for a better demonstration that you are just a partisan hack. You will agree with the most insane nonsense imaginable just because it’s said by someone on “your” side.

  2. Mikkel wrote
    God could have created first life and let the rest evolve. As usual your logic is completely messed up.

    Well,

    if you agree on that, our views would be in perfect harmony ... because that is EXACTLY what I support.

    Special creation of limited variation with INBUILT mechanisms of adaptation (evolution), which has resulted in the biodiversity we see today.

  3. Mung: At the top of the page select New -> Post.

    Then post something in the Moderation Issues thread asking an admin to release it.

    ETA: this was not an attempt to derail the thread, but you can never tell where these things are going to go.

    Well, you would think anyone could start a thread, but apparently not. Alan says this thread title is not allowed:

    John Harshman think Nilsson Pilger’s fairytale on eye evolution is science.

  4. phoodoo,

    Here is the rest of the thread that Alan is censoring:

    In an early post John, who wants to be called doctor, urged readers to take a look at a little paper by Dan Nilsson and Suzanne Pilger. He says it is a good conceptual example of how natural selection acting on variation can gradually create a new feature.

    Gee, that must be quite a paper. He says it can’t be beat! So what does their paper actually show? The paper is called “A Pessimistic Estimate of the Time Required for an Eye to Evolve.” Well, that does sound interesting!

    A light sensitive patch will gradually turn into a focus lensed eye in 100,000 years. But of course they are using a pessimistic estimate, probably would happen even faster! Wow.

    In fact the paper is written in such a way as to make many people falsely believe that Nilsson and Pilger had come up with a computer model to show the steps of eye evolution, from “random” mutations. Now that is something.

    But because they didn’t ACTUALLY make any computer model (who has time) the task was made much easier for Nilsson and Pilger. Just take an eye, unfold all the pieces, and put them back together, one step at a time. POOF! (Yes Patrick, that is a poof!). But let’s make it even easier, because well, time, and let’s start with light sensitive cells. Then just poof in a depression where the light sensitive cells are. Next poof in another, deeper impression. That sure does help focus the light doesn’t it? Let’s keep that mutation!

    From there we can just keep poofing away. Fill the cavity with fluid. Poof in a lens. Make the lens better. Poof in some rods and cones. All at random mind you. Make them better…Keep those mutations (don’t let them mutate themselves, because they won’t help building an eye at all!) Rudyard Kipling would be so proud.

    This fable has already been passed down to generations. Take a look, these brilliant scientist doctors say, an eye is not so hard after all. If only Darwin had known Nilsson and Pilger he never would have had reason to doubt himself. It’s as easy as POOF!

    Of course John is apparently intimidated to engage with people who actually challenge him (that is why here is generally a safe space where the moderators can give him cover, but he still hides). So what do you think?

    Doesn’t the fact that Alan won’t let me post prove the point about the cover moderation gives some people?

  5. Why is no one interested in writing a program that randomly generates sequences and compares the randomly generated sequence to known proteins?

    Would it just take too long to find a match? But isn’t that the point? The “search space” is simply “hyper-astronomical.”

  6. phoodoo: Well, you would think anyone could start a thread, but apparently not. Alan says this thread title is not allowed:

    John Harshman think Nilsson Pilger’s fairytale on eye evolution is science.

    Try this one:

    A graduate of the University of Chicago with a Ph.D. in Evolutionary Biology thinks Nilsson Pilger’s fairytale on eye evolution is science.

  7. Mung: Why is no one interested in writing a program that randomly generates sequences and compares the randomly generated sequence to known proteins?

    Would it just take too long to find a match? But isn’t that the point? The “search space” is simply “hyper-astronomical.”

    Mung is getting forgetful. There were all those threads on Weasels that covered this, exhaustively, including providing code. Some of which was provided by Mung.

  8. But, we actually have databases now of protein sequences. Real data. We can get as many programs running simultaneously as we need. The search space is still “hyper-astronomical.”

    Or, if it pleases you, we could select a known sequence from the database and gradually change the sequence until we find that it matches another known protein.

    I’m flexible.

    The point is that we have real computers and a dataset that we can use to test our theories about the nature of protein sequence space.

  9. Mung:
    But, we actually have databases now of protein sequences. Real data. We can get as many programs running simultaneously as we need. The search space is still “hyper-astronomical.”

    Or, if it pleases you, we could select a known sequence from the database and gradually change the sequence until we find that it matches another known protein.

    I’m flexible.

    The point is that we have real computers and a dataset that we can use to test our theories about the nature of protein sequence space.

    What is it you imagine this research program might accomplish? It seems that all you would manage to show is that sequence space is very much larger than the number of known proteins. You don’t need a simulation or a database for that. All you need is a count of proteins and a calculation of the size of sequence space.

    Beyond that, you would show nothing useful. What you need to know is the ratio of sequences that perform some function to the total. But known proteins are not the total of all possible functional proteins. They’re just what we happened to get. It’s Texas sharpshooter all the way.

  10. Mung,

    Why is no one interested in writing a program that randomly generates sequences and compares the randomly generated sequence to known proteins?

    Would it just take too long to find a match? But isn’t that the point? The “search space” is simply “hyper-astronomical.”

    It would be utterly pointless. The ‘search space’ of bridge hands is pretty big. I could, if I were really retarded, write a program that eventually declared victory when it hit the hand just dealt. Yet bridge hands are routinely dealt.

    It’s not the size of the space, but the density of function that matters. Every bridge hand is functional, of course. But there are not simply two alternatives: everything-functional vs 1-single-target. I feel I am insulting my audience by having to spell this out in words.

  11. Mung,

    I can imagine these people who like to call themselves doctors, who are not medical doctors, being on a plane when a medical emergency arises. The stewardess calls out, “Are there any doctors on the plane?” John goes rushing to the front of the plane, “What can I help you with?”

    The stewardess points to the man having a seizure on the floor, and John says “Well, you know, birds fly better with one ovary than with two.”

    Then Lizzie rushes to the front of the plane, “What is it?” She sees the man on the ground and says, “His groans remind me of Dmitri Hvorostovsky in his prime.”

    Joe Felsenstein then arrives and says “Get back, get back, I am going to make an algorithm about computer generated tape worms, that may or may not have some relevance to reality, but have a smooth landscape”

    DNA Jock pushes everyone back and announces, “There is nothing that can be done. I have done the math and it’s all God’s fault. If there was a God. Which there isn’t of course, because I have done the math. Tom English says Hi by the way. He says IDists are all assholes.”

  12. Allan Miller: It’s not the size of the space, but the density of function that matters.

    I’m assuming that the proteins in our protein databases are in fact functional. Is that a bad assumption?

  13. John Harshman: But known proteins are not the total of all possible functional proteins. They’re just what we happened to get. It’s Texas sharpshooter all the way.

    Being from Texas, I ought to know what the fallacy consists of, but I don’t. So you, as a critic of Axe, resort to an argument from what we don’t know? Is that a fallacy?

    I just love it how the evidence that evolutionists need is nowhere to be found. And if they don’t have the evidence they need to support their speculative theories, they shift the burden of proof.

    Then they complain about being asked for details. Or actual probability calculations.

    But even so, when you start to really calculate the size of protein sequence space, it is beyond imaginable. Proteins of length 150AA are just a drop in the proverbial bucket.

    Given time, and resources, how much of that “hyper-astronomical” space could even be searched? Forget whether the search even turned up a functional protein.

  14. Mung,

    I’m assuming that the proteins in our protein databases are in fact functional. Is that a bad assumption?

    Not at all.

  15. Mung,

    Given time, and resources, how much of that “hyper-astronomical” space could even be searched? Forget whether the search even turned up a functional protein.

    Is it your opinion that there is a minimum length of peptide below which no function of any kind can be found? If so, what is that length?

  16. Allan Miller: Every bridge hand is functional, of course. But there are not simply two alternatives: everything-functional vs 1-single-target. I feel I am insulting my audience by having to spell this out in words.

    You are insulting your audience. Every bridge hand is functional. By analogy [your analogy, not mine], every AA sequence is functional. We have some reason to believe that every bridge hand is functional. What reason do we have to believe that every AA sequence is functional?

    Do let us know when you have more time, I can start an OP on Compositional Evolution.

  17. Mung,

    Oh, fer chrissakes. You are trying to score a point by repeating something I already said. Yes, every bridge hand is functional. No, not every possible peptide is functional. Does that mean there is only one functional peptide? What is it with you people and dichotomies?

  18. I’ve just re-read: how could you possibly think I was suggesting that the density of functional peptides is the same as the density of functional bridge hands?

  19. Allan Miller: Is it your opinion that there is a minimum length of peptide below which no function of any kind can be found? If so, what is that length?

    No, not at all. It’s about the size of the search space and how much of it could reasonably be searched given the available time and resources.

    But if I look in our protein databases, how many peptide sequences of length one will I find? Next we have 2^20 then 3^20 then 4^20. [Did I muck that up, lol. After calling out Nick Matzke for getting it wrong? probably :)]

    Or, if you don’t like 20AA at each location, what is your theory of the earliest coding apparatus and the amino acids it coded for? 2^6, 3^6, 4^6, 150^6?

  20. How to create an ‘impossible peptide’: change the STOP codon into an amino acid assignment. Bingo. It has a tail, and the chance of that peptide arising by random picks from an amino acid bingo machine (‘cos of course that’s how peptides are really made) increase as-ter-o-gnomically.

    There aren’t enough quarks to carry that number. Never mind that, there aren’t enough quarks in a mole of universes to signify that number. Imagine you were … OK, It’s big. Fucking big.

  21. Mung,

    What is it with “you people” and your lame analogies?

    One of many devices used to attempt to illustrate matters to the determinedly unillustratable.

  22. Mung,

    Or, if you don’t like 20AA at each location, what is your theory of the earliest coding apparatus and the amino acids it coded for? 2^6, 3^6, 4^6, 150^6?

    I think initial ribosomal peptide bonding could easily be achieved with an amino acid library of 1.

  23. Mung: Being from Texas, I ought to know what the fallacy consists of, but I don’t.

    You point a shotgun at the side of a barn and let fly. Then you paint a bullseye around the spot where each pellet hit and announce that you have performed a miracle of sharpshooting. And this is what you do when you assume that the proteins we see are equivalent to all possible functional proteins.

    And once more, your research program is useless because the result can be known in advance to anyone who cares to perform a simple calculation.

  24. Allan Miller: I’ve just re-read: how could you possibly think I was suggesting that the density of functional peptides is the same as the density of functional bridge hands?

    Every bridge hand is functional…

    You offered that by way of analogy. Perhaps I misunderstood your analogy.

    In bridge, every hand is functional, but every protein sequence is not functional, so it’s not all or nothing. I can’t even make sense of that to be honest. Why even introduce bridge hands at all?

    And why accuse me of thinking there is only one target when that’s rather obviously NOT what I was saying?

    But there are not simply two alternatives: everything-functional vs 1-single-target.

    I’m talking about an entire database of functional proteins and you talk as if I am speaking of “1-single-target.”

  25. John Harshman: And this is what you do when you assume that the proteins we see are equivalent to all possible functional proteins.

    I never claimed that the proteins in our databases exhaust all possible functional proteins. Do you understand that?

    ETA: Nor did I assume it.

  26. Mung,

    I’m talking about an entire database of functional proteins and you talk as if I am speaking of “1-single-target.”

    No, of course, you reject that characterisation. That was rather the point of the game. “of course I’m not saying the space has one functional member. But I am saying (without realising) that the space just has these functional members”. The rest is dead air, and it would take yadda yadda yadda.

  27. Allan Miller: But I am saying (without realising) that the space just has these functional members…

    I neither claimed nor assumed that the proteins in our databases exhaust all possible functional proteins. Do you understand that?

    Why the HELL would I claim or assume that?

  28. Mung: I neither claimed nor assumed that the proteins in our databases exhaust all possible functional proteins. Do you understand that?

    Why the HELL would I claim or assume that?

    Are you familiar with the phrase “nailing jello to the wall”?

  29. Mung,

    I neither claimed nor assumed that the proteins in our databases exhaust all possible functional proteins. Do you understand that?

    Why the HELL would I claim or assume that?

    Not knowingly. But if you think there is some merit or relevance in writing a computer program that comes up with one of them, that is an implicit assumption behind the exercise. No other proteins will do; it has to be one of these. Though you agree that functional proteins aren’t restricted to the members currently represented, the time taken to do it again is only relevant if there is nothing else.

    So far, we have come to an agreement that the density of functional proteins in protein space is somewhat less than 100%, but the functional subset definitely contains more than one member!

    Searching for a protein that exists by random pick is indeed analogous to searching for bridge hands that have been dealt. The fact that bridge hands are all functional is beside the point. We aren’t searching for a functional sequence, we are searching for a given sequence, which happens to be functional. The fact that you say there is a whole library of given sequences doesn’t make it any more useful – the bridge program could be extended to stop when it hits any real hand that has been played in any game, ever. That’s a subset of the overall space, and we’re looking for a member of it. Just as in your protein model. Tells us nothing about functional density – the chance of hitting a ‘well-formed’ string by random pick.

    Though I don’t even know what biological or chemical process people think is being modelled by this ‘Hoyle-o-matic’ method of looking at protein space.

  30. Mung:
    But, we actually have databases now of protein sequences. Real data. We can get as many programs running simultaneously as we need. The search space is still “hyper-astronomical.”

    Or, if it pleases you, we could select a known sequence from the database and gradually change the sequence until we find that it matches another known protein.

    I’m flexible.

    The point is that we have real computers and a dataset that we can use to test our theories about the nature of protein sequence space.

    What would be the point? You still haven’t answered this question.

    So we write a program that randomly generates sequences, then compare them to the database. Why? What would that tell us Mung?

  31. Mung: I just love it how the evidence that evolutionists need is nowhere to be found. And if they don’t have the evidence they need to support their speculative theories, they shift the burden of proof.

    The evidence for what? What evidence is it that we need, which we don’t have? Why is that evidence needed? Do you even know where you are going with this?

  32. Mung: Given time, and resources, how much of that “hyper-astronomical” space could even be searched? Forget whether the search even turned up a functional protein.

    An infinitesimal fraction.

    So what? Please spell out the logic here.

  33. Mung: I never claimed that the proteins in our databases exhaust all possible functional proteins. Do you understand that?

    ETA: Nor did I assume it.

    Then what the FUCK is the point of comparing randomly generated sequences to the ones in the database?

    What would that tell us Mung? Have you even thought about that at all? Do you even know what question you are trying to address and can you explain how it relates to the possibility of evolution?

    Do you have any clue?

  34. Rumraket: Do you even know what question you are trying to address and can you explain how it relates to the possibility of evolution?

    Mung accepts evolution in its entirety. He even has the books, 17 of them from a single author no less! He’s a TE. God is in there somewhere, at some level. All this is just about Mung finding a crack he can wedge his god into so he can sleep at night, being torn between undeniable reality and myth as he is.

  35. otangelo: good question. I was waiting for it.

    I’m glad. But the first cell was not the origin of life, was it? Unless your position is that something with 205 genes was not alive and something with 206 is? Your proto-cell had a precursor? Or did your intelligent designer create it in a single fell swoop? Is that your position?

    It might be an interesting discussion – the first cell and how Intelligent Design explains its existence. Why don’t you start an OP?

  36. John Harshman:

    Mung: I neither claimed nor assumed that the proteins in our databases exhaust all possible functional proteins. Do you understand that?

    Why the HELL would I claim or assume that?

    Are you familiar with the phrase “nailing jello to the wall”?

    When you use a database of present-day sequences to identify those sequences that have function you necessarily assume that all other sequences don’t have function.

    (Or maybe you had some other mysterious criterion that somehow used the database. Getting you to tell us what that might be is like nailing jello to a wall.)

  37. Rumraket,

    1. Axe only tested for one function, not all possible functions.
    There, done. At a basic level, this would be enough. I could stop here and you should immediately disbelieve Axe’s conclusion. It cannot possibly be extracted from the experiment he did.

    First, thanks for the well researched op. I have been out of town and not had the chance to read it as carefully as I would like.

    This criticism of the ID position is valid: Axe’s experiment is not a knockout punch for all protein evolution.

    The case it does make is that for certain specific protein designs functional sequences can be quite rare.

    The question is does the construction of new living organisms require rare functional sequences?

  38. OMagain: God is in there somewhere, at some level. All this is just about Mung finding a crack he can wedge his god into…

    I’m sorry, but this is false. God upholds all things in their existence. There are no “cracks” where God can be inserted.

  39. colewd: The question is does the construction of new living organisms require rare functional sequences?

    Okay, what is the answer to that question then?

    Could you pick two closely related organisms, like humans and chimp, and point out a new rare functional sequence that evolution would have to produce?

  40. colewd:
    Rumraket,

    First, thanks for the well researched op. I have been out of town and not had the chance to read it as carefully as I would like.

    This criticism of the ID position is valid: Axe’s experiment is not a knockout punch for all protein evolution.

    The case it does make is that for certain specific protein designs functional sequences can be quite rare.

    The question is does the construction of new living organisms require rare functional sequences?

    The point is that you and all the rest of “IDists” have been lied to, repeatedly, about the results and implications of Axe’s experiment. Anyone with just a bit of critical thinking left in him would be mad at them, but not you. Why?

    And I don’t even think Axe’s work demonstrates rare proteins are required for evolution, and even if that was true, the sequence space could be chock full with rare protein functions that make it easy for evolution to explore a vast ocean of functionality, so there really is nothing going on for Axe here as far as I can tell.

  41. dazz: And I don’t even think Axe’s work demonstrates rare proteins are required for evolution, and even if that was true, the sequence space could be chock full with rare protein functions that make it easy for evolution to explore a vast ocean of functionality, so there really is nothing going on for Axe here as far as I can tell.

    Yes, we can imagine all sorts of things. And we can always appeal to our ignorance.

  42. Mung: Yes, we can imagine all sorts of things. And we can always appeal to our ignorance.

    Funny how you would completely miss the point and then go on to mock me for what you guys do all the time. I know it pisses you off to see your new ID poster boy torn to pieces, but try to be a bit less pathetic in your lame comebacks

  43. Mung: Yes, we can imagine all sorts of things. And we can always appeal to our ignorance.

    Yeah it’s not like direct empirical tests tell us there’s tonnes of function out there, and it’s not at all implied by the interconnectedness of the proteins known from life.

    No, we should still retain an irrational level of skepticism in the vain hope that we’ve somehow, in some way nobody can actually demonstrate (or even articulate), got it all wrong and a God was required to set it all up to begin with.

    The people who still wallow around in this delusion have a hypocritical double standard when it comes to evidence and inference. The evidence for evolution can never be good enough, somehow no inference can ever be sufficiently validated because there’s still some obscure technical detail out there we don’t know with great certainty. So there’s always room for more God.

    But nooo, it’s us who are the ones imagining things, it’s us who appeal to ignorance? What poppycock.

  44. Mung,
    Then what exactly is your problem? If you are not trying to insert your god into a gap then why don’t you publish a paper with your criticisms of whatever it is you don’t understand today? Then we can all have a laugh and that’s that.

    Why the HELL would I claim or assume that?

    All this confusion can be avoided if you simply state what your actual point is in plain language.

  45. OMagain: Then what exactly is your problem?

    I don’t have a problem. What is YOUR problem? Your strategy appears to be “let’s say something false and see if Mung complains.” Then if I tell you you’re wrong, I am the one to blame.

    If your God is so powerful, why are there no signs of His passing by?
