Critical Analysis of a Paper on Evolution by Frameshift Translation

Ohno’s 1984 Proceedings of the National Academy of Sciences paper put forward the hypothesis that frameshift mutations can create novel proteins and illustrated his claim with the supposed evolution of nylon-eating bacteria. Several researchers cite Ohno’s 1984 paper favorably, including Dennis Venema, Ken Miller and our very own Arlin Stoltzfus of TSZ. Unfortunately, Ohno’s 1984 hypothesis, as far as nylon-eating bacteria are concerned, is dead wrong.

Here is another paper that also cited Ohno’s 1984 hypothesis favorably. This paper may or may not hold promise as it claims to have found 470 frameshift translations in the human genome.

Frequent appearance of novel protein-coding sequences by frameshift translation

Now, just going through the first few examples of frameshifts in the paper, when I actually went to the NIH GenBank to look up the examples, I got messages like this for the very first “example”:

NCBI Reference Sequence: NM_207478.1 (click to see this obsolete version)

Record removed. NM_207478.1 was permanently suppressed because currently there is insufficient support for the transcript and the protein.


and then the third “example”:

NCBI Reference Sequence: NM_022085.3 (click to see this obsolete version)

Record removed. NM_022085.3: This RefSeq was permanently suppressed because currently there is insufficient support for the transcript and the protein.

and then the fourth “example”

NCBI Reference Sequence: NM_023939.3 (click to see this obsolete version)

Record removed. NM_023939.3 was permanently suppressed because currently there is support for the transcript but not for the protein.

What the heck does this imply about the claims of the paper? Does it rely on dubious data? Is it possible after the paper was published, that these errors were found and corrected in the databases, hence the paper is making claims on now obsolete premises? I mean a lot can happen in 12 years since the paper’s publication in 2006.

If these “proteins” were predicted rather than verified to begin with, then maybe nothing got frameshifted in the first place. The error would then be in the gene prediction, and no real frameshift translation was actually discovered.

That’s not to say there aren’t proteins generated from alternative reading frames. There are such proteins. But this paper claims some sort of gene duplication followed by an alternative reading frame translation. That’s not completely outrageous given we have proteins coded by alternative reading frames, but I just want to try to work through what this paper is saying vs. what the data is saying.
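To make the idea of an alternative reading frame concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `translate` and the toy sequence are hypothetical and not taken from the paper, and the codon table is the standard genetic code. It shows how the same stretch of DNA yields an entirely different peptide when read one or two bases out of frame.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Stop codons are shown as '*'; codons containing anything other than
    A, C, G or T are shown as 'X'. Trailing bases that do not fill a
    complete codon are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical toy sequence, chosen only for illustration.
toy = "ATGGCCAAGTGA"
for frame in range(3):
    print(frame, translate(toy, frame))
```

Shifting the frame by a single base changes every downstream codon, which is why a predicted frameshifted protein depends so heavily on the underlying gene model being right.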

Thanks to all in advance who decide to participate.

76 thoughts on “Critical Analysis of a Paper on Evolution by Frameshift Translation”

  1. Sal asks:

    Does it rely on dubious data?

    I’m reminded of this thread (I miss Alan Miller) where there was a discussion regarding data sampling and the possibility that sea louse data might be contaminated by their gut contents consisting of salmon flesh (their exclusive diet).

  2. Edit: accidentally hit enter too soon.

    I just scrolled through the Supplementary table and looked up four accessions at “random” in RefSeq. None of them had been removed and all had been manually curated.

    Sal, if this post has a purpose beyond noting that RefSeq gene predictions sometimes get updated or corrected when new information becomes available, you should probably look at more than 1% of frameshifted proteins predicted by that paper.

  3. What the heck does this imply about the claims of the paper? Does it rely on dubious data?

    Invariably some older data will be overturned. I think that comes with the territory.

    Is it possible after the paper was published, that these errors were found and corrected in the databases, hence the paper is making claims on now obsolete premises? I mean a lot can happen in 12 years since the paper’s publication in 2006.

    Even without looking into the specifics that seems entirely plausible to me. As more data is collected, other labs attempt to replicate previous results, and gene annotation and prediction methods improve, the picture can substantially change over time.

    The reverse seems to be what happened with the idea of de novo evolution of ORFan genes from noncoding regions. At first it was thought, apparently based on theoretical assumptions never really tested, that an ORFan protein-coding gene emerging by accumulating mutations in non-coding DNA would be so unlikely as to be insignificant even on evolutionary timescales. Yet as sequences from diverse species of organisms were collected, comparative genomics increasingly indicated that the opposite was the case, and today the emerging view, buttressed by various types of experiments (for example, that random sequences quickly evolve into functional transcription promoters), is that this has been a rather stable long-term source of new lineage-specific (at pretty much all taxonomic levels) protein-coding genes, at least in eukaryote evolution.

  4. Alan Fox,

    Why? The merger of a bacterium and an Archean cell is a very plausible hypothesis.

    This is not really relevant and does not really explain eukaryotic cells origin anyway. The issue is the much higher level of transcription complexity in eukaryotes that prokaryotic architectures don’t have.

  5. colewd:
    Alan Fox,
    This is not really relevant and does not really explain eukaryotic cells origin anyway. The issue is the much higher level of transcription complexity in eukaryotes that prokaryotic architectures don’t have.

    I have done the experiment of predicting binding sites in eukaryotic DNA, and the small amount of information of eukaryotic promoter sequences makes them easier to mistake for useless regions, easier for transcription factors to start spurious transcriptions (start transcription at the wrong positions), and therefore easier for random sequences to evolve into promoters.

  6. @Bill Cole
    Experiments have been done on eukaryotic transcription, that show basically the same thing. Pervasive transcription of even random DNA is ubiquitous even for the eukaryotic transcriptional machinery. See for example this: Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks.

    Here’s a blog about it: Finding function in the genome with a null hypothesis.

    “To test DNA for function, we used a new technique to measure whether a piece of DNA can regulate a downstream gene (a barcoded DsRed reporter gene). One way to define functional DNA in the context of this experiment is ‘any piece of DNA that reproducibly regulates the reporter gene.’

    We tested about 2,000 native sequences from the genome (more about that in my next post), and, as a negative control, we also tested random DNAs, DNAs created by scrambling the sequences of genomic DNA.

    It turns out that most of the 1,300 random DNA sequences cause reproducible regulatory effects on the reporter gene. You can see this in these results from 620 random DNA sequences below, in what I call a Tie Fighter plot:”

  7. Dave Carlson:

    I just scrolled through the Supplementary table and looked up four accessions at “random” in RefSeq. None of them had been removed and all had been manually curated.

    Thanks!

    Can a record be suppressed in GenBank and still be in RefSeq? Anyway, out of respect for the authors, I’m planning on contacting them, but I want to work through the paper a little more carefully first.

  8. stcordova,

    Can a record be suppressed in GenBank and still in RefSeq?

    I don’t know for sure, but I suspect not. Especially since RefSeq is, more-or-less, a more highly curated subset of the information available in GenBank.

  9. Dave Carlson:
    stcordova,

    I don’t know for sure, but I suspect not. Especially since RefSeq is, more-or-less, a more highly curated subset of the information available in GenBank.

    Thanks for looking into the data.

    I was debating whether I should slog through all 1000 entries in table S2. I might get a sample and then point out some issues to Okamura et al. as I try to contact them.

  10. CONFESSION:

    I really don’t know much about RefSeq, I know more about GenBank!!

    I tried putting in “NM_207478.1” in the RefSeq website and got an error message. Was that the right approach? Sorry for the elementary question.

    I highlight the box where I attempted to put an accession number. Is that the right way to do it?

    It seems I can access some RefSeq data by putting in an accession number into GenBank. From the RefSeq handbook:

    https://www.ncbi.nlm.nih.gov/books/NBK21091/

    Complete genome data for viruses, organelles, prokaryotes, and some eukaryotes is propagated to RefSeq records from the whole genome sequence data and annotation available in GenBank (also in the ENA and DDBJ public archives). Generally, an initial validation step is performed before the RefSeq record is made public. The resulting RefSeq record is a copy of the GenBank submission but may contain some additional annotations as a result of the validation step. In particular, transcripts are provided as separate RefSeq records for most eukaryotic organisms; the GenBank submission of the genome sequence from which the RefSeq record is propagated instantiates the protein only, not the transcript.

  11. stcordova,

    The problem is that you’re looking into a book about RefSeq, not in the database. The RefSeq ID, if it exists, will be indicated in the entry found through any database in NCBI. I found it only in the probe database though, and did not feel like reading through all the entries with “NM_207478” in them (I ignored the “.1” because that’s the version of the entry, there could already be a “.2” or “.3”, etc.).
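    The point about ignoring the “.1” can be sketched in a few lines of Python. This is only an illustration: the helper name `split_accession` is hypothetical, and the sketch assumes the usual `ACCESSION.VERSION` layout, in which the part after the dot is an integer version number.

```python
def split_accession(accession):
    """Split 'NM_207478.1' into ('NM_207478', 1).

    Returns None for the version when no numeric version suffix is present.
    """
    base, dot, version = accession.strip().partition(".")
    has_version = bool(dot) and version.isdigit()
    return base, (int(version) if has_version else None)


print(split_accession("NM_207478.1"))
print(split_accession("NM_207478"))
```

    Searching on the base accession alone should turn up whichever versions of the record exist, which is the reason for dropping the suffix.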

  12. Rumraket,

    It turns out that most of the 1,300 random DNA sequences cause reproducible regulatory effects on the reporter gene.

    And so?

  13. Entropy:
    stcordova,

    The problem is that you’re looking into a book about RefSeq, not in the database. The RefSeq ID, if it exists, will be indicated in the entry found through any database in NCBI. I found it only in the probe database though, and did not feel like reading through all the entries with “NM_207478” in them (I ignored the “.1” because that’s the version of the entry, there could already be a “.2” or “.3”, etc.).

    Thanks

  14. colewd: And so?

    So by implication, it is not all that difficult to initiate transcription of eukaryotic DNA as even completely unrelated (as in dissimilar) DNA sequences from canonical promoter regions will still reproducibly recruit transcription factors and initiate transcription. And further, that mutations in such dissimilar DNA sequences that make the sequence more similar to the canonical promoter will generally cause higher levels of transcription.

  15. I’ve tentatively decided to go through a large portion, if not all, of the list and then send it to Okamura et al. Long sequences that lack stop codons in all 3 alternative reading frames in either the sense or antisense direction are of interest to myself, John Sanford, and Siegfried Scherer, etc. Okamura’s list potentially has some of these examples.

    In exploring Ohno’s 1984 paper, John Sanford found Yomo’s 1992 paper that highlighted anti-sense strands that lacked stop codons. John asked me to look into these….
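    A minimal sketch of that kind of check, in Python, might look like the following. It is an assumption-laden illustration and not the procedure used in any of the papers mentioned: the function names are hypothetical, only the three standard stop codons (TAA, TAG, TGA) are considered, and each strand is simply scanned in its three frames.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the antisense strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def has_stop(dna, frame):
    """True if reading `dna` from offset `frame` hits a stop codon."""
    dna = dna.upper()
    return any(
        dna[i:i + 3] in STOP_CODONS
        for i in range(frame, len(dna) - 2, 3)
    )


def stop_free_frames(dna):
    """Map (strand, frame) to True when that frame contains no stop codon."""
    strands = {"sense": dna, "antisense": reverse_complement(dna)}
    return {
        (name, frame): not has_stop(seq, frame)
        for name, seq in strands.items()
        for frame in range(3)
    }


# Hypothetical toy sequence, chosen only for illustration.
for key, is_open in stop_free_frames("ATGTAAGCC").items():
    print(key, is_open)
```

    A sequence is a candidate of the kind described above when all three frames of a given strand come back True. On short sequences that happens easily by chance, so the check is only meaningful for long stretches.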

  16. Rumraket,

    And further, that mutations in such dissimilar DNA sequences that make the sequence more similar to the canonical promoter will generally cause higher levels of transcription.

    Fitness is tied to transcriptional regulation and not higher levels.

  17. colewd: Fitness is tied to transcriptional regulation and not higher levels.

    Sure, not that that changes anything. Mutations can make DNA sequences more, or less similar with whatever changes in expression levels up or down that comes with such changes.

  18. Rumraket,

    Sure, not that that changes anything. Mutations can make DNA sequences more, or less similar with whatever changes in expression levels up or down that comes with such changes.

    It puts a pretty high burden in your court trying to explain eukaryotic cellular regulation with a step by step trial and error process.

  19. colewd:
    It puts a pretty high burden in your court trying to explain eukaryotic cellular regulation with a step by step trial and error process.

    It doesn’t look that way at all. If evolutionary changes are easy to come by, then it’s easier to understand the evolution of eukaryotic regulation. Even better, scientists have been examining eukaryotic genomes, and they’re reaching the conclusion that eukaryotic evolution is better explained in terms of changes in regulation than gene content. Of course, both play roles, but regulation seems to be much more important. That matches very well with the plasticity that those short-promoter-sequences seem to allow.

    Your court doesn’t have anything going for it, other than poor philosophy and poor, misinformed, science and conceptual frameworks.

  20. colewd: It puts a pretty high burden in your court trying to explain eukaryotic cellular regulation with a step by step trial and error process.

    And that burden was met, as all lines of evidence converge on the conclusion that regulation of eukaryotic gene expression is an easily evolvable phenomenon. All the types of evidence we should expect to see if eukaryotic gene expression can, and did in the past, evolve by accumulation of mutations have been demonstrated to exist. The pattern we expect to see from comparative genetics is in fact seen (one merely has to compare genomes from sufficiently closely related species to see the various incremental stages in the emergence of ORFan genes from noncoding regions), ancestral states can be inferred with phylogenetic methods and tested for function in the lab (the work by Thornton and colleagues has been monumental here), and direct experiments on the mechanisms and process of expression demonstrate how little it really takes for random DNA to reliably recruit transcription factors and express downstream genes (as linked above). The phenomenon is so ubiquitous and widespread it is now commonly known as pervasive transcription.

    At this stage, refusal to accept that eukaryotic gene expression is easily evolvable is patently irrational. It’s pretty much like as Gould said, it would be perverse to deny it.

  21. Rumraket,

    At this stage, refusal to accept that eukaryotic gene expression is easily evolvable is patently irrational. It’s pretty much like as Gould said, it would be perverse to deny it.

    So you think the problem of multicellular control systems is only gene expression?

  22. colewd: So you think the problem of multicellular control systems is only gene expression?

    I have no idea what you’re asking me. What is “the problem of multicellular control systems”?

  23. colewd:
    Rumraket,
    So you think the problem of multicellular control systems is only gene expression?

    The theme was the evolution of gene expression Bill. It started with some experiments in prokaryotes about the evolution of transcription factor binding sites (promoters, for example), you said that it would not be as easy for eukaryotes because the transcriptional machinery was much more complicated, so we show you that there’s no problem at all with eukaryotic promoters, and you come back with this?

  24. Entropy,

    Fitness is tied to transcriptional regulation and not higher levels.

    Here is the subject and it’s bigger than gene expression.

  25. colewd: Fitness is tied to transcriptional regulation and not higher levels.

    Here is the subject and it’s bigger than gene expression.

    As usual, your point is opaque. Transcriptional regulation refers to the regulation of levels of gene expression in different tissues at different times. So yes, higher (and lower) levels. And yes, gene expression. What are you trying to say?

  26. John Harshman,

    Transcriptional regulation refers to the regulation of levels of gene expression in different tissues at different times. So yes, higher (and lower) levels. And yes, gene expression. What are you trying to say?

    Why higher and lower levels? Why regulation and not just random gene expression?

  27. Rumraket,

    Bill your questions don’t make sense. What are you trying to say?

    Do you understand that eukaryotic cells of multicellular organisms have variable regulation depending on the age of the animal?

    Rum, this system is not one that can start out sloppy and get better over time.

    One example is cell division which has to be very rapid during embryo development and then slow down to adulthood. The immune system needs to be dormant during embryo development but active in adulthood.

    This process is mission critical for multicellular organisms. Part of what Salvador is talking about is protein tagging which is part of the regulation process. This is a very complex irreducibly complex system which is much more complex than anything that Behe has brought forward.

  28. colewd: Entropy,

    Fitness is tied to transcriptional regulation and not higher levels.

    Here is the subject and it’s bigger than gene expression.

    I suspect that you don’t know what transcriptional regulation means, because that’s regulation of gene expression, and it includes levels of gene expression.

  29. Entropy,

    I suspect that you don’t know what transcriptional regulation means, because that’s regulation of gene expression, and it includes levels of gene expression.

    And?

  30. colewd:
    Entropy,
    And?

    Nothing Bill. Nothing. Your attention span is not long enough for us to keep a consistent conversation. So I’m not going to try to reconstruct it for you, again.

    ETA: The statements you made were contradictory because you didn’t know that transcriptional regulation is about gene expression, but it doesn’t matter. This conversation won’t work because you don’t seem very inclined to read carefully enough.

  31. colewd: Do you understand that eukaryotic cells of multicellular organisms have variable regulation depending on the age of the animal?

    Yes. Regulation of genes is also tied to the cell cycle, and environmental factors. Why do you think this makes evolution of gene regulation a problem of such a magnitude that we should believe it is problematic for multicellular eukaryotic evolution?

    I also must note that you keep moving the goalposts. To begin with you merely declared (contrary to all evidence) that there would be a huge difference in difficulty between evolving effective promoters in eukaryotes vs prokaryotes.

    Now you’ve moved on to blather about the developmental timings during multicellular eukaryotic life-cycles as your new go-to for why we should think there’s a Big Problem involved.

    Rum, this system is not one that can start out sloppy and get better over time.

    What exactly is “this system”? Be specific. What do you mean by sloppy? Who says they start out “sloppy”, as opposed to merely starting out at lower levels of expression?

    One example is cell division which has to be very rapid during embryo development and then slow down to adulthood. The immune system needs to be dormant during embryo development but active in adulthood.

    So why can’t the evolution of a new gene through an accumulation of mutations in noncoding DNA happen in multicellular eukaryotes?

    Why do we have so much evidence that this happens?

    Think about this hypothetical situation: There’s a noncoding region somewhere in the genome of a multicellular eukaryote. A mutation brings the transcription of a downstream locus up to a level visible to selection. This transcribed locus now has the potential to negatively or positively affect fitness of the carrier organism.

    But we should think this is a problem of such significance that it undermines evolution as a process of generating such genes from noncoding DNA because… what?

    This process is mission critical for multicellular organisms.

    What does “mission critical” mean?

    You’re just blurting out vague generalizations. One could effectively agree with all these vague generalizations without having to agree with your apparent belief that multicellular eukaryotes, or regulation of genes involved in multicellularity, could not have evolved.

    Part of what Salvador is talking about is protein tagging which is part of the regulation process. This is a very complex irreducibly complex system which is much more complex than anything that Behe has brought forward.

    Oh my it’s so complex, man it’s so complex and irreducibly so. OOooh it’s so complex, more complex that previously already very complex irreducibly complex complexes of complexificating complexities.

    And… so what?

  32. Rumraket,

    I also must note that you keep moving the goalposts. To begin with you merely declared (contrary to all evidence) that there would be a huge difference in difficulty between evolving effective promoters in eukaryotes vs prokaryotes.

    Rum, there is a very big difference as multicellular organisms require controlled variable cell division. A promoter protein in a eukaryotic cell is controlled by feedback where it is targeted for destruction in order to control cell division. This is part of the function of the ubiquitin system. Without this system fully in place multicellular organisms are not possible. This part of the system Sal is talking about.

    What exactly is “this system”? Be specific. What do you mean by sloppy? Who says they start out “sloppy”, as opposed to merely starting out at lower levels of expression?

    Cell division rates must be tightly controlled depending on where the animal is in development. This is a truly irreducibly complex system. There is no step by step path to it.

    Oh my it’s so complex, man it’s so complex and irreducibly so. OOooh it’s so complex, more complex that previously already very complex irreducibly complex complexes of complexificating complexities.

    There are too many interdependent parts that need to function together to make an animal life cycle work. This is exceedingly unlikely to be the result of a trial and error process, as variable cell division must be achieved to even start the process.

  33. colewd: Cell division rates must be tightly controlled depending on where the animal is in development. This is a truly irreducibly complex system. There is no step by step path to it.

    *cough*Dictyostelium*cough*

  34. colewd:
    Rumraket,

    Rum, there is a very big difference as multicellular organisms require controlled variable cell division.

    Ahh, so there’s a Very Big Difference. Please show that this is actually a problem!

    A promoter protein in a eukaryotic cell is controlled by feedback where it is targeted for destruction in order to control cell division.

    What the hell does that even mean? Promoters aren’t targeted for destruction. Organisms don’t delete their own genes during development. They up or downregulate particular genes. Why can’t a developmentally timed promoter evolve?

    This is part of the function of the ubiquitin system. Without this system fully in place multicellular organisms are not possible.

    First, how do you even know that? How do you know what is possible with respect to multicellular organisms?

    Second, suppose a ubiquitylation system is necessary for multicellular organisms: why can’t that evolve, and why do we have evidence that it actually did?

    Cell division rates must be tightly controlled depending on where the animal is in development.

    And why can’t that evolve?

    This is a truly irreducibly complex system.

    Evolutionary theory predicts that irreducibly complex structures with mutually dependent components will evolve.

    There is no step by step path to it.

    That is an amazing claim for which you must surely have copious amounts of empirical evidence. You wouldn’t just be saying stuff without actually knowing whether there are any pathways to them would you? So, how do you know that?

    There are too many interdependent parts that need to function together to make an animal life cycle work.

    How do you know there are “too many” of those for it to evolve? As your math teacher would say, show your work.

    This is exceedingly unlikely to be the result of a trial and error process, as variable cell division must be achieved to even start the process.

    Please show your work.

  35. Rumraket,

    I am going to take a break here. When you say promoters don’t get targeted for destruction it is obvious you need more study time. Do some research on the WNT beta catenin pathway. Would enjoy discussing it further soon.:-)

  36. colewd:
    Rumraket,

    I am going to take a break here. When you say promoters don’t get targeted for destruction it is obvious you need more study time. Do some research on the WNT beta catenin pathway. Would enjoy discussing it further soon.:-)

    Promoters are nucleotide sequences that occur near the transcription start site of genes. It would be very surprising indeed if they were targets for destruction via the ubiquitin system.

  37. colewd:
    Rumraket,
    Here is an overview of the ubiquitin system that targets the promoter beta catenin for destruction.
    https://uncommondescent.com/intelligent-design/the-ubiquitin-system-functional-complexity-and-semiosis-joined-together/

    While I wouldn’t be too surprised if a UD entry said that promoters are targeted for destruction, I suspect you’re not reading that correctly. I could not find the words “promoter beta catenin” anywhere there, nor could I find anything about a promoter being targeted for destruction. Looking around elsewhere I found that beta catenin is a protein. I tried to find if a promoter was also named so, and if so, if it was targeted for destruction somehow, even if not by the ubiquitin system, but I failed miserably on every account. My abilities to find information seem damaged. It could also be that promoters are not targeted for destruction and you’ve got it wrong, but who am I to question you in this regard? I must be missing something and that’s that.

  38. colewd: Do some research on the WNT beta catenin pathway.

    I have been unable to find any evidence that any promoter is targeted for destruction anywhere in the WNT beta catenin pathway. This is where you give a reference.

    I suspect Dave Carlson is right and you don’t actually know what a promoter sequence is. It looks like you think a promoter is a protein, or some sort of organelle, that is targeted for destruction. It isn’t. Read the link. “It’s obvious you need some study time”.

    colewd: Here is an overview of the ubiquitin system that targets the promoter beta catenin for destruction.
    https://uncommondescent.com/intelligent-design/the-ubiquitin-system-functional-complexity-and-semiosis-joined-together/

    And Dave Carlson was right.

    Beta-catenin is a protein molecule, it is not a promoter. The beta-catenin gene has a DNA promoter upstream of the protein coding region. This promoter is an area of DNA that has a particular sequence to which transcription factors bind and regulate the expression of the beta-catenin gene.

    Ironically, the ubiquitylation system is actually indirectly involved in keeping DNA (whether promoter sequences or not) intact by targeting damaged DNA repair proteins for destruction.

    So let’s try again, you go back and read this post of mine and try to answer it substantially instead of making a fool of yourself again.

  39. Rumraket,

    Ironically, the ubiquitylation system is actually indirectly involved in keeping DNA (whether promoter sequences or not) intact by targeting damaged DNA repair proteins for destruction.

    Do you understand what beta catenin functions in the cell are? Do you understand the role the destruction mechanism plays in cell cycle control?

    Apologies for conflating promoter DNA with a transcription factor protein.

  40. colewd: Do you understand what beta catenin functions in the cell are? Do you understand the role the destruction mechanism plays in cell cycle control?

    Apologies for conflating promoter DNA with a transcription factor protein.

    Do you understand any of this stuff? Your confusion of promoters with transcription factors suggests that you do not.

  41. colewd: Yes.

    From where I’m sitting it’s clear you do not understand more than a fraction of it. You ignore explanations, you ignore counterpoints. You ignore most of everything everyone says, in fact. And I’ve seen things that you’ve had explained to you, and which you’ve accepted, forgotten within days, sometimes hours. It’s also clear that the long detailed explanations given to you are ignored. You respond to a point or two at best, ignoring the rest.

    And then you link to Uncommon Descent as if you are linking to a peer reviewed paper that supports your claims.

    Honestly, your side would be better off without your efforts.
