Repetitive DNA and ENCODE

[Here is something I just sent Casey Luskin and friends regarding the ENCODE 2015 conference. Some editorial changes to protect the guilty…]

One thing the ENCODE consortium drove home is that DNA acts like dynamic random-access memory for methylation marks. That is to say, even though the DNA sequence isn’t changed, its state can be modified, just as computer RAM isn’t physically removed but its electronic state can be rewritten. The repetitive DNA acts like physical hardware, so even if the repetitive sequences aren’t changed, they can still act as memory storage devices for regulatory information. ENCODE collects huge amounts of data on methylation marks during various stages of the cell. This is like trying to take a few snapshots of a computer’s memory to figure out how Windows 8 works. The complexity of the task is beyond description.

To get a hint of the importance of the methylation marks see:
http://www.nature.com/nature/2015/180215/full/nature14310.html

So it’s just my conjecture that the repetitive DNA is there to act as a dynamic store of data. We are fooled into thinking that, since the sequences don’t change much during an organism’s lifetime, it can’t act as an information processing system. We presume the relevant information is the DNA sequence, when in fact the critical information is the methylation marks.

The repetition is probably important for the organism to recognize the regions and say, “hey, here’s where a lot of RAM is for me to use.” The mistake is thinking the significant information content is in the “ACTG” sequences. It is not; it is in the methylation markings, and that could well be where some serious parts of ontogenic regulatory information are flowing through.

Graur’s approach is akin to opening up a computer and examining the physical RAM in it and saying, “those VLSI transistors are all identical, look at all that repetition, therefore it’s junk!” What matters is when the computer is up and running and we’re seeing these transistors switching back and forth from 0 to 1. Those zeros and ones are the DNA methylation marks for regulation (not the repetitive ACTG transistors), and that’s why I suspect Graur is way off the mark. It really is too early to tell; I wouldn’t yet go out on a limb, but Graur is going out on a limb, and if proven wrong, we can publicly call him on it.

Repetitive sequences are plug and play like computer RAM, and can be subject to some variation. But they need to be there.

102 thoughts on “Repetitive DNA and ENCODE”

  1. stcordova: That’s like saying the 5 backup navigation systems of the space shuttle are non-functional because we can delete all 5 of them with no ill effect.

    Exactly. If you can delete (or break) all five of them, and the shuttle still flies, they are non-functional. If the shuttle needs a backup system, the shuttle crashes – they are still non-functional.

    The point you seem to have missed, Sal, is that if a gene can take lots of different sequences and the organism still function, then the exact sequence of that gene can’t matter to the organism. Which means it’s junk – if it has any function, then it’s a function that can be served as easily by one sequence as by many variants.

    Which probably means if it’s useful at all, it’s useful as spacer – filler.

    Which is possible.

  2. stcordova,

    I have to point something out, the numbers Moran gave are tentative at best. There are trillions of cells in the human at various developmental stages.

    We sample only a small fraction of those cells to do studies, and many of those cells are immortalized cancer cells.

    I addressed that. If by chance we had a transposon that was active in some tissues but not in others, we would reasonably expect the zygote to contain the ‘active’ version. We can’t realistically expect a corrupt master copy to produce non-corrupted descendants.

    If the zygote contains an active transposon, this would have population-level effects on polymorphy. If we don’t see this, then the transposon is reasonably concluded broken in the zygote too, and not just the tissue. And if it’s broken in the zygote, it’s broken everywhere.

  3. stcordova,

    Take a look at the number of researchers on the vitamin D receptor who found function in the non-coding regions associated with vitamin D:

    Non-coding is not the same as junk. You really need to take Larry’s course!

  4. You don’t know anything of the kind about the regions in question, do you?

    We hardly know anything about anything in biology relative to what we could possibly learn over the next thousand years; that’s why it’s unwise for you and Moran to presume there is no function. If avoiding looking in those regions prevents us from curing certain diseases, that would be bad. You and Moran obviously would prefer to conclude there is no chance these regions are important to health. What if you guys are wrong?

    I’ve already provided papers and research that hint you guys could be wrong. But in the scheme of things, ENCODE and ROADMAP medical researchers have decided they’re going to look while you and Larry want to just stand on the sideline and say their work is a debacle (to quote Dan Graur).

    You and Larry are premature and presumptuous.

  5. stcordova,

    If avoiding looking in those regions prevents us from curing certain diseases, that would be bad.

    Likewise, if people focussed on all parts of the genome with equal intent, that would be bad, if 90% really were junk (which certainly seems to be the case, from numerous lines of evidence).

    Real scientists are open to all suggestions, and don’t close down any avenue. But then again, they don’t have unlimited funds. Research has to be targeted. If there is no association with pathology and no apparent function within a region, why bother with it?

    And if a piece of ‘functional junk’ is found, who tends to find it – ‘Darwinists’ or YECs?

  6. stcordova: I’ve already provided papers and research that hint you guys could be wrong.

    Do some actual work then, if you are so sure! Or, you know, stand on the sidelines throwing popcorn for the rest of your life. Up to you really.

  7. stcordova: If avoiding looking in those regions prevents us from curing certain diseases, that would be bad. You and Moran obviously would prefer to conclude there is no chance these regions are important to health. What if you guys are wrong?

    As a lot of “junk” probably consists of degraded sequences that once served a function, it’s perfectly possible that in the future we could make ourselves more functional. Maybe we could fix the broken vitamin C gene and we wouldn’t have to eat fruit!

    But that doesn’t help your case.

    Evolutionary theory doesn’t require that there be junk DNA (if DNA were metabolically expensive, it would tend to be eliminated over time, but it seems that it isn’t), but it can explain it. And it seems that there is a lot of junk. The onus is on ID to explain it under ID. It’s not particularly a prediction under evolutionary theory.

    On the other hand, what is, is that if there is junk, it will be found to contain corrupted versions of genes that have a function in other lineages, bits of virus, and other detritus (redundantly duplicated sequences, for instance, or even entire chromosomes). And of course that’s what a lot of it turns out to be.

  8. The problem for ID theory is that junk is what the word implies: broken stuff that used to work but which has been replaced with newer stuff.

  9. Allan Miller: And if a piece of ‘functional junk’ is found, who tends to find it – ‘Darwinists’ or YECs?

    That’s really the key. Everyone would love to get rich and famous by curing a disease. How can anyone be stupid enough to think there is a bias against looking in promising areas?

  10. petrushka: That’s really the key. Everyone would love to get rich and famous by curing a disease. How can anyone be stupid enough to think there is a bias against looking in promising areas?

    This strikes me as the most hilarious aspect of the “Darwinist omerta” idiotrope: pharmaceutical companies don’t give a rat’s ass about the ideological implications of their research programs — they are ruthlessly committed to making money — yet, strangely, they stick with the evolutionary paradigm. Strange, that.

  11. Elizabeth wrote:

    Sal, the main reason we can conclude that vast tracts of the genome are non-functional is that they accrue mutations without ill-effect.

    if they were functional, mutations in those regions would have an effect.

    That would not be true if there is redundancy in the genome, and there is in fact redundancy in the genome, hence that viewpoint is not supported, likely wrong.

    Genetic redundancy means that two or more genes are performing the same function and that inactivation of one of these genes has little or no effect on the biological phenotype. Redundancy seems to be widespread in genomes of higher organisms [1–9]. Examples of apparently redundant genes come from numerous studies of developmental biology [10–15], immunology [16,17], neurobiology [18,19] and the cell cycle [20,21].

    http://www.nature.com/nature/journal/v388/n6638/full/388167a0.html

    By functional I mean has utility if only in contingency, not that is actively utilized all the time.

  12. Allan Miller:

    we would reasonably expect the zygote to contain the ‘active’ version.

    I’m afraid your conjecture isn’t supported empirically, here is an example of lncRNAs appearing mostly in the adult stage:

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3928180/

    Long non-coding RNAs (lncRNAs) were recently shown to regulate chromatin remodelling activities. Their function in regulating gene expression switching during specific developmental stages is poorly understood. Here we describe a nuclear, non-coding transcript responsive for the stage-specific activation of the chicken adult αD globin gene. This noncoding transcript, named α-globin transcript long non-coding RNA (lncRNA-αGT) is transcriptionally upregulated in late stages of chicken development, when active chromatin marks the adult αD gene promoter. Accordingly, the lncRNA-αGT promoter drives erythroid-specific transcription. Furthermore, loss of function experiments showed that lncRNA-αGT is required for full activation of the αD adult gene and maintenance of transcriptionally active chromatin. These findings uncovered lncRNA-αGT as an important part of the switching from embryonic to adult α-globin gene expression, and suggest a function of lncRNA-αGT in contributing to the maintenance of adult α-globin gene expression by promoting an active chromatin structure.

    That shows there is foresight for the adult stage. You’re making the same mistake Larry is making, extrapolating from minuscule sample sizes and making assumptions that the non-coding regions must be active in the zygote.

    As large as ENCODE is, they only track 150 cell types, and those are mostly immortalized cells in the cancerous stage. Even ENCODE and ROADMAP have hardly scratched the surface of the quadrillion transcriptomes that might be involved in development of an adult human. There is no way Larry can be sure of his claim.

    ENCODE researchers every year discover function and roles in non-coding DNA, and usually it is surprising. Larry is premature. And the fact that ENCODE researchers keep discovering new roles in places people didn’t suspect indicates a trend that we will discover more.

    Usually the GWAS studies will give a providential breakthrough that something is up. Usually the researcher reports, “variation in the intronic and intergenic regions compromises health, we report the regulatory role of these regions….” It’s not going to stop. There is no way Larry can know the non-coding DNA has no function in any of the cells during development. If ENCODE and ROADMAP don’t have the data, surely Larry doesn’t either.

  13. stcordova: By functional I mean has utility if only in contingency, not that is actively utilized all the time.

    Sal, you aren’t thinking this through.

    To repeat:

    stcordova: Sal, the main reason we can conclude that vast tracts of the genome are non-functional is that they accrue mutations without ill-effect.

    Yes, that is probably because at least some of that material is duplication, as I said. Until it is broken, it is functional, but redundant. Because it is redundant, when it is broken, it has no ill effect.

    Clearly, if the sequence it duplicates also breaks, then it will. And that lineage will tend to die out.

    But if only one breaks, the lineage survives. And so that lineage continues to carry the broken duplicate – the junk.

    Perfectly explicable in terms of evolutionary theory. Not so explicable in terms of ID, unless your argument is that the Designer gave all living things lots of duplicates so that if something broke, there’d still be spare.

    Is that your argument? That the junk is indeed junk, but is made of broken stuff that was originally useful in that it was redundant spare stuff, until it broke?

  14. From Larry’s website,

    Pseudogenes (1.2% junk)
    (from protein-encoding genes): 1.2% junk

    How does Larry know they are functionless? From this review paper:

    Previously dismissed as non-functional DNA, evidence shows that some pseudogenes are fully transcribed, resulting in the production of natural antisense transcripts (NAT). NATs are involved in numerous vital cellular processes, including regulation of translation and stability, RNA export, alternative splicing, genomic imprinting, X inactivation, DNA methylation and modification of histones, and have also been shown to play roles in stress response and developmental processes [169]. NATs transcribed from pseudogenes have the potential to regulate sense transcripts arising from the functional parental gene through complementary binding, which has been shown in some cases to induce cleavage of the sense transcript [191]. Studies have shown that pseudogenes can also regulate their parental gene by interacting with enhancers, and that pseudogene transcripts can act as decoys for miRNAs that target the parental gene [143] (reviewed in [129]). It is estimated that up to 20 % of human pseudogenes are fully transcribed [199]. However, it is likely that pseudogenes also produce smaller non-coding RNAs that may regulate gene expression in cis or in trans. Transcription of pseudogenes often occurs in a tissue-specific manner, and the discovery that pseudogenes are capable of regulating tumour suppressors and oncogenes, and are often deregulated during cancer progression, indicates that they are important components of the non-coding RNA regulatory system (reviewed in [142]). The discovery that pseudogenes may function in the form of non-coding RNAs shows that previous assumptions about “non-functional” regions of the human genome should be challenged in the course of further research into non-coding RNAs.
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3474909/

    Gee Larry, these are TISSUE SPECIFIC, which means you’d have to examine every cell and every tissue in every developmental stage before you can safely say it has no regulatory role! That’s probably in the quadrillions of transcriptomes to search. You’re premature Larry.

    co-opted pseudogenes: <0.1%
    Co-opted pseudogenes are formerly defective pseudogenes that have secondarily acquired a new function.

    How do you know that Larry? Did you look into the quadrillions of human transcriptomes to find it never had a role?

  15. Sal, the main reason we can conclude that vast tracts of the genome are non-functional is that they accrue mutations without ill-effect.

    Just like coding regions have degeneracy, so do regulatory regions. Tolerating mutations with no ill effect does not imply they are non functional. Coding regions can tolerate SOME mutations without ill effect because of codon degeneracy, that does not imply the coding regions are non-functional.
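    To make the codon-degeneracy point concrete, here is a minimal Python sketch that counts, for a given codon, how many of its nine possible single-base substitutions leave the encoded amino acid unchanged under the standard genetic code. The function name `synonymous_fraction` is made up for illustration. The sketch only illustrates degeneracy in coding sequence; it says nothing either way about whether regulatory or repetitive regions tolerate mutations for a comparable reason.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction(codon):
    """Fraction of the 9 single-base substitutions that keep the amino acid."""
    original = CODON_TABLE[codon]
    unchanged = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a substitution
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == original:
                unchanged += 1
    return unchanged / 9


if __name__ == "__main__":
    for codon in ("GGT", "CTA", "ATG", "TGG"):
        print(codon, CODON_TABLE[codon], round(synonymous_fraction(codon), 2))
```

    In this table a glycine codon such as GGT tolerates any change at its third base, while the single codons for methionine (ATG) and tryptophan (TGG) tolerate none.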

    The line of reasoning that, “it can be mutated, therefore it is junk” is erroneous. Better to say, “it can be mutated, therefore it has some degree of fault tolerance, and if fault tolerant, it suggests purpose and function.”

    GWAS studies for various diseases show there is fault tolerance, but not fault proofness in the non-coding regions.

    And so that lineage continues to carry the broken duplicate – the junk.

    Even granting that, how is it established that the duplicate has no functional role? There are probably a quadrillion or more transcriptomes through the lifetime of a human, and ENCODE and the rest of the world have examined only a paltry few of them.

    In the previous comment, regarding pseudogenes, the supposedly imperfect duplicate actually has a role that wasn’t discovered till later. It’s premature to say something is broken just because one hasn’t seen the context where it actually does something.

    What I object to is being so presumptuous about things having no functional role. That rush to conclusion has been embarrassing, and I’ve provided reasons why we should not jump to conclusions because much of the activity of the DNA is cell and development stage specific. The functional role of each of these non-coding regions may only be evident in a few of the quadrillion cells at various developmental stages.

    How reasonable is it to extrapolate data from 150 cell lines that ENCODE tracks (with great difficulty I might add) to the quadrillion or more transcriptomes that are expressed in a human lifetime?

    As someone who endorses genetic entropy, I think there is junk, but we don’t know where it is. Larry and Graur are being awfully premature.

    The OP showed that DNA carries more information than just through the A, C, G, T bases. That makes it possible for DNA, even the highly repetitive ones, to convey lots of information, and in fact it is quite reasonable to say even though the ACGT bases of DNA in each cell of a human are mostly the same, the methylation marks are almost unique in every cell. This fact makes it possible for non-coding DNA to have a substantial role that can only be detected as we examine more and more cells at various developmental stages. We have only scratched the surface. Larry Moran, Dan Graur and the anti-ENCODE crowd are horribly premature in calling the ENCODE claims a debacle.

    The trends in discovery of function in non-coding DNA look like they will vindicate many of the ENCODE claims. ENCODE researchers too may have been premature in their claims of 80% functionality, but the trend is on their side.

  16. Ken Miller might have to eat his words:

    so many repeated copies of pointless DNA sequences that it cannot be attributed to anything that resembles intelligent design.

    Yeah, and we know the fallacy in assuming that just because the ACGTs repeat that it must not be able to carry large amounts of meaningful information!

    The information isn’t in the ACGTs alone, but in the epigenetic methylation marks.

    By Miller’s reasoning, the repeated patterns in VLSI hardware must imply they can’t carry meaningful information — baloney! The methylation marks on repeated ACGT patterns are so information bearing that there is now the quest for, tada: The Epigenetic Code, or dare I say, codes!

    Deciphering the Epigenetic Code: An Overview of DNA Methylation Analysis Methods

    Over the past decade, research in epigenetics has been making it into mainstream research. This is due to the fact that epigenetic mechanisms have emerged as key events involved in the regulation of critical biological processes. Epigenetic mechanisms have been proposed to function as an interface between the genome and environment; therefore, epigenetic deregulation is likely to be involved in the etiology of human diseases associated with environmental exposures (22, 25). However, the high variability of epigenetic states between different cell types within the same organism as well as epigenetic fluctuation in time, even within a single cell, represents an important challenge to a complete understanding of the normal epigenetic landscape and its dynamic variability.

    DNA methylation is the most extensively studied epigenetic mechanism, and it plays multiple roles in key cellular processes, including regulation of gene expression, embryonic development, genomic imprinting, and chromosome stability (Fig. 1A). Although methylation of other nucleotides does exist, at least for this article, DNA methylation refers to the attachment of a methyl group to the 5-carbon (C5) position of cytosine, mostly in the context of so-called CpG dinucleotide. This process is mediated by two main categories of DNA methyltransferases (DNMTs): de novo DNA methyltransferases (DNMT3A and DNMT3B) and the maintenance methyltransferase (DNMT1) (Fig. 1B). DNA methylation changes, like other epigenetic changes, are in principle reversible, which makes them attractive targets for the epigenetic therapy (Fig. 1C).

    Reversible marks? Like I said in the OP, it’s like readable/writable memory. The repetitive ACGT sequences provide a substrate for this information storage.

  17. So what is the role of repetition? The answer: it is easier to recognize! Bwhaha.

    http://www.sciencedirect.com/science/article/pii/S0968000498012250

    1. Lessons on repetitive-sequence methylation from fungi

    ….some organisms have mechanisms that detect repeated sequences, and then methylate and silence them…
    ….
    These experiments revealed that duplicated sequences are efficiently detected and methylated during the sexual phase, in the period between fertilization and nuclear fusion.

    So Ken Miller has to eat his words. Repetition is a way to emphasize a methylation target. Like I said in the OP:

    The repetition is probably important for the organism to recognize the regions and say, “hey, here’s where a lot of RAM is for me to use.”

  18. stcordova: The line of reasoning that, “it can be mutated, therefore it is junk” is erroneous. Better to say, “it can be mutated, therefore it has some degree of fault tolerance, and if fault tolerant, it suggests purpose and function.”

    That is not sound reasoning, Sal.

    There are two possible reasons why a thing should be “fault tolerant”, and one of them is because it is junk.

    Of course it may serve a purpose for which the sequence is not critical – as I said, it could be useful as spacer, or padding.

    And the fact is that we know that some of it WAS once useful and now doesn’t work, because it DOES still work in parallel lineages, the GULO gene being a case in point.

    So not fault-tolerant in that case.

  19. I can whack sand with a mallet to little effect. So it’s fault tolerant, and therefore suggests purpose and function. Which of course it does – its purpose is to hold the sea up.

  20. stcordova,

    Me: we would reasonably expect the zygote to contain the ‘active’ version.

    Sal: I’m afraid your conjecture isn’t supported empirically, here is an example of lncRNAs appearing mostly in the adult stage:

    God, you’re hard work! I’m going to start charging you for all these geology and biology lessons.

    lncRNAs are not transposons. And the DNA for them exists in the zygote, unless you have evidence to the contrary.

  21. stcordova: The repetition is probably important for the organism to recognize the regions and say, “hey, here’s where a lot of RAM is for me to use.”

    Probably? You mean you are just guessing, right?

  22. That shows there is foresight for the adult stage. You’re making the same mistake Larry is making, extrapolating from minuscule sample sizes and making assumptions that the non-coding regions must be active in the zygote.

    Nope. You are conjecturing that, if a transposon is discovered to be broken in (say) a bone cell, it may be unbroken in one of the other tissues. If it is unbroken in one of the other tissues, it must be unbroken in the zygote. So here are the things that must happen in ‘Sal-world’:

    1) An active transposon is prevented from transposing in the zygote (for no better reason than that it can avoid leaving a troublesome phylogenetic signal, keeping Sal’s theory afloat).
    2) In certain tissue types, it becomes broken (in a consistent manner from individual to individual, and from transposon to transposon) – again, mainly in service of Sal’s theory, since the organism already has a zygotic suppression mechanism. Now it has TWO ways of inactivating a transposon! How elegant!
    3) In other tissue types, it both remains both unbroken and unsuppressed.

    There is not a shred of evidence for such mechanisms operating throughout the transposon load of the genome, and significant evidence against. You need it to happen to 50% of the genome. Someone might notice. Bwa and, indeed, ha.

  23. lncRNAs are not transposons. And the DNA for them exists in the zygote, unless you have evidence to the contrary.

    I was talking repetitive DNA in general, and if adult expression is true for lncRNA, it’s not a stretch for transposons like L1. A zygote isn’t a neuron cell, and I just pointed out L1 transposons are active in neuron cells. They only just determined it.

  24. stcordova,

    I was talking repetitive DNA in general, and if adult expression is true for lncRNA, it’s not a stretch for transposons like L1.

    We were talking about broken transposons, not regulated ones.

    It is a stretch if you propose that there is an intact transposon in tissue A, a consistently broken one in tissue B (from which the genome was sequenced), and an unbroken but inactivated one in the zygote. It is ALSO a stretch if you propose that there is an intact transposon in tissue A, a consistently broken one in tissue B and a broken one in the zygote. Particularly if you propose that this model applies to a substantial fraction of the ‘transposome’, which you would have to if you are trying to ‘de-junk’ transposons. The soma just does not work like that.

    A zygote isn’t a neuron cell, and I just pointed out L1 transposons are active in neuron cells.

    Neuron cells are derived from the zygote, so that transposon must be whole in the zygote, and therefore capable of transposing (unless there is some elaborate tissue-specific suppressor). You surely aren’t proposing that ALL L1 transposons – or even a substantial fraction – are active (definitely a stretch given that 99%+ of them are consistently bust in multiple genome copies).

  25. I cross posted this to Larry’s blog:

    This is Sal (don’t know why WordPress gives me the name LiarsForDarwin when I sign in as stcordova, I do own the blog by that name, but LiarsForDarwin isn’t supposed to be my username)…

    I personally agree with you that non-junk DNA was not a prediction of the ID hypothesis. This isn’t the only time I’ve broken ranks with my own, probably won’t be the last…

    Not every creationist believes the genome has to be 100% or 80% functional. The genetic entropy community (John Sanford and other creationists) predicts there is a continuing accumulation of defects in the human genome because of mutation load problems. So there is a segment of the creationist community which predicted some of the genome is junk, especially those who believe Adam’s sin in the garden of Eden precipitated the eventual death of the human race….

    The 50% figure was likely a whisper number among the ENCODE community. I speculate in part because 54% to 60% of the human genome is mostly repetitive. My search of FASTA files for HG37 with repeat masking on indicates 54%, whereas C-values hint that it could be 60%, since they put the human genome at 3.5 gigabases, not 3.1 (for FASTA HG37). ENCODE does not provide much in the way of annotations for repetitive elements except to say it transcribes to repetitive RNA!
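    One way to see how the two figures could relate is the arithmetic below. It is only a sketch under a loudly stated assumption that is not in the comment itself: that all of the sequence missing from the 3.1-gigabase assembly (relative to the 3.5-gigabase C-value estimate) is repetitive. The function name is hypothetical, and the inputs are the comment’s own round numbers.

```python
ASSEMBLED_GB = 3.1      # size of the repeat-masked FASTA assembly cited in the comment
C_VALUE_GB = 3.5        # genome size suggested by C-values, per the comment
MASKED_FRACTION = 0.54  # fraction of the assembly flagged as repeats


def repeat_fraction_if_gap_is_repetitive(assembled_gb, total_gb, masked_fraction):
    """Repeat fraction of the whole genome, ASSUMING the unassembled part is all repeats."""
    repeats_in_assembly = assembled_gb * masked_fraction
    unassembled = total_gb - assembled_gb  # assumption: entirely repetitive
    return (repeats_in_assembly + unassembled) / total_gb


if __name__ == "__main__":
    fraction = repeat_fraction_if_gap_is_repetitive(
        ASSEMBLED_GB, C_VALUE_GB, MASKED_FRACTION
    )
    print(f"{fraction:.1%}")
```

    Under that assumption the result comes out at roughly 59%, close to the 60% mentioned; with a less repetitive gap the figure would land somewhere between 54% and that value.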

    You ask, “why would Casey take such a risk”. The real question is why would you and Dan Graur take such risks. There is so little that we know. We have good evidence the repetition in repetitive elements could be key to navigating molecular machinery to do various tasks since it seems there is machinery in Eukaryotes that is tailor made to seek repetition. The repetitive elements can also be highly information bearing in a readable/writable way like random access memory in a computer because of the methylation markings (among other memory mechanisms).

    The NIH ROADMAP project has a comparable budget to ENCODE, and it is constructing maps of the methylome (yet another -ome). However this will be a daunting task since there are probably over a quadrillion methylomes expressed in a human, since there is a separate transcriptome and methylome for each cell, and the methylome changes even within the lifetime of the cell. If we then take every cell as a separate memory device through the methylation marks on the DNA, then the sum total of repetitive DNA in the entire body in every cell could be hosting pentillions of bits of information because each DNA strand in each cell has a different methylome…

    Many of these methylomes tee off the repetitive DNA and are tissue specific. ENCODE and ROADMAP track only 150 of the possible quadrillion methylomes.

    So why are you and Dan Graur sticking out your necks? It’s too early in the game for either side to make the call for sure since everyone is just guessing.

    I’ll tell you why I think Casey’s taking the risk. He has nothing to lose. That is sharp wagering. On the other hand you, Dan Graur and Ken Miller do. You could be the but of jokes 20 years from now….

    And why are you railing against ENCODE and practically agreeing it’s a debacle? Some of your colleagues at U Toronto are possibly getting some $ from the ENCODE consortium directly or indirectly. Couldn’t you be a little more toned down for their sake? They have to bring home the bacon for some of their kids you know…

  26. It’s “butt” of the joke Sal, I’m surprised you don’t know this already.

  27. I appreciate your taking your argument to Sandwalk. I’ll be watching, but I don’t expect much. The post is simply too unfocused. A Gish Trot, rather than a gallop.

  28. “liarsfordarwin”. Way to get people to read you charitably!

    As if “stcordova” didn’t already. 😛

  29. Well good luck on your predictions.

    Having no evidence, no theory and no mechanism, and lots of conflicting evidence should not discourage you.

    I am amused by your celebrating the young Pluto.

  30. I mentioned duplication as backup strategy at Sandwalk.

    Lo and behold:

    http://www.sciencedaily.com/releases/2015/07/150706114123.htm

    Carrying around a spare tire is a good thing — you never know when you’ll get a flat. Turns out we’re all carrying around “spare tires” in our genomes, too. Today, in ACS Central Science, researchers report that an extra set of guanines (or “G”s) in our DNA may function just like a “spare” to help prevent many cancers from developing.

    Various kinds of damage can happen to DNA, making it unstable, which is a hallmark of cancer. One common way that our genetic material can be harmed is from a phenomenon called oxidative stress. When our bodies process certain chemicals or even by simply breathing, one of the products is a form of oxygen that can acutely damage DNA bases, predominantly the Gs. In order to stay cancer-free, our bodies must repair this DNA. Interestingly, where it counts — in a regulatory DNA structure called a G-quadruplex — the damaged G is not repaired via the typical repair mechanisms. However, people somehow do not develop cancers at the high rate that these insults occur. Cynthia Burrows, Susan Wallace and colleagues sought to unravel this conundrum.

    The researchers scanned the sequences of known human oncogenes associated with cancer, and found that many contain the four G-stretches necessary for quadruplex formation and a fifth G-stretch one or more bases downstream. The team showed that these extra Gs could act like a “spare tire,” getting swapped in as needed to allow damage removal by the typical repair machinery. When they exposed these quadruplex-forming sequences to oxidative stress in vitro, a series of different tests indicated that the extra Gs allowed the damages to fold out from the quadruplex structure, and become accessible to the repair enzymes. They further point out that G-quadruplexes are highly conserved in many genomes, indicating that this could be a factory-installed safety feature across many forms of life.

    H/T Denyse
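
    The scan described in the press release above (four G-stretches, plus a fifth "spare" stretch shortly downstream) can be sketched roughly as follows. This is only an illustration, not the researchers' actual method: the minimum run length of three Gs, the maximum gap of seven bases, and the function name `find_quadruplex_candidates` are all assumptions made for the example.

```python
import re

def find_quadruplex_candidates(seq, min_run=3, max_loop=7):
    """Return (start, end, has_spare) for each group of four nearby G-runs.

    Assumptions (not taken from the press release): a G-stretch is a
    maximal run of at least `min_run` Gs, and neighbouring stretches,
    including the fifth "spare" one, are at most `max_loop` bases apart.
    """
    # Locate every maximal run of Gs that is long enough.
    runs = [m.span() for m in re.finditer("G{%d,}" % min_run, seq.upper())]
    hits = []
    i = 0
    while i + 3 < len(runs):
        window = runs[i:i + 4]
        # Distance from the end of one run to the start of the next.
        gaps = [nxt[0] - cur[1] for cur, nxt in zip(window, window[1:])]
        if all(1 <= g <= max_loop for g in gaps):
            # A fifth run close behind the fourth counts as the spare.
            has_spare = (i + 4 < len(runs)
                         and 1 <= runs[i + 4][0] - window[-1][1] <= max_loop)
            hits.append((window[0][0], window[-1][1], has_spare))
            i += 5 if has_spare else 4  # do not reuse these runs
        else:
            i += 1
    return hits
```

    Calling `find_quadruplex_candidates("ACGGGTGGGTGGGTGGGAGGGTT")` reports one candidate with a spare run. A real analysis would need the motif definition from the paper itself, and would also have to handle long G-runs that could supply more than one stretch.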

    Richard Sternberg pointed out quadruplex DNA:

    http://www.evolutionnews.org/2009/05/guy_walks_into_a_bar_and_think020401.html

    As you know, telomeres are the ends of chromosomes. In many species, including chimps and humans, the DNA sequences that are found at these genomic tips are tandem repetitions of TTAGGG. That’s right…TTAGGGTTAGGGTTAGGG…over and over and over again. A notable exception to this rule is the fruit fly, an organism that in this regard has provided the junk DNA notion no succor, since its telomeres have complex combinations of three different retrotransposons instead of those six-basepair units. What is important to note, though, is that telomeric sequences are essential to the cell, and it seems that hardly a week does not pass without some new role being discovered for these elements.

    How, precisely, are miles and miles of TTAGGG of significance? From the standpoint of chromosome architecture, the repetitive elements en masse have the propensity to form complicated topologies such as quadruplex DNA. These sequences or, rather, topographies are also bound by a host of chromatin proteins and particular RNAs to generate a unique “suborganelle” — for the lack of better term — at each end. As a matter of fact, the chromatin organization of telomeres can silence genes and has been linked to epigenetic modes of inheritance in yeast and fruit flies. Furthermore, different classes of transcripts emanate from telomeres and their flanking repetitive DNA regions, which are involved in various and sundry cellular and developmental operations.

  31. Well Sal, what percentage of DNA is highly conserved, and where can you find someone like Larry Moran who dismisses highly conserved DNA as non-functional?

    The reason Larry says 90 percent of DNA is junk is because that is the percentage that is not conserved.

  32. petrushka:
    The reason Larry says 90 percent of DNA is junk is because that is the percentage that is not conserved.

    If I understand the argument correctly, it’s deemed not conserved because it is different from sequences in other species and so it’s assumed that these sequences diverged from a common ancestor in which they were the same.

    But that does not tell us that they are not currently under selection.

    Do they show a wide variance among different members of the same species?

  33. I posted this at Sandwalk:

    http://sandwalk.blogspot.com/2015/07/the-fuzzy-thinking-of-john-parrington_19.html?showComment=1437492413384#c928967426522207001

    Speaking of only 30,000 genes, they are part of generating possibly billions or trillions or more protein isoforms and glycoforms produced from a quadrillion transcriptomes. Plenty of room for RNAs to do work in coordinating all these buzzillion isoforms. We haven’t even scratched the surface with RNA-seq experiments, but papers are coming out on the role of RNAs in isoform creation and cellular differentiation. The field of RNA interactomes has just barely gotten off the ground. What may look like noise to Graur looks like interactome machinery to ChIRP-SEQuencers.

    How do biochemists actually know RNA transcripts are non-functional? How many GWAS, RNA-seq, ChIRP-seq, ChIP-seq, etc. experiments on the quadrillion transcriptomes would be adequate to establish non-function? Many functions, such as those involved in recovery from injury, are not detected unless the backup nature of this redundancy is called upon.

    How do you distinguish spare tires from junk without all these requisite experiments? Even recently we found quadruplex DNA may have a functional role. Example: the miles and miles of repeated TTAGGG that are often incorporated in nuclear-only telomeric repeat-containing RNA (TERRA), and these repeats have a biophysical function in preventing error.

    Steve Matheson kept saying many RNAs don’t even leave the nuclear complex. Did it occur to him then that maybe these RNAs are used to do something that’s done in the nuclear complex alone, like creating and regulating part of the manufacture of ribosomes? It’s too early to tell what these RNAs do. Simply saying they have no function because people like Matheson and Graur don’t perceive the function is not a basis for declaring there is no function. It’s just prejudice against the idea of deep integrated functionality in biological systems. And prejudice is not experiment and observation.

  34. The reason Larry says 90 percent of DNA is junk is because that is the percentage that is not conserved.

    Non-conservation does not imply non-function. Humans have orphan genes, and we have non-conserved opposable thumbs. It is not a good idea to say non-conservation implies non-function.

  35. Larry replied:

    “liarsfordarwin” lives up to his name in a comment where he asks,

    How do biochemists actually know RNA transcripts are non-functional?

    We don’t. But since spurious, nonfunctional, transcripts are expected by everyone who understands biochemistry, the onus is on the true believers to prove that transcripts are functional.

    All right, let’s see who Larry insinuates don’t understand biochemistry. Here are some of the ENCODE 2015 presenters whom Matzke said aren’t that smart:

    http://medicine.yale.edu/neurology/people/chris_cotsapas.profile

    Chris Cotsapas is a computational geneticist whose primary interest is in understanding the biological processes underlying diseases of the immune system. He has published several highly cited papers on genome-wide association studies of disease, gene expression genetics and evolutionary biology. For more information please see our lab website.

    Chris obtained his BSc in Biochemistry from Imperial College London in 2000 and his PhD in Biochemistry and Molecular Genetics from the University of New South Wales, Sydney in 2007.

    and

    https://biochem.wisc.edu/faculty/pike

    Department of Biochemistry College of Agricultural and Life Sciences

    http://www.bidmc.org/Research/Departments/Medicine/Divisions/Endocrinology/Laboratories/RosenLab/Members/EvanRosen.aspx

    And this guy is at U Toronto, where Larry teaches:
    http://utoronto.academia.edu/MathieuLupien

    Not too swift of Nick Matzke to label Larry’s fellow faculty as not that smart. Certainly some ENCODE researchers understand biochemistry.

    And how about one of the ENCODE leaders:
    https://en.wikipedia.org/wiki/Ewan_Birney

    Birney was educated at Eton College as a King’s Scholar.[1][39] Before studying at university Birney completed a gap year internship at Cold Spring Harbor Laboratory under the supervision of James Watson[2][27] and Adrian Krainer.[27][40][41] He acted as a bookmaker to the genomics community, taking bets on different estimates of the total number of genes (and so-called “junk DNA”[42]) in the human genome.[27][43][44]
    Birney completed his Bachelor of Arts degree in Biochemistry at Balliol College, Oxford in 1996[1][2][45] and did his PhD at the Wellcome Trust Sanger Institute, under the supervision of Richard Durbin[4] as postgraduate student of St John’s College, Cambridge.[46][47]

    He studied under James Watson, of Watson-Crick fame? And Nick Matzke says they’re not that smart and Larry says they don’t understand biochemistry?

  36. See! I told you guys the lack of mutational effects on non-coding regions is not evidence they are junk but of robustness and redundancy.

    Here are papers that reinforce the view that there is a lot of redundancy in the genome, both coding and non coding:

    http://onlinelibrary.wiley.com/doi/10.1002/bies.201400036/epdf

    TRIP [119] are paving the way in this direction. In conclusion, TFBSs are often at least partially redundant, which can be a side effect of network complexity, but potentially also an adaptive trait serving to increase system robustness. As a result, TFBS mutations may produce little to no phenotype not only when they genuinely have no impact on target promoters, but also because their inputs are efficiently ‘buffered’ by other TFBSs. Since the buffering capacity depends on other TFBSs being intact as well as on the environmental conditions, a significant fraction of TFBS variation may be phenotypically ‘cryptic’. This may have implications for interpreting TFBS function, particularly under suboptimal conditions such as old age and disease.

    Conclusions
    Collective evidence from the molecular mechanics of transcriptional regulation, functional genomics, evo-devo and population genetics discussed in this essay suggest that rather than considering multiple TF binding events in the proximity of a gene as either ‘functional’ or ‘spurious’, it is perhaps more appropriate to view them as jointly contributing incremental ‘doses’ of transcriptional activation to the total pool that can differ in potency from high to negligibly low. Redundancy emerges in this system when the total ‘dose’ exceeds the ‘threshold of activation’ required to generate an adequate level of a gene’s expression. Because of such redundancy, some TFBSs appear near neutral, both evolutionary and functionally, under normal conditions. These same sites however may play pivotal roles when the system is pushed away from the optimum by either genetic or environmental abnormalities. With the majority of disease-associated SNPs mapping to non-coding, potentially regulatory regions [100], it is important to gain a better understanding of how multiple TF binding events integrate to regulate transcription in time and space. Accounting for the variable levels of TFBS redundancy under different conditions may impro

    http://www.ncbi.nlm.nih.gov/pubmed/9217155

    Genetic redundancy means that two or more genes are performing the same function and that inactivation of one of these genes has little or no effect on the biological phenotype. Redundancy seems to be widespread in genomes of higher organisms. Examples of apparently redundant genes come from numerous studies of developmental biology, immunology, neurobiology and the cell cycle. Yet there is a problem: genes encoding functional proteins must be under selection pressure.

  37. stcordova: See! I told you guys the lack of mutational effects on non-coding regions is not evidence they are junk but of robustness and redundancy.

    If you know so much about it, are so interested, and are capable of making guesses that come true, why are you studying Physics and not Biology?

  38. I posted this over at Larry’s blog. Not surprisingly, there weren’t any substantive counter arguments:

    http://sandwalk.blogspot.com/2015/07/john-parrington-and-c-value-paradox.html?showComment=1437966213310#c3108981460501015803

    Just because a DNA sequence doesn’t seem translationally or transcriptionally active does not mean it can’t be active in terms of regulation, providing redundancy, providing a means of adaptive variation, having structural roles, roles in cellular differentiation, or in one case optical roles.

    Sternberg points out:

    “Why the elaborate repositioning of so much “junk” DNA in the rod cells of nocturnal mammals? The answer is optics. A central cluster of chromocenters surrounded by a layer of LINE-dense heterochromatin enables the nucleus to be a converging lens for photons, so that the latter can pass without hindrance to the rod outer segments that sense light. In other words, the genome regions with the highest refractive index — undoubtedly enhanced by the proteins bound to the repetitive DNA — are concentrated in the interior, followed by the sequences with the next highest level of refractivity, to prevent against the scattering of light. The nuclear genome is thus transformed into an optical device that is designed to assist in the capturing of photons. This chromatin-based convex (focusing) lens is so well constructed that it still works when lattices of rod cells are made to be disordered. Normal cell nuclei actually scatter light.

    So the next time someone tells you that it “strains credulity” to think that more than a few pieces of “junk DNA” could be functional in the cell — that the data only point to the lack of design and suboptimality — remind them of the rod cell nuclei of the humble mouse.”

    Given that DNA may have more than just a coding role, and may even have been recruited for such unexpected roles as being a lens, the C-value paradox might be explainable by the fact that DNA might have unique applications in a variety of organisms beyond merely coding for proteins. Large or small amounts of functional DNA may or may not contribute to the overall complexity of the organism, because the excess may not necessarily be for coding functions.

    Junk in one organism doesn’t imply junk in another. So ENCODE could be right about the human genome even if ferns have junk DNA….

    There is the phenomenon of redundancy whereby duplicates provide function in the way of redundancy, and so extra DNA not immediately used is not necessarily an indication it is junk.

    The central dogma does not necessarily imply all the information for life resides in the DNA, especially ontogenic information. Some candidates for other information storage mechanisms apart from DNA are the glycoprotein complexes that implement the “sugar code”. Hence the information that creates the complexity of an organism might be outside the nuclear complex, and thus there is no necessity that DNA be larger for more complex organisms if most of the information to generate that complexity lies outside the nuclear complex.

    Since the information processing of DNA is not restricted to ACGTs only, but also methylation marks and indirectly histone modifications, large amounts of nuclear DNA might be the normal way an organism’s sugar code and other cytoplasmic memories recruit DNA.

    And as pointed out with the mouse eye, junk DNA might have function in unexpected ways in various organisms.

    It is really too early to insist that the C-value paradox is solved merely by invoking junk arguments.

  39. stcordova,

    Not surprisingly, there weren’t any substantive counter arguments:

    Did you provide a substantive argument?

    The mouse eye is (perhaps) an example of pre-existing DNA, accumulated for other reasons, being co-opted for other functions. Evolution is nothing if not a bodger. People make all sorts out of junk. This does not mean it was never really junk. It just stopped being so.

    If a race of diurnal mice arises, what do you think will happen to the crystalline function in the eye? What will happen to junk? Will it hang around in case the mice fancy becoming nocturnal again? Or will it hang around because there is no advantage to getting rid?

    It is certainly possible that junk fulfils a different function in each organism, but this is not parsimonious. Parsimony is a rule of thumb, not a law of course. But given that the commonest inhabitant of all junk in every species is transpositional sequence that has largely and demonstrably lost its ability to transpose, I’d say that’s a strong common factor. The bulk of most supernumerary DNA appears to be the detritus of periodic flare-ups of transpositional activity against which selection, for all but the most virulent, is weak in eukaryotes.

  40. OMagain: Why do you suppose that is?

    To be fair, someone asked Sal what percentage of junk DNA has been recruited, and Sal gave the expected Sal response.

  41. stcordova: There is the phenomenon of redundancy whereby duplicates provide function in the way of redundancy, and so extra DNA not immediately used is not necessarily an indication it is junk.

    I keep stuff in my garage that is occasionally called upon to provide spare parts. But while it is sitting in my garage, it is junk. It is non functional. Both junk and non functional are non-ambiguous terms for stuff that was once in use but is not currently used.

    The fact that a small percentage of it may later become useful is not really surprising, but the percentage is small.

  42. Interesting that bats (nocturnal) have less junk than mice. Flying animals generally do have less. Does this mean it serves a useful function in non-flying animals that flyers don’t need?

  43. It might be worth mentioning that, in addition to junk DNA being recruited for the mouse eye, DNA in conjunction with histones can be used as some sort of read-write memory as stated in the OP, not the least of which is possibly memory of the brain!

    https://en.wikipedia.org/wiki/Epigenetics_in_learning_and_memory

    While the cellular and molecular mechanisms of learning and memory have long been a central focus of neuroscience, it is only in recent years that attention has turned to the epigenetic mechanisms behind the dynamic changes in gene transcription responsible for memory formation and maintenance. Epigenetic gene regulation often involves the physical marking (chemical modification) of DNA or associated proteins to cause or allow long-lasting changes in gene activity. Epigenetic mechanisms such as DNA methylation and histone modifications (methylation, acetylation, and deacetylation) have been shown to play an important role in learning and memory

    As I said in the OP:

    So it’s just my conjecture, the repetitive DNA is there to act as dynamic store of data. We are fooled into thinking since the sequences don’t change much during an organism’s lifetime that it can’t act as an information processing system since we presume the relevant information is the DNA sequence, when in fact the critical information are the methylation marks.

    The repetition is probably important for the organism to recognized the regions and say, “hey, here’s where a lot of RAM is for me to use.” The mistake is thinking the significant information content is in the “ACTG” sequences, it is not, it is in the methylation markings,

    So there, that extra DNA might get recruited for brain function. We can maybe even test the hypothesis. Knock out all those repetitive sequences and see if a mammalian brain becomes mush. But already there are hints this will be the case because the L1 LINE transposons which Larry keeps arguing are inactive in the germline turn out to be active in neuron somatic cells.

    Larry equivocated the definition of inactive for germlines with inactive for somatic cells and non-transposition uses. The mouse eye demonstrates L1 LINE transposons can in principle be functional even though they may not jump around in the germline cells. I just demonstrated lots of those L1 LINE transposons are functional even assuming Larry’s claim they are “inactive” in the germline. And if indeed L1 LINE transposons are jumping in somatic cells and/or being used as a memory storage and cognitive mechanism in the brain, Larry and Dan are proven wrong yet again.

    As far as what percentage? I don’t know. I said I agreed with Larry there could be some junk and that ID didn’t predict non-coding DNA was functional. I only point out that Larry is being premature. The fact that DNA can be recruited for optical lensing should be a caution against hastily declaring DNA to be junk.

  44. stcordova,

    But already there are hints this will be the case because the L1 LINE transposons which Larry keeps arguing are inactive in the germline turn out to be active in neuron somatic cells.

    I have argued this with you before, and you didn’t really follow the argument, but there is a reason for thinking they are inactive – they lack the complete proteins required for transpositional activity in any tissue. You can’t reconstitute broken bits of germline genome in the soma. Development doesn’t work like that. We aren’t ciliates.

    If the L1 transposons active in neurons are anything remotely close to the 17% of the germline genome that consists of L1 LINEs, I will eat the hat of everyone at UD, without salt. How many transposons are we talking about?
