Conservation and function of long noncoding RNAs

Posted on January 21, 2016 by Dave Carlson

How about some cool science as we head toward the weekend?

Let’s talk about long noncoding RNAs (lncRNAs) – they are (somewhat arbitrarily) defined as stretches of DNA at least 200 base pairs in length that are transcribed into RNA but have little potential to code for proteins. Determining the function (if one exists) of a particular lncRNA can often be difficult. In part, this may be because lncRNAs evolve much more quickly than protein-coding genes and therefore show much less sequence conservation, which can make it harder to identify orthologs in related organisms. Nevertheless, if a particular lncRNA is functionally important, we would usually expect to see copies of it in related organisms, so finding these homologs can be an important indicator of function.
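The working definition above has two parts: a length floor of 200 nt and low coding potential. As a rough illustration only, here is a minimal Python sketch that checks both. It assumes coding potential can be approximated by the longest ATG-initiated open reading frame on the forward strand, with a hypothetical 100-codon cutoff. Real annotation pipelines use dedicated coding-potential classifiers and experimental evidence, not this heuristic.

```python
# Minimal sketch of the length + coding-potential filter described above.
# ASSUMPTIONS (not from the post): forward strand only, ATG-initiated ORFs,
# and a hypothetical 100-codon cutoff for "little coding potential".

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG-initiated ORF
    in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
        if start is not None:
            # An ORF still open at the 3' end is counted conservatively.
            best = max(best, (len(seq) - start) // 3)
    return best


def is_candidate_lncrna(seq: str, min_len: int = 200, max_orf_codons: int = 100) -> bool:
    """True if the transcript is at least min_len nt long and has no ORF
    of max_orf_codons codons or more."""
    return len(seq) >= min_len and longest_orf_codons(seq) < max_orf_codons


print(is_candidate_lncrna("C" * 300))                    # True: long, no ORF
print(is_candidate_lncrna("ATG" + "GCT" * 130 + "TAA"))  # False: 131-codon ORF
```

Counting an unterminated ORF as coding-capable errs on the side of excluding transcripts, which seems the safer direction for a definition built around "little potential to code for proteins".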

A new paper in Genes and Development by Quinn et al. is a useful demonstration of this. The authors find evidence of 47 homologs of roX, an lncRNA involved in X chromosome dosage compensation, across 35 fruit fly species. The researchers identify roX homologs based on a combination of short regions of sequence conservation (“microhomology”), RNA secondary structure, and synteny (i.e., similarity in location along a chromosome). Here is the abstract (I believe the paper itself is open access):

Many long noncoding RNAs (lncRNAs) can regulate chromatin states, but the evolutionary origin and dynamics driving lncRNA–genome interactions are unclear. We adapted an integrative strategy that identifies lncRNA orthologs in different species despite limited sequence similarity, which is applicable to mammalian and insect lncRNAs. Analysis of the roX lncRNAs, which are essential for dosage compensation of the single X chromosome in Drosophila males, revealed 47 new roX orthologs in diverse Drosophilid species across ∼40 million years of evolution. Genetic rescue by roX orthologs and engineered synthetic lncRNAs showed that altering the number of focal, repetitive RNA structures determines roX ortholog function. Genomic occupancy maps of roX RNAs in four species revealed conserved targeting of X chromosome neighborhoods but rapid turnover of individual binding sites. Many new roX-binding sites evolved from DNA encoding a pre-existing RNA splicing signal, effectively linking dosage compensation to transcribed genes. Thus, dynamic change in lncRNAs and their genomic targets underlies conserved and essential lncRNA–genome interactions.
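To make the idea of combining weak lines of evidence concrete, here is a toy Python sketch that pairs two of them: shared short k-mers as a stand-in for microhomology, and matching flanking genes as a stand-in for synteny. This is not the authors' pipeline. RNA secondary structure is left out entirely, and the sequences, gene names, `k`, and `min_shared` threshold are all hypothetical.

```python
# Toy sketch of combining two of the three lines of evidence mentioned above.
# This is NOT the authors' pipeline: RNA secondary structure is omitted, and
# k, min_shared, the sequences, and the flanking-gene names are hypothetical.
from typing import Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def microhomology_score(query: str, candidate: str, k: int = 8) -> int:
    """Number of distinct k-mers shared by the two sequences."""
    return len(kmers(query, k) & kmers(candidate, k))


def shares_synteny(query_flanks: Tuple[str, str], candidate_flanks: Tuple[str, str]) -> bool:
    """True if both loci sit between orthologs of the same two genes
    (orientation ignored)."""
    return set(query_flanks) == set(candidate_flanks)


def is_putative_ortholog(query: str, candidate: str,
                         query_flanks: Tuple[str, str],
                         candidate_flanks: Tuple[str, str],
                         k: int = 8, min_shared: int = 3) -> bool:
    """Require both some short shared sequence and a conserved neighbourhood."""
    return (microhomology_score(query, candidate, k) >= min_shared
            and shares_synteny(query_flanks, candidate_flanks))


query = "TTACGTACGTACGGA"       # hypothetical sequences
candidate = "CCACGTACGTACGTT"
print(microhomology_score(query, candidate))  # 4
print(is_putative_ortholog(query, candidate, ("geneA", "geneB"), ("geneB", "geneA")))  # True
```

Requiring both signals at once is the point of the sketch: a few shared k-mers alone would be easy to find by chance, and synteny alone says nothing about the sequence itself.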

I think it’s a neat demonstration of both the challenges and utility of using evolutionary conservation to help inform inferences regarding the functionality of non-coding genes. As we continue to get a better grasp of which non-coding sequences are important and which ones are less so, I expect to see many more studies like this.

Carl Zimmer also has an excellent write-up of the paper here, which was the original inspiration for this post. Enjoy!

Edited to correct the number of species in which roX was detected.

78 thoughts on “Conservation and function of long noncoding RNAs”

Alan Fox on January 21, 2016 at 8:57 pm said:

From Carl Zimmer’s article:

“This is the most powerful, clear-cut example that we should reconsider junk DNA,” said John Rinn, a cellular biologist at Harvard University who was not involved in the research.

Oh dear! I hope for Dr Rinn’s sake that Larry Moran doesn’t spot this!
Dave Carlson on January 21, 2016 at 9:00 pm said:

Small correction: the paper reports the discovery of 47 roX homologs (either roX1 or roX2), but only from 35 species, not 47.
Dave Carlson on January 21, 2016 at 9:10 pm said:

Alan Fox:
From Carl Zimmer’s article:

Oh dear! I hope for Dr Rinn’s sake that Larry Moran doesn’t spot this!

Haha I noticed that. I’ve done a bit of transcriptomics work, and the proportion of transcripts that don’t appear to be protein coding really is pretty striking. I’m more-or-less agnostic regarding the functional potential of currently uncharacterized non-coding transcripts, but I think it’s a really interesting area of study.
colewd on January 22, 2016 at 12:04 am said:

Hi Dave
Interesting article; thanks for the post.
Dave Carlson on January 22, 2016 at 12:46 am said:

colewd:
Hi Dave
Interesting article; thanks for the post.

I’m glad you enjoyed it!
petrushka on January 22, 2016 at 1:59 am said:

I don’t see anything that would surprise Moran.
Rumraket on January 22, 2016 at 1:24 pm said:

petrushka:
I don’t see anything that would surprise Moran.

Me neither. And he would (I hereby predict), quite rightly, ask the people who say this is strong evidence against junk DNA to try to estimate the fraction of the genome these lncRNAs take up. 0.0000125%?

At this rate of discovery of function in junk, it’s only going to take A GEOLOGIC EON to assign function to the entire genome of your average animal or plant species.
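The estimate Rumraket asks for is simple arithmetic: total bases covered by the features divided by genome size. A minimal Python sketch follows; the input figures are hypothetical round numbers for illustration, not values from the paper or the thread.

```python
def percent_of_genome(feature_bp: int, genome_bp: int) -> float:
    """Percentage of a genome covered by features totalling feature_bp base pairs.
    Overlapping features should be merged first so no base is counted twice."""
    return 100.0 * feature_bp / genome_bp


# Hypothetical round numbers for illustration only (not taken from the paper):
# ~4,000 bp of lncRNA sequence in a ~140,000,000 bp genome.
print(f"{percent_of_genome(4_000, 140_000_000):.5f}%")  # prints 0.00286%
```

The same function applies to any annotated set of lncRNAs, provided overlapping transcript coordinates are merged before summing.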
petrushka on January 22, 2016 at 2:12 pm said:

And this is why Moran has hissy fits when sloppy writers imply that there’s little or no junk.

When you run the numbers, regulatory DNA is about three times as abundant as protein-coding DNA, and together they make up 8-10 percent of the genome.
Mung on January 22, 2016 at 9:31 pm said:

http://www.regulomedb.org/
TomMueller on January 23, 2016 at 4:12 pm said:

Whenever I think of functional lncRNA – XIST and Barr Bodies start dancing in my mind, which then conjure images of Hinnies and Mules. Why should male/female Hinnies be different from male/female Mules?!

Check out this quote from Peter Fraser:

“Dynamic changes in chromatin, and nuclear structure can regulate patterns of cellular gene expression during differentiation and development, or in response to environmental signals. Our research looks at various levels of chromatin, chromosome and nuclear structure, from individual nucleosome modifications to the dynamic 3D structure of chromosomes and their inter-relationships in the nucleus and how they affect genome functions.”

http://www.babraham.ac.uk/our-research/nuclear-dynamics/peter-fraser

here are some fast and easy sound-bites:

http://www.bbsrc.ac.uk/news/health/2013/130925-pr-x-shape-chromosome-structure/

In other words, crucial bits and pieces of DNA may be down-regulated by sequestration to the interior or up-regulated by exposure to the periphery of some non-random chromosomal super-structure. This chromosomal super-structure therefore requires non-informational nucleic acid “clean fill” (to use Doolittle’s phraseology) between the crucial bits that get moved around.

The C-value paradox is not puzzling when comparing onions to, say, metazoan lineages, but it is very, very puzzling when comparing some lineages of onions to other lineages of onions.

I think that intervening non-informational (but still functional) nucleic acid “clean-fill” DNA can occur in +/- multiples within certain parameters such that too much bulk DNA becomes too much of a metabolic burden while too little cannot be stretched to fulfill its function of redundant (but not necessarily superfluous) gene regulation as described by Peter Fraser.

Ford Doolittle (in personal communication) did in fact suggest something along these lines was happening and also was subject to selection:

Doolittle: “That ‘excess’ DNA (that 40x as much that lungfish have compared to us, for instance) might play a structural role or determine other cellular parameters under selection has long been accepted, even by us ‘junk’ supporters.”

Let’s leave selection out of the discussion for the interim. First things first.

I think Peter Fraser presents an interesting case worthy of close consideration.

Let’s tie all of this in with another interesting recent publication:
https://www.quantamagazine.org/20160105-supercoiled-dna/

I am reminded that one proposed mechanism of enhancer/activators finding their way to distal transcription complexes is a “slither model” where introducing super-coiled twists into a looped domain of DNA results in a series of figure eights where regions of the DNA writhe or slither past each other. I never did like the standard DNA “flopping” or “bending” model of enhancers finding their targets as found in most books.

Let’s tie this together – DNA architecture may provide an exuberantly redundant means of gene regulation, or even of cell differentiation, mediated by differentially exposed supercoiled domains. Alternatively, differential supercoiling and sequestration of entire blocks of unneeded genome may merely represent a means of shutting down pesky TEs whenever possible. Or, alternatively, the darn junk DNA interferes with the regulation of gene expression unless it’s correctly folded out of the way (tip of the hat to Barb Wilson).

The obvious candidate mediating these speculative proposals would be lncRNA. Sequence conservation is rendered moot when conserved and essential lncRNA–genome interaction regions are relatively small and interspersed with non-conserved spacers as suggested in this paper.
Alan Fox on January 23, 2016 at 9:48 pm said:

Welcome to TSZ, Tom.
stcordova on January 23, 2016 at 10:33 pm said:

Beyond the issue of long ncRNAs are ncRNAs in general.

The presumption is that DNA and RNA serve purely to code for proteins and to regulate them, and that they do not participate in other functions.

In the case of human memory and cognition, it appears the epigenetic marks associated with DNA are changed dynamically as part of the process of remembering!

It totally makes sense there could be variability in genome size if there is an epigenomic and/or epitranscriptomic component that helps an organism explore functional space.

For example, if epigenetic marks are used in the process of cognition, it stands to reason that variability in the genome size of the ncDNA/ncRNAs may create variability in the ability to learn and memorize.

I also think extra DNA can be recruited in self-healing functions.

A testable hypothesis is comparing the c-values of various lineages of onions and observing their ability to recover from injury.

In self-healing man-made networks like Bitcoin, redundancy factors of 400 are the norm. I would not be surprised to see such enormous redundancy factors in biology.

I will add a comment how this relates to the issue of evolutionary conservation.
stcordova on January 23, 2016 at 10:39 pm said:

The theory of evolutionary conservation is that similarity is retained between species because these are fairly critical functions, therefore they are preserved by purifying selection.

I believe similarity between species is a design feature to assist humans in understanding biology, so we will be able to elucidate function by interspecies comparisons that reveal deep similarity.

A problem in trying to explain evolutionary conservation is that there may not be sufficient population resources to sustain purifying selection — a point not lost on Larry Moran and others.

I should point this out:

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0050234

Abstract

Ultraconserved elements have been suggested to retain extended perfect sequence identity between the human, mouse, and rat genomes due to essential functional properties. To investigate the necessities of these elements in vivo, we removed four noncoding ultraconserved elements (ranging in length from 222 to 731 base pairs) from the mouse genome. To maximize the likelihood of observing a phenotype, we chose to delete elements that function as enhancers in a mouse transgenic assay and that are near genes that exhibit marked phenotypes both when completely inactivated in the mouse and when their expression is altered due to other genomic modifications. Remarkably, all four resulting lines of mice lacking these ultraconserved elements were viable and fertile, and failed to reveal any critical abnormalities when assayed for a variety of phenotypes including growth, longevity, pathology, and metabolism. In addition, more targeted screens, informed by the abnormalities observed in mice in which genes in proximity to the investigated elements had been altered, also failed to reveal notable abnormalities. These results, while not inclusive of all the possible phenotypic impact of the deleted sequences, indicate that extreme sequence constraint does not necessarily reflect crucial functions required for viability.
Author Summary

It is widely believed that the most evolutionarily conserved DNA sequences in the human genome have been preserved because of their functional importance and that their removal would thus have a devastating effect on the organism. To ascertain this we removed from the mouse genome four ultraconserved elements—sequences of 200 base pairs or longer that are 100% identical among human, mouse, and rat. To our surprise, we found that the mice lacking these elements are viable, fertile, and show no apparent abnormalities. This completely unexpected finding indicates that extreme levels of DNA sequence conservation are not necessarily indicative of an indispensable functional nature.

So I personally think the similarities are not evolutionarily conserved but created for the benefit of humans for scientific discovery.

Changes in fitness with such a large genome like the human genome are too small to detect in most cases, and fitness isn’t a great way of elucidating engineering function anyway.
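The quoted author summary defines ultraconserved elements as sequences of 200 base pairs or longer that are 100% identical among human, mouse, and rat. Here is a minimal Python sketch of that definition, assuming the sequences have already been placed in a gap-padded multiple alignment of equal-length strings. The alignment step itself and the toy sequences are assumptions, not something described in the thread.

```python
# Sketch of the "ultraconserved" definition quoted above: maximal runs of
# alignment columns where every sequence has the same base and no gap.
# ASSUMPTION: input is a precomputed multiple alignment of equal-length strings.
from typing import List, Tuple


def ultraconserved_runs(alignment: List[str], min_len: int = 200) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of identical, ungapped
    columns that are at least min_len columns long."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    runs = []
    start = None
    for col in range(length):
        column = {row[col].upper() for row in alignment}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


# Toy three-way alignment (hypothetical): 250 identical columns, one mismatch,
# then a 10-column identical stretch that is too short to qualify.
aln = ["A" * 250 + "C" + "G" * 10,
       "A" * 250 + "T" + "G" * 10,
       "A" * 250 + "C" + "G" * 10]
print(ultraconserved_runs(aln))  # [(0, 250)]
```

Treating a gap column as non-conserved follows the "100% identical" wording; a looser definition would need a different column test.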
Alan Fox on January 24, 2016 at 12:25 pm said:

I see there is parallel discussion at Larry Moran’s Sandwalk and at Uncommon Descent, where Vincent Torley writes:

Over at the Sandwalk blog, a commenter named Tom Mueller wrote: “FOXP2 is one of several interesting genes that constitute an EXTRAORDINARY EXCEPTION to the rule when expecting the inexorable ticking of some presumed neutral molecular clock – to wit Human accelerated regions (aka HARs).”

Professor Joe Felsenstein made another point: “Trees from different genes are not all exactly the same. There is noise owing to accidental convergence, reversals, and parallelism. But they do come out very similar, and when you look at many loci in eukaryotes the signal of an underlying evolutionary tree becomes overwhelming. See, for example, Doug Theobald’s papers on formal tests of common ancestry (here, for example).”

What do readers think of these points?

Tom Mueller has commented above and Professor Felsenstein pops in here from time to time so may reply to Vincent directly. Vincent is also welcome to revisit if he can find time.
stcordova on January 25, 2016 at 3:30 pm said:

FOXP2 is one of several interesting genes that constitute an EXTRAORDINARY EXCEPTION to the rule when expecting the inexorable ticking of some presumed neutral molecular clock

The supposed clock is broken! It ticks at different rates for different genes, it is restricted to the coding regions in some cases and totally ignores the introns in genes that cross the prokaryote/eukaryote divide.

Finally, the INTRA-species divergence is in conflict with the inter-species divergence. Why should aaRS genes in E. coli be identical in strains worldwide in various hosts and contexts that are environmentally isolated, yet that blasted clock is supposedly ticking in between bacteria?

We don’t know. Maybe mutations are not random and maybe “conservation” isn’t really conservation but God-made similarity and diversity that forms hierarchical patterns between species.
Tom English on January 28, 2016 at 1:36 pm said:

Dave,

Sorry not to have jumped in here, and tried to learn from you. Please don’t be discouraged by the paucity of comments. I suspect that you’ll have an impact on the composition of the audience, if you keep at it.
Joe Felsenstein on January 28, 2016 at 2:00 pm said:

stcordova: The supposed clock is broken! It ticks at different rates for different genes, it is restricted to the coding regions in some cases and totally ignores the introns in genes that cross the prokaryote/eukaryote divide.

Finally, the INTRA-species divergence is in conflict with the inter-species divergence. Why should aaRS genes in E. coli be identical in strains worldwide in various hosts and contexts that are environmentally isolated, yet that blasted clock is supposedly ticking in between bacteria?

We don’t know. Maybe mutations are not random and maybe “conservation” isn’t really conservation but God-made similarity and diversity that forms hierarchical patterns between species.

These are cases where it was not expected that the clock would predict equal amounts of difference, so Salvador Cordova is way off base:

1. It has been known basically forever that different loci have different rates of change, that different regions of a gene will change at different rates. None of that is in violation of the molecular clock. Such genes may or may not evolve in a clocklike fashion, but these rate differences are not evidence of that. They reflect different amounts of conservation in the different genes and different regions.

2. Different amounts of intraspecies variation reflect different migration rates in different species, as well as “selective sweeps” in local regions of the genome within species. Show me anyone who predicted that different species cannot have different migration rates.

In all of these cases we use a molecular clock as a basic part of the modeling. We can then do some tests of it. I am not saying that molecular clocks are perfect or universal — the more different two species are the more likely that their genomes change at different rates. But the particular phenomena Salvador points to are just not evidence of violation of the molecular clock.
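Point 1 above turns on what a clock actually claims: a roughly constant rate over time for a given sequence, not the same rate for every locus. A minimal Python sketch of that distinction follows, using two loci with hypothetical rates that differ tenfold. Each locus still accumulates divergence in proportion to time since the split.

```python
# Sketch: a strict molecular clock applied separately to two loci with very
# different (hypothetical) substitution rates. The rates differ tenfold, yet
# each locus's divergence is still proportional to time since the split.

def expected_distance(rate: float, split_myr: float) -> float:
    """Expected substitutions per site between two lineages that split
    split_myr million years ago, if each lineage accumulates changes at
    `rate` substitutions/site/Myr (so the pair diverges at 2 * rate)."""
    return 2.0 * rate * split_myr


rates = {"constrained_locus": 0.0005, "fast_locus": 0.005}  # hypothetical values
for name, rate in rates.items():
    d_10 = expected_distance(rate, 10.0)
    d_40 = expected_distance(rate, 40.0)
    # Quadrupling the time quadruples the distance at both loci.
    print(name, round(d_10, 6), round(d_40, 6), round(d_40 / d_10, 6))
```

This deliberately ignores multiple hits at the same site, which make raw differences saturate over long times; real analyses correct for that with a substitution model.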
Joe Felsenstein on January 28, 2016 at 2:04 pm said:

Alan Fox quoting Vincent Torley at UD talking about a thread at Sandwalk: Professor Joe Felsenstein made another point: “Trees from different genes are not all exactly the same. There is noise owing to accidental convergence, reversals, and parallelism. But they do come out very similar, and when you look at many loci in eukaryotes the signal of an underlying evolutionary tree becomes overwhelming. See, for example, Doug Theobald’s papers on formal tests of common ancestry (here, for example).”

What do readers think of these points?

I agree with it.
Allan Miller on January 28, 2016 at 3:24 pm said:

TomMueller,

Hi, Tom!
Allan Miller on January 28, 2016 at 3:29 pm said:

stcordova,

A problem in trying to explain evolutionary conservation is that there may not be sufficient population resources to sustain purifying selection — a point not lost on Larry Moran and others.

You might be surprised, if you had a play with Kimura’s ‘equation 10’ I keep touting, just how small a selection coefficient needs to be to make a fixed beneficial allele very hard to dislodge, even in quite small populations. And I don’t think you can assume smaller populations to be the norm in a long run. Especially if you are right about their susceptibility to extinction (which, to some extent, you are).
Allan Miller on January 28, 2016 at 3:39 pm said:

stcordova,

Why should aaRS genes in E. coli be identical in strains worldwide in various hosts and contexts that are environmentally isolated, yet that blasted clock is supposedly ticking in between bacteria?

They tick very slowly, and AARS genes appear to be particularly susceptible to HGT, making them poor divergence markers. It makes sense: any given aminoacylation is almost a universal function among descendants of LUCA (54 triplet assignments are universal, 9 only slightly variant). A better, or even just a different, AARS sequence will probably work fine in many contexts, less so for genes specific to particular taxa.

The Ignicoccus/Nanoarchaeon ‘exosymbiosis’ I mentioned elsewhere contains a striking example of this – their AARSs are much more closely related than most of their other genes.
petrushka on January 28, 2016 at 3:45 pm said:

Am I off base thinking that if a population is small, its smallness is probably due to some external environmental condition?

It could be a founder population. It could be facing a new predator. It could be facing a decline in food supply.

But speaking of small populations, how’s genetic entropy working out for the python population in south Florida? Or for any invasive species?
Dave Carlson on January 28, 2016 at 3:51 pm said:

Tom English:
Dave,

Sorry not to have jumped in here, and tried to learn from you. Please don’t be discouraged by the paucity of comments. I suspect that you’ll have an impact on the composition of the audience, if you keep at it.

Appreciate it, Tom! I don’t always have a lot to say (or the time to say it), but I do like reading and discussing evolutionary biology, and I will try to post some topics about current research as I’m able to.
petrushka on January 28, 2016 at 3:55 pm said:

Dave Carlson:

Anyone who makes substantive posts on the internet must know that they have few peers capable of expanding on the topic.

And a lot of harpies lurking and pouncing.

Sooner or later, any evolution post turns into a fight with creationists, most likely, off topic.

I hope this doesn’t discourage the posting of original material. It does get read.
Dave Carlson on January 28, 2016 at 4:05 pm said:

petrushka:
I hope this doesn’t discourage the posting of original material. It does get read.

Not at all. I’ve been reading (if not necessarily contributing to) these sorts of discussions for over a decade now, so I get how they tend to go.
stcordova on January 28, 2016 at 4:08 pm said:

AARS genes appear to be particularly susceptible to HGT,

Not demonstrated experimentally! That was an ad hoc rationalization to explain an incongruent pattern of similarity.

The rationale was, “the similarity pattern isn’t explained by common descent, therefore HGT is implicated.” The one thing that will never be considered is common design.
stcordova on January 28, 2016 at 4:11 pm said:

You might be surprised, if you had a play with Kimura’s ‘equation 10’ I keep touting, just how small a selection coefficient needs to be to make a fixed beneficial allele very hard to dislodge,

Dislodging a beneficial allele is not the same as purifying deleterious mutations out of populations.
Alan Fox on January 28, 2016 at 4:26 pm said:

petrushka: Anyone who makes substantive posts on the internet must know that they have few peers capable of expanding on the topic.

Well put! I’m sure most of our readers appreciate an informative post from someone covering their field of expertise. I’m certainly often guilty of reading, appreciating and forgetting to mention it.
petrushka on January 28, 2016 at 4:35 pm said:

stcordova: “the similarity pattern isn’t explained by common descent, therefore HGT is implicated.”

Well, HGT is observed to happen. We can watch it and photograph it happening.

The divine finger diddler? Not so much.
petrushka on January 28, 2016 at 4:37 pm said:

Alan Fox: Well put! I’m sure most of our readers appreciate an informative post from someone covering their field of expertise. I’m certainly often guilty of reading, appreciating and forgetting to mention it.

But we have no shortage of cranks capable of dismantling two hundred years of physics, chemistry, biology, astronomy, and geology.

No shortage at all.
petrushka on January 28, 2016 at 4:38 pm said:

stcordova: Dislodging a beneficial allele is not the same as purifying deleterious mutations out of populations.

Sal, how’s genetic entropy working out with small populations of invasive species?
stcordova on January 28, 2016 at 5:52 pm said:

As I said, I believe the study of conservation patterns (really patterns of similarity; “conservation” is a superfluous add-on) does suggest biological significance or function. The way these suspicions are confirmed, however, is through experiment. It should be instructive to see that we can’t just look at one cell and conclude a lncRNA is of no use. Its role may even depend on the cell type and the stage of development.

We know from ENCODE research that the transcriptome changes a lot, even across the different stages of a cell. That said, here is one paper:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3928180/

Long non-coding RNAs (lncRNAs) were recently shown to regulate chromatin remodelling activities. Their function in regulating gene expression switching during specific developmental stages is poorly understood. Here we describe a nuclear, non-coding transcript responsive for the stage-specific activation of the chicken adult αD globin gene. This noncoding transcript, named α-globin transcript long non-coding RNA (lncRNA-αGT) is transcriptionally upregulated in late stages of chicken development, when active chromatin marks the adult αD gene promoter. Accordingly, the lncRNA-αG promoter drives erythroid-specific transcription. Furthermore, loss of function experiments showed that lncRNA-αGT is required for full activation of the αD adult gene and maintenance of transcriptionally active chromatin. These findings uncovered lncRNA-αGT as an important part of the switching from embryonic to adult α-globin gene expression, and suggest a function of lncRNA-αGT in contributing to the maintenance of adult α-globin gene expression by promoting an active chromatin structure.

That shows there is foresight for the adult stage.

As large as ENCODE is, it tracks only 150 cell types, and those are mostly immortalized, cancer-derived cells. Even ENCODE and ROADMAP have hardly scratched the surface of the quadrillion transcriptomes that might be involved in the development of an adult human. There is no way Larry can be sure of his claim of junk.

Every year ENCODE researchers discover functions and roles in non-coding DNA, and usually they are surprising. Larry is premature. And the fact that ENCODE research keeps discovering new roles in places people didn’t suspect indicates a trend: we will discover more.

Usually GWAS studies give the providential first hint that something is up. Usually the researcher reports, “variation in the intronic and intergenic regions compromises health, we report the regulatory role of these regions….” It’s not going to stop. There is no way Larry can know that non-coding DNA has no function in any of the cells during development. If ENCODE and ROADMAP don’t have the data, surely Larry doesn’t either.

NOTES:
adapted from comments here regarding RNAs that are cell and stage specific:

Repetitive DNA and ENCODE
Allan Miller on January 29, 2016 at 12:06 pm said:

stcordova,

Dislodging a beneficial allele is not the same as purifying deleterious mutations out of populations.

They are opposite sides of the same coin. Beneficial and deleterious are relative terms. If a beneficial allele arises, all its alleles are deleterious relative to it. If it fixes, it has ‘purified the deleterious out of the population’. Dislodging it means reversing that, and fixing the deleterious instead. The latter is (on analysis of ‘equation 10’) substantially less likely.
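The thread never reproduces "equation 10". Assuming it refers to Kimura's (1962) diffusion approximation for the fixation probability of an allele at initial frequency p, u(p) = (1 − e^(−4Ne·s·p)) / (1 − e^(−4Ne·s)), the point above can be checked numerically. Dislodging a fixed beneficial allele requires a new allele with selection coefficient −s to fix. Below is a Python sketch under that assumption, with additive (genic) selection, p = 1/(2N) for one new mutant in a diploid population, and a hypothetical N and s.

```python
# Sketch, ASSUMING "equation 10" is Kimura's (1962) fixation-probability
# formula u(p) = (1 - exp(-4*Ne*s*p)) / (1 - exp(-4*Ne*s)) with additive
# selection. N and s below are hypothetical illustration values.
import math
from typing import Optional


def fixation_probability(s: float, N: int, Ne: Optional[int] = None) -> float:
    """Probability that a single new mutant with selection coefficient s
    eventually fixes in a diploid population of size N (effective size Ne)."""
    if Ne is None:
        Ne = N
    p = 1.0 / (2 * N)      # one new copy among 2N gene copies
    a = 4.0 * Ne * s
    if a == 0.0:
        return p           # neutral case: fixation probability = initial frequency
    if a > 0:
        # expm1 keeps precision when a * p is tiny.
        return math.expm1(-a * p) / math.expm1(-a)
    # Deleterious case, rewritten as (e^{b(p-1)} - e^{-b}) / (1 - e^{-b}) with
    # b = -a > 0, so every exponent is <= 0 and exp() cannot overflow.
    b = -a
    return (math.exp(b * (p - 1.0)) - math.exp(-b)) / -math.expm1(-b)


N = 1000
for s in (0.01, 0.0, -0.01):
    print(f"s = {s:+.2f}: P(fix) = {fixation_probability(s, N):.3e}")
```

With these illustrative values, the beneficial mutant fixes with probability close to 2s (about 0.02), the neutral one with 1/(2N), and the deleterious one with a probability many orders of magnitude smaller. That asymmetry is the sense in which a fixed beneficial allele is hard to dislodge.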
Allan Miller on January 29, 2016 at 12:26 pm said:

stcordova,

Not demonstrated experimentally! That was an ad hoc rationalization to explain an incongruent pattern of similarity.

In many cases, you can trace the point at which the event occurred. How? By using the incongruent pattern, and finding where in the branching structure the incongruence arises. Clearly, both vertical and horizontal descent can happen, therefore only a clueless phylogeneticist would expect solely the former to inform the pattern.

The rationale was, “the similarity pattern isn’t explained by common descent, therefore HGT is implicated.” The one thing that will never be considered is common design.

Mainly because it’s ridiculous. If ‘like creatures need like DNA’, it is weakened when we find unlike creatures with like DNA. Especially when they are observed to be involved in symbiosis right now.

Common Design does not explain the patterns. On common descent, gross transfer and non-transfer events would be expected to have subsequent effect on phylogeny – gene gain/loss, indel, inversion, transposition, copy number variation, karyotype change, and so on and so on. And that’s what we find.

I mean, why does a designer create different AARSs which, on phylogenetic analysis, appear to jump between branches at the base but then appear congruent from that point on? Wouldn’t one do? It is as if He wants us to infer evolution.

It would be foolish to pretend HGT is invalid as an explanation of vertical-assumption incongruity, as if it cannot happen.
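The approach Allan describes, locating a transfer by finding where in the branching structure a gene tree departs from the species tree, can be illustrated with a rooted, support-free Python sketch that compares clade sets. The taxa and trees are hypothetical. Real analyses work with unrooted bipartitions, branch support, and statistical tests of topology rather than this bare comparison.

```python
# Sketch: find clades in a gene tree that are absent from a species tree.
# Trees are nested tuples of hypothetical taxon names; this is a rooted,
# support-free simplification of real incongruence tests.
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]


def _collect(tree: Tree, found: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaves under `tree`, recording every internal node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_collect(child, found) for child in tree))
    found.add(leaves)
    return leaves


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial clades: leaf sets of internal nodes other than the root."""
    found: Set[FrozenSet[str]] = set()
    root = _collect(tree, found)
    found.discard(root)
    return found


def incongruent_clades(gene_tree: Tree, species_tree: Tree) -> Set[FrozenSet[str]]:
    """Clades present in the gene tree but absent from the species tree."""
    return clades(gene_tree) - clades(species_tree)


species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = (("A", "B"), (("C", "D"), "E"))  # C's gene copy nests with D and E
print(sorted(sorted(c) for c in incongruent_clades(gene_tree, species_tree)))
# [['C', 'D'], ['C', 'D', 'E']]
```

In the toy example, every incongruent clade involves taxon C moving into the D/E group. That marks the point in the branching structure where one would look for a transfer event, while the shared A/B clade shows the rest of the tree is congruent.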
cubist on January 29, 2016 at 1:04 pm said:

stcordova: Not demonstrated experimentally! That was an ad hoc rationalization to explain an incongruent pattern of similarity.

The rationale was, “the similarity pattern isn’t explained by common descent, therefore HGT is implicated.” The one thing that will never be considered is common design.

Well, when you Creationists invoke “common design”, you never do bother to specify any, you know, practical details about the particular Designer-concept you’d like real scientists to consider. And absent those details, what, exactly, are real scientists supposed to “consider” about this “common design” thingie?

Would you expect scientists to “consider” the mindlessly vague not-really-a-hypothesis that an unknown number of unknown entities of unknown powers, using unknown tools & techniques to achieve an unknown number of unknown purposes, performed an unknown number of unknown actions which, by some unknown means, somehow-or-other resulted in the phenomenon you’re observing? If you would, you’re a bloody, blinkered fool. If you wouldn’t expect real scientists to “consider” the aforementioned mindlessly vague not-really-a-hypothesis, why would you expect real scientists to “consider” the equally vague not-really-a-hypothesis of “oh, um, Common Design, and no we’re not gonna pony up any specifics”?
stcordova on January 29, 2016 at 3:11 pm said:

I’m more-or-less agnostic regarding the functional potential of currently uncharacterized non-coding transcripts, but I think it’s a really interesting area of study.

Well, I hope over time you’ll be a true believer.

http://genesdev.cshlp.org/content/23/13/1494.full.html

Most of the eukaryotic genome is transcribed, yielding a complex network of transcripts that includes tens of thousands of long noncoding RNAs with little or no protein-coding capacity. Although the vast majority of long noncoding RNAs have yet to be characterized thoroughly, many of these transcripts are unlikely to represent transcriptional “noise” as a significant number have been shown to exhibit cell type-specific expression, localization to subcellular compartments, and association with human diseases. Here, we highlight recent efforts that have identified a myriad of molecular functions for long noncoding RNAs. In some cases, it appears that simply the act of noncoding RNA transcription is sufficient to positively or negatively affect the expression of nearby genes. However, in many cases, the long noncoding RNAs themselves serve key regulatory roles that were assumed previously to be reserved for proteins, such as regulating the activity or localization of proteins and serving as organizational frameworks of subcellular structures. In addition, many long noncoding RNAs are processed to yield small RNAs or, conversely, modulate how other RNAs are processed. It is thus becoming increasingly clear that long noncoding RNAs can function via numerous paradigms and are key regulatory molecules in the cell.
….

The act of transcribing ncRNAs can have profound consequences on the ability of nearby genes to be expressed (Katayama et al. 2005). For example, transcription of a ncRNA across the promoter region of a downstream protein-coding gene can directly interfere with transcription factor binding, and thus prevent the protein-coding gene from being expressed (Martens et al. 2004). In fact, transcriptional interference mechanisms have been shown to regulate key developmental decisions, such as where homeotic (Hox) genes are expressed (Petruk et al. 2006) and whether Saccharomyces cerevisiae enters into meiosis (Hongay et al. 2006). Even if not directly interfering with a nearby promoter, transcription of ncRNAs can induce histone modifications that repress transcription initiation of overlapping protein-coding genes, as demonstrated at the yeast PHO84 (Camblong et al. 2007) and GAL1-10 gene clusters (Houseley et al. 2008). This is because transcriptional elongation causes histone marks to be added that prevent “spurious” transcription initiation from sites within the body of the transcript (Carrozza et al. 2005; Joshi and Struhl 2005; Keogh et al. 2005). Noncoding transcription has even been shown to induce the formation of heterochromatin at the p15 tumor suppressor gene locus that persisted after noncoding transcription was turned off, suggesting that the transient expression of ncRNAs can have long-lasting heritable effects on gene expression (Yu et al. 2008).
TomMueller on January 31, 2016 at 7:13 pm said:

I recently had an epiphany… or perhaps a brain-fart. I cannot tell which yet.
Consider that most mechanisms of gene regulation, as generally described in textbooks, are “leaky”.
That may explain the need for redundancy in gene regulation (transcriptional control together with RNA processing, mRNA stability, translational, and post-translational control), which, BTW, are ALL employed by prokaryotes as well, in one form or another.
More importantly, that may explain why there must be some fail-safe or overriding mechanism to completely silence certain categories of non-housekeeping genes in multicellular eukaryotes, specifically if proto-oncogenes need turning off at certain developmental stages or if certain classes of gene products are lethal in inappropriately differentiated cells.
If that mechanism were facultative heterochromatin formation, many enigmas (such as “junk” DNA and the C-value paradox) that have been plaguing discussions on other blogs could find resolution, especially when considering host strategies to control “selfish DNA” from an evolutionary perspective.
In line with this blog’s topic of lncRNA, this line of speculation may partly explain the C-value paradox along the lines of what Doolittle deemed the genomic equivalent of “clean fill”: useful and technically functional, but not along the lines ENCODE overhyped with their findings. “Along the lines…”?!?! Hmm, a lame pun actually.
When the load of retroelements or selfish DNA becomes too great, the host is obliged to evolve mechanisms to prevent further spread, or even mechanisms to eliminate retroelements altogether.
Let’s say that hosts in general have co-opted a common defense against selfish DNA as a means of gene regulation (some hosts better than others: lungfish compared with fugu, perhaps?). Shutting down what once was pesky selfish DNA (those LINEs, etc.) has become a means of shutting down entire domains of chromatin and may constitute the major mechanism of cell differentiation, in which entire categories of genes are completely silenced. It would appear that different hosts have convergently evolved identical defenses against their own equivalents of Alu and LINEs. That would make sequence conservation a red herring as a criterion of “function”, since different hosts evolve similar strategies to tame, and even co-opt, non-identical but similarly selfish or parasitic DNA, which eventually becomes symbiotic DNA with respect to gene regulation… of greater or lesser importance in some lineages and even exuberantly redundant in others.
I just wanted to float that out there, welcoming any and all to shoot down this tentative trial balloon of mine.
TomMueller on January 31, 2016 at 7:22 pm said:

Before discussing “epigenetics”, I would challenge anyone not old enough to sport grey hair to revisit some classical genetics.

Here is the original PNAS “Core Concepts” article that provoked a classical geneticist’s outraged and indignant response:

http://www.pnas.org/content/110/9/3209.full?ijkey=ea8e1094836a8d747eddf25193795c206de0d25f&keytype2=tf_ipsecsha

Here is Mark Ptashne’s indignant and outraged response:

http://www.pnas.org/content/110/18/7101.full#ref-1

In short, nucleosome modifications may be necessary for maintaining an epigenetic response, but they are not sufficient for either the initiation or the maintenance of any epigenetic response. Kudos to Mark Ptashne!
Dave Carlson on January 31, 2016 at 7:56 pm said:

TomMueller,

Hi Tom,
Thanks for your comment. I’m not sure I totally understand your idea here, so apologies if this is a dumb question: are you suggesting that heterochromatin formation may have evolved as a means to silence selfish genetic elements (transposable elements, etc.) but this mechanism was later co-opted to take on regulatory functions?
TomMueller on January 31, 2016 at 8:12 pm said:

Dave Carlson,

That’s the most important part of my random scribbles.

Meanwhile, any co-opted functionality would exist in greater or fewer copies when comparing one host to another. I was never so bothered that an onion could have so much more DNA than a human, but rather perturbed that different lineages of onions could differ so much from each other.

My “aha” moment came from the notion that gene regulation as typically described in textbooks is inefficient and leaky, requiring some backup or fail-safe in multicellular organisms with differentiated cells in order to shut down entire categories of genes in certain cell types while leaving others on, and the reverse in other cell types of the same organism.
TomMueller on January 31, 2016 at 8:17 pm said:

When the load of retroelements or selfish DNA becomes too great, the host is obliged to evolve mechanisms to prevent further spread, or even mechanisms to eliminate retroelements altogether.

The latter may prove “too expensive”, even though some hosts “get close to the mark”, leaving the former option, which may prove more or less effective in some organisms than in others. Some onion lineages may have been more successful than others at eliminating the bulk of their selfish DNA. Meanwhile, all onion lineages managed to co-opt facultative heterochromatin formation as a method of global gene control during cell differentiation.

I hope that was more coherent…

best
stcordova on February 1, 2016 at 12:49 am said:

From Mark Ptashne:

I ignore DNA methylation in the remainder of this article because its possible role in development remains unclear, and it does not exist in, for example, flies and worms—model organisms the study of which has taught us much of what we know about development.

Good gravy, that’s a terrible reason for ignoring methylation!

http://genesdev.cshlp.org/content/28/6/652.abstract

DNA methylation is required for the control of stem cell differentiation in the small intestine

The mammalian intestinal epithelium has a unique organization in which crypts harboring stem cells produce progenitors and finally clonal populations of differentiated cells. Remarkably, the epithelium is replaced every 3–5 d throughout adult life. Disrupted maintenance of the intricate balance of proliferation and differentiation leads to loss of epithelial integrity or barrier function or to cancer. There is a tight correlation between the epigenetic status of genes and expression changes during differentiation; however, the mechanism of how changes in DNA methylation direct gene expression and the progression from stem cells to their differentiated descendants is unclear. Using conditional gene ablation of the maintenance methyltransferase Dnmt1, we demonstrate that reducing DNA methylation causes intestinal crypt expansion in vivo. Determination of the base-resolution DNA methylome in intestinal stem cells and their differentiated descendants shows that DNA methylation is dynamic at enhancers, which are often associated with genes important for both stem cell maintenance and differentiation. We establish that the loss of DNA methylation at intestinal stem cell gene enhancers causes inappropriate gene expression and delayed differentiation.

Not to mention, the National Institutes of Health is investing almost $200 million in total in the Roadmap Epigenomics Project. Roadmap is ENCODE’s sister, so to speak.

http://www.roadmapepigenomics.org/
stcordova on February 1, 2016 at 1:28 am said:

I’m sympathetic to the following view as to the role of lncRNAs and other RNAs. It is called the ceRNA hypothesis:

https://en.wikipedia.org/wiki/Competing_endogenous_RNA_%28CeRNA%29

In molecular biology, competing endogenous RNAs (abbreviated ceRNAs) regulate other RNA transcripts by competing for shared microRNAs.[1]

MicroRNAs (miRNAs) are an abundant class of small, non-coding RNAs (~22 nt long), which negatively regulate gene expression at the level of messenger RNA (mRNA) stability and translation. The human genome encodes over 500 miRNAs, each one targeting hundreds of different genes. It is estimated that half of all genes in the genome are targets of miRNAs, adding a large layer of regulation at the post-transcriptional level.[2] The seed region, which comprises nucleotides 2–8 at the 5′ end of the miRNA, is particularly crucial for mRNA recognition and silencing.[3]

Recent studies have shown that the interaction of the miRNA seed region with mRNA is not unidirectional, but that the pool of mRNAs, transcribed pseudogenes, long noncoding RNAs (lncRNAs),[4] and circular RNAs (circRNAs)[5][6] compete for the same pool of miRNAs.[7] These competing endogenous RNAs (ceRNAs) act as molecular sponges for a microRNA through their miRNA binding sites (also referred to as miRNA response elements, MREs), thereby de-repressing all target genes of the respective miRNA family. Experimental evidence for such ceRNA crosstalk was first shown for the tumor suppressor gene PTEN, which is regulated by the 3′ untranslated region (3′UTR) of the pseudogene PTENP1 in a DICER-dependent manner.[8]
…
Other validated ceRNA regulators
….
Linc-MD1

Linc-MD1, a muscle-specific long non-coding RNA, activates muscle-specific gene expression by regulating expression of MAML1 and MEF2C via antagonizing miR-133 and miR-135.

which relates to:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3234495/

A Long Noncoding RNA Controls Muscle Differentiation by Functioning as a Competing Endogenous RNA

Recently, a new regulatory circuitry has been identified in which RNAs can crosstalk with each other by competing for shared microRNAs. Such competing endogenous RNAs (ceRNAs) regulate the distribution of miRNA molecules on their targets and thereby impose an additional level of post-transcriptional regulation. Here we identify a muscle-specific long noncoding RNA, linc-MD1, which governs the time of muscle differentiation by acting as a ceRNA in mouse and human myoblasts. Downregulation or overexpression of linc-MD1 correlate with retardation or anticipation of the muscle differentiation program, respectively. We show that linc-MD1 “sponges” miR-133 and miR-135 to regulate the expression of MAML1 and MEF2C, transcription factors that activate muscle-specific gene expression. Finally, we demonstrate that linc-MD1 exerts the same control over differentiation timing in human myoblasts, and that its levels are strongly reduced in Duchenne muscle cells. We conclude that the ceRNA network plays an important role in muscle differentiation.
TomMueller on February 1, 2016 at 11:27 am said:

Hi stcordova

I think you may be misreading Ptashne. Ptashne is not claiming that methylation states of DNA are unimportant. Of course they are! Ptashne is merely claiming that methylation states of DNA are neither necessary nor sufficient for initiating or maintaining epigenetic changes.
I direct your attention to the seminal line within your quote from the CSH paper:

“…how changes in DNA methylation direct gene expression and the progression from stem cells to their differentiated descendants is unclear.”

Ptashne is emphatic that DNA methylation “directs” nothing. Without continual recruitment of transcription factors, nucleosome modifications will inexorably decay. The maintenance of this recruitment depends on positive-feedback and double-negative-feedback mechanisms long known to geneticists of the “old school”, who first elucidated genetic control in bacterial/bacteriophage systems.

Here is a great review: http://www.medicine.mcgill.ca/physio/mackeylab/courses_mackey/pdf_files/ferrell_2002.pdf

Ptashne must be on to something. As I mentioned above regarding hinnies and mules, why should male/female hinnies be different from male/female mules?!

The methylation status of their chromosomes is presumably identical.

Of course, parental imprinting may be occurring, similar to the story of IGF2r gene regulation. The problem is that even the IGF2r story is remarkably similar to the lambda CI/CII story Ptashne is describing.
TomMueller on February 1, 2016 at 11:55 am said:

stcordova – Thank you for the ceRNA reference
TomMueller on February 1, 2016 at 11:57 am said:

@ Dave Carlson

You will note I prefaced my initial soliloquy with:

I recently had an epiphany… or perhaps a brain-fart. I cannot tell which yet.

That gene regulation can be leaky was also long known to geneticists of the “old school”. Look no further than the “lactose paradox”.

Leaky expression of certain classes of genes would be lethal in multicellular organisms.

However, my naïve line of speculation that multicellular organisms employ facultative heterochromatin as a method to globally and completely shut down the expression of certain categories of genes during development and differentiation is proving problematic. I am having difficulty wrapping my head around how fugu deal with the leaky gene regulation problem.

I think I am in over my head here…
Allan Miller on February 1, 2016 at 12:32 pm said:

TomMueller,

[…] rather perturbed that different lineages of onions could differ so much from each other.

Is this because you remain determined to find a role for junk? 😀 Intra-species variation of as much as 2.5% is observed in this study. Given that diploids and tetraploids don’t appear to suffer much (there could of course be compensatory benefits), I think the intuition that genome size must have a significant selective component is probably mistaken. Within any species, genome size is kept around a mean by meiosis and random(ish) mating. But reproductive isolation allows the separate means to diverge. Hence the authors infer cryptic species where variations as high as 6.9% were found.
Allan Miller on February 1, 2016 at 12:34 pm said:

Of course, the silencing of transposons is a slightly different issue. The main problem there is that they damage genes, so there is stronger selection – it’s mediated by that effect, rather than the incremental effect on gene size, which is pretty tiny per generation.
Allan Miller on February 1, 2016 at 12:35 pm said:

Allan Miller,

*genome size
TomMueller on February 1, 2016 at 1:16 pm said:

Hi Allan – great to reconnect!!!

As we discussed on another blog site: it appears that chromosomes have an architecture that is organized into active and inactive domains. Whole domains of DNA can be inactivated by conversion into facultative heterochromatin, which tends to be sequestered toward the nuclear periphery. Inactive heterochromatin DNA is methylated and tightly wound around hypoacetylated histones.

I am a huge fan of Peter Fraser’s work. Perhaps chromosomes have their equivalent of tertiary and quaternary structure. Otherwise, how does one explain the constancy of karyotypes across primate lineages without invoking positive selection?

ITMT – Check out this link: http://phys.org/news/2013-09-x-shape-true-picture-chromosome-imaging.html

Back to fugu… maybe multicellular eukaryotes have already managed to overcome the “leaky gene regulation” problem through redundant regulatory mechanisms, rendering my proposal superfluous or exuberantly redundant.

On the other hand, facultative heterochromatin formation, while redundant, may be “less expensive” while also providing “backup – just in case”. That would address the fugu exception and the positive-selection rebuttals, n’est-ce pas?