The extent of variation present in human populations and the consequences of genetic load seem to be topics of perennial interest here (see, for example, recent comments in the Evolution Visualized thread). Recent issues of Nature have published a flurry of papers aimed at getting a better handle on just how much genetic diversity is likely to exist among humans. One notable paper from last August is the following:
In this study, Monkol Lek and many, many colleagues sequenced the exomes–i.e., the portion of DNA sequences that code for proteins along with some accompanying untranslated regions–of more than 60,000 people. The results were pretty spectacular. The paper is incredibly dense, but here are some highlights:
- The authors found more than 7 million reliably identified variants. Most were single base pair substitutions, but the variants also include more than 300,000 insertions/deletions.
- On average, 1 out of every 8 nucleotides is variable. However, the overwhelming majority of variants are rare; that is, they are found in only one or a few individuals.
- The frequency of different kinds of variants reflects both the rate at which they arise and how likely they are to be deleterious. This is not at all surprising, but it’s neatly demonstrated. For example, 63.1% of all possible CpG transitions (i.e., a cytosine adjacent to a guanine that mutates to a thymine) were observed, while only 3% of possible transversions were present. CpG transitions are among the most common types of substitution in mammals, while transversions are less frequent. Likewise, the proportion of possible synonymous variants that were actually observed was much higher than the proportion of possible nonsynonymous variants that were observed, which is consistent with the generally accepted notion that nonsynonymous mutations are usually subject to stronger purifying selection than synonymous mutations.
- They identified almost 180,000 different protein truncation variants (PTVs), which are variants predicted to shorten the product of a protein-coding gene by introducing a stop codon, causing a frameshift, or removing a critical splice site. Amazingly (to me at least), the average genome in their dataset includes 85 PTVs in the heterozygous state and almost 35 PTVs in the homozygous state.
- They identified more than 100 variants previously thought to contribute to disease phenotypes that are present at anomalously high frequencies in human populations (> 1%). Based on the fact that the evidence of pathogenicity for most of these variants is actually extremely weak, the authors suggest that these variants are most likely benign.
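To make the transition/transversion distinction in the list above concrete, here is a minimal Python sketch. It is only an illustration and not the method used in the paper; the function name `classify_substitution` and its `next_base`/`prev_base` parameters are hypothetical names chosen for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt, next_base=None, prev_base=None):
    """Classify a single-base substitution.

    Returns 'CpG transition', 'transition', or 'transversion'.
    next_base / prev_base are the flanking reference bases (optional);
    they are only needed to recognise the CpG context.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")

    # A transition stays within a chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); a transversion switches class.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    if not same_class:
        return "transversion"

    # CpG transition: C>T where the C is followed by G, or the same event
    # read from the opposite strand (G>A where the G is preceded by C).
    forward = ref == "C" and alt == "T" and next_base == "G"
    reverse = ref == "G" and alt == "A" and prev_base == "C"
    if forward or reverse:
        return "CpG transition"
    return "transition"
```

Tallying these classes over every possible substitution in a sequence, and again over the substitutions actually observed, gives the kind of "fraction of possible variants observed" comparison described above.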
There is a lot more in the paper that’s worth chewing over, so give it a read. This is easily the largest dataset of its type ever generated, but it has limitations. The sampling is heavily biased toward Europeans, and there is likely some variation missing, especially from Central Asia and the Middle East.
I imagine that within a few years, we’ll have datasets of similar size consisting of high-coverage, whole genome sequences, which will no doubt show even larger amounts of genomic variation. It’s an exciting time to be interested in biology!
The Lenski lab has just published a new paper in Nature that looks at the dynamics of genome evolution in E. coli populations over the course of the LTEE. Here is the abstract:
Adaptation by natural selection depends on the rates, effects and interactions of many mutations, making it difficult to determine what proportion of mutations in an evolving lineage are beneficial. Here we analysed 264 complete genomes from 12 Escherichia coli populations to characterize their dynamics over 50,000 generations. The populations that retained the ancestral mutation rate support a model in which most fixed mutations are beneficial, the fraction of beneficial mutations declines as fitness rises, and neutral mutations accumulate at a constant rate. We also compared these populations to mutation-accumulation lines evolved under a bottlenecking regime that minimizes selection. Nonsynonymous mutations, intergenic mutations, insertions and deletions are overrepresented in the long-term populations, further supporting the inference that most mutations that reached high frequency were favoured by selection. These results illuminate the shifting balance of forces that govern genome evolution in populations adapting to a new environment.
I’m assuming the whole thing is pay-walled, but a pre-print copy (which may or may not be identical to the final version) is freely available here.
I’ve only read the abstract thus far, but the paper seems likely to touch on a variety of topics that folks here like to discuss. Have at it!
My favorite subject-specific journal is Molecular Biology and Evolution (MBE). This journal publishes on topics primarily related to molecular evolution and evolutionary genomics, which are among my favorite subjects in biology. I’m happy to report that the latest issue of MBE is out today, and there are lots of great articles that I think will be of interest to folks here, many of which are open-access.
I sadly don’t have time to write up any of these articles, but I thought it might be useful to “sample” a few in case any of you would like to read and discuss them. Here are a handful that seem particularly interesting:
How about some cool science as we head toward the weekend?
Let’s talk about long noncoding RNAs (lncRNAs) – they are (somewhat arbitrarily) defined as transcripts at least 200 nucleotides in length that have little potential to code for proteins. Determining the function (if one exists) of a particular lncRNA can often be difficult. In part, this may be because lncRNAs evolve much more quickly than protein-coding genes do and therefore exhibit a much smaller degree of sequence conservation, which can make identifying orthologs in related organisms more difficult. Nevertheless, if a particular lncRNA is functionally important, we would usually expect to see copies of it in related organisms, so finding these homologs can be an important indicator of function.
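As a rough illustration of that working definition, here is a small Python sketch that flags a transcript as lncRNA-like if it is long enough and lacks a long open reading frame. This is a crude heuristic of my own for illustration, not how any particular study classifies lncRNAs: the function names are hypothetical, the 200-nucleotide cutoff comes from the definition above, and the 100-codon ORF cutoff is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Length, in codons, of the longest ATG-to-stop ORF on the given strand.

    The start codon is counted, the stop codon is not. ORFs that run off
    the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        run = None  # codons since the current ATG; None when outside an ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if run is None:
                if codon == "ATG":
                    run = 1
            elif codon in STOP_CODONS:
                best = max(best, run)
                run = None
            else:
                run += 1
    return best


def looks_like_lncrna(seq, min_length=200, max_orf_codons=100):
    """Crude filter: long transcript with no ORF of max_orf_codons or more.

    max_orf_codons is an assumed threshold for this sketch only.
    """
    return len(seq) >= min_length and longest_orf_codons(seq) < max_orf_codons
```

In practice, coding potential is assessed with more than ORF length, and as noted above, conservation across related species is an important additional line of evidence.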
A new paper in Genes and Development by Quinn et al. is a useful demonstration of this. The authors find evidence of 47 homologs of roX, an lncRNA involved in X chromosome dosage compensation, across 35 fruit fly species. The researchers identify roX homologs based on a combination of short regions of sequence conservation (“microhomology”), RNA secondary structure and synteny (i.e., similarity in location along a chromosome). Here is the abstract (I believe the paper itself is open access):
Earlier this week, Nick Matzke published an article in Science that uses modern phylogenetic techniques to analyze how creationist (or antievolution, if you prefer) legislation has evolved in the post-Dover era (note: I have not read the paper yet). This seems to have struck a pretty sensitive nerve over at the Discovery Institute. On the Evolution News and Views blog, the DI’s John West published an article that all but accuses Matzke of misappropriating taxpayer funding because he acknowledges support from an NSF grant that isn’t obviously connected to the content of this particular piece of research.
Needless to say, that accusation is as silly as it is groundless, which Matzke explains here. Regardless of the merits (or obvious lack thereof) of the allegation, I’m rather curious regarding West’s motivations for making such a claim. It seems ugly, intemperate and ill-advised even by the DI’s standards. Are they upset that creationists are once again getting a black eye in a prominent and respected scientific journal? Is the fact that the article was written by somebody who was instrumental in the ID movement’s failure during the Dover trial what’s driving this response? Something else entirely?
Whatever the reason, I view this accusation by West as a particularly egregious example of bad behavior on the part of the DI, and I wanted to bring it up for discussion here.