New data on human genomic diversity

The extent of variation present in human populations and the consequences of genetic load seem to be topics of perennial interest here (see, for example, recent comments in the Evolution Visualized thread).  Recent issues of Nature have published a flurry of papers aimed at getting a better handle on just how much genetic diversity is likely to exist among humans.   One notable paper from last August is the following:

Analysis of protein-coding genetic variation in 60,706 humans

In this study, Monkol Lek and many, many colleagues sequenced the exomes–i.e., the portion of DNA sequences that code for proteins along with some accompanying untranslated regions–of more than 60,000 people.  The results were pretty spectacular.  The paper is incredibly dense, but here are some highlights:

  • The authors found more than 7 million reliably identified variants.  Most were single base pair substitutions, but the variants also include more than 300,000 insertions/deletions.
  • On average, 1 out of every 8 nucleotides is variable.  However, the overwhelming majority of variants are rare; that is, they are found in only one or a few individuals.
  • The frequency of different kinds of variants reflects both the rate at which they arise and the extent to which they are likely to be deleterious.  This is not at all surprising, but it’s neatly demonstrated.  For example, 63.1% of all possible CpG transitions (i.e., a cytosine adjacent to a guanine that mutates to a thymine) were observed, while only 3% of possible transversions were present.  CpG transitions are among the most common type of substitution in mammals, while transversions are less frequent.  Likewise, the proportion of possible synonymous variants that were actually observed was much higher than the proportion of possible nonsynonymous variants that were observed, which is consistent with the generally accepted notion that nonsynonymous mutations are usually subject to stronger purifying selection than synonymous mutations.
  • They identified almost 180,000 different protein-truncating variants (PTVs), which are variants predicted to shorten the product of a protein-coding gene by introducing a stop codon, causing a frameshift, or removing a critical splice site.  Amazingly (to me at least), the average genome in their dataset includes 85 PTVs in the heterozygous state and almost 35 PTVs in the homozygous state.
  • They identified more than 100 variants previously thought to contribute to disease phenotypes that are present at anomalously high frequencies in human populations (> 1%).  Based on the fact that the evidence of pathogenicity for most of these variants is actually extremely weak, the authors suggest that these variants are most likely benign.

There is a lot more in the paper that’s worth chewing over, so give it a read.  This is easily the largest dataset of its type ever generated, but it has limitations.  The sampling is heavily biased toward Europeans, and some variation is likely missing, especially from Central Asian and Middle Eastern populations.

I imagine that within a few years, we’ll have datasets of similar size consisting of high-coverage, whole genome sequences, which will no doubt show even larger amounts of genomic variation.  It’s an exciting time to be interested in biology!

Some evidence ALUs and SINES aren’t junk and garbologists are wrong

Larry Moran, Dan Graur and other garbologists (promoters of the junk DNA perspective) have argued that SINES and ALU elements are non-functional junk. That claim may have been a quasi-defensible position a decade ago, but real science marches forward. Dan Graur can only whine and complain about the hundreds of millions of dollars spent at the NIH and elsewhere that now strengthen his unwitting claim in 2013, “If ENCODE is right, Evolution is wrong.”

Larry said in Junk in Your Genome: SINES

Dynamics of genome evolution in E. coli

Hi All,

The Lenski lab has just published a new paper in Nature that looks at the dynamics of genome evolution in E. coli populations over the course of the LTEE.  Here is the abstract:

Tempo and mode of genome evolution in a 50,000-generation experiment

Adaptation by natural selection depends on the rates, effects and interactions of many mutations, making it difficult to determine what proportion of mutations in an evolving lineage are beneficial. Here we analysed 264 complete genomes from 12 Escherichia coli populations to characterize their dynamics over 50,000 generations. The populations that retained the ancestral mutation rate support a model in which most fixed mutations are beneficial, the fraction of beneficial mutations declines as fitness rises, and neutral mutations accumulate at a constant rate. We also compared these populations to mutation-accumulation lines evolved under a bottlenecking regime that minimizes selection. Nonsynonymous mutations, intergenic mutations, insertions and deletions are overrepresented in the long-term populations, further supporting the inference that most mutations that reached high frequency were favoured by selection. These results illuminate the shifting balance of forces that govern genome evolution in populations adapting to a new environment.

I’m assuming the whole thing is pay-walled, but a pre-print copy (which may or may not be identical to the final version) is freely available here.

I’ve only read the abstract thus far, but the paper seems likely to touch on a variety of topics that folks here like to discuss. Have at it!

What is the Plan?

A prominent ID supporter at UD, gpuccio, has this to say:

My simple point is: reasoning in terms of design, intention and plans is a true science promoter which can help give new perspective to our approach to biology. Questions simply change. The question is no more:

how did this sequence evolve by some non existent neo darwinian mechanism giving reproductive advantage?

but rather:

why was this functional information introduced at this stage? what is the plan? what functions (even completely unrelated to sheer survival and reproduction) are being engineered here?


On the thread entitled “Species Kinds”, commenter phoodoo asks:

What’s the definition of a species?

A simple question but hard to answer. Talking of populations of interbreeding individuals immediately creates problems when looking at asexual organisms, especially the prokaryotes: bacteria and archaea. How to delineate a species temporally is also problematic. Allan Miller links to an excellent basic resource on defining a species and the Wikipedia entry does not shy away from the difficulties.

In case phoodoo thought his question was being ignored, I thought I’d open this thread to allow discussion without derailing the thread on “kinds”.

Thorp, Shannon: Inspiration for Alternative Perspectives on the ID vs. Naturalism Debate

The writings and life work of Ed Thorp, professor at MIT, influenced many of my notions of ID (though Thorp and Shannon are not ID proponents). I happened upon a forgotten mathematical paper by Ed Thorp in 1961 in the Proceedings of the National Academy of Sciences that launched his stellar career into Wall Street. If the TSZ regulars are tired of talking and arguing ID, then I offer a link to Thorp’s landmark paper. That 1961 PNAS article consists of a mere three pages. It is terse, and almost shocking in its economy of words and straightforward English. The paper can be downloaded from:

A Favorable Strategy for Twenty One, Proceedings of the National Academy of Sciences.

Thorp was a colleague of Claude Shannon (founder of information theory, and inventor of the notion of “bit”) at MIT. Thorp managed to publish his theory about blackjack through the sponsorship of Shannon. He was able to scientifically prove his theories in the casinos and on Wall Street, and went on to make hundreds of millions of dollars through his scientific approach to estimating and profiting from expected value. Thorp was the central figure in the real-life stories featured in the book Fortune’s Formula: The Untold Story of the Scientific Betting System that Beat the Casinos and Wall Street by William Poundstone.

Impractical Naturalism of Dan Graur vs. the NIH

I’ll be making a presentation at AM-NAT 2016, and Dan Graur will be the poster boy of impractical naturalism. Below are some things I collected from his websites, some of which I view as highly anti-science. The aim of my presentation isn’t to settle the question of God or no God or ultimate questions of whether godless naturalism is the best description of reality. The goal is to suggest there are some unspoken naturalistic creeds that often take priority over experiments and observations. In a manner of speaking, there are some interpretations of naturalism that actually go against dispassionate examination of how the natural world actually operates.

Epigenetic Memory Changes during Embryogenesis

DNA is not just a static read-only memory (ROM) for coding proteins, but hosts dynamic random access memory (RAM) in the form of methylations and histone modifications for regulation of gene expression, cellular differentiation, learning and cognition, and who knows what else. The picture below depicts how rapidly the RAM aspect of DNA is changed during embryogenesis.

CRISPR goes retro: Bacteria can also take ‘RNA mug shots’ of threatening RNA-viruses.

The emergence of life for the first time on this planet poses the classic question of what came first: the chicken or the egg?  Did a self-replicating DNA system arise before transcription or translation evolved (the DNA World), or did a self-replicating RNA system emerge first (the RNA World), or did a self-replicating protein system emerge first (the Protein World)… or … let's just leave it there for now.

Wright, Fisher, and the Weasel

Richard Dawkins’s computer simulation algorithm explores how long it takes a 28-letter-long phrase to evolve to become the phrase “Methinks it is like a weasel”. The Weasel program has a single example of the phrase which produces a number of offspring, with each letter subject to mutation, where there are 27 possible letters, the 26 letters A-Z and a space. The offspring that is closest to that target replaces the single parent. The purpose of the program is to show that creationist orators who argue that evolutionary biology explains adaptations by “chance” are misleading their audiences. Pure random mutation without any selection would lead to a random sequence of 28-letter phrases. There are 27^{28} possible 28-letter phrases, so it should take about 10^{40} different phrases before we found the target. That is without arranging that the phrase that replaces the parent is the one closest to the target. Once that highly nonrandom condition is imposed, the number of generations to success drops dramatically, from 10^{40} to mere thousands.

Although Dawkins’s Weasel algorithm is a dramatic success at making clear the difference between pure “chance” and selection, it differs from standard evolutionary models. It has only one haploid adult in each generation, and since the offspring that is most fit is always chosen, the strength of selection is in effect infinite. How does this compare to the standard Wright-Fisher model of theoretical population genetics?
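The Weasel procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dawkins's original program: the number of offspring per generation (100), the per-letter mutation rate (0.05), the random seed and the generation cap are all hypothetical choices, since the text does not specify them.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"       # 28 characters
ALPHABET = string.ascii_uppercase + " "       # 26 letters plus a space = 27 symbols


def score(phrase, target=TARGET):
    """Number of positions at which the phrase matches the target."""
    return sum(a == b for a, b in zip(phrase, target))


def mutate(phrase, rate, rng):
    """Copy the phrase, redrawing each letter from ALPHABET with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in phrase)


def weasel(n_offspring=100, rate=0.05, seed=0, max_generations=100_000):
    """Single-parent Weasel: the best offspring always replaces the parent.

    n_offspring, rate, seed and max_generations are assumed values for illustration.
    """
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generations = 0
    while parent != TARGET and generations < max_generations:
        offspring = [mutate(parent, rate, rng) for _ in range(n_offspring)]
        parent = max(offspring, key=score)    # always keep the fittest: "infinite" selection
        generations += 1
    return parent, generations


if __name__ == "__main__":
    phrase, generations = weasel()
    print(phrase, "reached after", generations, "generations")
    print("size of the search space: 27**28 =", 27 ** 28)
```

Because the single best offspring always replaces the parent, there is no sampling of parents and no chance for a less-fit phrase to persist, which is the sense in which selection here is effectively infinite.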

Chargaff Parity Rule 2, Biased/Non-Random Mutations

There is an approximately 8% excess of adenine and thymine above random in the DNA of humans. This suggests mutational bias and/or non-random mutation. If 3 billion coins were found to be 58% heads vs. 42% tails, then the chance hypothesis of a random unbiased coin flip would be easily rejected. The odds of such an event happening are astronomical according to the binomial distribution.
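To put a number on the coin-flip arithmetic, here is a rough Python sketch. It takes the analogy at face value and treats each of the 3 billion outcomes as an independent flip of a fair coin; that independence is an assumption of the analogy, not a property of real genomes. The function names are made up for this illustration. The exact binomial tail is too small to represent as a floating-point number, so the sketch reports a z-score and a Hoeffding upper bound on a log scale instead.

```python
import math


def fair_coin_z_score(n_flips, observed_fraction):
    """Standard deviations between the observed heads fraction and 0.5 for a fair coin."""
    standard_error = math.sqrt(0.25 / n_flips)   # sqrt(p * (1 - p) / n) with p = 0.5
    return (observed_fraction - 0.5) / standard_error


def hoeffding_log10_bound(n_flips, observed_fraction):
    """log10 of the Hoeffding bound exp(-2 * n * t**2) on seeing an excess of at least t."""
    deviation = observed_fraction - 0.5
    return -2.0 * n_flips * deviation ** 2 / math.log(10)


if __name__ == "__main__":
    n = 3_000_000_000
    z = fair_coin_z_score(n, 0.58)
    log10_p = hoeffding_log10_bound(n, 0.58)
    print(f"z-score: {z:.1f} standard deviations")
    print(f"upper bound on the tail probability: 10^({log10_p:.3e})")
```

Under these assumptions the observed fraction sits several thousand standard deviations from 0.5, which is why the fair-coin hypothesis is rejected so easily.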

But such an imbalance is reflected in the human genome where:

Conservation and function of long noncoding RNAs

How about some cool science as we head toward the weekend?

Let’s talk about long noncoding RNAs (lncRNAs). They are (somewhat arbitrarily) defined as stretches of DNA at least 200 base pairs in length that are transcribed into RNA but have little potential to code for proteins. Determining the function (if one exists) of a particular lncRNA can often be difficult.  In part, this may be because lncRNAs evolve much more quickly than protein-coding genes do and therefore exhibit a much smaller degree of sequence conservation, which can make identifying orthologs in related organisms more difficult.  Nevertheless, if a particular lncRNA is functionally important, we would usually expect to see copies of it in related organisms, so finding these homologs can be an important indicator of function.

A new paper in Genes and Development by Quinn et al. is a useful demonstration of this.  The authors find evidence of 47 homologs of roX, an lncRNA involved in X chromosome dosage compensation, across 35 fruit fly species.  The researchers identify roX homologs based on a combination of short regions of sequence conservation (“microhomology”), RNA secondary structure and synteny (i.e., similarity in location along a chromosome).  Here is the abstract (I believe the paper itself is open access):

The Reasonableness of Atheism and Black Swans

As an ID proponent and creationist, the irony is that the time in my life when I have the greatest level of faith in ID and creation is also the time when, at some level, I wish it were not true. I have concluded that if the Christian God is the Intelligent Designer, then He also makes the world a miserable place by design, and that He has cursed this world because of Adam’s sin. See Malicious Intelligent Design.

Tiny sponge fossil predates the Cambrian explosion

New tools could allow scientists to discover other fossils that significantly predate the start of the Cambrian explosion, according to David Bottjer, a professor at the USC Dornsife College of Letters, Arts and Sciences and co-author of a study announcing the finding of the sponge in the Proceedings of the National Academy of Sciences.

“Fundamental traits in sponges were not suddenly appearing in the Cambrian Period, which is when many think these traits were evolving, but many million years earlier,” Bottjer said. “To reveal these types of findings, you have to use pretty high-tech approaches and work with the best people around the world.”

Absolute Fitness in Theoretical Evolutionary Genetics

Joe Felsenstein, like other population geneticists, holds a special place in the Creation/Evolution controversy because his works are regarded highly by many creationists who are familiar with genetics. This is a thread for all of us (myself included) to try to learn and understand one of the key concepts in his book Theoretical Evolutionary Genetics, namely absolute fitness. He has generously made his book available on his website (a book of this calibre could sell for hundreds of dollars).

Lawyers and Scientists

There’s been a skirmish between Larry Moran and Barry Arrington about whether Barry understands the Theory of Evolution, and the latest salvo is a piece at UD, entitled, Can a Lowly Lawyer Make a Useful Contribution? Maybe.

Well, in a sense, Barry makes a useful contribution in that post, as he gives a very nice illustration of a common misunderstanding about the process of hypothesis testing, in this case, basic model-fitting and null hypothesis testing, the workhorse (with all its faults) of scientific research.  Barry writes:

[Philip]Johnson is saying that attorneys are trained to detect baloney.  And that training is very helpful in the evolution debate, because that debate is chock-full of faulty logic (especially circular reasoning), abuse of language (especially equivocations), assumptions masquerading as facts, unexamined premises, etc. etc.

Consider, to take one example of many, cladistics.  It does not take a genius to know that cladistic techniques do not establish common descent; rather they assume it.  But I bet if one asked, 9 out of 10 materialist evolutionists, even the trained scientists among them, would tell you that cladistics is powerful evidence for common descent.  As Johnson argues, a lawyer’s training may help him understand when faulty arguments are being made, sometimes even better than those with a far superior grasp of the technical aspects of the field.  This is not to say that common descent is necessarily false; only cladistics does not establish the matter one way or the other.

In summary, I am trained to evaluate arguments by stripping them down to examine the meaning of the terms used, exposing the underlying assumptions, and following the logic (or, as is often the case, exposing the lack of logic).  And I think I do a pretty fair job of that, both in my legal practice and here at UD.

Barry has made two common errors here.  First, he has confused the assumption of common descent with the conclusion of common descent, and thus detected circular reasoning where there is none.  Second, he has confused the process of fitting a model with the broader concept of a hypothesised model.

Alzheimer’s and Evolution

I have to say, while the UD “newsdesk” is a terrible source for comment on scientific news, the links themselves are often interesting.  Today, the UD “newsdesk” reports on a pretty interesting study, reported in Nature, here, and a preprint of which seems to be open access here.

It’s been apparent for a while from Genome-Wide Association Studies (GWAS) that “risk” alleles for various mental disorders, despite being statistically significant, have extremely small effect sizes.  In other words, while the studies show that many mental disorders are indeed associated with specific alleles (and we already know that many are highly heritable, including schizophrenia, ADHD and Alzheimer’s), there aren’t just a few rogue alleles of large effect (well, there are, but they are far rarer than these disorders), but instead, a whole cocktail of alleles with very slightly raised Odds Ratios for certain disorders (and some are shared between multiple disorders).  This means that the vast majority of people carrying these “risk alleles” are perfectly fine. That would help explain why they have not been weeded out by selection.
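A small Python sketch shows why a slightly raised odds ratio leaves most carriers unaffected. The numbers are made up for illustration and do not come from the study: a 1% risk among non-carriers and an odds ratio of 1.1 for carriers. The function name is likewise hypothetical.

```python
def risk_given_odds_ratio(baseline_risk, odds_ratio):
    """Convert a non-carrier risk and an odds ratio into the carrier's absolute risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    carrier_odds = odds_ratio * baseline_odds
    return carrier_odds / (1.0 + carrier_odds)


if __name__ == "__main__":
    baseline = 0.01        # assumed risk in non-carriers (illustrative only)
    odds_ratio = 1.1       # assumed odds ratio for the "risk" allele (illustrative only)
    carrier_risk = risk_given_odds_ratio(baseline, odds_ratio)
    print(f"non-carrier risk: {baseline:.4f}")
    print(f"carrier risk:     {carrier_risk:.4f}")
    print(f"carriers who remain unaffected: {1.0 - carrier_risk:.4f}")
```

With inputs like these, the carrier's absolute risk rises only marginally above the baseline, so nearly all carriers remain unaffected, and selection has little to act on for any single allele.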

