New data on human genomic diversity

The extent of variation present in human populations and the consequences of genetic load seem to be topics of perennial interest here (see, for example, recent comments in the Evolution Visualized thread).  Recent issues of Nature have published a flurry of papers aimed at getting a better handle on just how much genetic diversity is likely to exist among humans.   One notable paper from last August is the following:

Analysis of protein-coding genetic variation in 60,706 humans

In this study, Monkol Lek and many, many colleagues sequenced the exomes–i.e., the portion of DNA sequences that code for proteins along with some accompanying untranslated regions–of more than 60,000 people.  The results were pretty spectacular.  The paper is incredibly dense, but here are some highlights:

  • The authors found more than 7 million reliably identified variants.  Most were single base pair substitutions, but the variants also include more than 300,000 insertions/deletions.
  • On average, 1 out of every 8 nucleotides is variable.  However, the overwhelming majority of variants are rare.  That is, they are found in only one or a few individuals.
  • The frequency of different kinds of variants reflects both the rate at which they occur and the extent to which they are likely to be deleterious.  This is not at all surprising, but it’s neatly demonstrated.  For example, 63.1% of all possible CpG transitions (i.e., a cytosine adjacent to a guanine that mutates to a thymine) were observed, while only 3% of possible transversions were present.  CpG transitions are among the most common type of substitution in mammals, while transversions are less frequent.  Likewise, the proportion of possible synonymous variants that were actually observed was much higher than the proportion of possible nonsynonymous variants that were observed, which is consistent with the generally accepted notion that nonsynonymous mutations are usually subject to stronger purifying selection than synonymous mutations.
  • They identified almost 180,000 different protein-truncating variants (PTVs), which are variants predicted to shorten the product of a protein-coding gene through an introduced stop codon, a frameshift, or the removal of a critical splice site.  Amazingly (to me at least), the average genome in their dataset includes 85 PTVs in the heterozygous state and almost 35 PTVs in the homozygous state.
  • They identified more than 100 variants previously thought to contribute to disease phenotypes that are present at anomalously high frequencies in human populations (> 1%).  Based on the fact that the evidence of pathogenicity for most of these variants is actually extremely weak, the authors suggest that these variants are most likely benign.
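
To make the "proportion of possible variants observed" comparison in the list above a bit more concrete, here is a minimal Python sketch. The counts are made-up placeholders chosen only to reproduce the 63.1% and 3% figures quoted above; they are not taken from the paper, and the function name is my own invention.

```python
def fraction_observed(observed: int, possible: int) -> float:
    """Share of all possible variants of a class that were actually seen."""
    if possible <= 0:
        raise ValueError("possible must be positive")
    return observed / possible


# Hypothetical counts, chosen only to match the percentages quoted in the post.
toy_counts = {
    "CpG transition": (631, 1000),
    "transversion": (30, 1000),
}

fractions = {
    name: fraction_observed(observed, possible)
    for name, (observed, possible) in toy_counts.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of possible variants observed")

# How many times closer to saturation CpG transitions are than transversions.
fold_difference = fractions["CpG transition"] / fractions["transversion"]
print(f"fold difference: {fold_difference:.1f}x")
```

With these placeholder numbers, CpG transitions come out roughly 21 times closer to saturation than transversions, which is just the ratio of the two percentages in the post.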

There is a lot more in the paper that’s worth chewing over, so give it a read.  This is easily the largest dataset of its type ever generated, but it has limitations.  The sampling is heavily biased toward Europeans, and there is likely some variation missing, especially from the Middle East and Central Asia.

I imagine that within a few years, we’ll have datasets of similar size consisting of high-coverage, whole genome sequences, which will no doubt show even larger amounts of genomic variation.  It’s an exciting time to be interested in biology!

36 thoughts on “New data on human genomic diversity”

  1. Based on the fact that the evidence of pathogenicity for most of these variants is actually extremely weak, the authors suggest that these variants are most likely benign.

    One might call these variants “near neutral, slightly deleterious” exactly the sort of thing geneticist Kondrashov was concerned about when he posed the rhetorical question: “why have we not died 100 times over?”

    http://www.ncbi.nlm.nih.gov/pubmed/7475094

  2. Rumraket: They could even be effectless because they sit in junk. Who’da thunkit?

    The default and most reasonable assumption — when dealing with unknowns — is goddidit.

  3. stcordova: One might call these variants “near neutral, slightly deleterious” exactly the sort of thing geneticist Kondrashov was concerned about when he posed the rhetorical question: “why have we not died 100 times over?”

    http://www.ncbi.nlm.nih.gov/pubmed/7475094

    Many of the variants discussed in the paper might fall under the category of nearly-neutral. However, if you read the relevant section of the paper, you’ll note that for most of the variants I was discussing in the passage you quoted, there is probably little reason to believe they’re deleterious at all. A few of them were just incorrectly annotated in pathogenicity databases; for others, the evidence of pathogenicity was simply weak. The authors of the paper re-classified over 100 of the variants as non-pathogenic based on their results.

  4. you’ll note that for most of the variants I was discussing in the passage you quoted, there is probably little reason to believe they’re deleterious at all.

    Genetic diversity is important to healthy populations, so there has to be variability, no question of that.

    Sometimes the implication in disease is not evident unless there is more than one change. People with heritable disease may have several nucleotides that need to be in place simultaneously for the disease to take hold, whereas other individuals only holding a single mutated nucleotide are able to live healthy lives, but strictly speaking, they are closer to the edge of experiencing damage.

    There are many reasons for this. An obvious one is that with codon degeneracy synonymous changes can be somewhat tolerated with one mutation, but with two simultaneous mutations there could be a significant change because the amino acid changes. An amino acid change may not immediately affect protein function as far as binding and catalytic ability, but especially in higher organisms there are a lot of post-translational chemical enhancements that are highly amino acid specific and we are only beginning to see the importance of this (good examples are the modifications on specific amino acids on histone tails and portions of transcription factors).

    Additionally, binding sites on DNA can only tolerate so many mutations before they become ineffective. There is some optimal configuration, and the more there is deviation from that configuration, the less the binding is effective. Examples off the top of my head are the vitamin D receptor binding sites where one gene is regulated by molecular machines parked on DNA binding sites tens of thousands of base pairs away.

    Add to this the micro RNA regulation which shows the importance of codon bias.

    There are probably more issues.

    Bottom line, although immediate pathogenicity may not be indicated for a single nucleotide variant, that is not to say it doesn’t compromise the integrity of the genome, as it is potentially closer to a situation where a few more mutations would be disease inducing. It’s a bit of a simplification to say a mutation is benign merely because it doesn’t have an immediate effect. The margin of safety could be compromised, and that is, in my book, not benign.

    What I’m saying isn’t novel, it was something discussed at ENCODE 2015 regarding GWAS studies.

  5. stcordova: People with heritable disease may have several nucleotides that need to be in place simultaneously for the disease to take hold

    If only their parents had not sinned they would have not been inflicted with such!

  6. stcordova,

    Sal, all of that is beside the point. There were a group of variants that had previously been thought to specifically be disease-causing. Turns out that there is simply no good evidence to support that claim in some cases, and the annotations of those variants were altered to reflect the best available evidence.

    You might choose to believe that the 100-200 variants that this section was talking about are actually deleterious for the reasons you mentioned above despite the lack of evidence, but it’s not at all clear to me why you would want to make that assumption for these particular variants as opposed to any of the others.

  7. Dave Carlson: You might choose to believe that the 100-200 variants that this section was talking about are actually deleterious for the reasons you mentioned above despite the lack of evidence, but it’s not at all clear to me why you would want to make that assumption for these particular variants as opposed to any of the others.

    Weeeell, of course there’s this:
    “By definition, no apparent, perceived or claimed evidence in any field, including history and chronology, can be valid if it contradicts the scriptural record.” – Answers In Genesis

  8. Rumraket: Weeeell, of course there’s this:
    “By definition, no apparent, perceived or claimed evidence in any field, including history and chronology, can be valid if it contradicts the scriptural record.” – Answers In Genesis

    Well, sure, but I don’t really see what the particular variants we’re talking about have to do with the scriptural record. Given Sal’s comments in this thread and elsewhere, I don’t believe he’s making the argument that all variation is deleterious. I’m just confused why he’s singling out these 100-200 relatively high frequency variants and suggesting that they may actually be deleterious.

    Certainly many of the variants identified in the study can be put on a continuum representing different degrees of harmful effect. Indeed, the paper goes into some detail trying to determine the extent to which different classes of variants are likely subject to purifying selection. The variants that I was talking about, however, were among those for which there is not good evidence of deleteriousness.

  9. Dave – interesting, thanks. Equally interesting: Sal’s determination to reject data inconvenient for theory, in the now-familiar manner.

  10. Allan Miller:
    Dave – interesting, thanks. Equally interesting: Sal’s determination to reject data inconvenient for theory, in the now-familiar manner.

    Thanks, Allan.
    I think part of the problem was likely my breezy summary of what are actually extremely detailed results. The paper is chock-full of information, and I just highlighted a small portion of what stood out most to me. I would highly recommend reading the whole thing.

  11. Dave Carlson,

    There is no basis in biology for calling a mutation deleterious, if the organism with it exists. That is the problem with the concept of fitness. It is, thus it is fit.

  12. phoodoo:
    Dave Carlson,

    There is no basis in biology for calling a mutation deleterious, if the organism with it exists. That is the problem with the concept of fitness. It is, thus it is fit.

    I addressed this over a week ago:

    phoodoo:
    Actually I think fitness in evolution is a concept with no meaning whatsoever, but according to your side, that is what it means.

    You have been provided with multiple references to consistent definitions of biological fitness.

    Continuing to claim otherwise is not honest behavior.

    And yet you continue to claim otherwise.

  13. You might choose to believe that the 100-200 variants that this section was talking about are actually deleterious for the reasons you mentioned above despite the lack of evidence, but it’s not at all clear to me why you would want to make that assumption for these particular variants as opposed to any of the others.

    I choose to say we don’t know for sure either way, but to declare a variant “benign” in the most rigorous fashion would entail studying the behavior and effect of the variation in every cell type in every phase of development in every tissue and in every possible environmental context and also determining if some margin of safety were compromised.

    Making such statements that the variation is benign is speculation, it isn’t experimental validation. The effect may not be detectable in the ways we study things, but that’s not the same as there being no effect. Saying “there is no evidence” is a bit of spin of the real situation, which is “we weren’t able to conclusively prove” or “some previous studies made illegitimate inferences”. The inference may be wrong, but there is a chance the conclusion is still correct.

    We don’t know. Declaring it benign reminds me of people saying junkDNA is junk just because they couldn’t think of a way it was functional, like SINEs involved in chromatin extrusion or Alus involved in memory and cognition.

  14. stcordova: I choose to say we don’t know for sure either way, but to declare a variant “benign” in the most rigorous fashion would entail studying the behavior and effect of the variation in every cell type in every phase of development in every tissue and in every possible environmental context and also determining if some margin of safety were compromised.

    Making such statements that the variation is benign is speculation, it isn’t experimental validation. The effect may not be detectable in the ways we study things, but that’s not the same as there being no effect. Saying “there is no evidence” is a bit of spin of the real situation, which is “we weren’t able to conclusively prove”.

    It sounds as if you have a problem with the guidelines that the American College of Medical Genetics have set up for interpreting genetic variation, because those are the rules, however imperfect, that the authors of the study in question used. I would recommend taking it up with the ACMG. Alternatively, if you’d like to discuss deleterious variation discovered in this paper, we could talk about the variants for which there is decent evidence of a harmful effect.

  15. phoodoo:
    Dave Carlson,

    How do you know when it’s harmful?

    The answer to that question is entirely dependent on the organism in question. For some (e.g., bacteria or yeast), we can determine that a variant is harmful through experimental manipulation. For other organisms (e.g., humans), we have to rely on somewhat more indirect methods such as pedigrees or association studies.

    But I think you knew all of that already.

  16. It sounds as if you have a problem with the guidelines that the American College of Medical Genetics have set up for interpreting genetic variation, because those are the rules, however imperfect, that the authors of the study in question used.

    Correct, the choice of words has unfortunate connotations, especially when those terms are used in questions of mutational load, which the OP was somewhat related to. Function-compromising mutations that are near neutral are a major problem for mutational load.

    It isn’t the big phenotypic changes that are the concern for long term health of the species, it is the small genetic changes that are not immediately perceptible but are like one straw at a time that will eventually break the camel’s back but which natural selection cannot individually remove from the population. They may be benign according to the ACMG guideline, but as far as population genetics is concerned, those are exactly the sorts of things (if they are indeed function compromising) that could lead to permanent irreversible damage to the collective genome of a species.

  17. stcordova,

    Sal, I don’t disagree with any of that per se, but we seem to be talking past each other for reasons that I don’t entirely understand. Perhaps my overly terse and un-detailed presentation of the paper in the OP is to blame. Have you read the actual paper? If you do (or already have) and wanted to discuss specific aspects of it, I would be happy to do so.

  18. Mung: I reject it because of who it came from.

    Mung, you wound me! What have I ever done to you? Or is it Monkol Lek and/or one of his dozens of colleagues with whom you have a disagreement?

  19. Perhaps my overly terse and un-detailed presentation of the paper in the OP is to blame. Have you read the actual paper? If you do (or already have) and wanted to discuss specific aspects of it, I would be happy to do so.

    I didn’t mean to be overly negative on your OP. I appreciate you highlighting it, and it seems like an excellent paper overall.

    I have not read the paper yet, I just got hung up on the choice of words. Apologies.

    I agree these are exciting times to study biology.

    FWIW, at ENCODE 2015, one researcher said she is expecting that in the not too distant future there will be 2 to 20 million medical records with DNA samples associated with them, since heritable diseases are of intense interest and they want diagnostic methods to identify individuals who are at risk. We could have some amazing data sets in the not too distant future. The paper you highlight probably anticipates things to come. Thanks for sharing it, and I’ll try to have a look at it.

  20. Dave Carlson: Mung, you wound me! What have I ever done to you?

    Sorry, just having a bit of fun at your expense. 🙂

    Over in another thread Patrick is arguing that we ought not trust anything that comes from a creationist. Yes, his view runs precisely counter to Elizabeth’s stated goals for the site, but that seems to go without notice.

    I didn’t mean for my comment to be aimed at you personally.

  21. Mung: Sorry, just having a bit of fun at your expense.

    Over in another thread Patrick is arguing that we ought not trust anything that comes from a creationist. Yes, his view runs precisely counter to Elizabeth’s stated goals for the site, but that seems to go without notice.

    I didn’t mean for my comment to be aimed at you personally.

    Heh, it’s fine. I was joking also, but I was too lazy to figure out how emojis work on WordPress.

  22. Mung:
    . . .
    Over in another thread Patrick is arguing that we ought not trust anything that comes from a creationist.
    . . . .

    Not exactly. I simply pointed out that creationists are motivated by goals other than scientific accuracy and, historically, this has led to them making claims unsupported by, and sometimes in direct contradiction to, the available evidence. I explicitly noted that this does not mean that any particular statement by a creationist is necessarily inaccurate, but that, again based on historical observations, it would be foolish to accept such without confirming it for oneself.

    There’s my Friday gift to you: Word lawyer to your heart’s content.

  23. Yes, Patrick just admitted the truth of my claim. No need to word lawyer anything. Don’t trust anything from a creationist. They could be lying. Probably are. Probably best to assume they are. Lizzie’s RuleZ!

  24. So Dave, is this human genomic diversity a good thing?

    Does it call into question any creationist claims?

    Isn’t it evidence for “young life”?

    😉

  25. Mung:
    Yes, Patrick just admitted the truth of my claim. No need to word lawyer anything. Don’t trust anything from a creationist. They could be lying. Probably are. Probably best to assume they are. Lizzie’s RuleZ!

    Do read more carefully (yes, I know it slows down your seagull commenting — that’s a feature, not a bug). Creationists have a history of getting the science wrong. (If they didn’t, they probably wouldn’t be creationists.) It is foolish not to consider that history when evaluating a creationist claim about science.

  26. What’s your point Patrick, that Salvador, when he’s talking about “the science” is lying, otherwise what he says can be trusted?

  27. Patrick: Based on historical observations, trusting creationists is foolish.

    Must suck, trying to figure out who is or who is not a creationist so that you can decide whether or not they can be trusted. Make Lizzie proud, Patrick.

  28. Mung:
    What’s your point Patrick, that Salvador, when he’s talking about “the science” is lying, otherwise what he says can be trusted?

    My point is that whenever any creationist makes scientific claims it would be foolish, based on historical observations, to accept them without verification. On other topics it depends on the creationist.
