Human DNA contains roughly 8 percentage points more adenine and thymine than the 50% expected if all four bases were equally likely. This suggests mutational bias and/or non-random mutation. If 3 billion coins were found to be 58% heads and 42% tails, the chance hypothesis of a fair, unbiased coin flip would be easily rejected. Under the binomial distribution, the odds against such an outcome are astronomical.
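To put a rough number on "astronomical", here is a minimal sketch that treats each of the 3 billion flips as an independent fair coin (the analogy used above, not a model of real genomes). The function names are my own; the tail probability is given as a Chernoff upper bound because the exact value underflows ordinary floating point.

```python
import math

def z_score(n_flips, heads_frac, p=0.5):
    """Standard deviations between the observed heads fraction and p."""
    std_err = math.sqrt(p * (1 - p) / n_flips)
    return (heads_frac - p) / std_err

def log10_tail_bound(n_flips, heads_frac, p=0.5):
    """log10 of the Chernoff upper bound on P(heads fraction >= heads_frac)."""
    q = heads_frac
    # Kullback-Leibler divergence between Bernoulli(q) and Bernoulli(p), in nats
    kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
    return -n_flips * kl / math.log(10)

n = 3_000_000_000
print(f"z-score: {z_score(n, 0.58):.0f}")
print(f"log10 of tail probability bound: {log10_tail_bound(n, 0.58):.3g}")
```

The z-score comes out in the thousands, and the exponent of the bound is in the negative millions, so a fair coin is not a serious candidate.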
Yet just such an imbalance appears in the human genome, where:
(Guanine + Cytosine) / (Total Number of Bases) = CG ratio = 42%
(Adenine + Thymine) / (Total Number of Bases) = AT ratio = 58%
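For anyone who wants to compute these two ratios from a sequence, a small sketch follows. The helper name `base_composition` and the toy sequence are made up for illustration; characters other than A, C, G and T (such as N) are simply ignored, which is an assumption on my part.

```python
def base_composition(seq):
    """Return (CG ratio, AT ratio) for a DNA string, ignoring non-ACGT characters."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    cg = (counts["G"] + counts["C"]) / total
    at = (counts["A"] + counts["T"]) / total
    return cg, at

# Toy sequence: 4 A, 4 T, 1 G, 1 C
cg_ratio, at_ratio = base_composition("AATTAATTGC")
print(cg_ratio, at_ratio)
```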
Is this due to natural selection?
Well, let’s ask it a different way: can selection maintain this ratio if mutations were unbiased and random? Suppose the mutation rate is hypothetically about 116 point mutations per individual per generation (close to Moran’s and Graur’s figures), and that the new mutations are a priori evenly split: 29 Adenine, 29 Cytosine, 29 Guanine, 29 Thymine. Selection would then have to select against 8 Guanine and 8 Cytosine, leaving C+G = 42 and A+T = 58. One can of course scale the numbers, but I chose 116 to make the math easier to see.
That’s a whopping 16 mutations per individual that selection has to select against in each generation. Given that the Muller limit is about 1 mutation, this falsifies natural selection as the cause for maintaining the low CG ratio, unless selection can act in some synergistic way.
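The arithmetic can be checked, and rescaled to other mutation rates, with a short sketch. It assumes the scenario described above: new mutations land equally on each of the four bases, and selection removes only G and C mutations while leaving every A and T mutation in place. The function name is my own.

```python
def gc_mutations_to_remove(per_base, target_cg):
    """G+C mutations selection must remove so the survivors have the target CG ratio.

    per_base:  new mutations to each of A, C, G and T (unbiased input)
    target_cg: desired (G + C) / total among the surviving mutations
    """
    at_kept = 2 * per_base  # all A and T mutations survive
    # Solve cg_kept / (cg_kept + at_kept) = target_cg for cg_kept
    cg_kept = at_kept * target_cg / (1 - target_cg)
    return 2 * per_base - cg_kept

removed = gc_mutations_to_remove(per_base=29, target_cg=0.42)
print(round(removed, 6))
```

With 29 mutations per base and a 42% target this gives the 16 removed mutations (8 G and 8 C) used in the argument; other values of `per_base` scale the result proportionally.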
What is peculiar, however, is that the CG ratio differs from species to species. Woese noticed this as early as 1961:
THE ratio of guanine plus cytosine to adenine plus thymine (GC/AT) in the deoxyribonucleic acid (DNA) of a given species of a micro-organism is constant, but the GC/AT ratio of different kinds of microorganisms can vary by a factor of as much as five times
And this from Wikipedia:
GC content is found to be variable with different organisms, the process of which is envisaged to be contributed to by variation in selection, mutational bias, and biased recombination-associated DNA repair. The species problem in prokaryotic taxonomy has led to various suggestions in classifying bacteria, and the ad hoc committee on reconciliation of approaches to bacterial systematics has recommended use of GC ratios in higher level hierarchical classification. For example, the Actinobacteria are characterised as “high GC-content bacteria”. In Streptomyces coelicolor A3(2), GC content is 72%. The GC-content of Yeast (Saccharomyces cerevisiae) is 38%, and that of another common model organism, Thale Cress (Arabidopsis thaliana), is 36%. Because of the nature of the genetic code, it is virtually impossible for an organism to have a genome with a GC-content approaching either 0% or 100%. A species with an extremely low GC-content is Plasmodium falciparum (GC% = ~20%), and it is usually common to refer to such examples as being AT-rich instead of GC-poor.
And a curiosity, the Chargaff parity rules:
First parity rule
The first rule holds that a double-stranded DNA molecule globally has percentage base pair equality: %A = %T and %G = %C. The rigorous validation of the rule constitutes the basis of Watson-Crick pairs in the DNA double helix model.
Second parity rule
The second rule holds that both %A ~ %T and %G ~ %C are valid for each of the two DNA strands. This describes only a global feature of the base composition in a single DNA strand.
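The difference between the two rules can be seen in a toy sketch. The first rule holds automatically once a strand is paired with its reverse complement, while the second is an empirical claim about one strand on its own. The function names and the skew measure (A−T)/(A+T), (G−C)/(G+C) are my own choices for illustration, and the 8-base strand is deliberately lopsided.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the complementary strand, read 5' to 3'."""
    return strand.upper().translate(_COMPLEMENT)[::-1]

def skews(seq):
    """Return (AT skew, GC skew); both are 0 when %A = %T and %G = %C."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew

strand = "AAAAGGGT"                             # single strand, far from parity
duplex = strand + reverse_complement(strand)    # both strands counted together

print(skews(strand))   # second-rule check on one strand
print(skews(duplex))   # first-rule check on the whole duplex
```

The duplex skews are zero by construction no matter how lopsided the single strand is, which is why only the second rule says anything non-trivial about base composition.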
So we have one strand of DNA that is duplicated via Okazaki fragments in the antisense direction, and another that is read straight through. I suppose that if the second parity rule were violated, we would see an excess of G’s or C’s depending on the direction in which we were reading. Parity rule 2 is therefore somewhat unsurprising, except for the fact that it is occasionally violated:
In 2006, it was shown that this rule applies to four of the five types of double stranded genomes; specifically it applies to the eukaryotic chromosomes, the bacterial chromosomes, the double stranded DNA viral genomes, and the archaeal chromosomes. It does not apply to organellar genomes (mitochondria and plastids) smaller than ~20-30 kbp, nor does it apply to single stranded DNA (viral) genomes or any type of RNA genome. The basis for this rule is still under investigation, although genome size may play a role.
Is the violation due to mutation bias, natural selection, or both?
The Chargaff parity rules only emphasize the lack of parity between CG and AT, and then there is this other strange feature:
Wacław Szybalski, in the 1960s, showed that in bacteriophage coding sequences purines (A and G) exceed pyrimidines (C and T). This rule has since been confirmed in other organisms and should probably be now termed “Szybalski’s rule”.
A valid experimental test would be to see whether these ratios and parities change in real time, from generation to generation, in the present day. If, for example, the CG content of Plasmodium falciparum doesn’t change from generation to generation even as the genome mutates, then it would seem to me that mutations are biased or non-random in some way.
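The reasoning behind that test can be illustrated with a toy simulation. It is only a sketch under loud assumptions: a made-up 2,000-base sequence starting at 20% CG, no selection at all, and every point mutation replacing a base with one of the other three at equal probability. None of this is a real model of Plasmodium falciparum.

```python
import random

def cg_fraction(seq):
    """Fraction of G and C among the bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)

def mutate_unbiased(seq, n_mutations, rng):
    """Apply point mutations at random sites; each picks one of the 3 other bases."""
    bases = list(seq)  # work on a copy so the input is left untouched
    for _ in range(n_mutations):
        i = rng.randrange(len(bases))
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return bases

rng = random.Random(0)  # fixed seed so the run is repeatable
start = list("A" * 800 + "T" * 800 + "G" * 200 + "C" * 200)  # 20% CG
end = mutate_unbiased(start, 20_000, rng)

print(cg_fraction(start), cg_fraction(end))
```

Under these assumptions the CG fraction drifts from 20% toward roughly 50%, so a CG content that stays put while the genome keeps mutating would point to something other than unbiased mutation acting alone.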
1. The 42% figure is inexact, since the average wasn’t weighted by the size of the chromosomes, but I felt 42% was in the ballpark. The unweighted average is here: