Adaptive evolution in the human genome
Adaptive evolution results from the propagation of advantageous mutations through positive selection. This is the modern synthesis of the process which Darwin and Wallace originally identified as the mechanism of evolution. However, in the last half century there has been considerable debate as to whether evolutionary changes at the molecular level are largely driven by natural selection or by random genetic drift. Unsurprisingly, the forces which drive evolutionary changes in our own species’ lineage have been of particular interest. Quantifying adaptive evolution in the human genome gives insights into our own evolutionary history and helps to resolve this neutralist-selectionist debate. Identifying specific regions of the human genome that show evidence of adaptive evolution helps us find functionally significant genes, including genes important for human health, such as those associated with diseases.
Methods
The methods used to identify adaptive evolution are generally devised to test the null hypothesis of neutral evolution, which, if rejected, provides evidence of adaptive evolution. These tests can be broadly divided into two categories.
Firstly, there are methods that use a comparative approach to search for evidence of function-altering mutations. The dN/dS rates-ratio test estimates ω, the ratio of the rates at which nonsynonymous (dN) and synonymous (dS) nucleotide substitutions occur. In this model, neutral evolution is considered the null hypothesis, in which dN and dS approximately balance so that ω ≈ 1. The two alternative hypotheses are a relative absence of nonsynonymous substitutions (ω < 1), suggesting that the effect of such mutations on fitness is negative, i.e. purifying selection, or a relative excess of nonsynonymous substitutions (ω > 1), indicating a positive effect on fitness, i.e. diversifying selection.
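The basic form of the dN/dS test can be sketched in a few lines. This is a minimal illustration only: the function names, the tolerance band around ω = 1, and the counts are hypothetical, and no correction for multiple substitutions at the same site is applied, which real analyses would include.

```python
def omega(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return dN/dS from raw counts (illustrative; no multiple-hit correction)."""
    d_n = nonsyn_subs / nonsyn_sites  # nonsynonymous substitutions per nonsynonymous site
    d_s = syn_subs / syn_sites        # synonymous substitutions per synonymous site
    if d_s == 0:
        raise ValueError("dS is zero, so omega is undefined")
    return d_n / d_s


def interpret(w, tolerance=0.1):
    """Map omega onto the three hypotheses; the tolerance is an arbitrary assumption."""
    if w < 1 - tolerance:
        return "purifying selection"
    if w > 1 + tolerance:
        return "positive (diversifying) selection"
    return "consistent with neutrality"


# Hypothetical counts for a single gene alignment.
w = omega(nonsyn_subs=12, nonsyn_sites=600, syn_subs=20, syn_sites=200)
print(round(w, 2), interpret(w))
```

In practice a statistical test (for example a likelihood-ratio test) replaces the fixed tolerance used here.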
The McDonald-Kreitman (MK) test quantifies the amount of adaptive evolution occurring by estimating the proportion of nonsynonymous substitutions which are adaptive, referred to as α. α is calculated as: α = 1 − (ds·pn)/(dn·ps), where dn and ds are as above, and pn and ps are the numbers of nonsynonymous and synonymous polymorphisms respectively.
Note that both tests are presented here in their basic forms; in practice they are normally modified considerably to account for other factors, such as the effect of slightly deleterious mutations.
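As a worked illustration of the basic MK estimator, α = 1 − (ds·pn)/(dn·ps), the following sketch computes α from four counts. The function name and the counts are hypothetical, and none of the corrections for slightly deleterious mutations mentioned above are applied.

```python
def mk_alpha(dn, ds, pn, ps):
    """Basic McDonald-Kreitman estimate of alpha.

    dn, ds: nonsynonymous and synonymous substitutions (divergence)
    pn, ps: nonsynonymous and synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    return 1 - (ds * pn) / (dn * ps)


# Hypothetical counts: relatively more nonsynonymous divergence than polymorphism.
alpha = mk_alpha(dn=40, ds=80, pn=30, ps=100)
print(f"alpha = {alpha:.2f}")  # 1 - (80*30)/(40*100) = 0.40
```

A negative α from this formula indicates an excess of nonsynonymous polymorphism, which is one reason the uncorrected estimator can understate adaptive evolution.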
The other methods for detecting adaptive evolution use genome-wide approaches, often to look for evidence of selective sweeps. Evidence of complete selective sweeps is shown by a decrease in genetic diversity, and can be inferred by comparing the observed Site Frequency Spectrum (SFS) with the SFS expected under a neutral model. Partial selective sweeps provide evidence of the most recent adaptive evolution, and the methods identify adaptive evolution by searching for regions with a high proportion of derived alleles.
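The SFS comparison can be illustrated with a short sketch. It assumes the standard neutral model with constant population size, under which the expected number of sites carrying i derived copies is proportional to 1/i; the function names and the derived-allele counts are hypothetical.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Entry i-1 is the number of segregating sites with i derived copies among n chromosomes."""
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]


def neutral_expected_sfs(num_sites, n):
    """Expected SFS under the standard neutral model, scaled to num_sites segregating sites."""
    harmonic = sum(1 / i for i in range(1, n))
    return [num_sites / (i * harmonic) for i in range(1, n)]


# Hypothetical derived-allele counts at 10 segregating sites in a sample of 6 chromosomes.
n = 6
derived_counts = [1, 1, 1, 2, 1, 5, 5, 4, 1, 2]
observed = unfolded_sfs(derived_counts, n)
expected = neutral_expected_sfs(sum(observed), n)

for i, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"{i} derived copies: observed {obs}, neutral expectation {exp:.2f}")
```

In this toy sample the highest-frequency class holds more sites than the neutral expectation, the kind of excess of high-frequency derived alleles that sweep scans look for; real methods test such departures formally and must account for demography.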
Examining patterns of Linkage Disequilibrium (LD) can also locate signatures of adaptive evolution. LD tests work on the basic principle that, assuming equal recombination rates, LD will rise with increasing natural selection. These genomic methods can also be applied to search for adaptive evolution in non-coding DNA, where putatively neutral sites are hard to identify.
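LD between two biallelic loci is commonly summarised by the coefficient D and the squared correlation r². The sketch below computes both from phased haplotypes; the function name and the haplotype data are hypothetical, and real LD-based scans use more elaborate haplotype statistics than this pairwise measure.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (a, b) tuples, each allele coded 0 (ancestral) or 1 (derived).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either locus is monomorphic")
    return d, d * d / denom


# Hypothetical samples: complete association versus independent assortment.
linked = [(1, 1)] * 4 + [(0, 0)] * 4
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2

print(ld_stats(linked))    # D = 0.25, r2 = 1.0
print(ld_stats(unlinked))  # D = 0.0, r2 = 0.0
```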
Another recent method used to detect selection in non-coding sequences examines insertions and deletions, rather than point mutations, although the method has only been applied to examine patterns of negative selection.
Amount of adaptive evolution
Coding DNA
Many different studies have attempted to quantify the amount of adaptive evolution in the human genome, the vast majority using the comparative approaches outlined above. Although there are discrepancies between studies, generally there is relatively little evidence of adaptive evolution in protein coding DNA, with estimates of adaptive evolution often near 0%. The most obvious exception to this is the 35% estimate of α (Fay et al. 2001). This comparatively early study used relatively few loci for its estimate, and the polymorphism and divergence data used were obtained from different genes, both of which may have led to an overestimate of α. The next highest estimate is the 20% value of α (Zhang and Li 2005). However, the MK test used in this study was sufficiently weak that the authors state that this value of α is not statistically significantly different from 0%. Nielsen et al.’s estimate that 9.8% of genes have undergone adaptive evolution also has a large margin of error associated with it, and their estimate shrinks dramatically to 0.4% when they stipulate that the degree of certainty that there has been adaptive evolution must be 95% or more.
This raises an important issue, which is that many of these tests for adaptive evolution are very weak. Therefore, the fact that many estimates are at 0% does not rule out the occurrence of any adaptive evolution in the human genome, but simply shows that positive selection is not frequent enough to be detected by the tests. In fact, the most recent study mentioned states that confounding variables, such as demographic changes, mean that the true value of α may be as high as 40%. Another recent study, which uses a relatively robust methodology, estimates α at 10-20% (Boyko et al. 2008). Clearly, the debate over the amount of adaptive evolution occurring in human coding DNA is not yet resolved.
Even if low estimates of α are accurate, a small proportion of substitutions evolving adaptively can still equate to a considerable amount of coding DNA. Many authors whose studies yield small estimates of the amount of adaptive evolution in coding DNA nevertheless accept that there has been some adaptive evolution in this DNA, because these studies identify specific regions within the human genome which have been evolving adaptively (e.g. Bakewell et al. 2007).
The generally low estimates of adaptive evolution in human coding DNA can be contrasted with other species. Bakewell et al. found more evidence of adaptive evolution in chimpanzees than humans, with 1.7% of chimpanzee genes showing evidence of adaptive evolution. Comparing humans with more distantly related animals, an early estimate for α in Drosophila species was 45%, and later estimates largely agree with this. Bacteria and viruses generally show even more evidence of adaptive evolution; research shows values of α in a range of 50-85%, depending on the species examined. Generally, there does appear to be a positive correlation between the population size of a species and the amount of adaptive evolution occurring in its coding DNA. This may be because random genetic drift becomes less powerful at altering allele frequencies, compared to natural selection, as population size increases.
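The interplay between drift and selection described above can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch assumes one common parameterisation (a diploid population of size N with effective size equal to N, and a selective advantage s for the new allele); the function name and the parameter values are illustrative and are not taken from the studies cited here.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for a new mutation at initial frequency 1/(2N).

    s: selective advantage (s = 0 gives the neutral value 1/(2N))
    n: diploid population size, assumed equal to the effective size
    """
    if s == 0:
        return 1 / (2 * n)
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n * s))


s = 0.001  # hypothetical weak advantage
for n in (100, 10_000, 1_000_000):
    advantage = fixation_probability(s, n) / fixation_probability(0, n)
    print(f"N = {n:>9,}: fixation is {advantage:,.1f}x more likely than for a neutral mutation")
```

For the same weak advantage, the mutation is barely distinguishable from a neutral one when N is small, but is strongly favoured relative to neutrality when N is large.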
Non-coding DNA
Estimates of the amount of adaptive evolution in non-coding DNA are generally very low, although fewer studies have been done on non-coding DNA. As with coding DNA, however, the methods currently used are relatively weak. Ponting and Lunter speculate that underestimates may be even more severe in non-coding DNA, because non-coding DNA may undergo periods of functionality, followed by periods of neutrality. If this is true, current methods for detecting adaptive evolution are inadequate to account for such patterns. Additionally, even if low estimates of the amount of adaptive evolution are correct, this can still equate to a large amount of adaptively evolving non-coding DNA, since non-coding DNA makes up approximately 98% of the DNA in the human genome. For example, Ponting and Lunter detect a modest 0.03% of non-coding DNA showing evidence of adaptive evolution, but this still equates to approximately 1 Mb of adaptively evolving DNA. Where there is evidence of adaptive evolution in non-coding DNA, these regions are generally thought to be involved in the regulation of protein coding sequences.
As with humans, fewer studies have searched for adaptive evolution in non-coding regions of other organisms. However, where research has been done on Drosophila, there appear to be large amounts of adaptively evolving non-coding DNA. Andolfatto estimated that adaptive evolution has occurred in 60% of untranslated mature portions of mRNAs, and in 20% of intronic and intergenic regions. If this is true, it would imply that much non-coding DNA could be of more functional importance than coding DNA, dramatically altering the consensus view. However, this would still leave unanswered what function all this non-coding DNA performs, as the regulatory activity observed thus far is in just a tiny proportion of the total amount of non-coding DNA. Ultimately, significantly more evidence needs to be gathered to substantiate this viewpoint.
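The figure of approximately 1 Mb quoted above for Ponting and Lunter's 0.03% estimate can be checked with simple arithmetic. The 98% and 0.03% values come from the text; the haploid genome size of about 3.1 billion base pairs is an assumption introduced here for illustration, as are the names used in the code.

```python
GENOME_SIZE_BP = 3.1e9      # assumed haploid human genome size (not stated in the text)
NONCODING_FRACTION = 0.98   # share of the genome that is non-coding (from the text)
ADAPTIVE_FRACTION = 0.0003  # 0.03% of non-coding DNA (Ponting and Lunter)


def adaptive_noncoding_bp(genome_bp, noncoding_fraction, adaptive_fraction):
    """Base pairs of non-coding DNA inferred to be evolving adaptively."""
    return genome_bp * noncoding_fraction * adaptive_fraction


bp = adaptive_noncoding_bp(GENOME_SIZE_BP, NONCODING_FRACTION, ADAPTIVE_FRACTION)
print(f"{bp / 1e6:.2f} Mb of adaptively evolving non-coding DNA")
```

Under these assumptions the result is a little under 1 Mb, in line with the approximate figure given in the text.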
Variation between human populations
Several recent studies have compared the amounts of adaptive evolution occurring between different populations within the human species. Williamson et al. found more evidence of adaptive evolution in European and Asian populations than in African American populations. Assuming African Americans are representative of Africans, these results make sense intuitively, because humans spread out of Africa approximately 50,000 years ago, and these humans would have adapted to the new environments they encountered. By contrast, African populations remained in a similar environment for the following tens of thousands of years, and were therefore probably nearer their adaptive peak for the environment. However, Voight et al. found evidence of more adaptive evolution in Africans than in non-Africans, and Boyko et al. found no significant difference in the amount of adaptive evolution occurring between different human populations. Therefore, the evidence obtained so far is inconclusive as to what extent different human populations have undergone different amounts of adaptive evolution.
Rate of adaptive evolution
The rate of adaptive evolution in the human genome has often been assumed to be constant over time. For example, the 35% estimate for α calculated by Fay et al. led them to conclude that there was one adaptive substitution in the human lineage every 200 years since human divergence from old-world monkeys. However, even if the original value of α is accurate for a particular time period, this extrapolation is still invalid. This is because there has been a large acceleration in the amount of positive selection in the human lineage over the last 40,000 years, in terms of the number of genes that have undergone adaptive evolution. This agrees with simple theoretical predictions, because the human population size has expanded dramatically in the last 40,000 years, and with more people, there should be more adaptive substitutions. Hawks et al. argue that demographic changes may greatly facilitate adaptive evolution, an argument that somewhat corroborates the previously mentioned positive correlation between population size and the amount of adaptive evolution occurring.
It has been suggested that cultural evolution may have replaced genetic evolution, and hence slowed the rate of adaptive evolution over the past 10,000 years. However, it is possible that cultural evolution could actually increase genetic adaptation. Cultural evolution has vastly increased communication and contact between different populations, and this provides much greater opportunities for genetic admixture between the different populations. On the other hand, recent cultural phenomena, such as modern medicine and the smaller variation in modern family sizes, may reduce genetic adaptation as natural selection is relaxed, overriding the increased potential for adaptation due to greater genetic admixture.
Strength of positive selection
Studies generally do not attempt to quantify the average strength of selection propagating advantageous mutations in the human genome. Many models make assumptions about how strong selection is, and some of the discrepancies between the estimates of the amounts of adaptive evolution occurring have been attributed to the use of such differing assumptions. The way to accurately estimate the average strength of positive selection acting on the human genome is by inferring the distribution of fitness effects (DFE) of new advantageous mutations in the human genome, but this DFE is difficult to infer because new advantageous mutations are very rare. The DFE may be exponentially shaped in an adapted population. However, more research is required to produce more accurate estimates of the average strength of positive selection in humans, which will in turn improve the estimates of the amount of adaptive evolution occurring in the human genome.
Regions of the genome which show evidence of adaptive evolution
A considerable number of studies have used genomic methods to identify specific human genes that show evidence of adaptive evolution. Table 2 gives selected examples of such genes for each gene type discussed, but provides nowhere near an exhaustive list of the human genes showing evidence of adaptive evolution. Below are listed some of the types of gene which show strong evidence of adaptive evolution in the human genome.
- Disease genes
- Immune genes
- Testes genes
- Olfactory genes
- Nutrition genes
- Pigmentation genes
- Brain genes?
- Other
Difficulties in identifying positive selection
As noted previously, many of the tests used to detect adaptive evolution have very large degrees of uncertainty surrounding their estimates. While there are many different modifications applied to individual tests to overcome the associated problems, two types of confounding variables are particularly important in hindering the accurate detection of adaptive evolution: demographic changes and biased gene conversion.
Demographic changes are particularly problematic and may severely bias estimates of adaptive evolution. The human lineage has undergone both rapid population size contractions and expansions over its evolutionary history, and these events will change many of the signatures thought to be characteristic of adaptive evolution. Some genomic methods have been shown through simulations to be relatively robust to demographic changes. However, no tests are completely robust to demographic changes, and new genetic phenomena linked to demographic changes have recently been discovered. This includes the concept of “surfing mutations”, where new mutations can be propagated with a population expansion.
A phenomenon which could severely alter the way we look for signatures of adaptive evolution is biased gene conversion (BGC). Meiotic recombination between homologous chromosomes that are heterozygous at a particular locus can produce a DNA mismatch. DNA repair mechanisms are biased towards repairing a mismatch to the GC base pair. This will lead allele frequencies to change, leaving a signature of non-neutral evolution. The excess of AT to GC mutations in human genomic regions with high substitution rates (human accelerated regions, HARs) implies that BGC has occurred frequently in the human genome. Initially, it was postulated that BGC could have been adaptive, but more recent observations have made this seem unlikely. Firstly, some HARs show no substantial signs of selective sweeps around them. Secondly, HARs tend to be present in regions with high recombination rates. In fact, BGC could lead to HARs containing a high frequency of deleterious mutations. However, it is unlikely that HARs are generally maladaptive, because DNA repair mechanisms themselves would be subject to strong selection if they propagated deleterious mutations. Either way, BGC should be further investigated, because it may force radical alteration of the methods which test for the presence of adaptive evolution.
Table 1: Estimates of the amount of adaptive evolution in the human genome
α or proportion of loci that have undergone adaptive evolution (%) | Locus type | Outgroup species | Method | Study |
20 | Protein | Chimpanzee | MK | Zhang and Li 2005 |
6 | Protein | Chimpanzee | MK | Bustamante et al. 2005 |
0-9 | Protein | Chimpanzee | MK | Chimpanzee Sequencing and Analysis Consortium 2005 |
10-20 | Protein | Chimpanzee | MK | Boyko et al. 2008 |
9.8 | Protein | Chimpanzee | dn/ds | Nielsen et al. 2005a |
1.1 | Protein | Chimpanzee | dn/ds | Bakewell et al. 2007 |
35 | Protein | Old-world monkey | MK | Fay et al. 2001 |
0 | Protein | Old-world monkey | MK | Zhang and Li 2005 |
0 | Protein | Old-world monkey | MK | Eyre-Walker and Keightley 2009 |
0.4 | Protein | Old-world monkey | dn/ds | Nielsen et al. 2005b |
0 | Protein | Mouse | MK | Zhang and Li 2005 |
0.11-0.14 | Non-coding | Chimpanzee | MK | Keightley et al. 2005 |
4 | Non-coding | Chimpanzee and Old-world monkey | dn/ds | Haygood et al. 2007 |
0 | Non-coding | Old-world monkey | MK | Eyre-Walker and Keightley 2009 |
0.03 | Non-coding | N/A | dn/ds | Ponting and Lunter 2006 |