Fay and Wu's H


Fay and Wu's H is a statistical test created by and named after two researchers, Justin Fay and Chung-I Wu. The purpose of the test is to distinguish between a DNA sequence evolving randomly (neutrally) and one evolving under positive selection. This test is an advancement over Tajima's D, which is used to differentiate neutrally evolving sequences from those evolving non-randomly. Fay and Wu's H is frequently used to identify sequences that have experienced selective sweeps in their evolutionary history.

Concept

Imagine a DNA sequence that shows very few polymorphisms among its alleles across different populations. This could arise from at least three causes:
  1. The sequence is under strong negative selection, so any new mutation in the sequence is deleterious and is purged immediately, or
  2. The sequence has just experienced a selective sweep, so all alleles became homogenized and the rare polymorphisms you see are very recent, or
  3. There was a population bottleneck, so all individuals in the population are derived from a small set of common ancestors.
Now, when you calculate Tajima's D using all the alleles across all populations, the excess of rare polymorphisms makes Tajima's D negative, which tells you that the sequence was evolving non-randomly. However, you cannot tell whether this is due to ongoing selection, a recent selective sweep, or a population expansion or contraction. To find out, you calculate Fay and Wu's H.
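The effect of rare polymorphisms on Tajima's D can be illustrated with a minimal sketch. It assumes the standard definition of D (the difference between the pairwise diversity and Watterson's estimator, divided by an estimate of its standard deviation); the function name tajimas_d and the toy allele counts are hypothetical and are not taken from any particular software package.

```python
from math import sqrt

def tajimas_d(counts, n):
    """Tajima's D from per-site allele counts in a sample of n sequences.

    counts: for each segregating site, the number of sequences carrying
    one of the two alleles (1 <= count <= n - 1).
    """
    s = len(counts)  # number of segregating sites
    if s == 0:
        raise ValueError("need at least one segregating site")

    # Constants that depend only on the sample size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Average number of pairwise differences.
    pi = sum(2.0 * k * (n - k) for k in counts) / (n * (n - 1))
    # Watterson's estimator, based on the number of segregating sites.
    theta_w = s / a1

    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy data: 10 sequences, 8 segregating sites.
rare_variants = [1] * 8          # every variant is a singleton
intermediate_variants = [5] * 8  # every variant is at 50% frequency

d_rare = tajimas_d(rare_variants, 10)                  # negative
d_intermediate = tajimas_d(intermediate_variants, 10)  # positive
```

With only singletons, the pairwise diversity falls below Watterson's estimator and D is negative; with intermediate-frequency variants the sign flips. The sign alone does not say which of the three causes listed above is responsible.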
Fay and Wu's H uses not only population polymorphism data but also data from an outgroup species. With the outgroup, you can infer what the ancestral state of the allele was before the two lineages split. If, for example, the ancestral allele differs from the allele that is now common in the population, this suggests that there was a selective sweep in that region, and the magnitude of H reflects the strength of the sweep. If the allele is the same, it means the sequence is experiencing negative selection and the ancestral state is maintained. On the other hand, an H close to 0 means that there is no evidence of deviation from neutrality.
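The use of an outgroup to infer the ancestral state can be sketched as follows. This is an illustration only: it assumes aligned sequences of equal length, treats the outgroup base as the ancestral state (no recurrent mutation), and skips any site that is not biallelic or whose outgroup base matches neither allele. The function name derived_allele_counts and the toy sequences are hypothetical.

```python
from collections import Counter

def derived_allele_counts(ingroup, outgroup):
    """Count the derived allele at each polarizable segregating site.

    ingroup:  list of aligned sequences sampled from the population.
    outgroup: one aligned sequence from the outgroup species, whose
              base at each site is assumed to be the ancestral state.
    """
    counts = []
    for column, ancestral in zip(zip(*ingroup), outgroup):
        # Skip sites with gaps or ambiguous bases in the sample.
        if set(column) - set("ACGT"):
            continue
        alleles = Counter(column)
        # Keep only biallelic sites where the outgroup base matches
        # one of the two alleles seen in the sample.
        if len(alleles) != 2 or ancestral not in alleles:
            continue
        derived = next(a for a in alleles if a != ancestral)
        counts.append(alleles[derived])
    return counts

# Toy alignment: four sampled sequences and one outgroup sequence.
sample = ["ACGT", "ACGA", "ATGA", "ATGA"]
outgroup_seq = "ACGT"

counts = derived_allele_counts(sample, outgroup_seq)
```

In this toy alignment the second site carries the derived allele T in two sequences and the fourth site carries the derived allele A in three sequences, so the function returns one count per polarizable site. Real data would also need a policy for sites where the outgroup itself has mutated, which this sketch ignores.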

Interpretation

A significantly positive Fay and Wu's H indicates a deficit of moderate- and high-frequency derived single nucleotide polymorphisms (SNPs) relative to equilibrium expectations, whereas a significantly negative Fay and Wu's H indicates an excess of high-frequency derived SNPs.
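The sign behaviour described above follows from how the statistic is built. A minimal sketch, assuming the unnormalized form of H as the difference between the pairwise diversity estimator and an estimator that weights each site by the squared count of its derived allele, is given below. The function name fay_wu_h and the toy counts are hypothetical, and the sketch does not assess significance, which would need a null distribution (for example from coalescent simulations).

```python
def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H for a sample of n sequences.

    derived_counts: for each segregating site, the number of sampled
    sequences carrying the derived allele (1 <= count <= n - 1).
    """
    norm = n * (n - 1)
    # Pairwise diversity: weights intermediate-frequency variants most.
    theta_pi = sum(2.0 * k * (n - k) for k in derived_counts) / norm
    # Theta_H: weights high-frequency derived variants most.
    theta_h = sum(2.0 * k * k for k in derived_counts) / norm
    return theta_pi - theta_h

# Toy data for 10 sequences (hypothetical derived-allele counts).
mostly_high_frequency = [9, 9, 8, 1]  # derived alleles near fixation
mostly_low_frequency = [1, 1, 2, 1]   # derived alleles all rare

h_high = fay_wu_h(mostly_high_frequency, 10)  # negative
h_low = fay_wu_h(mostly_low_frequency, 10)    # positive
```

Each site contributes 2k(n - 2k) / (n(n - 1)) to H, so a derived allele carried by more than half of the sample pushes H down and one carried by fewer than half pushes it up.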