The measures $F_{IS}$, $F_{ST}$, and $F_{IT}$ are related to the amounts of heterozygosity at various levels of population structure. Together, they are called F-statistics, and are derived from $F$, the inbreeding coefficient. In a simple two-allele system with inbreeding, the genotypic frequencies are:

$$f(AA) = p^2(1 - F) + pF, \qquad f(Aa) = 2pq(1 - F), \qquad f(aa) = q^2(1 - F) + qF$$

The value for $F$ is found by solving the heterozygote equation of this inbred population for $F$. The result is one minus the observed frequency of heterozygotes in a population divided by the expected frequency of heterozygotes at Hardy–Weinberg equilibrium:

$$F = 1 - \frac{O(f(Aa))}{E(f(Aa))}$$

where the expected frequency at Hardy–Weinberg equilibrium is given by

$$E(f(Aa)) = 2pq$$

where $p$ and $q$ are the allele frequencies of A and a, respectively. $F$ is also the probability that at any locus, two alleles from a random individual of the population are identical by descent.

For example, consider the data from E.B. Ford on a single population of the scarlet tiger moth: 1469 white-spotted (AA), 138 intermediate (Aa), and 5 little-spotted (aa) individuals, 1612 in total. From this, the allele frequencies can be calculated, and $F$ derived:

$$p = \frac{2 \times 1469 + 138}{2 \times 1612} \approx 0.954, \qquad q = 1 - p \approx 0.046$$

$$F = 1 - \frac{138/1612}{2pq} \approx 1 - \frac{0.0856}{0.0876} \approx 0.023$$

The different F-statistics look at different levels of population structure. $F_{IT}$ is the inbreeding coefficient of an individual relative to the total population, as above; $F_{IS}$ is the inbreeding coefficient of an individual relative to the subpopulation, obtained by applying the above to each subpopulation and averaging the results; and $F_{ST}$ is the effect of subpopulations compared to the total population, and is calculated by solving the equation:

$$(1 - F_{IS})(1 - F_{ST}) = 1 - F_{IT}$$

as shown in the next section.
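The single-population calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `inbreeding_coefficient` is hypothetical, and the counts passed to it (1469 AA, 138 Aa, 5 aa) are the figures usually quoted for Ford's scarlet tiger moth sample, used here as assumed input.

```python
def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """Return (p, q, F) for a two-allele locus from genotype counts.

    F = 1 - (observed heterozygote frequency) / (2pq), where 2pq is the
    heterozygote frequency expected at Hardy-Weinberg equilibrium.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no individuals sampled")
    # Each AA individual carries two copies of A, each Aa individual one.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    expected_het = 2 * p * q
    if expected_het == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    observed_het = n_Aa / n
    return p, q, 1.0 - observed_het / expected_het


# Assumed example input: counts usually quoted for the scarlet tiger moth.
p, q, F = inbreeding_coefficient(1469, 138, 5)
print(f"p = {p:.3f}, q = {q:.3f}, F = {F:.3f}")
```

A positive value indicates fewer heterozygotes than Hardy–Weinberg equilibrium predicts, and a negative value indicates an excess.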
Partition due to population structure
Consider a population that has a population structure of two levels: one from the individual to the subpopulation and one from the subpopulation to the total. Then the total $F$, known here as $F_{IT}$, can be partitioned into $F_{IS}$ and $F_{ST}$:

$$1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$$

This may be further partitioned for population substructure, and it expands according to the rules of binomial expansion, so that for $I$ partitions:

$$1 - F = \prod_{i=0}^{I} (1 - F_{i,i+1})$$
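The two-level partition can be checked numerically. The sketch below is one possible formulation, not the only one: it assumes the heterozygosity-ratio form of the three statistics ($F_{IS} = 1 - H_I/H_S$, $F_{ST} = 1 - H_S/H_T$, $F_{IT} = 1 - H_I/H_T$), unweighted averages over subpopulations, and invented genotype counts. The function name `f_statistics` is hypothetical.

```python
def f_statistics(subpops):
    """Return (F_IS, F_ST, F_IT) for one two-allele locus.

    subpops: list of (n_AA, n_Aa, n_aa) genotype counts, one tuple per
    subpopulation. Subpopulations are weighted equally (an assumption).
    """
    obs_het, exp_het, freqs = [], [], []
    for n_AA, n_Aa, n_aa in subpops:
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2 * n)
        freqs.append(p)
        obs_het.append(n_Aa / n)         # observed heterozygosity
        exp_het.append(2 * p * (1 - p))  # Hardy-Weinberg expectation
    k = len(subpops)
    H_I = sum(obs_het) / k               # mean observed heterozygosity
    H_S = sum(exp_het) / k               # mean expected within subpopulations
    p_bar = sum(freqs) / k
    H_T = 2 * p_bar * (1 - p_bar)        # expected in the total population
    return 1 - H_I / H_S, 1 - H_S / H_T, 1 - H_I / H_T


# Hypothetical genotype counts for two subpopulations of 100 individuals.
F_IS, F_ST, F_IT = f_statistics([(60, 30, 10), (15, 40, 45)])
print(f"F_IS = {F_IS:.4f}, F_ST = {F_ST:.4f}, F_IT = {F_IT:.4f}")
print("partition holds:",
      abs((1 - F_IS) * (1 - F_ST) - (1 - F_IT)) < 1e-12)
```

Under this formulation the identity holds by construction, because the $H_S$ terms cancel in the product; estimators that weight subpopulations by sample size or correct for sampling bias may give slightly different values.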
A reformulation of the definition of $F$ would be the ratio of the average number of differences between pairs of chromosomes sampled within diploid individuals to the average number obtained when sampling chromosomes randomly from the population. One can modify this definition and consider a grouping per subpopulation instead of per individual. Population geneticists have used that idea to measure the degree of structure in a population. Unfortunately, there are many definitions of $F_{ST}$, which causes some confusion in the scientific literature. A common definition is the following:

$$F_{ST} = \frac{\operatorname{Var}(p)}{\bar{p}(1 - \bar{p})}$$

where $\operatorname{Var}(p)$ is the variance of the allele frequency $p$ computed across subpopulations, $\bar{p}$ is the mean allele frequency, and $\bar{p}(1 - \bar{p})$ is proportional to the expected frequency of heterozygotes in the total population, $2\bar{p}(1 - \bar{p})$.
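As a minimal sketch of the variance-based definition, taken here to be $F_{ST} = \operatorname{Var}(p) / (\bar{p}(1 - \bar{p}))$, the following Python function computes it from a list of subpopulation allele frequencies. It assumes equally weighted subpopulations and the population (not sample) variance; the function name `fst_from_frequencies` and the input frequencies are hypothetical.

```python
def fst_from_frequencies(freqs):
    """F_ST = Var(p) / (p_bar * (1 - p_bar)) for one two-allele locus.

    freqs: allele frequency of A in each subpopulation.
    Uses the population variance and equal weights (assumptions).
    """
    k = len(freqs)
    if k == 0:
        raise ValueError("need at least one subpopulation")
    p_bar = sum(freqs) / k
    var_p = sum((p - p_bar) ** 2 for p in freqs) / k
    denom = p_bar * (1 - p_bar)
    if denom == 0:
        raise ValueError("allele is fixed or absent everywhere; F_ST is undefined")
    return var_p / denom


# Hypothetical allele frequencies in two subpopulations.
print(fst_from_frequencies([0.75, 0.35]))
```

Two limiting cases help as a sanity check: identical frequencies in every subpopulation give 0, and subpopulations fixed for alternative alleles (frequencies 0 and 1) give 1.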
Fixation index in human populations
It is well established that the genetic diversity among human populations is low, although the distribution of that diversity has only been roughly estimated. Early studies argued that 85–90% of the genetic variation is found within individuals residing in the same populations within continents and only an additional 10–15% is found between populations of different continents. Later studies based on hundreds of thousands of single-nucleotide polymorphisms (SNPs) suggested that the genetic diversity between continental populations is even smaller and accounts for 3 to 7%. A later study based on three million SNPs found that 12% of the genetic variation is found between continental populations and only 1% within them. Most of these studies have used the $F_{ST}$ statistic or closely related statistics.