Single-nucleotide polymorphism
A single-nucleotide polymorphism (SNP) is a substitution of a single nucleotide at a specific position in the genome that is present in a sufficiently large fraction of the population.
For example, at a specific base position in the human genome, the C nucleotide may appear in most individuals, but in a minority of individuals, the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations – C or A – are said to be the alleles for this specific position.
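The C/A example above can be made concrete by counting alleles in a sample. The following is a minimal sketch, not a standard tool: the genotype list is invented for illustration, and the 1% cutoff is an assumption standing in for the "sufficiently large fraction" mentioned in the definition.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} for one genomic position.

    Each genotype is a two-character string such as "CC" or "CA",
    one character per chromosome copy of a diploid individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def least_common_allele(frequencies):
    """Return (allele, frequency) for the rarest allele observed."""
    allele = min(frequencies, key=frequencies.get)
    return allele, frequencies[allele]

# Hypothetical sample of ten individuals at one position.
sample = ["CC"] * 7 + ["CA"] * 2 + ["AA"]
freqs = allele_frequencies(sample)
rare_allele, rare_freq = least_common_allele(freqs)

ASSUMED_THRESHOLD = 0.01  # assumption: a 1% cutoff, not taken from the text
is_polymorphic = len(freqs) > 1 and rare_freq >= ASSUMED_THRESHOLD
print(freqs, rare_allele, rare_freq, is_polymorphic)
```

In this invented sample, C accounts for 16 of the 20 chromosome copies and A for the remaining 4, so A is the rarer allele.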
SNPs pinpoint differences in our susceptibility to a wide range of diseases. The severity of illness and the way the body responds to treatments are also manifestations of genetic variations. For example, a single-base mutation in the APOE gene is associated with a lower risk for Alzheimer's disease.
A single-nucleotide variant (SNV) is a variation in a single nucleotide without any limitation on frequency; it may arise in somatic cells. A somatic single-nucleotide variation may also be called a single-nucleotide alteration.
Types
Single-nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or the intergenic regions. SNPs within a coding sequence do not necessarily change the amino acid sequence of the protein that is produced, owing to the degeneracy of the genetic code.
SNPs in the coding region are of two types: synonymous and nonsynonymous. Synonymous SNPs do not affect the protein sequence, while nonsynonymous SNPs change the amino acid sequence of the protein. Nonsynonymous SNPs are in turn of two types: missense and nonsense.
SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or the sequence of noncoding RNA. A SNP of this type that affects gene expression is referred to as an eSNP (expression SNP) and may be upstream or downstream from the gene.
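The distinction between synonymous, missense, and nonsense SNPs can be illustrated by translating the codon before and after the substitution. The sketch below uses the standard genetic code; the function name and the extra "stop-lost" label (for a stop codon changed into an amino acid, a case not covered by the two classes above) are choices made for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code for codons in the order TTT, TTC, TTA, TTG, TCT, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base change between two DNA codons."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if differences != 1:
        raise ValueError("codons must differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid (or stop) encoded
    if alt_aa == "*":
        return "nonsense"        # premature stop codon introduced
    if ref_aa == "*":
        return "stop-lost"       # illustrative label, outside the two classes
    return "missense"            # different amino acid encoded

print(classify_coding_snp("ATC", "ATT"))  # both codons encode isoleucine
print(classify_coding_snp("CGT", "CTT"))  # arginine replaced by leucine
print(classify_coding_snp("TGG", "TGA"))  # tryptophan replaced by a stop
```

Only the codon containing the SNP is needed here; a real annotation pipeline would also have to locate the codon within a transcript and account for strand and reading frame.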
Applications
- Association studies can determine whether a genetic variant is associated with a disease or trait.
- A tag SNP is a representative single-nucleotide polymorphism in a region of the genome with high linkage disequilibrium. Tag SNPs are useful in whole-genome SNP association studies, in which hundreds of thousands of SNPs across the entire genome are genotyped.
- Haplotype mapping: sets of alleles or DNA sequences can be clustered so that a single SNP can identify many linked SNPs.
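- Linkage disequilibrium: the non-random association of alleles at different loci. It is affected by, among other factors, (1) the distance between the SNPs and (2) the recombination rate.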
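The linkage disequilibrium that tag SNPs rely on is often summarized with the coefficient D and the squared correlation r². A minimal sketch follows, assuming phased haplotypes are already available as pairs of alleles; the function name and the toy data are illustrative only.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Return (D, r_squared) for allele_a at SNP 1 and allele_b at SNP 2.

    haplotypes is a list of (allele_at_snp1, allele_at_snp2) pairs,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d, d * d / denominator

# Toy data: the two SNPs are always inherited together ...
linked = [("A", "G"), ("A", "G"), ("C", "T"), ("C", "T")]
# ... or combine freely.
unlinked = [("A", "G"), ("A", "T"), ("C", "G"), ("C", "T")]

print(linkage_disequilibrium(linked, "A", "G"))
print(linkage_disequilibrium(unlinked, "A", "G"))
```

When r² between two SNPs is close to 1, genotyping one of them is enough to predict the other, which is the idea behind choosing a tag SNP for a region.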
Frequency
Within a genome
The genomic distribution of SNPs is not homogeneous; SNPs occur in non-coding regions more frequently than in coding regions or, in general, where natural selection is acting and "fixing" the allele of the SNP that constitutes the most favorable genetic adaptation. Other factors, like genetic recombination and mutation rate, can also determine SNP density.
SNP density can be predicted by the presence of microsatellites: AT microsatellites in particular are potent predictors of SNP density, with long repeat tracts tending to be found in regions of significantly reduced SNP density and low GC content.
Within a population
There are variations between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another. Within a population, SNPs can be assigned a minor allele frequency, the lowest allele frequency at a locus that is observed in a particular population. This is simply the lesser of the two allele frequencies for single-nucleotide polymorphisms.
Importance
Variations in the DNA sequences of humans can affect how humans develop diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents. SNPs are also critical for personalized medicine. Examples include biomedical research, forensics, pharmacogenetics, and disease causation, as outlined below.
Clinical research
SNPs' greatest importance in clinical research is for comparing regions of the genome between cohorts in genome-wide association studies. SNPs have been used in genome-wide association studies as high-resolution markers in gene mapping related to diseases or normal traits. SNPs without an observable impact on the phenotype are still useful as genetic markers in genome-wide association studies, because of their quantity and their stable inheritance over generations.
Forensics
SNPs were used initially for matching a forensic DNA sample to a suspect, but this use has been phased out with the development of STR-based DNA fingerprinting techniques. Current next-generation sequencing techniques may allow for better use of SNP genotyping in forensic applications, so long as problematic loci are avoided. In the future, SNPs may be used in forensics for phenotypic clues such as eye color, hair color, and ethnicity. Kidd et al. have demonstrated that a panel of 19 SNPs can identify the ethnic group with a good probability of match in 40 population groups studied. One example of how this might be useful is the artistic reconstruction of the possible premortem appearance of skeletal remains of unknown individuals. Although a facial reconstruction can be fairly accurate based strictly upon anthropological features, other data that might allow a more accurate representation include eye color, skin color, and hair color.
When only a small or degraded forensic sample is available, SNP methods can be a good alternative to STR methods because of the abundance of potential markers, their amenability to automation, and the potential reduction of the required fragment length to only 60–80 bp. In the absence of an STR match in a DNA profile database, different SNPs can be used to obtain clues regarding ethnicity, phenotype, lineage, and even identity.
Pharmacogenetics
Some SNPs are associated with the metabolism of different drugs. SNPs in genes that encode drug-metabolizing enzymes can inhibit or promote enzymatic activity; such a change in enzymatic activity can lead to decreased rates of drug metabolism. The association of different SNPs with a wide range of human diseases, including cancer, infectious diseases, autoimmune diseases, neuropsychiatric diseases, and many others, can make them relevant pharmacogenomic targets for drug therapy.
Disease
A single SNP may cause a Mendelian disease. For complex diseases, however, SNPs do not usually function individually; rather, they work in coordination with other SNPs to manifest a disease condition, as has been seen in osteoporosis. One of the earliest successes in this field was finding a single-base mutation in the non-coding region of the APOC3 gene that was associated with higher risks of hypertriglyceridemia and atherosclerosis.
All types of SNPs can have an observable phenotype or can result in disease:
- SNPs in non-coding regions can manifest in a higher risk of cancer, and may affect mRNA structure and disease susceptibility. Non-coding SNPs can also alter the level of expression of a gene, as an eQTL.
- SNPs in coding regions:
  - synonymous substitutions by definition do not result in a change of amino acid in the protein, but can still affect its function in other ways. An example is a seemingly silent mutation in the multidrug resistance gene 1, which codes for a cellular membrane pump that expels drugs from the cell. The mutation can slow down translation and allow the peptide chain to fold into an unusual conformation, making the mutant pump less functional (the C3435T polymorphism changes ATC to ATT at position 1145; both codons encode isoleucine).
  - nonsynonymous substitutions:
    - missense – a single base change results in a change of amino acid in the protein, which can cause the protein to malfunction and lead to disease.
    - nonsense – a point mutation in a sequence of DNA that results in a premature stop codon, or a nonsense codon in the transcribed mRNA, and in a truncated, incomplete, and usually nonfunctional protein product.
Examples
- rs6311 and rs6313 are SNPs in the serotonin 5-HT2A receptor gene on human chromosome 13.
- A SNP in the F5 gene causes Factor V Leiden thrombophilia.
- rs3091244 is an example of a triallelic SNP in the CRP gene on human chromosome 1.
- TAS2R38 codes for PTC tasting ability, and contains 6 annotated SNPs.
- rs148649884 and rs138055828 in the FCN1 gene, which encodes M-ficolin, crippled the ligand-binding capability of recombinant M-ficolin.
- An intronic SNP in DNA mismatch repair gene PMS2 is associated with increased sperm DNA damage and risk of male infertility.
Databases
- dbSNP is a SNP database from the National Center for Biotechnology Information. dbSNP listed 149,735,377 SNPs in humans.
- is a compendium of SNPs from multiple data sources including dbSNP.
- SNPedia is a wiki-style database supporting personal genome annotation, interpretation and analysis.
- The OMIM database describes the association between polymorphisms and diseases.
- dbSAP is a single amino-acid polymorphism database for protein variation detection.
- The Human Gene Mutation Database provides gene mutations causing or associated with human inherited diseases, as well as functional SNPs.
- The International HapMap Project, in which researchers identify tag SNPs in order to determine the collection of haplotypes present in each subject.
- GWAS Central allows users to visually interrogate the actual summary-level association data in one or more genome-wide association studies.
Chromosome | Length (bp) | All SNPs: total | All SNPs: kb per SNP | TSC SNPs: total | TSC SNPs: kb per SNP |
1 | 214,066,000 | 129,931 | 1.65 | 75,166 | 2.85 |
2 | 222,889,000 | 103,664 | 2.15 | 76,985 | 2.90 |
3 | 186,938,000 | 93,140 | 2.01 | 63,669 | 2.94 |
4 | 169,035,000 | 84,426 | 2.00 | 65,719 | 2.57 |
5 | 170,954,000 | 117,882 | 1.45 | 63,545 | 2.69 |
6 | 165,022,000 | 96,317 | 1.71 | 53,797 | 3.07 |
7 | 149,414,000 | 71,752 | 2.08 | 42,327 | 3.53 |
8 | 125,148,000 | 57,834 | 2.16 | 42,653 | 2.93 |
9 | 107,440,000 | 62,013 | 1.73 | 43,020 | 2.50 |
10 | 127,894,000 | 61,298 | 2.09 | 42,466 | 3.01 |
11 | 129,193,000 | 84,663 | 1.53 | 47,621 | 2.71 |
12 | 125,198,000 | 59,245 | 2.11 | 38,136 | 3.28 |
13 | 93,711,000 | 53,093 | 1.77 | 35,745 | 2.62 |
14 | 89,344,000 | 44,112 | 2.03 | 29,746 | 3.00 |
15 | 73,467,000 | 37,814 | 1.94 | 26,524 | 2.77 |
16 | 74,037,000 | 38,735 | 1.91 | 23,328 | 3.17 |
17 | 73,367,000 | 34,621 | 2.12 | 19,396 | 3.78 |
18 | 73,078,000 | 45,135 | 1.62 | 27,028 | 2.70 |
19 | 56,044,000 | 25,676 | 2.18 | 11,185 | 5.01 |
20 | 63,317,000 | 29,478 | 2.15 | 17,051 | 3.71 |
21 | 33,824,000 | 20,916 | 1.62 | 9,103 | 3.72 |
22 | 33,786,000 | 28,410 | 1.19 | 11,056 | 3.06 |
X | 131,245,000 | 34,842 | 3.77 | 20,400 | 6.43 |
Y | 21,753,000 | 4,193 | 5.19 | 1,784 | 12.19 |
RefSeq | 15,696,674 | 14,534 | 1.08 | - | - |
Totals | 2,710,164,000 | 1,419,190 | 1.91 | 887,450 | 3.05 |
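The "kb per SNP" columns follow from dividing the sequence length by the number of SNPs. The short sketch below recomputes a few of the "All SNPs" values from the table; it assumes the length column is in base pairs, which is what the tabulated ratios imply.

```python
def kb_per_snp(length_bp, snp_count):
    """Average spacing between SNPs, in kilobases."""
    return length_bp / snp_count / 1000

# (length in bp, total SNPs) copied from three rows of the table above.
rows = {
    "1": (214_066_000, 129_931),
    "X": (131_245_000, 34_842),
    "Totals": (2_710_164_000, 1_419_190),
}

for name, (length_bp, snp_count) in rows.items():
    print(name, round(kb_per_snp(length_bp, snp_count), 2))
```

The rounded results agree with the tabulated values of 1.65, 3.77, and 1.91 kb per SNP.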
Nomenclature
The nomenclature for SNPs can be confusing: several variations can exist for an individual SNP, and consensus has not yet been achieved.
The rs### standard, adopted by dbSNP, uses the prefix "rs", for "reference SNP", followed by a unique and arbitrary number. SNPs are frequently referred to by their dbSNP rs number, as in the examples above.
The Human Genome Variation Society uses a standard which conveys more information about the SNP. Examples are:
- c.76A>T: "c." for coding region, followed by a number for the position of the nucleotide, followed by a one-letter abbreviation for the nucleotide, followed by a greater-than sign to indicate substitution, followed by the abbreviation of the nucleotide which replaces the former.
- p.Ser123Arg: "p." for protein, followed by a three-letter abbreviation for the amino acid, followed by a number for the position of the amino acid, followed by the abbreviation of the amino acid which replaces the former.
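The two notations above are regular enough to be split into their parts with a pattern match. The following sketch handles only simple substitutions of the two forms shown; the full Human Genome Variation Society notation covers many more cases (for example intronic positions, deletions, and stop codons) that this illustration does not attempt to parse. The function name and the returned field names are hypothetical.

```python
import re

# "c." + position + reference base + ">" + replacement base
CODING_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# "p." + three-letter amino acid + position + three-letter amino acid
PROTEIN_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")

def parse_substitution(description):
    """Split a simple substitution description into its components."""
    match = CODING_SUBSTITUTION.match(description)
    if match:
        position, ref, alt = match.groups()
        return {"level": "coding", "position": int(position), "ref": ref, "alt": alt}
    match = PROTEIN_SUBSTITUTION.match(description)
    if match:
        ref, position, alt = match.groups()
        return {"level": "protein", "position": int(position), "ref": ref, "alt": alt}
    raise ValueError(f"not a simple substitution: {description!r}")

print(parse_substitution("c.76A>T"))
print(parse_substitution("p.Ser123Arg"))
```

The protein pattern checks only the shape of the three-letter codes, not whether they name real amino acids.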
SNP analysis
- DNA sequencing;
- capillary electrophoresis;
- mass spectrometry;
- single-strand conformation polymorphism;
- single-base extension;
- electrochemical analysis;
- denaturing HPLC and gel electrophoresis;
- restriction fragment length polymorphism;
- hybridization analysis.
Programs for prediction of SNP effects
- This program provides insight into how a laboratory induced missense or nonsynonymous mutation will affect protein function based on physical properties of the amino acid and sequence homology.
- estimates the potential deleteriousness of mutations that results from altered protein function. It is based on the assumption that variations observed in closely related species are more significant when assessing conservation than those in distantly related species.
- MutationTaster:
- from the Ensembl project
- This program provides a 3D representation of the protein affected, highlighting the amino acid change so doctors can determine pathogenicity of the mutant protein.
- is a database which maps variants to experimental and predicted protein structures.
- is a tool which provides a stereochemical report on the effect of missense variants on protein structure.