Proteinogenic amino acid


Proteinogenic amino acids are amino acids that are incorporated biosynthetically into proteins during translation. The word "proteinogenic" means "protein creating". Throughout known life, there are 22 genetically encoded amino acids, 20 in the standard genetic code and an additional 2 that can be incorporated by special translation mechanisms.
In contrast, non-proteinogenic amino acids are amino acids that are either not incorporated into proteins, misincorporated in place of a genetically encoded amino acid, or not produced directly and in isolation by standard cellular machinery. The last group often results from post-translational modification of proteins. Some non-proteinogenic amino acids are incorporated into nonribosomal peptides, which are synthesized by nonribosomal peptide synthetases.
Both eukaryotes and prokaryotes can incorporate selenocysteine into their proteins via a nucleotide sequence known as a SECIS element, which directs the cell to translate a nearby UGA codon as selenocysteine. In some methanogenic prokaryotes, the UAG codon can also be translated to pyrrolysine.
In eukaryotes, there are only 21 proteinogenic amino acids, the 20 of the standard genetic code, plus selenocysteine. Humans can synthesize 12 of these from each other or from other molecules of intermediary metabolism. The other nine must be consumed, and so they are called essential amino acids. The essential amino acids are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine.
The proteinogenic amino acids have been found to be related to the set of amino acids that can be recognized by ribozyme autoaminoacylation systems. Thus, non-proteinogenic amino acids would have been excluded by the contingent evolutionary success of nucleotide-based life forms. Other reasons have been offered to explain why certain specific non-proteinogenic amino acids are not generally incorporated into proteins; for example, ornithine and homoserine cyclize against the peptide backbone and fragment the protein with relatively short half-lives, while others are toxic because they can be mistakenly incorporated into proteins, such as the arginine analog canavanine.

Structures

The following illustrates the structures and abbreviations of the 21 amino acids that are directly encoded for protein synthesis by the genetic code of eukaryotes. The structures given below are standard chemical structures, not the typical zwitterion forms that exist in aqueous solutions.
IUPAC/IUBMB now also recommends standard abbreviations for the following two amino acids: selenocysteine (Sec, U) and pyrrolysine (Pyl, O).

Chemical properties

Following is a table listing the one-letter symbols, the three-letter symbols, and the chemical properties of the side chains of the standard amino acids. The masses listed are based on weighted averages of the elemental isotopes at their natural abundances. Forming a peptide bond results in elimination of a molecule of water. Therefore, the protein's mass is equal to the mass of amino acids the protein is composed of minus 18.01524 Da per peptide bond.
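
The mass relation above can be sketched in a few lines of Python. This is a minimal illustration, not a validated tool: the function name `protein_average_mass` is hypothetical, the chain is assumed to be linear (so n amino acids form n - 1 peptide bonds), and the free amino acid masses for glycine and alanine are illustrative assumed values rather than figures taken from this article.

```python
WATER_AVG_MASS = 18.01524  # Da, mass of water eliminated per peptide bond (from the text)


def protein_average_mass(free_amino_acid_masses):
    """Sum the free amino acid masses and subtract one water per peptide bond."""
    # Assumption: a linear chain, so n amino acids form n - 1 peptide bonds.
    n_bonds = max(len(free_amino_acid_masses) - 1, 0)
    return sum(free_amino_acid_masses) - n_bonds * WATER_AVG_MASS


# Illustrative (assumed) average masses of the free amino acids, in Da.
GLYCINE = 75.0666
ALANINE = 89.0932

dipeptide_mass = protein_average_mass([GLYCINE, ALANINE])
print(f"Gly-Ala average mass: {dipeptide_mass:.4f} Da")
```

A single amino acid forms no peptide bond, so its mass is returned unchanged.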

General chemical properties

Side-chain properties

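| Amino acid | 1-letter | 3-letter | Side chain | pKa§ | pH | Aromatic or aliphatic | van der Waals volume (Å³) |
|---|---|---|---|---|---|---|---|
| Alanine | A | Ala | -CH3 | - | - | Aliphatic | 67 |
| Cysteine | C | Cys | -CH2SH | 8.55 | acidic | - | 86 |
| Aspartic acid | D | Asp | -CH2COOH | 3.67 | acidic | - | 91 |
| Glutamic acid | E | Glu | -CH2CH2COOH | 4.25 | acidic | - | 109 |
| Phenylalanine | F | Phe | -CH2C6H5 | - | - | Aromatic | 135 |
| Glycine | G | Gly | -H | - | - | - | 48 |
| Histidine | H | His | -CH2-C3H3N2 | 6.54 | weak basic | Aromatic | 118 |
| Isoleucine | I | Ile | -CH(CH3)CH2CH3 | - | - | Aliphatic | 124 |
| Lysine | K | Lys | -(CH2)4NH2 | 10.40 | basic | - | 135 |
| Leucine | L | Leu | -CH2CH(CH3)2 | - | - | Aliphatic | 124 |
| Methionine | M | Met | -CH2CH2SCH3 | - | - | Aliphatic | 124 |
| Asparagine | N | Asn | -CH2CONH2 | - | - | - | 96 |
| Pyrrolysine | O | Pyl | -(CH2)4NHCOC4H5NCH3 | N.D. | weak basic | - | |
| Proline | P | Pro | -CH2CH2CH2- | - | - | - | 90 |
| Glutamine | Q | Gln | -CH2CH2CONH2 | - | - | - | 114 |
| Arginine | R | Arg | -(CH2)3NH-C(NH)NH2 | 12.3 | strongly basic | - | 148 |
| Serine | S | Ser | -CH2OH | - | - | - | 73 |
| Threonine | T | Thr | -CH(OH)CH3 | - | - | - | 93 |
| Selenocysteine | U | Sec | -CH2SeH | 5.43 | acidic | - | |
| Valine | V | Val | -CH(CH3)2 | - | - | Aliphatic | 105 |
| Tryptophan | W | Trp | -CH2C8H6N | - | - | Aromatic | 163 |
| Tyrosine | Y | Tyr | -CH2-C6H4OH | 9.84 | weak acidic | Aromatic | 141 |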

§: Values for Asp, Cys, Glu, His, Lys and Tyr were determined using the amino acid residue placed centrally in an alanine pentapeptide. The value for Arg is from Pace et al. The value for Sec is from Byun & Kang.
N.D.: The pKa value of pyrrolysine has not been reported.
Note: The pKa value of an amino-acid residue in a small peptide is typically slightly different when it is inside a protein. Protein pKa calculations are sometimes used to calculate the change in the pKa value of an amino-acid residue in this situation.
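
As a rough illustration of how the tabulated pKa values translate into ionization state, the sketch below applies the Henderson-Hasselbalch relation, fraction deprotonated = 1 / (1 + 10^(pKa - pH)). The function name `fraction_deprotonated` is hypothetical, and the calculation assumes an isolated, independent titratable group using the model-peptide pKa values from the table; as noted above, values inside a folded protein can differ.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a titratable group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Side-chain pKa values copied from the table above (model-peptide values).
SIDE_CHAIN_PKA = {
    "Asp": 3.67,
    "Glu": 4.25,
    "His": 6.54,
    "Cys": 8.55,
    "Tyr": 9.84,
    "Lys": 10.40,
    "Arg": 12.3,
}

PH = 7.0  # assumed pH for the illustration

for name, pka in SIDE_CHAIN_PKA.items():
    print(f"{name}: {fraction_deprotonated(pka, PH):.4f} deprotonated at pH {PH}")
```

For the acidic side chains (Asp, Glu) the deprotonated form carries a negative charge, whereas for the basic ones (Lys, Arg, His) the deprotonated form is neutral, so the same fraction has opposite implications for net charge.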

Gene expression and biochemistry

* UAG is normally the amber stop codon, but in organisms containing the biological machinery encoded by the pylTSBCD cluster of genes, the amino acid pyrrolysine is incorporated instead.

** UGA is normally the opal stop codon, but encodes selenocysteine if a SECIS element is present.

The stop codon is not an amino acid, but is included for completeness.

†† UAG and UGA do not always act as stop codons.

An essential amino acid cannot be synthesized in humans and must, therefore, be supplied in the diet. Conditionally essential amino acids are not normally required in the diet, but must be supplied exogenously to specific populations that do not synthesize them in adequate amounts.

& Occurrence of amino acids is based on 135 Archaea, 3775 Bacteria, 614 Eukaryota proteomes and human proteome respectively.
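
The context-dependent reading of UAG and UGA described in the notes above can be sketched as a small lookup. This is a deliberately simplified model: the function `decode_codon`, its boolean flags, and the five-codon table are hypothetical stand-ins, and real recoding depends on the position and structure of the SECIS element and on the full pyrrolysine machinery rather than on a single switch.

```python
# Tiny excerpt of the standard genetic code (RNA codons); "*" marks a stop codon.
STANDARD_CODE_EXCERPT = {
    "AUG": "M",  # methionine (start)
    "UGG": "W",  # tryptophan
    "UAA": "*",  # ochre stop
    "UAG": "*",  # amber stop
    "UGA": "*",  # opal stop
}


def decode_codon(codon, has_secis=False, has_pyl_machinery=False):
    """Return the one-letter symbol for a codon, allowing stop-codon recoding."""
    if codon == "UGA" and has_secis:
        return "U"  # selenocysteine when a SECIS element is present
    if codon == "UAG" and has_pyl_machinery:
        return "O"  # pyrrolysine when the pylTSBCD machinery is present
    return STANDARD_CODE_EXCERPT[codon]


print(decode_codon("UGA"), decode_codon("UGA", has_secis=True))
print(decode_codon("UAG"), decode_codon("UAG", has_pyl_machinery=True))
```

UAA is left untouched by both flags, reflecting that only UAG and UGA are described here as recodable.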

Mass spectrometry

In mass spectrometry of peptides and proteins, knowledge of the masses of the residues is useful. The mass of the peptide or protein is the sum of the residue masses plus the mass of water. The residue masses are calculated from the tabulated chemical formulas and atomic weights. In mass spectrometry, ions may also include one or more protons.
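| Amino acid | 1-letter | 3-letter | Formula | Mon. mass§ (Da) | Avg. mass (Da) |
|---|---|---|---|---|---|
| Alanine | A | Ala | C3H5NO | 71.03711 | 71.0779 |
| Cysteine | C | Cys | C3H5NOS | 103.00919 | 103.1429 |
| Aspartic acid | D | Asp | C4H5NO3 | 115.02694 | 115.0874 |
| Glutamic acid | E | Glu | C5H7NO3 | 129.04259 | 129.1140 |
| Phenylalanine | F | Phe | C9H9NO | 147.06841 | 147.1739 |
| Glycine | G | Gly | C2H3NO | 57.02146 | 57.0513 |
| Histidine | H | His | C6H7N3O | 137.05891 | 137.1393 |
| Isoleucine | I | Ile | C6H11NO | 113.08406 | 113.1576 |
| Lysine | K | Lys | C6H12N2O | 128.09496 | 128.1723 |
| Leucine | L | Leu | C6H11NO | 113.08406 | 113.1576 |
| Methionine | M | Met | C5H9NOS | 131.04049 | 131.1961 |
| Asparagine | N | Asn | C4H6N2O2 | 114.04293 | 114.1026 |
| Pyrrolysine | O | Pyl | C12H19N3O2 | 237.14773 | 237.2982 |
| Proline | P | Pro | C5H7NO | 97.05276 | 97.1152 |
| Glutamine | Q | Gln | C5H8N2O2 | 128.05858 | 128.1292 |
| Arginine | R | Arg | C6H12N4O | 156.10111 | 156.1857 |
| Serine | S | Ser | C3H5NO2 | 87.03203 | 87.0773 |
| Threonine | T | Thr | C4H7NO2 | 101.04768 | 101.1039 |
| Selenocysteine | U | Sec | C3H5NOSe | 150.95364 | 150.0489 |
| Valine | V | Val | C5H9NO | 99.06841 | 99.1311 |
| Tryptophan | W | Trp | C11H10N2O | 186.07931 | 186.2099 |
| Tyrosine | Y | Tyr | C9H9NO2 | 163.06333 | 163.1733 |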

§ Monoisotopic mass
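
The rule stated in this section (sum of residue masses plus one water, with optional added protons for ions) can be sketched as follows. The residue masses are the monoisotopic values tabulated above; the function names `peptide_monoisotopic_mass` and `mz` are hypothetical, and the monoisotopic mass of water and the proton mass are assumed constants that do not appear in this article.

```python
# Monoisotopic residue masses (Da), copied from the table above.
RESIDUE_MONO_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "O": 237.14773, "P": 97.05276, "Q": 128.05858, "R": 156.10111,
    "S": 87.03203, "T": 101.04768, "U": 150.95364, "V": 99.06841,
    "W": 186.07931, "Y": 163.06333,
}

WATER_MONO_MASS = 18.010565  # Da, assumed value (not given in the article)
PROTON_MASS = 1.007276       # Da, assumed value (not given in the article)


def peptide_monoisotopic_mass(sequence):
    """Sum of residue masses plus one water for the free N- and C-termini."""
    return sum(RESIDUE_MONO_MASS[aa] for aa in sequence) + WATER_MONO_MASS


def mz(sequence, charge):
    """m/z of the peptide carrying `charge` added protons (positive-ion mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral = peptide_monoisotopic_mass(sequence)
    return (neutral + charge * PROTON_MASS) / charge


print(f"GA neutral mass: {peptide_monoisotopic_mass('GA'):.5f} Da")
print(f"GA [M+H]+ m/z:   {mz('GA', 1):.5f}")
print(f"GA [M+2H]2+ m/z: {mz('GA', 2):.5f}")
```

Because leucine and isoleucine share the same formula, the two residues are indistinguishable by mass alone in this calculation.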

Stoichiometry and metabolic cost in cell

The table below lists the abundance of amino acids in E. coli cells and the metabolic cost of synthesizing them. Negative numbers indicate that the metabolic process is energetically favorable and does not cost the cell net ATP. The abundance figures include amino acids both in free form and in polymerized form.

Remarks

Catabolism

Amino acids can be classified according to the properties of their main products:
*