Protein primary structure

Protein primary structure is the linear sequence of amino acids in a peptide or protein. By convention, the primary structure of a protein is reported starting from the amino-terminal end to the carboxyl-terminal end. Protein biosynthesis is most commonly performed by ribosomes in cells. Peptides can also be synthesized in the laboratory. Protein primary structures can be directly sequenced, or inferred from DNA sequences.

Formation

Biological

Amino acids are polymerised via peptide bonds to form a long backbone, with the different amino acid side chains protruding along it. In biological systems, proteins are produced during translation by a cell's ribosomes. Some organisms can also make short peptides by non-ribosomal peptide synthesis; these peptides often contain amino acids other than the standard 20 and may be cyclised, modified and cross-linked.
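Ribosomal translation reads the coding sequence one codon (three bases) at a time and extends the chain from the N-terminus to the C-terminus. The same codon-by-codon reading is how a protein sequence is inferred from a DNA sequence. The sketch below is a simplified model that assumes an in-frame coding-strand sequence, the standard genetic code and no introns. `translate`, `CODON_TABLE` and the example sequence are illustrative names, not a library API.

```python
# Standard genetic code (NCBI translation table 1). The 64 letters list the amino
# acid for each codon, with bases ordered T, C, A, G at every codon position;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(coding_dna: str, stop_at_stop: bool = True) -> str:
    """Translate an in-frame coding sequence (5'->3') into a one-letter
    protein sequence written from the N-terminus to the C-terminus."""
    seq = coding_dna.upper().replace("U", "T")  # also accept mRNA-style input
    protein = []
    # Read successive codons, mirroring how the ribosome extends the chain.
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")  # unrecognized codon -> X
        if residue == "*" and stop_at_stop:
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # prints MAIVMGR
```

Real gene-to-protein pipelines also handle start-codon selection, splicing, alternative genetic codes and residues such as selenocysteine, all of which this sketch ignores.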

Chemical

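Peptides can be synthesised chemically by a range of laboratory methods, such as solid-phase peptide synthesis. Chemical methods typically build peptides in the opposite direction to biological protein synthesis: from the C-terminus towards the N-terminus, whereas ribosomes add residues from the N-terminus towards the C-terminus.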
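The difference in direction can be made concrete by listing the growing chain after each residue is added. This is a sketch that assumes a generic C-to-N chemical synthesis, as in solid-phase peptide synthesis, and ignores protecting groups, deprotection and cleavage from the support. Both function names are hypothetical.

```python
from typing import Iterator


def chemical_synthesis_steps(sequence: str) -> Iterator[str]:
    """Growing chain (written N->C) after each coupling in a C-to-N synthesis:
    the C-terminal residue is attached first and new residues join at the N-terminus."""
    for n in range(1, len(sequence) + 1):
        yield sequence[-n:]


def ribosomal_synthesis_steps(sequence: str) -> Iterator[str]:
    """Growing chain after each step of translation, which extends the C-terminus."""
    for n in range(1, len(sequence) + 1):
        yield sequence[:n]


print(list(chemical_synthesis_steps("GAVL")))   # prints ['L', 'VL', 'AVL', 'GAVL']
print(list(ribosomal_synthesis_steps("GAVL")))  # prints ['G', 'GA', 'GAV', 'GAVL']
```

Both routes end with the same sequence. Only the order in which residues are joined differs.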

Notation

Protein sequence is typically notated as a string of letters, listing the amino acids from the amino-terminal end to the carboxyl-terminal end. Either a three-letter code or a single-letter code can be used to represent the 20 standard amino acids, and further symbols represent ambiguous residues or classes of similar residues.
Peptides can be directly sequenced, or inferred from DNA sequences. Large sequence databases now exist that collate known protein sequences.

Amino Acid	3-Letter	1-Letter
Alanine	Ala	A
Arginine	Arg	R
Asparagine	Asn	N
Aspartic acid	Asp	D
Cysteine	Cys	C
Glutamic acid	Glu	E
Glutamine	Gln	Q
Glycine	Gly	G
Histidine	His	H
Isoleucine	Ile	I
Leucine	Leu	L
Lysine	Lys	K
Methionine	Met	M
Phenylalanine	Phe	F
Proline	Pro	P
Serine	Ser	S
Threonine	Thr	T
Tryptophan	Trp	W
Tyrosine	Tyr	Y
Valine	Val	V
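Converting between the two notations is a simple lookup. The sketch below builds both dictionaries from the table above. It assumes hyphen-separated three-letter codes covering only the 20 standard amino acids. The function names are illustrative.

```python
# One-letter codes for the 20 standard amino acids, keyed by three-letter code.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter: str, sep: str = "-") -> str:
    """'Gly-Ile-Val-Glu' -> 'GIVE'; raises KeyError for codes not in the table."""
    codes = (code.strip().capitalize() for code in three_letter.split(sep))
    return "".join(THREE_TO_ONE[code] for code in codes)


def to_three_letter(one_letter: str, sep: str = "-") -> str:
    """'GIVE' -> 'Gly-Ile-Val-Glu'."""
    return sep.join(ONE_TO_THREE[letter] for letter in one_letter.upper())


print(to_one_letter("Gly-Ile-Val-Glu"))  # prints GIVE
```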

Symbol	Description	Residues represented
X	Any amino acid, or unknown	All
B	Aspartate or Asparagine	D, N
Z	Glutamate or Glutamine	E, Q
J	Leucine or Isoleucine	I, L
Φ	Hydrophobic	V, I, L, F, W, M
Ω	Aromatic	F, W, Y, H
Ψ	Aliphatic	V, I, L, M
π	Small	P, G, A, S
ζ	Hydrophilic	S, T, H, N, Q, E, D, K, R, Y
+	Positively charged	K, R, H
-	Negatively charged	D, E
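Ambiguity and class symbols are convenient for writing sequence motifs. One way to search for such a motif is to expand each symbol into a regular-expression character class. The sketch below assumes sequences use only the 20 standard one-letter codes. `CLASS_SYMBOLS`, `motif_to_regex` and the example sequence are hypothetical.

```python
import re

# Symbols from the table above, expanded into the one-letter codes they stand for.
# "X" is restricted here to the 20 standard residues (an assumption).
CLASS_SYMBOLS = {
    "X": "ACDEFGHIKLMNPQRSTVWY",
    "B": "DN",
    "Z": "EQ",
    "J": "IL",
    "Φ": "VILFWM",
    "Ω": "FWYH",
    "Ψ": "VILM",
    "π": "PGAS",
    "ζ": "STHNQEDKRY",
    "+": "KRH",
    "-": "DE",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Turn a motif such as 'C+ΦC' into a compiled regular expression."""
    parts = []
    for symbol in motif:
        if symbol in CLASS_SYMBOLS:
            parts.append("[" + CLASS_SYMBOLS[symbol] + "]")  # any residue of the class
        else:
            parts.append(re.escape(symbol))  # an ordinary one-letter code
    return re.compile("".join(parts))


sequence = "MKCRLCAGDE"
for match in motif_to_regex("C+ΦC").finditer(sequence):
    print(match.start() + 1, match.group())  # prints 3 CRLC (1-based position)
```

`finditer` reports only non-overlapping matches, so overlapping motif occurrences would need a lookahead pattern or a position-by-position scan.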

Modification

In general, polypeptides are unbranched polymers, so their primary structure can often be specified by the sequence of amino acids along their backbone alone. However, proteins can become cross-linked, most commonly by disulfide bonds, and the primary structure then also requires specifying the cross-linking atoms, e.g., the cysteines involved in the protein's disulfide bonds. Other cross-links include desmosine, found in elastin.
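Because the backbone sequence alone no longer describes a cross-linked chain, a record of its primary structure also has to list the cross-links. The hypothetical data structure below stores a one-letter sequence plus disulfide bonds as pairs of 1-based residue numbers, and checks that each bond joins two distinct cysteines. For simplicity it assumes a single chain and so cannot represent bonds between separate chains. The example uses the intrachain Cys6-Cys11 disulfide of the human insulin A chain.

```python
from dataclasses import dataclass, field


@dataclass
class CrossLinkedChain:
    """A single chain plus its disulfide bonds, given as 1-based residue pairs."""
    sequence: str
    disulfides: list[tuple[int, int]] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError unless every bond joins two distinct cysteines."""
        used = set()
        for bond in self.disulfides:
            for pos in bond:
                if not 1 <= pos <= len(self.sequence):
                    raise ValueError(f"position {pos} is outside the sequence")
                residue = self.sequence[pos - 1]
                if residue != "C":
                    raise ValueError(f"residue {pos} is {residue}, not cysteine")
                if pos in used:  # also catches a bond from a cysteine to itself
                    raise ValueError(f"cysteine {pos} is used in more than one bond")
                used.add(pos)


# Human insulin A chain with its intrachain Cys6-Cys11 disulfide.
insulin_a = CrossLinkedChain("GIVEQCCTSICSLYQLENYCN", disulfides=[(6, 11)])
insulin_a.validate()  # passes silently
```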

Isomerisation

The chiral centers of a polypeptide chain can undergo racemization. Although this does not change the sequence, it does affect the chemical properties of the chain. In particular, the L-amino acids normally found in proteins can spontaneously isomerize at the α-carbon (Cα) atom to form D-amino acids, which cannot be cleaved by most proteases. Additionally, the peptide bond preceding a proline residue can form stable cis isomers, whereas most other peptide bonds are almost exclusively trans.

Posttranslational modification

In addition, the protein can undergo a variety of posttranslational modifications, which are briefly summarized here.
The N-terminal amino group of a polypeptide can be modified covalently, e.g.,

acetylation
formylation
pyroglutamate formation
myristoylation

The C-terminal carboxylate group of a polypeptide can also be modified, e.g.,

amidation
glycosyl phosphatidylinositol attachment

Finally, the peptide side chains can also be modified covalently, e.g.,

phosphorylation
glycosylation
deamidation
hydroxylation
methylation
acetylation
sulfation
prenylation and palmitoylation
carboxylation
ADP-ribosylation
ubiquitination and SUMOylation

Most of the polypeptide modifications listed above occur post-translationally, i.e., after the protein has been synthesized on the ribosome. Many of them, such as N-linked glycosylation, take place in the endoplasmic reticulum, a subcellular organelle of eukaryotic cells.
Many other chemical reactions have been applied to proteins by chemists, although they are not found in biological systems.

Cleavage and ligation

In addition to those listed above, the most important modification of primary structure is peptide cleavage. Proteins are often synthesized in an inactive precursor form; typically, an N-terminal or C-terminal segment blocks the active site of the protein, inhibiting its function. The protein is activated by cleaving off the inhibitory peptide.
Some proteins can even cleave themselves. Typically, the hydroxyl group of a serine or the thiol group of a cysteine residue attacks the carbonyl carbon of the preceding peptide bond, forming a tetrahedrally bonded intermediate. This intermediate tends to revert to the amide form, expelling the attacking group, since the amide form is usually favored by free energy. However, additional molecular interactions may render the amide form less stable, in which case the amino group is expelled instead, resulting in an ester (serine) or thioester (cysteine) bond in place of the peptide bond. This chemical reaction is called an N-O acyl shift (for cysteine, an N-S acyl shift).
The ester/thioester bond can be resolved in several ways:

Simple hydrolysis will split the polypeptide chain, where the displaced amino group becomes the new N-terminus. This is seen in the maturation of glycosylasparaginase.
A β-elimination reaction also splits the chain, but results in a pyruvoyl group at the new N-terminus. This pyruvoyl group may be used as a covalently attached catalytic cofactor in some enzymes, especially decarboxylases such as S-adenosylmethionine decarboxylase that exploit the electron-withdrawing power of the pyruvoyl group.
Intramolecular transesterification, resulting in a branched polypeptide. In inteins, the new ester bond is broken by an intramolecular attack by the soon-to-be C-terminal asparagine.
Intermolecular transesterification can transfer a whole segment from one polypeptide to another, as is seen in the Hedgehog protein autoprocessing.
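The sequence bookkeeping of the simplest outcome, hydrolysis, can be sketched as splitting the chain just before the attacking residue. This is a model of the resulting fragments only, not of the chemistry. It assumes the nucleophile is a serine or cysteine, as described above, and it does not represent the pyruvoyl group left by β-elimination or the rearrangements of transesterification. The function name and example sequence are hypothetical.

```python
def hydrolytic_self_cleavage(sequence: str, nucleophile_pos: int,
                             nucleophiles: str = "SC") -> tuple[str, str]:
    """Fragments produced when the residue at nucleophile_pos (1-based) attacks the
    preceding peptide bond, forms a (thio)ester by acyl shift, and the ester is hydrolyzed."""
    if not 2 <= nucleophile_pos <= len(sequence):
        raise ValueError("the nucleophile needs a preceding residue in the chain")
    residue = sequence[nucleophile_pos - 1]
    if residue not in nucleophiles:
        raise ValueError(f"residue {nucleophile_pos} is {residue}, not one of {nucleophiles}")
    # The preceding residue ends the N-terminal fragment with a new carboxylate;
    # the Ser/Cys, carrying the displaced amino group, starts the C-terminal fragment.
    return sequence[:nucleophile_pos - 1], sequence[nucleophile_pos - 1:]


print(hydrolytic_self_cleavage("MKTSAG", 4))  # prints ('MKT', 'SAG')
```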

History

The proposal that proteins were linear chains of α-amino acids was made nearly simultaneously by two scientists at the same conference in 1902, the 74th meeting of the Society of German Scientists and Physicians, held in Karlsbad. Franz Hofmeister made the proposal in the morning, based on his observations of the biuret reaction in proteins. Hofmeister was followed a few hours later by Emil Fischer, who had amassed a wealth of chemical details supporting the peptide-bond model. For completeness, the proposal that proteins contained amide linkages was made as early as 1882 by the French chemist E. Grimaux.
Despite these data and later evidence that proteolytically digested proteins yielded only oligopeptides, the idea that proteins were linear, unbranched polymers of amino acids was not accepted immediately. Some well-respected scientists such as William Astbury doubted that covalent bonds were strong enough to hold such long molecules together; they feared that thermal agitation would shake such long molecules asunder. Hermann Staudinger faced similar prejudices in the 1920s when he argued that rubber was composed of macromolecules.
Thus, several alternative hypotheses arose. The colloidal protein hypothesis stated that proteins were colloidal assemblies of smaller molecules. This hypothesis was disproved by ultracentrifugation measurements by Theodor Svedberg in the 1920s, which showed that proteins had a well-defined, reproducible molecular weight, and by electrophoretic measurements by Arne Tiselius, which indicated that proteins were single molecules. A second hypothesis, the cyclol hypothesis advanced by Dorothy Wrinch, proposed that the linear polypeptide underwent a chemical cyclol rearrangement, $\mathrm{C{=}O + HN \rightarrow C(OH){-}N}$, that crosslinked its backbone amide groups, forming a two-dimensional fabric. Other primary structures of proteins were proposed by various researchers, such as the diketopiperazine model of Emil Abderhalden and the pyrrol/piperidine model of Troensegaard in 1942. Although never given much credence, these alternative models were finally disproved by Frederick Sanger's sequencing of insulin and by the crystallographic structure determinations of myoglobin by John Kendrew and of hemoglobin by Max Perutz.

Primary structure in other molecules

Any linear-chain heteropolymer can be said to have a "primary structure" by analogy to the usage of the term for proteins, but this usage is rare compared to the extremely common usage in reference to proteins. In RNA, which also has extensive secondary structure, the linear chain of bases is generally just referred to as the "sequence" as it is in DNA. Other biological polymers such as polysaccharides can also be considered to have a primary structure, although the usage is not standard.

Relation to secondary and tertiary structure

The primary structure of a biological polymer largely determines its three-dimensional shape. Protein sequence can be used to predict local features, such as segments of secondary structure or transmembrane regions. However, the complexity of protein folding long made it impractical to predict the tertiary structure of a protein from its sequence alone; deep-learning methods such as AlphaFold have since greatly improved the accuracy of such predictions. Knowing the structure of a homologous protein allows highly accurate prediction of the tertiary structure by homology modeling. If the full-length protein sequence is available, it is possible to estimate its general biophysical properties, such as its isoelectric point.
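As an example of estimating a property from sequence alone, the isoelectric point can be approximated as the pH at which the summed Henderson-Hasselbalch charges of the ionizable groups cancel. The pKa values below are one illustrative set and are an assumption; published scales differ, and so do the resulting estimates. The sketch also treats every cysteine as a free thiol, although disulfide-bonded cysteines do not ionize. Function names are hypothetical.

```python
# Approximate pKa values (illustrative assumption; published sets differ).
POSITIVE_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a one-letter (uppercase) sequence at the given pH."""
    counts = {aa: sequence.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    # Fraction protonated (charged) for basic groups; fraction deprotonated for acidic ones.
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; valid because net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("G"), 2))  # prints 6.1, the mean of the two terminal pKa values
```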
Sequence families are often determined by sequence clustering, and structural genomics projects aim to produce a set of representative structures to cover the sequence space of possible non-redundant sequences.
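Sequence clustering is often done greedily: sequences are visited from longest to shortest, and each one either joins the first existing representative it resembles closely enough or becomes a new representative (the approach used by tools such as CD-HIT). The sketch below follows that idea, but as a loudly flagged assumption it uses Python's `difflib.SequenceMatcher` as a crude stand-in for a real pairwise alignment. Its "identity" score is matched residues divided by the length of the shorter sequence. The threshold, function names and example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough identity: residues in difflib's matching blocks / length of the shorter sequence.
    Not a biological alignment score; autojunk is disabled because 20-letter
    protein alphabets would otherwise trigger difflib's junk heuristic."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(sequences: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Map each representative name to the names of the sequences in its cluster."""
    clusters: dict[str, list[str]] = {}
    # Longest sequences first, so representatives tend to be the most complete.
    for name in sorted(sequences, key=lambda n: len(sequences[n]), reverse=True):
        for rep in clusters:
            if identity(sequences[rep], sequences[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no representative was similar enough
            clusters[name] = [name]
    return clusters


examples = {
    "seq1": "MKTAYIAKQRQISFVKSHFSRQ",
    "seq1_variant": "MKTAYIAKQRQISFVKSHFSRA",
    "unrelated": "GGGGWWWWPPPP",
}
print(greedy_cluster(examples))
```

The result depends on visiting order and on the threshold, which is why production tools pair this greedy scheme with proper alignment and carefully chosen identity cut-offs.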
