Statistical potential

In protein structure prediction, a statistical potential or knowledge-based potential is an energy function derived from an analysis of known protein structures in the Protein Data Bank.
Many methods exist to obtain such potentials; two notable ones are the quasi-chemical approximation and the potential of mean force. Although the resulting energies are often treated as approximations of the free energy, this physical interpretation is incorrect. Nonetheless, they have been applied with limited success in many cases, because they frequently correlate with actual free energy differences.

Assigning an energy

Possible features to which an energy can be assigned include torsion angles, solvent exposure and hydrogen bond geometry. The classic application of such potentials, however, is to pairwise amino acid contacts or distances. For pairwise amino acid contacts, a statistical potential is formulated as an interaction matrix that assigns a weight or energy value to each possible pair of standard amino acids. The energy of a particular structural model is then the sum of the energies of all pairwise contacts in the structure. The energies are determined from statistics on amino acid contacts in a database of known protein structures.
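
The contact-based scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-letter alphabet, the energy values in CONTACT_ENERGY, the 8.0 distance cutoff and the minimum sequence separation are all invented for the example and are not taken from any published potential.

```python
import math
from itertools import combinations

# Hypothetical contact energies for a toy alphabet (V, S, D).
# A real interaction matrix would cover all 20 standard amino acids.
CONTACT_ENERGY = {
    ("D", "D"): 0.4, ("D", "S"): -0.2, ("D", "V"): 0.5,
    ("S", "S"): -0.1, ("S", "V"): 0.2, ("V", "V"): -0.9,
}

def pair_energy(a, b):
    """Look up the symmetric matrix entry for residue types a and b."""
    return CONTACT_ENERGY[tuple(sorted((a, b)))]

def contact_energy(sequence, coords, cutoff=8.0, min_separation=2):
    """Sum the matrix entries over all residue pairs that are in contact.

    Two residues count as being in contact when their representative
    points lie within `cutoff` of each other (an assumed definition).
    Pairs closer than `min_separation` along the chain are skipped.
    """
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < min_separation:
            continue
        if math.dist(coords[i], coords[j]) <= cutoff:
            total += pair_energy(sequence[i], sequence[j])
    return total

# Toy model: four residues with made-up coordinates.
sequence = "VSDV"
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 5.0, 0.0)]
energy = contact_energy(sequence, coords)
```

In this toy model the contacts V-D, V-V and S-V are counted, so the total is the sum of those three matrix entries. Ranking several candidate models amounts to calling contact_energy on each and comparing the totals.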

Sippl's potential of mean force

Overview

Many textbooks present the potentials of mean force (PMFs) proposed by Sippl as a simple consequence of the Boltzmann distribution, as applied to pairwise distances between amino acids. This is incorrect, but it is a useful starting point for introducing how the potential is constructed in practice.
The Boltzmann distribution, applied to a specific pair of amino acids,
is given by:
$$P(r) = \frac{1}{Z} e^{-\frac{F(r)}{kT}}$$
where $r$ is the distance, $k$ is the Boltzmann constant, $T$ is
the temperature and $Z$ is the partition function, with
$$Z = \int e^{-\frac{F(r)}{kT}} \, dr$$
The quantity $F(r)$ is the free energy assigned to the pairwise system.
Simple rearrangement results in the inverse Boltzmann formula,
which expresses the free energy $F(r)$ as a function of $P(r)$:
$$F(r) = -kT \ln P(r) - kT \ln Z$$
To construct a PMF, one then introduces a so-called reference
state with a corresponding distribution $Q_R$ and partition function
$Z_R$, and calculates the following free energy difference:
$$\Delta F(r) = -kT \ln \frac{P(r)}{Q_R(r)} - kT \ln \frac{Z}{Z_R}$$
The reference state typically results from a hypothetical
system in which the specific interactions between the amino acids
are absent. The second term, involving $Z$ and $Z_R$,
can be ignored, as it is a constant.
In practice, $P(r)$ is estimated from the database of known protein
structures, while $Q_R(r)$ typically results from calculations
or simulations. For example, $P(r)$ could be the conditional probability
of finding particular atoms (for instance the Cβ atoms) of a valine and a serine at a given
distance $r$ from each other, giving rise to the free energy difference
$\Delta F$. The total free energy difference of a protein,
$\Delta F_{\textrm{T}}$, is then claimed to be the sum
of all the pairwise free energies:
$$\Delta F_{\textrm{T}} = \sum_{i<j} \Delta F(r_{ij} \mid a_i, a_j) = -kT \sum_{i<j} \ln \frac{P(r_{ij} \mid a_i, a_j)}{Q_R(r_{ij} \mid a_i, a_j)}$$
where the sum runs over all amino acid pairs $a_i, a_j$ (with $i<j$)
and $r_{ij}$ is their corresponding distance. In many studies $Q_R$ does not depend on the amino
acid sequence.
Intuitively, a low value for the total free energy difference $\Delta F_{\textrm{T}}$ indicates
that the set of distances in a structure is more likely in proteins than
in the reference state. However, the physical meaning of these PMFs has
been widely disputed since their introduction. The main issues are the interpretation of this "potential" as a true, physically valid potential of mean force, the nature of the reference state and its optimal formulation, and the validity of generalizations beyond pairwise distances.
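
The practical recipe above (estimate the observed distribution from a database, pick a reference distribution, take the negative log ratio) can be sketched as follows. This is a minimal illustration under stated assumptions: the histogram counts are invented, the uniform reference distribution is just one possible choice, energies are reported in units of kT, and the pseudocount is an added convenience that keeps empty bins from producing a logarithm of zero.

```python
import math

def normalize(counts, pseudocount=1.0):
    """Turn histogram counts into probabilities, with optional smoothing."""
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]

def pmf_from_counts(observed_counts, reference_counts, kt=1.0, pseudocount=1.0):
    """Return Delta F per distance bin as -kT * ln(P / Q_R).

    The constant term involving the partition functions is dropped,
    as in the text above.
    """
    p = normalize(observed_counts, pseudocount)
    q = normalize(reference_counts, pseudocount)
    return [-kt * math.log(p_bin / q_bin) for p_bin, q_bin in zip(p, q)]

# Invented counts for one residue pair type over four distance bins.
observed = [10, 40, 30, 20]    # counts from a (hypothetical) structure database
reference = [25, 25, 25, 25]   # counts from a (hypothetical) reference state
delta_f = pmf_from_counts(observed, reference, pseudocount=0.0)
```

Bins that are over-represented relative to the reference state (here the second and third) receive negative values, and under-represented bins receive positive values. A total score for a structure would sum such values over all residue pairs, using a separate table for each pair of amino acid types.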

Justification

Analogy with liquid systems

The first, qualitative justification of PMFs is due to Sippl, and is
based on an analogy with the statistical physics of liquids.
For liquids,
the potential of mean force is related to the radial distribution function $g(r)$, which is given by:
$$g(r) = \frac{P(r)}{Q_R(r)}$$
where $P(r)$ and $Q_R(r)$ are the respective probabilities of
finding two particles at a distance $r$ from each other in the liquid
and in the reference state. For liquids, the reference state
is clearly defined; it corresponds to the ideal gas, consisting of
non-interacting particles. The two-particle potential of mean force
$W(r)$ is related to $g(r)$ by:
$$W(r) = -kT \ln g(r) = -kT \ln \frac{P(r)}{Q_R(r)}$$
According to the reversible work theorem, the two-particle
potential of mean force $W(r)$ is the reversible work required to
bring two particles in the liquid from infinite separation to a distance
$r$ from each other.
Sippl justified the use of PMFs - a few years after he introduced
them for use in protein structure prediction - by
appealing to the analogy with the reversible work theorem for liquids. For liquids, $g(r)$ can be experimentally measured
using small-angle X-ray scattering; for proteins, $P(r)$ is obtained
from the set of known protein structures, as explained in the previous
section. However, as Ben-Naim writes in a publication on the subject:

the quantities, referred to as 'statistical potentials,' 'structure
based potentials,' or 'pair potentials of mean force', as derived from
the protein data bank, are neither 'potentials' nor 'potentials of
mean force,' in the ordinary sense as used in the literature on
liquids and solutions.

Another issue is that the analogy does not specify
a suitable reference state for proteins.

Analogy with likelihood

Baker and co-workers justified PMFs from a
Bayesian point of view and used these insights in the construction of
the coarse-grained ROSETTA energy function. According
to Bayesian probability calculus, the conditional probability $P(X \mid A)$ of a structure $X$, given the amino acid sequence $A$, can be
written as:
$$P(X \mid A) = \frac{P(A \mid X) P(X)}{P(A)} \propto P(A \mid X) P(X)$$
$P(X \mid A)$ is proportional to the product of
the likelihood $P(A \mid X)$ times the prior
$P(X)$. By assuming that the likelihood can be approximated
as a product of pairwise probabilities, and applying Bayes' theorem, the
likelihood can be written as:
$$P(A \mid X) \approx \prod_{i<j} P(a_i, a_j \mid r_{ij}) \propto \prod_{i<j} \frac{P(r_{ij} \mid a_i, a_j)}{P(r_{ij})}$$
where the product runs over all amino acid pairs $a_i, a_j$ (with $i<j$), and $r_{ij}$ is the distance between amino acids $i$ and $j$.
The negative of the logarithm of this expression
has the same functional form as the classic
pairwise distance PMFs, with the denominator playing the role of the
reference state. This explanation has two shortcomings: it is purely qualitative,
and it relies on the unfounded assumption that the likelihood can be expressed
as a product of pairwise probabilities.
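
The Bayes step behind the pairwise expression can be checked numerically on a small table. The sketch below is only an illustration: the counts in COUNTS, the two pair types and the two distance bins are invented, and the function names are hypothetical. It computes the ratio of the distance probability given the pair type to the marginal distance probability, which by Bayes' theorem equals the ratio of the pair-type probability given the distance to the marginal pair-type probability.

```python
import math
from collections import defaultdict

# Invented counts of (pair type, distance bin) observations.
COUNTS = {
    ("VS", 0): 10, ("VS", 1): 30,
    ("VV", 0): 40, ("VV", 1): 20,
}

def probabilities(counts):
    """Return the joint distribution and both marginals."""
    total = sum(counts.values())
    joint = {key: n / total for key, n in counts.items()}
    p_pair = defaultdict(float)
    p_bin = defaultdict(float)
    for (pair, dist_bin), p in joint.items():
        p_pair[pair] += p
        p_bin[dist_bin] += p
    return joint, p_pair, p_bin

def neg_log_ratio(pair, dist_bin, counts=COUNTS):
    """-ln [ P(r | a_i, a_j) / P(r) ] for one pair type and distance bin."""
    joint, p_pair, p_bin = probabilities(counts)
    p_r_given_pair = joint[(pair, dist_bin)] / p_pair[pair]
    return -math.log(p_r_given_pair / p_bin[dist_bin])

def pairwise_score(observations, counts=COUNTS):
    """Sum of the negative log ratios over a list of (pair, bin) observations."""
    return sum(neg_log_ratio(pair, b, counts) for pair, b in observations)

score = pairwise_score([("VS", 0), ("VV", 0)])
```

Under the pairwise independence assumption criticized above, such a sum is the negative log-likelihood up to an additive constant, which is why it takes the same form as a sum of pairwise PMF terms with the marginal distance distribution as reference state.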

Applications

Statistical potentials are used as energy functions to assess ensembles of structural models produced by homology modeling or protein threading, that is, predictions of the tertiary structure adopted by a particular amino acid sequence, made by comparison with one or more homologous proteins of known structure. Many differently parameterized statistical potentials have been shown to successfully identify the native state structure from an ensemble of "decoy" or non-native structures. Statistical potentials are used not only for protein structure prediction, but also for modelling the protein folding pathway.
