DNA binding site

DNA binding sites are a type of binding site found in DNA where other molecules may bind. They are distinct from other binding sites in that they are part of a DNA sequence and are bound by DNA-binding proteins. DNA binding sites are often associated with specialized proteins known as transcription factors, and are thus linked to transcriptional regulation. The sum of DNA binding sites of a specific transcription factor is referred to as its cistrome. DNA binding sites also encompass the targets of other proteins, such as restriction enzymes, site-specific recombinases and methyltransferases.
DNA binding sites can thus be defined as short DNA sequences that are specifically bound by one or more DNA-binding proteins or protein complexes. It has been reported that some binding sites have the potential to undergo fast evolutionary change.

Types of DNA binding sites

DNA binding sites can be categorized according to their biological function. Thus, we can distinguish between transcription factor-binding sites, restriction sites and recombination sites. Some authors have proposed that binding sites could also be classified according to their most convenient mode of representation. On the one hand, restriction sites can generally be represented by consensus sequences, because restriction enzymes target mostly identical sequences and restriction efficiency decreases abruptly for less similar sequences. On the other hand, the DNA binding sites for a given transcription factor usually all differ, and the transcription factor binds them with varying degrees of affinity. This makes it difficult to represent transcription factor binding sites accurately using consensus sequences, so they are typically represented using position-specific frequency matrices (PSFM), which are often depicted graphically as sequence logos. This distinction, however, is partly arbitrary. Restriction enzymes, like transcription factors, show a graded, though steep, range of affinities for different sites and are thus also best represented by PSFM. Likewise, site-specific recombinases show a varied range of affinities for different target sites.
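
As an illustration of the consensus representation described above, the following minimal Python sketch converts a consensus sequence written with IUPAC degenerate codes into a regular expression and scans a sequence for matches. The function names and the test sequence are hypothetical. GAATTC is the EcoRI recognition sequence, while GTYRAC is used here only as an example of a degenerate consensus.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def find_sites(consensus, sequence):
    """Return the 0-based start positions of all matches, including overlapping ones."""
    pattern = consensus_to_regex(consensus)
    width = len(consensus)
    seq = sequence.upper()
    return [
        i
        for i in range(len(seq) - width + 1)
        if pattern.fullmatch(seq, i, i + width)
    ]


# Hypothetical test sequence containing one exact and one degenerate site.
sequence = "TTGAATTCAAGTCGACGG"
print(find_sites("GAATTC", sequence))
print(find_sites("GTYRAC", sequence))
```

A consensus match is all-or-nothing: a site that differs at a single position is simply not reported, which is why this representation suits proteins whose activity drops abruptly on non-identical sequences and fits transcription factors poorly.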

History and main experimental techniques

The existence of something akin to DNA binding sites was suspected from experiments on the biology of the bacteriophage lambda and the regulation of the Escherichia coli lac operon. DNA binding sites were finally confirmed in both systems with the advent of DNA sequencing techniques. Since then, DNA binding sites for many transcription factors, restriction enzymes and site-specific recombinases have been discovered using a wide variety of experimental methods. Historically, the experimental techniques of choice for discovering and analyzing DNA binding sites have been the DNase footprinting assay and the electrophoretic mobility shift assay (EMSA). However, the development of DNA microarrays and fast sequencing techniques has led to new, massively parallel methods for in vivo identification of binding sites, such as ChIP-chip and ChIP-Seq. The biophysical method microscale thermophoresis is used to quantify the binding affinity of proteins and other molecules for specific DNA binding sites.

Databases

Due to the diverse nature of the experimental techniques used to determine binding sites and to the patchy coverage of most organisms and transcription factors, there is no central database for DNA binding sites. Even though NCBI provides for DNA binding site annotation in its reference sequences, most submissions omit this information. Moreover, due to the limited success of bioinformatics in producing efficient DNA binding site prediction tools, there has been no systematic effort to annotate these features computationally in sequenced genomes.
There are, however, several private and public databases devoted to compiling experimentally reported, and sometimes computationally predicted, binding sites for different transcription factors in different organisms. Below is a non-exhaustive table of available databases:

Name	Organisms	Source	Access
PlantRegMap	165 plant species	Expert curation and projection	Public
JASPAR	Vertebrates, Plants, Fungi, Flies, and Worms	Expert curation with literature support	Public
CIS-BP	All Eukaryotes	Experimentally derived motifs and predictions	Public
CollecTF	Prokaryotes	Literature curation	Public
RegPrecise	Prokaryotes	Expert curation	Public
RegTransBase	Prokaryotes	Expert/literature curation	Public
RegulonDB	Escherichia coli	Expert curation	Public
PRODORIC	Prokaryotes	Expert curation	Public
TRANSFAC	Mammals	Expert/literature curation	Public/Private
TRED	Human, Mouse, Rat	Computer predictions, manual curation	Public
DBSD	Drosophila species	Literature/Expert curation	Public
HOCOMOCO	Human, Mouse	Literature/Expert curation	Public
MethMotif	Human, Mouse	Expert curation	Public

Representation of DNA binding sites

A collection of DNA binding sites, typically referred to as a DNA binding motif, can be represented by a consensus sequence. This representation has the advantage of being compact, but it disregards a substantial amount of information. A more accurate way of representing binding sites is through position-specific frequency matrices (PSFM). These matrices give the frequency of each base at each position of the DNA binding motif. PSFM are usually built on the implicit assumption of positional independence, although this assumption has been disputed for some DNA binding sites. Frequency information in a PSFM can be formally interpreted within the framework of information theory, leading to its graphical representation as a sequence logo.

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
A	1	0	1	5	32	5	35	23	34	14	43	13	34	4	52	3
C	50	1	0	1	5	6	0	4	4	13	3	8	17	51	2	0
G	0	0	54	15	5	5	12	2	7	1	1	3	1	0	1	52
T	5	55	1	35	14	40	9	27	11	28	9	32	4	1	1	1
Sum	56	56	56	56	56	56	56	56	56	56	56	56	56	56	56	56

PSFM for the transcriptional repressor LexA, as derived from 56 LexA-binding sites stored in PRODORIC. Relative frequencies are obtained by dividing the count in each cell by the column total (56).
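
The count matrix above can be turned into relative frequencies, a per-position information content and a consensus sequence with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are hypothetical, and the information content assumes a uniform background (2 bits maximum per position) with no small-sample correction.

```python
import math

# Counts copied from the LexA table above (56 sites, 16 positions).
COUNTS = {
    "A": [1, 0, 1, 5, 32, 5, 35, 23, 34, 14, 43, 13, 34, 4, 52, 3],
    "C": [50, 1, 0, 1, 5, 6, 0, 4, 4, 13, 3, 8, 17, 51, 2, 0],
    "G": [0, 0, 54, 15, 5, 5, 12, 2, 7, 1, 1, 3, 1, 0, 1, 52],
    "T": [5, 55, 1, 35, 14, 40, 9, 27, 11, 28, 9, 32, 4, 1, 1, 1],
}
BASES = "ACGT"


def frequencies(counts):
    """Return one {base: relative frequency} dictionary per motif position."""
    width = len(counts["A"])
    columns = []
    for j in range(width):
        total = sum(counts[b][j] for b in BASES)
        columns.append({b: counts[b][j] / total for b in BASES})
    return columns


def information_content(column):
    """Information content in bits: 2 minus the Shannon entropy of the column.

    Assumes a uniform background and applies no small-sample correction.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


def consensus(columns):
    """Most frequent base at each position (ties resolved in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in columns)


psfm = frequencies(COUNTS)
bits = [information_content(col) for col in psfm]
print(consensus(psfm))
for position, value in enumerate(bits, start=1):
    print(position, round(value, 2))
```

The per-position information values are what determine the stack heights in a sequence logo, so the strongly conserved outer positions of the motif stand out against the more variable central positions.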

Computational search and discovery of binding sites

In bioinformatics, one can distinguish between two separate problems regarding DNA binding sites: searching for additional members of a known DNA binding motif, and discovering novel DNA binding motifs in collections of functionally related sequences. Many different methods have been proposed to search for binding sites. Most of them rely on the principles of information theory and are available through web servers, while other authors have resorted to machine learning methods, such as artificial neural networks. A plethora of algorithms is also available for sequence motif discovery. These methods rely on the hypothesis that a set of sequences share a binding motif for functional reasons. Binding motif discovery methods can be divided roughly into enumerative, deterministic and stochastic. MEME and Consensus are classical examples of deterministic optimization, while the Gibbs sampler is the conventional implementation of a purely stochastic method for DNA binding motif discovery. Another instance of this class of methods is SeSiMCMC, which focuses on weak transcription factor binding sites with symmetry. While enumerative methods often resort to a regular expression representation of binding sites, PSFM and their formal treatment under information theory are the representation of choice for both deterministic and stochastic methods. Hybrid methods, such as ChIPMunk, which combines greedy optimization with subsampling, also use PSFM. Recent advances in sequencing have led to the introduction of comparative genomics approaches to DNA binding motif discovery, as exemplified by PhyloGibbs.
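
The first problem, searching for new members of a known motif, can be sketched as a sliding-window scan with a log-odds scoring matrix derived from a PSFM. The Python sketch below reuses the LexA counts from the table in the previous section. The pseudocount, the uniform background, the score threshold and the test sequence are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
import math

# LexA counts from the PSFM table in the previous section.
COUNTS = {
    "A": [1, 0, 1, 5, 32, 5, 35, 23, 34, 14, 43, 13, 34, 4, 52, 3],
    "C": [50, 1, 0, 1, 5, 6, 0, 4, 4, 13, 3, 8, 17, 51, 2, 0],
    "G": [0, 0, 54, 15, 5, 5, 12, 2, 7, 1, 1, 3, 1, 0, 1, 52],
    "T": [5, 55, 1, 35, 14, 40, 9, 27, 11, 28, 9, 32, 4, 1, 1, 1],
}
BASES = "ACGT"
BACKGROUND = 0.25   # assumed uniform background frequency
PSEUDOCOUNT = 1.0   # assumed value; avoids log2(0) for unobserved bases


def log_odds_matrix(counts, pseudocount=PSEUDOCOUNT):
    """Convert counts to log2(frequency / background) scores per position."""
    width = len(counts["A"])
    matrix = []
    for j in range(width):
        total = sum(counts[b][j] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((counts[b][j] + pseudocount) / total) / BACKGROUND)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Score every window of the forward strand; keep those at or above threshold."""
    seq = sequence.upper()
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(matrix[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


matrix = log_odds_matrix(COUNTS)
# Hypothetical sequence with a consensus-like site embedded at position 4.
test_sequence = "GGGGCTGTATATATATACAGCCCC"
for start, site, score in scan(test_sequence, matrix, threshold=10.0):
    print(start, site, round(score, 2))
```

Because column scores are simply added, this scan inherits the positional independence assumption of the PSFM. The threshold trades sensitivity against false positives and would normally be calibrated against the score distribution of known sites.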
More complex methods for binding site search and motif discovery take into account base stacking and other interactions between DNA bases, but because of the small sample sizes typically available for binding sites in DNA, their potential has not yet been fully realized. An example of such a tool is the
