Transcription factor
In molecular biology, a transcription factor (TF) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA by binding to a specific DNA sequence. The function of TFs is to regulate (turn on and off) genes so that they are expressed in the right cell, at the right time, and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization during embryonic development; and intermittent responses to signals from outside the cell, such as a hormone. There are up to 1600 TFs in the human genome.
TFs work alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the recruitment of RNA polymerase to specific genes.
A defining feature of TFs is that they contain at least one DNA-binding domain (DBD), which attaches to a specific sequence of DNA adjacent to the genes that they regulate. TFs are grouped into classes based on their DBDs. Other proteins such as coactivators, chromatin remodelers, histone acetyltransferases, histone deacetylases, kinases, and methylases are also essential to gene regulation, but lack DNA-binding domains, and therefore are not TFs.
TFs are of interest in medicine because TF mutations can cause specific diseases, and medications can be potentially targeted toward them.
Number
Transcription factors are essential for the regulation of gene expression and are, as a consequence, found in all living organisms. The number of transcription factors found within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene.
There are approximately 2800 proteins in the human genome that contain DNA-binding domains, and 1600 of these are presumed to function as transcription factors, though other studies suggest a smaller number. Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins. Furthermore, genes are often flanked by several binding sites for distinct transcription factors, and efficient expression of each of these genes requires the cooperative action of several different transcription factors. Hence, the combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development.
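The combinatorial argument can be made concrete with a rough count. The sketch below is an illustration only: it takes the figure of approximately 2000 transcription factors from the text, assumes a round number of about 20,000 human genes (an assumption, not a figure from this article), and counts unordered sets of distinct factors while ignoring every biological constraint on which factors can actually act together.

```python
import math

N_TFS = 2000       # approximate number of human TFs quoted in the text
N_GENES = 20_000   # ASSUMPTION: round gene count used only for scale


def tf_combinations(n_tfs: int, max_size: int) -> dict[int, int]:
    """Count unordered sets of k distinct TFs for k = 1..max_size."""
    return {k: math.comb(n_tfs, k) for k in range(1, max_size + 1)}


combos = tf_combinations(N_TFS, 3)

# Smallest set size whose number of combinations covers every gene.
smallest_k = min(k for k, count in combos.items() if count >= N_GENES)

for k, count in combos.items():
    print(f"sets of {k} TFs: {count:,}")
print(f"smallest set size exceeding {N_GENES:,} genes: {smallest_k}")
```

Under these assumptions, pairs of factors alone already outnumber the assumed gene count by about two orders of magnitude.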
Mechanism
Transcription factors bind to either enhancer or promoter regions of DNA adjacent to the genes that they regulate. Depending on the transcription factor, the transcription of the adjacent gene is either up- or down-regulated. Transcription factors use a variety of mechanisms for the regulation of gene expression. These mechanisms include:
- stabilize or block the binding of RNA polymerase to DNA
- catalyze the acetylation or deacetylation of histone proteins. The transcription factor can either do this directly or recruit other proteins with this catalytic activity. Many transcription factors use one or the other of two opposing mechanisms to regulate transcription:
- * histone acetyltransferase activity – acetylates histone proteins, which weakens the association of DNA with histones and makes the DNA more accessible to transcription, thereby up-regulating transcription
- * histone deacetylase activity – deacetylates histone proteins, which strengthens the association of DNA with histones and makes the DNA less accessible to transcription, thereby down-regulating transcription
- recruit coactivator or corepressor proteins to the transcription factor DNA complex
Function
Basal transcription regulation
In eukaryotes, an important class of transcription factors called general transcription factors (GTFs) is necessary for transcription to occur. Many of these GTFs do not actually bind DNA, but rather are part of the large transcription preinitiation complex that interacts with RNA polymerase directly. The most common GTFs are TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. The preinitiation complex binds to promoter regions of DNA upstream of the gene that it regulates.
Differential enhancement of transcription
Other transcription factors differentially regulate the expression of various genes by binding to enhancer regions of DNA adjacent to regulated genes. These transcription factors are critical to making sure that genes are expressed in the right cell at the right time and in the right amount, depending on the changing requirements of the organism.
Development
Many transcription factors in multicellular organisms are involved in development. Responding to stimuli, these transcription factors turn on or off the transcription of the appropriate genes, which, in turn, allows for changes in cell morphology or activities needed for cell fate determination and cellular differentiation. The Hox transcription factor family, for example, is important for proper body pattern formation in organisms as diverse as fruit flies and humans. Another example is the transcription factor encoded by the Sex-determining Region Y gene, which plays a major role in determining sex in humans.
Response to intercellular signals
Cells can communicate with each other by releasing molecules that produce signaling cascades within another receptive cell. If the signal requires upregulation or downregulation of genes in the recipient cell, often transcription factors will be downstream in the signaling cascade. Estrogen signaling is an example of a fairly short signaling cascade that involves the estrogen receptor transcription factor: Estrogen is secreted by tissues such as the ovaries and placenta, crosses the cell membrane of the recipient cell, and is bound by the estrogen receptor in the cell's cytoplasm. The estrogen receptor then goes to the cell's nucleus and binds to its DNA-binding sites, changing the transcriptional regulation of the associated genes.
Response to environment
Not only do transcription factors act downstream of signaling cascades related to biological stimuli, but they can also be downstream of signaling cascades involved in environmental stimuli. Examples include heat shock factor, which upregulates genes necessary for survival at higher temperatures; hypoxia inducible factor, which upregulates genes necessary for cell survival in low-oxygen environments; and sterol regulatory element binding protein, which helps maintain proper lipid levels in the cell.
Cell cycle control
Many transcription factors, especially some that are proto-oncogenes or tumor suppressors, help regulate the cell cycle and as such determine how large a cell will get and when it can divide into two daughter cells. One example is the Myc oncogene, which has important roles in cell growth and apoptosis.
Pathogenesis
Transcription factors can also be used to alter gene expression in a host cell to promote pathogenesis. A well-studied example is the transcription activator-like effectors (TAL effectors) secreted by Xanthomonas bacteria. When injected into plants, these proteins can enter the nucleus of the plant cell, bind plant promoter sequences, and activate transcription of plant genes that aid in bacterial infection. TAL effectors contain a central repeat region in which there is a simple relationship between the identity of two critical residues in sequential repeats and sequential DNA bases in the TAL effector's target site. This property likely makes it easier for these proteins to evolve in order to better compete with the defense mechanisms of the host cell.
Regulation
It is common in biology for important processes to have multiple layers of regulation and control. This is also true with transcription factors: Not only do transcription factors control the rates of transcription to regulate the amounts of gene products available to the cell, but transcription factors themselves are regulated. Below is a brief synopsis of some of the ways that the activity of transcription factors can be regulated:
Synthesis
Transcription factors are transcribed from a gene on a chromosome into RNA, and then the RNA is translated into protein. Any of these steps can be regulated to affect the production of a transcription factor. An implication of this is that transcription factors can regulate themselves. For example, in a negative feedback loop, the transcription factor acts as its own repressor: If the transcription factor protein binds the DNA of its own gene, it down-regulates the production of more of itself. This is one mechanism to maintain low levels of a transcription factor in a cell.
Nuclear localization
In eukaryotes, transcription factors are transcribed in the nucleus but are then translated in the cell's cytoplasm. Many proteins that are active in the nucleus contain nuclear localization signals that direct them to the nucleus. For many transcription factors, however, this relocation is a key point in their regulation. Important classes of transcription factors such as some nuclear receptors must first bind a ligand while in the cytoplasm before they can relocate to the nucleus.
Activation
Transcription factors may be activated through their signal-sensing domain by a number of mechanisms including:
- ligand binding – Not only is ligand binding able to influence where a transcription factor is located within a cell, but ligand binding can also affect whether the transcription factor is in an active state and capable of binding DNA or other cofactors.
- phosphorylation – Many transcription factors such as STAT proteins must be phosphorylated before they can bind DNA.
- interaction with other transcription factors or coregulatory proteins
Accessibility of DNA-binding site
Availability of other cofactors/transcription factors
Most transcription factors do not work alone. Many large TF families form complex homotypic or heterotypic interactions through dimerization. For gene transcription to occur, a number of transcription factors must bind to DNA regulatory sequences. This collection of transcription factors, in turn, recruits intermediary proteins such as cofactors that allow efficient recruitment of the preinitiation complex and RNA polymerase. Thus, for a single transcription factor to initiate transcription, all of these other proteins must also be present, and the transcription factor must be in a state where it can bind to them if necessary.
Cofactors are proteins that modulate the effects of transcription factors. Cofactors are interchangeable between specific gene promoters; the protein complex that occupies the promoter DNA and the amino acid sequence of the cofactor determine its spatial conformation. For example, certain steroid receptors can exchange cofactors with NF-κB, which is a switch between inflammation and cellular differentiation; thereby steroids can affect the inflammatory response and function of certain tissues.
Interaction with methylated cytosine
Transcription factors and methylated cytosines in DNA both have major roles in regulating gene expression. Methylation of CpG sites in a promoter region of a gene usually represses gene transcription, while methylation of CpGs in the body of a gene increases expression. TET enzymes play a central role in demethylation of methylated cytosines. Demethylation of CpGs in a gene promoter by TET enzyme activity increases transcription of the gene.
The DNA binding sites of 519 transcription factors were evaluated. Of these, 169 transcription factors did not have CpG dinucleotides in their binding sites, and 33 transcription factors could bind to a CpG-containing motif but did not display a preference for a binding site with either a methylated or unmethylated CpG. There were 117 transcription factors that were inhibited from binding to their binding sequence if it contained a methylated CpG site, 175 transcription factors that had enhanced binding if their binding sequence had a methylated CpG site, and 25 transcription factors that were either inhibited or had enhanced binding depending on where in the binding sequence the methylated CpG was located.
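As a quick consistency check on the survey figures quoted above, the following sketch tallies the five categories and expresses each as a share of the 519 factors evaluated. The category labels are shortened paraphrases chosen for this example, and the grouping into "methylation-sensitive" is an illustrative convenience, not a term from the survey.

```python
# Counts of transcription factors per category, as reported in the text.
methylation_response = {
    "no CpG in binding site": 169,
    "CpG present, no methylation preference": 33,
    "binding inhibited by methylated CpG": 117,
    "binding enhanced by methylated CpG": 175,
    "effect depends on CpG position": 25,
}

total = sum(methylation_response.values())

# Categories in which CpG methylation changes binding in some way
# (illustrative grouping: the last three labels above).
sensitive = sum(
    count
    for label, count in methylation_response.items()
    if label.startswith(("binding", "effect"))
)

for label, count in methylation_response.items():
    print(f"{label:40s} {count:4d}  {count / total:6.1%}")
print(f"total evaluated: {total}")
print(f"methylation-sensitive: {sensitive} ({sensitive / total:.1%})")
```

The five categories sum to the 519 factors stated in the text, and the three categories in which methylation alters binding together cover 317 of them.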
TET enzymes do not specifically bind to methylcytosine except when recruited. Multiple transcription factors important in cell differentiation and lineage specification, including NANOG, SALL4A, WT1, EBF1, PU.1, and E2A, have been shown to recruit TET enzymes to specific genomic loci to act on methylcytosine (mC) and convert it to hydroxymethylcytosine (hmC). TET-mediated conversion of mC to hmC appears to disrupt the binding of 5mC-binding proteins including MECP2 and MBD proteins, facilitating nucleosome remodeling and the binding of transcription factors, thereby activating transcription of those genes. EGR1 is an important transcription factor in memory formation and has an essential role in brain neuron epigenetic reprogramming. EGR1 recruits the TET1 protein, which initiates a pathway of DNA demethylation. EGR1, together with TET1, is employed in programming the distribution of methylation sites on brain DNA during brain development and in learning.
Structure
Transcription factors are modular in structure and contain the following domains:
- DNA-binding domain, which attaches to specific sequences of DNA adjacent to regulated genes. DNA sequences that bind transcription factors are often referred to as response elements.
- Activation domain, which contains binding sites for other proteins such as transcription coregulators. These binding sites are frequently referred to as activation functions, transactivation domains, or trans-activating domains (TADs), not to be confused with topologically associating domains, which share the abbreviation TAD.
- An optional signal-sensing domain, which senses external signals and, in response, transmits these signals to the rest of the transcription complex, resulting in up- or down-regulation of gene expression. Also, the DBD and signal-sensing domains may reside on separate proteins that associate within the transcription complex to regulate gene expression.
DNA-binding domain
Family | InterPro | Pfam | SCOP |
basic helix-loop-helix | |||
basic-leucine zipper | |||
C-terminal effector domain of the bipartite response regulators | |||
AP2/ERF/GCC box | |||
helix-turn-helix | |||
homeodomain (homeodomain proteins are transcription factors encoded by homeobox genes and play critical roles in the regulation of development) | |||
lambda repressor-like | |||
srf-like | |||
paired box | |||
winged helix | |||
zinc fingers | |||
* multi-domain Cys2His2 zinc fingers | |||
* Zn2/Cys6 | |||
* Zn2/Cys8 nuclear receptor zinc finger | |||
Response elements
The DNA sequence that a transcription factor binds to is called a transcription factor-binding site or response element.
Transcription factors interact with their binding sites using a combination of electrostatic and van der Waals forces. Due to the nature of these chemical interactions, most transcription factors bind DNA in a sequence-specific manner. However, not all bases in the transcription factor-binding site may actually interact with the transcription factor. In addition, some of these interactions may be weaker than others. Thus, transcription factors do not bind just one sequence but are capable of binding a subset of closely related sequences, each with a different strength of interaction.
For example, although the consensus binding site for the TATA-binding protein is TATAAAA, the TBP transcription factor can also bind similar sequences such as TATATAT or TATATAA.
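One simple way to picture "closely related sequences" is to count mismatches against the consensus. The sketch below is a deliberate simplification: it treats every position as equally important and scans only the strand it is given, whereas the text notes that positions contribute unequally to binding. The function names and the toy sequence are hypothetical and chosen only for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_sites(sequence: str, consensus: str, max_mismatches: int):
    """Return (position, site, mismatches) for every window of the
    sequence that differs from the consensus in at most max_mismatches
    positions. Positions are 0-based; only the given strand is scanned."""
    k = len(consensus)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        d = hamming(window, consensus)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


CONSENSUS = "TATAAAA"  # TBP consensus quoted in the text

# Mismatch counts for the consensus and the two variants from the text.
for site in ("TATAAAA", "TATATAT", "TATATAA"):
    print(site, hamming(site, CONSENSUS))

# Hypothetical toy sequence, used only to exercise the scan.
print(find_sites("CCTATATAACC", CONSENSUS, max_mismatches=1))
```

By this crude measure, TATATAA differs from the consensus at one position and TATATAT at two, so a tolerance of two mismatches would accept both variants named in the text.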
Because transcription factors can bind a set of related sequences and these sequences tend to be short, potential transcription factor binding sites can occur by chance if the DNA sequence is long enough. It is unlikely, however, that a transcription factor will bind all compatible sequences in the genome of the cell. Other constraints, such as DNA accessibility in the cell or availability of cofactors, may also help dictate where a transcription factor will actually bind. Thus, even given the genome sequence, it is still difficult to predict where a transcription factor will actually bind in a living cell.
Additional recognition specificity, however, may be obtained through the use of more than one DNA-binding domain, with the domains binding to two or more adjacent sequences of DNA.
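The effect of site length on chance occurrences can be estimated with a back-of-the-envelope calculation. The sketch below assumes a random sequence with equal base frequencies, exact matching on a single strand, and a round genome length of three billion base pairs; all three are simplifying assumptions made for this example rather than figures from the text. Two adjacent 7-bp sites are modeled as one 14-bp site, which further assumes fixed spacing and orientation.

```python
def expected_chance_sites(sequence_length: int, site_length: int) -> float:
    """Expected number of exact matches to one fixed site on one strand
    of a random sequence with equal base frequencies."""
    windows = max(sequence_length - site_length + 1, 0)
    return windows / 4 ** site_length


GENOME_LENGTH = 3_000_000_000  # ASSUMPTION: round figure for illustration

single = expected_chance_sites(GENOME_LENGTH, 7)    # one 7-bp site
paired = expected_chance_sites(GENOME_LENGTH, 14)   # two adjacent 7-bp sites

print(f"7-bp site:  {single:,.0f} expected chance matches")
print(f"14-bp site: {paired:,.1f} expected chance matches")
```

Each additional base divides the expected number of chance matches by four under this model, which is why combining two short recognition sequences sharply reduces spurious sites.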
Clinical significance
Transcription factors are of clinical significance for at least two reasons: mutations can be associated with specific diseases, and they can be targets of medications.
Disorders
Due to their important roles in development, intercellular signaling, and the cell cycle, some human diseases have been associated with mutations in transcription factors.
Many transcription factors are either tumor suppressors or oncogenes, and, thus, mutations in them or aberrant regulation of them are associated with cancer. Three groups of transcription factors are known to be important in human cancer: the NF-kappaB and AP-1 families, the STAT family, and the steroid receptors.
Below are a few of the better-studied examples:
Condition | Description | Locus |
Rett syndrome | Mutations in the MECP2 transcription factor are associated with Rett syndrome, a neurodevelopmental disorder. | Xq28 |
Diabetes | A rare form of diabetes called MODY can be caused by mutations in hepatocyte nuclear factors or insulin promoter factor-1. | multiple |
Developmental verbal dyspraxia | Mutations in the FOXP2 transcription factor are associated with developmental verbal dyspraxia, a disease in which individuals are unable to produce the finely coordinated movements required for speech. | 7q31 |
Autoimmune diseases | Mutations in the FOXP3 transcription factor cause a rare form of autoimmune disease called IPEX. | Xp11.23-q13.3 |
Li-Fraumeni syndrome | Caused by mutations in the tumor suppressor p53. | 17p13.1 |
Breast cancer | The STAT family is relevant to breast cancer. | multiple |
Multiple cancers | The HOX family are involved in a variety of cancers. | multiple |
Osteoarthritis | Mutation or reduced activity of SOX9 | |
Potential drug targets
Approximately 10% of currently prescribed drugs directly target the nuclear receptor class of transcription factors. Examples include tamoxifen and bicalutamide for the treatment of breast and prostate cancer, respectively, and various types of anti-inflammatory and anabolic steroids. In addition, transcription factors are often indirectly modulated by drugs through signaling cascades. It might be possible to directly target other less-explored transcription factors such as NF-κB with drugs. Transcription factors outside the nuclear receptor family are thought to be more difficult to target with small-molecule therapeutics, since it is not clear that they are "druggable", but progress has been made on Pax2 and the Notch pathway.
Role in evolution
Gene duplications have played a crucial role in the evolution of species. This applies particularly to transcription factors. Once a transcription factor gene exists as duplicates, mutations can accumulate in one copy without negatively affecting the regulation of downstream targets. However, changes in the DNA-binding specificities of the single-copy LEAFY transcription factor, which occurs in most land plants, have recently been elucidated. In that respect, a single-copy transcription factor can undergo a change of specificity through a promiscuous intermediate without losing function. Similar mechanisms have been proposed in the context of all alternative phylogenetic hypotheses, and for the role of transcription factors in the evolution of all species.
Analysis
There are different technologies available to analyze transcription factors. On the genomic level, DNA sequencing and database research are commonly used. The protein version of the transcription factor is detectable by using specific antibodies; the sample is detected on a western blot. By using an electrophoretic mobility shift assay, the activation profile of transcription factors can be detected. A multiplex approach for activation profiling is a TF chip system, where several different transcription factors can be detected in parallel.
The most commonly used method for identifying transcription factor binding sites is chromatin immunoprecipitation. This technique relies on chemical fixation of chromatin with formaldehyde, followed by co-precipitation of DNA and the transcription factor of interest using an antibody that specifically targets that protein. The DNA sequences can then be identified by microarray or high-throughput sequencing to determine transcription factor binding sites. If no antibody is available for the protein of interest, DamID may be a convenient alternative.
Classes
As described in more detail below, transcription factors may be classified by their mechanism of action, regulatory function, or sequence homology in their DNA-binding domains.
Mechanistic
There are two mechanistic classes of transcription factors:
- General transcription factors are involved in the formation of a preinitiation complex. The most common are abbreviated as TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. They are ubiquitous and interact with the core promoter region surrounding the transcription start site of all class II genes.
- Upstream transcription factors are proteins that bind somewhere upstream of the initiation site to stimulate or repress transcription. These are roughly synonymous with specific transcription factors, because they vary considerably depending on what recognition sequences are present in the proximity of the gene.
Functional
- I. constitutively active – present in all cells at all times – general transcription factors, Sp1, NF1, CCAAT
- II. conditionally active – requires activation
- * II.A developmental – expression is tightly controlled, but, once expressed, requires no additional activation – GATA, HNF, PIT-1, MyoD, Myf5, Hox, Winged Helix
- * II.B signal-dependent – requires external signal for activation
- ** II.B.1 extracellular ligand-dependent – nuclear receptors
- ** II.B.2 intracellular ligand-dependent – activated by small intracellular molecules – SREBP, p53, orphan nuclear receptors
- ** II.B.3 cell membrane receptor-dependent – second messenger signaling cascades resulting in the phosphorylation of the transcription factor
- *** II.B.3.a resident nuclear factors – reside in the nucleus regardless of activation state – CREB, AP-1, Mef2
- *** II.B.3.b latent cytoplasmic factors – inactive form resides in the cytoplasm, but, when activated, is translocated into the nucleus – STAT, R-SMAD, NF-κB, Notch, TUBBY, NFAT
Structural
- 1 Superclass: Basic Domains
- *1.1 Class: Leucine zipper factors
- **1.1.1 Family: AP-1 components; includes
- **1.1.2 Family: CREB
- **1.1.3 Family: C/EBP-like factors
- **1.1.4 Family: bZIP / PAR
- **1.1.5 Family: Plant G-box binding factors
- **1.1.6 Family: ZIP only
- *1.2 Class: Helix-loop-helix factors
- **1.2.1 Family: Ubiquitous factors
- **1.2.2 Family: Myogenic transcription factors
- **1.2.3 Family: Achaete-Scute
- **1.2.4 Family: Tal/Twist/Atonal/Hen
- *1.3 Class: Helix-loop-helix / leucine zipper factors
- **1.3.1 Family: Ubiquitous bHLH-ZIP factors; includes USF; SREBP
- **1.3.2 Family: Cell-cycle controlling factors; includes c-Myc
- *1.4 Class: NF-1
- **1.4.1 Family: NF-1
- *1.5 Class: RF-X
- **1.5.1 Family: RF-X
- *1.6 Class: bHSH
- 2 Superclass: Zinc-coordinating DNA-binding domains
- *2.1 Class: Cys4 zinc finger of nuclear receptor type
- **2.1.1 Family: Steroid hormone receptors
- **2.1.2 Family: Thyroid hormone receptor-like factors
- *2.2 Class: diverse Cys4 zinc fingers
- **2.2.1 Family: GATA-Factors
- *2.3 Class: Cys2His2 zinc finger domain
- **2.3.1 Family: Ubiquitous factors, includes TFIIIA, Sp1
- **2.3.2 Family: Developmental / cell cycle regulators; includes Krüppel
- **2.3.4 Family: Large factors with NF-6B-like binding properties
- *2.4 Class: Cys6 cysteine-zinc cluster
- *2.5 Class: Zinc fingers of alternating composition
- 3 Superclass: Helix-turn-helix
- *3.1 Class: Homeo domain
- **3.1.1 Family: Homeo domain only; includes Ubx
- **3.1.2 Family: POU domain factors; includes Oct
- **3.1.3 Family: Homeo domain with LIM region
- **3.1.4 Family: homeo domain plus zinc finger motifs
- *3.2 Class: Paired box
- **3.2.1 Family: Paired plus homeo domain
- **3.2.2 Family: Paired domain only
- *3.3 Class: Fork head / winged helix
- **3.3.1 Family: Developmental regulators; includes forkhead
- **3.3.2 Family: Tissue-specific regulators
- **3.3.3 Family: Cell-cycle controlling factors
- **3.3.0 Family: Other regulators
- *3.4 Class: Heat Shock Factors
- **3.4.1 Family: HSF
- *3.5 Class: Tryptophan clusters
- **3.5.1 Family: Myb
- **3.5.2 Family: Ets-type
- **3.5.3 Family: Interferon regulatory factors
- *3.6 Class: TEA domain
- **3.6.1 Family: TEA
- 4 Superclass: beta-Scaffold Factors with Minor Groove Contacts
- *4.1 Class: RHR
- **4.1.1 Family: Rel/ankyrin; NF-kappaB
- **4.1.2 Family: ankyrin only
- **4.1.3 Family: NFAT
- *4.2 Class: STAT
- **4.2.1 Family: STAT
- *4.3 Class: p53
- **4.3.1 Family: p53
- *4.4 Class: MADS box
- **4.4.1 Family: Regulators of differentiation; includes
- **4.4.2 Family: Responders to external signals, SRF
- **4.4.3 Family: Metabolic regulators
- *4.5 Class: beta-Barrel alpha-helix transcription factors
- *4.6 Class: TATA binding proteins
- **4.6.1 Family: TBP
- *4.7 Class: HMG-box
- **4.7.1 Family: SOX genes, SRY
- **4.7.2 Family: TCF-1
- **4.7.3 Family: HMG2-related, SSRP1
- **4.7.4 Family: UBF
- **4.7.5 Family: MATA
- *4.8 Class: Heteromeric CCAAT factors
- **4.8.1 Family: Heteromeric CCAAT factors
- *4.9 Class: Grainyhead
- **4.9.1 Family: Grainyhead
- *4.10 Class: Cold-shock domain factors
- **4.10.1 Family: csd
- *4.11 Class: Runt
- **4.11.1 Family: Runt
- 0 Superclass: Other Transcription Factors
- *0.1 Class: Copper fist proteins
- *0.2 Class: HMGI
- **0.2.1 Family: HMGI
- *0.3 Class: Pocket domain
- *0.4 Class: E1A-like factors
- *0.5 Class: AP2/EREBP-related factors
- **0.5.1 Family: AP2
- **0.5.2 Family: EREBP
- **0.5.3 Superfamily: AP2/B3
- ***0.5.3.1 Family: ARF
- ***0.5.3.2 Family: ABI
- ***0.5.3.3 Family: RAV