Chromosome conformation capture


Chromosome conformation capture (3C) techniques are a set of molecular biology methods used to analyze the spatial organization of chromatin in a cell. These methods quantify the number of interactions between genomic loci that are nearby in 3-D space but may be separated by many nucleotides in the linear genome. Such interactions may result from biological functions, such as promoter-enhancer interactions, or from random polymer looping, where undirected physical motion of chromatin causes loci to collide. Interaction frequencies may be analyzed directly, or they may be converted to distances and used to reconstruct 3-D structures.
The chief difference between 3C-based methods is their scope. For example, when PCR is used to detect interactions in a 3C experiment, the interactions between two specific fragments are quantified. In contrast, Hi-C quantifies interactions between all possible pairs of fragments simultaneously. Deep sequencing of material produced by 3C also produces genome-wide interaction maps.

History

Historically, microscopy, which can be dated back to 1590, was the primary method of investigating nuclear organization.

Experimental methods

All 3C methods start with a similar set of steps, performed on a sample of cells.
First, the cell genomes are cross-linked with formaldehyde, which introduces bonds that "freeze" interactions between genomic loci. Treatment of cells with 1-3% formaldehyde for 10-30 min at room temperature is most common. However, the cross-linking conditions must be standardized to prevent excessive protein-DNA cross-linking, which may reduce the efficiency of restriction digestion in the subsequent step. The genome is then cut into fragments with a restriction endonuclease. The size of the restriction fragments determines the resolution of interaction mapping. Restriction enzymes (REs) that recognize 6 bp sequences, such as EcoRI or HindIII, are commonly used for this purpose, as they cut the genome about once every 4,000 bp on average, giving roughly 1 million fragments in the human genome. For more precise interaction mapping, an RE that recognizes a 4 bp sequence may also be used. The next step is proximity-based ligation. This takes place at low DNA concentrations, or within intact, permeabilized nuclei, in the presence of T4 DNA ligase, such that ligation between cross-linked interacting fragments is favored over ligation between fragments that are not cross-linked. Subsequently, interacting loci are quantified by amplifying the ligated junctions using PCR methods.
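The cut frequencies quoted above follow from simple arithmetic: if the four bases are assumed to be equally frequent and independent, a recognition sequence of n bp is expected about once every 4^n bp. The sketch below works this out. The function names and the genome size constant are illustrative assumptions, and real genomes deviate from the uniform-base model.

```python
# Assumption: approximate size of the human genome in base pairs.
HUMAN_GENOME_BP = 3_100_000_000


def expected_cut_spacing(site_length: int) -> int:
    """Average distance between cut sites for a recognition sequence of
    `site_length` bp, assuming equally frequent, independent bases."""
    return 4 ** site_length


def expected_fragment_count(genome_size: int, site_length: int) -> float:
    """Approximate number of restriction fragments produced from a genome."""
    return genome_size / expected_cut_spacing(site_length)


for site_length in (6, 4):
    spacing = expected_cut_spacing(site_length)
    fragments = expected_fragment_count(HUMAN_GENOME_BP, site_length)
    print(f"{site_length} bp cutter: one cut per {spacing} bp, "
          f"about {fragments:,.0f} fragments")
```

Under these assumptions a 6 bp cutter gives an average spacing of 4,096 bp and a 4 bp cutter gives 256 bp, which is why 4 bp cutters allow finer interaction maps.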

Original methods

3C (one-vs-one)

The chromosome conformation capture (3C) experiment quantifies interactions between a single pair of genomic loci. For example, 3C can be used to test a candidate promoter-enhancer interaction. Ligated fragments are detected using PCR with known primers, which is why this technique requires prior knowledge of the interacting regions.

4C (one-vs-all)

Chromosome conformation capture-on-chip (4C) captures interactions between one locus and all other genomic loci. It involves a second ligation step to create self-circularized DNA fragments, which are used to perform inverse PCR. Inverse PCR allows the known sequence to be used to amplify the unknown sequence ligated to it. In contrast to 3C and 5C, the 4C technique does not require prior knowledge of both interacting chromosomal regions. Results obtained using 4C are highly reproducible, and most of the detected interactions are between regions proximal to one another. On a single microarray, approximately a million interactions can be analyzed.

5C (many-vs-many)

Chromosome conformation capture carbon copy (5C) detects interactions between all restriction fragments within a given region, typically no larger than a megabase. This is done by ligating universal primers to all fragments. However, 5C has relatively low coverage. The 5C technique overcomes the junctional problems at the intramolecular ligation step and is useful for mapping complex interactions among specific loci of interest. The approach is unsuitable for genome-wide interaction mapping, since that would require millions of 5C primers.

Hi-C (all-vs-all)

Hi-C uses high-throughput paired-end sequencing, which retrieves a short sequence from each end of each ligated fragment. As such, for a given ligated fragment, the two sequences obtained should represent two different restriction fragments that were ligated together in the proximity-based ligation step. The two sequences are individually aligned to the genome, thus identifying the fragments involved in that ligation event. Hence, all possible pairwise interactions between fragments are tested.
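As a rough illustration of how aligned read pairs become an interaction map, the sketch below counts pairs into a symmetric contact matrix. It rests on several assumptions: each read pair is given as two hypothetical 0-based alignment positions on a single chromosome, and positions are grouped into fixed-size bins rather than into restriction fragments. Real pipelines also handle multiple chromosomes, mapping quality, and duplicate removal.

```python
def build_contact_matrix(read_pairs, chrom_length, bin_size):
    """Count read pairs into a symmetric bin-by-bin contact matrix.

    read_pairs   -- iterable of (pos_a, pos_b) alignment positions (assumed input format)
    chrom_length -- chromosome length in bp
    bin_size     -- bin width in bp (an assumed analysis resolution)
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i = pos_a // bin_size
        j = pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # Keep the matrix symmetric: a contact between i and j is unordered.
            matrix[j][i] += 1
    return matrix


# Toy data: two pairs link the first and last kilobase, one pair is local.
pairs = [(100, 2500), (150, 2600), (900, 950)]
contacts = build_contact_matrix(pairs, chrom_length=3000, bin_size=1000)
for row in contacts:
    print(row)
```

The off-diagonal entries of the resulting matrix are the interaction frequencies that later analysis steps work with.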

Sequence capture-based methods

A number of methods use oligonucleotide capture to enrich 3C and Hi-C libraries for specific loci of interest. These methods include Capture-C, NG Capture-C, Capture-3C, and Capture Hi-C. They are able to achieve higher resolution and sensitivity than 4C-based methods.

Single-cell methods

Single-cell adaptations of these methods, such as single-cell Hi-C, can be used to investigate the interactions occurring in individual cells.

Immunoprecipitation-based methods

ChIP-loop

ChIP-loop combines 3C with ChIP-seq to detect interactions between two loci of interest mediated by a protein of interest. ChIP-loop may be useful for identifying long-range cis interactions and trans interactions mediated by proteins, since frequent random collisions between such loci are unlikely.

Genome-wide methods

ChIA-PET combines Hi-C with ChIP-seq to detect all interactions mediated by a protein of interest. HiChIP was designed to allow analysis similar to ChIA-PET with less input material.

Biological impact

3C methods have led to a number of biological insights, including the discovery of new structural features of chromosomes, the cataloguing of chromatin loops, and increased understanding of transcriptional regulation mechanisms.
3C methods have demonstrated the importance of spatial proximity of regulatory elements to the genes that they regulate. For example, in tissues that express globin genes, the β-globin locus control region forms a loop with these genes. This loop is not found in tissues where the gene is not expressed. This technology has further aided the genetic and epigenetic study of chromosomes both in model organisms and in humans.
These methods have revealed large-scale organization of the genome into topologically associating domains (TADs), which correlate with epigenetic markers. Some TADs are transcriptionally active, while others are repressed. Many TADs have been found in D. melanogaster, mouse and human. Moreover, CTCF and cohesin play important roles in determining TADs and enhancer-promoter interactions. Results indicate that the CTCF binding motifs in an enhancer-promoter loop must face each other for the enhancer to find its correct target.

Human disease

Several diseases are caused by defects in promoter-enhancer interactions:
Beta thalassemia is a blood disorder that can be caused by a deletion of the LCR enhancer element.
Holoprosencephaly is a cephalic disorder caused by a mutation in the SBE2 enhancer element, which in turn weakens expression of the SHH gene.
Preaxial polydactyly type 2 (PPD2) is caused by a mutation of the ZRS enhancer, which in turn strengthens expression of the SHH gene.
Adenocarcinoma of the lung can be caused by a duplication of an enhancer element for the MYC gene.
T-cell acute lymphoblastic leukemia can be caused by the introduction of a new enhancer.

Data analysis

The different 3C-style experiments produce data with very different structures and statistical properties. As such, specific analysis packages exist for each experiment type.
Hi-C data is often used to analyze genome-wide chromatin organization, such as topologically associating domains (TADs), which are linearly contiguous regions of the genome that are associated in 3-D space. Several algorithms have been developed to identify TADs from Hi-C data.
Hi-C and its subsequent analyses are evolving. Fit-Hi-C is a method based on a discrete binning approach, modified to account for interaction distance and to refine the null model. Its output is a list of pairwise intra-chromosomal interactions with their p-values and q-values.
The 3-D organization of the genome can also be analyzed via eigendecomposition of the contact matrix. Each eigenvector corresponds to a set of loci, which are not necessarily linearly contiguous, that share structural features.
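A minimal sketch of the idea, under stated assumptions: the toy contact matrix below is invented, centering by the overall mean stands in for the distance normalization and correlation steps used in practice, and power iteration is used here to obtain only the eigenvector with the largest-magnitude eigenvalue rather than a full eigendecomposition.

```python
import math


def center(matrix):
    """Subtract the overall mean so that the uniform background is removed
    (a simplifying assumption, not a standard Hi-C normalization)."""
    n = len(matrix)
    mean = sum(sum(row) for row in matrix) / (n * n)
    return [[value - mean for value in row] for row in matrix]


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def leading_eigenpair(matrix, iterations=500, tol=1e-12):
    """Power iteration for a symmetric matrix: returns the eigenvalue of
    largest magnitude and a unit-length eigenvector for it."""
    vec = [float(i + 1) for i in range(len(matrix))]  # arbitrary start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
        # Rayleigh quotient of the current unit vector.
        new_eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))
        converged = abs(new_eigenvalue - eigenvalue) < tol
        eigenvalue = new_eigenvalue
        if converged:
            break
    return eigenvalue, vec


# Toy matrix: loci 0 and 2 contact each other often, as do loci 1 and 3.
toy_contacts = [
    [5, 1, 5, 1],
    [1, 5, 1, 5],
    [5, 1, 5, 1],
    [1, 5, 1, 5],
]
value, vector = leading_eigenpair(center(toy_contacts))
groups = ["A" if component > 0 else "B" for component in vector]
print(value, groups)
```

In this toy case the sign of each eigenvector entry places loci 0 and 2 in one group and loci 1 and 3 in the other, even though neither group is linearly contiguous. The overall sign of an eigenvector is arbitrary, so the labels A and B are interchangeable.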
A significant confounding factor in 3C technologies is the frequent non-specific interactions between genomic loci that occur due to random polymer behavior. An interaction between two loci must be confirmed as specific through statistical significance testing.
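One simple way to frame such a test, sketched below, is to compare each observed count with the average count seen at the same genomic distance and to compute a binomial tail probability. This is an illustrative simplification and not the implementation of Fit-Hi-C or any other published tool; the function names, the toy matrix, and the choice of null model are assumptions, and a real analysis would also correct for multiple testing.

```python
from math import comb


def mean_contacts_by_distance(matrix):
    """Average contact count on each diagonal, i.e. at each bin distance."""
    n = len(matrix)
    return [sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
            for d in range(n)]


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def contact_p_value(matrix, i, j):
    """P-value for the contact between bins i and j under a null model in
    which all bin pairs at the same distance are equally likely.
    Assumes a symmetric matrix of integer counts."""
    n = len(matrix)
    total = sum(matrix[a][b] for a in range(n) for b in range(a, n))
    expected = mean_contacts_by_distance(matrix)[abs(i - j)]
    return binomial_upper_tail(matrix[i][j], total, expected / total)


# Toy symmetric count matrix (invented data).
toy = [
    [10, 4, 1, 1],
    [4, 10, 4, 6],
    [1, 4, 10, 4],
    [1, 6, 4, 10],
]
print(contact_p_value(toy, 1, 3), contact_p_value(toy, 0, 2))
```

Both bin pairs are two bins apart, so they share the same expected count; the pair with more observed contacts receives the smaller p-value.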

Normalization of Hi-C contact map

There are two major ways of normalizing raw Hi-C contact heat maps. The first is to assume equal visibility, meaning that each chromosomal position has an equal chance of having an interaction. The true signal of a Hi-C contact map should therefore be a balanced matrix. One algorithm that assumes equal visibility is the Sinkhorn-Knopp algorithm, which scales the raw Hi-C contact map into a balanced matrix.
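The balancing step can be sketched as alternating row and column rescaling. The version below is a minimal illustration that assumes a small, dense matrix with strictly positive entries, for which the procedure is known to converge; the iteration count and the toy matrix are arbitrary choices, and production tools work on sparse matrices and filter low-coverage bins first.

```python
def sinkhorn_knopp(matrix, iterations=200):
    """Alternately rescale rows and columns so that each sums to 1."""
    n = len(matrix)
    balanced = [[float(value) for value in row] for row in matrix]
    for _ in range(iterations):
        # Scale every row to sum to 1.
        for row in balanced:
            row_sum = sum(row)
            if row_sum > 0:
                for j in range(n):
                    row[j] /= row_sum
        # Scale every column to sum to 1.
        for j in range(n):
            col_sum = sum(balanced[i][j] for i in range(n))
            if col_sum > 0:
                for i in range(n):
                    balanced[i][j] /= col_sum
    return balanced


raw = [
    [10, 2, 1],
    [2, 8, 3],
    [1, 3, 6],
]
for row in sinkhorn_knopp(raw):
    print([round(value, 4) for value in row])
```

After balancing, every row and every column has the same total, which is the equal-visibility condition described above.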
The other way is to assume that there is a bias associated with each chromosomal position. The contact map value at each coordinate is then the true signal at that coordinate multiplied by the biases of the two contacting positions. One algorithm that aims to solve this bias model is iterative correction, which iteratively regresses out row and column biases from the raw Hi-C contact map. A number of software tools are available for analysis of Hi-C data.
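The multiplicative bias model can be written as raw[i][j] = bias[i] * bias[j] * signal[i][j]. The sketch below is a simplified version of iterative correction under that model, not the reference implementation of any tool: it assumes a small dense symmetric matrix, and the test data are built from an invented signal and invented biases so that the recovered values can be checked.

```python
def iterative_correction(raw, iterations=200):
    """Estimate per-bin biases so that raw[i][j] / (bias[i] * bias[j])
    has equal row sums. Returns (corrected_matrix, bias)."""
    n = len(raw)
    work = [[float(value) for value in row] for row in raw]
    bias = [1.0] * n
    for _ in range(iterations):
        row_sums = [sum(row) for row in work]
        covered = [s for s in row_sums if s > 0]
        if not covered:
            break
        mean_sum = sum(covered) / len(covered)
        # Bins without any contacts keep a factor of 1 and stay untouched.
        factors = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                work[i][j] /= factors[i] * factors[j]
        # Accumulate the factors into the running bias estimate.
        bias = [b * f for b, f in zip(bias, factors)]
    return work, bias


# Invented example: a signal with equal row sums, distorted by known biases.
true_signal = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
true_bias = [1.0, 2.0, 0.5]
raw = [[true_bias[i] * true_bias[j] * true_signal[i][j] for j in range(3)]
       for i in range(3)]
corrected, estimated_bias = iterative_correction(raw)
print([round(b / estimated_bias[0], 4) for b in estimated_bias])
```

The biases are only determined up to a common scale factor, so the example reports them relative to the first bin.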

DNA motif analysis

DNA motifs are specific short DNA sequences, often 8-20 nucleotides in length, that are statistically overrepresented in a set of sequences with a common biological function. Currently, regulatory motifs in long-range chromatin interactions have not been studied extensively. Several studies have focused on elucidating the impact of DNA motifs on promoter-enhancer interactions.
Bailey et al. have identified that the ZNF143 motif in promoter regions provides sequence specificity for promoter-enhancer interactions. Mutation of the ZNF143 motif decreased the frequency of promoter-enhancer interactions, suggesting that ZNF143 is a novel chromatin-looping factor.
In a genome-scale motif analysis in 2016, Wong et al. reported a list of 19,491 DNA motif pairs in promoter-enhancer interactions for the K562 cell line. Based on this list, they proposed that motif pairing multiplicity is linked to interaction distance and regulatory region type. The following year, Wong published another article reporting 18,879 motif pairs in 6 human cell lines. A novel contribution of this work is MotifHyades, a motif discovery tool that can be directly applied to paired sequences.

Cancer genome analysis

3C-based techniques can provide insights into chromosomal rearrangements in cancer genomes. Moreover, they can show changes in the spatial proximity of regulatory elements and their target genes, which brings a deeper understanding of the structural and functional basis of the genome.