Genome architecture mapping


In molecular biology, genome architecture mapping (GAM) is a cryosectioning method for mapping colocalized DNA regions in a ligation-independent manner. It overcomes some limitations of chromosome conformation capture (3C) methods, which rely on digestion and ligation to capture interacting DNA segments. GAM is the first genome-wide method for capturing three-dimensional proximities between any number of genomic loci without ligation.

Principle

Genome architecture mapping was developed in the laboratory of Ana Pombo, building on a theoretical approach to linkage mapping of the human genome published in 1989. GAM measures the physical distance between genomic regions through cryosectioning and laser microdissection. To learn which loci in the genome interact, GAM uses a set of slices collected from nuclei in random orientations. Here is a simple outline of GAM:
First, obtain ultra-thin nuclear slices through cryosectioning. Then isolate single nuclear profiles (NPs) by laser capture microdissection. After that, extract and amplify the DNA from each nuclear profile. Next, identify the DNA sequences present in each nuclear profile by next-generation sequencing. With these sequence data, compute pairwise co-segregation matrices to display pairwise chromatin contacts. Finally, use the co-segregation tables to perform SLICE analysis and obtain the probabilities of interaction.

Cryosection and laser microdissection

Cryosections are produced according to the Tokuyasu method, which involves stringent fixation to preserve nuclear and cellular architecture and cryoprotection with a sucrose-PBS solution before freezing in liquid nitrogen. In genome architecture mapping, sectioning is a necessary step for exploring the 3D topology of the genome. Laser microdissection then isolates each nuclear profile before DNA extraction and sequencing.

Data analysis (bioinformatics method)

GAMtools

GAMtools is a collection of software utilities for genome architecture mapping data, developed by Robert Beagrie. Bowtie2 must be installed before running GAMtools. The input is sequencing data in FASTQ format. The program first maps the sequences, then calls windows, produces proximity matrices and performs quality control checks.

Mapping the sequencing data

GAMtools uses the gamtools process_nps command to perform the mapping task. It maps the raw sequence data from the nuclear profiles.

Windows calling

This step computes the number of reads from each NP that overlap each window in the background genome file. The default window size is 50 kb. It then generates a segregation table.
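The window-calling idea can be illustrated with a minimal Python sketch. This is not the GAMtools implementation: the function name `call_windows`, the input layout (read start positions per NP on a single chromosome) and the fixed `min_reads` threshold are all assumptions made for illustration.

```python
from collections import Counter

WINDOW_SIZE = 50_000  # 50 kb, the default window size mentioned above


def call_windows(reads_by_np, chrom_length, window_size=WINDOW_SIZE, min_reads=2):
    """Build a binary segregation table for one chromosome.

    reads_by_np: dict mapping NP name -> list of 0-based read start positions.
    min_reads:   hypothetical fixed threshold for calling a window positive.
    Returns (n_windows, table), where table maps NP name -> list of 0/1 flags.
    """
    n_windows = -(-chrom_length // window_size)  # ceiling division
    table = {}
    for np_name, positions in reads_by_np.items():
        # Count how many reads of this NP start in each window.
        counts = Counter(pos // window_size for pos in positions)
        table[np_name] = [
            1 if counts.get(w, 0) >= min_reads else 0 for w in range(n_windows)
        ]
    return n_windows, table


# Toy input (invented positions, not real data).
reads = {
    "NP1": [1_000, 20_000, 60_000, 120_000, 130_000],
    "NP2": [55_000, 70_000, 99_999],
}
n_windows, segregation = call_windows(reads, chrom_length=150_000)
print(segregation)
```

Each row of the resulting table records, for one NP, which genomic windows were detected in it. A fixed read threshold is the simplest possible choice and is used here only to keep the sketch short.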

Producing proximity matrices

The command for this step is gamtools matrix. The input file is the segregation table produced by window calling.
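As a rough illustration of how a proximity matrix can be derived from a segregation table, the Python sketch below computes the raw co-segregation frequency of every pair of windows, that is, the fraction of NPs in which both windows are present. The function name `cosegregation_matrix` and the input layout are hypothetical, and raw co-segregation is only one possible proximity measure; it is not claimed to be what gamtools matrix computes by default.

```python
def cosegregation_matrix(segregation):
    """Return the fraction of NPs in which each pair of windows co-occurs.

    segregation: dict mapping NP name -> list of 0/1 flags, one per window.
    """
    profiles = list(segregation.values())
    n_nps = len(profiles)
    n_windows = len(profiles[0])
    counts = [[0] * n_windows for _ in range(n_windows)]
    for flags in profiles:
        present = [w for w, flag in enumerate(flags) if flag]
        # Every pair of windows present in this NP co-segregates once.
        for a in present:
            for b in present:
                counts[a][b] += 1
    return [[c / n_nps for c in row] for row in counts]


# Toy segregation table with four NPs and three windows (invented values).
toy_table = {
    "NP1": [1, 1, 0],
    "NP2": [1, 1, 0],
    "NP3": [0, 0, 1],
    "NP4": [1, 0, 1],
}
matrix = cosegregation_matrix(toy_table)
for row in matrix:
    print(row)
```

The diagonal holds the detection frequency of each window, and the off-diagonal entries hold the pairwise co-segregation frequencies that SLICE later interprets.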

Performing quality control checks

This function is included in gamtools process_nps. With the quality control checks, GAMtools can exclude poor-quality NPs.

SLICE

SLICE plays a key role in GAM data analysis. It was developed in the laboratory of Mario Nicodemi to provide a mathematical model that identifies the most specific interactions among loci from GAM co-segregation data. It estimates the proportion of specific interactions for each pair of loci at a given time. It is a likelihood-based method. The first step of SLICE is to derive a function for the expected proportion of GAM nuclear profiles. The next step is to find the interaction probability that best explains the experimental data.

SLICE Model

The SLICE model is based on the hypothesis that the probability that two non-interacting loci fall into the same NP is predictable. This probability depends on the distance between the loci.
The SLICE model considers a pair of loci to be in one of two states: interacting or non-interacting. Under this hypothesis, the proportions of nuclear profiles in each state can be predicted mathematically. By deriving a function of the interaction probability, GAM data can also be used to find prominent interactions and to explore the sensitivity of GAM.

Calculate distribution in a single nuclear profile

SLICE considers that a pair of loci can be either interacting or non-interacting across the cell population. The first step of the calculation is to describe a single locus. Consider a pair of loci, A and B. They have two possible states: either A and B do not interact with each other, or they do. The first question is whether a single locus can be found in a nuclear profile.

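Here is the mathematical expression:
Single locus probability:

- $\langle v_1 \rangle$: probability that the locus is found in an NP.

- $\langle v_0 \rangle = 1 - \langle v_1 \rangle$: probability that the locus is not found in an NP.

- $\langle v_1 \rangle$ is computed from the nuclear volume.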
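The single-locus probability can be explored numerically. The Python sketch below is a toy Monte Carlo estimate of the probability that a locus is found in an NP (written `v1` here), under assumptions that are not taken from the SLICE publication: the nucleus is a sphere of radius `radius`, the locus is uniformly distributed inside it, and the slice is a slab of thickness `thickness` whose centre is uniformly distributed over all positions at which the slab intersects the sphere. Under exactly these assumptions the geometry gives `thickness / (2 * radius + thickness)`, which the simulation is compared against. All function names are hypothetical.

```python
import random


def random_point_in_sphere(radius, rng):
    """Draw a point uniformly inside a sphere by rejection sampling."""
    while True:
        x, y, z = (rng.uniform(-radius, radius) for _ in range(3))
        if x * x + y * y + z * z <= radius * radius:
            return x, y, z


def simulate_single_locus_probability(radius, thickness, n_trials, seed=0):
    """Estimate the probability that a random locus falls inside a random slab."""
    rng = random.Random(seed)
    half = thickness / 2.0
    hits = 0
    for _ in range(n_trials):
        # By spherical symmetry the slab can be taken perpendicular to z.
        _, _, z = random_point_in_sphere(radius, rng)
        # Slab centre: uniform over all positions where it touches the sphere.
        centre = rng.uniform(-(radius + half), radius + half)
        if abs(z - centre) <= half:
            hits += 1
    return hits / n_trials


def analytic_single_locus_probability(radius, thickness):
    """Closed form valid under the toy assumptions stated in the lead-in."""
    return thickness / (2.0 * radius + thickness)


# Illustrative values in arbitrary length units (not measured data).
v1_simulated = simulate_single_locus_probability(1.0, 0.2, n_trials=50_000)
v1_analytic = analytic_single_locus_probability(1.0, 0.2)
print(v1_simulated, v1_analytic)
```

The probability of not finding the locus follows as `1 - v1`. A real nucleus is not a perfect sphere, so this is a sketch of the reasoning rather than a replacement for the SLICE calculation.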

Estimation of average nuclear radius

The single-locus probability depends on the volume of the nucleus, so the nuclear volume is needed for the calculation. The radii of the NPs can be used to estimate the nuclear radius. The SLICE prediction for the radius matches Monte Carlo simulations. With the estimated radius, we can estimate the probability of finding two loci in a non-interacting state and the probability of finding them in an interacting state.
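One simple way to see how NP radii constrain the nuclear radius is sketched below in Python. It assumes a spherical nucleus cut by very thin sections at a uniformly random height; in that idealized case the mean section radius equals (pi / 4) times the sphere radius, so the sphere radius can be estimated as (4 / pi) times the mean NP radius. This is a textbook stereology argument used for illustration, not necessarily the estimator used by SLICE, and the function names are hypothetical.

```python
import math
import random


def estimate_nuclear_radius(profile_radii):
    """Estimate the sphere radius from radii of thin random sections.

    Assumes mean(section radius) = (pi / 4) * R for a sphere of radius R.
    """
    mean_r = sum(profile_radii) / len(profile_radii)
    return 4.0 * mean_r / math.pi


def simulate_profile_radii(radius, n_profiles, seed=0):
    """Simulate section radii for cuts at uniformly random heights."""
    rng = random.Random(seed)
    radii = []
    for _ in range(n_profiles):
        z = rng.uniform(-radius, radius)  # height of the cut
        radii.append(math.sqrt(radius * radius - z * z))
    return radii


# Monte Carlo check with an illustrative radius in arbitrary units.
true_radius = 5.0
simulated_radii = simulate_profile_radii(true_radius, n_profiles=20_000)
estimated_radius = estimate_nuclear_radius(simulated_radii)
print(estimated_radius)
```

The sketch ignores the finite slice thickness and any deviation from a spherical shape, both of which a full treatment would have to account for.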

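Here is the mathematical expression for the non-interacting case.

Two loci in a non-interacting state:

$\langle u_i \rangle$, $i = 0, 1, 2$: probability of finding 0, 1 or 2 loci of a pair of non-interacting loci in an NP.

Here is the mathematical expression for the interacting case.

Estimated probabilities for two loci in an interacting state, where $t_i$ is the probability of finding $i$ loci of the pair in an NP, and $v_1$ and $v_0$ are the single-locus probabilities of finding and not finding a locus in an NP:

$t_2 \approx v_1$, $t_1 \approx 0$, $t_0 \approx v_0$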

Calculate probability of pairs of loci in single NP

With the results of the previous steps, the probability of finding a pair of loci in one NP can be calculated statistically. A pair of loci can exist in three different states, each with its own probability.

Occurrence probability of pairs of loci in single nuclear profiles:

SLICE Statistical Analysis





In this notation, the index i refers to locus A and the index j refers to locus B.

Detection efficiency

Because the number of experiments is limited, the detection efficiency has to be taken into account. Including it extends the SLICE model to accommodate additional complications and improves the calculated result. For this purpose, the GAM data are divided into two cases: a locus in the slice is detected in the experiment, or a locus in the slice is not detected.

Estimating interaction probabilities of pairs

Based on the estimated detection efficiency and the probabilities derived above, the interaction probability of each pair of loci can be calculated. The loci are detected by next-generation sequencing.
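The logic of this estimate can be sketched in Python with a deliberately simplified model. It is not the full SLICE calculation. It assumes a single copy of each locus, takes the probability of finding both loci of an interacting pair in an NP to be about `v1` (the single-locus probability), takes the corresponding probability for a non-interacting pair to be `v1 * v1` (independent loci, which ignores the distance dependence), and assumes each locus is detected independently with probability `efficiency`. The function names and parameters are hypothetical.

```python
def expected_cosegregation(pi, v1, efficiency=1.0):
    """Expected fraction of NPs in which both loci are detected.

    pi:         interaction probability of the pair (0..1).
    v1:         probability that a single locus is found in an NP.
    efficiency: assumed per-locus detection efficiency (0..1].
    """
    t2 = v1        # interacting pair: both loci are found together
    u2 = v1 * v1   # assumption: non-interacting loci are independent
    return efficiency ** 2 * (pi * t2 + (1.0 - pi) * u2)


def estimate_interaction_probability(observed, v1, efficiency=1.0):
    """Invert the mixture above to recover pi, clipped to [0, 1]."""
    t2 = v1
    u2 = v1 * v1
    corrected = observed / efficiency ** 2  # undo imperfect detection
    pi = (corrected - u2) / (t2 - u2)
    return min(1.0, max(0.0, pi))


# Round trip with illustrative values (not experimental data).
observed_fraction = expected_cosegregation(pi=0.3, v1=0.1, efficiency=0.8)
recovered_pi = estimate_interaction_probability(observed_fraction, v1=0.1, efficiency=0.8)
print(recovered_pi)
```

The point of the sketch is the structure of the inference: the observed co-segregation is modelled as a mixture of an interacting and a non-interacting contribution, corrected for detection efficiency, and the mixture weight is the interaction probability.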

Advantages

In comparison with 3C-based methods, GAM provides three key advantages.