A network is a set of nodes and a set of directed or undirected edges between the nodes. Many types of biological networks exist, including transcriptional, signalling and metabolic networks. Few such networks are known in anything approaching their complete structure, even in the simplest bacteria. Still less is known about the parameters governing the behavior of such networks over time, how the networks at different levels in a cell interact, and how to predict the complete state description of a eukaryotic cell or bacterial organism at a given point in the future. Systems biology, in this sense, is still in its infancy.
There is great interest in network medicine for the modelling of biological systems. This article focuses on a necessary prerequisite to dynamic modeling of a network: inference of the topology, that is, prediction of the "wiring diagram" of the network. More specifically, we focus here on inference of biological network structure using the growing sets of high-throughput expression data for genes, proteins, and metabolites.
Briefly, methods using high-throughput data for inference of regulatory networks rely on searching for patterns of partial correlation or conditional probabilities that may indicate causal influence. Such patterns, possibly combined with supplemental data on the genes or proteins in the proposed networks or with other information on the organism, form the basis upon which such algorithms work. Such algorithms can be of use in inferring the topology of any network where the change in state of one node can affect the state of other nodes.
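To make the idea of partial correlation concrete, the following is a minimal sketch on simulated data, not a published inference method. It assumes a hypothetical three-gene cascade (`gene_a` drives `gene_b`, which drives `gene_c`) with made-up coefficients and noise levels. The marginal correlation between `gene_a` and `gene_c` is strong even though there is no direct edge between them, while the partial correlation that controls for `gene_b` should be close to zero, which is the kind of pattern such algorithms look for.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b, controlling for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Hypothetical cascade gene_a -> gene_b -> gene_c (no direct a -> c edge).
# Coefficients and noise levels are illustrative assumptions only.
rng = random.Random(0)
gene_a = [rng.gauss(0, 1) for _ in range(2000)]
gene_b = [0.9 * x + rng.gauss(0, 0.5) for x in gene_a]
gene_c = [0.9 * y + rng.gauss(0, 0.5) for y in gene_b]

marginal_ac = pearson(gene_a, gene_c)                    # strong, but indirect
partial_ac_given_b = partial_corr(gene_a, gene_c, gene_b)  # near zero

print(f"corr(a, c)     = {marginal_ac:.3f}")
print(f"corr(a, c | b) = {partial_ac_given_b:.3f}")
```

A near-zero partial correlation here argues against a direct `gene_a`–`gene_c` edge, but it does not by itself establish the direction of the remaining edges.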
Transcriptional regulatory networks
In transcriptional regulatory networks, genes are the nodes and the edges are directed. A gene serves as the source of a direct regulatory edge to a target gene by producing an RNA or protein molecule that functions as a transcriptional activator or inhibitor of the target gene. If the gene is an activator, then it is the source of a positive regulatory connection; if an inhibitor, then it is the source of a negative regulatory connection. Computational algorithms take as primary input measurements of mRNA expression levels of the genes under consideration for inclusion in the network, and return an estimate of the network topology. Such algorithms are typically based on linearity, independence or normality assumptions, which must be verified on a case-by-case basis. Clustering or some form of statistical classification is typically employed to perform an initial organization of the high-throughput mRNA expression values derived from microarray experiments, in particular to select sets of genes as candidates for network nodes. The question then arises: how can the clustering or classification results be connected to the underlying biology? Such results can be useful for pattern classification, for example to classify subtypes of cancer or to predict differential responses to a drug. But to understand the relationships between the genes, that is, to define more precisely the influence of each gene on the others, the scientist typically attempts to reconstruct the transcriptional regulatory network. This can be done by data integration in dynamic models supported by background literature or information in public databases, combined with the clustering results. The modelling can be done with a Boolean network, with ordinary differential equations or linear regression models (e.g. least-angle regression), with a Bayesian network, or with approaches based on information theory.
For instance, it can be done by applying a correlation-based inference algorithm, as will be discussed below, an approach that is having increasing success as the size of the available microarray data sets keeps growing.
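As an illustration of one of the modelling formalisms mentioned above, the following sketch simulates a small Boolean network with signed, directed edges. The three genes `A`, `B`, `C`, the edge list, and the update rule (a gene is on when at least one activator is on and no inhibitor is on, and a gene with no activators is assumed to be constitutively expressed) are all hypothetical choices made for the example, not taken from a real organism or from a specific published model.

```python
# Signed, directed edges: (source, target, sign).
# +1 marks a positive (activating) connection, -1 a negative (inhibiting) one.
EDGES = [("A", "B", +1), ("B", "C", +1), ("C", "A", -1)]
GENES = ["A", "B", "C"]


def step(state):
    """Synchronously update every gene from the current Boolean state."""
    new_state = {}
    for gene in GENES:
        activators = [s for s, t, sign in EDGES if t == gene and sign > 0]
        inhibitors = [s for s, t, sign in EDGES if t == gene and sign < 0]
        inhibited = any(state[s] for s in inhibitors)
        # Assumed rule: genes without activators are on unless inhibited.
        activated = any(state[s] for s in activators) if activators else True
        new_state[gene] = activated and not inhibited
    return new_state


def simulate(initial, n_steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [dict(initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


trajectory = simulate({"A": True, "B": False, "C": False}, 6)
for t, state in enumerate(trajectory):
    print(t, "".join("1" if state[g] else "0" for g in GENES))
```

Because the loop contains a single negative edge, this toy network oscillates rather than settling into a fixed state. An inference algorithm works in the opposite direction: given observed state sequences, it searches for an edge list and rules that could have produced them.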
Signal transduction networks
Proteins are the nodes, and directed edges represent interactions in which the biochemical conformation of the child is modified by the action of the parent. The primary input to the inference algorithm would be data from a set of experiments measuring protein activation and inactivation across a set of proteins. Inference for such signalling networks is complicated by the fact that total concentrations of signalling proteins fluctuate over time due to transcriptional and translational regulation. Such variation can lead to statistical confounding. Accordingly, more sophisticated statistical techniques must be applied to analyse such datasets.
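The confounding effect described above can be sketched with simulated data. The example assumes two hypothetical signalling proteins, P and Q, whose activation fractions are independent, but whose total abundances are both driven by a shared transcriptional program. It also assumes that total abundance is measured alongside the activated form, which may not hold in a real experiment. All distributions and parameters are illustrative. Normalising by total abundance is only one simple correction and is not a substitute for the more sophisticated techniques the text refers to.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(1)
n_samples = 2000

# Shared (hypothetical) transcriptional program that scales both totals.
shared = [rng.gauss(0.0, 0.6) for _ in range(n_samples)]
total_p = [math.exp(s + rng.gauss(0.0, 0.1)) for s in shared]
total_q = [math.exp(s + rng.gauss(0.0, 0.1)) for s in shared]

# Activation fractions are independent: no signalling edge between P and Q.
frac_p = [rng.uniform(0.2, 0.8) for _ in range(n_samples)]
frac_q = [rng.uniform(0.2, 0.8) for _ in range(n_samples)]

# Measured amount of activated protein = total abundance * active fraction.
active_p = [t * f for t, f in zip(total_p, frac_p)]
active_q = [t * f for t, f in zip(total_q, frac_q)]

# Raw activated levels correlate because of the shared abundance changes.
raw_corr = pearson(active_p, active_q)

# Dividing by total abundance recovers the (independent) active fractions.
norm_p = [a / t for a, t in zip(active_p, total_p)]
norm_q = [a / t for a, t in zip(active_q, total_q)]
normalised_corr = pearson(norm_p, norm_q)

print(f"raw correlation        = {raw_corr:.3f}")
print(f"normalised correlation = {normalised_corr:.3f}")
```

In this simulation a naive correlation-based method would propose an edge between P and Q from the raw activated levels, even though none exists.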
Metabolic networks
Metabolites are the nodes and the edges are directed. The primary input to an algorithm would be data from a set of experiments measuring metabolite levels.
Protein-protein interaction networks are also under very active study. However, reconstruction of these networks does not use correlation-based inference in the sense discussed for the networks already described, and a description of such interaction network reconstruction is left to other articles.