SBML


The Systems Biology Markup Language (SBML) is a representation format, based on XML, for communicating and storing computational models of biological processes. It is a free and open standard with widespread software support and a community of users and developers. SBML can represent many different classes of biological phenomena, including metabolic networks, cell signaling pathways, regulatory networks, infectious diseases, and many others. It has been proposed as a standard for representing computational models in systems biology.

History

From late 1999 through early 2000, with funding from the Japan Science and Technology Corporation, Hiroaki Kitano and John C. Doyle assembled a small team of researchers to work on developing better software infrastructure for computational modeling in systems biology. Hamid Bolouri was the leader of the development team, which consisted of Andrew Finney, Herbert Sauro, and Michael Hucka. Bolouri identified the need for a framework to enable interoperability and sharing between the different simulation software systems for biology in existence during the late 1990s, and he organized an informal workshop in December 1999 at the California Institute of Technology to discuss the matter. In attendance at that workshop were the groups responsible for the development of DBSolve, E-Cell, Gepasi, Jarnac, StochSim, and The Virtual Cell. Separately, earlier in 1999, some members of these groups had also discussed the creation of a portable file format for metabolic network models in the BioThermoKinetics group. The same groups who attended the first Caltech workshop met again on April 28–29, 2000, at the first of a newly created meeting series called the Workshop on Software Platforms for Systems Biology. It became clear during this second meeting that a common model representation format was needed to enable the exchange of models between software tools as part of any functioning interoperability framework, and the attendees decided the format should be encoded in XML.
The Caltech ERATO team developed a proposal for this XML-based format and circulated the draft definition to the attendees of the 2nd Workshop on Software Platforms for Systems Biology in August 2000. This draft underwent extensive discussion over mailing lists and during the 2nd Workshop on Software Platforms for Systems Biology, held in Tokyo, Japan, in November 2000 as a satellite workshop of the ICSB 2000 conference. After further revisions, discussions and software implementations, the Caltech team issued a specification for SBML Level 1, Version 1 in March 2001.
SBML Level 2 was conceived at the 5th Workshop on Software Platforms for Systems Biology, held in July 2002 at the University of Hertfordshire, UK. By this time, far more people were involved than the original group of SBML collaborators, and the continued evolution of SBML had become a larger community effort, with many new tools enhanced to support SBML. The workshop participants in 2002 collectively decided to revise the form of SBML in Level 2. The first draft of the Level 2 Version 1 specification was released in August 2002, and its feature set was finalized in May 2003 at the 7th Workshop on Software Platforms for Systems Biology in Ft. Lauderdale, Florida.
The next iteration of SBML took two years in part because software developers requested time to absorb and understand the larger and more complex SBML Level 2. The inevitable discovery of limitations and errors led to the development of SBML Level 2 Version 2, issued in September 2006. By this time, the team of SBML Editors had changed and now consisted of Andrew Finney, Michael Hucka and Nicolas Le Novère.
SBML Level 2 Version 3 was published in 2007 after many contributions from, and discussions with, the SBML community. The same year saw the election of two more SBML Editors, as part of the introduction of the modern SBML Editor organization within the SBML development process.
SBML Level 2 Version 4 was published in 2008 after the community widely requested certain changes to Level 2. Version 4 was finalized after the SBML Forum meeting held in Gothenburg, Sweden, in the fall of 2008 as a satellite workshop of ICSB 2008.
SBML Level 3 Version 1 Core was published in final form in 2010, after prolonged discussion and revision by the SBML Editors and the SBML community. It contains numerous significant changes in syntax and constructs from Level 2 Version 4, but also represents a new modular base for continued expansion of SBML's features and capabilities going into the future.
SBML Level 2 Version 5 was published in 2015. This revision included a number of textual changes in response to user feedback, thereby addressing the list of errata collected over many years for the SBML Level 2 Version 4 specification. In addition, Version 5 introduced a facility to use nested annotations within SBML's annotation format.

The language

SBML is sometimes incorrectly assumed to be limited in scope to biochemical network models, because the original publications and early software focused on this domain. In reality, although the central features of SBML are indeed oriented towards representing chemical reaction-like processes that act on entities, the same formalism serves analogously for many other types of processes; moreover, SBML has language features supporting the direct expression of mathematical formulas and discontinuous events separate from reaction processes, allowing SBML to represent much more than biochemical reactions. Evidence that SBML can describe more than biochemistry can be seen in the variety of models available from BioModels Database.

Purposes

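SBML has three main purposes: enabling the use of multiple software tools without having to rewrite models to conform to every tool's idiosyncratic file format; enabling models to be shared and published in a form that other researchers can use even when working with a different software environment; and ensuring the survival of models, and the intellectual effort put into them, beyond the lifetime of the software used to create them.
SBML is not an attempt to define a universal language for quantitative models. SBML's purpose is to serve as a lingua franca: an exchange format used by different present-day software tools to communicate the essential aspects of a computational model.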

Main capabilities

SBML can encode models consisting of entities acted upon by processes. An important principle is that models are decomposed into explicitly labeled constituent elements, the set of which resembles a verbose rendition of chemical reaction equations together with optional explicit equations; the SBML representation deliberately does not cast the model directly into a set of differential equations or any other specific interpretation of the model. This explicit, modeling-framework-agnostic decomposition makes it easier for a software tool to interpret the model and translate the SBML form into whatever internal form the tool actually uses.
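As a rough illustration of this decomposition, the Python sketch below reads a hand-written SBML Level 3 Version 1 Core fragment with the standard library's `xml.etree.ElementTree` and recovers the labeled species and reactions without committing to any mathematical interpretation. The toy model, the helper names `read_model` and `species_refs`, and the defaults used for missing attributes are assumptions made for this example; kinetic laws and units are left out, and a real tool would normally rely on libSBML or JSBML instead.

```python
import xml.etree.ElementTree as ET

# Namespace URI of SBML Level 3 Version 1 Core.
NS = {"s": "http://www.sbml.org/sbml/level3/version1/core"}

# A tiny hand-written model (one reaction, A -> B); the kineticLaw is omitted for brevity.
MODEL_XML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialAmount="10" hasOnlySubstanceUnits="true"
               boundaryCondition="false" constant="false"/>
      <species id="B" compartment="cell" initialAmount="0" hasOnlySubstanceUnits="true"
               boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="conversion" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="A" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="B" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def species_refs(reaction, list_tag):
    """Map species id -> stoichiometry for one listOfReactants/listOfProducts."""
    return {
        # Defaulting a missing stoichiometry to 1 is an assumption of this sketch.
        ref.get("species"): float(ref.get("stoichiometry", "1"))
        for ref in reaction.iterfind(f"s:{list_tag}/s:speciesReference", NS)
    }


def read_model(xml_text):
    """Return (initial amounts by species id, list of reaction dicts)."""
    model = ET.fromstring(xml_text).find("s:model", NS)
    species = {
        # Only initialAmount is read here; initialConcentration is ignored.
        sp.get("id"): float(sp.get("initialAmount", "0"))
        for sp in model.iterfind("s:listOfSpecies/s:species", NS)
    }
    reactions = [
        {
            "id": rxn.get("id"),
            "reactants": species_refs(rxn, "listOfReactants"),
            "products": species_refs(rxn, "listOfProducts"),
        }
        for rxn in model.iterfind("s:listOfReactions/s:reaction", NS)
    ]
    return species, reactions
```

The result is still framework-neutral: `read_model(MODEL_XML)` yields the species amounts and a reaction list, and it is up to the consuming tool to decide whether to turn that list into differential equations, a stochastic process, or something else.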
A software package can read an SBML model description and translate it into its own internal format for model analysis. For example, a package might provide the ability to simulate the model by constructing differential equations and then perform numerical time integration on the equations to explore the model's dynamic behavior. Or, alternatively, a package might construct a discrete stochastic representation of the model and use a Monte Carlo simulation method such as the Gillespie algorithm.
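The two interpretations mentioned above can be sketched in a few lines of Python, assuming mass-action kinetics and a hypothetical in-memory reaction list of `(reactants, products, rate_constant)` tuples rather than anything defined by SBML itself. `ode_euler` builds the differential equations from the stoichiometry and integrates them with the explicit Euler method (chosen for brevity; practical simulators use adaptive, often stiff, solvers), while `gillespie` implements Gillespie's direct method. For simplicity the same number serves as both the deterministic and the stochastic rate constant, which is only exact for first-order reactions such as this example's A → B.

```python
import math
import random

# Hypothetical internal form: (reactants, products, rate constant), mass-action kinetics.
REACTIONS = [({"A": 1}, {"B": 1}, 0.5)]  # A -> B


def apply_change(state, reactants, products, amount):
    """Consume reactants and produce products, scaled by `amount`."""
    for sp, n in reactants.items():
        state[sp] -= n * amount
    for sp, n in products.items():
        state[sp] += n * amount


def ode_euler(state, reactions, t_end, dt=1e-3):
    """Integrate dx/dt = S v(x) with the explicit Euler method."""
    x = dict(state)
    for _ in range(int(round(t_end / dt))):
        dx = {sp: 0.0 for sp in x}
        for reactants, products, k in reactions:
            # Mass-action rate: k times the product of reactant amounts^stoichiometry.
            rate = k * math.prod(x[sp] ** n for sp, n in reactants.items())
            apply_change(dx, reactants, products, rate)
        for sp in x:
            x[sp] += dt * dx[sp]
    return x


def gillespie(state, reactions, t_end, seed=0):
    """Gillespie's direct method; returns integer molecule counts at time t_end."""
    rng = random.Random(seed)
    x = dict(state)
    t = 0.0
    while True:
        # Propensity: stochastic constant times number of distinct reactant combinations.
        props = [c * math.prod(math.comb(x[sp], n) for sp, n in reactants.items())
                 for reactants, _, c in reactions]
        total = sum(props)
        if total == 0:
            return x  # nothing can fire any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            return x
        # Pick reaction j with probability props[j] / total, then fire it once.
        j = rng.choices(range(len(reactions)), weights=props)[0]
        reactants, products, _ = reactions[j]
        apply_change(x, reactants, products, 1)
```

With 10 initial units of A, `ode_euler({"A": 10.0, "B": 0.0}, REACTIONS, t_end=2.0)` approximates the exact solution A(t) = 10e^(-0.5t), about 3.68 at t = 2, while `gillespie({"A": 10, "B": 0}, REACTIONS, t_end=2.0, seed=...)` returns one random integer-valued outcome per seed whose average over many runs approaches that same value.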
SBML allows models of arbitrary complexity to be represented. Each type of component in a model is described using a specific type of data structure that organizes the relevant information. The data structures determine how the resulting model is encoded in XML.
In addition to the elements above, another important feature of SBML is that every entity can have machine-readable annotations attached to it. These annotations can be used to express relationships between the entities in a given model and entities in external resources such as databases. A good example of the value of this is in BioModels Database, where every model is annotated and linked to relevant data resources such as publications, databases of compounds and pathways, controlled vocabularies, and more. With annotations, a model becomes more than simply a rendition of a mathematical construct—it becomes a semantically-enriched framework for communicating knowledge.
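A common annotation scheme in SBML models embeds RDF inside an element's `<annotation>`, points at the element's `metaid`, and uses qualifiers such as `bqbiol:is` to name external resources. The Python sketch below extracts those links from a hand-written species element with `xml.etree.ElementTree`; the species and the helper `identity_links` are hypothetical, and the identifiers.org URI for ATP in ChEBI (CHEBI:15422) is given only as an illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bqbiol": "http://biomodels.net/biology-qualifiers/",
}

# Hypothetical species whose annotation says "this species *is* ATP (ChEBI)".
ANNOTATED_SPECIES = """<species xmlns="http://www.sbml.org/sbml/level3/version1/core"
    id="atp" metaid="meta_atp" compartment="cell"
    hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#meta_atp">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/chebi/CHEBI:15422"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>"""


def identity_links(xml_text):
    """Return the resource URIs attached to an element through bqbiol:is."""
    elem = ET.fromstring(xml_text)
    # Namespaced attributes are keyed as "{namespace-uri}local-name".
    rdf_resource = "{%s}resource" % NS["rdf"]
    return [li.get(rdf_resource)
            for li in elem.iterfind(".//bqbiol:is/rdf:Bag/rdf:li", NS)]
```

Because the link is a resolvable URI rather than free text, software can follow it to the external database without guessing what the modeler meant by the species name.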

Levels and versions

SBML is defined in Levels: upward-compatible specifications that add features and expressive power. Software tools that do not need or cannot support the complexity of higher Levels can go on using lower Levels; tools that can read higher Levels are assured of also being able to interpret models defined in the lower Levels. Thus new Levels do not supersede previous ones. However, each Level can have multiple Versions within it, and new Versions of a Level do supersede old Versions of that same Level.
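Because each document declares its Level and Version as attributes of the root `<sbml>` element, the compatibility rules above reduce to simple comparisons. The sketch below is a hypothetical reader policy, not part of any library: `can_read` encodes upward compatibility between Levels, and `is_superseded` checks a Version against a caller-supplied table of the newest Version for each Level.

```python
import xml.etree.ElementTree as ET


def level_version(xml_text):
    """Read the level and version attributes of the root <sbml> element."""
    root = ET.fromstring(xml_text)
    return int(root.get("level")), int(root.get("version"))


def can_read(doc_level, highest_level_supported):
    # Levels are upward compatible: a reader of Level N also handles every Level below N.
    return doc_level <= highest_level_supported


def is_superseded(level, version, latest_version_of):
    # Within one Level, a newer Version supersedes older ones;
    # Levels never supersede one another, so only the same Level is compared.
    return version < latest_version_of[level]
```

For example, a Level 2 Version 4 document is readable by a tool that supports up to Level 3, and `is_superseded(2, 4, {2: 5})` is true because Level 2 Version 5 exists.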
Three Levels of SBML are currently defined, and each Level has one or more Versions.
Open-source software infrastructure such as libSBML and JSBML allows developers to support all Levels of SBML in their software with a minimum of effort.
The SBML Team maintains a public issue tracker where readers may report errors or other issues in the SBML specification documents. Reported issues are eventually put on the list of official errata associated with each specification release. The lists of errata are documented on SBML.org.

Level 3 packages

Development of SBML Level 3 has been proceeding in a modular fashion. The Core specification is a complete format that can be used alone. Additional Level 3 packages can be layered onto this core to provide additional, optional features.
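In a Level 3 document, each package in use is declared on the root `<sbml>` element by its XML namespace together with a `required` attribute in that namespace, which tells readers whether the package's information can change the mathematical meaning of the core model. The Python sketch below lists the packages a document declares; the document is a hand-written fragment, and `declared_packages` is a hypothetical helper, not a libSBML function.

```python
import xml.etree.ElementTree as ET

# Hand-written Level 3 document declaring the fbc (version 2) and comp packages.
DOC = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2"
      xmlns:comp="http://www.sbml.org/sbml/level3/version1/comp/version1"
      level="3" version="1" fbc:required="false" comp:required="true">
  <model id="m"/>
</sbml>"""


def declared_packages(xml_text):
    """Map package namespace URI -> value of its `required` attribute."""
    root = ET.fromstring(xml_text)
    packages = {}
    for name, value in root.attrib.items():
        # Namespaced attributes appear as "{namespace-uri}local-name".
        if name.startswith("{") and name.endswith("}required"):
            uri = name[1:name.index("}")]
            packages[uri] = value == "true"
    return packages
```

Here `comp` is marked as required and `fbc` is not, so a reader that implements only the core could still interpret this model's core mathematics while ignoring the flux-balance information, but could not be sure of interpreting it correctly without `comp` support.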

Hierarchical Model Composition

The Hierarchical Model Composition package, known as "comp", was released in November 2012. This package provides
the ability to include models as submodels inside another model. The goal is to support the ability
of modelers and software tools to do such things as decompose larger models into smaller ones,
as a way to manage complexity; incorporate multiple instances of a given model within one or more
enclosing models, to avoid literal duplication of repeated elements; and create libraries of reusable,
tested models, much as is done in software development and other engineering fields. The specification was the culmination
of years of discussion by a large number of people.
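The core idea of instantiating one model definition several times can be sketched as flattening: every submodel's elements are copied into the enclosing model under a prefix derived from the submodel instance id, so that repeated instances do not clash. The `Model` class, the `flatten` function and the double-underscore naming scheme are all hypothetical; the actual comp package also provides ports, replacements and deletions for linking and pruning submodel elements, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for a model definition."""
    species: list
    reactions: list  # (reaction id, reactant ids, product ids)
    submodels: dict = field(default_factory=dict)  # instance id -> Model


def flatten(model, prefix=""):
    """Inline every submodel instance, prefixing ids so instances stay distinct."""
    species = [prefix + s for s in model.species]
    reactions = [
        (prefix + rid, [prefix + s for s in reactants], [prefix + s for s in products])
        for rid, reactants, products in model.reactions
    ]
    for instance_id, sub in model.submodels.items():
        # Recurse so that submodels of submodels are flattened too.
        sub_species, sub_reactions = flatten(sub, prefix + instance_id + "__")
        species += sub_species
        reactions += sub_reactions
    return species, reactions
```

A single `gene` definition used as two instances, `g1` and `g2`, inside a `cell` model yields two independent copies of its species and reactions, prefixed `g1__` and `g2__`, without the definition itself being duplicated in the source.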

Flux Balance Constraints

The Flux Balance Constraints package was first released
in February 2013. Important revisions were introduced as part of
Version 2, released in September 2015. The
"fbc" package provides support for constraint-based modeling, frequently used to analyze and study biological networks on
both a small and large scale. This SBML package makes use of
standard components from the SBML Level 3 core specification, including
species and reactions, and extends them with additional attributes and
structures to allow modelers to define such things as flux bounds and
optimization functions.
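Constraint-based models of this kind are most often analyzed with flux balance analysis, which solves a linear program over the reaction fluxes. In the standard formulation below, S is the stoichiometric matrix built from the core species and reactions (rows for species, columns for reactions), v is the vector of reaction fluxes, lb and ub are per-reaction flux bounds, and c weights the fluxes in the objective; the bounds and the objective are the kind of information fbc adds, while the notation is chosen here for illustration rather than taken from the package.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& lb \le v \le ub
\end{aligned}
```

For example, a linear pathway with an uptake reaction v1 (→ A), a conversion v2 (A → B) and an export v3 (B →), each bounded by 0 ≤ v_i ≤ 10, has S = [[1, -1, 0], [0, 1, -1]]. The steady-state condition S v = 0 forces v1 = v2 = v3, so maximizing v3 gives an optimal flux of 10.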

Qualitative Models

The Qualitative Models or "qual" package for SBML Level 3 was
released in May 2013. This package supports the representation of models where an
in-depth knowledge of the biochemical reactions and their kinetics is missing
and a qualitative approach must be used. Examples of phenomena that have
been modeled in this way include gene regulatory networks
and signaling pathways, basing the model structure on
the definition of regulatory or influence graphs. The definition and use of
some components of this class of models differ from the way that species and
reactions are defined and used in core SBML models. For example,
qualitative models typically associate discrete levels of activities with
entity pools; consequently, the processes involving them cannot be described
as reactions per se, but rather as transitions between states. These systems
can be viewed as reactive systems whose dynamics are represented by means of
state transition graphs in
which the nodes are the reachable states and the edges are the state
transitions.
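A minimal example of such a state transition graph can be computed for a Boolean network, the simplest kind of qualitative model. The three-node network below (C represses A, A activates B, B activates C) is hypothetical, and synchronous updating is assumed, so each state has exactly one successor; qualitative models may also use more than two levels and other updating schemes, which this sketch does not cover.

```python
# Hypothetical Boolean rules: each maps the current state to a node's next level.
RULES = {
    "A": lambda s: not s["C"],  # C represses A
    "B": lambda s: s["A"],      # A activates B
    "C": lambda s: s["B"],      # B activates C
}


def successor(state, rules):
    """Synchronous update: all nodes are updated from the same current state."""
    return {node: int(rule(state)) for node, rule in rules.items()}


def transition_graph(initial, rules):
    """Explore all states reachable from `initial`.

    Returns a dict mapping each reachable state (a tuple of levels in the order
    of `rules`) to its successor state, i.e. the edges of the graph.
    """
    order = list(rules)

    def as_tuple(state):
        return tuple(state[node] for node in order)

    edges = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        key = as_tuple(state)
        if key in edges:
            continue  # already explored
        nxt = successor(state, rules)
        edges[key] = as_tuple(nxt)
        pending.append(nxt)
    return edges
```

Starting from all nodes inactive, `transition_graph({"A": 0, "B": 0, "C": 0}, RULES)` visits six states that form a single cycle, a sustained oscillation produced by the negative feedback loop, even though no reaction rates were specified.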

Layout

The SBML layout package originated as a set of annotation conventions
usable in SBML Level 2. It was introduced at the SBML Forum in
St. Louis in 2004. Ralph Gauges wrote the
specification and provided an implementation that
was widely used. This original definition was reformulated as an SBML
Level 3 package, and a specification was formally released in August
2013.
The SBML Level 3 Layout package provides a specification for how to
represent a reaction network in a graphical form. It is thus better tailored
to the task than the use of an arbitrary drawing or graph. The SBML
Level 3 package only deals with the information necessary to define the
position and other aspects of a graph's layout; the additional details
necessary to complete the graph, namely how the visual aspects are meant
to be rendered, are the purview of the separate SBML Level 3
package called Rendering. As of November 2015, a draft
specification for the "render" package is available, but it has not yet been
officially finalized.
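Layout information is essentially geometry: graphical objects for compartments, species and reactions can carry a bounding box given by a position and dimensions. The Python sketch below uses a hypothetical `BoundingBox` class loosely modeled on that idea, assuming (x, y) is the corner with the smallest coordinates, to compute glyph centers (for example, as endpoints of a straight reaction line) and the canvas size needed to show every glyph. Colors, line styles and symbol shapes are deliberately absent, since those belong to the render package.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Hypothetical glyph box: (x, y) is assumed to be the minimum corner."""
    x: float
    y: float
    width: float
    height: float

    def center(self):
        """Midpoint of the box, e.g. where a reaction line could attach."""
        return (self.x + self.width / 2, self.y + self.height / 2)


def canvas_size(glyph_boxes):
    """Smallest (width, height), measured from the origin, containing every glyph."""
    return (
        max(b.x + b.width for b in glyph_boxes.values()),
        max(b.y + b.height for b in glyph_boxes.values()),
    )
```

With two species glyphs `BoundingBox(10, 20, 60, 30)` and `BoundingBox(150, 20, 60, 30)`, a straight reaction line would run between their centers, (40.0, 35.0) and (180.0, 35.0), on a canvas of at least 210 by 50 units.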

Packages under development

Development of SBML Level 3 packages is being undertaken such that specifications are reviewed and implementations
attempted during the development process. Once a specification is stable and there are two implementations that support it,
the package is considered accepted. The packages detailed above have all reached the acceptance stage.
The table below gives a brief summary of packages that are currently in the development phase.
| Package name | Label | Description |
|---|---|---|
| Arrays | arrays | Support for expressing arrays of components |
| Distributions | distrib | Support for encoding models that sample values from statistical distributions or specify statistics associated with numerical values |
| Dynamics | dyn | Support for creating and destroying entities during a simulation |
| Groups | groups | A means for grouping elements |
| Multistate and Multicomponent species | multi | Object structures for representing entity pools with multiple states and composed of multiple components, and reaction rules involving them |
| Rendering | render | Support for defining the graphical symbols and glyphs used in a diagram of the model; adjunct to the layout package |
| Required Elements | req | Support for a fine-grained indication of SBML elements that have been changed by the presence of another package |
| Spatial Processes | spatial | Support for describing processes that involve a spatial component, and describing the geometries involved |

Structure

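A model definition in SBML Levels 2 and 3 consists of lists of one or more of the following components:
- Function definition: a named mathematical function that can be used throughout the rest of the model.
- Unit definition: a named definition of a new unit of measurement.
- Compartment type (Level 2 only): a type of location where reacting entities such as chemical substances may be located.
- Species type (Level 2 only): a type of entity that can participate in reactions.
- Compartment: a well-stirred container of finite size where species may be located.
- Species: a pool of entities of the same kind located in a specific compartment.
- Parameter: a quantity with a symbolic name, defined either for the whole model or locally within a reaction's kinetic law.
- Initial assignment: a mathematical expression used to determine the initial conditions of a model.
- Rule: a mathematical expression added to the set of equations constructed from the model's reactions; rules can define how a variable's value is calculated from other variables, or define its rate of change.
- Constraint: a mathematical condition expected to hold during a simulation, used to detect out-of-bounds conditions.
- Reaction: a statement describing a transformation, transport or binding process that can change the amount of one or more species.
- Event: a statement describing an instantaneous, discontinuous change in a set of variables when a triggering condition is satisfied.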
As of February 2020, nearly 300 software systems advertise support for SBML. A current list is available at SBML.org.
SBML has been and continues to be developed by the community of people making software platforms for systems biology, through active email discussion lists and biannual workshops. The meetings are often held in conjunction with other biology conferences, especially the International Conference on Systems Biology. The community effort is coordinated by an elected editorial board made up of five members. Each editor is elected for a 3-year non-renewable term.
Tools such as an online model validator as well as open-source libraries for incorporating SBML into software programmed in the C, C++, Java, Python, Mathematica, MATLAB and other languages are developed partly by the SBML Team and partly by the broader SBML community.
SBML has an official IETF MIME type, application/sbml+xml, specified by RFC 3823.
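The registered media type lets software label SBML documents when exchanging them, for example in an HTTP `Content-Type` header. The Python sketch below registers it with the standard `mimetypes` module; the `.sbml` file extension is an assumption for illustration, since SBML files are also commonly saved with a plain `.xml` extension.

```python
import mimetypes

# Media type registered for SBML by RFC 3823.
SBML_MEDIA_TYPE = "application/sbml+xml"

# Map a file extension to it; ".sbml" is an assumed, illustrative extension.
mimetypes.add_type(SBML_MEDIA_TYPE, ".sbml")

# A file server could now label model files correctly.
content_type, encoding = mimetypes.guess_type("model.sbml")
```

After registration, `content_type` is `"application/sbml+xml"` and `encoding` is `None`, so a web server built on `mimetypes` would send SBML files with the correct type instead of a generic XML or binary one.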