Chemical table file


Chemical table file (CT file) is a family of text-based chemical file formats that describe molecules and chemical reactions. One format, for example, lists each atom in a molecule, the x-y-z coordinates of that atom, and the bonds among the atoms.

File formats

There are several file formats in the family.
The formats were created by MDL Information Systems, which was acquired by Symyx Technologies and then merged with Accelrys Corp.; the company is now called BIOVIA, a subsidiary of Dassault Systèmes of the Dassault Group.
CT File is an open format; BIOVIA publishes its specification.

Molfile

An MDL Molfile is a file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule.
The molfile consists of some header information, followed by the connection table, which contains the atom information and then the bond connections and types, followed by sections for more complex information.
The molfile is sufficiently common that most, if not all, cheminformatics software systems and applications are able to read the format, though not always to the same degree. It is also supported by some computational software such as Mathematica.
The current de facto standard version is molfile V2000, although the V3000 format has more recently been circulating widely enough to present a potential compatibility issue for applications that are not yet V3000-capable.

| File line | Description | Block |
|---|---|---|
| L-Alanine | Title line | Header block |
| ACD/Labs09071717443D | Program / file timestamp line | Header block |
| Exported from ACD/Labs | Comment line | Header block |
| 6 5 0 0 1 0 3 V2000 | Counts line | Connection table |
| -0.6622 0.5342 0.0000 C 0 0 2 0 0 0 | Atom block: x, y, z, element, etc. | Connection table |
| 0.6622 -0.3000 0.0000 C 0 0 0 0 0 0 | | |
| -0.7207 2.0817 0.0000 C 1 0 0 0 0 0 | | |
| -1.8622 -0.3695 0.0000 N 0 3 0 0 0 0 | | |
| 0.6220 -1.8037 0.0000 O 0 0 0 0 0 0 | | |
| 1.9464 0.4244 0.0000 O 0 5 0 0 0 0 | | |
| 1 2 1 0 0 0 | Bond block: 1st atom, 2nd atom, type, etc. | Connection table |
| 1 3 1 1 0 0 | | |
| 1 4 1 0 0 0 | | |
| 2 5 2 0 0 0 | | |
| 2 6 1 0 0 0 | | |
| M CHG 2 4 1 6 -1 | Properties block | Connection table |
| M ISO 1 3 13 | | |
| M END | END line | END |
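The layout above can be read with a few lines of code. The following is a minimal sketch in Python, not a complete reader: it assumes the three header lines, counts line, atom lines, bond lines and properties appear in the order shown, and it splits fields on whitespace exactly as they are printed in this example. Real V2000 files use fixed-width columns in which adjacent fields may touch, so a production parser would slice by column instead. The function name `parse_molfile_v2000` and the dictionary keys are illustrative only.

```python
def parse_molfile_v2000(text):
    """Parse a whitespace-separated V2000 molfile into a plain dict (sketch)."""
    lines = text.splitlines()
    title, program, comment = lines[0], lines[1], lines[2]   # header block
    counts = lines[3].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    if counts[-1] != "V2000":
        raise ValueError("not a V2000 counts line")

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append({"symbol": symbol,
                      "xyz": (float(x), float(y), float(z)),
                      "charge": 0, "isotope": None})

    bonds = []
    start = 4 + n_atoms
    for line in lines[start:start + n_bonds]:
        a1, a2, order = (int(tok) for tok in line.split()[:3])
        bonds.append((a1, a2, order))

    # Properties block: "M CHG n atom value ..." and "M ISO n atom value ..."
    for line in lines[start + n_bonds:]:
        tok = line.split()
        if tok[:2] == ["M", "END"]:
            break
        if len(tok) > 2 and tok[0] == "M" and tok[1] in ("CHG", "ISO"):
            key = "charge" if tok[1] == "CHG" else "isotope"
            pairs = tok[3:3 + 2 * int(tok[2])]
            for atom_no, value in zip(pairs[0::2], pairs[1::2]):
                atoms[int(atom_no) - 1][key] = int(value)    # atom numbers are 1-based

    return {"title": title.strip(), "program": program.strip(),
            "comment": comment.strip(), "atoms": atoms, "bonds": bonds}


# The L-alanine example from the table, one list item per file line.
ALANINE = "\n".join([
    "L-Alanine",
    " ACD/Labs09071717443D",
    "Exported from ACD/Labs",
    "6 5 0 0 1 0 3 V2000",
    "-0.6622 0.5342 0.0000 C 0 0 2 0 0 0",
    "0.6622 -0.3000 0.0000 C 0 0 0 0 0 0",
    "-0.7207 2.0817 0.0000 C 1 0 0 0 0 0",
    "-1.8622 -0.3695 0.0000 N 0 3 0 0 0 0",
    "0.6220 -1.8037 0.0000 O 0 0 0 0 0 0",
    "1.9464 0.4244 0.0000 O 0 5 0 0 0 0",
    "1 2 1 0 0 0",
    "1 3 1 1 0 0",
    "1 4 1 0 0 0",
    "2 5 2 0 0 0",
    "2 6 1 0 0 0",
    "M CHG 2 4 1 6 -1",
    "M ISO 1 3 13",
    "M END",
])

mol = parse_molfile_v2000(ALANINE)
print(mol["title"], len(mol["atoms"]), "atoms,", len(mol["bonds"]), "bonds")
```

In this sketch the `M CHG` and `M ISO` lines overwrite the per-atom defaults, so the nitrogen (atom 4) ends up with charge +1, the second oxygen (atom 6) with charge -1, and atom 3 with isotope 13.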

Counts line

The original counts line has the following specification.

| Description | Value | Type |
|---|---|---|
| number of atoms | 6 | |
| number of bonds | 6 | |
| number of atom lists | 0 | |
| chiral flag (1 = chiral; 0 = not chiral) | 0 | |
| number of stext entries | 0 | |
| number of lines of additional properties | 1 | |
| mol version | V2000 | |
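In an actual V2000 file the counts line is laid out in fixed-width columns rather than separated by whitespace. The sketch below slices the line by column. It assumes the layout given in the published CTfile specification: eleven 3-character integer fields followed by a 6-character version field, with several of the integer fields obsolete or unused. The function name and the keys of the returned dictionary are illustrative and not part of the format.

```python
def parse_counts_line_v2000(line):
    """Slice a fixed-width V2000 counts line into named fields (sketch)."""
    def field(start, end):
        raw = line[start:end].strip()
        return int(raw) if raw else 0       # blank fields are treated as 0

    return {
        "n_atoms": field(0, 3),
        "n_bonds": field(3, 6),
        "n_atom_lists": field(6, 9),
        "chiral": field(12, 15) == 1,       # columns 10-12 are obsolete and skipped
        "n_stext": field(15, 18),
        "n_properties": field(30, 33),      # commonly written as 999
        "version": line[33:39].strip(),
    }


# Build a correctly padded sample line instead of typing the spaces by hand.
sample = "".join(f"{n:3d}" for n in (6, 5, 0, 0, 1, 0, 0, 0, 0, 0, 999)) + " V2000"
counts = parse_counts_line_v2000(sample)
print(counts["n_atoms"], counts["n_bonds"], counts["chiral"], counts["version"])
```

Slicing by column keeps the parser correct when two neighbouring fields have no space between them, for example a molecule with 100 atoms and 100 bonds.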

The Extended Connection Table (V3000)

The extended molfile consists of a regular molfile “no structure” followed by a single molfile appendix that contains the body of the connection table. The following figure shows both an alanine structure and the extended molfile corresponding to it.
Note that the “no structure” is flagged with the “V3000” instead of the “V2000” version stamp. There are two other changes to the header in addition to the version:
Unlike the V2000 molfile, the V3000 extended Rgroup molfile has the same header format as a non-Rgroup molfile.

| File line | Description | Block |
|---|---|---|
| L-Alanine | Description | Header block |
| GSMACCS-II07189510252D 1 0.00366 0.00000 0 | Header with timestamp | Header block |
| Figure 1, J. Chem. Inf. Comput. Sci., Vol 32, No. 3., 1992 | Comment line | Header block |
| 0 0 0 0 0 999 V3000 | V2000-compatibility line | Header block |
| M V30 BEGIN CTAB | | Connection table |
| M V30 COUNTS 6 5 0 0 1 | Counts line | Connection table |
| M V30 BEGIN ATOM | Atom block | Connection table |
| M V30 1 C -0.6622 0.5342 0 0 CFG=2 | | |
| M V30 2 C 0.6622 -0.3 0 0 | | |
| M V30 3 C -0.7207 2.0817 0 0 MASS=13 | | |
| M V30 4 N -1.8622 -0.3695 0 0 CHG=1 | | |
| M V30 5 O 0.622 -1.8037 0 0 | | |
| M V30 6 O 1.9464 0.4244 0 0 CHG=-1 | | |
| M V30 END ATOM | | |
| M V30 BEGIN BOND | Bond block | Connection table |
| M V30 1 1 1 2 | | |
| M V30 2 1 1 3 CFG=1 | | |
| M V30 3 1 1 4 | | |
| M V30 4 2 2 5 | | |
| M V30 5 1 2 6 | | |
| M V30 END BOND | | |
| M V30 END CTAB | | Connection table |
| M END | | |
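Because every line of the extended connection table starts with `M V30` and blocks are bracketed by `BEGIN` and `END`, the example above lends itself to a small state-machine reader. The following Python sketch is an illustration under simplifying assumptions: it splits on whitespace, treats trailing `KEY=VALUE` tokens as optional properties, and ignores continuation lines and quoted values that the full format allows. The function name `parse_v30_ctab` and the dictionary keys are hypothetical.

```python
def parse_v30_ctab(text):
    """Collect atoms and bonds from the 'M V30' lines of an extended molfile (sketch)."""
    atoms, bonds, block = [], [], None
    for line in text.splitlines():
        tok = line.split()
        if tok[:2] != ["M", "V30"] or len(tok) < 3:
            continue                          # header lines, "M END", blank lines
        body = tok[2:]
        if body[0] in ("BEGIN", "END"):
            # Track which block we are in: CTAB, ATOM, BOND, ...
            block = body[1] if body[0] == "BEGIN" and len(body) > 1 else None
            continue
        # Trailing KEY=VALUE tokens are optional properties such as CHG=1.
        props = dict(t.split("=", 1) for t in body if "=" in t)
        fields = [t for t in body if "=" not in t]
        if block == "ATOM":
            index, symbol, x, y, z = fields[:5]
            atoms.append({"index": int(index), "symbol": symbol,
                          "xyz": (float(x), float(y), float(z)),
                          "props": props})
        elif block == "BOND":
            index, order, a1, a2 = (int(t) for t in fields[:4])
            bonds.append({"index": index, "order": order,
                          "atoms": (a1, a2), "props": props})
    return atoms, bonds


V3000_ALANINE = "\n".join([
    "L-Alanine",
    "GSMACCS-II07189510252D 1 0.00366 0.00000 0",
    "Figure 1, J. Chem. Inf. Comput. Sci., Vol 32, No. 3., 1992",
    "0 0 0 0 0 999 V3000",
    "M V30 BEGIN CTAB",
    "M V30 COUNTS 6 5 0 0 1",
    "M V30 BEGIN ATOM",
    "M V30 1 C -0.6622 0.5342 0 0 CFG=2",
    "M V30 2 C 0.6622 -0.3 0 0",
    "M V30 3 C -0.7207 2.0817 0 0 MASS=13",
    "M V30 4 N -1.8622 -0.3695 0 0 CHG=1",
    "M V30 5 O 0.622 -1.8037 0 0",
    "M V30 6 O 1.9464 0.4244 0 0 CHG=-1",
    "M V30 END ATOM",
    "M V30 BEGIN BOND",
    "M V30 1 1 1 2",
    "M V30 2 1 1 3 CFG=1",
    "M V30 3 1 1 4",
    "M V30 4 2 2 5",
    "M V30 5 1 2 6",
    "M V30 END BOND",
    "M V30 END CTAB",
    "M END",
])

atoms, bonds = parse_v30_ctab(V3000_ALANINE)
print(len(atoms), "atoms,", len(bonds), "bonds")
```

Compared with the V2000 example, charge and isotope information travels on the atom line itself (`CHG=1`, `MASS=13`) rather than in a separate properties block, which is why the sketch keeps a `props` dictionary per atom and bond.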

Counts line

A counts line is required, and must be first. It specifies the number of atoms, bonds, 3D objects, and Sgroups. It also specifies whether or not the CHIRAL flag is set. Optionally, the counts line can specify molregno. This is only used when the regno exceeds 999999. The format of the counts line is:
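Going by the example earlier in this section (`M V30 COUNTS 6 5 0 0 1`), the counts line can be read as five integers followed by optional `KEY=VALUE` items. The Python sketch below assumes the field order used in the published CTfile specification, namely atoms, bonds, Sgroups, 3D objects and chiral flag, with the registry number carried as an optional `REGNO=` item. That order, the `REGNO` key and the names in the returned dictionary are assumptions to check against the specification.

```python
def parse_v30_counts(line):
    """Read a V3000 counts line into named fields (sketch, field order assumed)."""
    tok = line.split()
    if tok[:3] != ["M", "V30", "COUNTS"]:
        raise ValueError("not a V3000 counts line")
    values = [t for t in tok[3:] if "=" not in t]
    options = dict(t.split("=", 1) for t in tok[3:] if "=" in t)
    n_atoms, n_bonds, n_sgroups, n_3d, chiral = (int(v) for v in values[:5])
    return {
        "n_atoms": n_atoms,
        "n_bonds": n_bonds,
        "n_sgroups": n_sgroups,
        "n_3d_objects": n_3d,
        "chiral": chiral == 1,
        # Optional item; None when the line does not carry a registry number.
        "regno": int(options["REGNO"]) if "REGNO" in options else None,
    }


counts = parse_v30_counts("M  V30 COUNTS 6 5 0 0 1")
print(counts["n_atoms"], counts["n_bonds"], counts["chiral"], counts["regno"])
```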

SDF

SDF is one of a family of chemical-data file formats developed by MDL; it is intended especially for structural information. "SDF" stands for structure-data file, and SDF files actually wrap the molfile format. Multiple compounds are delimited by lines consisting of four dollar signs. A feature of the SDF format is its ability to include associated data.
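Associated data items follow the molfile part of each record. Each item consists of a header line beginning with ">", which normally carries the field name in angle brackets, followed by the value on the next line or lines and a terminating blank line. They are denoted as follows: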

>
XCA3464366
>
5.825
>
Sigma
>
499.611

Multi-line data items are also supported. The MDL SDF-format specification requires that a hard carriage return be inserted if a single line of any text field exceeds 200 characters. This requirement is frequently violated in practice, as many SMILES and InChI strings exceed that length.
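The record structure described in this section (a molfile, then data items, then a `$$$$` delimiter) can be sketched as a short reader. The Python code below is a minimal illustration rather than a conforming implementation: it assumes each data item header starts with ">" and carries the field name in angle brackets, that a blank line ends an item, and that the molfile part ends at the `M  END` line. The function names and the field names in the sample (`ID`, `Vendor`, `Note`) are hypothetical.

```python
def read_sdf(text):
    """Split SDF text into (molfile_text, data_dict) records (sketch)."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":            # record delimiter
            records.append(_split_record(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        records.append(_split_record(current))   # tolerate a missing final "$$$$"
    return records


def _split_record(lines):
    # The molfile part ends at the "M  END" line; data items follow it.
    end = next((i for i, l in enumerate(lines) if l.split() == ["M", "END"]),
               len(lines) - 1)
    molfile = "\n".join(lines[:end + 1])
    data, name = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # Header such as "> <Vendor>": take the text inside the angle brackets.
            lo, hi = line.find("<"), line.rfind(">")
            name = line[lo + 1:hi] if 0 <= lo < hi else line[1:].strip()
            data[name] = []
        elif name is not None:
            if line.strip():
                data[name].append(line)       # multi-line values accumulate
            else:
                name = None                   # a blank line terminates the item
    return molfile, {key: "\n".join(value) for key, value in data.items()}


EMPTY_CTAB = ["", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END"]
SAMPLE = "\n".join(
    ["mol-1"] + EMPTY_CTAB +
    ["> <ID>", "XCA3464366", "", "> <Vendor>", "Sigma", "", "$$$$"] +
    ["mol-2"] + EMPTY_CTAB +
    ["> <Note>", "first line", "second line", "", "$$$$"]
)

records = read_sdf(SAMPLE)
for molfile, data in records:
    print(molfile.splitlines()[0], data)
```

The reader keeps values as strings and joins multi-line values with newlines. Because long SMILES or InChI values often exceed the 200-character limit in practice, the sketch deliberately does not reject long lines.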

Other formats of the family

There are other, less commonly used formats of the family: