OpenDocument technical specification
This article describes the technical specifications of the OpenDocument office document standard, as developed by the OASIS industry consortium. A variety of organizations developed the standard publicly and make it publicly accessible, meaning it can be implemented by anyone without restriction. The OpenDocument format aims to provide an open alternative to proprietary document formats.
Document representation
The OpenDocument format supports the following two ways of document representation:
- As a collection of several sub-documents within a package, each of which stores part of the complete document. This is the common representation of OpenDocument documents. It uses filename extensions such as .odt, .ott, .ods, .odp, etc. The package is a standard ZIP file with different filename extensions and with a defined structure of sub-documents. Each sub-document within a package has a different document root and stores a particular aspect of the XML document. All types of documents use the same set of document and sub-document definitions.
- As a single XML document, also known as Flat XML or Uncompressed XML Files. Single OpenDocument XML files are not widely used, and some office software that claims to support ODF does not support them. Filename extensions for single OpenDocument XML documents are not defined in the OpenDocument technical specification, but commonly used ones are .xml, .fodt, .fods, etc.
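The two representations can be told apart programmatically. The following is a minimal Python sketch, not a definitive implementation: it assumes that a package is recognizable as a ZIP archive and that a single-file document has office:document as its root element. The function name detect_representation is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def detect_representation(data: bytes) -> str:
    """Return 'package', 'flat-xml', or 'unknown' for the given file bytes."""
    # A packaged document is an ordinary ZIP archive.
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "package"
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "unknown"
    # Assumption: a single XML document uses office:document as its root.
    if root.tag == f"{{{OFFICE_NS}}}document":
        return "flat-xml"
    return "unknown"


# Build a tiny package in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
print(detect_representation(buffer.getvalue()))
```

The check deliberately ignores the filename extension, since the extension alone does not determine how the content is stored.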
The MIME type is also used in the office:mimetype attribute. It is very important to use this attribute in flat XML files/single XML documents, where this is the only way the type of the document can be detected. Its values are the MIME types that are used for the packaged variant of office documents.
Documents
The most common file extensions used for OpenDocument documents are .odt for text documents, .ods for spreadsheets, .odp for presentations, and .odg for graphics. These are easily remembered by considering ".od" as being short for "OpenDocument", and then noting that the last letter indicates the more specific type.
Here is the complete list of document types, showing the type of file, the recommended file extension, and the MIME type:
File type | Extension | MIME Type | ODF specification |
Text | .odt | application/vnd.oasis.opendocument.text | 1.0 |
Spreadsheet | .ods | application/vnd.oasis.opendocument.spreadsheet | 1.0 |
Presentation | .odp | application/vnd.oasis.opendocument.presentation | 1.0 |
Drawing | .odg | application/vnd.oasis.opendocument.graphics | 1.0 |
Chart | .odc | application/vnd.oasis.opendocument.chart | 1.0 |
Formula | .odf | application/vnd.oasis.opendocument.formula | 1.0 |
Image | .odi | application/vnd.oasis.opendocument.image | 1.0 |
Master Document | .odm | application/vnd.oasis.opendocument.text-master | 1.0 |
Database | .odb | application/vnd.sun.xml.base | not defined in ODF 1.0/1.1 specifications; used in OpenOffice.org 2.x |
Database | .odb | application/vnd.oasis.opendocument.base | ODF 1.2; used in OpenOffice.org 3.x |
Database | .odb | application/vnd.oasis.opendocument.database | defined in |
all OpenDocument single/flat XML files | not defined | text/xml | 1.0 |
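As an illustration of how the office:mimetype attribute identifies a flat XML document, the sketch below reads the attribute from the root element and looks it up in a small table. It is a hedged example only: the mapping covers just four rows of the table above, and the names MIME_TO_TYPE and flat_document_type are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Subset of the MIME types listed in the table above.
MIME_TO_TYPE = {
    "application/vnd.oasis.opendocument.text": "Text",
    "application/vnd.oasis.opendocument.spreadsheet": "Spreadsheet",
    "application/vnd.oasis.opendocument.presentation": "Presentation",
    "application/vnd.oasis.opendocument.graphics": "Drawing",
}


def flat_document_type(xml_bytes: bytes) -> Optional[str]:
    """Return the file type named by office:mimetype, or None if unknown."""
    root = ET.fromstring(xml_bytes)
    mime = root.get(f"{{{OFFICE_NS}}}mimetype")
    return MIME_TO_TYPE.get(mime)


sample = (
    b'<office:document '
    b'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    b'office:mimetype="application/vnd.oasis.opendocument.spreadsheet"/>'
)
print(flat_document_type(sample))
```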
Templates
OpenDocument also supports a set of template types. Templates represent formatting information for documents, without the content itself. The recommended filename extension begins with ".ot", with the last letter indicating what kind of template. The supported set includes:
File type | Extension | MIME Type | ODF specification |
Text | .ott | application/vnd.oasis.opendocument.text-template | 1.0 |
Spreadsheet | .ots | application/vnd.oasis.opendocument.spreadsheet-template | 1.0 |
Presentation | .otp | application/vnd.oasis.opendocument.presentation-template | 1.0 |
Drawing | .otg | application/vnd.oasis.opendocument.graphics-template | 1.0 |
Chart template | .otc | application/vnd.oasis.opendocument.chart-template | 1.0 |
Formula template | .otf | application/vnd.oasis.opendocument.formula-template | 1.0 |
Image template | .oti | application/vnd.oasis.opendocument.image-template | 1.0 |
Web page template | .oth | application/vnd.oasis.opendocument.text-web | 1.0 |
Capabilities
As noted above, the OpenDocument format can describe text documents, spreadsheets, presentations, drawings/graphics, images, charts, mathematical formulas, and "master documents". It can also represent templates for many of them.
The official OpenDocument standard version 1.0 defines OpenDocument's capabilities. The text below provides a brief summary of the format's capabilities.
Metadata
The OpenDocument format supports storing metadata by having a set of pre-defined metadata elements, as well as allowing user-defined and custom metadata.
The format predefines the following metadata fields:
- Generator
- Title
- Description
- Subject
- Keywords
- Initial Creator
- Creator
- Printed By
- Creation Date and Time
- Modification Date and Time
- Print Date and Time
- Document Template
- Automatic Reload
- Hyperlink Behavior
- Language
- Editing Cycles
- Editing Duration
- Document Statistics
Content
and change tracking are all supported. Page sequences and section attributes can be used to control how the text is displayed. Hyperlinks, ruby text, bookmarks, and references are supported as well. Text fields, and mechanisms for automatically generating tables such as tables of contents, indexes, and bibliographies, are included as well.
The OpenDocument format implements spreadsheets as sets of tables. Thus it features extensive capabilities for formatting the display of tables and spreadsheets. OpenDocument also supports database ranges, filters, and "data pilots". Change tracking is available for spreadsheets as well.
The graphics format supports a vector graphic representation, in which a set of layers and the contents of each layer are defined. Available drawing shapes include Rectangle, Line, Polyline, Polygon, Regular Polygon, Path, Circle, Ellipse, and Connector. 3D shapes are also available; the format includes information about the Scene, Light, Cube, Sphere, Extrude, and Rotate. Custom shapes can also be defined.
Presentations are supported. Users can include animations in presentations, with control over the sound, showing a shape or text, hiding a shape or text, or dimming something, and these can be grouped. In OpenDocument, many of the format's capabilities are reused from the text format, simplifying implementations. However, tables are not supported within OpenDocument as drawing objects, so they may only be included in presentations as embedded tables.
Charts define how to create graphical displays from numerical data. They support titles, subtitles, a footer, and a legend to explain the chart. The format defines the series of data that is to be used for the graphical display, and a number of different kinds of graphical displays.
Forms are specially supported, building on the existing XForms standard.
Objects
A document in OpenDocument format can contain two types of objects, as follows:
- Objects that have an OpenDocument representation. These objects are:
- * Formulas
- * Charts
- * Spreadsheets
- * Text documents
- * Drawings
- * Presentations
- Objects that do not have an XML representation. These objects only have a binary representation. OLE objects are an example of this kind of object.
Formatting
The style and formatting controls are numerous, providing a number of controls over the display of information.
Page layout is controlled by a variety of attributes. These include page size, number format, paper tray, print orientation, margins, border, padding, shadow, background, columns, print page order, first page number, scale, table centering, maximum footnote height and separator, and many layout grid properties.
Headers and footers can have defined fixed and minimum heights, margins, border line width, padding, background, shadow, and dynamic spacing.
There are many attributes for specific text, paragraphs, ruby text, sections, tables, columns, lists, and fills. Specific characters can have their fonts, sizes, generic font family names, and other properties set. Paragraphs can have their vertical space controlled through attributes on keep together, widow, and orphan, and have other attributes such as "drop caps" to provide special formatting.
The list is extremely extensive; see the references for details.
Spreadsheet formulas
OpenDocument version 1.2 fully describes mathematical formulas displayable on-screen. It is fully capable of exchanging spreadsheet data, formats, pivot tables, and other information typically included in a spreadsheet. OpenDocument exchanges formulas as values of the attribute table:formula.
The allowed syntax of table:formula was not defined in sufficient detail in the OpenDocument version 1.0 specification, which defined spreadsheet formulas using a set of simple examples showing, for example, how to specify ranges and the SUM function. The OASIS OpenDocument Formula sub group therefore standardized table:formula in the OpenFormula specification. For more information see the OpenFormula article.
Encryption
When an OpenDocument file is password protected, the file structure of the bundle remains the same, but the contents of the XML files in the package are encrypted using the following algorithm:
- The file contents are compressed with the DEFLATE algorithm.
- A checksum of a portion of the compressed file is computed and stored so password correctness can be verified when decrypting.
- A digest of the user entered password in UTF-8 encoding is created and passed to the package component. ODF versions 1.0 and 1.1 only mandate support for the SHA-1 digest here, while version 1.2 recommends SHA-256.
- This digest is used to produce a derived key by undergoing key stretching with PBKDF2 using HMAC-SHA-1 with a salt of arbitrary length generated by the random number generator for an arbitrary iteration count.
- The random number generator is used to generate a random initialization vector for each file.
- The initialization vector and derived key are used to encrypt the compressed file contents. ODF 1.0 and 1.1 use Blowfish in 8-bit cipher feedback mode, while ODF 1.2 considers it a legacy algorithm and allows Triple DES and AES, both in cipher block chaining mode, to be used instead.
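The compression and key-derivation steps above can be sketched with the Python standard library. This is an illustration under stated assumptions rather than a conforming implementation: the salt size, initialization vector size, iteration count, and key length are arbitrary example values, the use of a raw DEFLATE stream is an assumption, and the final cipher step is omitted because the standard library provides neither Blowfish nor AES. The function names are hypothetical.

```python
import hashlib
import os
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress data as a raw DEFLATE stream (no zlib header or trailer)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def derive_key(password: str, salt: bytes, iterations: int,
               key_length: int = 16) -> bytes:
    """Derive a key from a password: SHA-1 digest, then PBKDF2-HMAC-SHA-1."""
    # Digest of the UTF-8 encoded password (SHA-1, as in ODF 1.0 and 1.1).
    start_key = hashlib.sha1(password.encode("utf-8")).digest()
    # Key stretching with PBKDF2 using HMAC-SHA-1.
    return hashlib.pbkdf2_hmac("sha1", start_key, salt, iterations,
                               dklen=key_length)


# Illustrative values only: sizes and iteration count are assumptions.
salt = os.urandom(16)
iv = os.urandom(8)  # one random initialization vector per file
key = derive_key("secret", salt, iterations=1024)
compressed = raw_deflate(b"<office:document-content/>")
print(len(key), len(iv), len(compressed) > 0)
```

The derived key and initialization vector would then be handed to the cipher named in the last step; a third-party cryptography library would be needed for that part.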
Format internals
According to the OpenDocument 1.0 specification, the ZIP file specification is defined in Info-ZIP Application Note 970311, 1997.
The simple compression mechanism used for a package normally makes OpenDocument files significantly smaller than equivalent Microsoft ".doc" or ".ppt" files. This smaller size is important for organizations that store a vast number of documents for long periods of time, and for organizations that must exchange documents over low-bandwidth connections. Once uncompressed, most data is contained in simple text-based XML files, so the uncompressed data contents have the typical ease of modification and processing of XML files. The standard also allows for the creation of a single XML document. In addition, the standard allows the inclusion of directories to store images, non-SMIL animations, and other files that are used by the document but cannot be expressed directly in the XML.
Due to the openly specified compression format used, it is possible for a user to extract the container file to manually edit the contained files. This allows repair of a corrupted file or low-level manipulation of the contents.
The zipped set of files and directories includes the following:
- XML files
- * content.xml
- * meta.xml
- * settings.xml
- * styles.xml
- Other files
- * mimetype
- Directories
- * META-INF/
- ** manifest.xml
- * Thumbnails/
- ** thumbnail.png
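Because the package is an ordinary ZIP archive, its members can be listed with any ZIP library. The sketch below uses Python's zipfile module on a small package built in memory; the sample contents are placeholders and the function name list_package is hypothetical.

```python
import io
import zipfile
from typing import List


def list_package(source) -> List[str]:
    """Return the member names of an OpenDocument package.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return package.namelist()


# Build a placeholder package in memory with the layout described above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    for name in ("content.xml", "meta.xml", "settings.xml", "styles.xml",
                 "META-INF/manifest.xml"):
        package.writestr(name, "<placeholder/>")

buffer.seek(0)
for member in list_package(buffer):
    print(member)
```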
content.xml
content.xml, the most important file, carries the actual content of the document. The base format is inspired by HTML, and though far more complex, it should be reasonably legible to humans:
<text:p text:style-name="Text_body"/>
<text:p text:style-name="Text_body">This is a paragraph. The formatting information is
in the Text_body style. The empty text:p tag above
is a blank paragraph.</text:p>
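To show how such content can be processed, here is a minimal Python sketch that collects the text of every text:p element. The sample document is an assumption written for this example, not taken from a real file, and the function name paragraph_texts is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hand-written sample for illustration only.
SAMPLE_CONTENT = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:p text:style-name="Text_body"/>
      <text:p text:style-name="Text_body">This is a paragraph.</text:p>
    </office:text>
  </office:body>
</office:document-content>"""


def paragraph_texts(content_xml: bytes) -> List[str]:
    """Return the text of each text:p element, in document order."""
    root = ET.fromstring(content_xml)
    # itertext() also gathers text nested in child elements of the paragraph.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


print(paragraph_texts(SAMPLE_CONTENT))
```

An empty paragraph yields an empty string, so blank paragraphs remain visible in the result.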
styles.xml
styles.xml contains style information. OpenDocument makes heavy use of styles for formatting and layout. Most of the style information is here. Style types include:
- Paragraph styles
- Page styles
- Character styles
- Frame styles
- List styles
meta.xml
meta.xml contains the file metadata, for example the author, "Last modified by", and the date of last modification. The contents look somewhat like this:
page-count="59" paragraph-count="676"
image-count="2" word-count="16701"
character-count="98757"/>
The names of the
settings.xml
settings.xml includes settings such as the zoom factor or the cursor position. These are properties that are not content or layout.
mimetype (file)
mimetype is just a one-line file with the MIME type of the document. One implication of this is that the file extension is actually immaterial to the format. The file extension is only there for the benefit of the user.
Thumbnails (directory)
Thumbnails is a separate folder for a document thumbnail. The thumbnail must be saved as “thumbnail.png”. A thumbnail representation of a document should be generated by default when the file is saved. It should be a representation of the first page, first sheet, etc. of the document. The required size for the thumbnails is 128x128 pixels. In order to conform to the thumbnail specification at www.freedesktop.org, thumbnails must be saved as 8-bit, non-interlaced PNG images with full alpha transparency.
META-INF (directory)
META-INF is a separate folder. Information about the files contained in the OpenDocument package is stored in an XML file called the manifest file. The manifest file is always stored at the pathname META-INF/manifest.xml. The main pieces of information stored in the manifest are:
- A list of all of the files in the package.
- The media type of each file in the package.
- If a file stored in the package is encrypted, the information required to decrypt the file is stored in the manifest.
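The first two items can be read from the manifest with a few lines of Python. The sketch below is illustrative: the sample manifest is hand-written for this example, encryption data is ignored, and the function name manifest_entries is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"

# Hand-written sample for illustration only.
SAMPLE_MANIFEST = b"""<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
  <manifest:file-entry manifest:full-path="/"
      manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml"
      manifest:media-type="text/xml"/>
  <manifest:file-entry manifest:full-path="Thumbnails/thumbnail.png"
      manifest:media-type="image/png"/>
</manifest:manifest>"""


def manifest_entries(manifest_xml: bytes) -> Dict[str, Optional[str]]:
    """Map each listed path to its media type (None if not given)."""
    root = ET.fromstring(manifest_xml)
    entries = {}
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return entries


for path, media_type in manifest_entries(SAMPLE_MANIFEST).items():
    print(path, media_type)
```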
Pictures (directory)
Reuse of existing formats
By design, OpenDocument reuses existing open XML standards whenever they are available, and it creates new tags only where no existing standard can provide the needed functionality. Thus OpenDocument uses a subset of Dublin Core for metadata, MathML for displayed formulas, SMIL for multimedia, XLink for hyperlinks, etc.
Although not fully reusing SVG for vector graphics, OpenDocument does use SVG-compatible vector graphics within an ODF-format-specific namespace, but also includes non-SVG graphics.
History
- Version 1.0 became an OASIS Standard on 2005-05-01
- Version 1.1 became an OASIS Standard on 2007-02-07
- Version 1.2 became an OASIS Standard on 2011-09-29
Versions detection
The version of the specification that a document follows is recorded in the office:version attribute. The version number is in the format revision.version. The office:version attribute identifies the version of the ODF specification that defined the associated element, its schema, its complete content, and its interpretation.
ODF 1.0/1.1
If the file has a version known to an XML processor, it may validate the document. Otherwise, it is optional to validate the document, but the document must be well formed. It is not mandatory to use the office:version attribute in ODF 1.0 and ODF 1.1 files.
ODF 1.2
The office:version attribute shall be present in each and every document root element. When an element has office:version="1.1" the element and its content are based on the OpenDocument v1.1 specification. For office:version="1.0" the element and its content are based on the OpenDocument v1.0 specification. When an element has office:version omitted, the element is based on a version of the OpenDocument specification earlier than v1.2. In these cases and in the case of values other than "1.2", the elements do not comprise an OpenDocument 1.2 document.
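The rules above translate into a short check. The following Python sketch reads office:version from a root element and applies the stated interpretation; it is an illustration only, and the function names odf_version and is_odf_1_2 are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def odf_version(xml_bytes: bytes) -> Optional[str]:
    """Return the office:version of the root element, or None if omitted."""
    root = ET.fromstring(xml_bytes)
    return root.get(f"{{{OFFICE_NS}}}version")


def is_odf_1_2(xml_bytes: bytes) -> bool:
    """True only when the root element declares office:version="1.2"."""
    # An omitted attribute, or any other value, means the element does not
    # belong to an OpenDocument 1.2 document.
    return odf_version(xml_bytes) == "1.2"


with_version = (
    b'<office:document-content '
    b'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    b'office:version="1.2"/>'
)
without_version = (
    b'<office:document-content '
    b'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"/>'
)
print(odf_version(with_version), is_odf_1_2(without_version))
```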
Conformance
ODF 1.0/1.1
The OpenDocument specification does not specify which elements and attributes conforming applications must, should, or may support. Even typical office applications may only support a subset of the elements and attributes defined in the specification. The specification contains a non-normative table that provides an overview of which elements and attributes are usually supported by typical office applications.
Documents that conform to the OpenDocument specification may contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within the specification and are called foreign elements and attributes.
Conforming applications either shall read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place, or shall write documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place.
Conforming applications that read and write documents may preserve foreign elements and attributes. In addition to this, conforming applications should preserve meta information and the content of styles.
Conforming applications shall read documents containing processing instructions and should preserve them.
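The idea of removing foreign elements and attributes before validation can be sketched as follows. This is a simplified illustration, not the procedure defined by the specification: the caller supplies the set of namespaces to treat as known, a foreign element is dropped together with all of its content, and unqualified attributes are kept. The function names and the example extension namespace are hypothetical.

```python
import xml.etree.ElementTree as ET


def _namespace(name: str) -> str:
    """Return the namespace URI of a '{uri}local' name, or '' if unqualified."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def strip_foreign(element, known_namespaces) -> None:
    """Remove, in place, attributes and child elements in unknown namespaces."""
    for name in list(element.attrib):
        namespace = _namespace(name)
        if namespace and namespace not in known_namespaces:
            del element.attrib[name]
    for child in list(element):
        if _namespace(child.tag) not in known_namespaces:
            element.remove(child)
        else:
            strip_foreign(child, known_namespaces)


OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
KNOWN = {OFFICE_NS, TEXT_NS}  # illustrative subset only

document = ET.fromstring(
    b'<office:document '
    b'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    b'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    b'xmlns:ext="http://example.com/ext" ext:flag="1">'
    b'<text:p>kept</text:p><ext:note>dropped</ext:note>'
    b'</office:document>'
)
strip_foreign(document, KNOWN)
print([child.tag for child in document])
```

The stripped tree could then be passed to a schema validator, which is outside the scope of this sketch.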