Canonical S-expressions

A Canonical S-expression is a binary encoding form of a subset of general S-expression. It was designed for use in SPKI to retain the power of S-expressions and ensure canonical form for applications such as digital signatures while achieving the compactness of a binary form and maximizing the speed of parsing.
The particular subset of general S-expressions applicable here is composed of atoms, which are byte strings, and parentheses used to delimit lists or sub-lists. These S-expressions are fully recursive.
While S-expressions are typically encoded as text, with spaces delimiting atoms and quotation marks used to surround atoms that contain spaces, when using the canonical encoding each atom is encoded as a length-prefixed byte string. No whitespace separating adjacent elements in a list is permitted. The length of an atom is expressed as an ASCII decimal number followed by a ":".

Example

The sexp

becomes the csexp

No quotation marks are required to escape the space character internal to the atom "Canonical S-expression", because the length prefix clearly points to the end of the atom. There is no whitespace separating an atom from the next element in the list.

Properties

Uniqueness of canonical encoding: Forbidding whitespace between list elements and providing just one way of encoding atoms ensures that every S-expression has exactly one encoded form. Thus, we can decide whether two S-expressions are equivalent by comparing their encodings.
Support for binary data: Atoms can be any binary string. So, a cryptographic hash value or a public key modulus that would otherwise have to be encoded in base64 or some other printable encoding can be expressed in csexp as its binary bytes.
Support for type-tagging encoded information: A csexp includes a non-S-expression construct for indicating the encoding of a string, when that encoding is not obvious. Any atom in csexp can be prefixed by a single atom in square brackets such as "" or "".
Interpretation and restrictions

While csexps generally permit empty lists, empty atoms, and so forth, certain uses of csexps impose additional restrictions. For example, csexps as used in SPKI have one limitation compared to csexps in general: every list must start with an atom, and therefore there can be no empty lists.
Typically, a list's first atom is treated as one treats an element name in XML.

Comparisons to other encodings

There are other encodings in common use:

XML
ASN.1
JSON

Generally, csexp has a parser one or two decimal orders of magnitude smaller than that of either XML or ASN.1. This small size and corresponding speed give csexp its main advantage. In addition to the parsing advantage, there are other differences.

csexp vs. XML

csexp and XML differ in that csexp is a data-representation format, while XML includes a data-representation format and also a schema mechanism. Thus, XML can be "configured" for particular kinds of data, which conform to some grammar. It has languages for defining document grammars: DTD is defined by the XML standard itself, while XSD, RelaxNG, and Schematron are commonly used with XML for additional features, and XML can also work with no schema. csexp data can of course be operated on by schemas implemented at a higher level, but provides no such mechanism itself.
In terms of characters and bytes, a csexp "string" may have any byte sequence whatsoever, while XML, requires alternate representations for a few characters. This, however, has no effect on the range of structures and semantics that can be represented. XML also provides mechanisms to specify how a given byte sequence is intended to be interpreted: Say, as a Unicode UTF-8 string, a JPEG file, or an integer; csexp leaves such distinctions to external mechanisms.
At the most basic level, both csexp and XML represent trees. This is not surprising, since XML can be described as a differently-punctuated form for LISP-like S-expressions, or vice versa.
However, XML includes additional semantics, which are commonly achieved in csexp by various conventions rather than as part of the language. First, every XML element has a name. Second, XML provide datatyping, firstly via the schema grammar. A schema can also, however, distinguish integers, strings, data objects with types and.
An XML element may also have attributes, a construct that csexp does not share. To represent XML data in csexp, one must choose a representation for such attributes; an obvious one is to reserve the second item in each S-expression for a list of pairs, analogous to the LISP association list. The XML ID and IDREF attributes have no equivalent in csexp, but can be easily implemented by a csexp application program.
Finally, an XML element may contain comments and/or processing instructions. csexp has no specific equivalents, but they are trivial to represent, merely by reserving a name for each. For example, naming them "*COM" and "*PI" :

Both csexp and XML are fully recursive.
The first atom in a csexp list, by convention roughly corresponds to an XML element type name in identifying the "type" of the list. However, in csexp this can be any atom in any encoding, while XML element names are identifiers, constrained to certain characters, like programming-language identifiers. csexp's method is obviously more general; on the other hand, Identifying what encoding such an item is in, and thus how to interpret it, is determined only by a particular user's conventions, meaning that a csexp application must build such conventions for itself, in code, documentation, and so forth.
Similarly, csexp atoms are binary, while XML is designed to be human-readable so arbitrary bytes in XML must be encoded somehow. This means that storing large amounts of non-readable information in uncompressed XML takes more space; on the other hand, it will survive translation between alternate character sets.
It has been suggested that XML "merges" a sequence of strings within one element into a single string, while csexp allows a sequence of atoms within a list and those atoms remain separate from one another; but this is incorrect. Exactly like S-expressions and csexp, XML has a notion of a "sequence of strings" only if the "strings" are separated somehow:
<s>String A</s><s>String B</s>
versus
<s>String AString B</s>

versus

versus

csexp vs. ASN.1

ASN.1 is a popular binary encoding form. However, it expresses only syntax, not semantics. Two different structures each a SEQUENCE of two INTEGERS have identical representations on the wire. To parse an ASN.1 structure, one must tell the parser what set of structures one is expecting and the parser must match the data type being parsed against the structure options. This adds to the complexity of an ASN.1 parser.
A csexp structure carries some indication of its own semantics, and the parser for a csexp structure does not care what structure is being parsed. Once a wire-format expression has been parsed into an internal tree form, the consumer of that structure can examine it for conformance to what was expected. An XML document without a schema works just like csexp in this respect, while an XML document with them can work more like ASN.1.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...