Text Encoding Initiative

The Text Encoding Initiative is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list, meetings and conference series, and maintains an eponymous technical standard, a journal, a wiki, a GitHub repository and a toolchain.

TEI guidelines

The TEI Guidelines collectively define a type of XML format, and are the defining output of the community of practice. The format differs from other well-known open formats for text in that it's primarily semantic rather than presentational; the semantics and interpretation of every tag and attribute are specified.
Some 500 different textual components and concepts are specified; each is grounded in one or more academic disciplines, and examples are given.

Technical details

The standard is split into two parts: a discursive textual description with extended examples and discussion, and a set of tag-by-tag definitions. Schemata in most of the modern formats are generated automatically from the tag-by-tag definitions. A number of tools support the production of the guidelines and the application of the guidelines to specific projects.
A number of special tags are used to circumvent restrictions imposed by the underlying Unicode: glyph allows the representation of characters that do not qualify for Unicode inclusion, and choice overcomes the required strict linearity of the text.
Most users of the format do not use the complete range of tags but produce a customization, using a project-specific subset of the tags and attributes defined by the Guidelines. The TEI defines a sophisticated customization mechanism known as ODD for this purpose. In addition to documenting and describing each TEI tag, an ODD specification specifies its content model and other usage constraints, which may be expressed using Schematron.
TEI Lite is an example of such a customization. It defines an XML-based file format for exchanging texts. It is a manageable selection from the extensive set of elements available in the full TEI Guidelines.
As an XML-based format, TEI cannot directly deal with overlapping markup and non-hierarchical structures. The Guidelines suggest a variety of options for representing this sort of data.
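
The overlap restriction can be seen with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree to show that a sentence crossing a verse-line boundary is rejected when both units are encoded as elements, while a version that keeps sentences (s) as elements and marks line beginnings with empty lb milestones parses without error. The sample text and the helper names are illustrative assumptions, and milestones are only one of the approaches the Guidelines discuss.

```python
import xml.etree.ElementTree as ET

# Overlapping markup: the second <s> opens inside the first verse line <l>
# and closes inside the next one, so the elements are not properly nested.
OVERLAPPING = (
    "<lg>"
    "<l>The first sentence ends here. <s>A second sentence runs</l>"
    "<l>over the line break.</s></l>"
    "</lg>"
)

# Milestone approach: sentences form the element hierarchy, and each line
# beginning is recorded with an empty <lb/> element.
MILESTONES = (
    "<p>"
    "<s><lb/>The first sentence ends here.</s> "
    "<s>A second sentence runs<lb/>over the line break.</s>"
    "</p>"
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def count_line_beginnings(xml_text):
    """Count the <lb/> milestones anywhere below the root element."""
    return len(ET.fromstring(xml_text).findall(".//lb"))


if __name__ == "__main__":
    print(is_well_formed(OVERLAPPING))
    print(is_well_formed(MILESTONES))
    print(count_line_beginnings(MILESTONES))
```

The trade-off is that the milestone version no longer has one element per verse line; a processor has to rebuild the lines from the lb positions if it needs them.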

Examples

The text of the TEI guidelines is rich in examples. There is also a samples page on the TEI wiki which gives examples of real-world projects which expose their underlying TEI.

Prose tags

TEI allows texts to be marked up syntactically at any level of granularity, or at a mixture of granularities. For example, the following paragraph has been marked up into sentences and clauses.

It was about the beginning of September, 1664,
that I, among the rest of my neighbours,
heard in ordinary discourse
that the plague was returned again to Holland;

for it had been very violent there, and particularly at
Amsterdam and Rotterdam, in the year 1663,
whither, they say, it was brought,
some said from Italy, others from the Levant, among some goods
which were brought home by their Turkey fleet;

others said it was brought from Candia;
others from Cyprus.

It mattered not from whence it came;

but all agreed it was come into Holland again.
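
A minimal sketch of how such sentence and clause markup might look, and how it could be inspected programmatically. It assumes the TEI s (sentence-like unit) and cl (clause) elements; the particular nesting chosen for the first sentence is an illustrative assumption, and clause_depths is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed encoding of the first sentence: one <s> holding nested <cl> clauses.
SENTENCE = """
<s>
  <cl>It was about the beginning of September, 1664,
    <cl>that I, among the rest of my neighbours,
      heard in ordinary discourse
      <cl>that the plague was returned again to Holland;</cl>
    </cl>
  </cl>
</s>
"""


def clause_depths(xml_text):
    """Return (depth, leading text) for every <cl>, in document order."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(element, depth):
        for child in element:
            if child.tag == "cl":
                # Collapse line breaks and indentation into single spaces.
                leading = " ".join((child.text or "").split())
                found.append((depth, leading))
                walk(child, depth + 1)
            else:
                walk(child, depth)

    walk(root, 1)
    return found


if __name__ == "__main__":
    for depth, leading in clause_depths(SENTENCE):
        print("  " * (depth - 1) + leading)
```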

Verse

TEI has tags for marking up verse. This example shows a sonnet:

Les amoureux fervents et les savants austères
Aiment également, dans leur mûre saison,
Les chats puissants et doux, orgueil de la maison,
Qui comme eux sont frileux et comme eux sédentaires.

Amis de la science et de la volupté
Ils cherchent le silence et l'horreur des ténèbres ;
L'Érèbe les eût pris pour ses coursiers funèbres,
S'ils pouvaient au servage incliner leur fierté.

Ils prennent en songeant les nobles attitudes
Des grands sphinx allongés au fond des solitudes,
Qui semblent s'endormir dans un rêve sans fin ;

Leurs reins féconds sont pleins d'étincelles magiques,
Et des parcelles d'or, ainsi qu'un sable fin,
Étoilent vaguement leurs prunelles mystiques.
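
As a sketch of what the verse tags look like in practice, the first stanza could be encoded with the TEI lg (line group) and l (verse line) elements and read back with Python's standard library. The type value "quatrain" and the helper name verse_lines are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed encoding of the first stanza: one <lg> containing four <l> lines.
QUATRAIN = """
<lg type="quatrain">
  <l>Les amoureux fervents et les savants austères</l>
  <l>Aiment également, dans leur mûre saison,</l>
  <l>Les chats puissants et doux, orgueil de la maison,</l>
  <l>Qui comme eux sont frileux et comme eux sédentaires.</l>
</lg>
"""


def verse_lines(xml_text):
    """Return the text of every <l> element, in document order."""
    root = ET.fromstring(xml_text)
    return [(line.text or "").strip() for line in root.iter("l")]


if __name__ == "__main__":
    for number, line in enumerate(verse_lines(QUATRAIN), start=1):
        print(number, line)
```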

Choice tag

The choice tag is used to represent sections of text which might be encoded or tagged in more than one possible way. In the following example, based on one in the standard, choice is used twice, once to indicate an original and a corrected year and once to indicate an original and regularised spelling.

Lastly, That, upon his solemn oath to observe all the above
articles, the said man-mountain shall have a daily allowance of
meat and drink sufficient for the support of
<choice><sic>1724</sic><corr>1728</corr></choice>
of our subjects,
with free access to our royal person, and other marks of our
<choice><orig>favour</orig><reg>favor</reg></choice>.
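
A minimal sketch of how a processing tool might resolve choice elements into a single reading. It assumes the year is encoded with sic and corr and the spelling with orig and reg, wraps a shortened form of the passage in a p element, and uses two hypothetical view names, "diplomatic" and "reading"; none of these names come from a particular TEI tool.

```python
import xml.etree.ElementTree as ET

# Shortened passage with both alternations encoded as <choice> elements.
PASSAGE = (
    "<p>sufficient for the support of "
    "<choice><sic>1724</sic><corr>1728</corr></choice> of our subjects, "
    "with free access to our royal person, and other marks of our "
    "<choice><orig>favour</orig><reg>favor</reg></choice>.</p>"
)

# Which children of <choice> each (hypothetical) view keeps.
VIEWS = {
    "diplomatic": {"sic", "orig"},
    "reading": {"corr", "reg"},
}


def render(xml_text, view):
    """Flatten the passage to plain text, keeping one branch per <choice>."""
    keep = VIEWS[view]
    root = ET.fromstring(xml_text)

    def text_of(element):
        parts = [element.text or ""]
        for child in element:
            if child.tag == "choice":
                # Emit only the alternatives selected by the current view.
                for alternative in child:
                    if alternative.tag in keep:
                        parts.append(text_of(alternative))
            else:
                parts.append(text_of(child))
            # Text following the child belongs to the parent's content.
            parts.append(child.tail or "")
        return "".join(parts)

    return text_of(root)


if __name__ == "__main__":
    print(render(PASSAGE, "diplomatic"))
    print(render(PASSAGE, "reading"))
```

Because both readings stay in the source, the same file can feed a diplomatic transcription and a normalized reading text without duplicating the passage.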

ODD

One Document Does it all (ODD) is a literate programming language for XML schemas.
In literate-programming style, ODD documents combine human-readable documentation and machine-readable models using the Documentation Elements module of the Text Encoding Initiative. Tools generate localised and internationalised human-readable output in HTML, EPUB, or PDF, and machine-readable output as DTDs, W3C XML Schema, RELAX NG Compact Syntax, or RELAX NG XML Syntax.
The Roma web application is built around the ODD format and can use it to generate schemas in DTD, W3C XML Schema, RELAX NG Compact Syntax, or RELAX NG XML Syntax formats, as used by many XML validation tools and services.
ODD is the format used internally by the Text Encoding Initiative for its eponymous technical standard. Although ODD files generally describe the difference between a customized XML format and the full TEI model, ODD can also be used to describe XML formats that are entirely separate from the TEI. One example of this is the W3C's Internationalization Tag Set, which uses the ODD format to generate schemas and document its vocabulary.
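
As a rough illustration of what an ODD customization contains, the sketch below parses a small schemaSpec fragment that selects four TEI modules and excludes two elements from one of them. The ident value "myProject", the choice of excluded elements, and the summarize helper are hypothetical; a real ODD file is a complete TEI document with documentation wrapped around a specification like this.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical customization: four modules, with two elements left out of core.
ODD_FRAGMENT = f"""
<schemaSpec xmlns="{TEI_NS}" ident="myProject" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="core" except="mentioned said"/>
</schemaSpec>
"""


def summarize(odd_text):
    """Map each referenced module to the list of elements it excludes."""
    root = ET.fromstring(odd_text)
    summary = {}
    for ref in root.iter(f"{{{TEI_NS}}}moduleRef"):
        summary[ref.get("key")] = ref.get("except", "").split()
    return summary


if __name__ == "__main__":
    for module, excluded in summarize(ODD_FRAGMENT).items():
        print(module, excluded)
```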

TEI customizations

TEI customizations are specializations of the TEI XML specification for use in particular fields or by specific communities.

EpiDoc, used for epigraphy and papyrology, is one example.

Customization in the TEI is done through the ODD mechanism described above. In fact, since P5, all 'TEI Conformant' uses of the TEI Guidelines are based on a TEI customization documented in a TEI ODD file. Even when users choose one of the off-the-shelf pre-generated schemas to validate against, these have been created from freely available customization files.

Projects

The format is used by many projects worldwide. Practically all projects are associated with one or more universities. Some well-known projects that encode texts using TEI include:

Project	Strengths
British National Corpus	100 million word snapshot of current English
Oxford Text Archive	>1 GB of linguistic data and electronic texts in 25 languages
Perseus Project	Greek and Latin texts
EpiDoc	Epigraphy and Papyrology
Women Writers Project	Early modern women writers
New Zealand Electronic Text Centre	New Zealand and Pacific Islands texts
The SWORD Project	Bible software, dictionaries, Christian literature
FreeDict	Bilingual dictionaries
Text Creation Partnership	Early English and American books
CELT	Ancient and Medieval Irish Manuscripts
ISTEX	Archives of scientific publications
CAB	An Edition of the Zoroastrian Rituals in the Avestan Language

History

Prior to the creation of TEI, humanities scholars had no common standards for encoding electronic texts in a manner which would serve their academic goals. In 1987, a group of scholars representing fields in humanities, linguistics, and computing convened at Vassar College to put forth a set of guidelines known as the "Poughkeepsie Principles". These guidelines directed the development of the first TEI standard, "P1".

1987 Work on what would become the TEI was started by the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing. This culminated in the closing statement of the Vassar Planning Conference.
1994 TEI P3 released, co-edited by Lou Burnard and Michael Sperberg-McQueen.
1999 TEI P3 updated.
2002 TEI P4 released, moving from SGML to XML and adopting Unicode, which XML parsers are required to support.
2007 TEI P5 released, including integration with the xml:lang and xml:id attributes from the W3C, regularization of local pointing attributes to use the hash (#), and unification of the ptr and xptr tags. Together with many other additions, these changes make P5 more regular and bring it closer to current XML practice as promoted by the W3C and as used by other XML vocabularies. Maintenance and feature update versions of TEI P5 have been released at least twice a year since 2007.
2011 TEI P5 v2.0.1 released with support for genetic editing.
2017 TEI was awarded the Antonio Zampolli Prize from the Alliance of Digital Humanities Organizations.
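
The P5 pointing convention mentioned in the 2007 entry can be sketched briefly: an element is identified with xml:id, and a local pointer refers to it by putting a hash in front of that identifier. The sample fragment, the identifier "p1", and the helper resolve_local_pointers are illustrative assumptions, not taken from the Guidelines.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"
XML_LANG = f"{{{XML_NS}}}lang"

SAMPLE = """
<div>
  <p xml:id="p1" xml:lang="en">A paragraph that can be pointed at.</p>
  <p>See <ptr target="#p1"/> for details.</p>
</div>
"""


def resolve_local_pointers(xml_text):
    """Map each local '#id' target of a <ptr> to the element carrying that xml:id."""
    root = ET.fromstring(xml_text)
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = {}
    for ptr in root.iter("ptr"):
        target = ptr.get("target", "")
        if target.startswith("#"):
            # Strip the hash to obtain the identifier; None if nothing matches.
            resolved[target] = by_id.get(target[1:])
    return resolved


if __name__ == "__main__":
    for target, element in resolve_local_pointers(SAMPLE).items():
        print(target, "->", element.text if element is not None else None)
```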
