SGML entity


In the Standard Generalized Markup Language, an entity is a primitive data type, which associates a string with either a unique alias or an SGML reserved word. Entities are foundational to the organizational structure and definition of SGML documents. The SGML specification defines numerous entity types, which are distinguished by keyword qualifiers and context. An entity string value may variously consist of plain text, SGML tags, and/or references to previously defined entities. Certain entity types may also invoke external documents. Entities are called by reference.

Entity types

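Entities are classified as general or parameter. A general entity is referenced from within the document content, whereas a parameter entity is referenced from within the document type definition (DTD).
Entities are further classified as parsed or unparsed. A parsed entity contains text that is incorporated into the document and parsed when the entity is referenced. An unparsed entity contains data of any kind, such as an image; referencing it does not insert the data into the document but instead notifies the application of the entity and its associated notation.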
An internal entity has a value that is either a literal string, or a parsed string comprising markup and entities defined in the same document. In contrast, an external entity has a declaration that invokes an external document, thereby necessitating the intervention of an entity manager to resolve the external document reference.

System entities

An entity declaration may have a literal value, or may have some combination of an optional SYSTEM identifier, which allows SGML parsers to process an entity's string referent as a resource identifier, and an optional PUBLIC identifier, which identifies the entity independent of any particular representation. In XML, a subset of SGML, an entity declaration may not have a PUBLIC identifier without a SYSTEM identifier.
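The difference between a literal value, a SYSTEM identifier, and a PUBLIC identifier can be observed by asking a parser what each declaration contains. Python's standard library has no SGML parser, so this minimal sketch uses its XML parser instead, relying on XML being a subset of SGML. The document type name example, the entity named logo, its public identifier, and the file names are all hypothetical.

```python
import xml.parsers.expat

# Hypothetical document: one internal entity, one external entity with only a
# SYSTEM identifier, and one external entity with PUBLIC and SYSTEM identifiers.
DOC = """<!DOCTYPE example [
  <!ENTITY greeting1 "Hello world">
  <!ENTITY greeting2 SYSTEM "hello.txt">
  <!ENTITY logo PUBLIC "-//Example//ENTITIES Logo//EN" "logo.ent">
]>
<example/>"""


def declared_entities(document):
    """Return {name: (literal value, system id, public id)} per declaration."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # value is set only for internal entities; the identifiers are set
        # only for external ones.
        found[name] = (value, system_id, public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


entities = declared_entities(DOC)
for name, (value, system_id, public_id) in entities.items():
    print(name, value, system_id, public_id)
```

The parser only reports the external declarations here; it does not fetch hello.txt or logo.ent, which is the job of the entity manager described earlier.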

SGML document entity

When an external entity references a complete SGML document, it is known in the calling document as an SGML document entity. An SGML document is a text document with SGML markup defined in an SGML prologue. A complete SGML document comprises not only the document instance itself, but also the prologue and, optionally, the SGML declaration.

Syntax

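An entity is defined via an entity declaration in a document's document type definition (DTD). For example:

<!ENTITY greeting1 "Hello world">
<!ENTITY greeting2 SYSTEM "hello.txt">
<!ENTITY % greeting3 "¡Hola!">
<!ENTITY greeting4 "%greeting3; means Hello!">

This DTD markup declares the following:
An internal general entity named greeting1, whose value is the string Hello world.
An external general entity named greeting2, whose value is the text of the resource identified by hello.txt.
An internal parameter entity named greeting3, whose value is the string ¡Hola!.
An internal general entity named greeting4, whose value is the string ¡Hola! means Hello!.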
Entity names must follow the rules for SGML names, and there are restrictions on where entities can be referenced.
Parameter entities are referenced by placing the entity name between "%" and ";". Parsed general entities are referenced by placing the entity name between "&" and ";". Unparsed entities are referenced by placing the entity name in the value of an attribute declared as type ENTITY.
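The two reference syntaxes can be illustrated with a toy substitution routine. This is only a sketch of the delimiter rules, not an SGML parser: it assumes names consist of a letter followed by letters, digits, periods, or hyphens, it ignores unparsed and external entities, and the helper name expand and the parameter entity greeting3 are hypothetical.

```python
import re


def expand(text, table, opener):
    """Replace each reference such as %name; or &name; with its value.

    References to names missing from the table are left untouched.
    """
    pattern = re.compile(re.escape(opener) + r"([A-Za-z][A-Za-z0-9.-]*);")
    return pattern.sub(lambda m: table.get(m.group(1), m.group(0)), text)


# Parameter entities are resolved inside the DTD, using "%" and ";".
parameter_entities = {"greeting3": "¡Hola!"}
declared_values = {
    "greeting1": "Hello world",
    "greeting4": "%greeting3; means Hello!",
}
general_entities = {
    name: expand(value, parameter_entities, "%")
    for name, value in declared_values.items()
}

# General entities are resolved in document content, using "&" and ";".
result = expand("In Spanish, &greeting4;", general_entities, "&")
print(result)
```

Resolving the parameter entities first mirrors the order in which a parser encounters them: the DTD is read before any document content.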
The general entities from the example above might be referenced in a document as follows:


'&greeting1;' is a common test string.
The content of hello.txt is: &greeting2;
In Spanish, &greeting4;


When parsed, this document would be reported to the downstream application as if it had been written as follows, assuming that the hello.txt file contains the text Salutations:


'Hello world' is a common test string.
The content of hello.txt is: Salutations
In Spanish, ¡Hola! means Hello!
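The expansion shown above can be reproduced with a real parser. The following is a minimal sketch using Python's xml.etree.ElementTree, again relying on XML as a subset of SGML. It makes several assumptions: the element names content and info are hypothetical, greeting2 is omitted because ElementTree does not retrieve external entities, and greeting4 is declared with its full literal value because XML does not allow a parameter entity reference inside a declaration in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Internal general entities declared in the internal DTD subset.
DOC = """<!DOCTYPE content [
  <!ENTITY greeting1 "Hello world">
  <!ENTITY greeting4 "¡Hola! means Hello!">
]>
<content>
  <info>'&greeting1;' is a common test string.</info>
  <info>In Spanish, &greeting4;</info>
</content>"""

root = ET.fromstring(DOC)

# The application sees only the replacement text, never the references.
lines = [info.text for info in root.iter("info")]
for line in lines:
    print(line)
```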


A reference to an undeclared entity is an error unless a default entity has been defined. For example:

<!ENTITY #DEFAULT "This entity is not defined">

Additional markup constructs and processor options may affect whether and how entities are processed. For example, a processor may optionally ignore external entities.
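The error raised by a reference to an undeclared entity can be seen with Python's xml.etree.ElementTree. This is a sketch under one important assumption: the parser is an XML parser, and XML provides no default entity, so an undeclared reference is always rejected here. The element name d and the entity names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the document parses, False if the parser rejects it."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


declared = '<!DOCTYPE d [<!ENTITY known "ok">]><d>&known;</d>'
undeclared = "<d>&unknown;</d>"

print(is_well_formed(declared))    # the entity is declared in the DTD
print(is_well_formed(undeclared))  # no declaration exists for "unknown"
```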

Character entities

Standard entity sets for SGML and some of its derivatives have been developed as mnemonic devices, to ease document authoring when there is a need to use characters that are not easily typed or that are not widely supported by legacy character encodings. Each such entity consists of just one character from the Universal Character Set. Although any character can be referenced using a numeric character reference, a character entity reference allows characters to be referenced by name instead of code point.
For example, HTML 4 has 252 built-in character entities that do not need to be explicitly declared, while XML has five. XHTML has the same five as XML, but if its DTDs are explicitly used, then it has 253.
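The equivalence between named and numeric references, and the entity counts given above, can be checked with Python's standard library. This is a minimal sketch: it assumes that html.entities.name2codepoint holds the HTML 4 entity set, as the Python documentation describes it, and the element name t is hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The same character referenced by name, by decimal code point, and by
# hexadecimal code point.
named = html.unescape("caf&eacute;")
decimal = html.unescape("caf&#233;")
hexadecimal = html.unescape("caf&#xE9;")

# Number of named character entities in the HTML 4 set.
html4_count = len(name2codepoint)

# The five entities that XML predefines need no declaration.
xml_predefined = ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>").text

print(named, decimal, hexadecimal)
print(html4_count)
print(xml_predefined)
```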