Grapheme


In linguistics, a grapheme is the smallest functional unit of a writing system.
There exist two main opposing grapheme concepts. In the so-called referential conception, graphemes are interpreted as the smallest units of writing that correspond with sounds. In this concept, the ⟨sh⟩ in the written English word shake would be a grapheme because it represents the phoneme /ʃ/. This referential concept is linked to the dependency hypothesis, which claims that writing merely depicts speech. By contrast, the analogical concept defines graphemes analogously to phonemes, i.e. via written minimal pairs such as shake vs. snake. In this example, ⟨h⟩ and ⟨n⟩ are graphemes because they distinguish two words. This analogical concept is associated with the autonomy hypothesis, which holds that writing is a system in its own right and should be studied independently from speech. Both concepts have weaknesses.
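The written minimal-pair test behind the analogical concept can be illustrated with a short sketch. It assumes a simplification not made in the text above: each candidate unit is a single character, and two words form a minimal pair when they have the same length and differ in exactly one position. The function name and the word list are hypothetical.

```python
from itertools import combinations

def written_minimal_pairs(words):
    """Return (word_a, word_b, position, unit_a, unit_b) for every pair of
    equal-length words that differ in exactly one written position."""
    pairs = []
    for a, b in combinations(words, 2):
        if len(a) != len(b):
            continue
        # Positions at which the two spellings disagree.
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diffs) == 1:
            i = diffs[0]
            pairs.append((a, b, i, a[i], b[i]))
    return pairs

pairs = written_minimal_pairs(["shake", "snake", "stake", "ship"])
for a, b, pos, unit_a, unit_b in pairs:
    print(f"{a} / {b}: <{unit_a}> vs <{unit_b}> at position {pos}")
```

For shake and snake the sketch reports ⟨h⟩ and ⟨n⟩ as the contrasting units, matching the example in the text. It says nothing about whether ⟨sh⟩ should count as a unit, which is the point on which the two concepts diverge.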
Some models adhere to both concepts simultaneously by including two individual units, which are given names such as graphemic grapheme for the grapheme according to the analogical conception, and phonological-fit grapheme for the grapheme according to the referential concept.
In newer concepts, in which the grapheme is interpreted semiotically as a dyadic linguistic sign, it is defined as a minimal unit of writing that is both lexically distinctive and corresponds with a linguistic unit.
The word grapheme is derived from Ancient Greek gráphō ('write') and the suffix -eme, by analogy with phoneme and other names of emic units. The study of graphemes is called graphemics.
The concept of graphemes is abstract and similar to the notion in computing of a character. By comparison, a specific shape that represents any particular grapheme in a specific typeface is called a glyph. For example, the grapheme corresponding to the abstract concept of "the Arabic numeral one" has a distinct glyph with identical meaning in each of many typefaces.

Notation

Graphemes are often notated within angle brackets, e.g. ⟨a⟩, ⟨b⟩. This is analogous to both the slash notation used for phonemes (/a/, /b/) and the square bracket notation used for phonetic transcriptions ([a], [b]).

Glyphs

In the same way that the surface forms of phonemes are speech sounds or phones, the surface forms of graphemes are glyphs, namely concrete written representations of symbols. Different glyphs representing the same grapheme are called allographs.
Thus, a grapheme can be regarded as an abstraction of a collection of glyphs that are all functionally equivalent.
For example, in written English, there are two different physical representations of the lowercase Latin letter "a": "a" and "ɑ". Since, however, the substitution of either of them for the other cannot change the meaning of a word, they are considered to be allographs of the same grapheme, which can be written ⟨a⟩. Italic and bold face are also allographic.
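In computing, a loosely comparable folding of variant shapes onto one abstract unit can be seen in Unicode compatibility normalization. The sketch below uses Python's standard `unicodedata` module. The choice of the mathematical bold and italic letters is an illustrative assumption, since ordinary bold and italic text is normally a font change rather than a separate character. The helper name `fold_font_variants` is hypothetical.

```python
import unicodedata

def fold_font_variants(text):
    """Apply NFKC normalization, which replaces compatibility variants
    (such as mathematical bold or italic letters) with their plain forms."""
    return unicodedata.normalize("NFKC", text)

bold_a = "\U0001D41A"    # MATHEMATICAL BOLD SMALL A
italic_a = "\U0001D44E"  # MATHEMATICAL ITALIC SMALL A
script_a = "\u0251"      # LATIN SMALL LETTER ALPHA (single-storey shape)

for ch in (bold_a, italic_a, script_a):
    folded = fold_font_variants(ch)
    print(unicodedata.name(ch), "->", unicodedata.name(folded))
```

The bold and italic variants fold to plain "a", while U+0251 is left unchanged: Unicode encodes it as a separate character because the shape is distinctive in phonetic notation. The character encoding therefore does not line up exactly with the allograph analysis given for written English.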
There is some disagreement as to whether capital and lower case letters are allographs or distinct graphemes. Capitals are generally found in certain triggering contexts that do not change the meaning of a word: a proper name, for example, or at the beginning of a sentence, or all caps in a newspaper headline. In other contexts, capitalization can determine meaning: compare, for example, Polish and polish: the former is a language, the latter is for shining shoes. Some linguists consider digraphs like the ⟨sh⟩ in ship to be distinct graphemes, but these are generally analyzed as sequences of graphemes. Non-stylistic ligatures, however, such as ⟨æ⟩, are distinct graphemes, as are various letters with distinctive diacritics, such as ⟨ç⟩.

Types of grapheme

The principal types of graphemes are logograms, which represent words or morphemes; syllabic characters, representing syllables; and alphabetic letters, corresponding roughly to phonemes.
There are additional graphemic components used in writing, such as punctuation marks, mathematical symbols, word dividers such as the space, and other typographic symbols. Ancient logographic scripts often used silent determinatives to disambiguate the meaning of a neighboring word.

Relationship with phonemes

As mentioned in the previous section, in languages that use alphabetic writing systems, many of the graphemes stand in principle for the phonemes of the language. In practice, however, the orthographies of such languages entail at least a certain amount of deviation from the ideal of exact grapheme–phoneme correspondence. A phoneme may be represented by a multigraph, as the digraph ⟨sh⟩ represents the single sound /ʃ/ in English. Some graphemes may not represent any sound at all, and often the rules of correspondence between graphemes and phonemes become complex or irregular, particularly as a result of historical sound changes that are not necessarily reflected in spelling. "Shallow" orthographies such as those of standard Spanish and Finnish have relatively regular correspondence between graphemes and phonemes, while those of French and English have much less regular correspondence, and are known as deep orthographies.
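As a rough illustration of grapheme–phoneme correspondence with multigraphs, the following sketch segments a spelling by longest match against a small table. The table `TOY_RULES` is an invented toy covering only a few letters, not a description of English orthography, and the function name is hypothetical.

```python
# Toy correspondence table: an illustrative assumption, not real English rules.
TOY_RULES = {"sh": "ʃ", "i": "ɪ", "p": "p", "f": "f", "t": "t"}

def transcribe(word, rules=TOY_RULES):
    """Map a spelling to a list of phoneme symbols, preferring the longest
    matching multigraph at each position."""
    longest = max(len(key) for key in rules)
    phonemes = []
    i = 0
    while i < len(word):
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += len(chunk)
                break
        else:
            # No rule covers the letter at this position.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phonemes

print(transcribe("ship"))
print(transcribe("fish"))
```

A context-free table of this kind can only approximate a shallow orthography. Silent letters and historically motivated spellings, as found in deep orthographies, would need context-sensitive rules or per-word exceptions.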
Multigraphs representing a single phoneme are normally treated as combinations of separate letters, not as graphemes in their own right. However, in some languages a multigraph may be treated as a single unit for the purposes of collation; for example, in a Czech dictionary, the section for words that start with ⟨ch⟩ comes after that for ⟨h⟩.
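The Czech collation rule can be sketched as a sort key that treats ⟨ch⟩ as one unit ranked after ⟨h⟩. This is a minimal sketch under stated assumptions: only unaccented letters are handled, the accented letters of the Czech alphabet are left out for brevity, and the names `ORDER`, `collation_units` and `czech_like_key` are hypothetical. Real software would normally rely on a locale-aware collation library.

```python
# Simplified ordering: unaccented letters only, with "ch" placed after "h".
# Assumption: accented Czech letters are omitted to keep the sketch short.
ORDER = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {unit: position for position, unit in enumerate(ORDER)}

def collation_units(word):
    """Split a word into collation units, treating "ch" as a single unit."""
    word = word.lower()
    units = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            units.append("ch")
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units

def czech_like_key(word):
    """Sort key: the rank of each collation unit in ORDER."""
    return [RANK[unit] for unit in collation_units(word)]

words = ["hrad", "chata", "cesta", "ihned"]
print(sorted(words))                      # plain code-point order
print(sorted(words, key=czech_like_key))  # "ch" sorted after "h"
```

With plain code-point ordering, chata sorts between cesta and hrad. With the key above it moves after hrad, in line with the dictionary convention described in the text.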