Khmer script


The Khmer script is an abugida script used to write the Khmer language. It is also used to write Pali in the Buddhist liturgy of Cambodia and Thailand.
The Khmer script was adapted from the Pallava script, which was used in southern India and Southeast Asia during the 5th and 6th centuries AD and ultimately descends from the Brahmi script. The oldest dated inscription in Khmer, found in Angkor Borei District in Takéo Province south of Phnom Penh, dates from 611. The modern Khmer script differs somewhat from the earlier forms seen in the inscriptions at the ruins of Angkor. The Thai and Lao scripts are descended from an older form of the Khmer script.
Khmer is written from left to right. Words within the same sentence or phrase are generally run together with no spaces between them. Consonant clusters within a word are "stacked", with the second consonant being written in reduced form under the main consonant. Originally there were 35 consonant characters, but modern Khmer uses only 33. Each character represents a consonant sound together with an inherent vowel, either â or ô; in many cases, in the absence of another vowel mark, the inherent vowel is to be pronounced after the consonant.
There are some independent vowel characters, but vowel sounds are more commonly represented as dependent vowels: additional marks that accompany a consonant character and indicate what vowel sound is to be pronounced after that consonant. Most dependent vowels have two different pronunciations, depending in most cases on the inherent vowel of the consonant to which they are added. There are also a number of diacritics used to indicate further modifications in pronunciation. The script also includes its own numerals and punctuation marks.

Consonants

There are 35 Khmer consonant symbols, although modern Khmer uses only 33, two having become obsolete. Each consonant has an inherent vowel, â or ô; equivalently, each consonant is said to belong to the a-series or the o-series. A consonant's series determines the pronunciation of the dependent vowel symbols that may be attached to it, and in some positions the inherent vowel itself is pronounced. The two series originally represented voiceless and voiced consonants respectively; sound changes during the Middle Khmer period affected vowels following voiceless consonants, and these changes were preserved even though the distinctive voicing was lost.
Each consonant, with one exception, also has a subscript form. These may also be called "sub-consonants"; the Khmer phrase is ជើងអក្សរ cheung âksâr, meaning "foot of a letter". Most subscript consonants resemble the corresponding consonant symbol, but in a smaller and possibly simplified form, although in a few cases there is no obvious resemblance. Most subscript consonants are written directly below other consonants, although subscript r appears to the left, while a few others have ascending elements which appear to the right. Subscripts are used in writing consonant clusters. Clusters in Khmer normally consist of two consonants, although occasionally in the middle of a word there will be three. The first consonant in a cluster is written using the main consonant symbol, with the second attached to it in subscript form. Subscripts were previously also used to write final consonants; in modern Khmer this may be done, optionally, in some words ending -ng or -y, such as ឲ្យ aôy.
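In Unicode text, a subscript is not stored as a separate letter: the main consonant is followed by the invisible sign U+17D2 (KHMER SIGN COENG) and then by the ordinary consonant that is to be drawn in subscript form. The minimal sketch below shows that encoding; the helper names `stack` and `split_cluster` are hypothetical. The renderer, not the stored text, decides where a subscript appears, so subscript r is still stored after its main consonant even though it is drawn to the left.

```python
COENG = "\u17D2"  # KHMER SIGN COENG: the next consonant is written in subscript form


def stack(*letters: str) -> str:
    """Encode a cluster: the first letter in full form, each later one as a subscript.

    Hypothetical helper for illustration; it does not check that the letters are Khmer.
    """
    if not letters:
        raise ValueError("a cluster needs at least one letter")
    return COENG.join(letters)


def split_cluster(text: str) -> list[str]:
    """Return the letters of an encoded cluster in order, dropping the COENG signs."""
    return [ch for ch in text if ch != COENG]


# ប with subscript រ: drawn to the LEFT of ប, but stored after it
pr = stack("\u1794", "\u179A")
# ឲ with subscript យ, as in the word ឲ្យ aôy
aoy = stack("\u17B2", "\u1799")

for s in (pr, aoy):
    print(s, [f"U+{ord(c):04X}" for c in s])
```

Because the subscript is stored in the order in which it is spoken, text processing can recover the cluster simply by removing the COENG signs.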
The consonants and their subscript forms are listed in the following table. Usual phonetic values are given using the International Phonetic Alphabet; variations are described below the table. The sound system is described in detail at Khmer phonology. The spoken name of each consonant letter is its value together with its inherent vowel. Transliterations are given using the UNGEGN system; for other systems see Romanization of Khmer.
The letter appears in somewhat modified form when combined with certain dependent vowels.
The letter ញ nhô is written without the lower curve when a subscript is added. When it is subscripted to itself, the subscript is a smaller form of the entire letter: ញ្ញ -nhnh-.
Note that ដ and ត have the same subscript form. In initial clusters this subscript is always pronounced like ដ, but in medial positions it is pronounced like ដ in some words and like ត in others.
The series ដ dâ, ឋ thâ, ឌ dô, ឍ thô, ណ nâ originally represented retroflex consonants in the Indic parent scripts. The second, third and fourth of these are rare, and occur only for etymological reasons in a few Pali and Sanskrit loanwords. Because the sound /n/ is common, and often grammatically productive, in Mon-Khmer languages, the fifth of this group, ណ, was adapted as an a-series counterpart of ន for convenience.

Variation in pronunciation

The aspirated consonant letters are pronounced with aspiration only before a vowel. There is also slight aspiration with k, ch, t and p sounds before certain consonants, but this occurs regardless of whether they are spelt with a letter that indicates aspiration.
A Khmer word cannot end with more than one consonant sound, so subscript consonants at the end of words are not pronounced, although they may come to be pronounced when the same word begins a compound.
In some words, a single medial consonant symbol represents both the final consonant of one syllable and the initial consonant of the next.
The letter ប represents /ɓ/ only before a vowel. When final or followed by a subscript consonant, it is pronounced /p/. For modification to p by means of a diacritic, see Supplementary consonants. The letter, which represented /p/ in Indic scripts, also often keeps the /p/ sound in certain words borrowed from Sanskrit and Pali.
The letters ដ and ឌ are pronounced /t/ when final. The letter ត is pronounced like ដ in initial position in a weak syllable ending with a nasal.
In final position, letters representing a sound are pronounced as a glottal stop after certain vowels. The letter រ is silent when final. The letter ស when final is pronounced /h/.
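The final-position rules above can be expressed as a lookup applied after silent trailing subscripts are discarded. The sketch below is illustrative only: `final_sound` and `FINAL_SOUNDS` are hypothetical names, the table covers just the letters whose final values are stated in this section (it omits the glottal-stop rule), and it assumes the input ends in a consonant rather than a vowel.

```python
from typing import Optional

COENG = "\u17D2"   # marks the next consonant as a subscript
BANTAK = "\u17CB"  # shortens the vowel; it does not change the final consonant

# Only the final-position values described above; any other letter returns None.
FINAL_SOUNDS = {
    "\u1794": "p",  # ប: /ɓ/ before a vowel, /p/ when final
    "\u178A": "t",  # ដ
    "\u178C": "t",  # ឌ
    "\u179F": "h",  # ស
    "\u179A": "",   # រ: silent when final
}


def final_sound(word: str) -> Optional[str]:
    """Look up the final consonant sound of a word that ends in a consonant.

    Vowels and other diacritics are not analysed.
    """
    chars = [c for c in word if c != BANTAK]
    # A word cannot end in more than one consonant sound, so trailing
    # COENG + consonant pairs are dropped as silent.
    while len(chars) >= 2 and chars[-2] == COENG:
        del chars[-2:]
    if not chars:
        return None
    return FINAL_SOUNDS.get(chars[-1])


print(final_sound("\u1785\u17B6\u179F\u17CB"))  # ចាស់ (chas, "old") -> h
```

Stripping the subscripts first mirrors the rule that subscript consonants at the end of a word are not pronounced.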

Supplementary consonants

The Khmer writing system includes supplementary consonants, used in certain loanwords, particularly from French and Thai. These mostly represent sounds that do not occur in native words, or for which the native letters are restricted to one of the two vowel series. Most of them are digraphs, formed by stacking a subscript under the letter ហ hâ, with an additional treisâpt diacritic if required to change the inherent vowel to ô. The character for p (ប៉), however, is formed by placing the musĕkâtônd diacritic over the character ប bâ.
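The construction of supplementary consonants can be sketched as string building over Unicode code points. This assumes the usual Unicode storage order (base letter, COENG, subscript, then the vowel-shifting sign); the helper `supplementary` is hypothetical and does not check whether a given combination is an attested letter.

```python
HA = "\u17A0"          # ហ hâ
BA = "\u1794"          # ប bâ
COENG = "\u17D2"       # next consonant is written as a subscript
MUSEKATOND = "\u17C9"  # musĕkâtônd (Unicode name: KHMER SIGN MUUSIKATOAN)
TREISAPT = "\u17CA"    # treisâpt (Unicode name: KHMER SIGN TRIISAP)


def supplementary(consonant: str, o_series: bool = False) -> str:
    """Build a ហ-based digraph: ហ with `consonant` stacked under it.

    If o_series is True, the treisâpt is appended to shift the inherent vowel to ô.
    """
    digraph = HA + COENG + consonant
    if o_series:
        digraph += TREISAPT
    return digraph


PA = BA + MUSEKATOND  # ប៉, the supplementary letter for p

print(PA, [f"U+{ord(c):04X}" for c in PA])
print(supplementary("\u179C"), supplementary("\u179C", o_series=True))  # ហ្វ ហ្វ៊
```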

Dependent vowels

Most Khmer vowel sounds are written using dependent, or diacritical, vowel symbols, known in Khmer as ស្រៈនិស្ស័យ srăk nissăy or ស្រៈផ្សំ srăk phsâm. These can only be written in combination with a consonant. The vowel is pronounced after the consonant, even though some of the symbols have graphical elements that appear above, below or to the left of the consonant character. Most of the vowel symbols have two possible pronunciations, depending on the inherent vowel of the consonant to which they are added. Their pronunciations may also differ in weak syllables and when they are shortened. The absence of a dependent vowel often implies that a syllable-initial consonant is followed by the sound of its inherent vowel.
In determining the inherent vowel of a consonant cluster, stops and fricatives are dominant over sonorants. For any consonant cluster including a combination of these sounds, a following dependent vowel is pronounced according to the dominant consonant, regardless of its position in the cluster. When both members of a cluster are dominant, the subscript consonant determines the pronunciation of a following dependent vowel. A non-dominant consonant will also have its inherent vowel changed by a preceding dominant consonant in the same word, even when there is a vowel between them, although some words do not obey this rule.
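The dominance rule for clusters can be sketched as a small function. The letter table below is a hypothetical sample of seven consonants (a full version would cover all 33), and two behaviours are assumptions not stated above: clusters with no stop or fricative fall back to the main consonant, and in clusters of three the last dominant letter decides. The output applies the rule to the dependent vowel ា, whose UNGEGN value is a in the a-series and éa in the o-series, as in ស្ពាន spéan.

```python
COENG = "\u17D2"

# Hypothetical sample: letter -> (series, dominant?), where "dominant" means a stop
# or fricative and non-dominant means a sonorant.
LETTERS = {
    "\u1780": ("a", True),   # ក kâ, stop
    "\u1794": ("a", True),   # ប bâ, stop
    "\u179F": ("a", True),   # ស sâ, fricative
    "\u1796": ("o", True),   # ព pô, stop
    "\u1793": ("o", False),  # ន nô, sonorant
    "\u1798": ("o", False),  # ម mô, sonorant
    "\u179B": ("o", False),  # ល lô, sonorant
}

AA = "\u17B6"                      # dependent vowel ា
AA_READING = {"a": "a", "o": "éa"}  # UNGEGN value of ា in each series


def cluster_series(cluster: str) -> str:
    letters = [c for c in cluster if c != COENG]
    dominant = [c for c in letters if LETTERS[c][1]]
    if not dominant:
        # Not covered by the rule above; using the main consonant is an assumption.
        return LETTERS[letters[0]][0]
    # A single dominant letter decides wherever it stands; when two are dominant
    # the subscript (the later one) decides. Taking the last one generalises this.
    return LETTERS[dominant[-1]][0]


for cluster in ("\u179F\u17D2\u1798", "\u179F\u17D2\u1796", "\u179B\u17D2\u1794"):
    series = cluster_series(cluster)
    print(cluster + AA, series, AA_READING[series])
```

In ស្ម the fricative ស outranks the sonorant ម, in ស្ព both are dominant so the subscript ព decides, and in ល្ប the subscript stop ប decides even though it is not the main consonant.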
The dependent vowels are listed below, in conventional form with a dotted circle as a dummy consonant symbol, and in combination with the a-series letter អ ’â. The IPA values given are representative of dialects from the northwest and central plains regions, specifically from the Battambang area, upon which Standard Khmer is based. Vowel pronunciation varies widely in other dialects such as Northern Khmer, where diphthongs are leveled, and Western Khmer, in which breathy voice and modal voice phonations are still contrastive.
The spoken name of each dependent vowel consists of the word ស្រៈ srăk followed by the vowel's a-series value preceded by a glottal stop.

Modification by diacritics

The addition of some of the Khmer diacritics can modify the length and value of inherent or dependent vowels.
The following table shows combinations with the nĭkkôhĕt and reăhmŭkh diacritics, representing final /m/ and /h/ respectively. They are shown with the a-series consonant អ ’â.
The first four configurations listed here are treated as dependent vowels in their own right, and have names constructed in the same way as for the other dependent vowels.
Other rarer configurations with the reăhmŭkh are អើះ, pronounced, and អែះ, pronounced. The word ចា៎ះ "yes" is pronounced.
The bântăk has the following effects:
The sanhyoŭk sannha is equivalent to the a dependent vowel with the bântăk. However, its o-series pronunciation becomes before final y, and before final r.
The yŭkôleăkpĭntŭ represents or , followed by a glottal stop.

Consonants with no dependent vowel

There are three environments in which a consonant may appear without a dependent vowel: as the initial consonant of a weak syllable, as the initial consonant of a strong syllable, or as the final letter of a written word. The rules governing the inherent vowel differ in each of these environments.
In careful speech, initial consonants without a dependent vowel in weak initial syllables are pronounced with their inherent vowel shortened, as if modified by the bântăk diacritic. For example, the first-series letter "ច" in "ចន្លុះ" is pronounced with the short vowel. The second-series letter "ព" in "ពន្លឺ" is pronounced with the short diphthong. In casual speech, both series are most often reduced to the same vowel.
Initial consonants in strong syllables without written vowels are pronounced with their inherent vowels. The word ចង is pronounced, ជត is pronounced. In some words, however, the inherent vowel is pronounced in its reduced form, as if modified by a bântăk diacritic, even though the diacritic is not written. Such reduction regularly takes place in words ending with a consonant with a silent subscript, although in most such words it is the bântăk-reduced form of the vowel a that is heard, as in សព្ទ "noise". The word អ្នក "you, person" has the highly irregular pronunciation.
Consonants written as the final letter of a word usually represent a word-final sound and are pronounced without any following vowel and, in the case of stops, with no audible release as in the examples above. However, in some words adopted from Pali and Sanskrit, what would appear to be a final consonant under normal rules can actually be the initial consonant of a following syllable and pronounced with a short vowel as if followed by ាក់. For example, according to rules for native Khmer words, សុភ would appear to be a single syllable, but, being derived from Pali subha, it is pronounced.

Ligatures

Most consonants, including a few of the subscripts, form ligatures with the vowel a and with all other dependent vowels that contain the same cane-like symbol. Most of these ligatures are easily recognizable, but a few may not be, particularly those involving the letter ប bâ. This combines with the a vowel in the form បា, created to differentiate it from the consonant symbol ហ hâ and also from the ligature for ច châ with a.
Some more examples of ligatured symbols follow:

Independent vowels

Independent vowels are non-diacritical vowel characters that stand alone. In Khmer they are called ស្រៈពេញតួ srăk pénhtuŏ, which means "complete vowels". They are used in some words to represent certain combinations of a vowel with an initial glottal stop or liquid. The independent vowels are used in a small number of words, mostly of Indic origin, and consequently there is some inconsistency in their use and pronunciations. However, a few words in which they occur are used quite frequently; these include: ឥឡូវ "now", ឪពុក "father", ឬ "or", ឮ "hear", ឲ្យ "give, let", ឯង "oneself, I, you", ឯណា "where".
Independent vowel letters are named similarly to the dependent vowels, with the word ស្រៈ srăk followed by the principal sound of the letter, followed by an additional glottal stop after a short vowel. However the letter ឥ is called.

Diacritics

The Khmer writing system contains several diacritics, used to indicate further modifications in pronunciation.
និគ្គហិត nĭkkôhĕt: The Pali niggahīta, related to the anusvara. A small circle written over a consonant or a following dependent vowel, it nasalizes the inherent or dependent vowel, with the addition of final /m/; long vowels are also shortened. For details see Modification by diacritics.
រះមុខ reăhmŭkh ("shining face"): Related to the visarga. A pair of small circles written after a consonant or a following dependent vowel, it modifies and adds final aspiration to the inherent or dependent vowel. For details see Modification by diacritics.
យុគលពិន្ទុ yŭkôleăkpĭntŭ: A "pair of dots", a fairly recently introduced diacritic, written after a consonant to indicate that it is to be followed by a short vowel and a glottal stop. See Modification by diacritics.
មូសិកទន្ត musĕkâtônd ("mouse teeth"): Two short vertical lines, written above a consonant, used to convert some o-series consonants to a-series. It is also used with ប to convert it to a p sound.
ត្រីសព្ទ treisâpt: A wavy line, written above a consonant, used to convert some a-series consonants to o-series.
ក្បៀសក្រោម kbiĕh kraôm: Also known as បុកជើង bŏkcheung; a vertical line written under a consonant, used in place of the diacritics treisâpt and musĕkâtônd when they would be impeded by superscript vowels.
បន្តក់ bântăk: A small vertical line written over the last consonant of a syllable, indicating shortening of certain vowels. See Modification by diacritics.
របាទ rôbat or រេផៈ répheăk: This superscript diacritic occurs in Sanskrit loanwords and corresponds to the Devanagari diacritic repha. It originally represented an r sound. Now, in most cases, the consonant above which it appears, and the diacritic itself, are unpronounced. Examples: ធម៌, កាណ៌, សួគ៌ា.
ទណ្ឌឃាដ tôndâkhéat: Written over a final consonant to indicate that it is unpronounced.
កាកបាទ kakâbat: Also known as a "crow's foot"; used in writing to indicate the rising intonation of an exclamation or interjection. It is often placed on certain particles, and on ចា៎ះ, a word for "yes" used by females.
អស្តា âsda ("number eight"): Used in a few words to show that a consonant with no dependent vowel is to be pronounced with its inherent vowel, rather than as a final consonant.
សំយោគសញ្ញា sanhyoŭk sannha: Used in some Sanskrit and Pali loanwords; it is written above a consonant to indicate that the syllable contains a particular short vowel; see Modification by diacritics.
វិរាម vĭréam: A mostly obsolete diacritic, corresponding to the virama, which suppresses a consonant's inherent vowel.
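In Unicode, the diacritics described above appear to occupy U+17C6 to U+17D1 in the same order, with kbiĕh kraôm having no entry of its own in this mapping. The sketch below pairs the Khmer names with those code points (the pairing is an assumption based on that order) and prints the official character names using Python's standard `unicodedata` module, which makes the correspondence easy to check.

```python
import unicodedata

# Khmer names paired with the code points assumed to encode them (U+17C6-U+17D1).
DIACRITICS = {
    "nĭkkôhĕt": "\u17C6",
    "reăhmŭkh": "\u17C7",
    "yŭkôleăkpĭntŭ": "\u17C8",
    "musĕkâtônd": "\u17C9",
    "treisâpt": "\u17CA",
    "bântăk": "\u17CB",
    "rôbat": "\u17CC",
    "tôndâkhéat": "\u17CD",
    "kakâbat": "\u17CE",
    "âsda": "\u17CF",
    "sanhyoŭk sannha": "\u17D0",
    "vĭréam": "\u17D1",
}

for name, sign in DIACRITICS.items():
    # unicodedata.name returns the official Unicode character name
    print(f"{name:<16} U+{ord(sign):04X}  {unicodedata.name(sign)}")
```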

Dictionary order

For the purpose of dictionary ordering of words, main consonants, subscript consonants and dependent vowels are all significant; and when they appear in combination, they are considered in the order in which they would be spoken. The order of the consonants and of the dependent vowels is the order in which they appear in the above tables. A syllable written without any dependent vowel is treated as if it contained a vowel character that precedes all the visible dependent vowels.
As mentioned above, the four configurations with diacritics exemplified in the syllables អុំ អំ អាំ អះ are treated as dependent vowels in their own right, and come in that order at the end of the list of dependent vowels. Other configurations with the reăhmŭkh diacritic are ordered as if that diacritic were a final consonant coming after all other consonants. Words with the bântăk and sanhyoŭk sannha diacritics are ordered directly after identically spelled words without the diacritics.
Vowels precede consonants in the ordering, so a combination of main and subscript consonants comes after any instance in which the same main consonant appears unsubscripted before a vowel.
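Two of the ordering principles above, that a syllable with no written vowel sorts before any dependent vowel and that vowels sort before consonants (so subscripts, taken in spoken order, come after vowels), can be sketched as a sort key. This is a simplified sketch for a single written syllable; it assumes that Unicode code point order matches the traditional order of the consonants and of the 16 dependent vowels, and it rejects other diacritics instead of handling them. The names `rank` and `syllable_key` are hypothetical.

```python
COENG = "\u17D2"
IMPLICIT_VOWEL = 0  # the "no written vowel" slot sorts before every dependent vowel


def rank(ch: str) -> int:
    """Assumed ranks: code point order is taken as the dictionary order."""
    cp = ord(ch)
    if 0x17B6 <= cp <= 0x17C5:    # the 16 dependent vowel signs
        return 1 + cp - 0x17B6
    if 0x1780 <= cp <= 0x17A2:    # the 35 consonant letters
        return 100 + cp - 0x1780  # every consonant sorts after every vowel
    raise ValueError(f"unsupported character U+{cp:04X}")


def syllable_key(syllable: str) -> list[int]:
    """Sort key for ONE non-empty written syllable: onset, vowel slot, then finals."""
    chars = list(syllable)
    key = [rank(chars[0])]
    i = 1
    # Subscripts are considered in spoken order, right after the main consonant.
    while i + 1 < len(chars) and chars[i] == COENG:
        key.append(rank(chars[i + 1]))
        i += 2
    if i < len(chars) and 0x17B6 <= ord(chars[i]) <= 0x17C5:
        key.append(rank(chars[i]))
        i += 1
    else:
        key.append(IMPLICIT_VOWEL)
    key.extend(rank(c) for c in chars[i:] if c != COENG)
    return key


words = ["\u1780\u17D2\u1780", "\u1780\u17B6", "\u1780\u1780", "\u1780"]
print(sorted(words, key=syllable_key))  # ក, កក, កា, ក្ក
```

The cluster ក្ក sorts after កា because its second position holds a consonant, while កក sorts before កា because its unwritten vowel occupies the lowest vowel slot.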
Words spelled with an independent vowel whose sound begins with a glottal stop follow after words spelled with the equivalent combination of អ ’â plus dependent vowel. Words spelled with an independent vowel whose sound begins with r or l follow after all words beginning with the consonants រ and ល respectively.
Words spelled with a consonant modified by a diacritic follow words spelled with the same consonant and dependent vowel symbol but without the diacritic. However, words spelled with ប៉ follow all words with unmodified ប. Sometimes words in which ប is pronounced p are ordered as if the letter were written ប៉.

Numerals

The numerals of the Khmer script, similar to those used by other civilizations in Southeast Asia, are also derived from the southern Indian script. Western-style Arabic numerals are also used, but to a lesser extent.
Khmer numerals: ០ ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩
Arabic numerals: 0 1 2 3 4 5 6 7 8 9

In large numbers, groups of three digits are delimited with Western-style periods, and the decimal point is represented by a comma. The Cambodian currency, the riel, is abbreviated using the symbol ៛ or simply the letter រ.
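The number conventions above amount to swapping the Western separators and optionally replacing each digit with its Khmer counterpart (U+17E0 to U+17E9). The function `format_khmer_number` below is a hypothetical sketch; it leaves out the riel sign because its placement relative to the number is not described here.

```python
# Map "0".."9" to the Khmer digits U+17E0..U+17E9.
KHMER_DIGITS = str.maketrans("0123456789", "".join(chr(0x17E0 + d) for d in range(10)))
# Swap separators: Western "1,234.5" becomes "1.234,5".
SWAP_SEPARATORS = str.maketrans({",": ".", ".": ","})


def format_khmer_number(value: float, decimals: int = 0, khmer_digits: bool = True) -> str:
    text = f"{value:,.{decimals}f}"  # Western grouping first, e.g. "1,234.50"
    text = text.translate(SWAP_SEPARATORS)
    if khmer_digits:
        text = text.translate(KHMER_DIGITS)
    return text


print(format_khmer_number(1234567))
print(format_khmer_number(1234.5, decimals=2, khmer_digits=False))  # 1.234,50
```

Formatting with Python's built-in grouping first and swapping afterwards avoids reimplementing the digit grouping by hand.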

Spacing and punctuation

Spaces are not used between every word in written Khmer. Spaces are used within sentences in roughly the same places as commas might be in English, although they may also serve to set off certain items such as numbers and proper names.
Western-style punctuation marks are quite commonly used in modern Khmer writing, including French-style guillemets for quotation marks. However, traditional Khmer punctuation marks are also used; some of these are described in the following table.
ខណ្ឌ khăn: Used as a period. However, consecutive sentences on the same theme are often separated only by spaces.
ល៉ៈ lăk: Equivalent to "etc."
លេខទោ lékhtoŭ: Duplication sign. It indicates that the preceding word or phrase is to be repeated, a common feature in Khmer syntax.
បរិយោសាន bâriyaôsan: A period used to end an entire text or a chapter.
គោមូត្រ koŭmot: A period used at the end of poetic or religious texts.
ភ្នែកមាន់ phnêkmoăn: A symbol used at the start of poetic or religious texts.
ចំណុចពីរគូស châmnŏch pi kus ("two dots line"): Used similarly to a colon.
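Because the khăn is the usual sentence-final mark, a naive splitter can break text at it and at the bâriyaôsan. The sketch below assumes these are encoded as U+17D4 and U+17D5; the function name is hypothetical. Since consecutive sentences on the same theme are often separated only by spaces, and spaces also occur inside sentences, each piece may still contain several sentences, so spaces are deliberately not treated as boundaries.

```python
import re
import unicodedata

KHAN = "\u17D4"       # ។ khăn, used as a period
BARIYOSAN = "\u17D5"  # ៕ bâriyaôsan, ends a text or chapter


def split_at_khan(text: str) -> list[str]:
    """Split text at khăn / bâriyaôsan and drop empty pieces."""
    pieces = re.split(f"[{KHAN}{BARIYOSAN}]", text)
    return [p.strip() for p in pieces if p.strip()]


print(unicodedata.name(KHAN), unicodedata.name(BARIYOSAN))
print(split_at_khan("\u1780 \u1781\u17D4 \u1782\u17D5"))
```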

A hyphen is commonly used between components of personal names, and also as in English when a word is divided between lines of text. It can also be used between numbers to denote ranges or dates. Particular uses of Western-style periods include grouping of digits in large numbers and denotation of abbreviations.

Styles

Several styles of Khmer writing are used for varying purposes. The two main styles are âksâr chriĕng and âksâr mul.

Unicode

The basic Khmer block was added to the Unicode Standard in version 3.0, released in September 1999. It then contained 103 defined code points; this was extended to 114 in version 4.0, released in April 2003. Version 4.0 also introduced an additional block, called Khmer Symbols, containing 32 signs used for writing lunar dates.
The Unicode block for basic Khmer characters is U+1780-U+17FF.
The first 35 characters are the consonant letters. The symbols at U+17A3 and U+17A4 are deprecated. These are followed by the 15 independent vowels. The code points U+17B4 and U+17B5 are invisible combining marks for inherent vowels, intended for use only in special applications. Next come the 16 dependent vowel signs and the 12 diacritics; these are represented together with a dotted circle, but should be displayed appropriately in combination with a preceding Khmer letter.
The code point U+17D2, called ជើង cheung, meaning "foot", is used to indicate that a following consonant is to be written in subscript form. It is not normally visibly rendered as a character. U+17D3 was originally intended for use in writing lunar dates, but its use is now discouraged. The next seven characters are the punctuation marks listed above; these are followed by the riel currency symbol, a rare sign corresponding to the Sanskrit avagraha, and a mostly obsolete version of the vĭréam diacritic. The U+17Ex series contains the Khmer numerals, and the U+17Fx series contains variants of the numerals used in divination lore.
The block with additional lunar date symbols is U+19E0-U+19FF.
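The layout of the basic block described above can be captured as a table of code point ranges. The sketch below, with hypothetical labels and a hypothetical `classify` helper, encodes exactly the ranges stated in this section; summing their sizes is a quick consistency check against the 114 characters defined since version 4.0.

```python
from typing import Optional

# Layout of the basic Khmer block (U+1780-U+17FF) as described above.
RANGES = [
    (0x1780, 0x17A2, "consonant"),
    (0x17A3, 0x17A4, "deprecated"),
    (0x17A5, 0x17B3, "independent vowel"),
    (0x17B4, 0x17B5, "invisible inherent vowel"),
    (0x17B6, 0x17C5, "dependent vowel sign"),
    (0x17C6, 0x17D1, "diacritic"),
    (0x17D2, 0x17D2, "coeng"),
    (0x17D3, 0x17D3, "lunar date sign (discouraged)"),
    (0x17D4, 0x17DA, "punctuation"),
    (0x17DB, 0x17DB, "riel currency symbol"),
    (0x17DC, 0x17DC, "avagraha"),
    (0x17DD, 0x17DD, "obsolete viream"),
    (0x17E0, 0x17E9, "digit"),
    (0x17F0, 0x17F9, "divination numeral"),
]


def classify(ch: str) -> Optional[str]:
    cp = ord(ch)
    for lo, hi, label in RANGES:
        if lo <= cp <= hi:
            return label
    return None  # outside the block, or an unassigned code point


print(sum(hi - lo + 1 for lo, hi, _ in RANGES))  # -> 114
```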
The symbols at U+19E0 and U+19F0 represent the first and second "eighth month" in a lunar year containing a leap-month. The remaining symbols in this block denote the days of a lunar month: those in the U+19Ex series for waxing days, and those in the U+19Fx series for waning days.
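Given the block layout above, a lunar day can be mapped to its symbol arithmetically. This sketch assumes that the days are assigned in order, U+19E1 to U+19EF for waxing days 1 to 15 and U+19F1 to U+19FF for waning days 1 to 15, with U+19E0 and U+19F0 reserved for the two eighth-month signs; the function name is hypothetical.

```python
def lunar_day_symbol(day: int, waxing: bool) -> str:
    """Return the Khmer Symbols character for lunar day 1-15 (assumed sequential)."""
    if not 1 <= day <= 15:
        raise ValueError("day must be between 1 and 15")
    base = 0x19E0 if waxing else 0x19F0  # the eighth-month signs sit at offset 0
    return chr(base + day)


print(f"U+{ord(lunar_day_symbol(1, waxing=True)):04X}")    # U+19E1
print(f"U+{ord(lunar_day_symbol(15, waxing=False)):04X}")  # U+19FF
```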