Standard Chinese phonology


This article summarizes the phonology of Standard Chinese.
Standard Chinese is based on the Beijing dialect of Mandarin. Actual production varies widely among speakers, as they introduce elements of their native varieties. Elements of the sound system include not only the segments – the vowels and consonants – of the language but also the tones that are applied to each syllable. Standard Chinese has four main tones, in addition to a neutral tone used on weak syllables.
This article represents phonetic values using the International Phonetic Alphabet, noting correspondences chiefly with the Pinyin system for transcription of Chinese text. For correspondences with other systems, see the relevant articles, such as Wade–Giles, Bopomofo, Gwoyeu Romatzyh, etc., and Romanization of Chinese.

Consonants

The following table shows the consonant sounds of Standard Chinese, transcribed using the International Phonetic Alphabet. The sounds shown in parentheses are sometimes not analyzed as separate phonemes; for more on these, see below. Excluding these, and excluding the glides [j], [ɥ], and [w], there are 19 consonant phonemes in the inventory.
Between pairs of stops or affricates having the same place and manner of articulation, the primary distinction is not voiced vs. voiceless, but unaspirated vs. aspirated. The unaspirated stops and affricates may, however, become voiced in weak syllables. Such pairs are represented in the pinyin system mostly using letters which in Romance languages generally denote voiceless/voiced pairs, or in Germanic languages often denote fortis/lenis pairs. In pinyin, however, they denote aspirated/unaspirated pairs; for example, [pʰ] and [p] are represented with p and b respectively.
More details about the individual consonant sounds are given in the following table.
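Phoneme or sound | Approximate description | Pinyin | Wade–Giles | Notes
/p/ | Like English p but unaspirated, as in spy | b | p |
/pʰ/ | Like an aspirated English p, as in pie | p | p῾ |
/m/ | Like English m | m | m |
/f/ | Like English f | f | f |
/t/ | Like English t but unaspirated, as in sty | d | t | See Denti-alveolar and retroflex series.
/tʰ/ | Like an aspirated English t, as in tie | t | t῾ | See Denti-alveolar and retroflex series.
/n/ | Like English n | n | n | See Denti-alveolar and retroflex series. Can occur in the onset and/or coda of a syllable.
/l/ | Like English clear l, as in RP lay | l | l | See Denti-alveolar and retroflex series.
/k/ | Like English k, but unaspirated, as in scar | g | k |
/kʰ/ | Like an aspirated English k, as in car | k | k῾ |
/ŋ/ | Like ng in English sing | ng | ng | Occurs only in the syllable coda.
/x/ | Varies between h in English hat and ch in Scottish loch | h | h |
/s/ | Like English s, but usually with the tongue on the lower teeth | s | s | See Denti-alveolar and retroflex series.
/ts/ | Like English ts in cats, without aspiration | z | ts | See Denti-alveolar and retroflex series.
/tsʰ/ | As above, but with aspiration | c | ts῾ | See Denti-alveolar and retroflex series.
/ɕ/ | Similar to English sh, but with an alveolo-palatal pronunciation | x | hs | See Alveolo-palatal series.
/tɕ/ | Like an unaspirated English ch, but with an alveolo-palatal pronunciation | j | ch | See Alveolo-palatal series.
/tɕʰ/ | As above, with aspiration | q | ch῾ | See Alveolo-palatal series.
/ʂ/ | Similar to English sh, but with a retroflex articulation | sh | sh | See Denti-alveolar and retroflex series.
/ʈʂ/ | Similar to ch in English chat, but with a retroflex articulation and no aspiration | zh | ch | See Denti-alveolar and retroflex series.
/ʈʂʰ/ | As above, but with aspiration | ch | ch῾ | See Denti-alveolar and retroflex series.
/ɻ/ ~ /ʐ/ | Similar to initial r in English, but with a retroflex articulation; for some speakers varies to a sound similar to the s in Asia, but with retroflex articulation | r | j | For pronunciation in syllable-final position, see Rhotic coda.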
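As an illustration, the Pinyin and Wade–Giles columns of the table above can be written as a lookup table. The following is a minimal sketch rather than a full romanization converter: it handles syllable initials only, and the names PINYIN_TO_WADE_GILES, split_initial and wade_giles_initial are invented for this example.

```python
# Pinyin initial -> Wade-Giles initial, copied from the table above.
PINYIN_TO_WADE_GILES = {
    "b": "p", "p": "p῾", "m": "m", "f": "f",
    "d": "t", "t": "t῾", "n": "n", "l": "l",
    "g": "k", "k": "k῾", "h": "h",
    "s": "s", "z": "ts", "c": "ts῾",
    "x": "hs", "j": "ch", "q": "ch῾",
    "sh": "sh", "zh": "ch", "ch": "ch῾",
    "r": "j",
}


def split_initial(syllable):
    """Split a toneless pinyin syllable into (initial, rest).

    Two-letter initials (zh, ch, sh) are tried before one-letter ones.
    Syllables with no listed initial return an empty initial.
    """
    for length in (2, 1):
        head = syllable[:length]
        if head in PINYIN_TO_WADE_GILES:
            return head, syllable[length:]
    return "", syllable


def wade_giles_initial(syllable):
    """Return the Wade-Giles spelling of the syllable's initial ('' if none)."""
    initial, _rest = split_initial(syllable)
    return PINYIN_TO_WADE_GILES.get(initial, "")
```

With this table, wade_giles_initial("zhang") and wade_giles_initial("jia") both return ch, which shows that Wade–Giles writes the retroflex and alveolo-palatal affricates with the same letters.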

All of the consonants may occur as the initial sound of a syllable, with the exception of [ŋ]. Excepting the rhotic coda, the only consonants that can appear in syllable coda position are [n] and [ŋ]. Final [n] and [ŋ] may be pronounced without complete oral closure, resulting in a syllable that in fact ends with a long nasalized vowel. See also Syllable reduction, below.

Denti-alveolar and retroflex series

The consonants listed in the first table above as denti-alveolar are sometimes described as alveolars, and sometimes as dentals. The affricates [ts] and [tsʰ] and the fricative [s] are particularly often described as dentals; these are generally pronounced with the tongue on the lower teeth.
The retroflex consonants are actually apical rather than subapical, and so are considered by some authors not to be truly retroflex; they may be more accurately called post-alveolar. Some speakers not from Beijing may lack the retroflexes in their native dialects, and may thus replace them with dentals.

Alveolo-palatal series

The alveolo-palatal consonants (pinyin j, q, x) are standardly pronounced [tɕ], [tɕʰ], [ɕ]. Some speakers realize them as palatalized dentals [tsʲ], [tsʰʲ], [sʲ]; this is claimed to be especially common among children and women, although officially it is regarded as substandard and as a feature specific to the Beijing dialect.
In phonological analysis, it is often assumed that, when not followed by one of the high front vowels [i] or [y], the alveolo-palatals consist of a consonant followed by a palatal glide ([j] or [ɥ]). That is, syllables represented in pinyin as beginning ji-, qi-, xi-, ju-, qu-, xu- (followed by a vowel) are taken to begin [tɕj], [tɕʰj], [ɕj], [tɕɥ], [tɕʰɥ], [ɕɥ]. The actual pronunciations are more like [tɕ], [tɕʰ], [ɕ], [tɕʷ], [tɕʰʷ], [ɕʷ]. This is consistent with the general observation that medial glides are realized as palatalization and/or labialization of the preceding consonant.
On the above analysis, the alveolo-palatals are in complementary distribution with the dentals, the velars, and the retroflexes, as none of these can occur before high front vowels or palatal glides, whereas the alveolo-palatals occur only before high front vowels or palatal glides. Therefore, linguists often prefer to classify the alveolo-palatals not as independent phonemes, but as allophones of one of the other three series. The existence of the above-mentioned dental variants inclines some to prefer to identify the alveolo-palatals with the dentals, but identification with any of the three series is possible. The Yale and Wade–Giles systems mostly treat the alveolo-palatals as allophones of the retroflexes; Tongyong Pinyin mostly treats them as allophones of the dentals; and Mainland Chinese Braille treats them as allophones of the velars. In standard pinyin and bopomofo, however, they are represented as a separate sequence.
The alveolo-palatals arose historically from a merger of the dentals and velars before high front vowels and glides. Previously, some instances of modern [tɕ], [tɕʰ], [ɕ] were instead [k], [kʰ], [x], and others were [ts], [tsʰ], [s]. The change took place in the last two or three centuries at different times in different areas. This explains why some European transcriptions of Chinese names contain ki-, hi-, tsi-, si- where an alveolo-palatal might be expected in modern Chinese. Examples are Peking for Beijing, Chungking for Chongqing, Fukien for Fujian, Tientsin for Tianjin, Sinkiang for Xinjiang, and Sian for Xi'an. The complementary distribution with the retroflex series arose when syllables that had a retroflex consonant followed by a medial glide lost the medial glide.

Zero onset

A full syllable such as ai, in which the vowel is not preceded by any of the standard initial consonants or glides, is said to have a null initial or zero onset. This may be realized as a consonant sound: [ʔ] and [ɣ] are possibilities, as are [ŋ] and [ɦ] in some non-standard varieties. It has been suggested that such an onset be regarded as a special phoneme, or as an instance of the phoneme /ŋ/, although it can also be treated as no phoneme. By contrast, in the case of the particle a, which is a weak onsetless syllable, linking occurs with the previous syllable.
When a stressed vowel-initial Chinese syllable follows a consonant-final syllable, the consonant does not directly link with the vowel. Instead, the zero onset seems to intervene in between: for example, 棉袄 mián'ǎo ("cotton jacket") becomes [mjɛnʔau] or [mjɛnɣau]. However, in connected speech none of these output forms is natural. Instead, when the words are spoken together the most natural pronunciation is rather similar to [mjɛ̃ːau], in which there is no nasal closure or any version of the zero onset, and instead nasalization of the vowel occurs.

Glides

The glides [j], [ɥ], and [w] sound respectively like the y in English yes, the u in French huit, and the w in English we. The glides are commonly analyzed not as independent phonemes, but as consonantal allophones of the high vowels /i/, /y/, /u/. This is possible because there is no ambiguity in interpreting a sequence like yao/-iao as /iau/, and potentially problematic sequences such as */iu/ do not occur.
The glides may occur in initial position in a syllable. This occurs with [ɥ] in the syllables written yu, yuan, yue, and yun in pinyin; with [j] in other syllables written with initial y in pinyin; and with [w] in syllables written with initial w in pinyin. When a glide is followed by the vowel of which that glide is considered an allophone, the glide may be regarded as epenthetic, and not as a separate realization of the phoneme. Hence the syllable yi, pronounced [ji], may be analyzed as consisting of the single phoneme /i/, and similarly yin may be analyzed as /in/, yu as /y/, and wu as /u/. It is also possible to hear both pronunciations from the same speaker, even in the same conversation. For example, one may hear the number "one" as either [ji] or [i].
The glides can also occur in medial position, that is, after the initial consonant but before the main vowel. Here they are represented in pinyin as vowels: for example, the i in bie represents [j], and the u in duan represents [w]. There are some restrictions on the possible consonant-glide combinations: [w] does not occur after labials; [j] does not occur after retroflexes and velars; and [ɥ] occurs medially only in lüe and nüe and after alveolo-palatals. A consonant-glide combination at the start of a syllable is articulated as a single sound – the glide is not in fact pronounced after the consonant, but is realized as palatalization, labialization, or both, of the consonant.
The glides [j] and [w] are also found as the final element in some syllables. These are commonly analyzed as diphthongs rather than vowel-glide sequences. For example, the syllable bai is assigned the underlying representation /pai/.

Syllabic consonants

The syllables written in pinyin as zi, ci, si, zhi, chi, shi, ri may be described as having a syllabic consonant instead of a vowel:
Alternatively, the nucleus may be described not as a syllabic consonant, but as a vowel:
Phonologically, these syllables may be analyzed as having their own vowel phoneme, /ɨ/. However, it is possible to merge this with the phoneme /i/, since the two are in complementary distribution – provided that the alveolo-palatal series is either left unmerged, or is merged with the velars rather than the retroflex or alveolar series.
Another approach is to regard the syllables assigned above to /ɨ/ as having an empty nuclear slot, i.e. as not containing a vowel phoneme at all. This is more consistent with the syllabic consonant description of these syllables. When this is the case, sometimes the phoneme is described as shifting from voiceless to voiced.
Syllabic consonants may also arise as a result of weak syllable reduction; see below. Syllabic nasal consonants are also heard in certain interjections; pronunciations of such words include [m], [n], [ŋ], [hm], [hŋ].

Vowels

Standard Chinese can be analyzed as having five vowel phonemes: /i, u, y, ə, a/. Of these, /i, u, y/ are high vowels, /ə/ is mid, and /a/ is low.
The precise realization of each vowel depends on its phonetic environment. In particular, the vowel /ə/ has two broad allophones, [e] and [o]. These sounds can be treated as a single underlying phoneme because they are in complementary distribution.

Allophones

Transcriptions of the vowels' allophones differ somewhat between sources. The following table provides one fairly typical set of descriptions.
More details about the individual vowel allophones are given in the following table.
Phoneme | Allophone | Description
/i/ | [i] | Like English ee as in bee
/u/ | [u] | Like English oo as in too
/u/ | [ʊ] | Like English oo in took
/y/ | [y] | Like French u or German ü
/ə/ | [e] | Somewhat like English ey as in prey
/ə/ | [o] | Somewhat like British English awe or Scottish English oh
/ə/ | [ɤ] | Pronounced as a sequence.
/ə/ | [ə] | Schwa, like English a as in about
/a/ | [a] | Like English a as in palm
/a/ | [ɛ] | Like English e as in then

As a general rule, vowels in open syllables are pronounced long, while others are pronounced short. This does not apply to weak syllables, in which all vowels are short.

Effect of coda on central vowels

In Standard Chinese, the vowels /a/ and /ə/ harmonize in backness with the coda. The vowel /a/ is fronted before /i/ and /n/ and backed before /u/ and /ŋ/. The vowel /ə/ is fronted before /n/ and backed before /ŋ/.

Effect of tone on mid vowel

Some native Mandarin speakers may pronounce [wei], [jou], and [wən] as [ui], [iu], and [un] respectively in the first or second tone.

Alternative analyses

Some linguists prefer to reduce the number of vowel phonemes still further. Edwin G. Pulleyblank has proposed a system which includes underlying glides, but no vowels at all. More common are systems with two vowels; for example, in Mantaro Hashimoto's system, there are just two vowel nuclei, /ə/ and /a/, which may be preceded by a glide /j, w, ɥ/, and may be followed by a coda /i, u, n, ŋ/. The various combinations of glide, vowel, and coda have different surface manifestations, as shown in the table below. Any of the three positions may be empty, i.e. occupied by a null meta-phoneme Ø; for example, the high vowels [i, u, y] are analyzed as glide + Ø, and the empty rime, i.e. the syllabic consonant or the vowel [ɨ], is analyzed as having all three values null, e.g. si is analyzed as an underlying syllabic /s/.
This system of phonemes is used in the bopomofo phonetic transcription system commonly used in Taiwan.

Rhotic coda

Standard Chinese features syllables that end with a rhotic coda. This feature, known in Chinese as erhua, is particularly characteristic of the Beijing dialect; many other dialects do not use it as much, and some not at all. It occurs in two cases:
  1. In a small number of independent words or morphemes pronounced [ɚ] or [aɚ], written in pinyin as er, such as 二 èr "two", 耳 ěr "ear", and 儿 ér "son".
  2. In syllables in which the rhotic coda is added as a suffix to another morpheme. This suffix is represented by the character 儿 ("son"), to which meaning it is historically related, and in pinyin as r. The suffix combines with the final sound of the syllable, and regular but complex sound changes occur as a result.
The r final is pronounced with a relatively lax tongue, and has been described as a "retroflex vowel".
In dialects that do not make use of the rhotic coda, it may be omitted in pronunciation, or in some cases a different word may be selected: for example, Beijing 这儿 zhèr "here" and 那儿 nàr "there" may be replaced by the synonyms 这里 zhèli and 那里 nàli.

Syllables

Syllables in Standard Chinese have the maximal form CGVXT, traditionally analyzed as an "initial" consonant C, a "final", and a tone T. The final consists of a "medial" G, which may be one of the glides [j, w, ɥ], a vowel V, and a coda X, which may be one of [n, ŋ, ɚ, i, u]. The vowel and coda may also be grouped as the "rhyme", sometimes spelled "rime". Any of C, G, and X may be absent.
Many of the possible combinations under the above scheme do not actually occur. There are only some 35 final combinations in actual syllables. In all, there are only about 400 different syllables when tone is ignored, and about 1300 when tone is included. This is a far smaller number of distinct syllables than in a language such as English. Since Chinese syllables usually constitute whole words, or at least morphemes, the smallness of the syllable inventory results in large numbers of homophones. However, in Standard Chinese, the average word length is actually almost exactly two syllables, practically eliminating most homophony issues even when tone is disregarded, especially when context is taken into account as well.
For a list of all Standard Chinese syllables see the pinyin table or zhuyin table.

Full and weak syllables

Syllables can be classified as full or weak. Weak syllables are usually grammatical markers such as 了 le, or the second syllables of some compound words.
A full syllable carries one of the four main tones, and some degree of stress. Weak syllables are unstressed, and have neutral tone. The contrast between full and weak syllables is distinctive; there are many minimal pairs such as 要事 yàoshì "important matter" and 钥匙 yàoshi "key", or 大意 dàyì "main idea" and dàyi "careless", the second word in each case having a weak second syllable. Some linguists consider this contrast to be primarily one of stress, while others regard it as one of tone. For further discussion, see under Neutral tone and Stress, below.

There is also a difference in syllable length. Full syllables can be analyzed as having two morae, the vowel being lengthened if there is no coda. Weak syllables, however, have a single mora, and are pronounced approximately 50% shorter than full syllables. Any weak syllable will usually be an instance of the same morpheme as some corresponding strong syllable; the weak form will often have a modified pronunciation, however, as detailed in the following section.

Syllable reduction

Apart from differences in tone, length, and stress, weak syllables are subject to certain other pronunciation changes.
The example of shénme → shém also involves assimilation, which is heard even in unreduced syllables in quick speech. A particular case of assimilation is that of the sentence-final exclamatory particle 啊 a, a weak syllable, which has different characters for its assimilated forms:
Preceding sound | Form of particle | Character
[ŋ], [ɹ̩], [ɻ̩] | a | 啊
[i], [y], [e], [o], [a] | ya | 呀
[u] | wa | 哇
[n] | na | 哪
了 le | combines to form la | 啦

Tones

Standard Chinese, like all varieties of Chinese, is tonal. This means that in addition to consonants and vowels, the pitch contour of a syllable is used to distinguish words from each other. Many non-native Chinese speakers have difficulties mastering the tones of each character, but correct tonal pronunciation is essential for intelligibility because of the vast number of words in the language that only differ by tone. Statistically, tones are as important as vowels in Standard Chinese.

Tonal categories

The following table shows the four main tones of Standard Chinese, together with the neutral tone.
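Tone number | 1 | 2 | 3 | 4 | 5
Description | high | rising | low | falling | neutral
Pinyin diacritic | ā | á | ǎ | à | a
Tone letter | ˥ (55) | ˧˥ (35) | ˨˩˦ (214) | ˥˩ (51) | -
IPA diacritic | á | ǎ | à | â | -
Tone name | yīn píng | yáng píng | shǎng | qù | qīng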

The Chinese names of the four main tones are respectively 阴平 yīn píng, 阳平 yáng píng, 上 shǎng or shàng, and 去 qù. As descriptions, they apply rather to the predecessor Middle Chinese tones than to the modern tones; see below. The modern Standard Chinese tones are produced as follows:
  1. First tone, or high-level tone, is a steady high sound, produced as if it were being sung instead of spoken.
  2. Second tone, or rising tone, or more specifically high-rising, is a sound that rises from middle to high pitch. In a three-syllable expression, if the first syllable has first or second tone and the final syllable is not weak, then a second tone on the middle syllable may change to first tone.
  3. Third tone, low or dipping tone, descends from mid-low to low; between other tones it may simply be low. This tone is often demonstrated as having a rise in pitch after the low fall; however, third-tone syllables that include the rise are significantly longer than other syllables. When a third-tone syllable is not said in isolation, this rise is normally heard only if it appears at the end of a sentence or before a pause, and then usually only on stressed monosyllables. The third tone without the rise is sometimes called half third tone. Also, two consecutive third tones are avoided by changing the first to second tone; see Third tone sandhi, below. Unlike the other tones, third tone is pronounced with breathiness or murmur.
  4. Fourth tone, falling tone, or high-falling, features a sharp fall from high to low. When followed by another fourth-tone syllable, the fall may be only from high to mid-level.
  5. For the neutral tone or fifth tone, see the following section.
Most romanization systems, including pinyin, represent the tones as diacritics on the vowels, although some, like Wade–Giles, use superscript numbers at the end of each syllable. The tone marks and numbers are rarely used outside of language textbooks: in particular, they are usually absent in public signs, company logos, and so forth. Gwoyeu Romatzyh is a rare example of a system where tones are represented using normal letters of the alphabet.
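Converting from the numbered style to the diacritic style is mechanical. The sketch below places a pinyin tone mark on a toneless syllable using the usual placement convention, which is not described above and is stated here as an assumption: the mark goes on a or e if present, otherwise on the o of ou, otherwise on the last vowel. The function name mark_tone is hypothetical.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiou\u00fc"  # \u00fc is the letter u with diaeresis, as in "lu:"


def mark_tone(syllable, tone):
    """Return the syllable with the pinyin diacritic for the given tone.

    syllable: toneless lowercase pinyin, e.g. "hao".
    tone: 1-4 for the main tones; any other value (e.g. 5 for the
    neutral tone) leaves the syllable unmarked.
    """
    if tone not in TONE_MARKS:
        return syllable
    if "a" in syllable:
        pos = syllable.index("a")
    elif "e" in syllable:
        pos = syllable.index("e")
    elif "ou" in syllable:
        pos = syllable.index("o")
    else:
        positions = [i for i, ch in enumerate(syllable) if ch in VOWELS]
        if not positions:
            return syllable  # no vowel letter to carry the mark
        pos = positions[-1]
    marked = syllable[:pos + 1] + TONE_MARKS[tone] + syllable[pos + 1:]
    # Compose base letter and combining mark into a single character.
    return unicodedata.normalize("NFC", marked)
```

For example, mark_tone("hao", 3) marks the a, while mark_tone("dui", 4) marks the final i because the syllable contains neither a, e, nor ou.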

Neutral tone

Also called fifth tone or zeroth tone, the neutral tone is sometimes thought of as a lack of tone. It is associated with weak syllables, which are generally somewhat shorter than tonic syllables. The pitch of a syllable with neutral tone is determined by the tone of the preceding syllable. The following table shows the pitch at which the neutral tone is pronounced in Standard Chinese after each of the four main tones. The situation differs by dialect, and in some regions, notably Taiwan, the neutral tone is relatively uncommon.
Tone of preceding syllable | Pitch of neutral tone | Example | Pinyin | Meaning | Overall tone pattern
First | 2 | 玻璃 | bōli | glass | 55.2
Second | 3 | 伯伯 | bóbo | uncle | 35.3
Third | 4 | 喇叭 | lǎba | horn | 21.4
Fourth | 1 | 兔子 | tùzi | rabbit | 51.1
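The dependence of the neutral tone on the preceding tone can be expressed as a simple lookup. The following is a minimal sketch using the pitch values from the table above, with tones written as numbers 1 to 5 (5 being the neutral tone); the names NEUTRAL_PITCH_AFTER and neutral_pitches are invented for this example. A neutral tone that follows another neutral tone, or that has no preceding syllable, is not covered by the table, so the sketch returns None for it.

```python
# Pitch level (1 = lowest, 5 = highest) of a neutral-tone syllable,
# keyed by the tone number of the preceding full syllable.
NEUTRAL_PITCH_AFTER = {1: 2, 2: 3, 3: 4, 4: 1}


def neutral_pitches(tones):
    """Map a sequence of tone numbers to neutral-tone pitch levels.

    Returns a list of the same length: the pitch level for each
    neutral-tone syllable (tone 5) that follows a full tone, and None
    for every other syllable.
    """
    result = []
    previous = None
    for tone in tones:
        if tone == 5:
            result.append(NEUTRAL_PITCH_AFTER.get(previous))
        else:
            result.append(None)
        previous = tone
    return result
```

For bōli (tones 1, 5) this gives a pitch of 2 on the second syllable, and for tùzi (tones 4, 5) a pitch of 1.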

Although the contrast between weak and full syllables is often distinctive, the neutral tone is often not described as a full-fledged tone; some linguists feel that it results from a "spreading out" of the tone on the preceding syllable. This idea is appealing because without it, the neutral tone needs relatively complex tone sandhi rules to be made sense of; indeed, it would have to have four allotones, one for each of the four tones that could precede it. However, the "spreading" theory incompletely characterizes the neutral tone, especially in sequences where more than one neutral-tone syllable is found adjacent. In Modern Standard Mandarin as applied in A Dictionary of Current Chinese, the second syllable of words with a 'toneless final syllable variant' can be read with either a neutral tone or with the normal tone.

Relationship between Middle Chinese and modern tones

The four tones of Middle Chinese are not in one-to-one correspondence with the modern tones. The following table shows the development of the traditional tones as reflected in modern Standard Chinese. The development of each tone depends on the initial consonant of the syllable: whether it was a voiceless consonant, a voiced obstruent, or a sonorant.

Tone sandhi

Pronunciation also varies with context according to the rules of tone sandhi. Some such changes have been noted above in the descriptions of the individual tones; however, the most prominent phenomena of this kind relate to consecutive sequences of third-tone syllables. There are also a few common words that have variable tone.

Third tone sandhi

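The principal rule of third tone sandhi is that when two third-tone syllables occur in succession, the first of them is pronounced with second tone.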
For example, lǎoshǔ 老鼠 comes to be pronounced láoshǔ. It has been investigated whether the rising contour on the prior syllable is in fact identical to a normal second tone; it has been concluded that it is, at least in terms of auditory perception.
When there are three or more third tones in a row, the situation becomes more complicated, since a third tone that precedes a second tone resulting from third tone sandhi may or may not be subject to sandhi itself. The results may depend on word boundaries, stress, and dialectal variations. General rules for three-syllable third-tone combinations can be formulated as follows:
  1. If the first word is two syllables and the second word is one syllable, then the first two syllables become second tones. For example, bǎoguǎn hǎo 保管好 takes the pronunciation báoguán hǎo.
  2. If the first word is one syllable, and the second word is two syllables, the second syllable becomes second tone, but the first syllable remains third tone. For example: lǎo bǎoguǎn 老保管 takes the pronunciation lǎo báoguǎn.
Some linguists have put forward more comprehensive systems of sandhi rules for multiple third tone sequences. For example, it is proposed that modifications are applied cyclically, initially within rhythmic feet, and that sandhi "need not apply between two cyclic branches".
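The two three-syllable rules above can be reproduced by applying the basic change (a third tone directly before another third tone becomes second tone) first inside each word and then across word boundaries. The following is a rough sketch of that ordering, not one of the published rule systems: words are given as lists of tone numbers, the function names are hypothetical, and the left-to-right treatment of words longer than two syllables is an assumption of this example.

```python
def within_word(tones):
    """Inside one word, a 3 directly before an underlying 3 becomes 2."""
    return [
        2 if tone == 3 and i + 1 < len(tones) and tones[i + 1] == 3 else tone
        for i, tone in enumerate(tones)
    ]


def third_tone_sandhi(words):
    """Apply third tone sandhi to a phrase.

    words: list of words, each a list of underlying tone numbers (1-5).
    Returns the surface tones in the same nested shape.
    """
    # Step 1: apply the rule inside each word.
    surface = [within_word(word) for word in words]
    # Step 2: apply it across word boundaries, looking at surface tones,
    # so a word-initial 3 that has already become 2 no longer triggers it.
    for left, right in zip(surface, surface[1:]):
        if left and right and left[-1] == 3 and right[0] == 3:
            left[-1] = 2
    return surface
```

With this ordering, bǎoguǎn hǎo, given as [[3, 3], [3]], comes out as [[2, 2], [3]], and lǎo bǎoguǎn, given as [[3], [3, 3]], comes out as [[3], [2, 3]], matching the two rules. Effects of stress and dialect mentioned above are not modeled.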

Tones on special syllables

Special rules apply to the tones heard on the words 不 bù ("not") and 一 yī ("one").
For 不:
  1. 不 is pronounced with second tone when followed by a fourth-tone syllable.
  2. In other cases, 不 is pronounced with fourth tone. However, when used between words in an A-not-A question, it may become neutral in tone.
For 一:
  1. 一 is pronounced with second tone when followed by a fourth-tone syllable. Example: 一定 becomes yídìng.
  2. Before a first-, second- or third-tone syllable, 一 is pronounced with fourth tone. Examples: 一天 becomes yìtiān, 一年 becomes yìnián, 一起 becomes yìqǐ.
  3. When final, or when it comes at the end of a multi-syllable word, 一 is pronounced with first tone. It also has first tone when used as an ordinal number, and when it is immediately followed by any digit.
  4. When 一 is used between two reduplicated words, it may become neutral in tone.
The numbers 七 and 八 sometimes display tonal behavior similar to that of 一, but for most modern speakers they are always pronounced with first tone.
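The rules for 一 can be summarized as a small decision function. This is a sketch under simplifying assumptions: the function name and parameters are invented for this example, the optional neutral tone between reduplicated words is treated as if it always applied, and a following neutral-tone syllable, which the rules above do not cover, falls back to first tone.

```python
def yi_tone(next_tone=None, ordinal_or_before_digit=False,
            between_reduplicated=False):
    """Return the tone number (1-5) of 一 in a given context.

    next_tone: tone (1-4) of the following syllable, or None when 一 is
        final or ends a multi-syllable word.
    ordinal_or_before_digit: True when 一 is an ordinal number or is
        immediately followed by a digit.
    between_reduplicated: True when 一 stands between two reduplicated words.
    """
    if between_reduplicated:
        return 5  # "may become neutral"; always neutral in this sketch
    if next_tone is None or ordinal_or_before_digit:
        return 1
    if next_tone == 4:
        return 2  # e.g. 一定 yídìng
    if next_tone in (1, 2, 3):
        return 4  # e.g. 一天 yìtiān, 一年 yìnián, 一起 yìqǐ
    return 1  # assumption: contexts not covered above keep first tone
```

For example, yi_tone(4) returns 2, as in yídìng, and yi_tone(1) returns 4, as in yìtiān.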

Stress, rhythm and intonation

Stress within words is not felt strongly by Chinese speakers, although contrastive stress is perceived easily. One of the reasons for the weaker perception of stress in Chinese may be that variations in the fundamental frequency of speech, which in many other languages serve as a cue for stress, are used in Chinese primarily to realize the tones. Nonetheless, there is still a link between stress and pitch – the range of pitch variation has been observed to be greater on syllables that carry more stress.
As discussed above, weak syllables have neutral tone and are unstressed. Although this property can be contrastive, the contrast is interpreted by some as being primarily one of tone rather than stress.
Apart from this contrast between full and weak syllables, some linguists have also identified differences in levels of stress among full syllables. In some descriptions, a multi-syllable word or compound is said to have the strongest stress on the final syllable, and the next strongest generally on the first syllable. Others, however, reject this analysis, noting that the apparent final-syllable stress can be ascribed purely to natural lengthening of the final syllable of a phrase, and disappears when a word is pronounced within a sentence rather than in isolation. San Duanmu takes this view, and concludes that it is the first syllable that is most strongly stressed. He also notes a tendency for Chinese to produce trochees – feet consisting of a stressed syllable followed by one unstressed syllable. On this view, if the effect of "final-lengthening" is factored out:
The positions described here as lacking stress are the positions in which weak syllables may occur, although full syllables frequently occur in these positions also.
There is a strong tendency for Chinese prose to employ four-syllable 'prosodic words' consisting of alternating stressed and unstressed syllables which are further subdivided into two trochaic feet. This structure, sometimes known as a 'four-character template', is particularly prevalent in chengyu, which are classical idioms that are usually four characters in length. Statistical analysis of chengyu and other idiomatic phrases in vernacular texts indicates that the four-syllable prosodic word had become an important metrical consideration by the Wei/Jin dynasties.
This preference for trochaic feet may even result in polysyllabic words in which the foot and word boundaries do not align. For example, 'Czechoslovakia' is stressed as 捷克/斯洛/伐克 and 'Yugoslavia' is stressed as 南斯/拉夫, even though the morpheme boundaries are 捷克/斯洛伐克 'Czech/slovak' and 南/斯拉夫 'South/slav', respectively. The preferred stress pattern also has a complex effect on tone sandhi for the various Chinese dialects.
This preference for a trochaic metrical structure is also cited as a reason for certain phenomena of word order variation within complex compounds, and for the strong tendency to use disyllabic words rather than monosyllables in certain positions. Many Chinese monosyllables have alternative disyllabic forms with virtually identical meaning.
Another function of voice pitch is to carry intonation. Chinese makes frequent use of particles to express certain meanings such as doubt, query, command, etc., reducing the need to use intonation. However, intonation is still present in Chinese, although there are varying analyses of how it interacts with the lexical tones. Some linguists describe an additional intonation rise or fall at the end of the last syllable of an utterance, while others have found that the pitch of the entire utterance is raised or lowered according to the desired intonational meaning.
