Alphabetical order
Alphabetical order is a system whereby character strings are placed in order based on the position of the characters in the conventional ordering of an alphabet. It is one of the methods of collation. In mathematics, a lexicographical order is the generalization of the alphabetical order to other data types, such as sequences of digits or numbers.
When applied to strings or sequences that, besides alphabetical characters, may also contain digits, numbers or more elaborate types of elements, the alphabetical order is generally called a lexicographical order.
To determine which of two strings of characters comes first when arranging in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet comes before the other string. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the shorter string (the one that has run out of letters) is deemed to come first in alphabetical order.
Capital letters are generally considered to be identical to their corresponding lower case letters for the purposes of alphabetical ordering, though conventions may be adopted to handle situations where two strings differ only in capitalization. Various conventions also exist for the handling of strings containing spaces, modified letters, and non-letter characters such as marks of punctuation.
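The comparison procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a full collation algorithm: the function name `compare_alphabetically` is invented for this example, and it assumes that case-folded letters compare correctly by code point, which holds for the unaccented letters a to z but not for modified letters.

```python
from functools import cmp_to_key


def compare_alphabetically(a, b):
    """Return -1 if a sorts before b, 1 if it sorts after, 0 if they tie.

    Assumption: capital and lower-case letters are treated as identical.
    """
    a, b = a.casefold(), b.casefold()
    # Compare letters position by position until a difference is found.
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # Every shared position matched: the string that ran out of letters
    # first (the shorter one) comes first.
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1


words = ["Bent", "as", "Be", "Aster"]
ordered = sorted(words, key=cmp_to_key(compare_alphabetically))
```

Strings that differ only in capitalization tie under this comparison, so a separate convention would still be needed to order them.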
The result of placing a set of words or strings in alphabetical order is that all the strings beginning with the same letter are grouped together; and within that grouping all words beginning with the same two-letter sequence are grouped together; and so on. The system thus tends to maximize the number of common initial letters between adjacent words.
History
Alphabetical order was first used in the 1st millennium BCE by Northwest Semitic scribes using the abjad system. However, a range of other methods of classifying and ordering material, including geographical, chronological, hierarchical and by category, were preferred over alphabetical order for centuries.
The Bible is dated to the 6th–7th centuries BCE. In the Book of Jeremiah, the prophet utilizes an Atbash substitution cipher, based on alphabetical order. Similarly, biblical authors used acrostics based on the Hebrew alphabet.
The first effective use of alphabetical order as a cataloging device among scholars may have been in ancient Alexandria, in the Great Library of Alexandria, which was founded around 300 BCE. The poet and scholar Callimachus, who worked there, is thought to have created the world's first library catalog, known as the Pinakes, with scrolls shelved in alphabetical order of the first letter of authors' names.
In the 1st century BCE, Roman writer Varro compiled alphabetic lists of authors and titles. In the 2nd century CE, Sextus Pompeius Festus wrote an encyclopedic epitome of the works of Verrius Flaccus, De verborum significatu, with entries in alphabetic order. In the 3rd century CE, Harpocration wrote a Homeric lexicon alphabetized by all letters. In the 10th century, the author of the Suda used alphabetic order with phonetic variations.
Alphabetical order as an aid to consultation started to enter the mainstream of Western European intellectual life in the second half of the 12th century, when alphabetical tools were developed to help preachers analyse biblical vocabulary. This led to the compilation of alphabetical concordances of the Bible by the Dominican friars in Paris in the 13th century, under Hugh of Saint Cher. Older reference works such as St. Jerome's Interpretations of Hebrew Names were alphabetized for ease of consultation. The use of alphabetical order was initially resisted by scholars, who expected their students to master their area of study according to its own rational structures; its success was driven by such tools as Robert Kilwardby's index to the works of St. Augustine, which helped readers access the full original text instead of depending on the compilations of excerpts which had become prominent in 12th-century scholasticism. The adoption of alphabetical order was part of the transition from the primacy of memory to that of written works. The idea of ordering information by the order of the alphabet also met resistance from the compilers of encyclopaedias in the 12th and 13th centuries, who were all devout churchmen. They preferred to organise their material theologically – in the order of God's creation, starting with Deus.
In 1604 Robert Cawdrey had to explain in Table Alphabeticall, the first monolingual English dictionary, "Nowe if the word, which thou art desirous to finde, begin with (a) then looke in the beginning of this Table, but if with (v) looke towards the end". Although as late as 1803 Samuel Taylor Coleridge condemned encyclopedias with "an arrangement determined by the accident of initial letters", many lists are today based on this principle.
Arrangement in alphabetical order can be seen as a force for democratising access to information, as it does not require extensive prior knowledge to find what is needed.
Ordering in the Latin script
Basic order and example
The standard order of the modern ISO basic Latin alphabet is:
- A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
An example of straightforward alphabetical ordering follows:
- As; Aster; Astrolabe; Astronomy; Astrophysics; At; Ataman; Attack; Baa
- Barnacle; Be; Been; Benefit; Bent
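As a rough check, the example list above can be reproduced with Python's built-in sort and a case-insensitive key. The variable names are chosen for this sketch only. It also groups the result by first letter, showing how strings with the same initial letter end up adjacent.

```python
from itertools import groupby

words = [
    "As", "Aster", "Astrolabe", "Astronomy", "Astrophysics", "At",
    "Ataman", "Attack", "Baa", "Barnacle", "Be", "Been", "Benefit", "Bent",
]

# Start from the reversed list and sort it back into alphabetical order.
ordered = sorted(reversed(words), key=str.casefold)

# Group adjacent words by their (case-folded) first letter.
groups = {
    letter: list(items)
    for letter, items in groupby(ordered, key=lambda w: w[0].casefold())
}
```

Because the input contains only unaccented Latin letters, comparing case-folded strings by code point is enough here; that would not hold for the language-specific alphabets discussed later.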
Treatment of multiword strings
When some of the strings being ordered consist of more than one word, i.e., they contain spaces or other separators such as hyphens, then two basic approaches may be taken. In the first approach, all strings are ordered initially according to their first word, as in the sequence:
- Oak; Oak Hill; Oak Ridge; Oakley Park; Oakley River
where all strings beginning with the separate word Oak precede all those beginning Oakley, because Oak precedes Oakley in alphabetical order.
In the second approach, strings are alphabetized as if they had no spaces, giving the sequence:
- Oak; Oak Hill; Oakley Park; Oakley River; Oak Ridge
where Oak Ridge now comes after the Oakley strings, as it would if it were written "Oakridge".
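The two approaches can be expressed as two different sort keys. The sketch below is a minimal illustration; the function names `word_by_word_key` and `letter_by_letter_key` are made up for this example, and only whitespace and hyphens are treated as separators.

```python
import re

names = ["Oakley River", "Oak Ridge", "Oak", "Oakley Park", "Oak Hill"]


def word_by_word_key(s):
    """First approach: compare word by word (a list of words)."""
    return re.split(r"[\s-]+", s.casefold())


def letter_by_letter_key(s):
    """Second approach: ignore separators and compare letter by letter."""
    return re.sub(r"[\s-]+", "", s.casefold())


by_word = sorted(names, key=word_by_word_key)
by_letter = sorted(names, key=letter_by_letter_key)
```

With the first key, `["oak", "ridge"]` compares before `["oakley", "park"]` because the word "oak" is a prefix of "oakley"; with the second, "oakridge" falls after "oakleyriver".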
Special cases
Modified letters
In French, modified letters are treated the same as the base letter for alphabetical ordering purposes. For example, rôle comes between rock and rose, as if it were written role. However, languages that use such letters systematically generally have their own ordering rules. See Language-specific conventions below.
Ordering by surname
In most cultures where family names are written after given names, it is still desired to sort lists of names by family name first. In this case, names need to be reordered to be sorted properly. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way. Capturing this rule in a computer collation algorithm is difficult, and simple attempts will necessarily fail. For example, unless the algorithm has at its disposal an extensive list of family names, there is no way to decide if "Gillian Lucille van der Waal" is "van der Waal, Gillian Lucille", "Waal, Gillian Lucille van der", or even "Lucille van der Waal, Gillian".
Ordering by surname is frequently encountered in academic contexts. Within a single multi-author paper, ordering the authors alphabetically by surname, rather than by other methods such as reverse seniority or subjective degree of contribution to the paper, is seen as a way of "acknowledg[ing] similar contributions" or "avoid[ing] disharmony in collaborating groups". The practice in certain fields of ordering citations in bibliographies by the surnames of their authors has been found to create bias in favour of authors with surnames which appear earlier in the alphabet, while this effect does not appear in fields in which bibliographies are ordered chronologically.
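The difficulty can be seen in a deliberately naive sketch that assumes the last word of a name is the surname. The helper names `to_index_form` and `naive_surname_key` are hypothetical, and the heuristic is precisely the kind of simple attempt that fails on multi-word surnames.

```python
def to_index_form(full_name):
    """Rewrite 'Given Names Surname' as 'Surname, Given Names'.

    Assumption (often wrong): the surname is the last word of a
    non-empty name.
    """
    *given, surname = full_name.split()
    if not given:
        return surname
    return f"{surname}, {' '.join(given)}"


def naive_surname_key(full_name):
    """Sort key: case-folded index form of the name."""
    return to_index_form(full_name).casefold()


people = ["Brian O'Leary", "Juan Hernandes"]
ordered = sorted(people, key=naive_surname_key)

# The heuristic picks one of several possible readings, with no way
# of knowing whether it is the intended one.
ambiguous = to_index_form("Gillian Lucille van der Waal")
```

For the two simple names the result matches the expected order, but for the third name the sketch silently commits to "Waal, Gillian Lucille van der", which may or may not be what the bearer of the name would use.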
"The" and other common words
If a phrase begins with a very common word, that word is sometimes ignored or moved to the end of the phrase, but this is not always the case. For example, the book "The Shining" might be treated as "Shining", or "Shining, The" and therefore before the book title "Summer of Sam", although it may also be treated as simply "The Shining" and after "Summer of Sam". Similarly, "A Wrinkle in Time" might be treated as "Wrinkle in Time", "Wrinkle in Time, A", or "A Wrinkle in Time". All three alphabetization methods are fairly easy to create by algorithm, but many programs rely instead on simple lexicographic ordering. Articles generally are ignored when alphabetizing.
"Mac" prefixes
The prefixes M' and Mc in Irish and Scottish surnames are abbreviations for Mac, and are sometimes alphabetized as if the spelling were Mac in full. Thus McKinley might be listed before Mackintosh. Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still used in British telephone directories.
Ligatures
Ligatures which are not considered distinct letters, such as Æ and Œ in English, are typically collated as if the letters were separate: "æther" and "aether" would be ordered the same relative to all other words. This is true even when the ligature is not purely stylistic, such as in loanwords and brand names.
Special rules may need to be adopted to sort strings which vary only by whether two letters are joined by a ligature.
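Several of the special cases above (leading articles, Mac prefixes and ligatures) can be handled by normalizing each string before comparison. The following sketch shows one possible way to do this for English; the names `ARTICLES`, `LIGATURES`, `title_key` and `mac_key` are illustrative assumptions, not part of any standard.

```python
import re

ARTICLES = ("the", "a", "an")          # assumed list of ignorable words
LIGATURES = {"æ": "ae", "œ": "oe"}     # collate as separate letters


def expand_ligatures(text):
    """Replace each ligature with its constituent letters."""
    for ligature, letters in LIGATURES.items():
        text = text.replace(ligature, letters)
    return text


def title_key(title):
    """Sort key that ignores a leading article and expands ligatures."""
    words = expand_ligatures(title.casefold()).split()
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)


def mac_key(surname):
    """Sort key that files M' and Mc as if spelled Mac in full."""
    return re.sub(r"^(mc|m')", "mac", surname.casefold())


titles = sorted(["Summer of Sam", "The Shining"], key=title_key)
surnames = sorted(["Mackintosh", "McKinley"], key=mac_key)
```

With these keys "The Shining" is filed under S and so precedes "Summer of Sam", and McKinley precedes Mackintosh. Strings that differ only by a ligature receive identical keys, so a tie-breaking rule would still be needed to order them.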
Treatment of numerals
When some of the strings contain numerals, various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled: for example 1776 would be sorted as if spelled out "seventeen seventy-six", and 24 heures du Mans as if spelled "vingt-quatre...". When numerals or other symbols are used as special graphical forms of letters, as 1337 for leet or the movie Seven (stylized as Se7en), they may be sorted as if they were those letters. Natural sort order orders strings alphabetically, except that multi-digit numbers are treated as a single character and ordered by the value of the number encoded by the digits.
Language-specific conventions
Languages which use an extended Latin alphabet generally have their own conventions for treatment of the extra letters. Also in some languages certain digraphs are treated as single letters for collation purposes. For example, the 29-letter alphabet of Spanish treats ñ as a basic letter following n, and formerly treated the digraphs ch and ll as basic letters following c and l, respectively. Ch and ll are still considered letters, but are now alphabetized as two-letter combinations. On the other hand, the digraph rr follows rqu as expected, and did so even before the 1994 alphabetization rule.
In a few cases, such as Kiowa, the alphabet has been completely reordered.
Alphabetization rules applied in various languages are listed below.
- In Azerbaijani, there are eight additional letters to the standard Latin alphabet. Five of them are vowels: i, ı, ö, ü, ə and three are consonants: ç, ş, ğ. The alphabet is the same as the Turkish alphabet, with the same sounds written with the same letters, except for three additional letters: q, x and ə for sounds that do not exist in Turkish. Although all the "Turkish letters" are collated in their "normal" alphabetical order like in Turkish, the three extra letters are collated arbitrarily after letters whose sounds approach theirs. So, q is collated just after k, x is collated just after h and ə is collated just after e.
- In Breton, there is no "c", "q", "x" but there are the digraphs "ch" and "c'h", which are collated between "b" and "d". For example: « buzhugenn, chug, c'hoar, daeraouenn ».
- In Bosnian, Croatian and Serbian and other related South Slavic languages, the five accented characters and three conjoined characters are sorted after the originals: ..., C, Č, Ć, D, DŽ, Đ, E, ..., L, LJ, M, N, NJ, O, ..., S, Š, T, ..., Z, Ž.
- In Czech and Slovak, accented vowels have secondary collating weight – compared to other letters, they are treated as their unaccented forms, but then they are sorted after the unaccented letters. Accented consonants have primary collating weight and are collated immediately after their unaccented counterparts, with the exception of Ď, Ň and Ť, which again have secondary weight. CH is considered to be a separate letter and goes between H and I. In Slovak, DZ and DŽ are also considered separate letters and are positioned between Ď and E.
- In the Danish and Norwegian alphabets, the same extra vowels as in Swedish are also present but in a different order and with different glyphs. Also, "Aa" collates as an equivalent to "Å". The Danish alphabet has traditionally seen "W" as a variant of "V", but today "W" is considered a separate letter.
- In Dutch the combination IJ was formerly to be collated as Y, but is currently mostly collated as 2 letters. Exceptions are phone directories; IJ is always collated as Y here because in many Dutch family names Y is used where modern spelling would require IJ. Note that a word starting with ij that is written with a capital I is also written with a capital J, for example, the town IJmuiden, the river IJssel and the country IJsland.
- In Esperanto, consonants with circumflex accents, as well as ŭ, are counted as separate letters and collated separately.
- In Estonian õ, ä, ö and ü are considered separate letters and collate after w. Letters š, z and ž appear in loanwords and foreign proper names only and follow the letter s in the Estonian alphabet, which otherwise does not differ from the basic Latin alphabet.
- The Faroese alphabet also has some of the Danish, Norwegian, and Swedish extra letters, namely Æ and Ø. Furthermore, the Faroese alphabet uses the Icelandic eth, which follows the D. Five of the six vowels (A, I, O, U and Y) can take accents and are then considered separate letters. The consonants C, Q, X, W and Z are not found. Therefore, the first five letters are A, Á, B, D and Ð, and the last five are V, Y, Ý, Æ, Ø.
- In Filipino and other Philippine languages, the letter Ng is treated as a separate letter. It is pronounced as in sing, ping-pong, etc. By itself, it is pronounced nang, but in general Filipino orthography, it is spelled as if it were two separate letters. Also, letter derivatives immediately follow the base letter. Filipino also is written with diacritics, but their use is very rare.
- The Finnish alphabet and collating rules are the same as those of Swedish.
- For French, the last accent in a given word determines the order. For example, in French, the following four words would be sorted this way: cote < côte < coté < côté.
- In German, letters with umlauts are generally treated just like their non-umlauted versions; ß is always sorted as ss. This makes the alphabetic order Arg, Ärgerlich, Arm, Assistant, Aßlar, Assoziation. For phone directories and similar lists of names, the umlauts are to be collated like the letter combinations "ae", "oe", "ue" because a number of German surnames appear both with umlaut and in the non-umlauted form with "e". This makes the alphabetic order Udet, Übelacker, Uell, Ülle, Ueve, Üxküll, Uffenbach.
- The Hungarian vowels have accents, umlauts, and double accents, while consonants are written with single, double or triple characters. In collating, accented vowels are equivalent with their non-accented counterparts and double and triple characters follow their single originals. Hungarian alphabetic order is: A=Á, B, C, Cs, D, Dz, Dzs, E=É, F, G, Gy, H, I=Í, J, K, L, Ly, M, N, Ny, O=Ó, Ö=Ő, P, Q, R, S, Sz, T, Ty, U=Ú, Ü=Ű, V, W, X, Y, Z, Zs. This means that, for example, nádcukor should precede nádcsomó, since c precedes cs in the collation. Difference in vowel length should only be taken into consideration if the two words are otherwise identical. Spaces and hyphens within phrases are ignored in collation. Ch also occurs as a digraph in certain words but it is not considered a grapheme in its own right in terms of collation. A particular feature of Hungarian collation is that contracted forms of double di- and trigraphs should be collated as if they were written in full. For example, kaszinó should precede kassza, because the fourth "character" of the word kassza is considered a second sz, which does follow i.
- In Icelandic, Þ is added, and D is followed by Ð. Each vowel is followed by its correspondent with acute: Á, É, Í, Ó, Ú, Ý. There is no Z, so the alphabet ends: ..., X, Y, Ý, Þ, Æ, Ö.
  - Both letters were also used by Anglo-Saxon scribes who also used the Runic letter Wynn to represent /w/.
  - Þ is also a Runic letter.
  - Ð is the letter D with an added stroke.
- Kiowa is ordered on phonetic principles, like the Brahmic scripts, rather than on the historical Latin order. Vowels come first, then stop consonants ordered from the front to the back of the mouth, and from negative to positive voice-onset time, then the affricates, fricatives, liquids, and nasals.
- In Lithuanian, specifically Lithuanian letters go after their Latin originals. Another change is that Y comes just before J: ..., G, H, I, Į, Y, J, K, ...
- In Polish, specifically Polish letters derived from the Latin alphabet are collated after their originals: A, Ą, B, C, Ć, D, E, Ę, ..., L, Ł, M, N, Ń, O, Ó, P, ..., S, Ś, T, ..., Z, Ź, Ż. The digraphs for collation purposes are treated as if they were two separate letters.
- In Portuguese, the collating order is just like in English: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z. Digraphs and letters with diacritics are not included in the alphabet.
- In Romanian, special characters derived from the Latin alphabet are collated after their originals: A, Ă, Â, ..., I, Î, ..., S, Ș, T, Ț, ..., Z.
- Spanish formerly treated "CH" and "LL" as single letters, collated after C and L respectively. This has not been the case since 1994, when the RAE adopted the more conventional usage; LL is now collated between LK and LM, and CH between CG and CI. The six characters with diacritics Á, É, Í, Ó, Ú, Ü are treated as the original letters A, E, I, O, U. The only remaining Spanish-specific collating rule is that Ñ is a distinct letter collated after N.
- In the Swedish alphabet, there are three extra vowels placed at its end, similar to the Danish and Norwegian alphabet, but with different glyphs and a different collating order. The letter "W" has been treated as a variant of "V", but in the 13th edition of Svenska Akademiens ordlista "W" was considered a separate letter.
- In the Turkish alphabet there are 6 additional letters: ç, ğ, ı, ö, ş, and ü. They are collated with ç after c, ğ after g, ı before i, ö after o, ş after s, and ü after u. Originally, when the alphabet was introduced in 1928, ı was collated after i, but the order was changed later so that letters having shapes containing dots, cedillas or other adorning marks always follow the letters with corresponding bare shapes. Note that in Turkish orthography the letter I is the majuscule of dotless ı, whereas İ is the majuscule of dotted i.
- In many Turkic languages, there used to be the letter Gha, which came between G and H. It is now in disuse.
- In Vietnamese, there are 7 additional letters: ă, â, đ, ê, ô, ơ, ư, while f, j, w, z are absent, even though they are still in some use. "f" is replaced by the combination "ph". Similarly, "w" is replaced by "qu".
- In Volapük ä, ö and ü are counted as separate letters and collated separately while q and w are absent.
- In Welsh the digraphs CH, DD, FF, NG, LL, PH, RH, and TH are treated as single letters, and each is listed after the first character of the pair, producing the order A, B, C, CH, D, DD, E, F, FF, G, NG, H, and so on. It can sometimes happen, however, that word compounding results in the juxtaposition of two letters which do not form a digraph. An example is the word LLONGYFARCH. This results in such an ordering as, for example, LAWR, LWCUS, LLONG, LLOM, LLONGYFARCH (NG is a digraph in LLONG, but not in LLONGYFARCH). The letter combination R+H may similarly arise by juxtaposition in compounds, although this tends not to produce any pairs in which misidentification could affect the ordering. For the other potentially confusing letter combinations that may occur – namely, D+D and L+L – a hyphen is used in the spelling.
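As one illustration of a language-specific rule, the French convention listed above (the last accent in a word determines the order) can be approximated with a two-level sort key. This is a simplified sketch and not a complete French collation: the function name `french_key` is hypothetical, and the ranking of different accent types against each other simply follows the code points of the combining marks, which is an assumption of this example.

```python
import unicodedata


def french_key(word):
    """Two-level key: base letters first, then accents read from the end."""
    base = []    # letters with accents stripped
    marks = []   # combining marks attached to each base letter
    for ch in unicodedata.normalize("NFD", word.casefold()):
        if unicodedata.combining(ch):
            if marks:
                marks[-1] += ch       # attach the accent to the previous letter
        else:
            base.append(ch)
            marks.append("")          # no accent seen yet for this letter
    # Accents only break ties between words with identical base letters,
    # and they are compared starting from the last letter of the word.
    return ("".join(base), tuple(reversed(marks)))


ordered = sorted(["côté", "coté", "côte", "cote"], key=french_key)
```

All four words share the base letters "cote", so the order is decided entirely by the reversed accent sequence: words without an accent on the final e come first, giving cote, côte, coté, côté.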
Automation
Similar orderings
The principle behind alphabetical ordering can still be applied in languages that do not strictly speaking use an alphabet – for example, they may be written using a syllabary or abugida – provided the symbols used have an established ordering.
For logographic writing systems, such as Chinese hanzi or Japanese kanji, the method of radical-and-stroke sorting is frequently used as a way of defining an ordering on the symbols. Japanese sometimes uses pronunciation order, most commonly with the Gojūon order but sometimes with the older Iroha ordering.
In mathematics, lexicographical order is a means of ordering sequences in a manner analogous to that used to produce alphabetical order.
Some computer applications use a version of alphabetical order that can be achieved using a very simple algorithm, based purely on the ASCII or Unicode codes for characters. This may have non-standard effects such as placing all capital letters before lower-case ones. See ASCIIbetical order.
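The effect of code-based ordering is easy to reproduce. The sketch below contrasts Python's default string comparison (by Unicode code point) with a case-insensitive key and, for comparison, with a simple version of the natural sort order mentioned under Treatment of numerals. The helper name `natural_key` is an assumption of this example.

```python
import re

fruits = ["banana", "Apple", "apple", "Banana"]

# Default comparison uses code points, so all capitals precede lower case.
by_code_point = sorted(fruits)

# Folding case first restores conventional alphabetical order.
case_insensitive = sorted(fruits, key=str.casefold)


def natural_key(s):
    """Split into text and digit runs; digit runs compare by numeric value."""
    parts = re.split(r"(\d+)", s)
    # With a capturing group, odd positions always hold the digit runs.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


files = ["file10", "file2", "File1"]
by_code_point_files = sorted(files)
natural = sorted(files, key=natural_key)
```

Under code-point ordering "file10" sorts before "file2" because the character "1" precedes "2"; the natural key compares 10 and 2 as numbers instead.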
A rhyming dictionary is based on sorting words in alphabetical order starting from the last to the first letter of the word.
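The rhyming-dictionary ordering amounts to ordinary alphabetical sorting applied to each word read backwards. A minimal sketch, with an invented helper name `rhyme_key` and a small made-up word list:

```python
def rhyme_key(word):
    """Compare words from the last letter to the first."""
    return word.casefold()[::-1]


words = ["station", "cat", "nation", "bat"]
by_ending = sorted(words, key=rhyme_key)
```

Words sharing an ending become neighbours: "nation" and "station" are adjacent, as are "bat" and "cat".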