Code page
In computing, a code page is a character encoding: a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte.
The term "code page" originated from IBM's EBCDIC-based mainframe systems, but Microsoft, SAP, and Oracle Corporation are among the few vendors which use this term. The majority of vendors identify their own character sets by a name. In the case when there is a plethora of character sets, identifying character sets through a number is a convenient way to distinguish them. Originally, the code page numbers referred to the page numbers in the IBM standard character set manual, a condition which has not held for a long time. Vendors that use a code page system allocate their own code page number to a character encoding, even if it is better known by another name; for example, UTF-8 has been assigned page numbers 1208 at IBM, 65001 at Microsoft, and 4110 at SAP.
Hewlett-Packard uses a similar concept in its HP-UX operating system and its Printer Command Language protocol for printers. The terminology, however, is different: What others call a character set, HP calls a symbol set, and what IBM or Microsoft call a code page, HP calls a symbol set code. HP developed a series of symbol sets, each with an associated symbol set code, to encode both its own character sets and other vendors’ character sets.
The multitude of character sets leads many vendors to recommend Unicode.
The code page numbering system
IBM introduced the concept of systematically assigning a small, but globally unique, 16-bit number to each character encoding that a computer system or collection of computer systems might encounter. The IBM origin of the numbering scheme is reflected in the fact that the smallest numbers are assigned to variations of IBM's EBCDIC encoding and slightly larger numbers refer to variations of IBM's extended ASCII encoding as used in its PC hardware.
With the release of PC DOS version 3.3, IBM introduced the code page numbering system to regular PC users, as the code page numbers were used in new commands to allow the character encoding used by all parts of the OS to be set in a systematic way.
After IBM and Microsoft ceased to cooperate in the 1990s, the two companies have maintained the list of assigned code page numbers independently from each other, resulting in some conflicting assignments. At least one third-party vendor also has its own different list of numeric assignments. IBM's current assignments are listed in their CCSID repository, while Microsoft's assignments are documented within the MSDN. Additionally, a list of the names and approximate IANA abbreviations for the installed code pages on any given Windows machine can be found in the Registry on that machine.
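Because the lists diverged, a bare code page number is ambiguous unless the vendor is known. The following is a minimal sketch of a vendor-qualified lookup, using only assignments mentioned in this article. The helper names `codec_for` and `vendors_disagree` and the choice of Python codec names are illustrative assumptions, and the IBM Private Use Area variants are mapped to the plain Unicode codecs for simplicity.

```python
# Vendor-qualified code page numbers, as stated in this article.
# The Python codec names on the right are this sketch's own choice;
# IBM's Private Use Area additions are ignored here.
ASSIGNMENTS = {
    ("IBM", 1208): "utf-8",
    ("Microsoft", 65001): "utf-8",
    ("SAP", 4110): "utf-8",
    ("IBM", 1200): "utf-16-be",
    ("Microsoft", 1200): "utf-16-le",
    ("Microsoft", 1201): "utf-16-be",
}


def codec_for(vendor: str, code_page: int) -> str:
    """Return a codec name for a vendor's code page number (hypothetical helper)."""
    try:
        return ASSIGNMENTS[(vendor, code_page)]
    except KeyError:
        raise LookupError(f"no entry for {vendor} code page {code_page}") from None


def vendors_disagree(code_page: int) -> bool:
    """True if the same number maps to different codecs at different vendors."""
    codecs = {codec for (_, cp), codec in ASSIGNMENTS.items() if cp == code_page}
    return len(codecs) > 1


print(codec_for("IBM", 1200), codec_for("Microsoft", 1200))
print(vendors_disagree(1200))
```

Keying the table on the pair (vendor, number) rather than on the number alone is what keeps the conflicting assignment of 1200 from silently selecting the wrong byte order.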
Most well-known code pages, excluding those for the CJK languages and Vietnamese, fit all their code-points into eight bits and do not involve anything more than mapping each code-point to a single character; furthermore, techniques such as combining characters, complex scripts, etc., are not involved.
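As an illustration of this one-byte-to-one-character mapping, the sketch below decodes the same byte under three single-byte code pages using the codecs shipped with Python's standard library. The byte value 0xE9 is an arbitrary choice for the example.

```python
# One byte, three single-byte code pages: each maps the value to a
# different character, so the code page must be known to read the text.
raw = bytes([0xE9])

decoded = {name: raw.decode(name) for name in ("cp437", "cp850", "cp1252")}

for name, char in decoded.items():
    print(f"{name}: U+{ord(char):04X} {char}")
```

With Python's codec tables, the byte decodes to Θ under code page 437, Ú under code page 850, and é under code page 1252.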
The text mode of standard PC graphics hardware is built around using an 8-bit code page, though it is possible to use two at once with some color depth sacrifice, and up to eight may be stored in the display adaptor for easy switching. There was a selection of third-party code page fonts that could be loaded into such hardware. However, it is now commonplace for operating system vendors to provide their own character encoding and rendering systems that run in a graphics mode and bypass this hardware limitation entirely. The system of referring to character encodings by a code page number nevertheless remains applicable, as an efficient alternative to string identifiers such as those specified by the IETF and IANA for use in various protocols such as e-mail and web pages.
Relationship to ASCII
The majority of code pages in current use are supersets of ASCII, a 7-bit code representing 128 control codes and printable characters. In the distant past, 8-bit implementations of the ASCII code set the top bit to zero or used it as a parity bit in network data transmissions. When the top bit was made available for representing character data, a total of 256 characters and control codes could be represented. Most vendors used this extended range to encode characters used by various languages and graphical elements that allowed the imitation of primitive graphics on text-only output devices. No formal standard existed for these "extended ASCII character sets" and vendors referred to the variants as code pages, as IBM had always done for variants of EBCDIC encodings.
Relationship to Unicode
Unicode is an effort to include all characters from all currently and historically used human languages in a single character enumeration, removing the need to distinguish between different code pages when handling digitally stored text. Unicode tries to retain backwards compatibility with many legacy code pages, copying some code pages 1:1 in the design process. An explicit design goal of Unicode was to allow round-trip conversion between all common legacy code pages, although this goal has not always been achieved.
Some vendors, namely IBM and Microsoft, have anachronistically assigned code page numbers to Unicode encodings. This convention allows code page numbers to be used as metadata to identify the correct decoding algorithm when encountering binary stored data.
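Round-trip conversion can be checked mechanically: decode every possible byte value to Unicode, encode the result back, and compare. The sketch below does this for single-byte code pages with Python's built-in codecs. The helper name `round_trips` is hypothetical, and the result reflects Python's own mapping tables rather than any vendor's definition.

```python
def round_trips(codec: str) -> bool:
    """Check whether all 256 byte values survive bytes -> Unicode -> bytes."""
    all_bytes = bytes(range(256))
    try:
        text = all_bytes.decode(codec)
    except UnicodeDecodeError:
        # At least one byte value has no Unicode mapping in this codec table.
        return False
    return text.encode(codec) == all_bytes


for name in ("cp437", "latin-1", "cp1252"):
    print(name, round_trips(name))
```

In Python's tables, code page 437 and ISO 8859-1 round-trip completely, while code page 1252 does not, because a few byte values in its 0x80-0x9F range are left unassigned.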
IBM code pages
EBCDIC-based code pages
These code pages are used by IBM in its EBCDIC character sets for mainframe computers.
- 1 – USA WP, Original
- 2 – USA
- 3 – USA Accounting, Version A
- 4 – USA
- 5 – USA
- 6 – Latin America
- 7 – Germany F.R. / Austria
- 8 – Germany F.R.
- 9 – France, Belgium
- 10 – Canada
- 11 – Canada
- 12 – Italy
- 13 – Netherlands
- 14 –
- 15 – Switzerland
- 16 – Switzerland
- 17 – Switzerland
- 18 – Sweden / Finland
- 19 – Sweden / Finland WP, version 2
- 20 – Denmark/Norway
- 21 – Brazil
- 22 – Portugal
- 23 – United Kingdom
- 24 – United Kingdom
- 25 – Japan
- 26 – Japan
- 27 – Greece
- 28 –
- 29 – Iceland
- 30 – Turkey
- 31 – South Africa
- 32 – Czechoslovakia
- 33 – Czechoslovakia
- 34 – Czechoslovakia
- 35 – Romania
- 36 – Romania
- 37 – USA/Canada - CECP
- 37-2 – The real 3279 APL codepage, as used by C/370. This is very close to 1047, except for caret and not-sign inverted. It is not officially recognized by IBM, even though SHARE has pointed out its existence.
- 38 – USA ASCII
- 39 – United Kingdom / Israel
- 40 – United Kingdom
- 251 – China
- 252 – Poland
- 254 – Hungary
- 256 – International #1
- 257 – International #2
- 258 – International #3
- 259 – Symbols, Set 7
- 260 – Canadian French - 116
- 264 – Print Train & Text processing extended
- 273 – Germany F.R./Austria - CECP
- 274 – Old Belgium Code Page
- 275 – Brazil - CECP
- 276 – Canada - 94
- 277 – Denmark, Norway - CECP
- 278 – Finland, Sweden - CECP
- 279 – French - 94
- 280 – Italy - CECP
- 281 – Japan - CECP
- 282 – Portugal - CECP
- 283 – Spain - 190
- 284 – Spain/Latin America - CECP
- 285 – United Kingdom - CECP
- 286 – Austria / Germany F.R. Alternate
- 287 – Denmark / Norway Alternate
- 288 – Finland / Sweden Alternate
- 289 – Spain Alternate
- 290 – Japanese Extended
- 293 – APL
- 297 – France
- 298 – Japan
- 300 – Japan DBCS
- 310 – Graphic Escape APL/TN
- 320 – Hungary
- 321 – Yugoslavia
- 322 – Turkey
- 330 – International #4
- 351 – GDDM default
- 352 – Printing and publishing option
- 353 – BCDIC-A
- 355 – PTTC/BCD standard option
- 357 – PTTC/BCD H option
- 358 – PTTC/BCD Correspondence option
- 359 – PTTC/BCD Monocase option
- 360 – PTTC/BCD Duocase option
- 361 – EBCDIC Publishing International
- 363 – Symbols, set 8
- 382 – EBCDIC Publishing Austria, Germany F.R. Alternate
- 383 – EBCDIC Publishing Belgium
- 384 – EBCDIC Publishing Brazil
- 385 – EBCDIC Publishing Canada
- 386 – EBCDIC Publishing Denmark, Norway
- 387 – EBCDIC Publishing Finland, Sweden
- 388 – EBCDIC Publishing France
- 389 – EBCDIC Publishing Italy
- 390 – EBCDIC Publishing Japan
- 391 – EBCDIC Publishing Portugal
- 392 – EBCDIC Publishing Spain, Philippines
- 393 – EBCDIC Publishing Latin America
- 394 – EBCDIC Publishing China, UK, Ireland
- 395 – EBCDIC Publishing Australia, New Zealand, USA, Canada
- 410 – Cyrillic
- 420 – Arabic
- 421 – Maghreb/French
- 423 – Greek
- 424 – Hebrew
- 425 – Arabic / Latin for OS/390 Open Edition
- 435 – Teletext Isomorphic
- 500 – International #5
- 803 – Hebrew Character Set A
- 829 – Host Math Symbols- Publishing
- 833 – Korean Extended
- 834 – Korean Hangul
- 835 – Traditional Chinese DBCS
- 836 – Simplified Chinese Extended
- 837 – Simplified Chinese DBCS
- 838 – Thai with Low Marks & Accented Characters
- 839 – Thai DBCS
- 870 – Latin 2
- 871 – Iceland
- 875 – Greek
- 880 – Cyrillic
- 881 – United States - 5080 Graphics System
- 882 – United Kingdom - 5080 Graphics System
- 883 – Sweden - 5080 Graphics System
- 884 – Germany - 5080 Graphics System
- 885 – France - 5080 Graphics System
- 886 – Italy - 5080 Graphics System
- 887 – Japan - 5080 Graphics System
- 888 – France AZERTY - 5080 Graphics System
- 889 – Thailand
- 890 – Yugoslavia
- 892 – EBCDIC, OCR A
- 893 – EBCDIC, OCR B
- 905 – Latin 3
- 918 – Urdu Bilingual
- 924 – Latin 9
- 930 – Japan MIX
- 931 – Japan MIX
- 933 – Korea MIX
- 935 – Simplified Chinese MIX
- 937 – Traditional Chinese MIX
- 939 – Japan MIX
- 1001 – MICR
- 1002 – EBCDIC DCF Release 2 Compatibility
- 1003 – EBCDIC DCF, US Text subset
- 1005 – EBCDIC Isomorphic Text Communication
- 1007 – EBCDIC Arabic
- 1024 – EBCDIC T.61
- 1025 – Cyrillic, Multilingual
- 1026 – EBCDIC Turkey
- 1027 – Japanese Extended
- 1028 – EBCDIC Publishing Hebrew
- 1030 – Japanese Extended
- 1031 – Japanese Extended
- 1032 – MICR, E13-B Combined
- 1033 – MICR, CMC-7 Combined
- 1037 – Korea - 5080/6090 Graphics System
- 1039 – GML Compatibility
- 1047 – Latin 1/Open Systems
- 1068 – DCF Compatibility
- 1069 – Latin 4
- 1070 – USA / Canada Version 0
- 1071 – Germany F.R. / Austria
- 1073 – Brazil
- 1074 – Denmark, Norway
- 1075 – Finland, Sweden
- 1076 – Italy
- 1077 – Japan
- 1078 – Portugal
- 1079 – Spain / Latin America Version 0
- 1080 – United Kingdom
- 1081 – France Version 0
- 1082 – Israel
- 1083 – Israel
- 1084 – International #5 Version 0
- 1085 – Iceland
- 1087 – Symbol Set
- 1091 – Modified Symbols, Set 7
- 1093 – IBM Logo
- 1097 – Farsi Bilingual
- 1110 – Latin 2
- 1112 – Baltic Multilingual
- 1113 – Latin 6
- 1122 – Estonia
- 1123 – Cyrillic, Ukraine
- 1130 – Vietnamese
- 1132 – Lao EBCDIC
- 1136 – Hitachi Katakana
- 1137 – Devanagari EBCDIC
- 1140 – USA, Canada, etc. ECECP
- 1141 – Austria, Germany ECECP
- 1142 – Denmark, Norway ECECP
- 1143 – Finland, Sweden ECECP
- 1144 – Italy ECECP
- 1145 – Spain, Latin America ECECP
- 1146 – UK ECECP
- 1147 – France ECECP with euro
- 1148 – International ECECP with euro
- 1149 – Icelandic ECECP with euro
- 1150 – Korean Extended with box characters
- 1151 – Simplified Chinese Extended with box characters
- 1152 – Traditional Chinese Extended with box characters
- 1153 – Latin 2 Multilingual with euro
- 1154 – Cyrillic, Multilingual with euro
- 1155 – Turkey with euro
- 1156 – Baltic Multi with euro
- 1157 – Estonia with euro
- 1158 – Cyrillic, Ukraine with euro
- 1159 – T-Chinese EBCDIC
- 1160 – Thai with Low Marks & Accented Characters with euro
- 1164 – Vietnamese with euro
- 1165 – Latin 2/Open Systems
- 1166 – Cyrillic Kazakh
- 1278 – EBCDIC Adobe Standard Encoding
- 1279 – Hitachi Japanese Katakana Host
- 1303 – EBCDIC Bar Code
- 1364 – Korea MIX
- 1371 – Traditional Chinese MIX
- 1376 – Traditional Chinese DBCS Host extension for HKSCS
- 1377 – Mixed Host HKSCS Growing
- 1388 – Simplified Chinese MIX
- 1390 – Simplified Chinese MIX Japan MIX
- 1399 – Japan MIX
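Unlike most PC code pages, EBCDIC code pages are not supersets of ASCII, and the national variants differ from one another in where they place some punctuation. A short sketch using the `cp037` and `cp500` codecs from Python's standard library (which correspond to code pages 37 and 500 in the list above) makes both points visible.

```python
text = "Hello"

ebcdic = text.encode("cp037")   # EBCDIC, USA/Canada
ascii_ = text.encode("ascii")   # 7-bit ASCII for comparison

print(ebcdic.hex(), ascii_.hex())

# The same character can sit at different byte values in two EBCDIC variants.
bracket_037 = "[".encode("cp037")
bracket_500 = "[".encode("cp500")
print(bracket_037.hex(), bracket_500.hex())
```

The letters of "Hello" encode to c8 85 93 93 96 in code page 37, against 48 65 6c 6c 6f in ASCII.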
DOS code pages
- 301 – IBM-PC Japan DBCS
- 437 – Original IBM PC hardware code page
- 720 – Arabic
- 737 – Greek
- 775 – Latin-7
- 808 – Russian with euro
- 848 – Ukrainian with euro
- 849 – Belorussian with euro
- 850 – Latin-1
- 851 – Greek
- 852 – Latin-2
- 853 – Latin-3
- 855 – Cyrillic
- 856 – Hebrew
- 857 – Latin-5
- 858 – Latin-1 with euro symbol
- 859 – Latin-9
- 860 – Portuguese
- 861 – Icelandic
- 862 – Hebrew
- 863 – Canadian French
- 864 – Arabic
- 865 – Danish/Norwegian
- 866 – Belarusian, Russian, Ukrainian
- 867 – Hebrew + euro
- 868 – Urdu
- 869 – Greek
- 872 – Cyrillic with euro
- 874 – Thai with Low Tone Marks & Ancient Chars
- 876 – OCR A
- 877 – OCR B
- 878 – KOI8-R
- 891 – Korean PC SBCS
- 898 – IBM-PC WP Multilingual
- 899 – IBM-PC Symbol
- 903 – Simplified Chinese PC SBCS
- 904 – Traditional Chinese PC SBCS
- 906 – International Set #5 3812/3820
- 907 – ASCII APL
- 909 – IBM-PC APL2 Extended
- 910 – IBM-PC APL2
- 911 – IBM-PC Japan #1
- 926 – Korean PC DBCS
- 927 – Traditional Chinese PC DBCS
- 928 – Simplified Chinese PC DBCS
- 929 – Thai PC DBCS
- 932 – IBM-PC Japan MIX
- 934 – IBM-PC Korea MIX
- 936 – IBM-PC Simplified Chinese MIX
- 938 – IBM-PC Traditional Chinese MIX
- 942 – IBM-PC Japan MIX
- 943 – IBM-PC Japan OPEN
- 944 – IBM-PC Korea MIX
- 946 – IBM-PC Simplified Chinese
- 948 – IBM-PC Traditional Chinese
- 949 – Korean
- 951 – Korean DBCS
- 1034 – Printer Application - Shipping Label, Set #2
- 1040 – Korean Extended
- 1041 – Japanese Extended
- 1042 – Simplified Chinese Extended
- 1043 – Traditional Chinese Extended
- 1044 – Printer Application - Shipping Label, Set #1
- 1046 – Arabic Extended
- 1086 – IBM-PC Japan #1
- 1088 – Revised Korean
- 1092 – IBM-PC Modified Symbols
- 1098 – Farsi
- 1108 – DITROFF Base Compatibility
- 1109 – DITROFF Specials Compatibility
- 1115 – IBM-PC People's Republic of China
- 1116 – Estonian
- 1117 – Latvian
- 1118 – Lithuanian
- 1119 – Lithuanian and Russian
- 1125 – Cyrillic, Ukrainian
- 1127 – IBM-PC Arabic / French
- 1131 – IBM-PC Data, Cyrillic, Belarusian
- 1139 – Japan Alphanumeric Katakana
- 1161 – Thai with Low Tone Marks & Ancient Chars with euro
- 1167 – KOI8-RU
- 1168 – KOI8-U
- 1300 – ANSI
- 1370 – Traditional Chinese MIX
- 1380 – IBM-PC Simplified Chinese GB PC-DATA
- 1381 – IBM-PC Simplified Chinese
- 1393 – Japanese JIS X 0213 DBCS
- 1394 – IBM-PC Japan
DOS code pages are typically stored in .CPI files.
IBM AIX code pages
These code pages are used by IBM in its AIX operating system. They emulate several character sets, chiefly those defined by ISO standards and used on UNIX-like operating systems.
- 367 – 7-bit US-ASCII
- 371 – 7-bit US-ASCII APL
- 806 – ISCII
- 813 – ISO 8859-7
- 819 – ISO 8859-1
- 895 – 7-bit Japan Latin
- 896 – 7-bit Japan Katakana Extended
- 901 – Extension of ISO 8859-13 with euro
- 902 – ISO Estonian with euro
- 912 – Extension of ISO 8859-2
- 913 – ISO 8859-3
- 914 – ISO 8859-4
- 915 – Extension of ISO 8859-5
- 916 – ISO 8859-8
- 919 – ISO 8859-10
- 920 – ISO 8859-9
- 921 – Extension of ISO 8859-13
- 922 – ISO Estonian
- 923 – ISO 8859-15
- 952 – EUC Japanese for JIS X 0208
- 953 – EUC Japanese for JIS X 0212
- 954 – EUC Japanese
- 955 – TCP Japanese, JIS X 0208-1978
- 956 – TCP Japanese
- 957 – TCP Japanese
- 958 – TCP Japanese
- 959 – TCP Japanese
- 960 – Traditional Chinese DBCS-EUC SICGCC Primary Set
- 961 – Traditional Chinese DBCS-EUC SICGCC Full Set + IBM Select + UDC
- 963 – Traditional Chinese TCP, CNS 11643 plane 2 only
- 964 – EUC Traditional Chinese
- 965 – TCP Traditional Chinese
- 970 – EUC Korean
- 971 – EUC Korean DBCS
- 1006 – ISO 8-bit Urdu
- 1008 – ISO 8-bit Arabic
- 1009 – 7-bit ISO IRV
- 1010 – 7-bit France
- 1011 – 7-bit Germany F.R.
- 1012 – 7-bit Italy
- 1013 – 7-bit United Kingdom
- 1014 – 7-bit Spain
- 1015 – 7-bit Portugal
- 1016 – 7-bit Norway
- 1017 – 7-bit Denmark
- 1018 – 7-bit Finland/Sweden
- 1019 – 7-bit Netherlands
- 1029 – Arabic Extended
- 1036 – CCITT T.61
- 1089 – ISO 8859-6
- 1111 – ISO 8859-2
- 1124 – ISO Ukrainian, similar to ISO 8859-5
- 1129 – ISO Vietnamese
- 1133 – ISO Lao
- 1163 – ISO Vietnamese with euro
- 1350 – EUC Japanese
- 1382 – EUC Simplified Chinese
- 1383 – EUC Simplified Chinese
IBM OS/2 code pages
These code pages are used by IBM in its OS/2 operating system.
- 1004 – Latin-1 Extended, Desk Top Publishing/Windows
Windows emulation code pages
- 897 – IBM-PC SBCS Japanese
- 941 – IBM-PC Japanese DBCS for Open environment
- 947 – IBM-PC DBCS for
- 950 – Traditional Chinese MIX
- 1114 – IBM-PC SBCS
- 1126 – IBM-PC Korean SBCS
- 1162 – Windows Thai
- 1169 – Windows Cyrillic Asian
- 1174 – Windows Kazakh
- 1250 – Windows Central Europe
- 1251 – Windows Cyrillic
- 1252 – Windows Western
- 1253 – Windows Greek
- 1254 – Windows Turkish
- 1255 – Windows Hebrew
- 1256 – Windows Arabic
- 1257 – Windows Baltic
- 1258 – Windows Vietnamese
- 1361 – Korean
- 1362 – Korean Hangul DBCS
- 1363 – Windows Korean
- 1372 – IBM-PC MS T Chinese Big5 encoding
- 1373 – Windows Traditional Chinese
- 1374 – IBM-PC DB Big5 encoding extension for HKSCS
- 1375 – Mixed Big5 encoding extension for HKSCS
- 1385 – IBM-PC Simplified Chinese DBCS
- 1386 – IBM-PC Simplified Chinese GBK
- 1391 – Simplified Chinese 4 Byte
- 1392 – IBM-PC Simplified Chinese MIX
Macintosh emulation code pages
- 1275 – Apple Roman
- 1280 – Apple Greek
- 1281 – Apple Turkish
- 1282 – Apple Central European
- 1283 – Apple Cyrillic
- 1284 – Apple Croatian
- 1285 – Apple Romanian
- 1286 – Apple Icelandic
Adobe emulation code pages
- 1038 – Adobe Symbol Encoding
- 1276 – Adobe Standard Encoding
- 1277 – Adobe Latin 1
HP emulation code pages
- 1050 – HP Roman Extension
- 1051 – HP Roman-8
- 1052 – HP Gothic Legal
- 1053 – HP Gothic-1
- 1054 – HP ASCII
- 1055 – HP PC-Line
- 1056 – HP Line Draw
- 1057 – HP PC-8
- 1058 – HP PC-8DN
- 1351 – Japanese DBCS HP character set
- 5039 – Japanese MIX
DEC emulation code pages
- 1020 – 7-bit Canadian NRC Set
- 1021 – 7-bit Switzerland NRC Set
- 1023 – 7-bit Spanish NRC Set
- 1090 – Special Characters and Line Drawing Set
- 1100 – DEC Multinational
- 1101 – 7-bit British NRC Set
- 1102 – 7-bit Dutch NRC Set
- 1103 – 7-bit Finnish NRC Set
- 1104 – 7-bit French NRC Set
- 1105 – 7-bit Norwegian/Danish NRC Set
- 1106 – 7-bit Swedish NRC Set
- 1107 – 7-bit Norwegian/Danish NRC Alternate
- 1287 – DEC Greek
- 1288 – DEC Turkish
IBM Unicode code pages
- 1200 – UTF-16BE Unicode with IBM Private Use Area
- 1201 – UTF-16BE Unicode
- 1202 – UTF-16LE Unicode with IBM PUA
- 1203 – UTF-16LE Unicode
- 1208 – UTF-8 Unicode with IBM PUA
- 1209 – UTF-8 Unicode
- 1400 – ISO 10646 UCS-BMP
- 1401 – ISO 10646 UCS-SMP
- 1402 – ISO 10646 UCS-SIP
- 1414 – ISO 10646 UCS-SSP
- 1445 – IBM AFP PUA No. 1
- 1446 – ISO 10646 UCS-PUP15
- 1447 – ISO 10646 UCS-PUP16
- 1448 – UCS-BMP
- 1449 – IBM default PUA
Microsoft code pages
Windows code pages
These code pages are used by Microsoft in its own Windows operating system. Microsoft defined a number of code pages known as the ANSI code pages. Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes used in ISO 8859-1. Some of the others are based in part on other parts of ISO 8859 but often rearranged to make them closer to 1252.
- 874 – Windows Thai
- 1250 – Windows Central Europe
- 1251 – Windows Cyrillic
- 1252 – Windows Western
- 1253 – Windows Greek
- 1254 – Windows Turkish
- 1255 – Windows Hebrew
- 1256 – Windows Arabic
- 1257 – Windows Baltic
- 1258 – Windows Vietnamese
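The difference between code page 1252 and ISO 8859-1 in the range 0x80-0x9F can be seen by decoding the same bytes both ways. This is a small sketch using Python's `cp1252` and `latin-1` codecs; the three sample byte values are chosen only for illustration.

```python
import unicodedata

raw = bytes([0x80, 0x93, 0x94])

as_1252 = raw.decode("cp1252")     # printable characters
as_latin1 = raw.decode("latin-1")  # C1 control codes

print([f"U+{ord(c):04X}" for c in as_1252])
print([unicodedata.category(c) for c in as_latin1])

# Outside 0x80-0x9F the two encodings agree, e.g. over 0xA0-0xFF.
upper = bytes(range(0xA0, 0x100))
same_upper_range = upper.decode("cp1252") == upper.decode("latin-1")
print(same_upper_range)
```

Under code page 1252 the three bytes are the euro sign and a pair of curly quotation marks; under ISO 8859-1 they are control characters (Unicode category `Cc`).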
DBCS code pages
These code pages represent DBCS character encodings for various CJK languages. In Microsoft operating systems, these are used as both the "OEM" and "Windows" code page for the applicable locale.
- 932 – Supports Japanese Shift-JIS
- 936 – Supports Simplified Chinese GBK
- 949 – Supports Korean Unified Hangul Code
- 950 – Supports Traditional Chinese Big5
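In these double-byte code pages, ASCII characters still occupy one byte while CJK characters occupy two. The sketch below shows this with the `cp932`, `cp936`, `cp949` and `cp950` codecs from Python's standard library; the sample characters are arbitrary.

```python
# One sample CJK character per DBCS code page.
samples = {
    "cp932": "あ",  # Japanese
    "cp936": "中",  # Simplified Chinese
    "cp949": "한",  # Korean
    "cp950": "中",  # Traditional Chinese
}

lengths = {}
for codec, char in samples.items():
    encoded = ("A" + char).encode(codec)
    # 'A' takes one byte; the CJK character takes the remaining bytes.
    lengths[codec] = len(encoded) - 1
    print(codec, encoded.hex(" "))

mixed = "Aあ".encode("cp932")
```

Because byte length and character count differ, code that indexes such strings byte by byte can split a character in the middle.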
MS-DOS code pages
- 708 – Arabic
- 709 – Arabic
- 710 – Arabic
- 720 – Arabic
- 737 – Greek
- 850 – Latin-1
- 851 – Greek
- 852 – Latin-2
- 855 – Cyrillic
- 857 – Latin-5
- 858 – Latin-1 with euro symbol
- 859 – Latin-9
- 860 – Portuguese
- 861 – Icelandic
- 862 – Hebrew
- 863 – Canadian French
- 864 – Arabic
- 865 – Danish/Norwegian
- 866 – Belarusian, Russian, Ukrainian
- 869 – Greek
Macintosh emulation code pages
- 10000 – Apple Macintosh Roman
- 10001 – Apple Japanese
- 10002 – Apple Traditional Chinese
- 10003 – Apple Korean
- 10004 – Apple Arabic
- 10005 – Apple Hebrew
- 10006 – Apple Greek
- 10007 – Apple Macintosh Cyrillic
- 10008 – Apple Simplified Chinese
- 10010 – Apple Romanian
- 10017 – Apple Ukrainian
- 10021 – Apple Thai
- 10029 – Apple Macintosh Central Europe
- 10079 – Apple Icelandic
- 10081 – Apple Turkish
- 10082 – Apple Croatian
Various other Microsoft code pages
- 20000 – Traditional Chinese CNS
- 20001 – Traditional Chinese TCA
- 20002 – Traditional Chinese ETEN
- 20003 – Traditional Chinese IBM5500
- 20004 – Traditional Chinese TeleText
- 20005 – Traditional Chinese Wang
- 20105 – 7-bit IA5 IRV
- 20106 – 7-bit IA5 German
- 20107 – 7-bit IA5 Swedish
- 20108 – 7-bit IA5 Norwegian
- 20127 – 7-bit US-ASCII
- 20261 – CCITT T.61
- 20269 – ISO 6937
- 20273
- 20277
- 20278
- 20284
- 20285
- 20290
- 20297
- 20420
- 20423
- 20424
- 20833
- 20838
- 20866 – KOI8-R
- 20871
- 20880
- 20905
- 20924
- 20932
- 20936
- 20949
- 21025
- 21027
- 21866 – KOI8-U
- 28591 – ISO-8859-1
- 28592 – ISO-8859-2
- 28593 – ISO-8859-3
- 28594 – ISO-8859-4
- 28595 – ISO-8859-5
- 28596 – ISO-8859-6
- 28597 – ISO-8859-7
- 28598 – ISO-8859-8
- 28599 – ISO-8859-9
- 28600 – ISO-8859-10
- 28601 – ISO-8859-11
- 28602 – not used
- 28603 – ISO-8859-13
- 28604 – ISO-8859-14
- 28605 – ISO-8859-15
- 28606 – ISO-8859-16
- 38596 – ISO-8859-6
- 38598 – ISO-8859-8
Microsoft Unicode code pages
- 1200 – UTF-16LE Unicode
- 1201 – UTF-16BE Unicode
- 12000 – UTF-32LE Unicode
- 12001 – UTF-32BE Unicode
- 65000 – UTF-7 Unicode
- 65001 – UTF-8 Unicode
- 65520 – Empty Unicode Plane
HP Symbol Sets
HP own Symbol Sets
- Symbol Set 0E — HP Roman Extension — 7-bit character set with accented letters
- Symbol Set 0G — HP 7-bit German
- Symbol Set 0L — HP Line Draw
- Symbol Set 0M — HP Math-7
- Symbol Set 0T — HP Thai-8
- Symbol Set 1S — HP 7-bit Spanish
- Symbol Set 1U — HP 7-bit Gothic Legal
- Symbol Set 4Q — 7-bit PC Line
- Symbol Set 4U — HP Roman-9 — Roman-8 + €
- Symbol Set 7J — HP Desktop
- Symbol Set 7S — HP 7-bit European Spanish
- Symbol Set 8E — HP East-8
- Symbol Set 8G — HP Greek-8
- Symbol Set 8H — HP Hebrew-8
- Symbol Set 8I — MS LineDraw
- Symbol Set 8K — HP Kana-8
- Symbol Set 8L — HP LineDraw
- Symbol Set 8M — HP Math-8
- Symbol Set 8R — HP Cyrillic-8
- Symbol Set 8S — HP 7-bit Latin American Spanish
- Symbol Set 8T — HP Turkish-8
- Symbol Set 8U — HP Roman-8
- Symbol Set 8V — HP Arabic-8
- Symbol Set 9K — HP Korean-8
- Symbol Set 9T — PC 8T
- Symbol Set 9V — Latin / Arabic for Windows
- Symbol Set 11U — PC 8D/N
- Symbol set 14G — PC-8 Greek Alternate
- Symbol Set 18K —
- Symbol Set 18T —
- Symbol Set 19C —
- Symbol Set 19K —
Symbol Sets from other vendors
- Symbol Set 0D — ISO 60: 7-bit Norwegian
- Symbol Set 0F — ISO 25: 7-bit French
- Symbol Set 0H — HP 7-bit Hebrew — Practically the same as Israeli Standard SI 960
- Symbol Set 0I — ISO 15: 7-bit Italian
- Symbol Set 0K — ISO 14: 7-bit Japanese Katakana
- Symbol Set 0N — ISO 8859-1 Latin 1
- Symbol Set 0R — ISO 8859-5 Latin/Cyrillic
- Symbol Set 0S — ISO 11: 7-bit Swedish
- Symbol Set 0U — ISO 6: 7-bit U.S.
- Symbol Set 0V — Arabic
- Symbol Set 1D — ISO 61: 7-bit Norwegian
- Symbol Set 1E — ISO 4: 7-bit U. K.
- Symbol Set 1F — ISO 69: 7-bit French
- Symbol Set 1G — ISO 21: 7-bit German
- Symbol Set 1K — ISO 13: 7-bit Japanese Latin
- Symbol Set 1T — Windows Thai
- Symbol Set 2K — ISO 57: 7-bit Simplified Chinese Latin
- Symbol Set 2N — ISO 8859-2 Latin 2
- Symbol Set 2S — ISO 17: 7-bit Spanish
- Symbol Set 2U — ISO 2: 7-bit International Reference Version
- Symbol Set 3N — ISO 8859-3 Latin 3
- Symbol Set 3R — PC-866 Russia
- Symbol Set 3S — ISO 10: 7-bit Swedish
- Symbol Set 4N — ISO 8859-4 Latin 4
- Symbol Set 4S — ISO 16: 7-bit Portuguese
- Symbol Set 5M — PS Math Symbol
- Symbol Set 5N — ISO 8859-9 Latin 5
- Symbol Set 5S — ISO 84: 7-bit Portuguese
- Symbol Set 5T — Windows 3.1 Latin-5
- Symbol Set 6J — Microsoft Publishing
- Symbol Set 6M — Ventura Math
- Symbol Set 6N — ISO 8859-10 Latin 6
- Symbol Set 6S — ISO 85: 7-bit Spanish
- Symbol Set 7H — ISO 8859-8 Latin/Hebrew
- Symbol Set 9E — Windows 3.1 Latin 2
- Symbol Set 9G — Windows 98 Greek
- Symbol Set 9J — PC 1004
- Symbol Set 9L — Ventura ITC Zapf Dingbats
- Symbol Set 9N — ISO 8859-15 Latin 9
- Symbol Set 9R — Windows 98 Cyrillic
- Symbol Set 9U — Windows 3.0
- Symbol Set 10G — PC-851 Latin/Greek
- Symbol Set 10J — PS Text
- Symbol Set 10L — PS ITC Zapf Dingbats
- Symbol Set 10N — ISO 8859-5 Latin/Cyrillic
- Symbol Set 10R — PC-855 Cyrillic
- Symbol Set 10T — Teletex
- Symbol Set 10U — PC-8
- Symbol Set 10V — CP-864
- Symbol Set 11G — CP-869
- Symbol Set 11J — PS ISO Latin-1
- Symbol Set 11N — ISO 8859-6 Latin/Arabic
- Symbol Set 12G — PC Latin/Greek
- Symbol Set 12J — MC Text
- Symbol Set 12N — ISO 8859-7 Latin/Greek
- Symbol Set 12R — PC Gost
- Symbol Set 12U — PC-850 Latin 1
- Symbol Set 13J — Ventura International
- Symbol Set 13R — PC Bulgarian
- Symbol Set 13U — PC-858 Latin 1 + €
- Symbol Set 14J — Ventura U. S.
- Symbol Set 14L — Windows Dingbats
- Symbol Set 14P — ABICOMP International
- Symbol Set 14R — PC Ukrainian
- Symbol Set 15H — PC-862 Israel
- Symbol Set 16U — PC-857 Latin 5
- Symbol Set 17U — PC-852 Latin 2
- Symbol Set 18N — UTF-8
- Symbol Set 18U — PC-853 Latin 3
- Symbol Set 19L — Windows 98 Baltic
- Symbol Set 19M — Windows Symbol
- Symbol Set 19U — Windows 3.1 Latin 1
- Symbol Set 20U — PC-860 Portugal
- Symbol Set 21U — PC-861 Iceland
- Symbol Set 23U — PC-863 Canada - French
- Symbol Set 24Q — PC-Polish Mazowia
- Symbol Set 25U — PC-865 Denmark/Norway
- Symbol Set 26U — PC-775 Latin 7
- Symbol Set 27Q — PC-8 PC Nova
- Symbol Set 27U — PC Latvian Russian
- Symbol Set 28U — PC Lithuanian/Russian
- Symbol Set 29U — PC-772 Lithuanian/Russian
Code pages from other vendors
These code page number assignments are official neither at IBM nor at Microsoft, and almost none of them is referred to as a usable character set by IANA. The numbers assigned to these code pages are arbitrary and may clash with registered numbers in use by IBM or Microsoft. Some of them may predate the addition of code page switching in DOS 3.3.
- 100 – DOS Hebrew hardware fontpage
- 111 – DOS Greek
- 112 – DOS Turkish
- 113 – DOS Yugoslavian
- 151 – DOS Nafitha Arabic
- 152 – DOS Nafitha Arabic
- 161 – DOS Arabic
- 162 – DOS Arabic
- 163 – DOS Arabic
- 164 – DOS Arabic
- 165 – DOS Arabic
- 166 – IBM Arabic PC
- 210 – DEC DOS Greek
- 220 – DEC DOS Spanish
- 489 – Czechoslovakian
- 620 – DOS Polish
- 667 – DOS Polish
- 668 – DOS Polish
- 707 – MS-DOS Arabic Sakhr
- 711 – MS-DOS Arabic Nafitha Enhanced
- 714 – MS-DOS Arabic Sakr
- 715 – MS-DOS Arabic APTEC
- 721 – MS-DOS Arabic Nafitha International
- 768 – Arabic Al-Arabi
- 770 – DOS Estonian, Latvian, Lithuanian
- 771 – DOS Lithuanian/Cyrillic — KBL
- 772 – DOS Lithuanian/Cyrillic
- 773 – DOS Latin-7 — KBL
- 774 – DOS Lithuanian
- 775 – DOS Latin-7 Baltic Rim
- 776 – DOS Lithuanian
- 777 – DOS Accented Lithuanian — KBL
- 778 – DOS Accented Lithuanian
- 790 – DOS Polish
- 854 – Spanish
- 881 – Latin 1
- 882 – Latin 2
- 883 – Latin 3
- 884 – Latin 4
- 885 – Latin 5
- 895 – Czech,
- 896 – DOS Polish
- 900 – DOS Russian
- 928 – Greek ; same as Greek National Standard ELOT 928
- 966 – Saudi Arabian
- 991 – DOS Polish
- 999 – DOS Serbo-Croatian I ; also known as PC Nova and CroSCII; lower part is JUSI.B1.002, upper part is code page 437; supports Slovenian and Serbo-Croatian
- 1001 – Arabic
- 1261 – Windows Korean IBM-1261 LMBCS-17, similar to 1363
- 1270 – Windows Sámi
- 2001 – Lithuanian KBL ; same as code page 771
- 3001 – Estonian 1 ; same as code page 1116
- 3002 – Estonian 2 ; same as code page 922
- 3011 – Latvian 1 ; same as code page 437-Latvian
- 3012 – Latvian-2 ; same as code page 866-Latvian
- 3021 – Bulgarian ; same as MIK
- 3031 – Hebrew ; same as code page 862
- 3041 – Maltese ; same as ISO 646 Maltese
- 3840 – IBM-Russian ; nearly the same as CP 866
- 3841 – Gost-Russian ; GOST 13052 plus characters for Central Asian languages
- 3843 – Polish ; same as Mazovia
- 3844 – CS2 ; same as Kamenický
- 3845 – Hungarian ; same as CWI
- 3846 – Turkish ; same as PC-8 Turkish + old Turkish Lira sign at code point A8
- 3847 – Brazil-ABNT ; same as the Brazilian National Standard NBR-9614:1986
- 3848 – Brazil-ABICOMP ; same as ABICOMP
- 3850 – Standard KU ; variation of the Kasetsart University encoding for Thai
- 3860 – Rajvitee KU ; variation of the Kasetsart University encoding for Thai
- 3861 – Microwiz KU ; variation of the Kasetsart University encoding for Thai
- 3863 – STD988 TIS ; variation of the TIS 620 encoding for Thai
- 3864 – Popular TIS ; variation of the TIS 620 encoding for Thai
- 3865 – Newsic TIS ; variation of the TIS 620 encoding for Thai
- – CWI-2 supports Hungarian
- – MIK supports Bulgarian
- – DOS Serbo-Croatian II; supports Slovenian and Serbo-Croatian
- — Russian Alternative code page ; this is the origin for IBM CP 866
List of code page assignments
ID | Names | Description | Origin | Platform | DOS | OS/2 | Windows | Mac | Else | Encoding | Comment |
0 | N/A | Reserved | IBM, Microsoft | N/A | 3.3+ | 1.0+ | ? | ? | ? | Internal OS use | |
437 | CP437, IBM437 | PC US | IBM | IBM PC | 3.3+ | 1.0+ | Yes | ? | Yes | 8-bit SBCS | |
57344 - 61439 | N/A | Private use derivations | IBM | N/A | N/A | N/A | N/A | N/A | N/A | various | Private use code page derivations |
65280 - 65533 | N/A | Private use definitions | IBM | N/A | N/A | N/A | N/A | N/A | N/A | various | Private use code page definitions |
65534 | N/A | Reserved | IBM, Microsoft | N/A | ? | ? | ? | ? | ? | various | Internal OS use |
65535 | N/A | Reserved | IBM, Microsoft | N/A | 3.3+ | 1.0+ | ? | ? | ? | various | Internal OS use |
Criticism
Many older character encodings suffer from several problems. Some code page vendors insufficiently document the meaning of all code point values, which decreases the reliability of handling textual data through various computer systems consistently. Some vendors add proprietary extensions to some code pages to add or change certain code point values; for example, byte 0x5C in Shift JIS can represent either a backslash or a yen currency symbol depending on the platform. Finally, in order to support several languages in a program that does not use Unicode, the code page used for each string/document needs to be stored.
Due to Unicode's extensive documentation, vast repertoire of characters and stability policy of characters, the problems listed above are rarely a concern for Unicode.
Applications may also mislabel text in Windows-1252 as ISO-8859-1. Fortunately, the only difference between these code pages is that the code point values used by ISO-8859-1 for control characters are instead used as additional printable characters in Windows-1252. Since control characters have no function in HTML, web browsers tend to use Windows-1252 rather than ISO-8859-1. In HTML5, treating ISO-8859-1 as Windows-1252 is even codified as standard. UTF-8 has since overtaken both encodings in popularity on the Internet.
Private code pages
When, early in the history of personal computers, users did not find their character encoding requirements met, private or local code pages were created using Terminate and Stay Resident utilities or by re-programming BIOS EPROMs. In some cases, unofficial code page numbers were invented.
When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets. Another example is the Iran System encoding standard, which was created by the Iran System corporation for Persian language support. This standard was used in Iran in DOS-based programs and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs using this encoding are still in use, and some Windows fonts with this encoding exist.
In order to overcome such problems, the IBM Character Data Representation Architecture level 2 specifically reserves ranges of code page IDs for user-definable and private-use assignments. Whenever such code page IDs are used, the user must not assume that the same functionality and appearance can be reproduced in another system configuration or on another device or system unless the user takes care of this specifically.
The code page range 57344-61439 is officially reserved for user-definable code pages, whereas the range 65280-65533 is reserved for any user-definable "private use" assignments.
For example, a non-registered custom variant of code page 437 or 28591 could become 57781 or 61359, respectively, in order to avoid potential conflicts with other assignments and maintain the sometimes existing internal numerical logic in the assignments of the original code pages. An unregistered private code page not based on an existing code page, a device specific code page like a printer font, which just needs a logical handle to become addressable for the system, a frequently changing download font, or a code page number with a symbolic meaning in the local environment could have an assignment in the private range like 65280.
The code page IDs 0, 65534 and 65535 are reserved for internal use by operating systems such as DOS and must not be assigned to any specific code pages.
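The reserved and private-use ranges described above can be expressed as a small classifier. The sketch below is illustrative only: the function names and category labels are hypothetical. The `derive_private_id` helper encodes one reading that is consistent with the two examples given (437 to 57781 and 28591 to 61359), namely keeping the low 12 bits of the original number inside the 57344-61439 range. That rule is an inference from the numbers, not something the text states.

```python
def classify_code_page_id(cp_id: int) -> str:
    """Classify a 16-bit code page ID using the ranges given in the text."""
    if not 0 <= cp_id <= 0xFFFF:
        raise ValueError("code page IDs are 16-bit numbers")
    if cp_id in (0, 65534, 65535):
        return "reserved"
    if 57344 <= cp_id <= 61439:
        return "private-use derivation"
    if 65280 <= cp_id <= 65533:
        return "private-use definition"
    return "regular"


def derive_private_id(base_id: int) -> int:
    """Assumed rule: keep the low 12 bits of base_id within 57344-61439."""
    return 0xE000 | (base_id & 0x0FFF)


for base in (437, 28591):
    derived = derive_private_id(base)
    print(base, derived, classify_code_page_id(derived))
```

Under that assumption, the low-order bits of the original number remain recognisable in the derived ID, which may be the "internal numerical logic" the examples aim to preserve.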