Hong Kong Supplementary Character Set


The Hong Kong Supplementary Character Set (HKSCS) is a set of Chinese characters (4,702 in the initial release) used in written Cantonese and in the names of some places in Hong Kong. It evolved from the earlier Government Chinese Character Set (GCCS), a set of supplementary Chinese characters coded in the user-defined areas of the Big5 character set. GCCS was originally used within the Hong Kong Government and was later adopted by the public. It became the Hong Kong Supplementary Character Set when the characters in the set were submitted to ISO 10646 for coding.

Development history

Because of the inherent differences between standard written Chinese and written Cantonese, the Government of Hong Kong recognised the need for a standardised set of proprietary characters that would streamline electronic communication; at the time, the Big5 Chinese encoding scheme lacked the vast majority of these characters.
The Government Chinese Character Set (GCCS) was therefore developed by the government. It consisted of Chinese characters commonly used in Hong Kong: some are specific to Cantonese, while others are alternative forms of existing characters. The set was not well organised, and its characters were not closely examined.
Subsequently, HKSCS-1999 was developed. Following its acceptance, revisions were released in 2001 and 2004, bringing the total to 4,941 characters. 106 GCCS characters were removed in HKSCS-1999 as a result of unification, and their Big5 code points are reserved for compatibility. Retired "not verifiable" GCCS characters are found in UTC Sources, where they are sourced from Adobe-CNS1-1, an Adobe-CNS1 supplement implemented to support GCCS.
The HKSCS is encoded in Big5 and ISO 10646. Starting with HKSCS-2004, all characters that previously used the Private Use Area (PUA) of Unicode were remapped, many of them to the Extension B block or the Supplementary Ideographic Plane compatibility block. However, to preserve compatibility with programs that generated PUA code points, the previously allocated code points remain reserved, and no new characters will be mapped to the PUA.
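One practical consequence of the remapping is that text produced by older tools may still contain PUA code points. The following is a minimal Python sketch for flagging them. The function name `find_pua` is illustrative, and the check relies only on the Unicode general category `Co` (private use) from the standard library, not on any official HKSCS mapping table, so it flags every private-use character rather than HKSCS ones specifically.

```python
import unicodedata


def find_pua(text):
    """Return (index, code point label) pairs for private-use characters.

    Assumption: any character with general category "Co" is treated as a
    candidate legacy mapping. This does not consult HKSCS tables.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


# U+E000 is the first BMP private-use code point; U+20000 is a standard
# Extension B ideograph and is therefore not flagged.
sample = "A\ue000B\U00020000"
print(find_pua(sample))
```

Converting a flagged code point to its standard assignment would additionally require the mapping tables published with the standard, which this sketch does not include.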

Version history

The HKSCS has gone through a few iterations.
Version       Total characters   Publish date
GCCS          3,049              1995
HKSCS-1999    4,702              09/1999
HKSCS-2001    4,818              12/2001
HKSCS-2004    4,941              05/2005
HKSCS-2008    5,009              12/2009
HKSCS-2016    5,033              05/2017

The last edition of HKSCS to encode all of its characters in Big5 was HKSCS-2008, while the characters added in HKSCS-2016 are mapped to Unicode only.
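The split between Big5-encodable and Unicode-only characters can be probed from a program. The sketch below uses Python's standard `big5` and `big5hkscs` codecs; the function name `classify` and its three labels are illustrative. Which HKSCS edition the `big5hkscs` table tracks depends on the interpreter, so this is a rough probe under that assumption, not a conformance test.

```python
def classify(text):
    """Return the narrowest of three labels that can represent `text`.

    "big5"         - encodable with the plain Big5 codec
    "hkscs"        - needs the Big5-HKSCS extensions
    "unicode-only" - not encodable by this interpreter's Big5-HKSCS table
    """
    for label, codec in (("big5", "big5"), ("hkscs", "big5hkscs")):
        try:
            text.encode(codec)
            return label
        except UnicodeEncodeError:
            continue
    return "unicode-only"


for ch in ("\u4e2d", "\u00ca", "\U0001F600"):
    print(f"U+{ord(ch):04X}", classify(ch))
```

A "unicode-only" result only means the local codec table has no Big5 form for the text; it does not by itself identify a character as an HKSCS-2016 addition.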

Macao Supplementary Character Set

As in Hong Kong, some characters needed in Macao are included in neither Big5 nor HKSCS. The Macao Supplementary Character Set (MSCS) was therefore developed, building on HKSCS with additional Unicode-mapped characters. The first batch of 121 MSCS characters was submitted for addition to or horizontal extension in Unicode in 2009, and the first final version of MSCS was established in 2020.

Compatibility

Operating systems

Microsoft Windows

In Microsoft Windows 98, NT 4.0, 2000 and XP, HKSCS support can be enabled using Microsoft's patch. In Microsoft's implementation, applications using code page 950 automatically use a hidden code page 951 table for the Big5 encoding of the HKSCS extensions. The table supports all code points in HKSCS-2001 except the compatibility code points specified by the standard. The patch also alters the MingLiU font. It is known to create conflicts in applications such as Microsoft Office, and in any application using fonts that support simplified Chinese characters. If the target environment contains custom fonts mapped to the code points affected by the patch, those fonts can undo its effect. Furthermore, the patch breaks the EUDC Editor supplied with the affected versions of Windows.
Starting with Windows Vista, HKSCS-2004 characters are supported only as Unicode 4.1 or later, and all characters are assigned standard, non-PUA code points. The characters are displayed with the MingLiU font and can be entered via the keyboard. The patch that provides the Big5 encoding of HKSCS is unsupported in Windows Vista and later. Microsoft provides a utility to convert HKSCS and Unicode PUA-encoded characters to their Unicode 4.1 code points.
In 2010, Microsoft published an HKSCS-2004 patch for Windows XP and Windows Server 2003. It replaces the Windows XP versions of MingLiU, PMingLiU and MingLiU_HKSCS with the Windows 7 versions, and adds the MingLiU-ExtB, MingLiU_HKSCS-ExtB and PMingLiU-ExtB fonts to the target system. However, the IME is not updated as it was by the HKSCS-2001 patch, and the fonts come from a pre-release of Windows 7.
For earlier versions of the OS, HKSCS support requires either Microsoft's patch or the Hong Kong government's Digital 21 utilities.

IBM

IBM numbers the Big5 form of HKSCS-2001 as code page 5471.

Linux

HKSCS support was added to glibc in 2000, but it has not been updated since then. HKSCS-2004 support is handled as Unicode 4.1 and later.
On freedesktop.org-based setups, the AR PL ShanHeiSun Uni font has fully supported HKSCS-2004 since version 0.1-0.dot.1, with the latest revision of HKSCS-2004 supported in version 0.1.20060903-1.
Modern desktop distributions include Arphic Technology's HKSCS-compliant UKai and UMing fonts out of the box when Traditional Chinese language support is selected during installation. The fonts can also be installed manually later.

Mac OS

Mac OS X 10.0–10.2 supports HKSCS-1999, and 10.3–10.4 supports HKSCS-2001. Some of the characters added in HKSCS-2004 are supported via the Unicode PUA in OS X 10.4. Starting with OS X 10.5, all HKSCS-2004 characters are supported via standard Unicode 4.1 code points.

Applications and the Web

Mozilla 1.5 and above supports HKSCS, with HKSCS-2004 support added in the Gecko 1.8.1 code base. Unlike the Microsoft patch mentioned above, Mozilla uses its own code page table. However, the fix for bug 343129 does not support characters mapped to code points above the Basic Multilingual Plane (BMP).
Qt 3.x-based applications only support characters mapped to code points U+FFFF or lower. In Qt 4, characters outside the BMP are supported via surrogates. A Big5-HKSCS text codec supporting HKSCS-1999 existed as early as Qt-2.3.x, but it arrived too late in the development schedule to be officially included in that series, so official support began with Qt-3.0.1. HKSCS-2001 support was added in Qt-3.0.5.
GNOME supports HKSCS characters in Unicode ranges, except those mapped to the Basic Multilingual Plane compatibility block. Patches to support characters mapped above the Basic Multilingual Plane were introduced during Pango 1.1.
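The surrogate mechanism that lets a 16-bit string type carry characters above U+FFFF can be shown with a short calculation. The Python sketch below follows the standard UTF-16 rule; the function name `to_surrogates` is illustrative and is not part of Qt.

```python
def to_surrogates(code_point):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is in the BMP or out of range")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# U+20000 is the first code point of the Extension B block.
high, low = to_surrogates(0x20000)
print(f"{high:04X} {low:04X}")  # prints "D840 DC00"
```

Software limited to U+FFFF sees the two halves as separate, unassigned 16-bit units, which is why such characters fail to display there.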
The WHATWG Encoding Standard includes HKSCS in its definition of Big5. However, only its decoder uses all HKSCS extensions, while its encoder explicitly excludes those with lead bytes below 0xA1. Newer browsers follow this standard, including Firefox.
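The decoder and encoder asymmetry can be expressed through the "pointer" the standard derives from a two-byte sequence. The Python sketch below reflects one reading of the Encoding Standard's Big5 rules: a pointer is computed from the lead and trail bytes, and the encoder skips every index entry whose pointer is below (0xA1 - 0x81) × 157. The function names are illustrative and do not come from the standard or from any browser.

```python
# Lowest pointer the encoder is allowed to use: lead byte 0xA1, trail 0x40.
ENCODER_MIN_POINTER = (0xA1 - 0x81) * 157


def big5_pointer(lead, trail):
    """Compute the index pointer for a Big5 lead/trail byte pair."""
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range")
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        raise ValueError("trail byte out of range")
    offset = 0x40 if trail < 0x7F else 0x62
    return (lead - 0x81) * 157 + (trail - offset)


def encoder_may_emit(lead, trail):
    """True if an encoder following this rule could output the byte pair."""
    return big5_pointer(lead, trail) >= ENCODER_MIN_POINTER


print(big5_pointer(0x88, 0x62), encoder_may_emit(0x88, 0x62))
print(big5_pointer(0xA4, 0xA4), encoder_may_emit(0xA4, 0xA4))
```

Under this rule a page can decode a byte pair such as 0x88 0x62, but a form submission from the same page will not produce it, because its lead byte is below 0xA1.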