Ideographic characters are characters whose appearance is associated with their meaning. The International Organization for Standardization (ISO) has developed an international coding standard called ISO/IEC 10646. In ISO/IEC 10646, Chinese characters, together with the characters of other languages such as Japanese (Kanji) and Korean (Hanja), are referred to as Han ideographic characters.
Five major blocks for the Han characters are defined in ISO/IEC 10646, namely the CJK Unified Ideographs block, the CJK Unified Ideographs Extension A block, the CJK Unified Ideographs Extension B block, the CJK Unified Ideographs Extension C block and the CJK Unified Ideographs Extension D block. The characters of Extension A, together with the CJK Unified Ideographs, were released in 2000 as part of ISO/IEC 10646-1:2000. Extension B followed in November 2001 as part of ISO/IEC 10646-2:2001. Extension C was released in December 2008 in ISO/IEC 10646:2003/Amd 5:2008. Extension D was released in March 2011 as part of ISO/IEC 10646:2011.
Extension C and Extension D contain 4,149 and 222 additional ideographic characters respectively. Architecturally, each character in Extension C and Extension D is represented by a 32-bit code point, in the same way as the characters in Extension B.
Unicode was originally designed to use a 16-bit code point, which can represent only about 65,000 characters. After years of development, it became clear that a 16-bit code point is insufficient to represent all the common scripts used worldwide. With the adoption of the 32-bit code point, the limit is extended to about 1 million characters, which is enough to represent all the common scripts.
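The two limits mentioned above can be checked with a little arithmetic. The following is only an illustrative sketch: it assumes the code space layout of the Unicode Standard (17 planes of 65,536 code points each, running from U+0000 to U+10FFFF), which this page does not itself spell out, and the variable names are chosen for illustration. The totals are counts of code points, not all of which are assigned to characters.

```python
# Assumption: the code space runs from U+0000 to U+10FFFF, organised as 17 planes.
CODE_POINTS_PER_PLANE = 2 ** 16   # what a single 16-bit value can address
PLANES = 17                       # plane 0 plus 16 supplementary planes

capacity_16_bit = CODE_POINTS_PER_PLANE
capacity_total = PLANES * CODE_POINTS_PER_PLANE

print(capacity_16_bit)                  # 65536   (the "about 65,000" limit)
print(capacity_total)                   # 1114112 (the "about 1 million" limit)
print(capacity_total == 0x10FFFF + 1)   # True
```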
The adoption of the 32-bit code point makes it possible to use all the ideographic characters encoded in ISO/IEC 10646. ISO/IEC 10646 contains more than 70,000 ideographic characters, including the characters of the Kangxi Dictionary, Hanyu Dazidian and Hanyu Dacidian. It also makes more commonly used ideographic characters available, so that daily electronic communication in Chinese can be conducted more accurately and efficiently.
In the UTF-16 encoding form, a 32-bit code point is represented as a pair of 16-bit values called surrogates. Surrogates are taken from two special ranges of Unicode values, the lead surrogates and the trail surrogates. Lead surrogates range from D800 to DBFF and trail surrogates from DC00 to DFFF (hexadecimal). Through an algorithm specified in the Unicode Standard (https://unicode.org/faq/utf_bom.html#utf16-2), the code point of the resulting 32-bit encoded character can be computed from the surrogate pair.
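The conversion between a code point and its surrogate pair can be sketched as follows. This is a minimal illustration of the UTF-16 arithmetic, not a reference implementation; the function and constant names are hypothetical, and the example character U+20000 (the first code point of Extension B) is chosen only for demonstration.

```python
from typing import Tuple

LEAD_START, LEAD_END = 0xD800, 0xDBFF     # lead surrogate range
TRAIL_START, TRAIL_END = 0xDC00, 0xDFFF   # trail surrogate range
SUPPLEMENTARY_BASE = 0x10000              # first code point beyond 16 bits


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into (lead, trail) surrogates."""
    if not SUPPLEMENTARY_BASE <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - SUPPLEMENTARY_BASE   # a 20-bit value
    lead = LEAD_START + (offset >> 10)         # upper 10 bits
    trail = TRAIL_START + (offset & 0x3FF)     # lower 10 bits
    return lead, trail


def from_surrogate_pair(lead: int, trail: int) -> int:
    """Recombine a (lead, trail) surrogate pair into one code point."""
    if not LEAD_START <= lead <= LEAD_END:
        raise ValueError("invalid lead surrogate")
    if not TRAIL_START <= trail <= TRAIL_END:
        raise ValueError("invalid trail surrogate")
    return SUPPLEMENTARY_BASE + ((lead - LEAD_START) << 10) + (trail - TRAIL_START)


lead, trail = to_surrogate_pair(0x20000)
print(f"{lead:04X} {trail:04X}")                 # D840 DC00
print(f"{from_surrogate_pair(lead, trail):X}")   # 20000
```

Each surrogate carries 10 bits of the offset from 10000 (hexadecimal), so the 1,024 lead values combined with the 1,024 trail values address 1,048,576 additional code points.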
Architecturally, an ideographic character in the CJK Unified Ideographs block or the CJK Unified Ideographs Extension A block can be represented by a 16-bit code point. However, an ideographic character in the CJK Unified Ideographs Extension B block or a later extension block of ISO/IEC 10646:2012 requires a 32-bit code point for accurate representation.
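The distinction above can be illustrated with a short sketch that looks up the block of a character and counts the 16-bit units it occupies in UTF-16. The block ranges are taken from the Unicode block listing rather than from this page, and the table and function names are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: block ranges as listed in the Unicode block data.
CJK_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F),
    ("CJK Unified Ideographs Extension D", 0x2B740, 0x2B81F),
]


def describe(char: str) -> Optional[Tuple[str, int]]:
    """Return (block name, number of 16-bit units in UTF-16), or None."""
    code_point = ord(char)
    for name, start, end in CJK_BLOCKS:
        if start <= code_point <= end:
            # UTF-16 uses 2 bytes per 16-bit unit.
            units = len(char.encode("utf-16-be")) // 2
            return name, units
    return None


print(describe("\u4e2d"))       # ('CJK Unified Ideographs', 1)
print(describe("\U00020000"))   # ('CJK Unified Ideographs Extension B', 2)
print(describe("A"))            # None
```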
More information on the reference font and input software, and on viewing the ideographic characters in the 32-bit code point of ISO/IEC 10646:2003, is available at the 32-bit code point webpage.
The stories below give examples of adopting the 32-bit code point (e.g. ISO/IEC 10646 Extension B) and illustrate the flexibility of its adoption for daily electronic communication conducted in Chinese.