ISO/IEC 10646 Questions and Answers
Q1. What are the benefits of unifying coding standards?
Q2. How does a unified coding standard benefit the development of a common Chinese language interface?
Q3. What is the International Organization for Standardization (ISO)?
Q4. What is the ISO/IEC 10646?
Q5. When was the ISO/IEC 10646 released?
Q6. What is the current development status of the ISO/IEC 10646?
Q7. What is an ideographic character?
Q8. What is the Ideographic Research Group (IRG)?
Q9. Which countries/regions are members of the Ideographic Research Group?
Q10. What is Unicode?
Q11. What is the relationship between Unicode and the ISO/IEC 10646?
Q12. What is ISO/IEC 10646 Extension B and what benefit does it bring?
Q13. Is the new version of ISO/IEC 10646 and HKSCS (collectively referred to as the "Standard") backward compatible with its corresponding old version?
Q14. How can I browse the ISO/IEC 10646 version of this website?
Q15. My computer platform supports ISO/IEC 10646. Why are some Chinese characters in the documents or certificates issued by some institutions not exactly the same as those displayed on my computer platform?
Q16. Some Chinese characters may have multiple glyphs, such as “悦” and “悅”. To support the glyph “悦”, can we simply change the glyph of “悅” to “悦” in the font file technically?
Q17. What is Web Open Font Format (WOFF)? What are the benefits of using WOFF?
Q1.

What are the benefits of unifying coding standards?

A1.

With a unified coding standard, computers are capable of accurately processing and displaying electronic information in different languages. Users no longer need conversion tools to handle electronic information encoded in different coding standards. Distortion of information can be reduced during electronic communication, thus facilitating the exchange of electronic information across geographical areas.

Q2.

How does a unified coding standard benefit the development of a common Chinese language interface?

A2.

With a unified coding standard, computers in different parts of the world can display electronic information encoded in the same coding standard. Computers in Mainland China, Hong Kong and Taiwan can accurately display electronic information in traditional Chinese, simplified Chinese and Chinese characters specific to Hong Kong. Users no longer need to use different coding standards for the different sets of Chinese characters, which avoids problems in electronic communication conducted in Chinese.

Q3.

What is the International Organization for Standardization (ISO)?

A3.

The ISO is a non-governmental organization established in 1947 (https://www.iso.org/). It comprises members from more than 160 countries. Its mission is to develop international standards that facilitate exchange among different parts of the world in areas such as trade, information and technology.

Q4.

What is the ISO/IEC 10646?

A4.

ISO/IEC 10646 is an international coding standard developed under the aegis of the International Organization for Standardization (ISO). It encodes the characters of the major languages of the world into a common character set.

Q5.

When was the ISO/IEC 10646 released?

A5.

The ISO released the first version of the ISO/IEC 10646 in 1993. It was called ISO/IEC 10646-1:1993.
In 2000, the ISO released ISO/IEC 10646-1:2000, which is an updated version of ISO/IEC 10646-1:1993. ISO/IEC 10646-1:2000 contains 27,484 ideographic characters consisting of the 20,902 ideographic characters of ISO/IEC 10646-1:1993 plus 6,582 newly defined ideographic characters in the CJK Unified Ideographs Extension A block.
In November 2001, the ISO released ISO/IEC 10646-2:2001 as a supplement to ISO/IEC 10646-1:2000. ISO/IEC 10646-2:2001 contains 42,711 newly defined ideographic characters in the CJK Unified Ideographs Extension B block, bringing the total number of ideographic characters in the ISO/IEC 10646 to more than 70,000. All the characters in the Kangxi Dictionary, Hanyu Dazidian and Hanyu Dacidian are now included in the ISO/IEC 10646.
In April 2004, the ISO published ISO/IEC 10646:2003, a single publication that merges ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001. The ideographic characters in ISO/IEC 10646:2003 are therefore the same as those in ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001 combined.
In December 2008, ISO published the CJK Unified Ideographs Extension C block in ISO/IEC 10646:2003/Amd 5:2008. The CJK Unified Ideographs Extension C block contains 4,149 additional ideographic characters.
In October 2009, ISO published the ISO/IEC 10646:2003/Amd 6:2009.
In March 2011, the ISO published ISO/IEC 10646:2011, a single publication that merges ISO/IEC 10646:2003 and its Amendments 1 through 7. ISO/IEC 10646:2011 contains 222 newly defined ideographic characters in the CJK Unified Ideographs Extension D block, in addition to the CJK Unified Ideographs block and the CJK Unified Ideographs Extension A, B and C blocks, which contain 20,940, 6,582, 42,711 and 4,149 characters respectively.
In September 2014, ISO published the ISO/IEC 10646:2014, which includes 5,762 ideographic characters in the CJK Unified Ideographs Extension E block.
In December 2017, ISO published the ISO/IEC 10646:2017, which includes 7,473 ideographic characters in the CJK Unified Ideographs Extension F block.
In December 2020, ISO published the ISO/IEC 10646:2020, which includes 4,939 ideographic characters in the CJK Unified Ideographs Extension G block.

Q6.

What is the current development status of the ISO/IEC 10646?

A6.

Ideographic characters are characters whose appearance is related to their meaning, such as the Han characters. Ideographic characters are included in the ISO/IEC 10646 in phases, one block at a time: the CJK Unified Ideographs Extension A block, the Extension B block, the Extension C block, the Extension D block, and so on.
The CJK Unified Ideographs Extension A block, which includes 6,582 ideographic characters, was released as part of ISO/IEC 10646-1:2000.
The CJK Unified Ideographs Extension B block, which includes 42,711 ideographic characters, was released as part of ISO/IEC 10646-2:2001.
The CJK Unified Ideographs Extension C block, which includes 4,149 ideographic characters, was released as part of ISO/IEC 10646:2003/Amd 5:2008.
The CJK Unified Ideographs Extension D block, which includes 222 ideographic characters, was released as part of ISO/IEC 10646:2011.
The CJK Unified Ideographs Extension E block, which includes 5,762 ideographic characters, was released as part of ISO/IEC 10646:2014.
The CJK Unified Ideographs Extension F block, which includes 7,473 ideographic characters, was released as part of ISO/IEC 10646:2017.
The CJK Unified Ideographs Extension G block, which includes 4,939 ideographic characters, was released as part of ISO/IEC 10646:2020.
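
As an illustration of how the blocks listed above partition the code space, the following minimal Python sketch looks up the block that a character belongs to. The code point ranges are not given in this FAQ; they are assumptions taken from the published Unicode block list, and the names CJK_BLOCKS and cjk_block are hypothetical.

```python
from typing import Optional

# Block ranges are assumptions taken from the Unicode block list,
# not from this FAQ. Each entry is (block name, first, last).
CJK_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F),
    ("CJK Unified Ideographs Extension D", 0x2B740, 0x2B81F),
    ("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAF),
    ("CJK Unified Ideographs Extension F", 0x2CEB0, 0x2EBEF),
    ("CJK Unified Ideographs Extension G", 0x30000, 0x3134F),
]


def cjk_block(ch: str) -> Optional[str]:
    """Return the name of the CJK block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return None


print(cjk_block("悅"))           # a character in the basic block
print(cjk_block(chr(0x20000)))   # first code point of Extension B
```

A block range only reserves space; not every code point inside a range is necessarily assigned to a character.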

Q7.

What is an ideographic character?

A7.

The International Organization for Standardization categorizes characters from different regions of the world by their characteristics. Ideographic characters are characters whose appearance is related to their meaning. Han characters are an example; they are mainly used in East and Southeast Asian countries or territories such as Mainland China, Hong Kong, Taiwan, Macao, Japan, South Korea, North Korea, Vietnam and Singapore.

Q8.

What is the Ideographic Research Group (IRG)?

A8.

The IRG is a working group under the International Organization for Standardization. Its mission is to develop the ideographic character repertoire of the ISO/IEC 10646. The IRG has developed the CJK Unified Ideographs block and the Extension A, B, C, D, E, F and G blocks.

Q9.

Which countries/regions are members of the Ideographic Research Group?

A9.

IRG members include Mainland China, Hong Kong, Macao, Taipei Computer Association, Singapore, Japan, South Korea, North Korea, Vietnam and the USA. Representatives from the Unicode Consortium also attend IRG meetings to coordinate the synchronization between the ISO/IEC 10646 and Unicode.

Q10.

What is Unicode?

A10.

Unicode is a character coding system designed by the Unicode Consortium to support the interchange, processing and display of the written texts of many languages in the world. The Unicode Consortium comprises mainly hardware and software vendors.

Q11.

What is the relationship between Unicode and the ISO/IEC 10646?

A11.

In 1991, the ISO and the Unicode Consortium decided to cooperate in defining a universal coding standard for multilingual texts. Since then, the two organizations have been working very closely to extend the ISO/IEC 10646 and Unicode, and to keep them synchronized. The ISO publishes the characters and code points of the ISO/IEC 10646, while the Unicode Consortium supplements them with implementation algorithms and semantic information. The ISO/IEC 10646 and the corresponding version of Unicode are code-to-code identical. Unicode can be regarded as the implementation version of the ISO/IEC 10646. Therefore, products supporting Unicode also support the ISO/IEC 10646.

Q12.

What is ISO/IEC 10646 Extension B and what benefit does it bring?

A12.

The characters of the CJK Unified Ideographs Extension B block lie beyond the range of a single 16-bit code point, so each of them is represented by a 32-bit code point. A 32-bit code point is a pair of 16-bit code points, called surrogates. Surrogates are code points from two special ranges of Unicode values, called lead and trail surrogates.
Lead surrogates are from D800 to DBFF and trail surrogates are from DC00 to DFFF. The code point of the resulting 32-bit encoded character can be computed from the surrogate pair through an algorithm specified in the Unicode Standard (https://unicode.org/faq/utf_bom.html#utf16-2).
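
The computation referred to above can be sketched in Python as follows. This is a minimal illustration of the standard UTF-16 surrogate arithmetic, not code taken from this FAQ; the function names combine_surrogates and split_code_point are hypothetical.

```python
def combine_surrogates(lead: int, trail: int) -> int:
    """Compute the code point encoded by a lead/trail surrogate pair."""
    if not 0xD800 <= lead <= 0xDBFF:
        raise ValueError("lead surrogate must be in D800..DBFF")
    if not 0xDC00 <= trail <= 0xDFFF:
        raise ValueError("trail surrogate must be in DC00..DFFF")
    # Each surrogate contributes 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)


def split_code_point(cp: int) -> tuple:
    """Split a code point above FFFF into its (lead, trail) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in 10000..10FFFF")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# U+20000 is the first code point of the Extension B block
# (an assumption from the Unicode block list, not stated in this FAQ).
lead, trail = split_code_point(0x20000)
print(f"{lead:04X} {trail:04X}")                     # D840 DC00
print(f"U+{combine_surrogates(lead, trail):04X}")    # U+20000
```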

Unicode was originally designed to use a 16-bit code point, which can represent only about 65,000 characters. After years of development, it became clear that a 16-bit code point is insufficient to represent all the common scripts used worldwide. With the adoption of the 32-bit code point, the limit is extended to about 1 million characters, which is enough to represent all the common scripts.
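
The figures of about 65,000 and about 1 million can be reproduced with a little arithmetic. The sketch below assumes the lead and trail surrogate ranges quoted earlier in this answer (D800 to DBFF and DC00 to DFFF); the constant names are hypothetical.

```python
# Number of values a single 16-bit code point can take.
SIXTEEN_BIT_VALUES = 2 ** 16

# Sizes of the two surrogate ranges quoted in this answer.
LEAD_COUNT = 0xDBFF - 0xD800 + 1
TRAIL_COUNT = 0xDFFF - 0xDC00 + 1

# Every lead surrogate can be paired with every trail surrogate.
SURROGATE_PAIRS = LEAD_COUNT * TRAIL_COUNT

print(SIXTEEN_BIT_VALUES)   # 65536
print(SURROGATE_PAIRS)      # 1048576
```

The surrogate values themselves are reserved for pairing, so slightly fewer than 65,536 characters remain directly representable in 16 bits.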

The adoption of the 32-bit code point makes it possible to use all ideographic characters encoded in the ISO/IEC 10646. The latest version of ISO/IEC 10646 contains more than 70,000 ideographic characters, including the characters of the Kangxi Dictionary, Hanyu Dazidian and Hanyu Dacidian. More commonly used ideographic characters thus become available, so that daily electronic communication in Chinese can be conducted more accurately and efficiently.

Q13.

Is the new version of ISO/IEC 10646 and HKSCS (collectively referred to as the "Standard") backward compatible with its corresponding old version?

A13.

The new version of the "Standard" is backward compatible with its corresponding old version. However, in respect of software implementation, characters newly included in the new version of the "Standard" may not be properly viewed or displayed on software platforms that support only a previous version of the "Standard". In addition, existing software applications that support only a previous version of the "Standard" may not be able to properly handle newly included characters, including those HKSCS characters with code points assigned by the ISO in the new version of the "Standard".

When users encounter problems in handling Chinese characters in the course of using GovHK Online Services, they may make reference to the FAQ section.

Q14.

How can I browse the ISO/IEC 10646 version of this website?

A14.

The ISO/IEC 10646 version of this Chinese website is encoded with UTF-8, which is supported by the most commonly used web browsers such as Google Chrome and Mozilla Firefox. To browse the ISO/IEC 10646 version of this website, please refer to the following steps:

  1. Set up your Internet browser so that it can display Chinese characters on UTF-8 web pages, or install a Chinese character display support programme for your Internet browser.
  2. When you browse UTF-8 web pages, change the encoding setting of your Internet browser to "UTF-8".
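
To show what a "UTF-8 web page" means at the byte level, the following Python sketch builds a minimal page that declares its encoding and checks that a Chinese character survives the round trip. It is an illustrative assumption only: the helper build_utf8_page is hypothetical, it does not escape HTML, and it is not how this website is generated.

```python
def build_utf8_page(body_text: str) -> bytes:
    """Return a minimal HTML page as UTF-8 bytes that declares its encoding."""
    html = (
        "<!DOCTYPE html>\n"
        '<html lang="zh-Hant">\n'
        '<head><meta charset="utf-8"><title>UTF-8 demo</title></head>\n'
        f"<body><p>{body_text}</p></body>\n"
        "</html>\n"
    )
    return html.encode("utf-8")


page = build_utf8_page("悅")

# In UTF-8 the character U+6085 occupies three bytes.
print("悅".encode("utf-8").hex())   # e68285

# Decoding with the declared encoding recovers the original character.
print("悅" in page.decode("utf-8"))  # True
```

A browser that decodes the same bytes with a different encoding setting would show other characters, which is why step 2 asks for "UTF-8".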
Q15.

My computer platform supports ISO/IEC 10646. Why are some Chinese characters in the documents or certificates issued by some institutions not exactly the same as those displayed on my computer platform?

A15.

ISO/IEC 10646 provides a unified character coding standard for the communication and exchange of electronic information. How the glyphs, i.e. shapes of characters, represented by the character codes are displayed or printed depends on the fonts selected by the application software.

The Ideographic Research Group examines glyphs from different character sources for unification according to the Procedure for the unification and arrangement of CJK Ideographs (Annex S of the ISO/IEC 10646 document). Unifiable glyphs are assigned the same code point. This means a code point may represent one or more glyphs. The table below exemplifies the different glyph shapes of a single code point affected by the use of different fonts.

[Table image: a code point may represent one or more glyphs]

Notes

  1. PMingLiU (新細明體), Meiryo (a Japanese font) and Batang (a Korean font) are fonts supplied with Windows.
  2. Major differences of the unified glyphs are highlighted in blue.

Since the documents or certificates in question may be printed by computer systems or equipment with a default font different from that of your computer, it is possible that some of the glyphs therein are not exactly the same as those displayed on your computer platform.

There is a video available at https://www.youtube.com/watch?v=WEvJqfUZwcE which demonstrates how to find the ISO/IEC 10646 code point of a Chinese character.
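
For readers who prefer a programmatic route, a code point can also be looked up with a few lines of Python. This is a minimal sketch and is not the method shown in the video; the function name code_point_label is hypothetical.

```python
def code_point_label(ch: str) -> str:
    """Format the code point of a single character in U+XXXX notation."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # ord() returns the code point; pad to at least four hex digits.
    return f"U+{ord(ch):04X}"


print(code_point_label("悅"))  # U+6085
print(code_point_label("悦"))  # U+60A6
```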

For further information on character sources and unification, please refer to the ISO/IEC 10646 document available for download at https://standards.iso.org/ittf/PubliclyAvailableStandards/. The document may help you determine whether some similar glyphs are unifiable.

Q16.

Some Chinese characters may have multiple glyphs, such as “悦” and “悅”. To support the glyph “悦”, can we simply change the glyph of “悅” to “悦” in the font file technically?

A16.

Some Chinese characters may have multiple glyphs, but the form of certain glyphs cannot be changed arbitrarily. This is because the different glyphs of a character may have been assigned separate code points in the ISO/IEC 10646. Changing the form of a glyph may result in identical glyphs at two different code points. For example, “悦” and “悅” are assigned the separate code points U+60A6 and U+6085 respectively:

[Figure: changing the form of a glyph may result in identical glyphs at two different code points]

To support the glyph “悦”, a font developer should work on the glyph with code point U+60A6, instead of changing the glyph of U+6085 from “悅” to “悦”. Otherwise, there will be identical glyphs at the two code points U+60A6 and U+6085, which will be confusing and undesirable for electronic data interchange.
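
The consequence for data interchange can be seen in a short Python sketch: software compares code points, not glyph shapes, so the two characters remain different even if a font draws them identically. The constant and function names are hypothetical; the sketch uses only Python's standard unicodedata module.

```python
import unicodedata

FORM_60A6 = "\u60a6"  # 悦
FORM_6085 = "\u6085"  # 悅


def describe(ch: str) -> str:
    """Return the code point and the character name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def same_after_normalization(a: str, b: str) -> bool:
    """Check whether two strings become equal under NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(describe(FORM_60A6))  # U+60A6 CJK UNIFIED IDEOGRAPH-60A6
print(describe(FORM_6085))  # U+6085 CJK UNIFIED IDEOGRAPH-6085

# The two characters stay distinct, so a search for one misses the other.
print(same_after_normalization(FORM_60A6, FORM_6085))  # False
print(FORM_60A6 in "喜" + FORM_6085)                    # False
```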

More examples can be found in Annex S of the ISO/IEC 10646 document, which is available for downloading at https://standards.iso.org/ittf/PubliclyAvailableStandards/.

Q17.

What is Web Open Font Format (WOFF)? What are the benefits of using WOFF?

A17.

WOFF is an open format standardized by the World Wide Web Consortium (W3C) for using fonts on the Web. When a website uses WOFF, web browsers automatically download and temporarily install the fonts when accessing the server for web pages. Users are not required to separately download and install fonts on their operating system to display the content.
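
As a small illustration, a WOFF file can be recognized by the four-byte signature at the start of its header. The sketch below assumes the signatures defined in the W3C WOFF specifications ("wOFF" for WOFF 1.0 and "wOF2" for WOFF 2.0), which are not stated in this FAQ; the function name detect_woff_version is hypothetical.

```python
from typing import Optional


def detect_woff_version(data: bytes) -> Optional[int]:
    """Return 1 or 2 if data starts with a WOFF signature, else None."""
    signature = data[:4]
    if signature == b"wOFF":
        return 1
    if signature == b"wOF2":
        return 2
    return None


# Only the first four bytes are inspected; the rest of the header is not validated.
print(detect_woff_version(b"wOFF" + bytes(40)))   # 1
print(detect_woff_version(b"wOF2" + bytes(44)))   # 2
print(detect_woff_version(b"\x00\x01\x00\x00"))   # None
```

In practice, the bytes would come from the first four bytes of a font file opened in binary mode.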
