What is the complete range for Chinese characters in Unicode?

Time: 2022-07-04 03:05:17

U+4E00..U+9FFF is part of the complete set, but not all of it.

4 answers

#1


79  

Maybe you can find a complete list through the CJK Unicode FAQ (which covers "Chinese, Japanese, and Korean" characters).

The "East Asian Scripts" document mentions:

Blocks Containing Han Ideographs

Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2

Table 12-2. Blocks Containing Han Ideographs

Block                                   Range       Comment
CJK Unified Ideographs                  4E00-9FFF   Common
CJK Unified Ideographs Extension A      3400-4DBF   Rare
CJK Unified Ideographs Extension B      20000-2A6DF Rare, historic
CJK Unified Ideographs Extension C      2A700-2B73F Rare, historic
CJK Unified Ideographs Extension D      2B740-2B81F Uncommon, some in current use
CJK Unified Ideographs Extension E      2B820-2CEAF Rare, historic
CJK Compatibility Ideographs            F900-FAFF   Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement 2F800-2FA1F Unifiable variants

Note: the block ranges can evolve over time; the latest ranges are listed under CJK Unified Ideographs.
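
A minimal sketch of how the table could be turned into a membership test. The names `HAN_BLOCKS` and `is_han` are made up for illustration and are not part of any library. The ranges are copied from the table, so they reflect that snapshot of the standard, and a block range also includes code points that are not yet assigned.

```python
# Inclusive block ranges copied from Table 12-2 (a snapshot, not the latest standard).
# HAN_BLOCKS and is_han are hypothetical names used only for this illustration.
HAN_BLOCKS = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0x2A700, 0x2B73F),  # CJK Unified Ideographs Extension C
    (0x2B740, 0x2B81F),  # CJK Unified Ideographs Extension D
    (0x2B820, 0x2CEAF),  # CJK Unified Ideographs Extension E
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]


def is_han(ch: str) -> bool:
    """Return True if the single character ch falls inside one of the blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_BLOCKS)


print(is_han("中"))          # True
print(is_han("A"))           # False
print(is_han("\U00020000"))  # True (first code point of Extension B)
```

Because the test works on code points rather than UTF-16 units, it handles the supplementary-plane extensions without any surrogate-pair logic.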

See also Wikipedia:

#2


40  

Unicode currently has 74605 CJK characters. CJK characters include not only the characters used in Chinese, but also Japanese Kanji, Korean Hanja, and Vietnamese Chu Nom. Some CJK characters are not Chinese characters.

1) 20941 characters from the CJK Unified Ideographs block.

Code points U+4E00 to U+9FCC.

  1. U+4E00 - U+62FF
  2. U+6300 - U+77FF
  3. U+7800 - U+8CFF
  4. U+8D00 - U+9FCC

2) 6582 characters from the CJKUI Ext A block.

Code points U+3400 to U+4DB5. Unicode 3.0 (1999).

3) 42711 characters from the CJKUI Ext B block.

Code points U+20000 to U+2A6D6. Unicode 3.1 (2001).

  1. U+20000 - U+215FF
  2. U+21600 - U+230FF
  3. U+23100 - U+245FF
  4. U+24600 - U+260FF
  5. U+26100 - U+275FF
  6. U+27600 - U+290FF
  7. U+29100 - U+2A6DF

4) 4149 characters from the CJKUI Ext C block.

Code points U+2A700 to U+2B734. Unicode 5.2 (2009).

5) 222 characters from the CJKUI Ext D block.

Code points U+2B740 to U+2B81D. Unicode 6.0 (2010).

6) CJKUI Ext E block.

Coming soon.

If the above is not spaghetti enough, take a look at known issues. Have fun =)
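
The per-block counts in this answer can be checked with a little arithmetic, since each count is just the size of an inclusive code point range. A small sketch follows; the dictionary name `ASSIGNED` is made up for illustration, and the end points are the ones quoted in this answer, not the current state of the standard.

```python
# Inclusive (first, last) assigned code points as quoted in this answer.
# ASSIGNED is a hypothetical name used only for this illustration.
ASSIGNED = {
    "CJK Unified Ideographs": (0x4E00, 0x9FCC),
    "Extension A": (0x3400, 0x4DB5),
    "Extension B": (0x20000, 0x2A6D6),
    "Extension C": (0x2A700, 0x2B734),
    "Extension D": (0x2B740, 0x2B81D),
}

# Size of an inclusive range is last - first + 1.
counts = {name: last - first + 1 for name, (first, last) in ASSIGNED.items()}
for name, n in counts.items():
    print(f"{name}: {n}")

total = sum(counts.values())
print("total:", total)  # total: 74605
```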

#3


4  

The range for Chinese characters in the Basic Multilingual Plane (that is, excluding Extension B and later) is [\u2E80-\u2FD5\u3400-\u4DBF\u4E00-\u9FCC].

[\u2E80-\u2FD5]

CJK Radicals Supplement is a Unicode block containing alternative, often positional, forms of the Kangxi radicals. They are used as headers in dictionary indices and other CJK ideograph collections organized by radical-stroke. This range also takes in the Kangxi Radicals block, which starts at U+2F00.

[\u3400-\u4DBF]

CJK Unified Ideographs Extension A is a Unicode block containing rare Han ideographs.

[\u4E00-\u9FCC]

CJK Unified Ideographs is a Unicode block containing the most common CJK ideographs used in modern Chinese and Japanese.

For details, refer to here; the extensions are covered in the other answers.
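
As a rough illustration, the character class from this answer can be used directly with Python's standard `re` module. The names `HAN_BMP`, `text`, and `matches` are made up for the example. Note that the class only covers the Basic Multilingual Plane, so a supplementary-plane ideograph is not matched.

```python
import re

# The character class quoted in this answer; it covers BMP ranges only.
# HAN_BMP is a hypothetical name used only for this illustration.
HAN_BMP = re.compile(r"[\u2E80-\u2FD5\u3400-\u4DBF\u4E00-\u9FCC]")

# "\U00020000" is the first code point of Extension B, outside the class.
text = "Unicode 汉字 test \U00020000"
matches = HAN_BMP.findall(text)
print(matches)  # ['汉', '字']
```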

#4


1  

The Unicode code blocks that the other answers gave certainly cover most of the Chinese Unicode characters, but check out some of these other code blocks, too.

CJK_UNIFIED_IDEOGRAPHS
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
CJK_COMPATIBILITY
CJK_COMPATIBILITY_FORMS
CJK_COMPATIBILITY_IDEOGRAPHS
CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT
CJK_RADICALS_SUPPLEMENT
CJK_STROKES
CJK_SYMBOLS_AND_PUNCTUATION
ENCLOSED_CJK_LETTERS_AND_MONTHS
ENCLOSED_IDEOGRAPHIC_SUPPLEMENT
KANGXI_RADICALS
IDEOGRAPHIC_DESCRIPTION_CHARACTERS
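
To get a feel for what the blocks in this list contain, one option is to look up character names with Python's standard `unicodedata` module. This is only a sketch with a few hand-picked sample characters (the variable names `samples` and `names` are made up); it shows that several of these blocks hold radicals and punctuation rather than ideographs proper.

```python
import unicodedata

# One hand-picked sample from four of the blocks listed in this answer:
# CJK Unified Ideographs, Kangxi Radicals, CJK Symbols and Punctuation,
# and CJK Compatibility Ideographs.
samples = ["\u4E00", "\u2F00", "\u3001", "\uF900"]

# Map "U+XXXX" to the character name, with a fallback for unnamed code points.
names = {f"U+{ord(ch):04X}": unicodedata.name(ch, "<unnamed>") for ch in samples}
for cp, name in names.items():
    print(cp, name)
```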

See my fuller discussion here. This site is also convenient for browsing Unicode.
