Check whether a string contains Japanese/Chinese characters

Time: 2021-02-18 07:22:03

I need a way to check whether a string contains Japanese or Chinese text.

Currently I'm using this:

string.match(/[\u3400-\u9FBF]/);

but it does not match strings such as ディアボリックラヴァーズ or バッテリー.

Could you help me with that?

Thanks

1 solution

#1


The ranges of Unicode characters which are routinely used for Chinese and Japanese text are:

  • U+3040 - U+30FF: hiragana and katakana (Japanese only)
  • U+3400 - U+4DBF: CJK unified ideographs extension A (Chinese, Japanese, and Korean)
  • U+4E00 - U+9FFF: CJK unified ideographs (Chinese, Japanese, and Korean)
  • U+F900 - U+FAFF: CJK compatibility ideographs (Chinese, Japanese, and Korean)
  • U+FF66 - U+FF9F: half-width katakana (Japanese only)
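To see why the pattern from the question misses the katakana examples, it can help to print the code points involved. This is only a small diagnostic sketch; `codePoints` is a name made up for illustration.

```javascript
// Hypothetical helper: list the code point of every character in a string.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Every character of バッテリー sits in the U+3040 - U+30FF kana range,
// which lies below the U+3400 lower bound used in the question.
console.log(codePoints("バッテリー").join(" "));
// U+30D0 U+30C3 U+30C6 U+30EA U+30FC

// The pattern from the question therefore finds nothing to match.
console.log(/[\u3400-\u9FBF]/.test("バッテリー")); // false
```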

As a regular expression, this would be expressed as:

/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/

This does not cover every character that can appear in Chinese and Japanese text (CJK punctuation in U+3000 - U+303F, for instance, is left out), but any substantial piece of typical Chinese or Japanese text will consist mostly of characters from these ranges.
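As a rough usage sketch, the expression can be wrapped in a small helper. The names `CJK_PATTERN` and `containsCJK` are assumptions made for this example; only the character class itself comes from the answer.

```javascript
// Character class from the answer; the constant and function names are hypothetical.
const CJK_PATTERN =
  /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

// Returns true if at least one character falls in one of the listed ranges.
function containsCJK(text) {
  return CJK_PATTERN.test(text);
}

console.log(containsCJK("ディアボリックラヴァーズ")); // true
console.log(containsCJK("バッテリー"));               // true
console.log(containsCJK("中文"));                     // true
console.log(containsCJK("hello world"));              // false
```

The pattern has no `g` flag, so repeated calls to `test` do not carry state between strings.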

Note that this regular expression will also match on Korean text that contains hanja. This is an unavoidable result of Han unification.
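The caveat can be illustrated with a short sketch. It assumes a second, kana-only class built from the two "Japanese only" ranges above; `kanaPattern`, `cjkPattern`, `classify`, and the labels it returns are hypothetical names for this example. Kana is a strong hint that text is Japanese, while text made up only of Han characters stays ambiguous.

```javascript
// Full character class from the answer.
const cjkPattern =
  /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;
// Assumed kana-only subset: the two ranges marked "Japanese only".
const kanaPattern = /[\u3040-\u30ff\uff66-\uff9f]/;

// Hypothetical classifier with made-up labels.
function classify(text) {
  if (kanaPattern.test(text)) return "japanese"; // kana present
  if (cjkPattern.test(text)) return "han-only";  // could be Chinese, Japanese, or Korean hanja
  return "none";
}

console.log(classify("日本語のテキスト")); // japanese
console.log(classify("中文文本"));         // han-only
console.log(classify("大韓民國"));         // han-only (Korean written in hanja)
console.log(classify("한국어"));           // none (hangul is outside all listed ranges)
```

Japanese text written entirely in kanji would also come back as "han-only", so this narrows the ambiguity without removing it.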
