What is a good regex to include accented characters in a simple way?

Time: 2022-04-18 03:40:42

Right now my regex is something like this:

[a-zA-Z0-9]

but it does not include accented characters, which I would like it to match. I would also like - ' , to be included.

3 solutions

#1


7  

Accented Characters: DIY Character Range Subtraction

If your regex engine allows it (and many will), this will work:

(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$

Please see the demo (you can add characters to test).

Explanation

  • (?i) sets case-insensitive mode

  • The ^ anchor asserts that we are at the beginning of the string

  • (?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ]) matches one character...

  • The lookahead (?![×Þß÷þø]) asserts that the next character is not one of those in the brackets

  • [-'0-9a-zÀ-ÿ] allows dash, apostrophe, digits, letters, and chars in a wide accented range (À-ÿ), from which the lookahead subtracts the characters we do not want

  • The + matches that one or more times

  • The $ anchor asserts that we are at the end of the string
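
To make the explanation above concrete, here is a minimal sketch that runs the same pattern through Python's built-in re module. The choice of Python, the helper name is_valid, and the sample strings are assumptions for illustration; the pattern itself is copied from the answer.

```python
import re

# Pattern copied from the answer. The inline (?i) flag applies to the whole
# pattern, so the lookahead's exclusions are case-insensitive as well.
PATTERN = re.compile(r"(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$")


def is_valid(text: str) -> bool:
    """Return True if every character of text is allowed by PATTERN."""
    return PATTERN.match(text) is not None


if __name__ == "__main__":
    # Hypothetical sample inputs, chosen only to exercise each rule.
    for sample in ["José", "O'Neil-Smith", "2×3", "Straße"]:
        print(sample, is_valid(sample))
```

With this pattern, "Straße" is rejected because ß is one of the characters the lookahead subtracts; whether that is desirable depends on the names you expect.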

Reference

Extended ASCII Table

#2


0  

Use a POSIX character class (http://www.regular-expressions.info/posixbrackets.html):

[-'[:alpha:]0-9] or [-'[:alnum:]]

The [:alpha:] character class matches whatever is considered "alphabetic characters" in your locale.
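
Not every engine supports POSIX bracket classes; Python's built-in re module, for example, does not. As a rough sketch of the same idea there, the class [^\W\d_] ("a word character that is neither a digit nor an underscore") can stand in for [:alpha:]. This is a swapped-in approximation, not the POSIX class itself, and the name is_valid_name is hypothetical.

```python
import re

# Approximation of [-'[:alpha:]0-9] for Python's re module:
# [^\W\d_] matches word characters except digits and the underscore,
# which is roughly "any Unicode letter".
NAME_CHARS = re.compile(r"(?:[-'0-9]|[^\W\d_])+")


def is_valid_name(text: str) -> bool:
    """Return True if text consists only of letters, digits, dash, apostrophe."""
    return NAME_CHARS.fullmatch(text) is not None


if __name__ == "__main__":
    # Hypothetical sample inputs.
    for sample in ["Zoë-2", "Łukasz", "a_b", "2×3"]:
        print(sample, is_valid_name(sample))
```

Unlike the fixed À-ÿ range, this also accepts letters outside Latin-1, such as Ł.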


#3


0  

A version without the exclusion rules:

^[-'a-zA-ZÀ-ÖØ-öø-ÿ]+$

Explanation

  • The ^ anchor asserts that we are at the beginning of the string

  • [...] allows dash, apostrophe, letters, and chars in the accented ranges À-Ö, Ø-ö and ø-ÿ, which skip × and ÷. Note that this class does not include digits; add 0-9 if you need them

  • The + matches that one or more times

  • The $ anchor asserts that we are at the end of the string
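
As a quick check of how the split ranges behave, the sketch below runs the pattern from this answer in Python. The language choice, the helper name matches_letters_only, and the sample strings are assumptions for illustration.

```python
import re

# Pattern copied from the answer: the ranges À-Ö, Ø-ö and ø-ÿ step around
# × (U+00D7) and ÷ (U+00F7), so no lookahead is needed.
PATTERN = re.compile(r"^[-'a-zA-ZÀ-ÖØ-öø-ÿ]+$")


def matches_letters_only(text: str) -> bool:
    """Return True if text contains only the characters PATTERN allows."""
    return PATTERN.match(text) is not None


if __name__ == "__main__":
    # Hypothetical sample inputs.
    for sample in ["José", "Straße", "a×b", "abc1"]:
        print(sample, matches_letters_only(sample))
```

This version keeps ß, Þ and ø, which the first answer excludes, and it rejects digits because the class has no 0-9.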

