How do I correctly remove repeated whitespace characters from a UTF-8 string in PHP using a regular expression?

Posted: 2022-12-23 22:18:28

I'm trying to collapse repeated whitespace characters in a UTF-8 string in PHP using a regex. This regex

    $txt = preg_replace( '/\s+/i' , ' ', $txt );

usually works fine, but some of the strings contain the Cyrillic letter "Р", which gets corrupted by the replacement. After a little research I realized that the letter is encoded in UTF-8 as the two bytes \xD0 \xA0, and since \xA0 is the non-breaking space in Latin-1 (ISO-8859-1, not ASCII), the regex replaces that byte with \x20 and the character is no longer valid.
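
The byte-level problem described above can be reproduced without depending on how a particular PCRE build or locale classifies \xA0. The following is a minimal sketch, assuming PHP 7.0 or later; `is_valid_utf8` is a hypothetical helper name, and the `str_replace` call only simulates what a byte-oriented whitespace match would do to the second byte of "Р".

```php
<?php
// "Р" is U+0420 (Cyrillic capital Er); in UTF-8 it is the two bytes D0 A0.
$letter = "\u{0420}";
echo bin2hex($letter), "\n"; // d0a0

// Simulate a byte-oriented match that treats the lone byte 0xA0 as
// whitespace and swaps it for an ASCII space (0x20).
$damaged = str_replace("\xA0", " ", $letter);
echo bin2hex($damaged), "\n"; // d020

// Hypothetical helper: an empty pattern with the u modifier only matches
// when the subject is well-formed UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8($letter));  // bool(true)
var_dump(is_valid_utf8($damaged)); // bool(false)
```

The lead byte D0 is left without its continuation byte, which is why the character no longer decodes.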

Any ideas how to do this properly in PHP with regex?

2 answers

#1


3  

This is described at http://www.php.net/manual/en/function.preg-replace.php#106981

If you want to match characters from any script (European, Russian, Chinese, Japanese, Korean, or whatever), just:

  • use mb_internal_encoding('UTF-8');
  • use preg_replace('...u', '...', $string) with the u (unicode) modifier

For further information, the complete list of preg_* modifiers can be found at: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
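
Applied to the question, the steps above might look like the sketch below. It assumes PHP 7.0 or later, and `collapse_whitespace` is a hypothetical helper name, not something from the linked manual note. Note that `mb_internal_encoding()` only affects the mb_* functions; for preg_* it is the u modifier that makes the pattern and subject be treated as UTF-8, so the sketch leaves the mbstring call out. With the u modifier, `preg_replace` returns null when the subject is not valid UTF-8, so the helper checks for that.

```php
<?php
// Hypothetical helper: collapse every run of whitespace into one space.
function collapse_whitespace(string $text): string
{
    // The u modifier makes PCRE read the pattern and subject as UTF-8,
    // so multi-byte characters are never split into single bytes.
    $result = preg_replace('/\s+/u', ' ', $text);

    if ($result === null) {
        // preg_replace returns null on failure, for example when the
        // subject is not well-formed UTF-8.
        throw new InvalidArgumentException(
            'preg_replace failed, error code ' . preg_last_error()
        );
    }

    return $result;
}

echo collapse_whitespace("Привет \t\n  Р   мир"), "\n"; // Привет Р мир
```

Throwing on null is a design choice for the sketch; returning the input unchanged or logging the error code would work as well.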

#2


5  

Try the u modifier:

    $txt="UTF 字符串 with 空格符號";
    var_dump(preg_replace("/\\s+/iu","",$txt));

Outputs:

    string(28) "UTF字符串with空格符號"
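
Two details of that output are worth spelling out: the example deletes whitespace rather than collapsing it to a single space as the question asked, and the 28 reported by var_dump is a byte count, not a character count. The sketch below illustrates both, assuming PHP 7.0 or later; the variable names and the second input string are illustrative only.

```php
<?php
$txt = "UTF 字符串 with 空格符號";

// An empty replacement removes whitespace entirely, as in the answer.
$removed = preg_replace('/\s+/u', '', $txt);
echo strlen($removed), "\n";                    // 28 (bytes)
echo preg_match_all('/./su', $removed), "\n";   // 14 (code points)

// Replacing with a single space collapses each run instead.
$collapsed = preg_replace('/\s+/u', ' ', "UTF  字符串\t with \n 空格符號");
echo $collapsed, "\n";                          // UTF 字符串 with 空格符號
echo strlen($collapsed), "\n";                  // 31 (bytes)
```

Each CJK character here occupies three bytes in UTF-8, which accounts for the gap between the two counts.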
