Can you help me rewrite my JavaScript regex with PHP preg_replace?

Time: 2022-12-26 05:34:36

I have created a JavaScript regular expression to validate comments entered by users in my app. The regex allows letters, numbers, some special symbols, and a range of emojis.

I received help here to format my JavaScript regular expression correctly, and the final expression I am using is as follows:

JavaScript regex:

commentRegex =    /^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$/;

I was advised to perform the same validation on the server side (with PHP), so I am trying to do something similar using preg_replace().

I would like to replace every character that is not allowed by the regex with the empty string. Here is my attempt; however, it is not working. Thanks for any help.

PHP

$commentText = preg_replace('#^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$#', '', $commentText);

Edit:

After taking the advice in the comments, I now have the following regex.

$postText = preg_replace('/^(?:[A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9} \/.,\-_$!\'&*()="?\#\+%:;\<\[\]\r\n]|(?:\x{d83c}[\x{df00}-\x{dfff}])|(?:\x{d83d}[\x{dc00}-\x{de4f}\x{de80}-\x{deff}]))*$/', '', $postText);

However, I am getting a warning:

Warning: preg_replace(): Compilation failed: character value in \x{} or \o{} is too large at offset 30 in submit_post.php on line 37

3 Solutions

#1



In short: use


// Negated character class: matches any run of characters that are NOT on the whitelist.
$re = '/[^A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9} \/.,\-_$!\'&*()="?#+%:;<[\]\r\n\x{1F300}-\x{1F3FF}\x{1F400}-\x{1F64F}\x{1F680}-\x{1F6FF}]+/u';
$text = 'test>><<<®¥§';
echo preg_replace($re, '', $text); // prints: test<<<

See the PHP demo.


A bit of an explanation:


  • Escape only the special regex metacharacters inside the pattern and the regex delimiter itself (if you choose # as the delimiter, escape the # in the pattern; there is then no need to escape /).
  • PCRE does not support \uXXXX; replace it with the \x{XXXX} notation.
  • Since the text to be processed is Unicode and the pattern contains characters outside the ASCII range, you have to use the /u (UNICODE) modifier.
  • Most emojis lie outside the BMP (Basic Multilingual Plane), and with /u the string is treated as a sequence of Unicode code points, so these symbols must be written as single code points in the extended \x{XXXXX} notation, not as the surrogate pairs (two 16-bit code units) used in JavaScript.
  • Your 3 alternatives can be merged into one big character class, which you then negate by adding ^ at its start. The ^ and $ anchors and the * quantifier are dropped, because the goal is now to match the disallowed characters rather than to validate the whole string.
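
To see where ranges such as \x{1F300}-\x{1F3FF} come from, the surrogate pairs in the original JavaScript pattern can be decoded into code points. The following is a minimal JavaScript sketch, not part of the original answer: the helper names surrogatePairToCodePoint and toPcreEscape are made up for illustration, and the arithmetic is the standard UTF-16 decoding rule.

```javascript
// Hypothetical helper: decode a UTF-16 surrogate pair into one Unicode code point.
function surrogatePairToCodePoint(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Hypothetical helper: format a code point in PCRE notation, e.g. \x{1F300}.
function toPcreEscape(codePoint) {
  return "\\x{" + codePoint.toString(16).toUpperCase() + "}";
}

// The three emoji ranges from the JavaScript pattern,
// written as [highStart, lowStart, highEnd, lowEnd].
const ranges = [
  [0xD83C, 0xDF00, 0xD83C, 0xDFFF], // \ud83c[\udf00-\udfff]
  [0xD83D, 0xDC00, 0xD83D, 0xDE4F], // \ud83d[\udc00-\ude4f]
  [0xD83D, 0xDE80, 0xD83D, 0xDEFF], // \ud83d[\ude80-\udeff]
];

// Convert each surrogate range into a PCRE code point range.
const pcreRanges = ranges.map(([h1, l1, h2, l2]) =>
  toPcreEscape(surrogatePairToCodePoint(h1, l1)) + "-" +
  toPcreEscape(surrogatePairToCodePoint(h2, l2))
);

console.log(pcreRanges.join(""));
// \x{1F300}-\x{1F3FF}\x{1F400}-\x{1F64F}\x{1F680}-\x{1F6FF}
```

The printed ranges are the same three that close the character class in the PHP pattern above.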

#2



A regex in PHP is surrounded by a delimiter character. In your case you are using the hash (#), but that character should not occur in the regex itself, which it does...

You have to escape this character inside the pattern, or use another delimiter. Why did you not use the same "/" as in the JS version? The benefit is that it is already escaped.

I have not checked whether the rest works, but I think it does.

$commentText = preg_replace('/^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$/', '', $commentText);

should work.


#3



Convert the \u.... sequences to \x{....}, and the result appears to be a valid PHP regular expression.

pattern: \\u(\w{4})

replace: \\x{$1}

regex101 demo
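
The substitution above can be sketched in JavaScript as follows. This is only an illustration under a few assumptions: the function name convertUnicodeEscapes is hypothetical, the match is narrowed to four hex digits rather than \w{4}, and the pattern is handled as source text. Surrogate escapes such as \ud83c would be converted literally, so they would still need to be merged into real code points by hand before PCRE accepts them.

```javascript
// Hypothetical helper: rewrite JavaScript \uXXXX escapes as PCRE \x{XXXX} escapes.
// Assumption: the pattern is passed as source text, with escapes not yet interpreted.
function convertUnicodeEscapes(patternSource) {
  return patternSource.replace(/\\u([0-9A-Fa-f]{4})/g, "\\x{$1}");
}

// String.raw keeps the backslashes exactly as typed.
const jsClass = String.raw`[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9]`;

console.log(convertUnicodeEscapes(jsClass));
// [A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9}]
```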

