How can I safely use regular expressions entered by users?

My (Perl-based) application needs to let users input regular expressions, to match various strings behind the scenes. My plan so far has been to take the string and wrap it in something like

$regex = eval { qr/$text/ };
if (my $error = $@) {
    # mangle $error to extract user-facing message
}

($text having been stripped of newlines ahead of time, since it's actually multiple regular expressions in a multi-line text-field that I split).

Are there any potential security risks with doing this, such as some weird input that could lead to arbitrary code execution? (Besides buffer overflow vulnerabilities in the regular expression engine itself, like CVE-2007-5116.) If so, are there ways to mitigate them?

Is there a better way to do this? Any Perl modules which help abstract the operations of turning user input into regular expressions (such as extracting error messages ... or providing modifiers like /i, which I don't strictly need here, but would be nice)? I searched CPAN and didn't find much that was promising, but entertain the possibility that I missed something.

5 Solutions

#1

With the (?{ code }) construct, user input could be used to execute arbitrary code. See the example in perlre#code and where it says

local $cnt = $cnt + 1,

replace it with the expression

system("rm -rf /home/fennec"); print "Ha ha.\n";

(Actually, don't do that.)
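
For what it's worth, Perl refuses to compile a (?{ code }) block that arrives through an interpolated variable unless "use re 'eval'" is in effect, so the attack above needs that pragma to be active where the pattern is compiled. Below is a minimal sketch of a compile step that spells out the default and tidies the error message. The helper name compile_user_regex and the message clean-up are illustrative assumptions, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (compiled_regex, undef) on success
# or (undef, error_message) on failure.
sub compile_user_regex {
    my ($text) = @_;
    no re 'eval';    # make the default explicit: no runtime (?{ ... }) blocks
    my $regex = eval { qr/$text/ };
    if (!defined $regex) {
        my $error = $@ || 'unknown error';
        # Drop the trailing " at FILE line N." that Perl appends.
        $error =~ s/ at \S+ line \d+\.?\s*\z//;
        return (undef, $error);
    }
    return ($regex, undef);
}

my ($ok_re,  $ok_err)  = compile_user_regex('^foo\d+$');
my ($bad_re, $bad_err) = compile_user_regex('(?{ print "pwned\n" })');
my ($syn_re, $syn_err) = compile_user_regex('foo(');
```

Here $ok_re is usable as a pattern, while $bad_re and $syn_re are undefined and the matching error variables hold the text you could show to the user.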

#2

Using untrusted input as a regular expression creates a denial-of-service vulnerability, as described in perlsec:

Regular expressions - Perl's regular expression engine is so called NFA (Non-deterministic Finite Automaton), which among other things means that it can rather easily consume large amounts of both time and space if the regular expression may match in several ways. Careful crafting of the regular expressions can help but quite often there really isn't much one can do (the book "Mastering Regular Expressions" is required reading, see perlfaq2). Running out of space manifests itself by Perl running out of memory.

#3

The best way is not to give users more power than they need. Provide an interface that is just enough for them to do what they want (like an ATM that has only buttons for the various options, with no need for keyboard input). Of course, if users do need to type something in, provide a text box and process the request with Perl on the back end (sanitizing it, for example). Presumably the motive for letting users enter a regex is to search for string patterns? In that case, the simplest and most secure approach is to ask them for just the string, and then use Perl's regex engine on the back end to search for it. Is there any other compelling reason to have users enter regexes themselves?
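
If a plain substring search really is enough, one way to follow this advice is to escape the input so that it is matched literally. Here is a minimal sketch using Perl's built-in quotemeta (the same effect as \Q...\E inside a pattern). The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $user_input = 'a.b*(c)';            # hypothetical text typed by the user
my $literal    = quotemeta $user_input; # escapes ., *, ( and ) among others
my $regex      = qr/$literal/i;        # case-insensitive literal search

my @candidates = ('xx A.B*(C) yy', 'aXb', 'abc');
my @matches    = grep { $_ =~ $regex } @candidates;
# Only the first candidate contains the literal text "a.b*(c)".
```

For a case-sensitive search, index() would avoid the regex engine altogether.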

#4

There is some discussion about this over at The Monastery.

TLDR: use re::engine::RE2 -strict => 1;

Make sure to add -strict => 1 to your use statement or re::engine::RE2 will fall back to Perl's re.

The following is a quotation from Paul Wankadia (junyer), owner of the project on GitHub:

RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.

To sum up the important points:

Perl is safe from arbitrary code execution by default, but add "no re 'eval';" to prevent PERL5OPT (or possibly something else) from turning it on behind your back. I'm not sure whether doing so prevents everything.

Use a sub-process (fork) with BSD::Resource (even on Linux) to limit memory, ulimit-style, and kill the child after some timeout.
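
The second point could be sketched roughly as follows. The match runs in a forked child because a runaway match may not be interruptible from inside the same interpreter. The helper name match_with_timeout, its parameters, and the 2-second and 256 MB figures are assumptions for illustration. BSD::Resource comes from CPAN and is treated as optional here, and a pipe plus IO::Select stands in for the timeout.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical helper: run $string =~ /$pattern/ in a child process.
# Returns 1 (match), 0 (no match), or undef (timeout, crash, or bad pattern).
sub match_with_timeout {
    my ($pattern, $string, $timeout, $max_bytes) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                        # child
        close $reader;
        # Optional memory cap; silently skipped if BSD::Resource is absent.
        eval {
            require BSD::Resource;
            BSD::Resource::setrlimit(BSD::Resource::RLIMIT_AS(),
                                     $max_bytes, $max_bytes);
        };
        my $result = eval { $string =~ /$pattern/ ? 1 : 0 };
        print {$writer} $result if defined $result;
        close $writer;                      # flushes the one-byte answer
        POSIX::_exit(0);                    # skip END blocks from the parent
    }

    close $writer;                          # parent
    my $answer;
    if (my @ready = IO::Select->new($reader)->can_read($timeout)) {
        my $buf = '';
        sysread($reader, $buf, 1);          # empty on EOF (child wrote nothing)
        $answer = $buf eq '' ? undef : $buf + 0;
    }
    kill 'KILL', $pid;                      # harmless if the child already exited
    waitpid($pid, 0);
    return $answer;
}

my $hit  = match_with_timeout('^ab+c$', 'abbbc', 2, 256 * 1024 * 1024);
my $miss = match_with_timeout('^ab+c$', 'xyz',   2, 256 * 1024 * 1024);
my $bad  = match_with_timeout('(',      'xyz',   2, 256 * 1024 * 1024);
```

Forking per match is expensive, so in practice you might batch all of a user's patterns into one child.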

#5

Perhaps you could use a different regex engine that does not have the dangerous code tag support.

I haven't tried it, but there is a PCRE module for Perl. You may also be able to limit or remove code support using this info on creating custom regex engines.
