How do I profile Perl regexes?

Date: 2022-03-06 12:16:09

What's the best way to profile Perl regexes to determine how expensive they are?

3 answers

#1


13  

Perl comes with the Benchmark module, which can take a number of code samples, and answer the question of "which one is faster?". I've got a Perl Tip on Benchmarking Basics, and while that doesn't use regexps per se, it does give a quick and useful introduction to the topic, along with further references.
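As a rough sketch of what that looks like for regexes, the core Benchmark module's `cmpthese` can run two candidate patterns over the same input and print a comparison table. The sample lines, the two patterns, and the `count_matches` helper below are made up for illustration; swap in your own.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input; substitute lines from your real data.
my @lines = map { "user$_\@example.com logged in from 10.0.0.$_" } 1 .. 200;

# Two hypothetical candidate patterns that accept the same lines.
my $greedy   = qr/^(.*)\@example\.com/;
my $negclass = qr/^([^\@]+)\@example\.com/;

# Run one pattern over every line and return the number of matches.
sub count_matches {
    my ($re) = @_;
    my $hits = 0;
    for my $line (@lines) {
        $hits++ if $line =~ $re;
    }
    return $hits;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    greedy   => sub { count_matches($greedy) },
    negclass => sub { count_matches($negclass) },
});
```

Returning the match count from each sub also makes it easy to check that both patterns accept the same lines before trusting the timings.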

brian d foy also has an excellent chapter on benchmarking in his Mastering Perl book. He's been kind enough to put the chapter on-line as a draft, which is well worth the read. I really can't recommend it enough.

Paul

#2


3  

Just saying "use the Benchmark module" doesn't really answer the question, though. Benchmarking a regex is different from benchmarking a calculation; you need a large amount of realistic data so you can stress the regex the way real data would. If most of your data will match, you want a regex that matches quickly; if most will fail, you want a regex that fails quickly. They could wind up being the same regex, but maybe not.
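One way to see the difference is to time the same regex against a mostly-matching corpus and a mostly-failing one. This is only a sketch: the date pattern, both generated data sets, and the `hits` helper are hypothetical stand-ins for a real pattern and real samples.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical pattern under test: a date such as 2022-03-06 anywhere in the line.
my $re = qr/\b\d{4}-\d{2}-\d{2}\b/;

# Two hypothetical corpora; in practice, sample these from real input.
my @mostly_match = map { sprintf "entry %d created 2022-03-%02d ok", $_, 1 + $_ % 28 } 1 .. 500;
my @mostly_fail  = map { "entry $_ has no timestamp, only digits 2022-03 here" } 1 .. 500;

# Count how many lines in the given array reference match the pattern.
sub hits {
    my ($data) = @_;
    return scalar grep { $_ =~ $re } @$data;
}

# Same regex, two workloads: the table shows which case it handles faster.
cmpthese(-1, {
    mostly_match => sub { hits(\@mostly_match) },
    mostly_fail  => sub { hits(\@mostly_fail) },
});
```

The failing lines here deliberately contain a partial date, so the regex has to start matching before it gives up, which is closer to how near-miss real data behaves.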

#3


0  

My preferred way would be to take a large set of input data for the RE, then process that data N times (e.g., 100,000) and see how long it takes.
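A minimal sketch of that loop, using the core Time::HiRes module for sub-second timing. The generated data, the pattern, the pass count, and the `time_regex` helper are all placeholders rather than anything from the thread.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical input set and pattern; replace both with your own.
my @data = map { "item-$_ price=" . ($_ * 3) . " USD" } 1 .. 1000;
my $re   = qr/price=(\d+)/;
my $n    = 100;    # passes over the data; raise it for steadier numbers

# Apply the regex to every line, $passes times over, and return
# the elapsed wall-clock seconds plus the total number of matches.
sub time_regex {
    my ($re, $data, $passes) = @_;
    my $matched = 0;
    my $start   = time();
    for (1 .. $passes) {
        for my $line (@$data) {
            $matched++ if $line =~ $re;
        }
    }
    return (time() - $start, $matched);
}

my ($elapsed, $matched) = time_regex($re, \@data, $n);
printf "%d matches in %.3f s\n", $matched, $elapsed;
```

This measures wall-clock time, so run it a few times on a quiet machine before comparing one version of the RE with another.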

Then tweak the RE and try again. Keep all the old REs as comments in case you need to benchmark them again in future; who knows what wondrous optimizations may appear in Perl 7?

There may well be tools which can analyze REs to give you execution paths for specific inputs (like the query analysis tools in a DBMS) but, since Perl is the language of the lazy (a commandment handed down by Larry himself), I couldn't be bothered going to find them :-).
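One facility of that kind ships with Perl itself, though it may not be what was meant here: the `re` pragma's debug mode prints the compiled program for a regex and a trace of each match attempt to STDERR. A small sketch, with a made-up pattern and input chosen only to keep the trace short:

```perl
use strict;
use warnings;

# Hypothetical input, kept tiny so the trace stays readable.
my $input   = "aaab";
my $matched = 0;

{
    # Lexically scoped: only regexes compiled in this block are traced.
    use re 'debug';
    $matched = 1 if $input =~ /a+b/;
}

print "matched\n" if $matched;
```

The trace shows what the engine tried, not how long it took, so it complements timing runs rather than replacing them.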
