How do I remove block comments using Perl?

I am working on a preprocessor that is analyzing a DSL. My goal is to remove the comments. The block comment facility is demarcated by %% before and after. I do not have to worry about %% being in strings, by the definition of the language.

I am using this s/// regex. Unfortunately, it seems to match everything and wipe it out:

#Remove multiline comments.
$text_string =~ s/%%.*%%//msg;

What am I doing wrong?

3 solutions

#1

The first thing you can do is make it non-greedy:

.*?

Otherwise,

%% some text %%
real content
%% other text %%

will all be wiped out.
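
To make the difference concrete, here is a minimal sketch that runs both versions of the substitution on that sample input. The variable names ($text, $greedy, $lazy) are only for illustration and are not from the question. The /m flag is dropped here because it only changes what ^ and $ match; /s is the flag that lets . match newlines, which a multi-line comment needs.

```perl
use strict;
use warnings;

# Sample input: two block comments with real content between them.
my $text = "%% some text %%\nreal content\n%% other text %%\n";

# Greedy: .* runs to the LAST %% in the string, so the real content
# between the two comments is swallowed as well.
(my $greedy = $text) =~ s/%%.*%%//sg;

# Non-greedy: .*? stops at the NEAREST closing %%, so each comment
# is removed on its own and the content in between survives.
(my $lazy = $text) =~ s/%%.*?%%//sg;

print "greedy: [$greedy]\n";   # only the trailing newline is left
print "lazy:   [$lazy]\n";     # "real content" is still there
```

Note that the newlines around the comments are left in place; if that matters for the DSL, they would need separate handling.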

#2

From perlfaq6: What does it mean that regexes are greedy? How can I get around it?

Most people mean that greedy regexes match as much as they can. Technically speaking, it's actually the quantifiers (?, *, +, {}) that are greedy rather than the whole pattern; Perl prefers local greed and immediate gratification to overall greed. To get non-greedy versions of the same quantifiers, use (??, *?, +?, {}?).

An example:

$s1 = $s2 = "I am very very cold";
$s1 =~ s/ve.*y //;      # I am cold
$s2 =~ s/ve.*?y //;     # I am very cold

Notice how the second substitution stopped matching as soon as it encountered "y ". The *? quantifier effectively tells the regular expression engine to find a match as quickly as possible and pass control on to whatever is next in line, like you would if you were playing hot potato.
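
The same rule applies to the other quantifiers the FAQ lists. The following is a small sketch, not part of perlfaq6, using a made-up string $s to show what each greedy and non-greedy form captures:

```perl
use strict;
use warnings;

my $s = "aaaa";

# Each match is in list context, so it returns the captured group.
my ($plus_greedy)  = $s =~ /(a+)/;        # as many as possible: "aaaa"
my ($plus_lazy)    = $s =~ /(a+?)/;       # as few as possible:  "a"
my ($range_greedy) = $s =~ /(a{2,3})/;    # upper bound first:   "aaa"
my ($range_lazy)   = $s =~ /(a{2,3}?)/;   # lower bound first:   "aa"
my ($opt_greedy)   = $s =~ /(a?)/;        # takes the "a":       "a"
my ($opt_lazy)     = $s =~ /(a??)/;       # prefers nothing:     ""

print "[$plus_greedy] [$plus_lazy]\n";
print "[$range_greedy] [$range_lazy]\n";
print "[$opt_greedy] [$opt_lazy]\n";
```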

#3

Assuming that you have read the entire code into the variable $str, and that a single % can never occur between the opening %% and the closing %%, you could use this:

$str =~ s/%%([^%]+)%%//g;
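
A quick sketch of where that assumption matters. The input string and the second pattern are my own illustration, not part of the answer above: the relaxed pattern allows a lone % inside a comment by using a negative lookahead. Also note that [^%]+ needs at least one character, so an empty comment written as %%%% would not be removed by the original pattern.

```perl
use strict;
use warnings;

# Hypothetical input: the second comment contains a single % sign.
my $str = "keep %% plain comment %% this %% 50% done %% too";

# Pattern from the answer: the lone % in "50%" stops [^%]+ before the
# closing %%, so the second comment is left untouched.
(my $strict = $str) =~ s/%%([^%]+)%%//g;

# Assumed variation: accept any non-% character, or a % that is not
# followed by another %, any number of times (including zero).
(my $relaxed = $str) =~ s/%%(?:[^%]|%(?!%))*%%//g;

print "strict:  [$strict]\n";    # second comment still present
print "relaxed: [$relaxed]\n";   # both comments removed
```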
