Why can't the regular expressions "//" and "/*" match single-line comments and block comments?

Time: 2023-01-15 09:43:01

I want to count the empty lines, single-line comments, and block comments in a C++ program.

I wrote the tool with flex, but it can't match C++ block comments.

1 flex code:

%option noyywrap
%{
    int block_flag = 0;
    int empty_num = 0;
    int single_line_num = 0;
    int block_line_num = 0;
    int line = 0;
%}

%%
^[\t ]*\n               {
    empty_num++;
    printf("empty line\n");
}
"//"    {
    single_line_num++;
    printf("single line comment\n");
}
"/*"  {
    block_flag = 1;
    block_line_num++;
    printf("block comment begin.block line:%d\n", block_line_num);
}

"*/"  {
    block_flag = 0;
    printf("block comment end.block line:%d\n", block_line_num);
}
^(.*)\n                 {
    if (block_flag)
        block_line_num++;
    else
        line++;
}

%%
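int main(int argc, char *argv[])
{
    /* Expect the file to scan as the first argument. */
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    yyin = fopen(argv[1], "r");
    if (yyin == NULL) {
        perror(argv[1]);
        return 1;
    }
    yylex();

    printf("lines :%d\n", line);
    fclose(yyin);

    return 0;
}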

2 hello.c

bbg@ubuntu:~$ cat hello.c 
#include <stdlib.h>

//
//
/*
 */

/*   */

3 output

bbg@ubuntu:~$ ./a.out hello.c 
empty line
empty line
lines :6

Why don't "//" and "/*" match the single-line comments and block comments?

1 solution

Flex:

  1. doesn't search. It matches the input sequentially: each match starts exactly where the previous one ended.

  2. always picks the pattern with the longest match. (If two or more patterns match exactly the same number of characters, it picks the one listed first.)

So, you have

"//"   { /* Do something */ } 

and

^.*\n  { /* Do something else */ }

Suppose it has just matched the second one, so we're at the beginning of a line, and suppose the line starts with //. Now both patterns match, but the second one matches the whole line, whereas the first one matches only two characters. So the second one wins. That wasn't what you wanted.


Hint 1: You probably want // comments to match all the way to the end of the line.


Hint 2: There is a regular expression that matches /* comments, although it's a bit tedious: "/*"[^*]*"*"+([^*/][^*]*"*"+)*"/". Unfortunately, if you use it, it won't count the line ends inside the comment for you, but you should be able to adapt it to do what you want.


Hint 3: You might want to think about comments that don't start at the very beginning of a line: ones that follow code, or ones that are indented. Your rule ^.*\n will swallow an entire line without even looking to see whether there is a comment somewhere inside it.


Hint 4: String literals hide comments: the // inside "http://example.com" does not start a comment.

