Regular expression for string literals in flex/lex

Time: 2021-06-04 09:38:11

I'm experimenting to learn flex and would like to match string literals. My code currently looks like:

"\""([^\n\"\\]*(\\[.\n])*)*"\""        {/*matches string-literal*/;}

I've been struggling with variations for an hour or so and can't get it working the way it should. I'm essentially hoping to match a string literal that can't contain a new-line (unless it's escaped) and supports escaped characters.

I am probably just writing a poor regular expression or one incompatible with flex. Please advise!

5 Answers

#1


53  

You'll find these links helpful

#2


93  

A string consists of a quote mark

"

followed by zero or more of either an escaped anything

\\.

or a character that is neither a quote nor a backslash

[^"\\]

and finally a terminating quote

"

Put it all together, and you've got

\"(\\.|[^"\\])*\"

The delimiting quotes are escaped because they are Flex meta-characters.
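
To make the pieces of that pattern concrete, here is a minimal hand-written C sketch that walks the input the way the pattern does. The function name match_string_literal is hypothetical and not part of flex; the sketch assumes a NUL-terminated buffer and follows the flex rule that . matches any character except a newline.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of flex: returns the length of the string
 * literal that starts at s, or 0 if no literal starts there. It mirrors
 * the pattern  \"(\\.|[^"\\])*\"  one alternative at a time.
 */
size_t match_string_literal(const char *s)
{
    size_t i = 0;

    if (s[i] != '"')              /* opening quote: \" */
        return 0;
    i++;

    while (s[i] != '\0') {
        if (s[i] == '\\') {       /* escaped character: \\. */
            /* In flex, '.' matches any character except a newline. */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else if (s[i] == '"') { /* terminating quote: \" */
            return i + 1;
        } else {                  /* ordinary character: [^"\\] */
            i++;
        }
    }
    return 0;                     /* input ended before the closing quote */
}
```

Note that an unescaped newline falls into the last branch, so this version, like the pattern it mirrors, accepts literals that span lines.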

#3


17  

For a single line... you can use this:

\"([^\\\"]|\\.)*\"  {/*matches string-literal on a single line*/;}

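One caveat worth checking: in flex a negated character class such as [^\\\"] also matches a newline unless \n is listed in it, so a strictly single-line rule probably needs the newline excluded explicitly. The C sketch below shows the behaviour the question asks for (no raw newline, but any escaped character including an escaped newline). The name match_single_line_literal is hypothetical, and the flex pattern it corresponds to is my assumption: \"(\\(.|\n)|[^"\\\n])*\"

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of flex: returns the length of a string
 * literal starting at s that contains no raw newline, or 0 if there is
 * no such literal. A backslash may escape any character, newline included.
 */
size_t match_single_line_literal(const char *s)
{
    size_t i = 0;

    if (s[i] != '"')               /* opening quote */
        return 0;
    i++;

    while (s[i] != '\0') {
        if (s[i] == '\\') {        /* escape: backslash plus any character */
            if (s[i + 1] == '\0')
                return 0;          /* dangling backslash at end of input */
            i += 2;
        } else if (s[i] == '\n') { /* raw newline is not allowed */
            return 0;
        } else if (s[i] == '"') {  /* closing quote */
            return i + 1;
        } else {                   /* ordinary character */
            i++;
        }
    }
    return 0;                      /* no closing quote found */
}
```
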
#4


8  

How about using a start state...

%{
int enter_dblquotes = 0;
%}

%x DBLQUOTES
%%

\"  { BEGIN(DBLQUOTES); enter_dblquotes++; }

<DBLQUOTES>[^"]*\"  {
   /* yytext holds the string body plus the closing quote;
      escaped quotes are not handled here */
   if (enter_dblquotes){
       handle_this_dblquotes(yytext);
       BEGIN(INITIAL); /* revert back to normal */
       enter_dblquotes--;
   }
}
         ...more rules follow...

The idea is roughly this: flex uses %s or %x to declare start states (%s for inclusive, %x for exclusive). When the scanner sees a quote, it switches to another state and keeps lexing until it reaches the next quote, at which point it reverts to the normal state.
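
The state switching can also be sketched in plain C, without flex, to show what the two states do. Everything here is hypothetical illustration: the enum, the function name count_quoted_strings, and the choice to skip backslash-escaped characters inside the quoted state (the snippet above does not handle escapes).

```c
#include <stddef.h>

/* The two states stand in for flex's INITIAL and an exclusive DBLQUOTES. */
enum scan_state { STATE_INITIAL, STATE_DBLQUOTES };

/*
 * Hypothetical helper: counts the complete double-quoted strings in a
 * NUL-terminated input by switching state on each unescaped quote.
 */
size_t count_quoted_strings(const char *input)
{
    enum scan_state state = STATE_INITIAL;
    size_t count = 0;

    for (const char *p = input; *p != '\0'; p++) {
        switch (state) {
        case STATE_INITIAL:
            if (*p == '"')
                state = STATE_DBLQUOTES;   /* like BEGIN(DBLQUOTES) */
            break;
        case STATE_DBLQUOTES:
            if (*p == '\\' && p[1] != '\0') {
                p++;                       /* skip the escaped character */
            } else if (*p == '"') {
                count++;                   /* a full string was seen */
                state = STATE_INITIAL;     /* like BEGIN(INITIAL) */
            }
            break;
        }
    }
    return count;                          /* an unterminated string is not counted */
}
```

For example, calling count_quoted_strings on the text x = "a" + "b"; enters and leaves the quoted state twice.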

#5


0  

An answer that arrives late but which can be useful for the next one who will need it:

\"(([^\"]|\\\")*[^\\])?\"
