How do I instruct the parser not to continue after an unterminated comment?

Time: 2021-11-06 09:40:15

I'm working on improving error reporting on my compiler assignment. I'm handling unterminated comments in Flex using the following code:

<INITIAL>"/*"       {BEGIN(COMMENT);}
<COMMENT>"*/"       {BEGIN(INITIAL);}
<COMMENT>([^*]|\n)+   {}
<COMMENT><<EOF>>    {yyerror("UNTERMINATED COMMENT"); BEGIN(INITIAL);}

The issue is that the parser is printing its error message as well:

 $ ./comp tests/comments.cf 
 ERROR: UNTERMINATED COMMENT: 27
 ERROR: syntax error: 27

How can I instruct the parser not to continue its work? Adding an exit after BEGIN(INITIAL) gives me what I want, but it does not seem to be the way to deal with it.

1 solution

#1


You should certainly return 0 (or some token) from the <<EOF>> action, for example by ending the action with return 0; instead of BEGIN(INITIAL);. If you don't, the lexer will try to continue scanning, which is undefined behaviour: a scanner should not continue to read input after an EOF has been signalled, unless it has arranged for there to be a new input buffer.

It is very likely that an unterminated comment will also result in a syntax error, since the end of the program has most likely been swallowed by the comment. If you don't want this second error to be reported, you could simply set a flag which yyerror checks before printing an error message. In this simple case, there would be no need to reset that flag, since the unterminated comment error can only occur at the end of input, and no error recovery is possible at that point.
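
The flag idea could look roughly like the sketch below. The names suppress_errors, error_count and report_unterminated_comment are hypothetical, introduced only for illustration; yylineno is declared here as a stand-in for the variable Flex normally provides, so that the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the line counter that Flex normally provides. */
int yylineno = 1;

/* Hypothetical flag: once set, later error reports are silently dropped. */
static bool suppress_errors = false;

/* Hypothetical counter of the messages that were actually printed. */
static int error_count = 0;

void yyerror(const char *msg)
{
    if (suppress_errors)
        return;                       /* e.g. the follow-up "syntax error" */
    fprintf(stderr, "ERROR: %s: %d\n", msg, yylineno);
    ++error_count;
}

/* Hypothetical helper meant to be called from the <COMMENT><<EOF>> action,
 * just before that action returns 0. */
void report_unterminated_comment(void)
{
    yyerror("UNTERMINATED COMMENT");
    suppress_errors = true;           /* never reset: input has ended */
}
```

With this in place, the syntax error that Bison raises afterwards still reaches yyerror but prints nothing.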

Bison itself has a mechanism to reduce spurious error reporting by suppressing "syntax error" calls to yyerror for three tokens after a syntax error is reported. There is limited access to this feature from within parser actions, but no access from outside of the parser so it cannot be enabled from a scanner action.

If you want a solution with a cleaner interface between the scanner and the parser, you might consider the following possibility:

  1. In your lexer, when an unterminated comment is detected, return an otherwise unused token, say UNTERMINATED_COMMENT.

  2. When the parser receives an UNTERMINATED_COMMENT token, it will signal a syntax error immediately, or almost immediately: under some circumstances, it may perform some reductions before it even checks what the lookahead token is. When yyerror is called, the value of the yychar global will be the lookahead token, so it will be UNTERMINATED_COMMENT; yyerror can use this fact to produce a more precise error message, rather than the generic "syntax error".

  3. It is important to immediately terminate the parse at this point, since calling the scanner again will be undefined behaviour. That can be done by setting yychar to YYEOF in the yyerror function. (An alternative would be to include an error production with UNTERMINATED_COMMENT in the rhs, whose action is YYABORT.)
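
Steps 2 and 3 might be sketched as the yyerror below. This is only an illustration under stated assumptions: the enum and the yychar and yylineno definitions are stand-ins so that the snippet compiles by itself (in a real build they come from the Bison-generated parser and from Flex), the token number 258 is arbitrary, and the parser is assumed to be a traditional non-reentrant one in which yychar is a global.

```c
#include <stdio.h>

/* Stand-ins so the sketch compiles alone; a real build gets these from
 * the Bison-generated parser. The value 258 is an arbitrary placeholder. */
enum { YYEOF = 0, UNTERMINATED_COMMENT = 258 };

int yychar = -2;    /* lookahead token; -2 is Bison's "no lookahead" value */
int yylineno = 1;   /* stand-in for Flex's line counter */

void yyerror(const char *msg)
{
    if (yychar == UNTERMINATED_COMMENT) {
        /* The lookahead identifies the real cause, so report that
         * instead of the generic message passed in by the parser. */
        fprintf(stderr, "ERROR: UNTERMINATED COMMENT: %d\n", yylineno);
        /* Pretend end of input was seen so the scanner is not called again. */
        yychar = YYEOF;
    } else {
        fprintf(stderr, "ERROR: %s: %d\n", msg, yylineno);
    }
}
```

The matching scanner action would return UNTERMINATED_COMMENT from the <COMMENT><<EOF>> rule rather than calling yyerror itself, so only one message is printed.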
