Resetting the state of flex and/or bison

Date: 2022-11-16 09:33:24

As part of a toy project I've been trying to make a small modification of someone else's parser based on flex/bison. I'm really not experienced with either. You can find the original parser here.

I've been trying to put together a simple function that accepts a string and returns a parse tree, so I can expose this via FFI for use in another programming language. What I have is mostly based on the main() function in the original program; my butchered version is below:

TreeNode* parse_string(char *s)
{
    FILE *in = fmemopen(s, strlen(s), "r");
    lex2_initialise();
    parse_file(in);
    fclose(in);
    preprocess_tokens();
    yyparse();
    return top;
}

This actually works fine, at least the first time I call it. The second time it complains about misparsed tokens, and the error reporting function used appears to be called from somewhere inside a maze of goto statements within the generated parser during the call to yyparse(), at which point I don't understand what's going on anymore.

The original program itself only appears to be designed to take all its input upfront and then exit, so it doesn't leave me with much clue of what I'm missing. Putting aside the not-altogether-outlandish idea some old state is being retained elsewhere in the rest of the program, my main questions are:

  • Does either Flex or Bison maintain global state between calls to yyparse()?

  • Is there some simple function call I could put at the end of the function above to wipe it all and reset everything back to the initial state?

2 Answers

#1


Does either Flex or Bison maintain global state between calls to yyparse()?

Flex maintains information about the current input stream. If the parse does not consume the entire input stream (which is quite common for parsers which terminate abnormally on errors), then the next call to yyparse will continue reading from where the previous one left off. Providing a new input buffer will (mostly) reset the lexer's state, but there may be some aspects which have not been reset, notably the current start condition, and the condition stack if that option has been enabled.

The bison-generated parser does not rely on global state. It is designed to clear its internal state prior to returning from yyparse. However, if a parser action executes a return statement directly (this is not recommended), then the cleanup will be bypassed, which is likely to create a memory leak. Actions which prematurely terminate the parse should use the macros YYACCEPT or YYABORT rather than a return statement.

Is there some simple function call I could put at the end of the function above to wipe it all and reset everything back to the initial state?

The default flex-generated scanner, which is designed to be called every time a token is required, relies heavily on global variables. Most, but not all, of the flex state is maintained in the current YY_BUFFER_STATE (which is kept in a global variable), and that object can be reset by the yyrestart function, or by any of the functions which provide a character buffer as lexer input. However, these functions reset neither the start condition, nor the condition stack (if enabled), nor the buffer stack. If you want to reset the state completely, you need to flush the stacks manually and reset the start condition with BEGIN(INITIAL).
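
To see why a fresh input buffer alone may not be enough, here is a small hand-written stand-in for a non-reentrant scanner. It is not flex output, and every name in it (toy_set_input, toy_reset, toy_lex, count_words, IN_COMMENT) is invented for illustration. It only mimics the two pieces of global state discussed above: the input position and the start condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written toy, NOT flex output: all names here are hypothetical. */
enum { INITIAL, IN_COMMENT };        /* toy start conditions */
enum { TOK_EOF = 0, TOK_WORD = 1 };  /* toy token codes */

static const char *input = "";         /* global: current input position */
static int start_condition = INITIAL;  /* global: current start condition */

/* Analogous to supplying a new buffer: the start condition is untouched. */
void toy_set_input(const char *s)
{
    assert(s != NULL);
    input = s;
}

/* Analogous to a full manual reset: new buffer plus BEGIN(INITIAL). */
void toy_reset(const char *s)
{
    assert(s != NULL);
    input = s;
    start_condition = INITIAL;
}

/* Returns TOK_WORD for each word outside a { ... } comment. */
int toy_lex(void)
{
    while (*input) {
        char c = *input++;
        if (start_condition == IN_COMMENT) {
            if (c == '}') start_condition = INITIAL;
        } else if (c == '{') {
            start_condition = IN_COMMENT;
        } else if (c != ' ') {
            /* Skip the rest of the word. */
            while (*input && *input != ' ' && *input != '{') input++;
            return TOK_WORD;
        }
    }
    return TOK_EOF;
}

/* Drains the toy scanner and counts the words it reports. */
int count_words(void)
{
    int n = 0;
    while (toy_lex() != TOK_EOF) n++;
    return n;
}
```

If the first input ends inside an unterminated comment, for example toy_set_input("a b {unterminated"), count_words() reports 2 words and leaves the toy stuck in IN_COMMENT. A following toy_set_input("c d") then yields 0 words, because the stale start condition swallows the new input, whereas toy_reset("c d") yields 2.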

One approach to making a more easily restartable scanner is to build a reentrant scanner. A reentrant scanner keeps all of its state (including start conditions and buffer stack) in a scanner structure, which means that you can completely reset the scanner state simply by creating a new scanner structure (and, of course, destroying the old one to avoid leaking memory.)
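
The idea can be sketched with a hand-written toy in which all scanner state lives in a structure. This is not the flex reentrant API (there the real counterparts are yylex_init and yylex_destroy operating on a yyscan_t); ToyScanner and the toy_scanner_* functions are hypothetical names used only to show why creating a new structure amounts to a complete reset.

```c
#include <assert.h>
#include <stdlib.h>

/* Hand-written toy, NOT the flex reentrant API: all names are hypothetical. */
enum { TOY_INITIAL, TOY_IN_COMMENT };

typedef struct {
    const char *input;    /* current input position */
    int start_condition;  /* current start condition */
} ToyScanner;

/* Every new scanner starts in the initial state, whatever happened before. */
ToyScanner *toy_scanner_new(const char *s)
{
    assert(s != NULL);
    ToyScanner *sc = malloc(sizeof *sc);
    if (sc != NULL) {
        sc->input = s;
        sc->start_condition = TOY_INITIAL;
    }
    return sc;
}

/* Destroying the old structure is what avoids the memory leak. */
void toy_scanner_free(ToyScanner *sc)
{
    free(sc);
}

/* Returns 1 for each word outside a { ... } comment, 0 at end of input. */
int toy_scanner_lex(ToyScanner *sc)
{
    while (*sc->input) {
        char c = *sc->input++;
        if (sc->start_condition == TOY_IN_COMMENT) {
            if (c == '}') sc->start_condition = TOY_INITIAL;
        } else if (c == '{') {
            sc->start_condition = TOY_IN_COMMENT;
        } else if (c != ' ') {
            while (*sc->input && *sc->input != ' ' && *sc->input != '{')
                sc->input++;
            return 1;
        }
    }
    return 0;
}

/* One complete scan: build a scanner, use it, destroy it. */
int toy_count_words(const char *s)
{
    ToyScanner *sc = toy_scanner_new(s);
    int n = 0;
    if (sc == NULL) return -1;  /* allocation failed */
    while (toy_scanner_lex(sc)) n++;
    toy_scanner_free(sc);
    return n;
}
```

Because each call to toy_count_words builds its own ToyScanner, an input that ends inside an unterminated comment cannot affect the next call: nothing survives between the two.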

There are lots of good reasons to use reentrant scanners [Note 1]. For one thing, it allows you to have more than one parser active at the same time, and it eliminates a reliance on global state. But unfortunately, it's not as simple as just setting a flex option.

Reentrant scanners have a different API (which includes a pointer to the scanner state structure). This state structure needs to be passed into yyparse and yyparse needs to pass it to yylex; all of this requires some modifications to the bison options. Also, reentrant scanners cannot use the global yylval to communicate the semantic value of a token to the parser [Note 2].

If you use flex's bison-bridge option (%option bison-bridge) and tell bison to generate a reentrant parser, then yylex will expect to be called with another additional parameter (or two, if you use locations), and the reentrant bison parser will supply the additional parameters. That all works fine, but it has the effect of changing yylval (and yylloc, if used) to a pointer, which means that you need to go through all the scanner actions changing yylval.something to yylval->something.
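
The effect on scanner actions can be illustrated with a stand-alone toy lexer that receives the semantic value through a pointer, the way a bridge-style yylex does. ToyValue, the token codes, and toy_bridge_lex are hypothetical stand-ins; in a real project bison generates YYSTYPE and the token numbers, and flex generates yylex.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical semantic value type; bison would normally generate YYSTYPE. */
typedef union {
    int number;
    char letter;
} ToyValue;

/* Hypothetical token codes, standing in for the ones bison would define. */
enum { TOY_END = 0, TOY_NUMBER = 258, TOY_LETTER = 259 };

/* Bridge-style lexer: the caller owns the value and passes its address,
 * so nothing is written to a global yylval. */
int toy_bridge_lex(ToyValue *yylval, const char **cursor)
{
    assert(yylval != NULL && cursor != NULL && *cursor != NULL);

    while (**cursor == ' ') (*cursor)++;   /* skip blanks */
    if (**cursor == '\0') return TOY_END;

    if (isdigit((unsigned char)**cursor)) {
        int n = 0;
        while (isdigit((unsigned char)**cursor))
            n = n * 10 + (*(*cursor)++ - '0');
        yylval->number = n;                /* was: yylval.number = n; */
        return TOY_NUMBER;
    }

    yylval->letter = *(*cursor)++;         /* was: yylval.letter = ...; */
    return TOY_LETTER;
}
```

With a global yylval the two assignments would use the dot operator; once the value arrives as a pointer argument, every such access in the scanner actions has to become an arrow.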

Notes

  1. You can also create a reentrant parser, using some additional bison options. Normally, the only mutable globals used by a bison-generated parser are yylval and yylloc (if you use location reporting). (And yynerrs, but it is rare to refer to that variable outside of a parser action.) Specifying a reentrant parser turns those globals into lexer arguments, but it does not create an externally visible parser state structure. But it also gives you the option of using a "push parser", which does have a persistent parser state structure. In some cases, the flexibility of push parsers can significantly simplify scanners.

  2. Strictly speaking, nothing stops you from creating a reentrant scanner which still uses globals to communicate with the parser, except that it is not really reentrant any more. I wouldn't recommend this option for obvious reasons, but you might want to do it as a transitional strategy, since it requires less modification to the parser and to scanner actions.

#2


Even if you are using a non-reentrant scanner, you can call yylex_destroy (without arguments) after lexing to force re-initialisation the next time the lexer is invoked:

extern int yylex_destroy(void);
...
// do parsing here
...
yylex_destroy();

For reentrant parsers see here.
