Flex/Bison: EOF propagation from stdin vs. a file

Time: 2022-04-30 09:43:06

I have a scanner, parser and a main from which I create an executable via

bison -d parser.y; flex scanner.l; gcc main.c parser.tab.c lex.yy.c

When I run ./a.out with no arguments, it does what I want: yyin is stdin, hitting Return ends the parsing of that line, and the main loop waits for the next input line. Pressing Ctrl+D is detected as an EOF; main breaks out of its loop and exits.

If the input comes from a file, e.g. testFile, that file should contain one expression, which is parsed until an EOF. In the file scenario, newlines should be eaten up like spaces and tabs. In short, the program should behave like an interpreter when input comes from stdin and like a script evaluator when input comes from a file.

An example content of such a test file would be: test\n. Here the EOF is not detected, and I have trouble understanding why. In other words, I'd like an extension of the question here that additionally works with input files.

parser.y:

%{
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* stuff from flex that bison needs to know about: */
int yylex();
int yyparse();
extern FILE *yyin;  /* defined by flex in lex.yy.c */

static int parseValue;

void yyerror(const char *s);
%}

%token TWORD
%token TEOF
%token TJUNK

%start input 

%%
input: word                         {   printf("W"); parseValue =  1;   }   
    | eof                           {   printf("eof"); parseValue = -11;}
    | /* empty */                   {   printf("_"); parseValue = -1;   }   
    | error                         {   printf("E"); parseValue = -2;   }   
    ;

eof: TEOF
    ;

word: TWORD
    ;
%%

void yyerror(const char *s) {
    printf("nope...");
}

int getWord( FILE *file) {
    int err;

    if (file) {
        yyin = file;
    } else /* error */ {
        printf("file not valid");
        return -3; 
    }   

    err = yyparse();
    if (!err) {
        return parseValue;
    } else /* error */ {
        printf("parse error");
        return -4;
    }
}

scanner.l:

%{
#include <stdio.h>
#include "parser.tab.h"
#define YYSTYPE int

int yylex();
%}

/* avoid: implicit declaration of function ‘fileno’ */
/*%option always-interactive*/

%option noyywrap
/* to avoid warning: ‘yyunput’ defined but not used */
%option nounput
/* to avoid warning: ‘input’ defined but not used */
%option noinput

%%
<<EOF>>                     {   return TEOF;    }
[ \t]                       {   }
[\n]                        {   if (yyin == stdin) return 0;   }
[a-zA-Z][a-zA-Z0-9]*        {   return TWORD; }
.                           {   return TJUNK;   }
%%

main.c:

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <stdbool.h>

/* defined in parser.y */
int getWord(FILE *file);

int main(int argc, char *argv[]) {

    int result = 0;
    FILE *fOut = stdout, *fIn = stdin;

    /* skip over program name */
    ++argv, --argc;
    if ( argc > 0 ) { 
        fIn = fopen( argv[0], "r" );
    }   

    while (true) {
        fprintf(fOut, "\nTEST : ");

        result = getWord(fIn);

        if (result == -11) {
            printf(" %i ", result); printf("--> EOF");
            break;
        }   
        if (result < 0) {
            printf(" %i ", result); printf("--> <0");
            /*continue;*/
            break;
        }   

        fprintf(fOut, " => %i", result);
    }   

    fprintf(fOut, "\n\n done \n ");
    exit(EXIT_SUCCESS);
}

I have tried to rewrite the parser according to suggestions made here or here, without much success. What is the correct way for main to become aware of an EOF when input is read from a file?

Update: One suggestion was that the issue may be due to the return 0; on the \n. As a quick test, I now return 0 only if yyin == stdin, but calling ./a.out testFile still does not catch the EOF.

Update 2: I got this to work by using yywrap, and I got rid of all the TEOF stuff. The scanner now has this declaration:

extern int eof;

and at the end:

int yywrap() {
    eof = 1;
    return 1;
}

In the parser there is a:

int eof = 0;

and further down in the file:

err = yyparse();
if (err != 0) return -4;
else if (eof) return -11;
else return parseValue;

If someone can show me a more elegant solution, I'd still appreciate that. This is probably a good way to make a clean version.

1 solution

#1


As noted in your links, flex has syntax (the <<EOF>> rule) for recognizing the end of an input file or stream (e.g., an input from a string).

In fact, flex effectively has such a rule operating at all times. By default, the rule calls yywrap. You turned this off (with %option noyywrap). That's fine, except...

The default action on encountering an "EOF token" is to return 0.

The parsers generated by bison (and byacc) need to see this zero token. See this answer to "END OF FILE token with flex and bison (only works without it)".

Your lexer returns a 0 token on encountering a newline. That will cause all kinds of trouble, and is no doubt leading to what you observe when reading from a file.
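
To make the convention concrete, here is a minimal hand-written stand-in for yylex(), not flex output. It treats newlines as ordinary whitespace and returns 0 only when the stream is really exhausted. The names next_token, TOK_END, TOK_WORD and TOK_JUNK and their numeric values are assumptions made for this sketch; in your build the token codes come from parser.tab.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token codes for this sketch only. The one value that
   matters is 0, which a bison parser reads as "end of input". */
enum { TOK_END = 0, TOK_WORD = 258, TOK_JUNK = 259 };

/* Stand-in for yylex(): returns TOK_END (0) only at real end of
   input, never for a newline. */
static int next_token(FILE *in) {
    int c;

    assert(in != NULL);

    /* Skip blanks, tabs and newlines alike. */
    while ((c = getc(in)) != EOF && isspace(c))
        ;

    if (c == EOF)
        return TOK_END;

    if (isalpha(c)) {
        /* Consume the rest of [a-zA-Z][a-zA-Z0-9]* */
        while ((c = getc(in)) != EOF && isalnum(c))
            ;
        if (c != EOF)
            ungetc(c, in);   /* first character after the word */
        return TOK_WORD;
    }

    /* Any other single character. */
    return TOK_JUNK;
}
```

For a stream containing test\n, successive calls yield TOK_WORD and then TOK_END; the newline never surfaces as a token of its own.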

Edit: OK, with that out of the way and the update applied, let's consider your grammar.

Remember that bison adds a special production that looks for the zero-token. Let's represent that with $ (as people generally do, or sometimes it's $end). So your entire grammar (with no actions and with "error" removed since it's also special) is:

$all : input $;

input: word | eof | /* empty */;

word: TWORD;

eof: TEOF;

which means the only sentences your grammar accepts are:

TWORD $

or:

TEOF $

or:

$

So when you call yyparse(), the loop inside yyparse() will read-ahead one token from the lexer and accept (and return) the result if the token is the zero-valued end-of-file $. If not, the token needs to be one of TWORD or TEOF (anything else results in a call to yyerror() and an attempt to resync). If the token is one of the two valid tokens, yyparse() will call the lexer once more to verify that the next token is the zero-valued end-of-file $ token.

If all of that succeeds, yyparse() will return success.

Adding the actions back in, you should see printf output, and get a value stored in parseValue, based on whichever reduction rule is used to recognize the (at most one) token.
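
The walk-through above can be mimicked in a few lines of plain C. This is only a sketch of the accept logic for the simplified grammar (no error rule, no recovery), not what bison actually generates. The function classify, the PV_* names and the token values are made up for the illustration; the result values mirror what your actions store in parseValue, with -4 standing in for the "parse error" return of getWord().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; 0 is the end-of-input token ($). */
enum { TOK_END = 0, TOK_WORD = 258, TOK_EOF = 259 };

/* Results modelled on the question's parseValue / getWord() values. */
enum { PV_WORD = 1, PV_EOF = -11, PV_EMPTY = -1, PV_REJECT = -4 };

/* Classifies a 0-terminated token sequence against the three
   sentences the simplified grammar accepts:
       TWORD $     TEOF $     $                                   */
static int classify(const int *tokens) {
    assert(tokens != NULL);

    int first = tokens[0];

    if (first == TOK_END)
        return PV_EMPTY;            /* sentence: $ */

    if (first != TOK_WORD && first != TOK_EOF)
        return PV_REJECT;           /* not a valid first token */

    if (tokens[1] != TOK_END)
        return PV_REJECT;           /* exactly one token, then $ */

    return first == TOK_WORD ? PV_WORD : PV_EOF;
}
```

With the scanner from the question, a file containing test\n produces TWORD followed by TEOF (the newline is skipped because yyin is not stdin, then the <<EOF>> rule fires). That sequence is none of the three sentences, so in this sketch classify returns PV_REJECT for it.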
