我该如何用C语言写一个口译员呢?

时间:2023-01-27 11:21:10

I'd love some references, or tips, possibly an e-book or two. I'm not looking to write a compiler, just looking for a tutorial I could follow along and modify as I go. Thank you for being understanding!

我想要一些推荐信,或者是一本电子书或两本。我不是在写编译器,我只是在寻找一个教程,我可以跟随和修改随着我。感谢您的理解!

BTW: It must be C.

顺便说一下,一定是C。

Any more replies would be appreciated.

如有任何答复,我们将不胜感激。

3 个解决方案

#1


22  

A great way to get started writing an interpreter is to write a simple machine simulator. Here's a simple language you can write an interpreter for:

开始编写解释器的一个好方法是编写一个简单的机器模拟器。这里有一个简单的语言,你可以为:

The language has a stack and 6 instructions:

该语言有一个堆栈和6个说明:

push <num> # push a number on to the stack

推入 #将数字推入堆栈

pop # pop off the first number on the stack

弹出#弹出堆栈上的第一个数字

add # pop off the top 2 items on the stack and push their sum on to the stack. (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication my creating a loop using some of the other instructions with this one.

添加#弹出堆栈上的前两项,并将它们的和推入堆栈。(记住你可以加负数,所以也可以加减法)你也可以得到乘法我创建一个循环使用一些其他的指令与这个。

ifeq <address> # examine the top of the stack, if it's 0, continue, else, jump to <address> where <address> is a line number

ifeq

#检查堆栈顶部,如果为0,继续,否则跳到
,其中
为行号

jump <address> # jump to a line number

跳转 <地址> #跳转到行号

print # print the value at the top of the stack

打印#打印堆栈顶部的值

dup # push a copy of what's at the top of the stack back onto the stack.

将堆栈顶部的内容的副本推回堆栈。

Once you've written a program that can take these instructions and execute them, you've essentially created a very simple stack based virtual machine. Since this is a very low level language, you won't need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That's too complicated for a tutorial project. Start with this, and once you've created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.

一旦您编写了一个可以接受这些指令并执行它们的程序,您实际上就创建了一个非常简单的基于堆栈的虚拟机。由于这是一种非常低级的语言,您不需要了解什么是AST,如何将语法解析为AST,并将其转换为机器码,等等。这对于教程项目来说太复杂了。从这个开始,一旦您创建了这个小VM,您就可以开始考虑如何将一些常见的结构转换到这个机器中。你可能想要考虑如何将C if/else语句或while循环翻译成这种语言。

Edit:

编辑:

From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.

从下面的评论来看,在处理这个任务之前,您需要更多的C经验。

What I would suggest is to first learn about the following topics:

我的建议是首先了解以下主题:

  • scanf, printf, putchar, getchar - basic C IO functions
  • scanf, printf, putchar, getchar - basic C IO函数
  • struct - the basic data structure in C
  • 结构——C中的基本数据结构。
  • malloc - how to allocate memory, and the difference between stack memory and heap memory
  • malloc -如何分配内存,堆栈内存和堆内存之间的区别
  • linked lists - and how to implement a stack, then perhaps a binary tree (you'll need to understand structs and malloc first)
  • 链接列表——以及如何实现堆栈,然后可能是一棵二叉树(首先需要理解结构和malloc)

Then it'll be good to learn a bit more about the string.h library as well - strcmp, strdup - a couple useful string functions that will be useful.

那么多了解一些关于弦的知识就好了。h库以及strcmp、strdup——两个有用的字符串函数。

In short, C has a much higher learning curve compared to python, just because it's a lower level language and you have to manage your own memory, so it's good to learn a few basic things about C first before trying to write an interpreter, even if you already know how to write one in python.

简而言之,C有更高的学习曲线与python相比,仅仅因为它是一个低水平的语言,你必须管理自己的记忆,所以最好学习一些基本的关于C之前试图编写一个解释器,即使你已经知道如何在python中编写一个。

#2


15  

The only difference between an interpreter and a compiler is that instead of generating code from the AST, you execute it in a VM instead. Once you understand this, almost any compiler book, even the Red Dragon Book (first edition, not second!), is enough.

解释器和编译器之间的唯一区别是,您不是从AST生成代码,而是在VM中执行。一旦你理解了这一点,几乎任何编译器书籍,甚至红龙图书(第一版,而不是第二版!)都足够了。

#3


4  

I see this is a bit of a late reply, however since this thread showed up at second place in the result list when I did a search for writing an interpreter and no one have mentioned anything very concrete I will provide the following example:

我看到这是一个有点迟的回复,但是由于这个线程出现在结果列表的第二位,当我搜索一个解释器时,没有人提到任何非常具体的东西,我将提供以下示例:

Disclaimer: This is just some simple code I wrote in a hurry in order to have a foundation for the explanation below and are therefore not perfect, but it compiles and runs, and seems to give the expected answers.

免责声明:这只是我匆忙编写的一些简单代码,目的是为下面的解释打下基础,因此并不完美,但它编译并运行,似乎给出了预期的答案。

Read the following C-code from bottom to top:

从下到上阅读下面的C-code:

#include <stdio.h>
#include <stdlib.h>

double expression(void);

double vars[26]; // variables

char get(void) { char c = getchar(); return c; } // get one byte
char peek(void) { char c = getchar(); ungetc(c, stdin); return c; } // peek at next byte
double number(void) { double d; scanf("%lf", &d); return d; } // read one double

void expect(char c) { // expect char c from stream
    char d = get();
    if (c != d) {
        fprintf(stderr, "Error: Expected %c but got %c.\n", c, d);
    }
}

double factor(void) { // read a factor
    double f;
    char c = peek();
    if (c == '(') { // an expression inside parantesis?
        expect('(');
        f = expression();
        expect(')');
    } else if (c >= 'A' && c <= 'Z') { // a variable ?
        expect(c);
        f = vars[c - 'A'];
    } else { // or, a number?
        f = number();
    }
    return f;
}

double term(void) { // read a term
    double t = factor();
    while (peek() == '*' || peek() == '/') { // * or / more factors
        char c = get();
        if (c == '*') {
            t = t * factor();
        } else {
            t = t / factor();
        }
    }
    return t;
}

double expression(void) { // read an expression
    double e = term();
    while (peek() == '+' || peek() == '-') { // + or - more terms
        char c = get();
        if (c == '+') {
            e = e + term();
        } else {
            e = e - term();
        }
    }
    return e;
}

double statement(void) { // read a statement
    double ret;
    char c = peek();
    if (c >= 'A' && c <= 'Z') { // variable ?
        expect(c);
        if (peek() == '=') { // assignment ?
            expect('=');
            double val = expression();
            vars[c - 'A'] = val;
            ret = val;
        } else {
            ungetc(c, stdin);
            ret = expression();
        }
    } else {
        ret = expression();
    }
    expect('\n');
    return ret;
}

int main(void) {
    printf("> "); fflush(stdout);

    for (;;) {
        double v = statement();
        printf(" = %lf\n> ", v); fflush(stdout);
    }
    return EXIT_SUCCESS;
}

This is an simple recursive descend parser for basic mathematical expressions supporting one letter variables. Running it and typing some statements yields the following results:

这是一个简单的递归下降解析器,用于支持一个字母变量的基本数学表达式。运行它并输入一些语句会得到以下结果:

> (1+2)*3
 = 9.000000
> A=1
 = 1.000000
> B=2
 = 2.000000
> C=3
 = 3.000000
> (A+B)*C
 = 9.000000

You can alter the get(), peek() and number() to read from a file or list of code lines. Also you should make a function to read identifiers (basically words). Then you expand the statement() function to be able to alter which line it runs next in order to do branching. Last you add the branch operations you want to the statement function, like

可以将get()、peek()和number()修改为从文件或代码行列表中读取。还应该创建一个函数来读取标识符(基本上是单词)。然后展开statement()函数,以便能够更改接下来运行的行,以便进行分支。最后,将希望的分支操作添加到语句函数中,例如

if "condition" then 
    "statements" 
else 
    "statements" 
endif. 

while "condition" do
    "statements"
endwhile

function fac(x)
   if x = 0 then
      return 1
   else
      return x*fac(x-1) 
   endif
endfunction

Obviously you can decide the syntax to be as you like. You need to think about ways of define functions and how to handle arguments/parameter variables, local variables and global variables. If preferable arrays and data structures. References∕pointers. Input/output? In order to handle recursive function calls you probably need to use a stack.

显然,您可以根据自己的喜好来决定语法。您需要考虑定义函数的方法,以及如何处理参数/参数变量、局部变量和全局变量。如果更好的数组和数据结构。∕指针的引用。输入/输出?为了处理递归函数调用,您可能需要使用堆栈。

In my opinion this would be easier to do all this with C++ and STL. Where for example one std::map could be used to hold local variables, and another map could be used for globals...

在我看来,用c++和STL做这些事情要容易得多。例如,一个std::map可以用于保存局部变量,另一个map可以用于全局变量……

It is of course possible to write a compiler that build an abstract syntax tree out of the code. Then travels this tree in order to produce either machine code or some kind of byte code which executed on a virtual machine (like Java and .Net). This gives better performance than naively parse line by line and executing them, but in my opinion that is NOT writing an interpreter. That is writing both a compiler and its targeted virtual machine.

当然,可以编写一个编译器,用代码构建一个抽象语法树。然后遍历这棵树,以生成机器代码或在虚拟机上执行的某种字节代码(如Java和. net)。这比一行行地解析和执行它们提供了更好的性能,但我认为这不是在编写解释器。它同时编写编译器和目标虚拟机。

If someone wants to learn to write an interpreter, they should try making the most basic simple and practical working interpreter.

如果有人想要学习写翻译,他们应该尝试做最基本的简单实用的工作翻译。

#1


22  

A great way to get started writing an interpreter is to write a simple machine simulator. Here's a simple language you can write an interpreter for:

开始编写解释器的一个好方法是编写一个简单的机器模拟器。这里有一个简单的语言,你可以为:

The language has a stack and 6 instructions:

该语言有一个堆栈和6个说明:

push <num> # push a number on to the stack

推入 #将数字推入堆栈

pop # pop off the first number on the stack

弹出#弹出堆栈上的第一个数字

add # pop off the top 2 items on the stack and push their sum on to the stack. (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication my creating a loop using some of the other instructions with this one.

添加#弹出堆栈上的前两项,并将它们的和推入堆栈。(记住你可以加负数,所以也可以加减法)你也可以得到乘法我创建一个循环使用一些其他的指令与这个。

ifeq <address> # examine the top of the stack, if it's 0, continue, else, jump to <address> where <address> is a line number

ifeq

#检查堆栈顶部,如果为0,继续,否则跳到
,其中
为行号

jump <address> # jump to a line number

跳转 <地址> #跳转到行号

print # print the value at the top of the stack

打印#打印堆栈顶部的值

dup # push a copy of what's at the top of the stack back onto the stack.

将堆栈顶部的内容的副本推回堆栈。

Once you've written a program that can take these instructions and execute them, you've essentially created a very simple stack based virtual machine. Since this is a very low level language, you won't need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That's too complicated for a tutorial project. Start with this, and once you've created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.

一旦您编写了一个可以接受这些指令并执行它们的程序,您实际上就创建了一个非常简单的基于堆栈的虚拟机。由于这是一种非常低级的语言,您不需要了解什么是AST,如何将语法解析为AST,并将其转换为机器码,等等。这对于教程项目来说太复杂了。从这个开始,一旦您创建了这个小VM,您就可以开始考虑如何将一些常见的结构转换到这个机器中。你可能想要考虑如何将C if/else语句或while循环翻译成这种语言。

Edit:

编辑:

From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.

从下面的评论来看,在处理这个任务之前,您需要更多的C经验。

What I would suggest is to first learn about the following topics:

我的建议是首先了解以下主题:

  • scanf, printf, putchar, getchar - basic C IO functions
  • scanf, printf, putchar, getchar - basic C IO函数
  • struct - the basic data structure in C
  • 结构——C中的基本数据结构。
  • malloc - how to allocate memory, and the difference between stack memory and heap memory
  • malloc -如何分配内存,堆栈内存和堆内存之间的区别
  • linked lists - and how to implement a stack, then perhaps a binary tree (you'll need to understand structs and malloc first)
  • 链接列表——以及如何实现堆栈,然后可能是一棵二叉树(首先需要理解结构和malloc)

Then it'll be good to learn a bit more about the string.h library as well - strcmp, strdup - a couple useful string functions that will be useful.

那么多了解一些关于弦的知识就好了。h库以及strcmp、strdup——两个有用的字符串函数。

In short, C has a much higher learning curve compared to python, just because it's a lower level language and you have to manage your own memory, so it's good to learn a few basic things about C first before trying to write an interpreter, even if you already know how to write one in python.

简而言之,C有更高的学习曲线与python相比,仅仅因为它是一个低水平的语言,你必须管理自己的记忆,所以最好学习一些基本的关于C之前试图编写一个解释器,即使你已经知道如何在python中编写一个。

#2


15  

The only difference between an interpreter and a compiler is that instead of generating code from the AST, you execute it in a VM instead. Once you understand this, almost any compiler book, even the Red Dragon Book (first edition, not second!), is enough.

解释器和编译器之间的唯一区别是,您不是从AST生成代码,而是在VM中执行。一旦你理解了这一点,几乎任何编译器书籍,甚至红龙图书(第一版,而不是第二版!)都足够了。

#3


4  

I see this is a bit of a late reply, however since this thread showed up at second place in the result list when I did a search for writing an interpreter and no one have mentioned anything very concrete I will provide the following example:

我看到这是一个有点迟的回复,但是由于这个线程出现在结果列表的第二位,当我搜索一个解释器时,没有人提到任何非常具体的东西,我将提供以下示例:

Disclaimer: This is just some simple code I wrote in a hurry in order to have a foundation for the explanation below and are therefore not perfect, but it compiles and runs, and seems to give the expected answers.

免责声明:这只是我匆忙编写的一些简单代码,目的是为下面的解释打下基础,因此并不完美,但它编译并运行,似乎给出了预期的答案。

Read the following C-code from bottom to top:

从下到上阅读下面的C-code:

#include <stdio.h>
#include <stdlib.h>

double expression(void);

double vars[26]; // variables

char get(void) { char c = getchar(); return c; } // get one byte
char peek(void) { char c = getchar(); ungetc(c, stdin); return c; } // peek at next byte
double number(void) { double d; scanf("%lf", &d); return d; } // read one double

void expect(char c) { // expect char c from stream
    char d = get();
    if (c != d) {
        fprintf(stderr, "Error: Expected %c but got %c.\n", c, d);
    }
}

double factor(void) { // read a factor
    double f;
    char c = peek();
    if (c == '(') { // an expression inside parantesis?
        expect('(');
        f = expression();
        expect(')');
    } else if (c >= 'A' && c <= 'Z') { // a variable ?
        expect(c);
        f = vars[c - 'A'];
    } else { // or, a number?
        f = number();
    }
    return f;
}

double term(void) { // read a term
    double t = factor();
    while (peek() == '*' || peek() == '/') { // * or / more factors
        char c = get();
        if (c == '*') {
            t = t * factor();
        } else {
            t = t / factor();
        }
    }
    return t;
}

double expression(void) { // read an expression
    double e = term();
    while (peek() == '+' || peek() == '-') { // + or - more terms
        char c = get();
        if (c == '+') {
            e = e + term();
        } else {
            e = e - term();
        }
    }
    return e;
}

double statement(void) { // read a statement
    double ret;
    char c = peek();
    if (c >= 'A' && c <= 'Z') { // variable ?
        expect(c);
        if (peek() == '=') { // assignment ?
            expect('=');
            double val = expression();
            vars[c - 'A'] = val;
            ret = val;
        } else {
            ungetc(c, stdin);
            ret = expression();
        }
    } else {
        ret = expression();
    }
    expect('\n');
    return ret;
}

int main(void) {
    printf("> "); fflush(stdout);

    for (;;) {
        double v = statement();
        printf(" = %lf\n> ", v); fflush(stdout);
    }
    return EXIT_SUCCESS;
}

This is an simple recursive descend parser for basic mathematical expressions supporting one letter variables. Running it and typing some statements yields the following results:

这是一个简单的递归下降解析器,用于支持一个字母变量的基本数学表达式。运行它并输入一些语句会得到以下结果:

> (1+2)*3
 = 9.000000
> A=1
 = 1.000000
> B=2
 = 2.000000
> C=3
 = 3.000000
> (A+B)*C
 = 9.000000

You can alter the get(), peek() and number() to read from a file or list of code lines. Also you should make a function to read identifiers (basically words). Then you expand the statement() function to be able to alter which line it runs next in order to do branching. Last you add the branch operations you want to the statement function, like

可以将get()、peek()和number()修改为从文件或代码行列表中读取。还应该创建一个函数来读取标识符(基本上是单词)。然后展开statement()函数,以便能够更改接下来运行的行,以便进行分支。最后,将希望的分支操作添加到语句函数中,例如

if "condition" then 
    "statements" 
else 
    "statements" 
endif. 

while "condition" do
    "statements"
endwhile

function fac(x)
   if x = 0 then
      return 1
   else
      return x*fac(x-1) 
   endif
endfunction

Obviously you can decide the syntax to be as you like. You need to think about ways of define functions and how to handle arguments/parameter variables, local variables and global variables. If preferable arrays and data structures. References∕pointers. Input/output? In order to handle recursive function calls you probably need to use a stack.

显然,您可以根据自己的喜好来决定语法。您需要考虑定义函数的方法,以及如何处理参数/参数变量、局部变量和全局变量。如果更好的数组和数据结构。∕指针的引用。输入/输出?为了处理递归函数调用,您可能需要使用堆栈。

In my opinion this would be easier to do all this with C++ and STL. Where for example one std::map could be used to hold local variables, and another map could be used for globals...

在我看来,用c++和STL做这些事情要容易得多。例如,一个std::map可以用于保存局部变量,另一个map可以用于全局变量……

It is of course possible to write a compiler that build an abstract syntax tree out of the code. Then travels this tree in order to produce either machine code or some kind of byte code which executed on a virtual machine (like Java and .Net). This gives better performance than naively parse line by line and executing them, but in my opinion that is NOT writing an interpreter. That is writing both a compiler and its targeted virtual machine.

当然,可以编写一个编译器,用代码构建一个抽象语法树。然后遍历这棵树,以生成机器代码或在虚拟机上执行的某种字节代码(如Java和. net)。这比一行行地解析和执行它们提供了更好的性能,但我认为这不是在编写解释器。它同时编写编译器和目标虚拟机。

If someone wants to learn to write an interpreter, they should try making the most basic simple and practical working interpreter.

如果有人想要学习写翻译,他们应该尝试做最基本的简单实用的工作翻译。