Is there a good Python library that can parse C++?

Time: 2022-11-26 15:48:08

Google didn't turn up anything that seemed relevant.

I have a bunch of existing, working C++ code, and I'd like to use python to crawl through it and figure out relationships between classes, etc.

EDIT: Just wanted to point out: I don't think I need or want to parse every bit of C++; I just need something smart enough to pick up on class, function and member variable declarations, and to skip over function definitions.

13 answers

#1


30  

C++ is notoriously hard to parse. Most people who try to do this properly end up taking apart a compiler. In fact this is (in part) why Clang, LLVM's C++ front end, started: Apple needed a way to parse C++ for use in Xcode that matched the way the compiler parsed it.

That's why there are projects like GCC_XML which you could combine with a python xml library.

Some non-compiler projects that seem to do a pretty good job at parsing C++ are:

  • Eclipse CDT
  • OpenGrok
  • Doxygen

#2


43  

Not an answer as such, but just to demonstrate how hard parsing C++ correctly actually is. My favorite demo:

template<bool> struct a_t;

template<> struct a_t<true> {
    template<int> struct b {};
};

template<> struct a_t<false> {
    enum { b };
};

typedef a_t<sizeof(void*)==sizeof(int)> a;

enum { c, d };
int main() {
    a::b<c>d; // declaration or expression?
}

This is perfectly valid, standard-compliant C++, but the exact meaning of the commented line depends on your implementation. If sizeof(void*)==sizeof(int) (typical on 32-bit platforms), it is a declaration of a local variable d of type a::b<c>. If the condition doesn't hold, it is a no-op expression ((a::b < c) > d). Adding a constructor for a::b will actually let you expose the difference via the presence or absence of side effects.

#3


5  

For many years I've been using pygccxml, which is a very nice Python wrapper around GCC-XML. It's a very full featured package that forms the basis of some well used code-generation tools out there such as py++ which is from the same author.

#4


5  

You won't find a drop-in Python library to do this. Parsing C++ is fiddly, and few parsers have been written that aren't part of a compiler. You can find a good summary of the issues here.

The best bet might be clang, as its C++ support is well-established. Though this is not a Python solution, it sounds as though it would be amenable to re-use within a Python wrapper, given the emphasis on encapsulation and good design in its development.

#5


4  

pycparser is a complete and functional parser for ANSI C. Perhaps you can extend it to C++ :-)

#6


4  

If you've formatted your comments in a compatible way, Doxygen does a fantastic job. It'll even draw inheritance diagrams if you've got graphviz installed.

For example, running Doxygen over the following:

/// <summary>
/// A summary of my class
/// </summary>
class MyClass
{
protected:
    int m_numOfWidgets; ///< Keeps track of the number of widgets stored

public:
    /// <summary>
    /// Constructor for the class.
    /// </summary>
    /// <param name="numOfWidgets">Specifies how many widgets to start with</param>
    MyClass(int numOfWidgets)
    {
        m_numOfWidgets = numOfWidgets;
    }

    /// <summary>
    /// Increments the number of widgets stored by the amount supplied.
    /// </summary>
    /// <param name="numOfWidgetsToAdd">Specifies how many widgets to add</param>
    /// <returns>The number of widgets stored</returns>
    int IncreaseWidgets(int numOfWidgetsToAdd)
    {
        m_numOfWidgets += numOfWidgetsToAdd;
        return m_numOfWidgets;
    }
};

Will turn all those comments into entries in .html files. With more complicated designs, the result is even more beneficial - often much easier than trying to browse through the source.
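
If the goal is to process the results from Python rather than read them in a browser, Doxygen can also write XML (GENERATE_XML = YES in the Doxyfile). The sketch below reads a class list from an index file with the standard-library xml.etree.ElementTree. The embedded sample is hand-written and simplified, and the refid values are made up; treat the element and attribute names as assumptions to check against the index.xml your Doxygen version actually produces.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of Doxygen's xml/index.xml.
# The refid values are invented for illustration.
SAMPLE_INDEX = """\
<doxygenindex>
  <compound refid="classMyClass" kind="class"><name>MyClass</name>
    <member refid="classMyClass_1a01" kind="function"><name>MyClass</name></member>
    <member refid="classMyClass_1a02" kind="function"><name>IncreaseWidgets</name></member>
    <member refid="classMyClass_1a03" kind="variable"><name>m_numOfWidgets</name></member>
  </compound>
  <compound refid="widgets_8h" kind="file"><name>widgets.h</name></compound>
</doxygenindex>
"""


def list_class_members(index_xml):
    """Map each class or struct name to a list of (kind, member name) pairs."""
    root = ET.fromstring(index_xml)
    classes = {}
    for compound in root.findall("compound"):
        # Skip files, namespaces and other non-class compounds.
        if compound.get("kind") not in ("class", "struct"):
            continue
        members = [
            (member.get("kind"), member.findtext("name"))
            for member in compound.findall("member")
        ]
        classes[compound.findtext("name")] = members
    return classes


if __name__ == "__main__":
    for name, members in list_class_members(SAMPLE_INDEX).items():
        print(name, members)
```

For a real project you would pass the contents of the generated index file instead of SAMPLE_INDEX.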

#7


1  

This page shows a C++ grammar written in Antlr, and you can generate Python code from it.

There also seems to be someone who was working on a C++ parser in pyparsing, but I was not able to find out who or its current status.

#8


1  

There is no (free) good library to parse C++ in any language.
Your best choices are probably Dehydra g++ plugin, clang, or Elsa.

#9


0  

The pyparsing wiki shows this example - all it does is parse struct declarations, so this might give you just a glimpse at the magnitude of the problem.

I suggest you (or even better, your employer) shell out $200 and buy Enterprise Architect from Sparx Systems. This software is amazingly powerful for the price, and includes pretty good code reverse engineering features. You will spend far more than this in your own time to get only about 2% of the job done. In this case, "buy" wins over "make".

#10


0  

The ctypes code generator (ctypeslib) uses gcc-xml. It's possible that cpptypes does also. Even if it doesn't, you could use gcc-xml to generate XML from your C++ file, then parse the XML with one of the built-in or third-party Python XML parsers.
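
The second step might look roughly like the sketch below, which uses the standard-library xml.etree.ElementTree. The embedded XML is a hand-written, simplified sample in the style of gcc-xml output (Class elements carrying id, name, bases and members attributes); treat the exact element and attribute names as assumptions and check them against a real dump before relying on this.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of gcc-xml output. A real dump is far
# larger, and its attributes may differ between versions.
SAMPLE_XML = """\
<GCC_XML>
  <Class id="_1" name="Shape" members="_4"/>
  <Class id="_2" name="Circle" bases="_1" members="_5 _6"/>
  <Class id="_3" name="Square" bases="_1" members=""/>
  <Method id="_4" name="area" context="_1"/>
  <Method id="_5" name="area" context="_2"/>
  <Field id="_6" name="radius" context="_2"/>
</GCC_XML>
"""


def class_relationships(xml_text):
    """Map each class name to its base-class names and member names."""
    root = ET.fromstring(xml_text)
    # Every top-level element has an id that other elements refer to.
    by_id = {node.get("id"): node for node in root}

    def names(refs):
        # Resolve a space-separated list of ids, ignoring unknown ones.
        return [by_id[ref].get("name") for ref in refs.split() if ref in by_id]

    result = {}
    for node in root.findall("Class"):
        result[node.get("name")] = {
            "bases": names(node.get("bases", "")),
            "members": names(node.get("members", "")),
        }
    return result


if __name__ == "__main__":
    for name, info in class_relationships(SAMPLE_XML).items():
        print(name, info)
```

Resolving ids through one dictionary keeps the lookup simple, since this style of output describes relationships by id reference rather than by nesting.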

#11


0  

Here's a SourceForge project that claims to parse C++ headers. As the other commenters have pointed out, there's no general solution, but this sounds like it will do enough for your needs. (I just ran across it for a similar need and haven't tried it myself yet.)

http://sourceforge.net/projects/cppheaderparser/
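
To give a feel for what a "good enough" header scanner involves, here is a rough standard-library sketch that finds class and struct definitions and their base classes. It is not how the project above works; the function name find_classes and the sample header are made up for illustration. It strips comments and string literals and then applies one regular expression, so templates, macros, enum class, final and nested declarations will all confuse it.

```python
import re

# Comments and double-quoted string literals, removed before scanning so
# that text such as "// class Fake {" is not mistaken for a declaration.
COMMENT_OR_STRING = re.compile(
    r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"',
    re.DOTALL,
)
# "class Name {" or "struct Name : bases {". Forward declarations end in
# ";" rather than "{", so they do not match.
CLASS_HEAD = re.compile(
    r'\b(?:class|struct)\s+(\w+)\s*(?::\s*([^{;]+?))?\s*\{'
)
SPECIFIERS = {"public", "protected", "private", "virtual"}


def find_classes(source):
    """Return {class name: [base class names]} for class/struct definitions."""
    cleaned = COMMENT_OR_STRING.sub(" ", source)
    classes = {}
    for name, bases in CLASS_HEAD.findall(cleaned):
        base_names = []
        for base in bases.split(","):
            # Drop access and virtual specifiers, keep the type name.
            words = [w for w in base.split() if w not in SPECIFIERS]
            if words:
                base_names.append(" ".join(words))
        classes[name] = base_names
    return classes


SAMPLE_HEADER = '''
// class Fake : public Nothing {
class Widget {
public:
    int count() const { return n_; }
private:
    int n_;
};
struct Button : public Widget, private Clickable {
    void press();
};
class Forward;
'''

if __name__ == "__main__":
    print(find_classes(SAMPLE_HEADER))
```

Anything beyond this quickly runs into the problems described in the other answers, which is the argument for using a real front end when accuracy matters.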

#12


0  

The Clang project provides libraries for just parsing C++ code.

With either Clang or GCC you can generate an XML representation of the code.

If you prefer a more Pythonic solution, you could also search for a C++ yacc grammar and use PLY (Yacc for Python), but that seems to be the solution that needs the most work.

#13


0  

I would keep an eye on the gcc.gnu.org/wiki/plugins as it seems like plugins are the way to go. Also the gcc-python-plugin seems like it has a nice implementation.
