What is the most efficient graph data structure in Python?

I need to be able to manipulate a large graph (10^7 nodes) in Python. The data attached to each node/edge is minimal, say, a small number of strings. What is the most efficient way of doing this, in terms of both memory and speed?

A dict of dicts is more flexible and simpler to implement, but I intuitively expect a list of lists to be faster. The list option would also require that I keep the data separate from the structure, while dicts would allow for something of the sort:

graph[I][J]["Property"] = "value"
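
A minimal sketch of the two layouts being compared, on a hypothetical three-node graph. The helper names `add_edge_dd` and `add_edge_ll` are made up for illustration. The list-of-lists version assumes nodes are numbered 0..n-1 so that a node id can be used directly as a list index; the edge data then has to live in a separate structure (here a dict keyed by `(i, j)`).

```python
# Two ways to store the same small directed graph (hypothetical data).

# 1. Dict of dicts: structure and edge data live together.
graph = {}

def add_edge_dd(g, i, j, **props):
    # setdefault creates the inner dict the first time node i is seen
    g.setdefault(i, {})[j] = dict(props)

add_edge_dd(graph, 0, 1)
graph[0][1]["Property"] = "value"      # the access pattern from the question

# 2. List of lists: nodes are integers 0..n-1 (an assumption of this sketch),
#    and the edge data is kept apart from the structure.
n = 3
adj = [[] for _ in range(n)]           # adj[i] holds the neighbours of node i
edge_props = {}                        # (i, j) -> property dict

def add_edge_ll(i, j, **props):
    adj[i].append(j)
    edge_props[(i, j)] = dict(props)

add_edge_ll(0, 1, Property="value")

print(graph[0][1]["Property"])         # value
print(adj[0], edge_props[(0, 1)])      # [1] {'Property': 'value'}
```

The dict-of-dicts version supports the `graph[I][J]["Property"]` style directly; the list version trades that convenience for plain integer indexing and a separate lookup for the data.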

What would you suggest?

Yes, I should have been a bit clearer about what I mean by efficiency. In this particular case I mean it in terms of random-access retrieval.

Loading the data into memory isn't a big problem; that's done once and for all. The time-consuming part is visiting the nodes to extract the information and measure the metrics I'm interested in.
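
Since the cost is dominated by repeated visits rather than by loading, a traversal over a list-of-lists adjacency might look like the sketch below. It assumes integer node ids 0..n-1 and uses breadth-first hop distance and node degree as stand-ins for "the metrics I'm interested in"; the graph and the function name `bfs_distances` are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from `source` to every node.

    `adj` is a list of lists: adj[i] holds the integer neighbours of node i.
    Nodes not reachable from `source` keep distance -1.
    """
    dist = [-1] * len(adj)        # flat list: O(1) random access by node id
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:     # first time v is reached
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical undirected 5-node graph: 0-1, 1-2, 2-3; node 4 is isolated.
adj = [[1], [0, 2], [1, 3], [2], []]

degrees = [len(nbrs) for nbrs in adj]   # degree of each node
print(degrees)                          # [1, 2, 2, 1, 0]
print(bfs_distances(adj, 0))            # [0, 1, 2, 3, -1]
```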

I hadn't considered making each node a class (the properties are the same for all nodes), but it seems like that would add an extra layer of overhead? I was hoping someone would have direct experience with a similar case that they could share. After all, graphs are one of the most common abstractions in CS.
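
On the overhead question: a plain Python class stores each instance's attributes in a per-object `__dict__`, but declaring `__slots__` removes that dictionary, which suits the case where every node has the same fields. A minimal sketch with hypothetical field names `name`, `label` and `neighbours`; whether it ends up smaller or faster than a dict of dicts at 10^7 nodes would need to be measured.

```python
class Node:
    # __slots__ replaces the per-instance __dict__ with fixed attribute slots,
    # reducing per-object memory when every node has the same fields.
    __slots__ = ("name", "label", "neighbours")

    def __init__(self, name, label):
        self.name = name
        self.label = label
        self.neighbours = []          # list of adjacent Node objects

a = Node("a", "x")
b = Node("b", "y")
a.neighbours.append(b)

print([x.name for x in a.neighbours])   # ['b']
print(hasattr(a, "__dict__"))           # False

try:
    a.colour = "red"                    # not declared in __slots__
except AttributeError:
    print("no such slot")               # no such slot
```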

7 answers

#1

I would strongly advocate that you look at NetworkX. It's a battle-tested war horse and the first tool most "research" types reach for when they need to analyze network-based data. I have manipulated graphs with hundreds of thousands of edges on a notebook without problems. It's feature-rich and very easy to use, so you will find yourself focusing on the problem at hand rather than on the details of the underlying implementation.
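
The NetworkX documentation describes its basic data structure as a "dictionary of dictionaries of dictionaries", so it offers essentially the `graph[I][J]["Property"]` access pattern from the question. A minimal sketch, assuming NetworkX 2.x or later is installed (`pip install networkx`); the node ids and attribute values are illustrative.

```python
import networkx as nx

G = nx.Graph()
G.add_edge(1, 2, Property="value")       # edge attributes passed as keywords
G.nodes[1]["Property"] = "node value"    # node attributes via G.nodes[n]

print(G[1][2]["Property"])               # value
print(G[2][1]["Property"])               # value (undirected: same edge data)
print(G.nodes[1]["Property"])            # node value
print(list(G.neighbors(1)))              # [2]
```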

Example of Erdős-Rényi random graph generation and analysis
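
A sketch of what such an example might look like (not the answer's original code), assuming NetworkX is installed; `n`, `p` and the seed are arbitrary illustrative values. For sparse graphs (small `p`) the NetworkX documentation points to `fast_gnp_random_graph` as the faster alternative to `erdos_renyi_graph`/`gnp_random_graph`, which run in O(n^2) time.

```python
import networkx as nx

# Arbitrary illustrative parameters: 1000 nodes, edge probability 0.01.
n, p = 1000, 0.01

# G(n, p) model; a fixed seed makes the run reproducible.
G = nx.erdos_renyi_graph(n, p, seed=42)

degrees = [d for _, d in G.degree()]     # G.degree() yields (node, degree) pairs
mean_degree = sum(degrees) / n           # expected value is p * (n - 1), about 10

print(G.number_of_nodes())               # 1000
print(G.number_of_edges())
print(mean_degree)
print(nx.number_connected_components(G))
print(nx.average_clustering(G))          # for G(n, p) this is typically close to p
```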