如何在DAG中找到通过一组给定节点的所有路径?

时间:2022-10-31 17:56:28

I have a list of items (blue nodes below) which are categorized by the users of my application. The categories themselves can be grouped and categorized themselves.

我有一个项目列表(下面的蓝色节点),由我的应用程序的用户分类。可以对类别本身进行分组和分类。

The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, a lot is going to be user defined and might be very messy.

得到的结构可以表示为有向无环图(DAG),其中项目是图形拓扑底部的汇点,顶部类别是源。请注意,虽然某些类别可能定义得很好,但很多都是用户定义的,可能非常混乱。

Example:

example data http://theuprightape.net/dag.png

示例数据http://theuprightape.net/dag.png

On that structure, I want to perform the following operations:

在该结构上,我想执行以下操作:

  • find all items (sinks) below a particular node (all items in Europe)
  • 查找特定节点下方的所有项目(汇点)(欧洲所有项目)

  • find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com)
  • 查找通过所有n个节点集的所有路径(如果有)(通过SMTP从example.com发送的所有项目)

  • find all nodes that lie below all of a set of nodes (intersection: goyish brown foods)
  • 找到位于所有节点下方的所有节点(交叉点:goyish brown foods)

The first seems quite straightforward: start at the node, follow all possible paths to the bottom and collect the items there. However, is there a faster approach? Remembering the nodes I already passed through probably helps avoiding unnecessary repetition, but are there more optimizations?

第一个似乎很简单:从节点开始,按照所有可能的路径到底部并收集那里的项目。但是,有更快的方法吗?记住我已经通过的节点可能有助于避免不必要的重复,但还有更多的优化吗?

How do I go about the second one? It seems that the first step would be to determine the height of each node in the set, as to determine at which one(s) to start and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach?

我怎么去第二个呢?似乎第一步是确定集合中每个节点的高度,以确定在哪个(s)开始,然后找到包括集合其余部分的所有路径。但这是最好的(甚至是好的)方法吗?

The graph traversal algorithms listed at Wikipedia all seem to be concerned with either finding a particular node or the shortest or otherwise most effective route between two nodes. I think both is not what I want, or did I just fail to see how this applies to my problem? Where else should I read?

*上列出的图遍历算法似乎都关注于找到特定节点或两个节点之间的最短路径或其他最有效路由。我认为两者都不是我想要的,或者我只是没有看到这对我的问题有何影响?我还应该在哪里阅读?

2 个解决方案

#1


2  

It seems to me that its essentially the same operation for all 3 questions. You're always asking "Find all X below node(s) Y, where X is of type Z". All you need is a generic mechanism for 'locate all nodes below node', (solves Q3) and then you can filter the results for 'nodetype=sink' (solves Q1). For Q2, you have the starting-point (your node set) and your ending point (any sink below the starting point) so your solution set is all paths from starting node specified to the sink. So I would suggest that what you basically have a is a tree, and basic tree-traversal algorithms would be the way to go.

在我看来,它对所有3个问题的操作基本相同。你总是问“在节点Y下找到所有X,其中X是Z型”。您只需要一个'定位节点下所有节点'的通用机制,(解决Q3),然后您可以过滤'nodetype = sink'的结果(解决Q1)。对于Q2,您有起始点(您的节点集)和结束点(起点下面的任何接收器),因此您的解决方案集是从指定的起始节点到接收器的所有路径。所以我建议你基本上拥有的是一棵树,基本的树遍历算法将是你要走的路。

#2


1  

Despite the fact that your graph is acyclic, the operations you cite remind me of similar aspects of control flow graph analysis. There is a rich set of algorithms based on dominance that may be applicable. For example, your third operation reminds me od computing dominance frontiers; I believe that algorithm would work directly if you temporarily introduce "entry" and "exit" nodes. The entry node connects the "given set of nodes" and the exit nodes connects the sinks.

尽管您的图表是非循环的,但您引用的操作让我想起了控制流图分析的类似方面。有一组丰富的基于优势的算法可能适用。例如,你的第三个操作提醒我计算优势边界;我相信如果你暂时引入“入口”和“退出”节点,算法将直接工作。入口节点连接“给定节点集”,出口节点连接接收器。

Also see Robert Tarjan's basic algorithms.

另请参阅Robert Tarjan的基本算法。

#1


2  

It seems to me that its essentially the same operation for all 3 questions. You're always asking "Find all X below node(s) Y, where X is of type Z". All you need is a generic mechanism for 'locate all nodes below node', (solves Q3) and then you can filter the results for 'nodetype=sink' (solves Q1). For Q2, you have the starting-point (your node set) and your ending point (any sink below the starting point) so your solution set is all paths from starting node specified to the sink. So I would suggest that what you basically have a is a tree, and basic tree-traversal algorithms would be the way to go.

在我看来,它对所有3个问题的操作基本相同。你总是问“在节点Y下找到所有X,其中X是Z型”。您只需要一个'定位节点下所有节点'的通用机制,(解决Q3),然后您可以过滤'nodetype = sink'的结果(解决Q1)。对于Q2,您有起始点(您的节点集)和结束点(起点下面的任何接收器),因此您的解决方案集是从指定的起始节点到接收器的所有路径。所以我建议你基本上拥有的是一棵树,基本的树遍历算法将是你要走的路。

#2


1  

Despite the fact that your graph is acyclic, the operations you cite remind me of similar aspects of control flow graph analysis. There is a rich set of algorithms based on dominance that may be applicable. For example, your third operation reminds me od computing dominance frontiers; I believe that algorithm would work directly if you temporarily introduce "entry" and "exit" nodes. The entry node connects the "given set of nodes" and the exit nodes connects the sinks.

尽管您的图表是非循环的,但您引用的操作让我想起了控制流图分析的类似方面。有一组丰富的基于优势的算法可能适用。例如,你的第三个操作提醒我计算优势边界;我相信如果你暂时引入“入口”和“退出”节点,算法将直接工作。入口节点连接“给定节点集”,出口节点连接接收器。

Also see Robert Tarjan's basic algorithms.

另请参阅Robert Tarjan的基本算法。