Finding the minimal elements (vertices) of a directed acyclic graph (DAG) using XSLT/XPath?

Time: 2023-02-02 15:02:30

I have an XML file that encodes a directed acyclic graph (DAG) that represents a partial order. Such graphs are useful for things like specifying dependencies and finding critical paths. For the curious, my current application is to specify component dependencies for a build system, so vertices are components and edges specify compile-time dependencies. Here is a simple example:

<?xml version="1.0"?>
<dag>
    <vertex name="A">
        <directed-edge-to vertex="C"/>
    </vertex>
    <vertex name="B">
        <directed-edge-to vertex="C"/>
        <directed-edge-to vertex="D"/>
    </vertex>
    <vertex name="C">
        <directed-edge-to vertex="E"/>
    </vertex>
    <vertex name="D">
        <directed-edge-to vertex="E"/>
    </vertex>
    <vertex name="E">
        <directed-edge-to vertex="G"/>
    </vertex>
    <vertex name="F">
        <directed-edge-to vertex="G"/>
    </vertex>
    <vertex name="G"/>
</dag>

This DAG may be drawn like this:

http://iparelan.com/dag.png

I'd like to apply an XSLT stylesheet that produces another XML document that contains only the vertices that correspond to minimal elements of the partial order, that is, the vertices that have no incoming edges. The set of minimal vertices for the example graph is {A, B, F}. For my build dependency application, finding this set is valuable because I know that if I build the members of this set, then everything in my project will be built.

Here is my current stylesheet solution (I'm running this with Xalan on Java using Apache Ant's xslt task). A key observation is that a minimal vertex will not be referenced in any directed-edge-to element:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xalan="http://xml.apache.org/xslt"
                exclude-result-prefixes="xalan">
    <xsl:output method="xml" indent="yes" xalan:indent-amount="4"/>

    <xsl:template match="dag">
        <minimal-vertices>
            <xsl:for-each select="//vertex">
                <xsl:if test="not(//vertex/directed-edge-to[@vertex=current()/@name])">
                    <minimal-vertex name="{@name}"/>
                </xsl:if>
            </xsl:for-each>
        </minimal-vertices>
    </xsl:template>
</xsl:stylesheet>

Applying this stylesheet produces the following output (which I believe is correct):

<?xml version="1.0" encoding="UTF-8"?>
<minimal-vertices>
    <minimal-vertex name="A"/>
    <minimal-vertex name="B"/>
    <minimal-vertex name="F"/>
</minimal-vertices>

The thing is, I'm not completely satisfied with this solution. I'm wondering if there is a way to combine the select of the for-each and the test of the if with XPath syntax.

I want to write something like:

<xsl:for-each select="//vertex[not(//vertex/directed-edge-to[@vertex=current()/@name])]">

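But that does not do what I want, because inside the predicate the current() function still refers to the XSLT current node (the node being processed by the enclosing template or for-each), not to the vertex being tested by the outer //vertex expression.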

Thus far, my solution uses XPath 1.0 and XSLT 1.0 syntax, though I'm open to XPath 2.0 and XSLT 2.0 syntax as well.

Here's the Ant build script if you like:

<?xml version="1.0"?>
<project name="minimal-dag" default="default">
    <target name="default">
        <xslt in="dag.xml" out="minimal-vertices.xml" style="find-minimal-vertices.xsl"/>
    </target>
    <target name="dot">
        <xslt in="dag.xml" out="dag.dot" style="xml-to-dot.xsl"/>
    </target>
</project>

The dot target generates Graphviz Dot language code for rendering the graph. Here is xml-to-dot.xsl:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xalan="http://xml.apache.org/xslt"
                exclude-result-prefixes="xalan">
    <xsl:output method="text"/>

    <xsl:template match="dag">
        digraph {
        rankdir="BT";
        node [style="filled", fillcolor="cyan", fontname="Helvetica"];
        <xsl:apply-templates select="//directed-edge-to"/>
        }
    </xsl:template>

    <xsl:template match="directed-edge-to">
        <xsl:value-of select="concat(ancestor::vertex/@name, '->', @vertex, ';')"/>
    </xsl:template>
</xsl:stylesheet>

2 Answers

#1


You can take advantage of XPath's implicit existential quantification on the = operator:

<xsl:for-each select="//vertex[not(@name = //vertex/directed-edge-to/@vertex)]">

When you use any of the six comparison operators (=, !=, <, <=, >, and >=) with a node-set as an operand, the expression returns true if any node in the node-set satisfies the condition. When comparing one node-set with another, the expression returns true if any node in the first node-set satisfies the condition when compared with any node in the second node-set. XPath 2.0 introduces six value comparison operators that don't perform this existential quantification (eq, ne, lt, le, gt, and ge). But in your case, you'll want to use "=" to get that existential quantification.

Note, of course, that you'll still want to use the not() function as you were doing. Most of the time, it's good to avoid the != operator. If you used it here instead of not(), then it would return true if there are any @vertex attributes that are not equal to the @name value, which is not your intention. (And if either node-set is empty, then it would return false, because a comparison between an empty node-set and another node-set, a string, or a number always returns false.)
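
To make the != pitfall concrete, here is a small XSLT 1.0 sketch that runs both tests side by side. It assumes the sample dag.xml from the question; the comparison and not-equal-match element names are made up for illustration only.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <comparison>
      <!-- Intended test: no edge anywhere points at this vertex -->
      <xsl:for-each select="//vertex[not(@name = //directed-edge-to/@vertex)]">
        <minimal-vertex name="{@name}"/>
      </xsl:for-each>
      <!-- Pitfall: true as soon as ANY edge targets a differently named vertex -->
      <xsl:for-each select="//vertex[@name != //directed-edge-to/@vertex]">
        <not-equal-match name="{@name}"/>
      </xsl:for-each>
    </comparison>
  </xsl:template>
</xsl:stylesheet>
```

On the sample graph the first loop should select A, B, and F, while the second should select all seven vertices, because every vertex name differs from at least one edge target.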

If you want to use eq instead, then you'd have to do something like you did: separate out the conditional from the iteration so you could bind current(). But in XPath 2.0, you can do this within an expression:

<xsl:for-each select="for $v in //vertex
                      return $v[not(//directed-edge-to[@vertex eq $v/@name])]">

This is useful when your condition isn't a simple equality comparison (and thus can't be existentially quantified using "="). For example: starts-with(@vertex, $v/@name).
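
As a sketch of that starts-with() case, the for expression could be dropped into a complete XSLT 2.0 stylesheet like the one below. The prefix-matching rule is a hypothetical requirement chosen only to show a non-equality condition, and running it needs an XSLT 2.0 processor such as Saxon, since Xalan implements XSLT 1.0 only.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <minimal-vertices>
      <!-- $v names the vertex under test; inside the inner predicate the
           context item is a directed-edge-to element, so @vertex is the
           edge target and $v/@name is the candidate vertex name -->
      <xsl:for-each select="for $v in //vertex
                            return $v[not(//directed-edge-to[starts-with(@vertex, $v/@name)])]">
        <minimal-vertex name="{@name}"/>
      </xsl:for-each>
    </minimal-vertices>
  </xsl:template>
</xsl:stylesheet>
```

With the single-letter names in the sample graph, a prefix match behaves the same as an equality match, so the selected vertices should again be A, B, and F.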

XPath 2.0 also has an explicit way of performing existential quantification. Instead of the for expression above, we could have written this:

<xsl:for-each select="//vertex[not(some $e in //directed-edge-to
                                   satisfies @name eq $e/@vertex)]">

In addition to the "some" syntax, XPath 2.0 also supplies a corresponding "every" syntax for performing universal quantification.
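
For comparison, here is a sketch of the same selection phrased with "every" (XSLT 2.0, so again assuming a processor such as Saxon). A vertex is minimal when every edge in the document points somewhere else.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <minimal-vertices>
      <!-- The quantified expression does not change the context item,
           so @name still refers to the vertex being filtered -->
      <xsl:for-each select="//vertex[every $e in //directed-edge-to
                                     satisfies @name ne $e/@vertex]">
        <minimal-vertex name="{@name}"/>
      </xsl:for-each>
    </minimal-vertices>
  </xsl:template>
</xsl:stylesheet>
```

This is the logical dual of the not(some ...) form. If the document contains no edges at all, "every" is trivially true and all vertices are reported as minimal.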

Rather than using for-each, you could also use template rules, which are more modular (and powerful):

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <minimal-vertices>
      <xsl:apply-templates/>
    </minimal-vertices>
  </xsl:template>

  <!-- Copy vertex elements that have no arrows pointing to them -->
  <xsl:template match="vertex[not(@name = //directed-edge-to/@vertex)]">
    <minimal-vertex name="{@name}"/>
  </xsl:template>

</xsl:stylesheet>

Again, in this case, we're relying on the existential quantification of =.

XSLT 1.0 prohibits use of the current() function in patterns, i.e., in the match attribute, but XSLT 2.0 allows it. In that case, current() refers to the node currently being matched. So in XSLT 2.0, we could also write this (without having to use a for expression):

<xsl:template match="vertex[not(//directed-edge-to[@vertex eq current()/@name])]">

Note that this pattern is essentially the same as the expression you tried to use in for-each, but whereas it doesn't do what you want in for-each, it does do what you want in the pattern (because what current() binds to is different).
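
To see why the for-each version misbehaves, here is a sketch of it embedded in a template like the one in the question (XSLT 1.0, assuming the sample dag.xml).

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="dag">
    <minimal-vertices>
      <!-- While this select is evaluated, current() is the dag element
           matched above. It has no name attribute, so current()/@name is
           an empty node-set, the inner predicate never holds, and
           not(...) is true for every vertex. -->
      <xsl:for-each select="//vertex[not(//directed-edge-to[@vertex = current()/@name])]">
        <minimal-vertex name="{@name}"/>
      </xsl:for-each>
    </minimal-vertices>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample graph, this should list all seven vertices rather than just A, B, and F.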

Finally, I'll add one more variation that in some ways simplifies the logic (removing not()). This also goes back to using XSLT 1.0:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <minimal-vertices>
      <xsl:apply-templates/>
    </minimal-vertices>
  </xsl:template>

  <!-- By default, copy vertex elements -->
  <xsl:template match="vertex">
    <minimal-vertex name="{@name}"/>
  </xsl:template>

  <!-- But strip out vertices with incoming arrows -->
  <xsl:template match="vertex[@name = //directed-edge-to/@vertex]"/>

</xsl:stylesheet>

If you don't like the whitespace being output, add an empty rule for text nodes, so they'll get stripped out (overriding the default rule for text nodes, which is to copy them):

<xsl:template match="text()"/>

Or you could just be more selective in what nodes you apply templates to:

<xsl:apply-templates select="/dag/vertex"/>
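
Put together, the selective variant might look like the following sketch (XSLT 1.0). The xsl:output settings are an assumption added for readable output.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <minimal-vertices>
      <!-- Visit only vertex elements, so whitespace text nodes are never processed -->
      <xsl:apply-templates select="/dag/vertex"/>
    </minimal-vertices>
  </xsl:template>

  <!-- By default, copy vertex elements -->
  <xsl:template match="vertex">
    <minimal-vertex name="{@name}"/>
  </xsl:template>

  <!-- The predicate gives this rule a higher default priority,
       so vertices with incoming arrows are dropped -->
  <xsl:template match="vertex[@name = //directed-edge-to/@vertex]"/>

</xsl:stylesheet>
```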

Which approach you take is partially dependent on taste, partially dependent on the wider context of your stylesheet and expected data (how much the input structure might vary, etc.).
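
If the expected data is a large graph, one further option is to index the edges with xsl:key so each vertex is checked by lookup. This is not one of the approaches above, just a sketch of a standard XSLT 1.0 technique; the key name incoming-edges is made up for illustration.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every edge by the name of the vertex it points to -->
  <xsl:key name="incoming-edges" match="directed-edge-to" use="@vertex"/>

  <xsl:template match="/">
    <minimal-vertices>
      <!-- key() returns the edges that target this vertex;
           an empty result means the vertex has no incoming edges -->
      <xsl:for-each select="/dag/vertex[not(key('incoming-edges', @name))]">
        <minimal-vertex name="{@name}"/>
      </xsl:for-each>
    </minimal-vertices>
  </xsl:template>
</xsl:stylesheet>
```

Whether this is actually faster depends on the processor, since some already optimize the plain "=" comparison.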

I know I went way beyond what you were asking for, but I hope you at least found this interesting. :-)

#2


One such XPath 1.0 expression is:

        /*/vertex[not(@name = /*/vertex/directed-edge-to/@vertex)]

Then just put it into an XSLT stylesheet like this:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/">
      <minimal-vertices>
          <xsl:for-each select="/*/vertex[not(@name = /*/vertex/directed-edge-to/@vertex)]">
           <minimal-vertex name="{@name}"/>
          </xsl:for-each>
      </minimal-vertices>
    </xsl:template>
</xsl:stylesheet>

When this stylesheet is applied to the originally provided XML document:

<dag>
    <vertex name="A">
        <directed-edge-to vertex="C"/>
    </vertex>
    <vertex name="B">
        <directed-edge-to vertex="C"/>
        <directed-edge-to vertex="D"/>
    </vertex>
    <vertex name="C">
        <directed-edge-to vertex="E"/>
    </vertex>
    <vertex name="D">
        <directed-edge-to vertex="E"/>
    </vertex>
    <vertex name="E">
        <directed-edge-to vertex="G"/>
    </vertex>
    <vertex name="F">
        <directed-edge-to vertex="G"/>
    </vertex>
    <vertex name="G"/>
</dag>

The wanted result is produced:

<minimal-vertices>
  <minimal-vertex name="A" />
  <minimal-vertex name="B" />
  <minimal-vertex name="F" />
</minimal-vertices>

Do note: A solution for traversing full (maybe cyclic) graphs is available in XSLT here.
