Should I use XPath or DOM?

Time: 2022-10-03 15:25:14

I have a bunch of hierarchical data stored in an XML file, which I am wrapping behind hand-crafted classes using TinyXML. Given an XML fragment that describes a source signature as a set of (frequency, level) pairs, a bit like this:

<source>
  <sig><freq>1000</freq><level>100</level></sig>
  <sig><freq>1200</freq><level>110</level></sig>
</source>

I am extracting the pairs with this:

std::vector< std::pair<double, double> > signature() const
{
    std::vector< std::pair<double, double> > sig;
    for (const TiXmlElement* sig_el = node()->FirstChildElement ("sig");
        sig_el;
        sig_el = sig_el->NextSiblingElement("sig"))
    {
        const double level = boost::lexical_cast<double> (sig_el->FirstChildElement("level")->GetText());
        const double freq =  boost::lexical_cast<double> (sig_el->FirstChildElement("freq")->GetText());
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

where node() is pointing at the <source> node.
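
One thing the loop above does not guard against is a missing or empty child: in TinyXML, FirstChildElement() returns a null pointer when no matching element exists, and GetText() returns null when the element has no text child. The following is a minimal sketch of a defensive conversion, assuming C++17; the name parse_double is hypothetical and not part of TinyXML or Boost, and it uses std::strtod in place of boost::lexical_cast.

```cpp
#include <cstdlib>
#include <optional>

// Hypothetical helper (not part of TinyXML): converts element text to a double.
// `text` may be null, which is what TiXmlElement::GetText() returns for an
// element that has no text child.
std::optional<double> parse_double(const char* text)
{
    if (text == nullptr || *text == '\0')
        return std::nullopt;

    char* end = nullptr;
    const double value = std::strtod(text, &end);

    // Reject input where nothing was consumed or trailing characters remain.
    if (end == text || *end != '\0')
        return std::nullopt;

    return value;
}
```

In the loop, one would first check the result of FirstChildElement("level") and FirstChildElement("freq") for null, pass GetText() to parse_double, and skip or report the pair when either value is missing.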

Question: would I get a neater, more elegant, more maintainable or in any other way better piece of code using an XPath library instead?

Update: I have tried it using TinyXPath two ways. Neither of them actually works, which is obviously a big point against them. Am I doing something fundamentally wrong? If this is what it is going to look like with XPath, I don't think it is getting me anything.

std::vector< std::pair<double, double> > signature2() const
{
    std::vector< std::pair<double, double> > sig;
    TinyXPath::xpath_processor source_proc (node(), "sig");
    const unsigned n_nodes = source_proc.u_compute_xpath_node_set();
    for (unsigned i = 0; i != n_nodes; ++i)
    {
        TiXmlNode* s = source_proc.XNp_get_xpath_node (i);
        const double level = TinyXPath::xpath_processor(s, "level/text()").d_compute_xpath();
        const double freq =  TinyXPath::xpath_processor(s, "freq/text()").d_compute_xpath();
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

std::vector< std::pair<double, double> > signature3() const
{
    std::vector< std::pair<double, double> > sig;
    int i = 1;
    while (TiXmlNode* s = TinyXPath::xpath_processor (node(), 
        ("sig[" + boost::lexical_cast<std::string>(i++) + "]/*").c_str()).
        XNp_get_xpath_node(0))
    {
        const double level = TinyXPath::xpath_processor(s, "level/text()").d_compute_xpath();
        const double freq =  TinyXPath::xpath_processor(s, "freq/text()").d_compute_xpath();
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

As a secondary issue, if so, which XPath library should I be using?

4 Answers

#1


5  

In general I tend to prefer XPath based solutions for their concision and versatility but, honestly, in your case, I don't think using XPath will bring a lot to your signature.

Here is why:

Code elegance
Your code is nice and compact and it will not get any better with an XPath expression.

Memory footprint
Unless your input XML configuration file is huge (something of an oxymoron) and DOM parsing would entail a large memory footprint, I would stick with DOM. Even then, there is no proof that using XPath would be a decisive cure.

Execution Speed
On such a simple XML tree, execution speed should be comparable. If there is a difference, it would probably be to TinyXml's advantage, because the freq and level tags sit together under a given node.

Libraries and external references
That's the decisive point.
The leading XPath engine in the C++ world is XQilla. It supports XQuery (therefore both XPath 1.0 and 2.0) and is backed by Oracle because it's developed by the group responsible for Berkeley DB products (including Berkeley DB XML, which uses XQilla).
The problem for C++ developers wishing to use XQilla is that they have several alternatives:

  1. use Xerces 2 and XQilla 2.1, and litter your code with casts
  2. use XQilla 2.2+ with Xerces 3 (no casts needed here)
  3. use TinyXPath, which is nicely integrated with TinyXml but has a number of limitations (no support for namespaces, for instance)
  4. mix Xerces and TinyXml

In summary, in your case, switching to XPath just for the sake of it would bring little benefit, if any.

Yet, XPath is a very powerful tool in today's developer toolbox and no one can ignore it. If you just wish to practice on a simple example, yours is as good as any. Then, I'd keep in mind the points above and probably use TinyXPath anyway.

#2


3  

You need XPath if you need the flexibility to make runtime changes to the values extracted.
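
To make that flexibility concrete: because an XPath expression is just a string, it can live in a settings file and be changed without recompiling. The sketch below only shows that data-handling side and evaluates nothing; the function name load_expressions and the one-pair-per-line "name=expression" format are assumptions made for illustration, not part of any XPath library.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical loader: reads "name=expression" pairs, one per line.
// Only the first '=' separates name from expression, so the expression
// itself may contain '=' (for example inside a predicate).
std::map<std::string, std::string> load_expressions(std::istream& in)
{
    std::map<std::string, std::string> expressions;
    std::string line;
    while (std::getline(in, line))
    {
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos || eq == 0)
            continue; // skip lines that are not of the form name=expression
        expressions[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return expressions;
}
```

For a quick trial, a std::istringstream holding lines such as "freq=sig/freq" can stand in for the file; the loaded strings would then be handed to whichever XPath engine is in use.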

But you don't need XPath if you're unlikely to need this kind of flexibility, if recompiling to expand what you're extracting isn't a problem, if things are not being changed too often, or if users never need to update the expressions. The same goes if what you have works fine for you; there are lots of applications that don't use it.

As to whether it's more readable, well yes it sure can be. But if you're just pulling out a few values I'd question the need to pull in another library.

I would certainly document what you currently have a bit better, as those not familiar with TinyXML or XML libraries may not be sure what it's doing, but it's not hard to understand as it is.

I'm not sure what sort of overhead XPath adds, but I suspect it may add some. For most, I guess they won't notice any difference at all and it may not be a concern to you or most people, but be aware of it in case it's something you're concerned about.

If you do want to use an xpath library then all I can say is that I've used the one that came with Xerces-C++ and it wasn't too hard to learn. I have used TinyXML before and someone here has mentioned TinyXPath. I have no experience with it but it's available.

I also found this link useful when first learning about XPath expressions. http://www.w3schools.com/xpath/default.asp

#3


1  

XPath was made for this, so of course your code will be "better" if you use it.

I can't recommend a specific c++ XPath library, but even though using one will be the correct decision most of the time, do a cost/benefit analysis before adding one. Maybe YAGNI.

#4


1  

This XPath expression:

/*/sig[$pN]/*

selects all child elements (just the pair freq and level) of the $pN-th sig child of the top element of the XML document.

The string $pN should be substituted with a specific positive integer, for example:

/*/sig[2]/*

selects these two elements:

<freq>1200</freq><level>110</level>

Using an XPath expression such as this is obviously much shorter and more understandable than the provided C++ code.

Another advantage is that the same XPath expression can be used from a C# or Java or ... program, without having to modify it in any way -- thus adhering to XPath results in a very high degree of portability.
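
As a rough illustration of the $pN substitution described above, here is a small sketch that builds the expression string for a given position. The helper name sig_children_xpath is hypothetical; the only XPath fact it relies on is that positions in a predicate are 1-based.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: builds "/*/sig[N]/*" for the N-th <sig> element.
// XPath positions start at 1, so n == 0 is rejected.
std::string sig_children_xpath(unsigned n)
{
    if (n == 0)
        throw std::invalid_argument("XPath positions start at 1");
    return "/*/sig[" + std::to_string(n) + "]/*";
}
```

Calling sig_children_xpath(2) yields the expression /*/sig[2]/* shown earlier; the resulting string would then be passed to the XPath engine of choice.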
