Tcl: how to split an XML file by tag

Time: 2021-10-26 06:33:13

I have an XML file with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
  <header>
    <name>generic_1</name>
  </header>
  <body>
    <resources>
      <resource guid="ae8c34ad-a4e6-47fe-9b7d-cd60223754fe">
      </resource>
      <resource guid="fe236467-3df5-4019-9d55-d4881dfabae7">
      </resource>
    </resources>
  </body>

I need to edit the information of each resource, so I tried to split the file on the string </resource>, but Tcl doesn't split it properly.

This is what I tried: split $file "</resource>". I also tried escaping the <, / and > characters but still no success.

Can you please help me with an elegant solution? I can do it by taking each line and determining where the resource ends, but a split would be nicer, if it can be done.

Later edit: I can't use tdom; I am editing the file as a text file, not as an XML file.

Thank you

2 answers

#1


4  

Suggestion

XML handling in Tcl has been covered numerous times here. The general recommendation is to use tdom and XPath expressions to navigate the DOM and extract data:

package require tdom
# Parse the XML text held in $xml, then select every <resource> element.
set doc  [dom parse $xml]
set root [$doc documentElement]
$root selectNodes //resources/resource

Comment

split breaks up a string on a per-character basis. Its last argument is interpreted as a set of split characters rather than as one split string. Besides, it would not give you what you want.
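
To make the point about split concrete, here is a minimal sketch in plain Tcl (no tdom). The helper name splitOnString and the marker character U+FFFF are assumptions for illustration only: the helper maps the whole delimiter string to a single character that is assumed not to occur in the text, and then splits on that character.

```tcl
# Sample text with two <resource> blocks, loosely mirroring the question's file.
set text {<resources>
  <resource guid="a">
  </resource>
  <resource guid="b">
  </resource>
</resources>}

# split treats its second argument as a set of single characters, so this
# cuts the text at every <, /, r, e, s, o, u, c and >.
set naive [split $text "</resource>"]

# Hypothetical helper: split on a whole delimiter string.
proc splitOnString {text delimiter} {
    # Assumption: this character never occurs in $text.
    set marker \uFFFF
    return [split [string map [list $delimiter $marker] $text] $marker]
}

set chunks [splitOnString $text "</resource>"]
puts "per-character split: [llength $naive] pieces"
puts "whole-string split:  [llength $chunks] pieces"
```

Each chunk can then be edited as text and the file reassembled with join $chunks "</resource>". If tcllib is installed, textutil::splitx is another option, since it splits on a regular expression. Either way this is still text surgery, so it will not cope with things like nested resource elements.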

#2


2  

This is not an answer, just two additions to mrcalvin's answer, put here for formatting purposes.

First, your XML is not well-formed, as it lacks a single root element (maybe it was snipped out).

Second, you didn't describe how you wanted to edit the nodes. Two obvious ways are to add a new attribute value and to add a new child node. This is how you can do either with tdom, choosing by the value of the guid attribute:

set nodes [$root selectNodes //resources/resource]
foreach node $nodes {
    # Choose the edit based on the resource's guid attribute.
    switch [$node getAttribute guid] {
        ae8c34ad-a4e6-47fe-9b7d-cd60223754fe {
            # Add (or overwrite) an attribute.
            $node setAttribute foo bar
        }
        fe236467-3df5-4019-9d55-d4881dfabae7 {
            # Append a new, empty child element.
            $node appendChild [$doc createElement quux]
        }
        default {
            error "unknown resource"
        }
    }
}

If you wish to add something more complex than a child node, there are several ways to do so, including using node commands, appending an XML literal, appending via a script (most useful when several similar additions are made), and appending a nested Tcl list that describes a node structure with attributes.
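
As a rough sketch of two of those options (node commands and an XML literal), assuming tdom is installed: the sample document and the element names label and owner are made up for illustration and do not come from the question.

```tcl
package require tdom

# Assumed sample document with a single root element.
set doc  [dom parse {<body><resources><resource guid="a"/></resources></body>}]
set root [$doc documentElement]
set node [lindex [$root selectNodes {//resource[@guid="a"]}] 0]

# 1. Node commands: build the new element piece by piece.
set label [$doc createElement label]
$label setAttribute lang en
$label appendChild [$doc createTextNode "First resource"]
$node appendChild $label

# 2. XML literal: let tdom parse a fragment and append the resulting subtree.
$node appendXML {<owner id="42"><name>example</name></owner>}

puts [$doc asXML]
```

Node commands are more verbose but keep everything under script control; the XML literal is shorter when the addition is fixed text.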

You can then get the edited DOM structure as XML by calling $doc asXML.
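
Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes tdom is available, and because the fragment in the question has no single root, it wraps the text in a hypothetical <document> element before parsing (the XML declaration is left out of the fragment for the same reason).

```tcl
package require tdom

# The question's content, shortened, without the XML declaration.
set fragment {<header><name>generic_1</name></header>
<body><resources>
<resource guid="ae8c34ad-a4e6-47fe-9b7d-cd60223754fe"></resource>
<resource guid="fe236467-3df5-4019-9d55-d4881dfabae7"></resource>
</resources></body>}

# Assumption: wrapping in <document> is acceptable for the real file.
set doc  [dom parse "<document>$fragment</document>"]
set root [$doc documentElement]

# Edit every resource; here each one simply gets a foo="bar" attribute.
foreach node [$root selectNodes //resources/resource] {
    $node setAttribute foo bar
}

# Serialize the edited tree back to XML text and free the DOM.
set result [$doc asXML -indent 2]
$doc delete
puts $result
```

The serialized text in $result can then be written back to the file with the usual open/puts/close sequence.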
