Using a batch file to get all strings between specific tags in an unformatted XML file

Time: 2022-11-27 18:00:39

I'm trying to get the strings between two tags in an XML file by adapting a solution I found here.

This is the batch file I have:

@echo off
setlocal EnableDelayedExpansion

(for /F "delims=" %%a in ('findstr /I /L "<Name>" contacts.xml') do (
   set "line=%%a"
   set "line=!line:*<Name>=!"
   for /F "delims=<" %%b in ("!line!") do echo %%b
)) > list.txt

When the XML is formatted, I get all the names:

<List>
   <Contacts>
      <Row>
         <Name>Carlos</Name>
         <Path>\Some\path\1</Path>
         <Hidden>False</Hidden>
      </Row>
      <Row>
         <Name>Fernando</Name>
         <Path>\Some\path\2</Path>
         <Hidden>False</Hidden>
      </Row>
      <Row>
         <Name>Luis</Name>
         <Path>\Some\path\3</Path>
         <Hidden>False</Hidden>
      </Row>
      <Row>
         <Name>Daniel</Name>
         <Path>\Some\path\4</Path>
         <Hidden>False</Hidden>
      </Row>
   </Contacts>
</List>

Carlos
Fernando
Luis
Daniel

But when the XML is on a single line (this is how it's generated), I only get the first name:

<List><Contacts><Row><Name>Carlos</Name><Path>\Some\path\1</Path><Hidden>False</Hidden></Row><Row><Name>Fernando</Name><Path>\Some\path\1</Path><Hidden>False</Hidden></Row><Row><Name>Luis</Name><Path>\Some\path\1</Path><Hidden>False</Hidden></Row><Row><Name>Daniel</Name><Path>\Some\path\1</Path><Hidden>False</Hidden></Row></Contacts></List>

Carlos
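A simplified JavaScript model of the batch file above may make the failure easier to see. It mirrors the script's three steps (the findstr filter, the *<Name> substitution, and the for /F token) under the assumption that corner cases such as empty values can be ignored; the function name namesPerLine is purely illustrative.

```javascript
// Simplified model of the question's batch file (an illustration, not an exact port):
//   findstr /I /L "<Name>"       keeps only lines that contain <Name> (case-insensitive)
//   set "line=!line:*<Name>=!"   drops everything up to and including the FIRST <Name>
//   for /F "delims=<" ...        keeps the text up to the next "<"
// Each line is handled exactly once, so at most one name per line comes out.
function namesPerLine(xmlText) {
  var names = [];
  var lines = xmlText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var pos = lines[i].toLowerCase().indexOf("<name>");
    if (pos < 0) continue; // findstr would have filtered this line out
    var rest = lines[i].substring(pos + "<name>".length);
    var name = rest.split("<")[0];
    if (name !== "") names.push(name); // empty values skipped; for /F edge cases are not modeled
  }
  return names;
}

var formatted = "<List>\n  <Row>\n    <Name>Carlos</Name>\n  </Row>\n" +
                "  <Row>\n    <Name>Fernando</Name>\n  </Row>\n</List>";
var oneLine = "<List><Row><Name>Carlos</Name></Row><Row><Name>Fernando</Name></Row></List>";

console.log(namesPerLine(formatted).join(", ")); // Carlos, Fernando
console.log(namesPerLine(oneLine).join(", "));   // Carlos
```

Because the substitution and the final token each apply once per line, any further <Name> elements on the same line are discarded, which matches the single "Carlos" seen with the generated file.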

What changes should I make to the batch file so that it correctly parses unformatted XML files?

3 answers

#1

Batch files are strongly tied to the format of the data they process. If the data changes, a new batch file is usually required. The pure batch file below extracts the names from your example unformatted XML file, as long as the line is shorter than 8190 characters.

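@echo off
setlocal EnableDelayedExpansion

rem Every closing angle bracket is replaced by a line feed, so the inner FOR /F
rem sees one tag per line, and the text after an opening Name tag is printed.
rem Each physical line of contacts.xml must be shorter than 8190 characters.
set "field="
for /F "delims=" %%a in (contacts.xml) do (
   set "line=%%a"
   for %%X in (^"^
% Do NOT remove this line %
^") do for /F "delims=" %%b in ("!line:>=%%~X!") do (
      if /I "!field!" equ "<Name" for /F "delims=<" %%c in ("%%b") do echo %%c
      set "field=%%b"
   )
)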
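To see why splitting at ">" copes with the single-line file, here is a rough JavaScript model of the loop above. It assumes empty fields and other for /F corner cases can be ignored, and namesFromFields is a hypothetical name, not part of the answer.

```javascript
// Simplified model of the pure-batch answer above (an illustration, not an exact port).
// Replacing every ">" with a line feed turns "<Row><Name>Carlos</Name>" into the fields
// "<Row", "<Name", "Carlos</Name"; the text after a "<Name" field is the value.
function namesFromFields(xmlText) {
  var names = [];
  var previous = ""; // plays the role of the batch variable "field"
  var lines = xmlText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var fields = lines[i].split(">");
    for (var j = 0; j < fields.length; j++) {
      if (fields[j] === "") continue;           // for /F skips empty lines, so empty fields vanish
      if (previous.toLowerCase() === "<name") { // if /I "!field!" equ "<Name"
        var value = fields[j].split("<")[0];    // for /F "delims=<" keeps "Carlos" from "Carlos</Name"
        if (value !== "") names.push(value);
      }
      previous = fields[j];                     // set "field=%%b"
    }
  }
  return names;
}

var oneLine = "<List><Contacts><Row><Name>Carlos</Name><Hidden>False</Hidden></Row>" +
              "<Row><Name>Fernando</Name><Hidden>False</Hidden></Row></Contacts></List>";
console.log(namesFromFields(oneLine).join(", "));              // Carlos, Fernando
console.log(namesFromFields("   <Name>Carlos</Name>").length); // 0
```

The second call shows a consequence of the exact comparison: an indented line keeps its leading spaces in the first field ("   <Name"), so nothing matches. As far as I can tell the batch original behaves the same way, because "delims=" preserves leading whitespace, which fits the answer's point that batch code is tied to the data format.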

#2

As Adriano implied in his comment, parsing XML via a powerful tool like regular expressions is frowned upon. Parsing XML with batch is far worse.

Pure, native batch cannot work with lines of text longer than 8191 bytes unless you use extraordinary techniques involving the FC command (trust me, you don't want to go there). There is no reason to expect an XML file to be smaller than 8191 bytes, so the short answer is essentially this: you cannot parse unformatted XML that exists as one continuous line using native batch commands.

I have written a script-based regular expression utility for batch called JREPL.BAT. It is a hybrid JScript/batch script that runs natively on any Windows machine from XP onward. I recommend putting JREPL.BAT in a folder (I use c:\utils) and then including that folder in your PATH variable.

The following JREPL.BAT command can be used to parse out your names in most simple scenarios, assuming you never have nested <Name> elements. But like any regular expression "solution", this code is not robust in all situations.

jrepl "<Name>([\s\S]*?)</Name>" "$1" /m /jmatch /f "contacts.xml" /o "list.txt"
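Since JREPL evaluates its regular expressions with JScript, the same pattern can be tried in ordinary JavaScript. The sketch below assumes the whole file is already in one string, which is roughly what /M provides; extractNames is an illustrative name only.

```javascript
// Regex extraction of every <Name> value, treating the whole file as one string
// (an assumption that roughly matches what JREPL's /M option does).
// [\s\S]*? is a lazy "any character, newlines included", so each match stops at
// the nearest </Name>; nested <Name> elements are NOT handled.
function extractNames(xmlText) {
  var re = /<Name>([\s\S]*?)<\/Name>/g;
  var names = [];
  var m;
  while ((m = re.exec(xmlText)) !== null) {
    names.push(m[1]); // capture group 1, the counterpart of "$1" in the jrepl command
  }
  return names;
}

var oneLine = "<List><Contacts><Row><Name>Carlos</Name><Path>\\Some\\path\\1</Path></Row>" +
              "<Row><Name>Fernando</Name><Path>\\Some\\path\\2</Path></Row></Contacts></List>";
var formatted = "<List>\n  <Row>\n    <Name>Luis</Name>\n  </Row>\n" +
                "  <Row>\n    <Name>Daniel</Name>\n  </Row>\n</List>";

console.log(extractNames(oneLine).join(", "));   // Carlos, Fernando
console.log(extractNames(formatted).join(", ")); // Luis, Daniel
```

The lazy *? matters: a greedy [\s\S]* would run from the first <Name> to the last </Name> and return everything in between as one value. Entity references such as &amp; also come back undecoded, unlike with a real XML parser.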

Since JREPL is a batch script, you must use CALL JREPL if you want to use the command within another batch script.

#3

Before I answer, I should point out that your single-line XML, as originally posted, was missing a </Row> close tag, and all of its <Name> elements contained Carlos. So, in testing my answer, I used the following XML:

<List><Contacts><Row><Name>Carlos</Name><Path>\Some\path\1</Path><Hidden>False</Hidden></Row><Row><Name>Fernando</Name><Path>\Some\path\1</Path><Hidden>False</Hidden></Row><Row><Name>Luis</Name><Path>\Some\path\1</Path><Hidden>False</Hidden></Row><Row><Name>Daniel</Name><Path>\Some\path\1</Path><Hidden>False</Hidden></Row></Contacts></List>

Whenever you're manipulating or extracting data from XML or HTML, I think it's generally preferable to parse it as XML or HTML, rather than trying to scrape bits of text from it. Regardless of whether your XML is beautified or minified, if you parse XML as XML, your code still works. The same can't be said for regexp or token searches.

Pure batch doesn't handle XML all that well, but Windows Script Host does. Your best bet would be to employ JScript or VBScript, or possibly PowerShell. My solution is a batch + JScript hybrid script that uses the Microsoft.XMLDOM COM object and an XPath query to select the text child nodes of all the <Name> nodes, basically selectNodes('//Name/text()').

Save this with a .bat extension and salt to taste.

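@if (@CodeSection == @Batch) @then

@echo off
setlocal

rem Name of the XML file to read; point this at contacts.xml or any other file.
set "xmlfile=test.xml"

rem Run this same file as JScript and echo each line it outputs.
for /f "delims=" %%I in ('cscript /nologo /e:JScript "%~f0" "%xmlfile%"') do (
    echo Name: %%~I
)

rem // end main runtime
goto :EOF

@end
// end batch / begin JScript chimera

var DOM = WSH.CreateObject('Microsoft.XMLDOM');

with (DOM) {
    // async must be switched off BEFORE load() so the document is fully parsed when load returns
    async = false;
    load(WSH.Arguments(0));
    setProperty('SelectionLanguage', 'XPath');
}

// Report a load or parse error (for example malformed XML) and exit with code 1
if (DOM.parseError.errorCode) {
   WSH.Echo(DOM.parseError.reason);
   WSH.Quit(1);
}

// Select the text child of every <Name> element, wherever it appears in the document
for (var d = DOM.documentElement.selectNodes('//Name/text()'), i = 0; i < d.length; i++) {
    WSH.Echo(d[i].data);
}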
