A better way in T-SQL to search XML for a node that does not exist

Time: 2022-06-06 12:37:02

We have a source XML file made up of address nodes, and each address node is supposed to have a zip_code node beneath it in order to validate. We received a file that failed schema validation because at least one address was missing its zip_code (there were several thousand addresses in the file).

We need to find the elements that do not have a zip code, so we can repair the file and send an audit report to the source.

--declare @x xml = (select bulkcolumn from openrowset(bulk 'x:\file.xml', single_blob) as s)
declare @x xml = N'<addresses>
    <address><external_address_id>1</external_address_id><zip_code>53207</zip_code></address>
    <address><external_address_id>2</external_address_id></address>
</addresses>'

declare @t xml = (
select @x.query('for $a in .//address 
    return 
        if ($a/zip_code) 
            then <external_address_id /> 
        else $a/external_address_id')
)
select x.AddressID.value('.', 'int') AddressID
from @t.nodes('./external_address_id') x(AddressID)
where x.AddressID.value('.', 'int') > 0
GO

Really, it's the where clause that bugs me. I feel like I'm depending on the empty placeholder element being cast to 0, and it works, but I'm not really sure that it should. I tried a few variations with the .exist function, but I couldn't get the correct result.

2 solutions

#1


2  

If you just want to locate those nodes that are missing their <zip_code> element, you could use something like this:

SELECT
    ADRS.ADR.value('(external_address_id)[1]', 'int') as 'ExtAdrID'
FROM
    @x.nodes('/addresses/address') as ADRS(ADR)
WHERE
    ADRS.ADR.exist('zip_code') = 0

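It uses the built-in .exist() method of the xml data type, which evaluates an XQuery expression and returns 1 if the expression yields at least one node and 0 otherwise. Here it checks for the existence of a zip_code subnode inside each address node.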
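
As a rough sketch of how the same check could be tightened, the query below also treats an empty zip_code element as missing by testing for a text node instead of the element itself. The sample data is modeled on the question; address 3, with its empty zip_code, and the column names HasZipElement and HasZipText are hypothetical additions for illustration only.

```sql
-- Sample data modeled on the question.
-- Address 3 (empty zip_code element) is a hypothetical addition.
DECLARE @x xml = N'<addresses>
    <address><external_address_id>1</external_address_id><zip_code>53207</zip_code></address>
    <address><external_address_id>2</external_address_id></address>
    <address><external_address_id>3</external_address_id><zip_code /></address>
</addresses>';

SELECT
    ADRS.ADR.value('(external_address_id)[1]', 'int') AS ExtAdrID,
    -- 1 when a zip_code element is present, even an empty one
    ADRS.ADR.exist('zip_code')        AS HasZipElement,
    -- 1 only when zip_code contains a text node
    ADRS.ADR.exist('zip_code/text()') AS HasZipText
FROM
    @x.nodes('/addresses/address') AS ADRS(ADR)
WHERE
    -- keep addresses whose zip_code is absent or has no text
    ADRS.ADR.exist('zip_code/text()') = 0;
```

With this data the filter should keep addresses 2 and 3. Changing the WHERE clause back to exist('zip_code') = 0 narrows the result to addresses where the element is absent altogether, which is the check used in the answer above.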

#2


4  

If you just want to ensure that you are selecting address elements that have a zip_code element, then adjust your XPath to include that criterion in a predicate filter:

/addresses/address[zip_code]

If you also want to ensure that the zip_code element has a value, add a predicate filter on zip_code that selects only those with text() nodes:

/addresses/address[zip_code[text()]]

EDIT:

Actually, I'm looking for the opposite. I need to identify the nodes that don't have a zip, so we can manually correct the source data.

So, if you want to identify all of the address elements that do not have a zip_code, you can specify it in the XPath like this:

/addresses/address[not(zip_code)]
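
One way this path might be used from T-SQL is to pass it straight to .nodes(), so the filtering happens in the XQuery expression and no WHERE clause is needed. This is only a sketch against the sample data from the question; the aliases ADRS, ADR and ExtAdrID are borrowed from the other answer and are not part of this one.

```sql
-- Sample data copied from the question.
DECLARE @x xml = N'<addresses>
    <address><external_address_id>1</external_address_id><zip_code>53207</zip_code></address>
    <address><external_address_id>2</external_address_id></address>
</addresses>';

-- The predicate [not(zip_code)] keeps only address elements
-- that have no zip_code child, so no WHERE clause is required.
SELECT
    ADRS.ADR.value('(external_address_id)[1]', 'int') AS ExtAdrID
FROM
    @x.nodes('/addresses/address[not(zip_code)]') AS ADRS(ADR);
```

With this data the query should return only address 2. To also catch zip_code elements that are present but empty, the predicate could be written as [not(zip_code/text())] instead.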
