Trouble reading values from parsed XML data using XML::Simple

Time: 2022-12-07 00:17:00

I am halfway through writing a script using XML::Simple. I have read that it is not so "simple", and even its own documentation discourages its use in new code, but I have no other choice, as this script will be an extension to existing code.

This is what I am doing:

  1. Get XML by reading from a URL
  2. Parse it using XML::Simple
  3. Read the required elements from the data
  4. Run different checks on these required elements

I can parse the XML and run checks on some of the elements, but when I read elements that are in an array, I get undef.

This is my code:

#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use LWP::Simple;
use XML::Simple;
use DBI;

use Data::Dumper;

my $str = "<Actual_URL>";

my $ua = LWP::UserAgent->new;
$ua->timeout( 180 );
$ua->agent( "$0/0.1 " . $ua->agent );

my $req = HTTP::Request->new( GET => $str );

my $buffer;
$req->content_type( 'text/xml' );
$req->content( $buffer );

my $response = $ua->request( $req );

my $xml = $response->content();
print "Value of \$xml is:\n";
print $xml;

my $filename = 'record.txt';
open( my $fh, '>', $filename ) or die "Could not open file '$filename' $!";
print $fh $xml;
close $fh;

my $number_of_lines = `wc -l record.txt | cut -d' ' -f1`;
print "Number of lines in $filename are: $number_of_lines\n";
if ( $number_of_lines >= 50 ) {
    print "TEST_1 SUCCESS\n";
}

my $mysql_dbh;
my $test_id;

my $xst;
my %cmts_Pre_EQ_tags;

if ( ( not defined $xml ) or ( $xml =~ m/read\stimeout/i ) ) {
    &printXMLErr( 'DRUM request timed out' );
}
else {
    my $xs = XML::Simple->new();
    $xst = eval { $xs->XMLin( $xml, KeyAttr => 1 ) };
    &printXMLErr( $@ ) if ( $@ );
    print "Value of \$xst inside is:\n";
    print Dumper( $xst );
}

$cmts_Pre_EQ_tags{'$cmts_Pre_EQ_groupDelayMag'} =
    $xst->{cmts}->{Pre_EQ}->{groupDelayMag}->{content};

#More elements like this are checked here
$cmts_Pre_EQ_tags{'$cmts_Pre_EQ_ICFR'} =
    $xst->{cmts}->{Pre_EQ}->{ICFR}->{content};

my $decision1 = 1;
print "\%cmts_Pre_EQ_tags:\n";
foreach ( sort keys %cmts_Pre_EQ_tags ) {
    print "$_ : $cmts_Pre_EQ_tags{$_}\n";
    if ( $cmts_Pre_EQ_tags{$_} eq '' ) {
        print "$_ is empty!\n";
        $decision1 = 0;
    }
}
print "\n";

if ( $decision1 == 0 ) {
    print "TEST_2_1 FAIL\n";
}
else {
    print "TEST_2_1 SUCCESS\n";
}

my $cpeIP4 = $xst->{cmts}->{cpeIP4}->{content};
print "The cpe IP is: $cpeIP4\n";

if ( $cpeIP4 ne '' ) {
    print "TEST_2_2 SUCCESS\n";
}
else {
    print "TEST_2_2 FAIL\n";
}

# Working fine until here, but following 2 print are showing undef
print Dumper ( $xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterTunnelId} );
print Dumper ( $xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterClientIdType} );
print "After\n";

The output of the last three print statements is:

$VAR1 = undef;
$VAR1 = undef;
After

I can't provide the entire XML or the output of print Dumper($xst), as it is too big and is generated dynamically, but I'll provide a sample of it.

The part of the XML that is causing trouble is:

<cmts>
  <STBDSG>
    <dsg>
      <dsgIfStdTunnelFilterTunnelId>1</dsgIfStdTunnelFilterTunnelId>
      <dsgIfStdTunnelFilterClientIdType>caSystemId</dsgIfStdTunnelFilterClientIdType>
    </dsg>
    <dsg>
      <dsgIfStdTunnelFilterTunnelId>2</dsgIfStdTunnelFilterTunnelId>
      <dsgIfStdTunnelFilterClientIdType>gaSystemId</dsgIfStdTunnelFilterClientIdType>
    </dsg>
  </STBDSG>
</cmts>

When this part is parsed, the corresponding output in $xst is:

$VAR1 = {
    'cmts' => {
            'STBDSG' => {
                'dsg' => [
                         {
                           'dsgIfStdTunnelFilterTunnelId' => '1',
                           'dsgIfStdTunnelFilterClientIdType' => 'caSystemId',
                         },
                         {
                           'dsgIfStdTunnelFilterTunnelId' => '2',
                           'dsgIfStdTunnelFilterClientIdType' => 'gaSystemId',
                         }
                         ]
                     },
    },
};

The part of the XML whose values are fetched correctly after parsing looks like this:

<cmts>
    <name field_name="Name">cts01nsocmo</name>
    <object field_name="Nemos Object">888</object>
    <vendor field_name="Vendor">xyz</vendor>
</cmts>

It was converted to:

    $VAR1 = {
      'cmts' => {
        'name' => {
                    'content' => 'cts01nsocmo',
                    'field_name' => 'Name'
                  },
        'object' => {
                      'content' => '888',
                      'field_name' => 'Nemos Object'
                    },
        'vendor' => {
                      'content' => 'xyz',
                      'field_name' => 'Vendor'
                    }
         },
};

So basically, when there is no array in the parsed content, the values are fetched correctly into variables.

It seems that the reason why this

print Dumper ( $xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterTunnelId} );
print Dumper ( $xst->{cmts}{STBDSG}{dsg}[0]{dsgIfStdTunnelFilterClientIdType} );

returns undef is related to setting the correct values for either KeyAttr or ForceArray. I am trying to work it out from the XML::Simple documentation, but I wanted to check whether there is something obvious that I am missing here.

2 Answers

#1


4  

It's worth considering the use of XML::Twig, regardless of what the rest of your project does.

In particular, XML::Twig::Elt objects -- the module's implementation of XML elements -- have a simplify method, whose documentation says this:

Return a data structure suspiciously similar to XML::Simple's. Options are identical to XMLin options

So you can use XML::Twig for its precision and convenience, and apply the simplify method wherever you need to pass on data that looks like an XML::Simple data structure.
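
A minimal sketch of that approach, assuming XML::Twig is installed. The XML string is a shortened copy of the dsg sample from the question, and the lowercase option names forcearray and keyattr are my reading of the simplify documentation, so check them against your installed version before relying on them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Shortened copy of the question's sample (an assumption: the real document
# is larger, and cmts may be nested inside another root element).
my $xml = '<cmts><STBDSG>'
    . '<dsg><dsgIfStdTunnelFilterTunnelId>1</dsgIfStdTunnelFilterTunnelId></dsg>'
    . '<dsg><dsgIfStdTunnelFilterTunnelId>2</dsgIfStdTunnelFilterTunnelId></dsg>'
    . '</STBDSG></cmts>';

my $twig = XML::Twig->new;
$twig->parse($xml);

# Navigate with XML::Twig, then simplify only the subtree that the existing
# code expects as an XML::Simple-style structure.
my $stbdsg = $twig->root->first_child('STBDSG');
my $data   = $stbdsg->simplify( forcearray => ['dsg'], keyattr => [] );

# dsg is expected to be an array reference here, as it would be from XMLin.
foreach my $dsg ( @{ $data->{dsg} } ) {
    print $dsg->{dsgIfStdTunnelFilterTunnelId}, "\n";
}
```

Simplifying one subtree rather than the whole document keeps the XML::Simple-style structure confined to the place where the existing code needs it.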

#2


1  

As you have found, XML::Simple isn't. Even its documentation says:

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces.

Part of the problem is that XML doesn't have any such thing as arrays. It can have repeated tags, but there is no direct mapping between 'array' and 'XML', which always makes the programming awkward.

What it is doing here is assuming that the repeated dsg elements are an array and converting them automatically.

Anyway, I would suggest using XML::Twig instead, and then your 'print' statements look like this:

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse( \*DATA );

foreach my $element ( $twig->get_xpath( "cmts/STBDSG/dsg", 0 ) ) {
    print $element->first_child_text("dsgIfStdTunnelFilterTunnelId"), "\n";
    print $element->first_child_text("dsgIfStdTunnelFilterClientIdType"), "\n";
}

That said, you may be forced into using XML::Simple, with throwing it away and starting over not being an option (though seriously, I'd consider it!).

What XML::Simple does with 'matching' elements is try to pretend they're arrays.

If there aren't matching elements, it treats them as a hash. That's probably what's catching you out. The problem is that in Perl, hashes can't have duplicate keys, so rather than duplicating the dsg key from your example, it turns it into an array.
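
A small sketch of that behaviour, assuming XML::Simple is installed. The two inline documents are cut-down versions of the question's sample, as_list is a hypothetical helper name, and KeyAttr => [] is passed only to keep attribute folding out of the picture. Note that XMLin drops the root element by default, so the path starts at STBDSG in these toy documents.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $tag = 'dsgIfStdTunnelFilterTunnelId';

# Two tiny documents (assumptions for illustration): one dsg versus two.
my $one = XMLin( "<cmts><STBDSG><dsg><$tag>1</$tag></dsg></STBDSG></cmts>",
    KeyAttr => [] );
my $two = XMLin(
    "<cmts><STBDSG><dsg><$tag>1</$tag></dsg><dsg><$tag>2</$tag></dsg></STBDSG></cmts>",
    KeyAttr => [] );

print ref( $one->{STBDSG}{dsg} ), "\n";    # single element: a hash reference
print ref( $two->{STBDSG}{dsg} ), "\n";    # repeated element: an array reference

# Hypothetical helper: return a list whether the node is one hash or an array.
sub as_list {
    my ($node) = @_;
    return () unless defined $node;
    return ref($node) eq 'ARRAY' ? @$node : ($node);
}

foreach my $dsg ( as_list( $one->{STBDSG}{dsg} ), as_list( $two->{STBDSG}{dsg} ) ) {
    print $dsg->{$tag}, "\n";
}
```

Normalising at the point of use like this is one option when the parsing call itself cannot be changed.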

Switching on ForceArray puts everything into arrays, even though some of those arrays may hold only a single element. That's useful if you want consistency.
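
As a hedged illustration, ForceArray also accepts a list of element names, which forces only those elements into arrays and leaves the rest of the structure alone. The document below is a cut-down, single-dsg version of the question's sample with cmts as the root, which is an assumption; in the real data cmts appears to sit one level deeper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# A document with only ONE dsg element (an assumption for illustration).
my $xml = '<cmts><STBDSG><dsg>'
    . '<dsgIfStdTunnelFilterTunnelId>1</dsgIfStdTunnelFilterTunnelId>'
    . '<dsgIfStdTunnelFilterClientIdType>caSystemId</dsgIfStdTunnelFilterClientIdType>'
    . '</dsg></STBDSG></cmts>';

# Force dsg into an array even when it occurs once; KeyAttr => [] switches
# off attribute folding so the result stays predictable.
my $data = XMLin( $xml, ForceArray => ['dsg'], KeyAttr => [] );

# The same loop now works for one dsg or for many.
foreach my $dsg ( @{ $data->{STBDSG}{dsg} } ) {
    print "$dsg->{dsgIfStdTunnelFilterTunnelId} $dsg->{dsgIfStdTunnelFilterClientIdType}\n";
}
```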

KeyAttr probably doesn't help you: it's primarily geared to cases where you have distinct subelements and want to 'map' them. It lets you turn one of the XML attributes into the key of a hash.

For example:

<element name="firstelement">content</element>
<element name="secondelement">morecontent</element>

If you specify KeyAttr as name, it will make a hash with the keys firstelement and secondelement.

As your dsg elements don't have such an attribute, that's not what you want.
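
To make the KeyAttr behaviour concrete, here is a short sketch assuming XML::Simple is installed. The wrapping root element is my addition so that the two sample elements form a well-formed document.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# The two sample elements, wrapped in a hypothetical <root> element.
my $xml = '<root>'
    . '<element name="firstelement">content</element>'
    . '<element name="secondelement">morecontent</element>'
    . '</root>';

# With KeyAttr, the value of the name attribute becomes the hash key.
my $folded = XMLin( $xml, KeyAttr => { element => 'name' } );
print $folded->{element}{firstelement}{content}, "\n";

# With folding switched off, the repeated elements stay in an array and
# name remains an ordinary field of each entry.
my $plain = XMLin( $xml, KeyAttr => [] );
print $plain->{element}[0]{name}, "\n";
```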

To iterate over dsg:

foreach my $element ( @{ $xst->{cmts}{STBDSG}{dsg} } ) {
    print $element->{dsgIfStdTunnelFilterTunnelId},     "\n";
    print $element->{dsgIfStdTunnelFilterClientIdType}, "\n";
}
