How do I parse a large XML file with Perl? [duplicate]

Time: 2023-01-15 09:41:45

Possible Duplicate:
Why am I getting an “Out of memory” error with Perl's XML::Simple?

I have an XML file like this:

            <message>
                <c1>
                    <rrcConnectionSetupComplete>
                        <rrc-TransactionIdentifier>2</rrc-TransactionIdentifier>
                        <criticalExtensions>
                            <c1>
                                <rrcConnectionSetupComplete-r8>
                                    <selectedPLMN-Identity> 1 </selectedPLMN-Identity>
                                    <dedicatedInfoNAS> 07410109014290112345671000028020000f0 </dedicatedInfoNAS>
                                </rrcConnectionSetupComplete-r8>
                            </c1>
                        </criticalExtensions>
                    </rrcConnectionSetupComplete>
                </c1>
            </message>

I am using Perl code like this to access the data in the XML file (I need to stick to this style of access):

#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;

# XMLin() reads the whole document into nested hashes in memory.
my $xml  = XML::Simple->new;
my $data = $xml->XMLin("uL-DCCH-Message.xml");

# The root element <message> is dropped by default, so paths start at <c1>.
my $rrc_trans_identifier = $data->{'c1'}->{'rrcConnectionSetupComplete'}->{'rrc-TransactionIdentifier'};
print "rrc_trans_id :: $rrc_trans_identifier\n";

my $selected_plmn_id = $data->{c1}->{rrcConnectionSetupComplete}->{criticalExtensions}->{c1}->{'rrcConnectionSetupComplete-r8'}->{'selectedPLMN-Identity'};
print "plmn identity :: $selected_plmn_id\n";

my $rrc_dedicated_info_nas = $data->{c1}->{rrcConnectionSetupComplete}->{criticalExtensions}->{c1}->{'rrcConnectionSetupComplete-r8'}->{dedicatedInfoNAS};
print "dedicated info nas :: $rrc_dedicated_info_nas\n";

The output produced is:

rrc_trans_id :: 2
plmn identity ::  1
dedicated info nas ::  07410109014290112345671000028020000f0

The Perl code using XML::Simple works fine for smaller XML files (as shown in the output above).

But if the XML file is large, XML::Simple cannot handle it and fails with the error message Ran out of memory.

Are there any other XML parsers I can use that let me access the elements in the XML file in a similar manner to the code above?

If other parsers are available, can anyone give an example that follows the same conventions I am using with XML::Simple?

2 answers

#1


4  

There are two types of XML parsers available:

  • Simple ones that read the whole XML file into memory and build an easily accessible data structure. This takes a rather large amount of memory, so you'll get into trouble with bigger files. Their advantage is that they are usually very easy to work with.

  • SAX-based parsers, which process the XML element by element. To work with these parsers, the developer (you!) has to register callbacks for every interesting element and work with the information passed to the callbacks. Every time the SAX parser encounters a given element, the associated callback is executed, so you can work with only the interesting tags rather than the whole file at once. These parsers keep memory usage (potentially) very low but require substantially more work.
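
The callback style from the second bullet can be sketched with XML::Parser, an expat-based event parser from CPAN. This is only a minimal sketch, not a drop-in replacement for the XML::Simple code: the XML is inlined as a string so the script runs on its own, and the names %wanted, %found, @stack and $text are made up for illustration. It also keys results by element name rather than by full path, which works here only because the three leaf names are unique in the sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Sample input inlined for the sketch (copied from the question).
my $xml = <<'XML';
<message>
  <c1>
    <rrcConnectionSetupComplete>
      <rrc-TransactionIdentifier>2</rrc-TransactionIdentifier>
      <criticalExtensions>
        <c1>
          <rrcConnectionSetupComplete-r8>
            <selectedPLMN-Identity> 1 </selectedPLMN-Identity>
            <dedicatedInfoNAS> 07410109014290112345671000028020000f0 </dedicatedInfoNAS>
          </rrcConnectionSetupComplete-r8>
        </c1>
      </criticalExtensions>
    </rrcConnectionSetupComplete>
  </c1>
</message>
XML

# Hypothetical helper state: the elements we care about and what we collected.
my %wanted = map { $_ => 1 }
    qw(rrc-TransactionIdentifier selectedPLMN-Identity dedicatedInfoNAS);
my (%found, @stack);
my $text = '';

my $parser = XML::Parser->new(
    Handlers => {
        # Called for every opening tag: remember where we are.
        Start => sub {
            my ($expat, $name) = @_;
            push @stack, $name;
            $text = '';
        },
        # Called for character data, possibly several times per text node.
        Char => sub {
            my ($expat, $chars) = @_;
            $text .= $chars if @stack && $wanted{ $stack[-1] };
        },
        # Called for every closing tag: store the trimmed text of wanted elements.
        End => sub {
            my ($expat, $name) = @_;
            if ($wanted{$name}) {
                (my $value = $text) =~ s/^\s+|\s+$//g;
                $found{$name} = $value;
            }
            pop @stack;
            $text = '';
        },
    },
);

$parser->parse($xml);

print "rrc_trans_id :: $found{'rrc-TransactionIdentifier'}\n";
print "plmn identity :: $found{'selectedPLMN-Identity'}\n";
print "dedicated info nas :: $found{'dedicatedInfoNAS'}\n";
```

For a real file, `$parser->parsefile("uL-DCCH-Message.xml")` would take the place of `$parser->parse($xml)`. Only the text of the wanted elements is kept, so memory use should stay small regardless of file size.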

#2


2  

If the file is huge, you should use a SAX-based parser or try the LibXML parser (XML::LibXML).
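
One way to stay close to the path-style access from the question while using LibXML is the pull parser XML::LibXML::Reader. The sketch below rests on an assumption the question does not confirm: that the large file holds many <message> records under a wrapper element (here a hypothetical <messages> root, with made-up sample values). Each record is copied into a small DOM fragment and queried with a relative XPath, so only one record should be in memory at a time. If the file is instead one enormous <message>, this approach would not save memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical input: two shortened <message> records under an assumed root.
my $xml = <<'XML';
<messages>
  <message><c1><rrcConnectionSetupComplete>
    <rrc-TransactionIdentifier>2</rrc-TransactionIdentifier>
  </rrcConnectionSetupComplete></c1></message>
  <message><c1><rrcConnectionSetupComplete>
    <rrc-TransactionIdentifier>3</rrc-TransactionIdentifier>
  </rrcConnectionSetupComplete></c1></message>
</messages>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @ids;

# nextElement() returns 1 each time it stops on another <message> element.
while ($reader->nextElement('message') == 1) {
    # Deep-copy just this record into a standalone DOM node.
    my $msg = $reader->copyCurrentNode(1);

    # The XPath is relative to <message>, mirroring the hash path in the question.
    push @ids, $msg->findvalue(
        'c1/rrcConnectionSetupComplete/rrc-TransactionIdentifier');
}

print "rrc_trans_id :: $_\n" for @ids;
```

To read from disk, `XML::LibXML::Reader->new(location => "uL-DCCH-Message.xml")` would replace the `string => $xml` form.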
