How do I make a unique array with Nokogiri?

Time: 2022-11-25 14:09:11

I have code that looks like:

require "nokogiri"

file = Nokogiri::XML(File.open('file.xml'))
test = file.xpath("//title") # all <title> elements in the XML file

Then when I try:

puts test.uniq

I get the following error:

 undefined method `uniq' for #<Nokogiri::XML::NodeSet:0x000000011b8bf8> 

Is test not an array? If it's not, how do I make it one?

Otherwise, how do I get only unique values from the test array?

2 Answers

#1

Is test not an array? If it's not, how do I make it one?

test will be a NodeSet:

Nokogiri::XML('<xml><foo/></xml>').xpath('//foo').class
=> Nokogiri::XML::NodeSet

foo = Nokogiri::XML('<xml><foo/></xml>').xpath('//foo')
=> [#<Nokogiri::XML::Element:0x8109a674 name="foo">]

foo.is_a? Array
=> false

foo.is_a? Enumerable
=> true

To turn it into an array, use to_a:

foo.respond_to? :to_a
=> true

foo.to_a.class
=> Array

However, that's usually not necessary. Because NodeSet includes Enumerable, it responds to map, each, and all the other methods we'd expect when iterating an Array. map always returns an Array, so that's the conversion you asked about in your comments and your question.

foo.methods.sort - Object.methods
=> [:%, :&, :+, :-, :/, :<<, :[], :add_class, :after, :all?, :any?, :at, :at_css, :at_xpath, :attr, :attribute, :before, :children, :chunk, :collect, :collect_concat, :count, :css, :cycle, :delete, :detect, :document, :document=, :drop, :drop_while, :each, :each_cons, :each_entry, :each_slice, :each_with_index, :each_with_object, :empty?, :entries, :filter, :find, :find_all, :find_index, :first, :flat_map, :grep, :group_by, :index, :inject, :inner_html, :inner_text, :last, :length, :map, :max, :max_by, :member?, :min, :min_by, :minmax, :minmax_by, :none?, :one?, :partition, :pop, :push, :reduce, :reject, :remove, :remove_attr, :remove_class, :reverse, :reverse_each, :search, :select, :set, :shift, :size, :slice, :slice_before, :sort, :sort_by, :take, :take_while, :text, :to_a, :to_ary, :to_html, :to_xhtml, :to_xml, :unlink, :wrap, :xpath, :zip, :|]

I suspect the reason uniq isn't implemented is that it's very difficult to decide how to test nodes for uniqueness. A very simple tag, like:

<div class="foo" id="bar">

is functionally the same as:

<div id="bar" class="foo">

but the obvious to_s comparison will fail, because the two serialized strings aren't equal.

The tags would have to be normalized on the fly to put their attributes into the same order and then converted to strings. But what if the class attribute was "foo1 foo2" in the first tag and "foo2 foo1" in the second? Does the uniq code have to dive into specific attributes and reorder their values? And what if the tag is a container, like div? Should the node's children also be considered in the uniq test?

I think that's a can of worms most of us would quickly back away from, and anyone who jumped into defining uniq would learn a valuable lesson about rabbit holes. Instead, you're free to define uniqueness however fits your particular application. I think leaving it out was a great design decision by Nokogiri's authors.

#2

Please try:

puts test.map(&:text).uniq

Here's some example code demonstrating how it works:

require "nokogiri"

doc = Nokogiri::HTML(<<-EOF)
<a class = "foo" href = "https://example.com"> Click here </a>
EOF

# Two separate <title> nodes that happen to have the same text.
nodes = Array.new(2) do
  title = Nokogiri::XML::Node.new("title", doc)
  title.content = "xxx"
  title
end
nodes # => [#<Nokogiri::XML::Element:0x4637712 name="title" children=[#<Nokogiri::XML::Text:0x4636efc "xxx">]>, #<Nokogiri::XML::Element:0x4637690 name="title" children=[#<Nokogiri::XML::Text:0x4636218 "xxx">]>]

# Wrap them in a NodeSet, the same kind of object xpath returns.
nodeset = Nokogiri::XML::NodeSet.new(doc, nodes)
nodeset # => [#<Nokogiri::XML::Element:0x4637712 name="title" children=[#<Nokogiri::XML::Text:0x4636efc "xxx">]>, #<Nokogiri::XML::Element:0x4637690 name="title" children=[#<Nokogiri::XML::Text:0x4636218 "xxx">]>]

# map returns a plain Array of strings, which does have uniq.
nodeset.map(&:text).uniq # => ["xxx"]