WWW::Mechanize::Firefox: how do you extract the text inside an HTML element tag?

Date: 2022-11-19 21:01:08

Good Day,

How do you print the text of an HTML tag with WWW::Mechanize::Firefox?

I have tried:

    print $_->text, '/n' for $mech->selector('td.dataCell');

    print $_->text(), '/n' for $mech->selector('td.dataCell');


    print $_->{text}, '/n' for $mech->selector('td.dataCell');

    print $_->content, '/n' for $mech->selector('td.dataCell');

Note that I do not want {innerHTML}, although that does work.

print $_->{text}, '/n' for $mech->selector('td.dataCell');

The line above does run, but the output is just a series of /n.
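
One likely reason the output shows a literal /n: in Perl, single-quoted strings are not interpolated, and the newline escape is written with a backslash (\n), not a forward slash. A minimal sketch in plain Perl, using made-up stand-in values rather than real page data:

```perl
use strict;
use warnings;

my @cells = ('alpha', 'beta');   # stand-in values, not real page data

# Single quotes do not interpolate, and '/n' is just a slash followed by "n".
my $literal = join '', map { $_ . '/n' } @cells;

# Double quotes with a backslash produce an actual newline character.
my $newline = join '', map { $_ . "\n" } @cells;

print $literal, "\n";   # prints: alpha/nbeta/n
print $newline;         # prints alpha and beta on separate lines
```

Applied to the attempts above, that would mean ending each print with "\n" in double quotes. Whether the property or method being read returns any text is a separate question.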

4 solutions

#1


3  

my $node = $mech->xpath('//td[@class="dataCell"]/text()');

print $node->{nodeValue};

Note that if you're retrieving text interspersed with other tags, like "Test_1" and "Test_3" in this example...

<html>
  <body>
    <form name="input" action="demo_form_action.asp" method="get">
      <input name="testRadioButton" value="test 1" type="radio">Test_1<br>
      <input name="testRadioButton" value="test 3" type="radio">Test_3<br>
      <input value="Submit" type="submit">
    </form>
  </body>
</html>

You need to refer to them by their position within the tag (taking any newlines into account):

$node = $mech->xpath('//form/text()[2]', single => 1);

print $node->{nodeValue};

This prints "Test_1".

#2


1  

I would do:

print $mech->xpath('//td[@class="dataCell"]/text()');

using an XPath expression.

#3


1  

The only solution I have is to use:

my $element = $mech->selector('td.dataCell');

my $string = $element->{innerHTML};

and then format the HTML within each dataCell.

#4


0  

Either:

$element->{textContent};

or

$element->{innerText};

will work.
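
Combining the property access from answer #4 with a loop over all matching cells might look like the sketch below. cell_texts is a hypothetical helper, not part of WWW::Mechanize::Firefox; it assumes only that selector() returns a list of node objects whose DOM properties can be read with hash-style access, as the answers in this thread do. The URL in the usage comment is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of every node matching a CSS selector.
sub cell_texts {
    my ($mech, $css) = @_;
    # selector() is called in list context, so every match is returned;
    # textContent is read as a property of each node.
    return map { $_->{textContent} } $mech->selector($css);
}

# Usage sketch (needs Firefox with the MozRepl add-on running):
#   use WWW::Mechanize::Firefox;
#   my $mech = WWW::Mechanize::Firefox->new();
#   $mech->get('http://example.com/table.html');   # placeholder URL
#   print "$_\n" for cell_texts($mech, 'td.dataCell');
```

Keeping the extraction in a small sub means it can be exercised with any object that provides a selector() method, without a running browser.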
