How to strip data out of HTML tags

Time: 2022-10-30 09:32:35

Say I have data like this:

<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>

Using PHP, how would I go through the HTML tags and return all the text inside the option elements? For instance, given the code above, I'd like to return 'Test - 123', 'Test - 456', 'Test - 789'.

Thanks for the help!

UPDATE: To be clearer: I'm using file_get_contents() to get the HTML from a site. For my purposes, I'd like to be able to go through the HTML, find the option elements, and output their text. In this case, return 'Test - 123', 'Test - 456', etc.

6 answers

#1


0  

If we're doing regex stuff, I like this perl-like syntax:

$test = "<option value=\"abc\" >Test - 123</option>\n" .
    "<option value=\"abc\" >Test - 456</option>\n" .
    "<option value=\"abc\" >Test - 789</option>\n"; 

for ($offset=0; preg_match("/<option[^>]*>([^<]+)/",$test, $matches, 
                        PREG_OFFSET_CAPTURE, $offset); $offset=$matches[1][1])
   print($matches[1][0] . "\n");

#2


3  

There are many ways; which one is best depends on more details than you've provided in your question.
One possibility: DOMDocument and DOMXPath

<?php
$doc = new DOMDocument;
$doc->loadhtml('<html><head><title>???</title></head><body>
  <form method="post" action="?" id="form1">
      <div>
        <select name="foo">
        <option value="abc" >Test - 123</option>
        <option value="def" >Test - 456</option>
        <option value="ghi" >Test - 789</option>
      </select>
    </div>
  </form>
</body></html>');

$xpath = new DOMXPath($doc);
foreach( $xpath->query('//form[@id="form1"]//option') as $o) {
    echo 'option text: ', $o->nodeValue, "  \n";
}

prints

option text: Test - 123  
option text: Test - 456  
option text: Test - 789  
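
For HTML fetched from a real site, the same idea might look like the sketch below. It assumes the markup is already in a string (the hard-coded $html stands in for the result of file_get_contents(), since the actual URL is not given) and uses getElementsByTagName instead of XPath. The variable names $html and $texts are made up for illustration.

```php
<?php
// Stand-in for the string returned by file_get_contents(); the real URL is unknown here.
$html = '<select name="foo">
<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>
</select>';

// Collect parser warnings instead of printing them; real pages are rarely valid HTML.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$texts = [];
foreach ($doc->getElementsByTagName('option') as $option) {
    // textContent is the text between the tags; trim() drops stray whitespace.
    $texts[] = trim($option->textContent);
}

// Print the texts as a quoted, comma-separated list.
echo "'" . implode("', '", $texts) . "'\n";
```

If only the options of one particular form or select are wanted, the XPath query shown above is the more precise tool.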

#3


1  

This code would load the values into an array, assuming you have line breaks in between the option tags like you showed:

// Load your HTML into a string.
$html = <<<EOF
<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>
EOF;

// Break the values into an array.
$vals = explode("\n", strip_tags($html));
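
If the fetched HTML has indentation, blank lines, or Windows line endings, the resulting array will contain padded or empty entries. A possible cleanup, sketched under the assumption that each option still sits on its own line (the sample input is hypothetical):

```php
<?php
// Hypothetical input with Windows line endings, indentation and a blank line.
$html = "  <option value=\"abc\" >Test - 123</option>\r\n\r\n"
      . "  <option value=\"def\" >Test - 456</option>\r\n";

// \R matches any newline sequence, so \r\n and \n are both handled.
$lines = preg_split('/\R/', strip_tags($html));

// Trim each line, drop the empty ones, and reindex from 0.
$vals = array_values(array_filter(array_map('trim', $lines), 'strlen'));

print_r($vals);
```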

#4


1  

If you have more than just a fragment like the one shown, use a real parser like DOMDocument that you can walk through with DOMXPath.

Otherwise try this regular expression together with preg_match_all:

<option(?:[^>"']+|"[^"]*"|'[^']*')*>([^<]+)</option>
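
A minimal sketch of how that pattern could be passed to preg_match_all, assuming the HTML is already in a string. The variable names and the choice of "~" as the delimiter are illustrative, not part of the pattern itself.

```php
<?php
// Sample input from the question; in practice this would be the fetched page.
$html = '<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>';

// Same pattern as above, with "~" as delimiter so "</option>" needs no escaping.
// Inside a single-quoted PHP string, \' is a literal single quote.
$pattern = '~<option(?:[^>"\']+|"[^"]*"|\'[^\']*\')*>([^<]+)</option>~';

// $matches[0] holds the full matches, $matches[1] the first capture group.
preg_match_all($pattern, $html, $matches);
$texts = $matches[1];

print_r($texts);
```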

#5


0  

http://networking.ringofsaturn.com/Web/removetags.php

// PHP has no sed-style s/.../.../g syntax; preg_replace does the substitution.
$out = preg_replace('/<[a-zA-Z\/][^>]*>/', '', $data);

#6


0  

Use strip_tags, unless I'm misunderstanding the question.

    $string = '<option value="abc" >Test - 123</option>
    <option value="def" >Test - 456</option>
    <option value="ghi" >Test - 789</option>';

    $string = strip_tags($string);

Update: I missed that you loosely specify an array in your question. In that case, and I'm sure there's a cleaner method, I'd do something like:

$teststring = '<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>';

$stringarray = explode("\n", strip_tags($teststring)); // split() was removed in PHP 7
print_r($stringarray);

Update 2: And just to top and tail it, to present it as you originally asked (not an array, as we may have been misled to believe), try the following:

$teststring = '<option value="abc" >Test - 123</option>
<option value="def" >Test - 456</option>
<option value="ghi" >Test - 789</option>';

$stringarray = explode("\n", strip_tags($teststring)); // split() was removed in PHP 7

$newstring = implode("','", $stringarray); // separator first; the reversed order is gone in PHP 8
echo "'" . $newstring . "'\n";
