RSS feed: parsing/extracting the src attribute of an img tag inside the description tag in Java

Date: 2022-08-03 04:54:38

This extends the question "How to extract an image src from RSS feed".

That question already has an answer for iOS, but I have not found a solution that works in Java.

I know how to parse a top-level tag in an RSS feed, but parsing a tag nested inside another tag, as in the example below, is more complicated:

    <description>
    <![CDATA[
<img width="745" height="410" src="http://example.com/image.png" class="attachment-large wp-post-image" alt="alt tag" style="margin-bottom: 15px;" />description text
    ]]>
    </description>

How can I extract just the value of the src attribute?

2 answers

#1


5  

Take a look at jsoup. I think it's what you need.

EDIT:

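import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Returns the src of the first <img> that has one, or "" if there is none.
// The argument is the HTML fragment taken from the <description> element.
private String extractImageUrl(String description) {
    Document document = Jsoup.parse(description);
    Elements imgs = document.select("img");

    for (Element img : imgs) {
        if (img.hasAttr("src")) {
            return img.attr("src");
        }
    }

    // no image URL
    return "";
}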
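
The method above takes the content of the description element as a string. As a rough sketch of how that string might be obtained from the feed itself, the example below uses only the JDK's DOM parser. The class name `FeedDescriptionReader` and the method `itemDescriptions` are hypothetical names chosen for illustration, and the feed is assumed to be available as a string already. `getTextContent()` returns the text of a CDATA section unwrapped, so the result is plain HTML that can be passed on to an HTML parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
class FeedDescriptionReader {

    // Collects the text of the first <description> inside every <item>.
    // Restricting the search to <item> skips the channel-level <description>.
    static List<String> itemDescriptions(String rssXml) {
        List<String> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rssXml)));
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList descriptions = item.getElementsByTagName("description");
                if (descriptions.getLength() > 0) {
                    // CDATA content comes back as ordinary text.
                    result.add(descriptions.item(0).getTextContent().trim());
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample feed modelled on the snippet in the question.
        String rss = "<rss version=\"2.0\"><channel><description>feed</description><item>"
                + "<description><![CDATA[<img src=\"http://example.com/image.png\" "
                + "alt=\"alt tag\" />description text]]></description>"
                + "</item></channel></rss>";
        for (String html : itemDescriptions(rss)) {
            System.out.println(html);
        }
    }
}
```

Each string returned by `itemDescriptions` could then be handed to `extractImageUrl`. Note that `org.w3c.dom.Document` and jsoup's `Document` share a simple name, so keeping the two steps in separate classes avoids an import clash.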

#2


1  

You could use a regular expression to extract the value. Take a look at this small example; I hope it helps. For more information about regular expressions in Java, see http://www.tutorialspoint.com/java/java_regular_expressions.htm

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        // Capture everything between the quotes that follow src=.
        // [^"]* stops at the closing quote, so the match does not depend
        // on which attribute comes after src.
        String regularExpression = "src=\"([^\"]*)\"";
        String html = "<description> <![CDATA[ <img width=\"745\" height=\"410\" src=\"http://example.com/image.png\" class=\"attachment-large wp-post-image\" alt=\"alt tag\" style=\"margin-bottom: 15px;\" />description text ]]> </description>";

        // Create a Pattern object
        Pattern pattern = Pattern.compile(regularExpression);
        // Now create a Matcher object
        Matcher matcher = pattern.matcher(html);

        if (matcher.find()) {
            // Prints: Found value: http://example.com/image.png
            System.out.println("Found value: " + matcher.group(1));
        }
    }
}
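
A pattern like the one above is tied closely to the markup of the sample. As a slightly more tolerant sketch, the version below accepts single or double quotes and any attribute order inside the img tag. The class name `ImgSrcRegex` and the method `firstImgSrc` are hypothetical names used only for illustration, and a regular expression remains a heuristic that can still be fooled by unusual HTML.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class ImgSrcRegex {

    // <img, then any attributes, then whitespace + src = quoted value.
    // Group 1 is the opening quote, \1 requires the same closing quote,
    // group 2 is the URL. Requiring whitespace before src skips data-src.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the src of the first <img> in the fragment, if there is one.
    static Optional<String> firstImgSrc(String html) {
        Matcher matcher = IMG_SRC.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<img width=\"745\" height=\"410\" "
                + "src=\"http://example.com/image.png\" alt=\"alt tag\" />description text";
        System.out.println(firstImgSrc(html).orElse("no image URL"));
    }
}
```

Returning an `Optional` rather than an empty string makes the "no image" case explicit to the caller; that is a design choice of this sketch, not something either answer prescribes.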
