How can I retrieve URLs from a website using Java?

Time: 2022-09-03 23:19:01

I want to use HTTP GET and POST commands to retrieve URLs from a website and parse the HTML. How do I do this?

5 solutions

#1


19  

You can use HttpURLConnection in combination with URL.

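import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Open a connection to the target page and issue a GET request
URL url = new URL("http://example.com");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.connect();

// The response body is available as a byte stream
InputStream stream = connection.getInputStream();
// read the contents using an InputStreamReader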
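
The snippet above stops at the point where the body has to be read. A minimal sketch of that last step follows. The class name PageFetcher, the method names, the 5 second timeouts and the UTF-8 charset are all assumptions made for illustration; a real client would take the charset from the Content-Type header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Hypothetical helper: fetches a page with GET and returns the body as text.
    // Assumes the server sends UTF-8 (an assumption, not something the answer states).
    public static String fetch(String address) throws IOException {
        URL url = new URL(address);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(5000); // milliseconds, arbitrary choice
        connection.setReadTimeout(5000);
        try (InputStream stream = connection.getInputStream()) {
            return readAll(stream);
        } finally {
            connection.disconnect();
        }
    }

    // Decodes the whole stream as UTF-8 text, normalising line endings to "\n".
    public static String readAll(InputStream stream) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }
}
```

Calling PageFetcher.fetch("http://example.com") should then return the page's HTML as a single String, ready to hand to a parser.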

#2


3  

The easiest way to do a GET is to use the built-in java.net.URL. However, as mentioned, HttpClient is the proper way to go, as it lets you, among other things, handle redirects.
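
To illustrate the redirect point, here is a rough sketch. The answer most likely means Apache HttpClient; this sketch swaps in the JDK's own java.net.http.HttpClient (Java 11 or later) so that it has no external dependencies. The class name RedirectingFetcher and the 5 second timeout are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class RedirectingFetcher {

    // Builds a client that follows redirects, except from HTTPS down to HTTP.
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(5)) // arbitrary choice
                .build();
    }

    // Sends a GET request and returns the final response body as text.
    public static String fetch(HttpClient client, String address)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(address)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

By comparison, HttpURLConnection follows redirects by default but does not switch between http and https, which is one reason a dedicated client is often preferred.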

For parsing the HTML, you can use an HTML parser.
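
As a sketch of what "retrieving URLs" from the parsed page could look like, the example below collects the href of every link. It does not use the parser library this answer points to; it swaps in the HTML parser that ships with the JDK (javax.swing.text.html), which is lenient but dated. The class name LinkExtractor and its method are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    // Hypothetical helper: returns the href value of every <a> tag, in document order.
    public static List<String> extractLinks(String html) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
                if (tag == HTML.Tag.A) {
                    Object href = attributes.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The third argument tells the parser to ignore any charset declared in the
        // document, since the input here is already a decoded String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }
}
```

Relative links such as "/b" come back unchanged; resolving them against the page address (for example with java.net.URI.resolve) would be a separate step.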

#3


3  

The accepted answer is from robhruska, thank you. It shows the most basic way to do it, and it is simple once you understand what a simple URL connection needs. In the longer term, however, the better strategy is to use HTTP Client, which offers more advanced and feature-rich ways to complete this task.

Thank you everyone, here's the quick answer again:

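import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Open a connection to the target page and issue a GET request
URL url = new URL("http://example.com");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.connect();

// The response body is available as a byte stream
InputStream stream = connection.getInputStream();
// read the contents using an InputStreamReader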
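
The question also asks about POST, which the quick answer does not cover. A minimal sketch with the same HttpURLConnection class might look like the following. The class name FormPoster, its methods and the form-encoded body format are assumptions for illustration, and InputStream.readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class FormPoster {

    // Encodes key/value pairs as application/x-www-form-urlencoded text.
    public static String encodeForm(Map<String, String> fields) throws IOException {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joiner.add(URLEncoder.encode(field.getKey(), "UTF-8")
                    + "=" + URLEncoder.encode(field.getValue(), "UTF-8"));
        }
        return joiner.toString();
    }

    // Hypothetical helper: sends the fields as a POST body and returns the response text.
    public static String post(String address, Map<String, String> fields) throws IOException {
        byte[] payload = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // required before writing a request body
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        connection.setFixedLengthStreamingMode(payload.length);
        try (OutputStream out = connection.getOutputStream()) {
            out.write(payload);
        }
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            connection.disconnect();
        }
    }
}
```

The only real differences from the GET case are setDoOutput(true) and writing the encoded body to the output stream before reading the response.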

#5


0  

I have used JTidy in a project and it worked quite well. A list of other parsers is here, but apart from JTidy I don't know any of them.
