How can I use HTML parsing and curl in Java to accomplish this task?

Time: 2022-09-06 12:04:43

I'm trying to write a program that reads company names from a text file and searches for each one on a search engine website (the SEC's EDGAR search). Each search usually returns 1-10 unique result links, and I want to use curl to follow the link that matches the company name. The linked page has a brief summary containing the term "state of incorporation:" followed by the state name, which I'm hoping to parse out. I'm having trouble understanding how to use HTML parsing and curl and their classes. I would appreciate any help, such as a brief outline of steps or any advice at all. Thanks.

1 solution

#1


Assuming the HTML is fairly basic, use something like the Mozilla Java HTML Parser to turn each page into a DOM; its getting started guide gives more detail on creating the DOM. Java has built-in APIs for downloading content from the web, and these will likely be sufficient for you, so you probably don't need "curl" at all.
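
As a rough sketch of the download step, the JDK's own `java.net.http.HttpClient` (Java 11 or later) can stand in for curl. The class name `PageFetcher`, the `SEARCH_BASE` URL, and the `User-Agent` value below are placeholders I made up for illustration; you would substitute the real search URL and whatever query parameters that site expects.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    // Hypothetical endpoint, NOT the real EDGAR URL; replace with the actual search URL.
    static final String SEARCH_BASE = "https://example.com/search?company=";

    /** Builds the search URI for one company name, URL-encoding the name. */
    static URI buildSearchUri(String companyName) {
        String encoded = URLEncoder.encode(companyName, StandardCharsets.UTF_8);
        return URI.create(SEARCH_BASE + encoded);
    }

    /** Downloads the page at the given URI and returns its body as text. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                // Placeholder value; many sites expect a descriptive User-Agent.
                .header("User-Agent", "example-app contact@example.com")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // Without arguments, only show the URI that would be requested.
            System.out.println(buildSearchUri("Acme Widgets & Co"));
            return;
        }
        System.out.println(fetch(buildSearchUri(args[0])));
    }
}
```

Encoding the company name matters because names often contain spaces or `&`, which would otherwise break the query string. The returned page text is what you would then hand to the HTML parser.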

Once you have a DOM, you can use the standard DOM APIs to navigate to the links and elements you want.
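
Here is a minimal sketch of that navigation using only the standard `org.w3c.dom` interfaces. To keep it self-contained, it parses a small well-formed snippet with the JDK's XML parser as a stand-in for a real HTML parser; real pages are rarely well-formed, so in practice the `Document` would come from your HTML parser instead. The class name, helper names, sample markup, and the regular expression (which assumes the state appears as plain letters right after "state of incorporation:") are all my own assumptions, not something taken from the actual pages.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomNavigation {
    // Assumed layout: the label is followed directly by the state as letters.
    static final Pattern STATE = Pattern.compile(
            "state of incorporation:\\s*([A-Za-z]+(?: [A-Za-z]+)*)",
            Pattern.CASE_INSENSITIVE);

    /** Stand-in for an HTML parser: only handles well-formed (XHTML-like) markup. */
    static Document parseWellFormed(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed", e);
        }
    }

    /** Returns the href of every <a> whose text contains the company name, ignoring case. */
    static List<String> linksMentioning(Document doc, String companyName) {
        String needle = companyName.toLowerCase(Locale.ROOT);
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            if (anchor.getTextContent().toLowerCase(Locale.ROOT).contains(needle)) {
                hrefs.add(anchor.getAttribute("href"));
            }
        }
        return hrefs;
    }

    /** Searches the page text for the label and returns the value after it, if any. */
    static Optional<String> stateOfIncorporation(Document doc) {
        Matcher m = STATE.matcher(doc.getDocumentElement().getTextContent());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample combining a result link and a summary line.
        String page = "<html><body>"
                + "<a href=\"/a\">Acme Corp</a><a href=\"/b\">Other Inc</a>"
                + "<p>State of incorporation: <b>DE</b></p>"
                + "</body></html>";
        Document doc = parseWellFormed(page);
        System.out.println(linksMentioning(doc, "acme"));
        System.out.println(stateOfIncorporation(doc).orElse("not found"));
    }
}
```

In a real run you would call `linksMentioning` on the search results page, download the chosen link, and then call `stateOfIncorporation` on that second page. Matching on the page's text content is deliberately loose; if the summary has a stable element structure, selecting that element first would be more robust.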
