Java - Extracting information from an HTML string using RegEx

Time: 2022-09-13 11:11:10

I am trying to extract information from the HTML code of a YouTube playlist page (playlist name, video names, video links).

I know that parsing HTML with regex is bad practice, but since this program is just for personal use and I only read one line per video in the playlist, it doesn't need to be very sophisticated.

As I said, there is basically only one line per video that I need.

Example:

<tr class="pl-video yt-uix-tile " data-video-id="VIDEO-ID" data-set-video-id="" data-title="TITLE"><td class="pl-video-handle "></td><td class="pl-video-index"></td><td class="pl-video-thumbnail"><a href="reflink inside palylist" class="ux-thumb-wrap yt-uix-sessionlink contains-addto pl-video-thumb"  data-sessionlink="sessionlink">    <span class="video-thumb  yt-thumb yt-thumb-72"

The only two pieces of information I need are VIDEO-ID and TITLE. My RegEx pattern looks like this so far:

Pattern pLine = Pattern.compile("<tr class=\"(?<line>.*)");

It finds exactly the lines I need, but every attempt to extract only TITLE and VIDEO-ID has given me no results :/

I'm sorry if this is a trivial question or one that shouldn't be asked here, but that is my situation so far. And no, this is NOT homework ;)

2 Answers

#1


3  

.*?data-video-id="(.*?)".*?data-title="(.*?)"

This should do it. Extract capture groups 1 and 2 from the match.

See demo.

http://regex101.com/r/lK9zP6/4
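
For use from Java, the quotes in that pattern have to be escaped inside the string literal. The following is a minimal sketch, not a definitive implementation: the class name `PlaylistLineDemo` and the helper `extract` are made up for illustration, and the sample line is a shortened version of the one in the question. Because `Matcher.find()` already searches anywhere in the input, the leading `.*?` is dropped here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original thread.
class PlaylistLineDemo {
    // Combined pattern from the answer, quotes escaped for a Java string literal.
    private static final Pattern VIDEO = Pattern.compile(
            "data-video-id=\"(.*?)\".*?data-title=\"(.*?)\"");

    /** Returns {videoId, title} for a playlist row, or null if the line does not match. */
    static String[] extract(String line) {
        Matcher m = VIDEO.matcher(line);
        // find() scans for the pattern anywhere in the line.
        if (!m.find()) {
            return null;
        }
        // group(1) is the video ID, group(2) is the title.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String line = "<tr class=\"pl-video yt-uix-tile \" data-video-id=\"VIDEO-ID\" "
                + "data-set-video-id=\"\" data-title=\"TITLE\"><td class=\"pl-video-handle \"></td>";
        String[] result = extract(line);
        System.out.println(result[0] + " | " + result[1]);
    }
}
```

Note that this pattern assumes `data-video-id` appears before `data-title` on the same line, as it does in the example from the question.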

#2


1  

The following expressions match the video ID and the title in your example.

ID: "data-video-id=\"([^\"]+)\""

Title: "data-title=\"([^\"]+)\""
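
As a rough sketch of how these two expressions might be applied, each one can be compiled once and run independently against a line, which also works if the attributes appear in a different order. The class `AttributeDemo` and the helpers `videoId`, `title`, and `firstGroup` are hypothetical names chosen for this example. Since `[^\"]+` requires at least one character, an empty attribute value yields no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original thread.
class AttributeDemo {
    private static final Pattern ID = Pattern.compile("data-video-id=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("data-title=\"([^\"]+)\"");

    // Returns the first capture group of the first match, or null if there is none.
    private static String firstGroup(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    static String videoId(String line) {
        return firstGroup(ID, line);
    }

    static String title(String line) {
        return firstGroup(TITLE, line);
    }

    public static void main(String[] args) {
        // Attribute order is swapped on purpose; each pattern is matched on its own.
        String line = "<tr class=\"pl-video yt-uix-tile \" data-title=\"TITLE\" "
                + "data-video-id=\"VIDEO-ID\" data-set-video-id=\"\">";
        System.out.println(videoId(line));
        System.out.println(title(line));
    }
}
```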
