Scraping data with R when the div class differs for the same attribute

Date: 2022-06-09 08:53:19

I am trying to extract a full description from a website, where it sits in a div whose class name varies from item to item (a class starting with desc-full).

The class changes between items even though the data belongs in the same column. For other div classes, which do not change, I am using the following R code:

#get the beer IBU
num_ibu <- html_nodes(webpage, ".ibu")
num_ibu <- as.character(html_text(num_ibu))

My question is how to modify this code to find a div class like '.desc-full'. I tried full_desc <- html_nodes(webpage, ".desc-full*"), only to receive the following error:

Error in parse_simple_selector(stream) : 
  Expected selector, got <DELIM '*' at 11>

I am having a hard time finding a LIKE-style (wildcard) match that works in html_nodes. Is this a case where I should use regex? That feels like overkill.

1 Solution

#1

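library(rvest)

webpage <- "https://untappd.com/beer/top_rated?country_id=86"

# html_session() was renamed to session() in rvest 1.0.0
sess <- html_session(webpage)
all_desc_nodes <- html_nodes(sess, ".desc")
# keep only the nodes whose HTML mentions "desc-full"
full_desc_nodes <- all_desc_nodes[grep("desc-full", all_desc_nodes)]
full_desc_text <- html_text(full_desc_nodes)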
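
As a possible alternative to filtering with grep, a CSS attribute selector can do the partial match inside html_nodes itself. CSS class selectors have no wildcard, which is why ".desc-full*" fails to parse, but [class*='...'] matches on a substring of the class attribute. The sketch below runs on made-up inline markup rather than the live page; the class names desc-full-101 and so on are hypothetical, and the real page may be structured differently.

```r
library(rvest)

# Hypothetical markup standing in for the real page:
# the numeric suffix of the class differs for every beer.
page <- read_html('<html><body>
  <div class="beer">
    <div class="desc desc-full-101">Hoppy and bitter.</div>
    <div class="desc desc-short-101">Hoppy...</div>
  </div>
  <div class="beer">
    <div class="desc desc-full-202">Dark and malty.</div>
    <div class="desc desc-short-202">Dark...</div>
  </div>
</body></html>')

# [class*='desc-full'] selects divs whose class attribute
# contains the substring "desc-full" anywhere.
full_desc_nodes <- html_nodes(page, "div[class*='desc-full']")
full_desc <- html_text(full_desc_nodes, trim = TRUE)
print(full_desc)
```

If the varying class is always the first or only class on the element, [class^='desc-full'] (starts with) would be a stricter choice. The substring form is more forgiving when other classes come first.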
