I am trying to subset data, using names of work and test set
ws_data <- subset(data, grepl(paste0("v*[0-9]_",ws_names, collapse="|" ),
rownames(data))==TRUE)
It seems to work OK, but row names like
"(Difluoromethoxy)trifluoromethane"
are just skipped. Are parentheses allowed in legal names in R? How can I solve this problem without changing the row names? Thanks in advance!
An example of the data:
64 | v0064_(Chloro)(trifluor)omethane | -51.5 | 510.9 | 104.5 | 11.2 |
65 | v0067_(Dichloro)difluoromethane | -81.0 | 233.0 | 121.0 | 16.1 |
Regular expressions
rownames(ts)[1]
[1] "Bromotrifluoromethane"
rownames(data)[1]
[1] "v0001_Bromotrifluoromethane"
grepl("v[0-9]*_Bromotrifluoromethane", rownames(data)[1])
[1] TRUE
grepl("v*[0-9]_Bromotrifluoromethane", rownames(data)[1])
[1] TRUE
2 solutions
#1
1
I'm guessing the problem you're facing is that parentheses have a special meaning in regular expressions. This post has a cure for that, which you can use to do something like this:
quotemeta <- function(x) gsub("([^A-Za-z_0-9])", "\\\\\\1", x)
data[grepl(paste0("^v[0-9]*_", quotemeta(ws_names), collapse="|"), rownames(data)), ]
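To see why the unescaped pattern skips those rows, here is a minimal sketch. The three row names and the contents of ws_names are made up for illustration, modelled on the names in the question. An unescaped "(Dichloro)" is read as a regex group, so the pattern looks for "Dichloro" with no literal parentheses around it.

```r
# Hypothetical row names and working-set names, modelled on the question
rn <- c("v0001_Bromotrifluoromethane",
        "v0064_(Chloro)(trifluor)omethane",
        "v0067_(Dichloro)difluoromethane")
ws_names <- c("Bromotrifluoromethane", "(Dichloro)difluoromethane")

# Unescaped: "(" and ")" act as grouping operators, not literal characters
naive <- grepl(paste0("^v[0-9]*_", ws_names, collapse = "|"), rn)

# Escaped: every character outside [A-Za-z_0-9] gets a leading backslash
quotemeta <- function(x) gsub("([^A-Za-z_0-9])", "\\\\\\1", x)
escaped <- grepl(paste0("^v[0-9]*_", quotemeta(ws_names), collapse = "|"), rn)

print(naive)    # the parenthesised name is missed
print(escaped)  # the parenthesised name is matched
```

With these made-up names, the naive pattern only finds the row without parentheses, while the escaped pattern also finds "v0067_(Dichloro)difluoromethane".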
#2
2
In general, names and row names can contain characters like that; you just need to quote them when using them. I think the problem here is the subset function: it allows some unusual ways to specify the subset, which makes some things easier but others harder. It tries to figure out what you mean by the row names (rather than just taking them as literal strings), and the parentheses are probably confusing that process.
Try something like:
data[ grepl( paste0("v*[0-9]_",ws_names, collapse="|" ), rownames(data)), ]
You may also be able to simplify this using %in% if you can construct the list of names.
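One possible way to do the %in% version, as a rough sketch: strip the prefix from the row names and compare the remainder as literal strings, so no regex escaping is needed for the compound names. This assumes every row name starts with "v", some digits and an underscore; the data frame and ws_names below are made up for illustration.

```r
# Hypothetical data frame with row names modelled on the question
rn <- c("v0001_Bromotrifluoromethane",
        "v0064_(Chloro)(trifluor)omethane",
        "v0067_(Dichloro)difluoromethane")
data <- data.frame(value = c(1, 2, 3), row.names = rn)
ws_names <- c("Bromotrifluoromethane", "(Dichloro)difluoromethane")

# Drop the assumed "v<digits>_" prefix, leaving the bare compound name
bare <- sub("^v[0-9]+_", "", rownames(data))

# %in% compares literal strings, so parentheses need no special handling
keep <- bare %in% ws_names
ws_data <- data[keep, , drop = FALSE]

print(rownames(ws_data))
```

The drop = FALSE is only there because this toy data frame has a single column; with several columns, data[keep, ] behaves the same.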
Also see fortune(69): the ==TRUE is redundant, and slightly less useful than adding 0 or multiplying by 1.