How do I assign the class with the highest frequency to each row of a data.frame in R?

Time: 2022-12-30 22:54:29

I have the following table:

mymatrix <- matrix(c(34,11,65,32,12,9,32,90,21,51,45,23), ncol=3)
colnames(mymatrix) <- c("pos", "neg", "neutr") # class
rownames(mymatrix) <- c("1 -1 0", "-1 -1 0", "0 -1 1", "0 0 1") # patterns
mytable <- as.table(mymatrix)
mytable

#         pos neg neutr
# 1 -1 0   34  12    21
# -1 -1 0  11   9    51
# 0 -1 1   65  32    45
# 0 0 1    32  90    23

Now I have new data with three columns. Each row contains one of the patterns "1 -1 0", "-1 -1 0", "0 -1 1", and "0 0 1". For example, my new data looks like this:

one <- c(  1,  1,  0, -1, 0,  1, 1)
two <- c( -1, -1, -1, -1, 0, -1, -1)
three <- c(0,  0,  1,  0, 1,  0, 0)
mydf <- data.frame(one, two, three)
mydf

#   one two three
# 1   1  -1     0
# 2   1  -1     0
# 3   0  -1     1
# 4  -1  -1     0
# 5   0   0     1
# 6   1  -1     0
# 7   1  -1     0

Now I want to add a fourth column to mydf that assigns a class (pos, neg, neutr) to each row. Each row should get the class with the highest frequency for its pattern in mytable.

It should look like this:

#   one two three    four
# 1   1  -1     0    pos  # (because for this pattern (1 -1 0), "pos" has the highest frequency in mytable)
# 2   1  -1     0    pos
# 3   0  -1     1    pos
# 4  -1  -1     0    neutr
# 5   0   0     1    neg
# 6   1  -1     0    pos
# 7   1  -1     0    pos

How can I do that?

Thank you!

1 Solution

#1


First, build a mapping from each pattern (the row names of mytable) to its most frequent class. Then look up the mapped class for each row of mydf:

maxes = apply(mytable, 1, function(x) colnames(mytable)[which.max(x)])
mydf$four = maxes[match(paste(mydf$one, mydf$two, mydf$three), rownames(mytable))]
mydf
# mydf
#   one two three  four
# 1   1  -1     0   pos
# 2   1  -1     0   pos
# 3   0  -1     1   pos
# 4  -1  -1     0 neutr
# 5   0   0     1   neg
# 6   1  -1     0   pos
# 7   1  -1     0   pos
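
As a variation on the same idea, here is a minimal sketch that uses base R's max.col() to find the winning column per row and a named vector for the lookup. The name `lookup` is just an illustrative choice, and `ties.method = "first"` is an assumption about how ties should be broken (the default for max.col() is "random"). Patterns in mydf that do not occur in the table would come back as NA.

```r
# Frequency table from the question: rows are patterns, columns are classes
mymatrix <- matrix(c(34,11,65,32,12,9,32,90,21,51,45,23), ncol = 3)
colnames(mymatrix) <- c("pos", "neg", "neutr")
rownames(mymatrix) <- c("1 -1 0", "-1 -1 0", "0 -1 1", "0 0 1")

# New data from the question
mydf <- data.frame(one   = c( 1,  1,  0, -1, 0,  1,  1),
                   two   = c(-1, -1, -1, -1, 0, -1, -1),
                   three = c( 0,  0,  1,  0, 1,  0,  0))

# Named vector: pattern -> class with the highest frequency
# (ties.method = "first" is an assumed tie-breaking rule)
lookup <- setNames(colnames(mymatrix)[max.col(mymatrix, ties.method = "first")],
                   rownames(mymatrix))

# Build the same "a b c" key for each row and index the lookup by name
key <- paste(mydf$one, mydf$two, mydf$three)
mydf$four <- unname(lookup[key])
mydf
```

Indexing by name avoids the explicit match() call, at the cost of requiring that the pasted keys are formatted exactly like the row names of the table.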
