How can I use R to determine which kmeans cluster a particular number belongs to?

Time: 2020-12-08 04:02:36

This is my vector before kmeans:

> sort(table(mydata))
mydata
23  7  9  4 10  3  5  8  2  1 
 1  3  3  4  5  6  6  6  7  9

km <- kmeans(mydata, centers = 10) 

After kmeans:

> sort(table(km$cluster))
km$cluster
 1  6  7  3  5  2  4 10  8  9 
 1  3  3  4  5  6  6  6  7  9 

Judging by the matching counts, all my 1s are stored in cluster 9, all 2s are stored in cluster 8, and so on.

Can I use R to find which cluster a particular number belongs to? For example, which cluster are my 1s in?

1 answer

#1


4  

The values for $cluster are returned in the same order as your original data.

mydata <- rep(c(23,7,9,4,10,3,5,8,2,1), c(1,3,3,4,5,6,6,6,7,9))
sort(table(mydata))
# mydata
# 23  7  9  4 10  3  5  8  2  1 
#  1  3  3  4  5  6  6  6  7  9 

km <- kmeans(mydata, centers = 10) 
unique(cbind(value=mydata, clust=km$cluster))
#       value clust
#  [1,]    23     9
#  [2,]     7     5
#  [3,]     9     7
#  [4,]     4     4
#  [5,]    10     1
#  [6,]     3    10
#  [7,]     5     2
#  [8,]     8     8
#  [9,]     2     6
# [10,]     1     3

Here I've just re-joined the two with cbind and used unique to eliminate all the duplicates, since your data is so discrete.
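Because `km$cluster` lines up element by element with `mydata`, you can also look up a single value directly with a logical index. The following is a minimal sketch under that assumption. The `set.seed(42)` call and the names `cluster_of_ones` and `value_to_cluster` are illustrative additions, not part of the original answer, and the cluster labels you get will depend on the random start.

```r
# Same data as above: each distinct value repeated a known number of times.
mydata <- rep(c(23, 7, 9, 4, 10, 3, 5, 8, 2, 1), c(1, 3, 3, 4, 5, 6, 6, 6, 7, 9))

set.seed(42)  # assumption: fixed seed only so the cluster labels are repeatable
km <- kmeans(mydata, centers = 10)

# km$cluster[i] is the cluster of mydata[i], so a logical index picks out
# the labels of every observation equal to 1.
cluster_of_ones <- unique(km$cluster[mydata == 1])
cluster_of_ones

# Lookup list for all values at once: cluster label(s) per distinct value.
value_to_cluster <- lapply(split(km$cluster, mydata), unique)
value_to_cluster[["1"]]
```

If a value ever maps to more than one label, `unique()` returns several numbers. That cannot happen with identical observations, but it is a useful check if you later bin continuous data.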
