Assigning a smaller set of values to a larger set of values in R

I have two large vectors:

A: https://dl.dropbox.com/u/22681355/A.csv
B: https://dl.dropbox.com/u/22681355/B.csv

A has over 20000 entries but only 1350 unique values. B contains exactly 1350 random numbers, each between 1 and 9.

I would like to assign values from B to A such that equal values in A get the same value from B. For example, if A contains several 1s, each 1 should get the same number from B.

I have been using the A[B] command but after the 18000th entry I get NAs

What is the proper way of doing this?

code:

A<-read.csv("A.csv")
B<-read.csv("B.csv")

A[B]

1 solution

#1

read.csv() creates a data frame, not a vector.
You probably mean B[A], which for each element of A returns the value of B at the index given by that element's value. Since A's values range from 1 to 1899, some of them exceed B's size of 1349. For every element that falls outside the bounds of B, an NA is introduced.
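The out-of-bounds behaviour described above is easy to reproduce on a small scale. The sketch below uses made-up toy vectors (not the data from the question) in which one value of A is larger than the length of B:

```r
# Hypothetical toy data, chosen only to illustrate the NA behaviour
B <- c(4, 9, 2)        # three values available for lookup
A <- c(1, 3, 5, 1)     # 5 is larger than length(B)

# Indexing past the end of a vector does not raise an error in R;
# it silently yields NA for that position
looked_up <- B[A]
print(looked_up)
print(is.na(looked_up))
```

Only the third position, where A holds 5, should come back as NA; the other positions index B normally.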

The correct way to do what you want is:

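# read.table() defaults to header = FALSE, so every row is read as data
# and the single column is named V1
A = read.table("http://dl.dropbox.com/u/22681355/A.csv")
B = read.table("http://dl.dropbox.com/u/22681355/B.csv")
A = A$V1           # extract the column as a plain vector
B = B$V1
A = as.factor(A)   # one level per distinct value of A

# position of each element of A among its levels, used as an index into B
B[match(A,levels(A))]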

match(A,levels(A)) returns a vector of the same length as A that, for each element, contains the position of that element in the factor's levels, i.e. a number between 1 and 1350 (one per distinct value). If A were as.factor(c(1,1,3,5,5,7)), levels(A) would be c("1","3","5","7") and match(A,levels(A)) would be c(1,1,2,3,3,4), i.e. the position of each element in its levels.
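To see the whole lookup end to end, here is a minimal sketch on toy data. The vectors and the names f, idx and result are made up for illustration; they are not part of the original answer.

```r
# Hypothetical toy data: 6 entries in A, 4 distinct values, 4 values in B
A <- c(10, 10, 30, 50, 50, 70)
B <- c(2, 9, 4, 7)               # one value per distinct entry of A

f <- as.factor(A)                # levels are "10", "30", "50", "70"
idx <- match(f, levels(f))       # position of each entry among the levels
result <- B[idx]                 # equal entries of A receive the same B value

print(idx)
print(result)

# The integer codes stored inside a factor are these same positions,
# so as.integer(f) can be used in place of match(f, levels(f))
print(identical(idx, as.integer(f)))
```

Because every index is between 1 and the number of distinct values, the lookup stays within the bounds of B as long as B has at least that many elements.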
