Assigning a smaller number of values to a larger number of values in R

Time: 2022-10-16 09:25:44

I have two large vectors:

A: https://dl.dropbox.com/u/22681355/A.csv
B: https://dl.dropbox.com/u/22681355/B.csv

A has over 20000 entries but only 1350 unique values. B contains 1350 random numbers, each between 1 and 9.

I would like to assign values from B to A such that equal values in A get the same value from B, e.g. if there are multiple 1's, each 1 should get the same number from B.

I have been using A[B], but after the 18000th entry I get NAs.

What is the proper way of doing this?

code:

A<-read.csv("A.csv")
B<-read.csv("B.csv")

A[B]

1 solution

#1


  1. read.csv() creates a data frame, not a vector.
  2. You probably mean B[A], which for each element of A returns the value of B at the index given by that element's value. Since A's values range from 1 to 1899, they exceed B's size of 1349. For the elements that fall outside the bounds of B, NAs are introduced.
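
To illustrate the second point, here is a minimal sketch with made-up toy vectors (the values below are hypothetical and not taken from A.csv or B.csv). Indexing a vector past its length does not raise an error in R; it quietly yields NA:

```r
# Hypothetical toy data: B has only 3 entries, but A contains the value 5.
B <- c(4, 9, 2)
A <- c(1, 3, 5, 1)

# B[A] uses each value of A as a position in B.
# Position 5 does not exist in B, so that element comes back as NA.
out <- B[A]
print(out)  # 4 2 NA 4
```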

The correct way to do what you want is:

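A = read.table("http://dl.dropbox.com/u/22681355/A.csv")  # no header row, so the single column is named V1
B = read.table("http://dl.dropbox.com/u/22681355/B.csv")
A = A$V1           # extract the column as a plain vector
B = B$V1
A = as.factor(A)   # one level per distinct value of A

B[match(A,levels(A))]  # look up B by each element's level position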

match(A,levels(A)) returns a vector of the same length as A in which each element is the position of the corresponding element of A within the factor's levels, i.e. a number between 1 and 1350 (one per distinct value). If A were as.factor(c(1,1,3,5,5,7)), levels(A) would be c("1","3","5","7") and match(A,levels(A)) would be c(1,1,2,3,3,4), i.e. the position of each element in its levels.
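
The small example above can be run end to end as a sketch. The four values in B here are invented for illustration; only the A vector comes from the explanation:

```r
# A is the toy factor from the explanation; B is a hypothetical lookup
# with one value per distinct level of A.
A <- as.factor(c(1, 1, 3, 5, 5, 7))
B <- c(8, 2, 6, 4)

# Position of each element of A within levels(A): 1 1 2 3 3 4
idx <- match(A, levels(A))

# Equal values in A now receive the same value from B.
assigned <- B[idx]
print(assigned)  # 8 8 2 6 6 4

# For a factor, the integer codes give the same positions.
same_as_codes <- identical(idx, as.integer(A))
```

Since a factor already stores these positions internally, as.integer(A) should give the same index vector as match(A,levels(A)).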
