R: Create a sequential counter for each unique value of a given column

Time: 2021-12-29 13:09:08

Let's say I have the following dataframe:

personid date measurement
1        x    23
1        x    32
2        y    21
3        x    23
3        z    23
3        y    23
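To make the example reproducible, the data frame above can be built in R as shown below. Storing `personid` and `measurement` as numbers and `date` as character strings is an assumption; the question does not say what types the columns have.

```r
# Example data from the question; the column types are assumptions
df <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23),
  stringsAsFactors = FALSE  # keep `date` as character on R < 4.0 as well
)
df
```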

I want to sort the unique values of the measurement column and then add a new column that gives each row the position of its measurement in that sorted sequence (equal measurements share a number), like so:

personid date measurement id
1        x    23          2
1        x    32          3
2        y    21          1
3        x    23          2
3        z    23          2
3        y    23          2

My first instinct was to do something like:

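# Mapping table: one row per distinct measurement, in ascending order
unique_measurements <- data.frame(measurement = unique(sort(df$measurement)))
# Number the rows 1, 2, 3, ...
unique_measurements$counter <- 1:nrow(unique_measurements)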

Now I basically have a data frame that maps each measurement to its counter. I realize this is probably the wrong way of doing this, but (1) how would I actually use this mapping to achieve my goal, and (2) what is the right way of doing this?
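The "sort first, then number the sorted values" idea can also be implemented directly. The sketch below uses a hypothetical helper, `sorted_counter()`, written only for illustration: it sorts the values, increments a counter each time the value changes, and writes the counters back to the original row positions. It assumes the column has no NA values and compares numbers for exact equality.

```r
# Hypothetical helper: number each value by its rank among the distinct sorted values
sorted_counter <- function(x) {
  o <- order(x)                                   # permutation that sorts x ascending
  s <- x[o]                                       # the sorted values
  run_id <- cumsum(c(TRUE, s[-1] != s[-length(s)]))  # +1 whenever the value changes
  id <- integer(length(x))
  id[o] <- run_id                                 # put counters back in original row order
  id
}

sorted_counter(c(23, 32, 21, 23, 23, 23))  # 2 3 1 2 2 2
```

With the question's data, `df$id <- sorted_counter(df$measurement)` would produce the desired `id` column.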

2 solutions

#1

Using factor as an intermediate:

# factor() levels are the sorted unique values; as.integer() gives each row's level index
df$id <- as.integer(factor(df$measurement))
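Why the factor trick works: by default `factor()` uses the sorted unique values as its levels, and `as.integer()` returns each element's position among those levels. A small check on the example values, written as a plain vector so the snippet runs on its own:

```r
measurement <- c(23, 32, 21, 23, 23, 23)
f <- factor(measurement)
levels(f)       # "21" "23" "32": the sorted unique values, stored as labels
as.integer(f)   # 2 3 1 2 2 2: each value's position among the levels

# Caveat: if the column were character, the levels would be sorted as text,
# so "10" would come before "9"
as.integer(factor(c("9", "10")))  # 2 1
```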

If you want to use your method, just use merge. Note that base merge() sorts the result by the join column, so the original row order is lost; to preserve it, use dplyr::left_join instead or, for data.table objects, merge(..., sort = FALSE).

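# Lookup table mapping each distinct measurement to its rank
unique_measurements <- data.frame(measurement = sort(unique(df$measurement)))
unique_measurements$id <- seq_len(nrow(unique_measurements))
# Joins on the shared "measurement" column; rows come back sorted by it
df <- merge(df, unique_measurements)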
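If you want to stay in base R but keep the original row order after merge(), one option is to carry a row index through the join and sort on it afterwards. This is a sketch; the helper column name `row_index`, the table name `lookup`, and the reconstructed example data are assumptions for illustration.

```r
# Example data reconstructed from the question (column types are assumptions)
df <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23),
  stringsAsFactors = FALSE
)

# Lookup table: one row per distinct measurement, numbered in sorted order
lookup <- data.frame(measurement = sort(unique(df$measurement)))
lookup$id <- seq_len(nrow(lookup))

# Remember each row's original position in a hypothetical helper column
df$row_index <- seq_len(nrow(df))

# Base merge() returns rows sorted by the join column...
merged <- merge(df, lookup, by = "measurement")

# ...so reorder by the saved index, then drop the helper column
merged <- merged[order(merged$row_index), ]
merged$row_index <- NULL
rownames(merged) <- NULL
merged
```

The join column ends up first in the result; reorder the columns afterwards if that matters.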

#2

Here's a simpler way to do this:

df$id <- match(df$measurement, sort(unique(df$measurement)))
df
#   personid date measurement id
# 1        1    x          23  2
# 2        1    x          32  3
# 3        2    y          21  1
# 4        3    x          23  2
# 5        3    z          23  2
# 6        3    y          23  2
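The match() approach and the factor approach both number each distinct value by its position among the sorted unique values (a "dense rank"), so they agree. A quick check, plus a variant that counts from the largest value instead, using the example measurements as a plain vector:

```r
measurement <- c(23, 32, 21, 23, 23, 23)

id_match  <- match(measurement, sort(unique(measurement)))
id_factor <- as.integer(factor(measurement))
identical(id_match, id_factor)  # TRUE for this data

# Counting from the largest value just reverses the lookup table
id_desc <- match(measurement, sort(unique(measurement), decreasing = TRUE))
id_desc  # 2 1 3 2 2 2
```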
