How to create a data.table with additional rows based on a condition

Date: 2021-10-12 09:11:02

I would like to create a table from another table (a data.table) that has additional rows based on a condition. Let's say that in the following table I want to create an additional row whenever the indicator has more than two characters, i.e. nchar(indicator) > 2. The result should be the table below.


The source table looks like this:


    id  indicator
1   123 abc
2   456 NA
3   456 NA
4   456 NA
5   123 abcd
6   789 abc
library(data.table)
dt1 <- data.table(id = c(123, 456, 456, 456, 123, 789), indicator = c("abc", NA, NA, NA, "abcd", "abc"))

Resulting table should look like this:


    id  indicator
1   123 abc
2   123 abc2
3   456 NA
4   456 NA
5   456 NA
6   123 abcd
7   123 abcd2
8   789 abc
9   789 abc2
dt2 <- data.table(id=c(123,123, 456, 456, 456, 123,123,789, 789), indicator = c("abc", "abc2", NA, NA, NA, "abcd", "abcd2", "abc", "abc2"))

1 solution

#1



EDIT: cleaner version, courtesy of Arun (note that a key argument has been added to the data.table creation):


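dt1 <- data.table(
  id = c(123, 456, 456, 456, 123, 789),
  indicator = c("abc", NA, NA, NA, "abcd", "abc"),
  key = c("id", "indicator")
)
# Within each (indicator, id) group, indicator is a single value.
# nchar(NA) returns NA in current R, so test is.na() first to avoid if (NA).
dt1[,
  list(indicator =
    if (!is.na(indicator) && nchar(indicator) > 2)
      paste0(indicator, c("", 2:(max(2, .N))))  # "abc", "abc2", ...
    else
      rep(indicator, .N)                        # short or NA: rows unchanged
  ),
  by = list(indicator, id)
][, -1, with = FALSE]                           # drop the grouping copy of indicator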
#     id indicator
# 1: 123       abc
# 2: 123      abc2
# 3: 123      abcd
# 4: 123     abcd2
# 5: 456        NA
# 6: 456        NA
# 7: 456        NA
# 8: 789       abc
# 9: 789      abc2                    
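The grouped output above comes back sorted by the key (id, indicator), whereas dt2 in the question keeps the original row order. If that order matters, one alternative is to expand rows by index instead of grouping. This is a minimal sketch, not part of the original answer; it assumes each qualifying row gets exactly one extra copy suffixed with "2", and the names long, idx, dup and res are illustrative only.

```r
library(data.table)

dt1 <- data.table(id = c(123, 456, 456, 456, 123, 789),
                  indicator = c("abc", NA, NA, NA, "abcd", "abc"))

# TRUE where a row should get an extra copy; NA indicators are never expanded
long <- !is.na(dt1$indicator) & nchar(dt1$indicator) > 2

# Repeat each row once, or twice when `long` is TRUE; original order is kept
idx <- rep(seq_len(nrow(dt1)), times = ifelse(long, 2L, 1L))
res <- dt1[idx]

# A row is the extra copy when its source index equals the previous one
dup <- c(FALSE, idx[-1] == idx[-length(idx)])
res[dup, indicator := paste0(indicator, "2")]
res
```

Because rows are only duplicated in place, the result lines up with dt2 row for row, including the three NA rows for id 456.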

Old version


There is probably a more elegant way, but this will do it. Basically, you rbind the rows that don't meet your condition with those that do, the latter modified by appending a numeric suffix ("" for the first one). Note that if you have non-unique id/indicator pairs, the duplicates are simply numbered rather than each getting an extra row: e.g. 123-abc, 123-abc ends up as 123-abc, 123-abc2.


dt1 <- data.table(id = c(123, 456, 456, 456, 123, 789), indicator = c("abc", NA, NA, NA, "abcd", "abc"))
rbind(
  # rows that do not meet the condition are kept unchanged
  dt1[nchar(indicator) <= 2 | is.na(indicator)],
  # rows that do: suffixes "", "2", ... within each (indicator, id) group
  dt1[
    nchar(indicator) > 2,
    list(indicator = paste0(indicator, c("", 2:(max(2, .N))))),
    by = list(indicator, id)
  ][, -1, with = FALSE]
)[order(id, indicator)]
#     id indicator
# 1: 123       abc
# 2: 123      abc2
# 3: 123      abcd
# 4: 123     abcd2
# 5: 456        NA
# 6: 456        NA
# 7: 456        NA
# 8: 789       abc
# 9: 789      abc2                    
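To see how the old version treats a duplicated id/indicator pair, it can be re-run on a small hypothetical table in which (123, "abc") appears twice (dt_dup and res are illustrative names, not from the original). Since 2:(max(2, .N)) stops at .N once .N is 2 or more, the two rows come back as "abc" and "abc2" and no third row is added.

```r
library(data.table)

# Hypothetical input: the pair (123, "abc") appears twice
dt_dup <- data.table(id = c(123, 123, 456),
                     indicator = c("abc", "abc", NA))

# Same code as the old version above, applied to dt_dup
res <- rbind(
  dt_dup[nchar(indicator) <= 2 | is.na(indicator)],
  dt_dup[
    nchar(indicator) > 2,
    list(indicator = paste0(indicator, c("", 2:(max(2, .N))))),
    by = list(indicator, id)
  ][, -1, with = FALSE]
)[order(id, indicator)]
res
```

If every qualifying row should instead gain one extra row regardless of duplicates, replacing 2:(max(2, .N)) with 2:(.N + 1) would be one way to get that.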
