Adding a column to an R data.frame using a function

Time: 2022-12-25 22:50:43

I am trying to write a function in R which lumps species columns together within a data.frame.

(To elaborate a bit on what I'm doing: I have a data frame with multiple plant species for multiple sites and years. Some of the species were misidentified, so I'd like to group them at a more general level. For example, spp a and spp b were mixed up throughout the years, so I'd like to create a new column called spp.ab in which the data for spp a and spp b are lumped together.)

Example:

spp.a spp.b
  1     0
  2     3
  0     4
  3     2
  4     5

I'd like to eventually end up with a single column that displays the maximum value from the two species:

spp.ab
  1
  3
  4
  3
  5
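
For anyone who wants to reproduce this, the example can be built as follows. This is only a minimal sketch: the name db matches the question, the values are the ones in the tables above, and expected is a hypothetical helper name for the desired result.

```r
# Example data from the tables above
db <- data.frame(spp.a = c(1, 2, 0, 3, 4),
                 spp.b = c(0, 3, 4, 2, 5))

# Desired result: the row-wise maximum of the two species columns
# ('expected' is a hypothetical name used only for illustration)
expected <- data.frame(spp.ab = c(1, 3, 4, 3, 5))
```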

I've started writing a function which does this; however, I'm having trouble adding the new column to my data set and dropping the old ones. Could someone tell me what's wrong with my code?

lump <- function(db, spp.list, new.spp) { #input spp.list as c('spp.a', 'spp.b', ...)
  mini.db <- subset(db, select=spp.list);
  newcol <- as.vector(apply(mini.db, 1, max, na.rm=T));
  db$new.spp <- newcol
  db <- db[,names(db) %in% spp.list]
  return(db)
}

When I call the function as such

test <- lump(db, c('spp.a', 'spp.b'), spp.ab)
test

all that pops up is the mini.db. Am I missing something with return()?

For reference, db is the database, spp.list is the species I want to lump together, and new.spp is what I would like the new column named.

Thanks for any help,
Paul

2 solutions

#1


I've figured it out...stupid mistake, of course. Here is the code that works:

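lump <- function(db, spp.list, new.spp) {
    # spp.list: character vector of columns to lump, e.g. c('spp.a', 'spp.b', ...)
    # new.spp:  name of the new column; must be in quotes (e.g. 'new.spp')
    mini.db <- subset(db, select = spp.list)
    # Row-wise maximum; a row that is entirely NA yields -Inf, reset to NA below
    newcol <- as.vector(apply(mini.db, 1, max, na.rm = TRUE))
    newcol[newcol == -Inf] <- NA
    db[new.spp] <- newcol
    # Keep every column that is NOT in spp.list
    db <- db[, !names(db) %in% spp.list]
    return(as.data.frame(db))
}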

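The key is the line db[new.spp] <- newcol. This works, whereas db$new.spp <- newcol does not: $ takes the name literally, so it creates a column called new.spp instead of using the string stored in the new.spp argument. I also added a ! to the line db <- db[, !names(db) %in% spp.list]. Without it, the subset kept only the columns in spp.list, which is why only mini.db came back. This was my biggest mistake.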
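
The difference between the two assignment forms can be seen in a small sketch. The data values and the names by_dollar and by_bracket are made up for illustration; only db, new.spp and newcol come from the code above.

```r
db <- data.frame(spp.a = c(1, 2, 0), spp.b = c(0, 3, 4))
new.spp <- "spp.ab"              # desired column name, held in a variable
newcol <- c(1, 3, 4)             # stand-in for the computed maxima

by_dollar <- db                  # hypothetical name, for illustration only
by_dollar$new.spp <- newcol      # $ does not evaluate new.spp
names(by_dollar)                 # "spp.a" "spp.b" "new.spp"

by_bracket <- db                 # hypothetical name, for illustration only
by_bracket[new.spp] <- newcol    # [ ] uses the string held in new.spp
names(by_bracket)                # "spp.a" "spp.b" "spp.ab"
```

Using db[[new.spp]] <- newcol should behave the same way as the single-bracket form here.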

#2


While it seems like you've found your answer, I would suggest, instead, the pmax function:

> with(db, pmax(spp.a, spp.b))
[1] 1 3 4 3 5

You can use this with within or transform to mimic your function:

out <- within(db, spp.ab <- pmax(spp.a, spp.b))
out
#   spp.a spp.b spp.ab
# 1     1     0      1
# 2     2     3      3
# 3     0     4      4
# 4     3     2      3
# 5     4     5      5
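
If the column names need to be passed in as strings, as in the original lump function, pmax can be combined with do.call. The following is only a sketch under some assumptions: lump_pmax is a hypothetical name, the site column is invented to show that other columns are kept, and the species columns are assumed to be numeric.

```r
# lump_pmax is a hypothetical helper, not part of the answers above
lump_pmax <- function(db, spp.list, new.spp) {
  # pmax expects the columns as separate arguments, so pass them via do.call
  cols <- unname(as.list(db[spp.list]))
  db[[new.spp]] <- do.call(pmax, c(cols, na.rm = TRUE))
  # drop = FALSE keeps a data.frame even if only one column remains
  db[, !names(db) %in% spp.list, drop = FALSE]
}

# 'site' is an invented column, included only to show that it is preserved
db <- data.frame(site  = c("s1", "s2", "s3", "s4", "s5"),
                 spp.a = c(1, 2, 0, 3, 4),
                 spp.b = c(0, 3, 4, 2, 5))

out <- lump_pmax(db, c("spp.a", "spp.b"), "spp.ab")
names(out)    # "site" "spp.ab"
out$spp.ab    # 1 3 4 3 5
```

Because pmax is vectorised over whole columns, this avoids the row-by-row apply call.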
