Remove an entire column from a data.frame

Does anyone know how to remove an entire column from a data.frame in R? For example, suppose I am given this data.frame:

> head(data)
   chr       genome region
1 chr1 hg19_refGene    CDS
2 chr1 hg19_refGene   exon
3 chr1 hg19_refGene    CDS
4 chr1 hg19_refGene   exon
5 chr1 hg19_refGene    CDS
6 chr1 hg19_refGene   exon

and I want to remove the 2nd column.

5 Solutions

#1

You can set it to NULL.

> Data$genome <- NULL
> head(Data)
   chr region
1 chr1    CDS
2 chr1   exon
3 chr1    CDS
4 chr1   exon
5 chr1    CDS
6 chr1   exon

As pointed out in the comments, here are some other possibilities:

Data[2] <- NULL    # Wojciech Sobala
Data[[2]] <- NULL  # same as above
Data <- Data[,-2]  # Ian Fellows
Data <- Data[-2]   # same as above

You can remove multiple columns via:

Data[1:2] <- list(NULL)  # Marek
Data[1:2] <- NULL        # does not work!

Be careful with matrix-subsetting though, as you can end up with a vector:

Data <- Data[,-(2:3)]             # vector
Data <- Data[,-(2:3),drop=FALSE]  # still a data.frame
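
The difference between list-style and matrix-style indexing can be checked with a minimal sketch. It rebuilds the six rows shown in the question under the name Data used in this answer; the result names (one_dropped, as_vector, still_df) are only illustrative.

```r
# Rebuild the example data from the question (first six rows only).
Data <- data.frame(
  chr    = rep("chr1", 6),
  genome = rep("hg19_refGene", 6),
  region = rep(c("CDS", "exon"), 3)
)

# List-style indexing always returns a data.frame.
one_dropped <- Data[-2]
names(one_dropped)            # "chr" "region"

# Matrix-style indexing collapses to a vector when one column is left.
as_vector <- Data[, -(2:3)]
is.data.frame(as_vector)      # FALSE

# drop = FALSE keeps the data.frame structure.
still_df <- Data[, -(2:3), drop = FALSE]
is.data.frame(still_df)       # TRUE
```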

#2

To remove one or more columns by name, when the column names are known (as opposed to being determined at run-time), I like the subset() syntax. E.g. for the data-frame

Data <- data.frame(a=1:3, d=2:4, c=3:5, b=4:6)

to remove just the a column you could do

Data <- subset( Data, select = -a )

and to remove the b and d columns you could do

Data <- subset( Data, select = -c(d, b ) )

You can remove all columns between d and b with:

Data <- subset( Data, select = -c( d : b ) )

As I said above, this syntax works only when the column names are known in advance. It won't work when, say, the column names are determined programmatically (i.e. stored in a variable). I'll reproduce this Warning from the ?subset documentation:

Warning:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like '[', and in particular the non-standard evaluation of argument 'subset' can have unanticipated consequences.
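
For the programmatic case the warning refers to, one possible approach with the standard '[' operator is sketched below. It assumes the column names arrive in a character vector; to_remove and kept are hypothetical names chosen for illustration.

```r
Data <- data.frame(a = 1:3, d = 2:4, c = 3:5, b = 4:6)

# Column names held in a variable, e.g. computed at run time.
to_remove <- c("d", "b")

# setdiff() keeps every name that is not in to_remove; drop = FALSE
# guards against the result collapsing to a vector.
kept <- Data[, setdiff(names(Data), to_remove), drop = FALSE]
names(kept)   # "a" "c"
```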

#3

The posted answers work well for data.frames. However, these operations can be quite inefficient from a memory perspective. With large data, removing a column can take an unusually long time and/or fail with out-of-memory errors. The data.table package helps address this problem with the := operator:

> library(data.table)
> dt <- data.table(a = 1, b = 1, c = 1)
> dt[,a:=NULL]
     b c
[1,] 1 1
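
The same operator can also remove several columns in one call. The sketch below assumes the data.table package is installed; it is a small illustration, not a memory or timing comparison, and cols is a hypothetical variable name.

```r
library(data.table)  # assumes the package is installed

dt <- data.table(a = 1, b = 1, c = 1, d = 1)

# Remove several columns by reference; dt is not reassigned.
dt[, c("a", "b") := NULL]
names(dt)   # "c" "d"

# Column names held in a variable: the parentheses make data.table
# evaluate cols instead of treating it as a column called "cols".
cols <- "c"
dt[, (cols) := NULL]
names(dt)   # "d"
```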

I should put together a bigger example to show the differences. I'll update this answer at some point with that.

#4

(For completeness) If you want to remove columns by name, you can do this:

cols.dont.want <- "genome"
# or, to remove multiple columns:
cols.dont.want <- c("genome", "region")

# keep only the columns whose names are not listed in cols.dont.want
data <- data[, ! names(data) %in% cols.dont.want, drop = FALSE]

Including drop = FALSE ensures that the result will still be a data.frame even if only one column remains.
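
The same idea can be wrapped in a small function. This is only a sketch: drop_columns() is a hypothetical helper, not part of base R, and the two-row data.frame is a cut-down stand-in for the question's data.

```r
# drop_columns() is a hypothetical helper, not a base R function.
drop_columns <- function(x, cols) {
  # Keep columns whose names are not in cols; never collapse to a vector.
  x[, !names(x) %in% cols, drop = FALSE]
}

data <- data.frame(
  chr    = c("chr1", "chr1"),
  genome = c("hg19_refGene", "hg19_refGene"),
  region = c("CDS", "exon")
)

res <- drop_columns(data, c("genome", "region"))
is.data.frame(res)  # TRUE
names(res)          # "chr"
```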

#5

With this you can remove the column and store the result in another variable, leaving the original data.frame unchanged.

df = subset(data, select = -c(genome) )
