What is the most efficient way to perform the same operation on multiple data frames?

Time: 2021-11-18 11:00:57

My apologies if this is a duplicate; I couldn't find it anywhere.

Say I've got a bunch of data frames, and I want to convert all of their column names to lowercase. What's the most efficient way to do this? It's straightforward with assign and get, but I'm wondering if there's a faster way.

If I've just got ChickWeight and mtcars, the non-dynamic operation would simply be:

names( ChickWeight ) <- tolower( names( ChickWeight ) )
names( mtcars ) <- tolower( names( mtcars ) )

And here's how I would make this process dynamic, but I wonder if there's a more efficient solution:

# column headers contain uppercase
head(ChickWeight)

# start with a vector of data frame names..
# this might contain many, many data frames
tl <- c( 'ChickWeight' , 'mtcars' )

# loop through each data frame name..
for ( i in tl ){
    # save it to a temporary object name
    x <- get( i )

    # main operations here..

    # perform the operation(s) you want to run on each data frame
    names( x ) <- tolower( names( x ) )

    # ..end of main operations


    # assign the updated data frame to overwrite the original data frame
    assign( i , x )
}

# no longer contains uppercase
head(ChickWeight)

1 solution

#1


I don't think you're likely to gain a whole lot of speed by changing approaches. A more idiomatic way to do this would be to store all of your data frames in a list and use something like:

dlist <- list(mtcars,ChickWeight)

(or)

namevec <- c("mtcars","ChickWeight")
dlist <- lapply(namevec,get)

then:

dlist <- lapply(dlist,function(x) setNames(x,tolower(names(x))))

... but of course in order to use this approach you have to commit to referring to the data frames as list elements, which in turn affects the whole structure of your analysis. If you don't want to do that then I don't see anything much better than your get/assign approach.
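
As a small variation on the list approach above, the list can carry the data frame names, so elements stay addressable by name rather than by position. This is only a sketch: it swaps lapply(namevec, get) for base R's mget(), and it assumes the data frames are visible from the global environment (mtcars and ChickWeight come from the datasets package, hence inherits = TRUE).

```r
# Names of the data frames to process (same vector as in the answer).
namevec <- c("mtcars", "ChickWeight")

# mget() returns a list named after the objects it fetched.
# inherits = TRUE lets it look along the search path, which is needed
# for data sets that live in an attached package rather than in .GlobalEnv.
dlist <- mget(namevec, envir = globalenv(), inherits = TRUE)

# lapply() keeps the list names while lowercasing each set of column names.
dlist <- lapply(dlist, function(x) setNames(x, tolower(names(x))))

# Elements can now be reached by name.
names(dlist$ChickWeight)
```

Keeping the names on the list also makes it easier to write the results back later, since each element already knows which object it came from.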

If you want to assign the values of the list back to the global environment you can do:

invisible(mapply(assign,namevec,dlist,MoreArgs=list(envir=.GlobalEnv)))

I want to emphasize that this is not necessarily faster or more transparent than the simple approach presented in the original post.
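
For the write-back step, base R's list2env() is a possible alternative to mapply(assign, ...) when the list is named. The sketch below is an assumption-laden illustration, not part of the original answer: it writes into a scratch environment (the hypothetical name target) so nothing is overwritten; passing envir = .GlobalEnv instead would replace the originals. It is not claimed to be any faster.

```r
# A named list of data frames; list2env() uses these names as object names.
dlist <- list(mtcars = mtcars, ChickWeight = ChickWeight)
dlist <- lapply(dlist, function(x) setNames(x, tolower(names(x))))

# Scratch environment for illustration only (hypothetical name).
# Use envir = .GlobalEnv to overwrite the original objects instead.
target <- new.env()

# list2env() returns the environment, so wrap it in invisible().
invisible(list2env(dlist, envir = target))

# The copies in 'target' now have lowercase column names.
names(target$ChickWeight)
```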
