How can I replace the "." in column names generated by read.csv() with a single space when exporting?

I am using R for some data pre-processing and have run into the following problem. I read the data with read.csv(filename, header=TRUE), and the spaces in variable names become ".": for example, a variable named Full Code becomes Full.Code in the resulting data frame. After processing, I export the results with write.xlsx(filename), but the variable names stay changed. How can I address this?

Besides, in the output .xlsx file, the first column becomes an index (i.e., 1 to N), which is not what I expect.

4 Solutions

#1

If you set check.names=FALSE in read.csv when you read the data in, the names will not be changed and you will not need to edit them before writing the data back out. This of course means that you will need to quote the column names (with backquotes in some cases) or refer to the columns by position rather than by name while editing.
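
As a minimal sketch of this approach, the snippet below reads a small inline CSV instead of a real file (the `csv_text` sample and its column names are made up for illustration) and shows the two ways of referring to a column whose name contains a space:

```r
# Inline CSV text stands in for a real file (hypothetical sample data).
csv_text <- "Full Code,Unit Price\nA1,2.5\nB2,4.0"

# check.names = FALSE keeps the header exactly as written in the file.
df <- read.csv(text = csv_text, header = TRUE, check.names = FALSE)
names(df)

# Names containing spaces need backticks with `$` ...
df$`Full Code`

# ... or plain strings with `[[`.
df[["Unit Price"]] <- df[["Unit Price"]] * 2
```

Because the names are never altered, nothing has to be converted back before exporting.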

#2

To get spaces back in the names, do this (right before you export - R does let you have spaces in variable names, but it's a pain):

# A simple regular expression to replace dots with spaces
# This might have unintended consequences, so be sure to check the results
names(yourdata) <- gsub(x = names(yourdata),
                        pattern = "\\.",
                        replacement = " ")

To drop the first-column index, just add row.names = FALSE to your write.xlsx(). That's a common argument for functions that write out data in tabular format (write.csv() has it, too).
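
To see the effect of row.names = FALSE without installing an Excel package, here is a small sketch using base R's write.csv(). The sample data frame and the temporary file are assumptions for illustration. The write.xlsx() in the question is assumed to come from the xlsx package; other packages may name this argument differently, so check the help page of the one you use.

```r
# Hypothetical sample data with a space in one column name.
df <- data.frame(`Full Code` = c("A1", "B2"), Price = c(2.5, 4),
                 check.names = FALSE)

out_file <- tempfile(fileext = ".csv")

# row.names = FALSE suppresses the leading 1..N index column.
write.csv(df, out_file, row.names = FALSE)

# The first line of the file is the header, with the space preserved.
readLines(out_file)[1]
```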

#3

Here's a function (sorry, I know it could be refactored) that makes nice column names even if there are multiple consecutive dots and trailing dots:

makeColNamesUserFriendly <- function(ds) {
  # FIXME: Repetitive.

  # Convert any number of consecutive dots to a single space.
  names(ds) <- gsub(x = names(ds),
                    pattern = "(\\.)+",
                    replacement = " ")

  # Drop the trailing spaces.
  names(ds) <- gsub(x = names(ds),
                    pattern = "( )+$",
                    replacement = "")
  ds
}

Example usage:

ds <- makeColNamesUserFriendly(ds)
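
For comparison, the same idea can be sketched as a single expression with base R's gsub() and trimws(). The data frame below is hypothetical; its names mimic what read.csv() would produce for headers such as "Full Code (USD)" and "Unit  Price". Note one difference from the function above: trimws() strips leading spaces as well as trailing ones.

```r
# Hypothetical data frame with consecutive and trailing dots in its names.
ds <- data.frame(Full.Code..USD. = 1:2, Unit..Price = 3:4)

# Collapse each run of dots into one space, then trim both ends.
names(ds) <- trimws(gsub("\\.+", " ", names(ds)))
names(ds)
```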

#4

Just to add to the answers already provided, here is another way to replace the "." (or any other kind of punctuation) in column names, using a regex with the stringr package:

library(stringr)
colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")

For example try:

data <- data.frame(variable.x = 1:10, variable.y = 21:30, variable.z = "const")

colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")

and

colnames(data)

will give you

[1] "variable x" "variable y" "variable z"
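
If you would rather not depend on stringr, a rough base R equivalent of the same replacement is sketched below. One detail to watch: base R's regex engine expects the POSIX class inside a bracket expression, so the pattern is written "[[:punct:]]". Like the stringr version, this replaces every punctuation character, including underscores, so check the result.

```r
data <- data.frame(variable.x = 1:10, variable.y = 21:30, variable.z = "const")

# The POSIX class sits inside a bracket expression, hence the doubled brackets.
colnames(data) <- gsub("[[:punct:]]", " ", colnames(data))
colnames(data)
```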
