Importing Excel files into R: xlsx or xls

Date: 2023-01-15 11:36:31

Can someone please help me find the best way to import an Excel 2007 (.xlsx) file into R? I have tried several methods and none seems to work. I have upgraded to R 2.13.1 (Windows XP, xlsx 0.3.0), and I don't know why the error keeps coming up. I tried:

AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx","DNA_Tag_Numbers")

OR

AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx",1)

but I get the error:

 Error in .jnew("java/io/FileInputStream", file) : 
  java.io.FileNotFoundException: C:\AB_DNA_Tag_Numbers.xlsx (The system cannot find the file specified)

Thank you.

14 answers

#1


72  

For a solution that is free of fiddly external dependencies*, there is now readxl:

The readxl package makes it easy to get data out of Excel and into R. Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies so it's easy to install and use on all operating systems. It is designed to work with tabular data stored in a single sheet.

Readxl supports both the legacy .xls format and the modern XML-based .xlsx format. .xls support is made possible with the libxls C library, which abstracts away many of the complexities of the underlying binary format. To parse .xlsx, we use the RapidXML C++ library.

It can be installed like so:

install.packages("readxl") # CRAN version

or

devtools::install_github("hadley/readxl") # development version

Usage

library(readxl)

# read_excel reads both xls and xlsx files
read_excel("my-old-spreadsheet.xls")
read_excel("my-new-spreadsheet.xlsx")

# Specify sheet with a number or name
read_excel("my-spreadsheet.xls", sheet = "data")
read_excel("my-spreadsheet.xls", sheet = 2)

# If NAs are represented by something other than blank cells,
# set the na argument
read_excel("my-spreadsheet.xls", na = "NA")
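
As a small extension of the usage above, here is a sketch of reading every sheet of a workbook into a named list. It assumes readxl is installed and, purely for illustration, uses the example workbook that ships with the package (located via readxl_example()); substitute your own path in practice.

```r
library(readxl)

# Example workbook bundled with readxl; replace with your own file path.
path <- readxl_example("datasets.xlsx")

# excel_sheets() returns the sheet names in workbook order.
sheet_names <- excel_sheets(path)

# Read each sheet into a data frame and keep the sheet names as list names.
all_sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheet_names
```

Individual sheets can then be pulled out by name, e.g. all_sheets[["mtcars"]] for this example file.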

* Not strictly true: it requires the Rcpp package, which in turn requires Rtools (for Windows) or Xcode (for OSX), which are dependencies external to R. But they don't require any fiddling with paths, etc., so that's an advantage over Java and Perl dependencies.

Update: There is now the rexcel package. This promises to get Excel formatting, functions and many other kinds of information from the Excel file into R.

#2


33  

You may also want to try the XLConnect package. I've had better luck with it than xlsx (plus it can read .xls files too).

library(XLConnect)
theData <- readWorksheet(loadWorkbook("C:/AB_DNA_Tag_Numbers.xlsx"),sheet=1)

Also, if you are having trouble with your file not being found, try selecting it with file.choose().

#3


19  

I would definitely try the read.xls function in the gdata package, which is considerably more mature than the xlsx package. It may require Perl ...

#4


18  

Update

As the answer below is now somewhat outdated, I'd just draw attention to the readxl package. If the Excel sheet is well formatted/laid out then I would now use readxl to read from the workbook. If sheets are poorly formatted/laid out then I would still export to CSV and then handle the problems in R either via read.csv() or plain old readLines().

Original

My preferred way is to save individual Excel sheets in comma-separated value (CSV) files. On Windows, these files are associated with Excel so you don't lose the double-click-open-in-Excel "feature".

CSV files can be read into R using read.csv(), or, if you are in a location or using a computer set up with some European settings (where a comma is used as the decimal separator), using read.csv2().
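
To illustrate the difference, here is a minimal sketch using only base R. The file contents are made up for the example: fields are separated by semicolons and decimals are written with commas, which is the layout read.csv2() expects by default.

```r
# Write a small file in the "European" layout: ';' separates fields, ',' marks decimals.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sample;weight", "a;1,5", "b;2,25"), csv_path)

# read.csv2() defaults to sep = ";" and dec = ",".
eu <- read.csv2(csv_path)

# read.csv() gives the same result once both settings are spelled out.
same <- read.csv(csv_path, sep = ";", dec = ",")

# The weight column is parsed as numeric in both cases.
str(eu)
```

If read.csv() is used on such a file without those arguments, each row typically ends up in a single column, which is a useful hint that the file uses the other convention.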

These functions have sensible defaults that make reading appropriately formatted files simple. Just keep any labels for samples or variables in the first row or column.

Added benefits of storing files in CSV are that as the files are plain text they can be passed around very easily and you can be confident they will open anywhere; one doesn't need Excel to look at or edit the data.

#5


17  

Example 2012:

library("xlsx")
FirstTable <- read.xlsx("MyExcelFile.xlsx", 1, stringsAsFactors = FALSE)
SecondTable <- read.xlsx("MyExcelFile.xlsx", 2, stringsAsFactors = FALSE)
  • I would try the 'xlsx' package, for it is easy to handle and seems mature enough
  • worked fine for me and did not need any extras like Perl or whatever

Example 2015:

library("readxl")
FirstTable  <- read_excel("MyExcelFile.xlsx", 1)
SecondTable <- read_excel("MyExcelFile.xlsx", 2)
  • nowadays I use readxl and have had good experiences with it
  • no extra stuff needed
  • good performance

#6


12  

This new package looks nice: http://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf It doesn't require rJava and uses 'Rcpp' for speed.

#7


3  

I recently discovered Schaun Wheeler's function for importing Excel files into R after realising that the xlsx package hadn't been updated for R 3.1.0.

https://gist.github.com/schaunwheeler/5825002

The file name needs to have the ".xlsx" extension and the file can't be open when you run the function.

This function is really useful for accessing other people's work. Its main advantages over using the read.csv function show up when:

  • Importing multiple Excel files
  • Importing large files
  • Working with files that are updated regularly

Using the read.csv function requires manually opening and saving each Excel document, which is time-consuming and very boring. Using Schaun's function to automate the workflow is therefore a massive help.
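
The same kind of automation can be sketched with list.files() and lapply(). This is not Schaun's function, just a minimal illustration: the helper name read_all_workbooks, the "data" folder and the choice of readxl::read_excel as the default reader are all assumptions made for the example.

```r
# Read every file in `dir` whose name matches `pattern` and return a named list.
# `reader` is any function that takes a file path; readxl::read_excel is assumed
# as the default here, so readxl must be installed unless another reader is given.
read_all_workbooks <- function(dir, pattern = "\\.xlsx?$", reader = readxl::read_excel) {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  tables <- lapply(paths, reader)
  names(tables) <- basename(paths)
  tables
}

# Hypothetical usage: all workbooks in a "data" folder under the working directory.
# workbooks <- read_all_workbooks("data")
```

Because the files are re-read each time the function is called, re-running the script is enough to pick up workbooks that have been updated.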

Big props to Schaun for this solution.

#8


3  

If you are running into the same problem and R is giving you the error could not find function ".jnew", just install the rJava package. Or, if you already have it, just run library(rJava). That should fix the problem.

Also, it should be clear to everybody that csv and txt files are easier to work with, but life is not easy and sometimes you just have to open an xlsx.

#9


1  

What's your operating system? What version of R are you running: 32-bit or 64-bit? What version of Java do you have installed?

I had a similar error when I first started using the read.xlsx() function and discovered that my issue (which may or may not be related to yours; at a minimum, this response should be viewed as "try this, too") was related to the incompatibility of the xlsx package with 64-bit Java. I'm fairly certain that the xlsx package requires 32-bit Java.

Use 32-bit R and make sure that 32-bit Java is installed. This may address your issue.
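
The operating system and the R build can be checked from within R itself. The following is a minimal base-R sketch; it reports only the R side, so the Java version still has to be checked separately.

```r
# Operating system name and full R version string.
cat("OS:          ", Sys.info()[["sysname"]], "\n")
cat("R version:   ", R.version.string, "\n")

# Architecture R was built for, e.g. "x86_64" for 64-bit or "i386" for 32-bit builds.
cat("Architecture:", R.version$arch, "\n")

# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build.
r_bits <- .Machine$sizeof.pointer * 8
cat("R build:     ", r_bits, "-bit\n")
```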

#10


1  

You have checked that R is actually able to find the file, e.g. file.exists("C:/AB_DNA_Tag_Numbers.xlsx") ? – Ben Bolker Aug 14 '11 at 23:05

The above comment should have solved your problem:

require("xlsx")
read.xlsx("filepath/filename.xlsx",1) 

should work fine after that.
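
The file.exists() check from the comment can be wrapped in a small helper that also shows where R is looking. This is only a sketch in base R; the helper name check_excel_path is made up for the example.

```r
# Report whether `path` exists; if not, show the working directory and the
# Excel files R can see in the folder the path points to.
check_excel_path <- function(path) {
  found <- file.exists(path)
  if (found) {
    message("Found: ", normalizePath(path))
  } else {
    message("Not found: ", path)
    message("Working directory: ", getwd())
    print(list.files(dirname(path), pattern = "\\.xlsx?$"))
  }
  invisible(found)
}

# Path from the question; adjust to wherever the workbook actually lives.
check_excel_path("C:/AB_DNA_Tag_Numbers.xlsx")
```

A FileNotFoundException like the one in the question usually means this check fails, for example because the file sits in a different folder or its name differs slightly from what was typed.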

#11


1  

For me, the openxlsx package was the easiest to get working.

install.packages("openxlsx")
library(openxlsx)
rawData <- read.xlsx("your.xlsx")

#12


0  

You may be able to keep multiple tabs and more formatting information if you export to an OpenDocument Spreadsheet file (ods) or an older Excel format and import it with the ODS reader or the Excel reader you mentioned above.

#13


0  

As stated by many here, I am writing the same thing but with an additional point!

First, we need to make sure that these two packages are installed in R:

  1. "readxl"
  2. "XLConnect"

To install and load a package in R, you can use the functions below:

install.packages(c("readxl", "XLConnect"))
library(XLConnect)
search()

search() displays the packages currently attached in your R session.

Now another catch: even if you have these two packages, you may still encounter a problem while reading an "xlsx" file, with an error like "error: more columns than column name".

To solve this issue, you can simply re-save your Excel sheet ("xlsx") as "CSV (Comma delimited)", and your life will be super easy.

Have fun!!

#14


0  

I tried all the answers above, but none of them helped because I use a Mac. The rio package has an import() function that can import basically any type of data file into R, even files that use languages other than English!

Try the code below:

    library(rio)
    AB <- import("C:/AB_DNA_Tag_Numbers.xlsx")
    AB <- AB[,1]

Hope this helps. For a more detailed reference: https://cran.r-project.org/web/packages/rio/vignettes/rio.html
