Verifying data integrity after importing an xls file into R with RODBC

Time: 2023-01-15 08:21:56

I am using RODBC to import an .xls file with ~10,000 rows and ~250 columns, using the script below:

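library(RODBC)
# odbcConnectExcel() works only in 32-bit R on Windows
channel <- odbcConnectExcel(xls.file = "s:/demo.xls")
demo <- sqlFetch(channel, "Sheet_1")  # read the whole worksheet into a data frame
odbcClose(channel)                    # release the connection to the workbook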

During the import, I opened and closed the .xls file.

As a precaution, I compared the number of rows in R with the number in the .xls file itself, just to make sure the import was not corrupted, and yup, the row counts matched, so I assumed the import was OK.

Problems arose when I started tabulating the data: even though the row counts matched, some rows were actually corrupted. The data has a UID column, and toward the end of the data frame the UIDs were messed up (e.g. the same row duplicated twice).
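
A row count alone cannot catch this kind of damage, because a duplicated row still counts as a row. Below is a minimal sketch of a stricter check, assuming the identifier column is called `UID` (adjust to your sheet); `check_uid_integrity` is a hypothetical helper, not part of RODBC.

```r
# Hypothetical helper: summarise problems in a key column that a plain
# nrow() comparison would miss. Assumes the ID column is named "UID".
check_uid_integrity <- function(df, uid_col = "UID") {
  ids <- df[[uid_col]]
  if (is.null(ids)) stop("column '", uid_col, "' not found")
  list(
    n_rows       = nrow(df),
    n_unique_ids = length(unique(ids)),
    n_missing    = sum(is.na(ids)),
    duplicated   = unique(ids[duplicated(ids)])  # IDs that appear more than once
  )
}

# Toy example: the row count is still 4, but UID 3 appears twice
toy <- data.frame(UID = c(1, 2, 3, 3), value = c("a", "b", "c", "c"))
res <- check_uid_integrity(toy)
res$n_rows        # 4
res$n_unique_ids  # 3
res$duplicated    # 3
```

On the real import, a one-line guard such as `stopifnot(anyDuplicated(demo$UID) == 0)` right after `sqlFetch()` would flag duplicated IDs immediately (again assuming the column is named `UID`).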

I re-imported the file, this time without opening or closing the .xls file, and everything went fine.

I have learned my lesson: never open or close the .xls file while RODBC is reading it. But once the import has finished, how can I be sure the data is not messed up, i.e. that I really imported the dataset without errors?
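
One way to gain confidence without trusting any single import is to read the file through two independent routes (for example RODBC, and a CSV exported from Excel) and compare the results keyed on the UID. A minimal sketch, assuming both data frames have the same column names and a `UID` column; `diff_imports` is a hypothetical helper. Cells are compared as text so that readers which guess column types differently do not produce spurious mismatches.

```r
# Hypothetical helper: report UIDs whose rows differ between two imports
# of the same sheet, plus UIDs present in only one of them.
diff_imports <- function(a, b, uid_col = "UID") {
  stopifnot(setequal(names(a), names(b)))
  b  <- b[, names(a), drop = FALSE]          # same column order as a
  ka <- as.character(a[[uid_col]])
  kb <- as.character(b[[uid_col]])
  common <- intersect(ka, kb)
  # Line both data frames up on the shared UIDs (assumes UIDs are unique)
  a2 <- a[match(common, ka), , drop = FALSE]
  b2 <- b[match(common, kb), , drop = FALSE]
  # Compare one column as text; NA on both sides counts as equal
  cell_equal <- function(x, y) {
    x <- as.character(x)
    y <- as.character(y)
    (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
  }
  same <- do.call(cbind, Map(cell_equal, a2, b2))  # rows x columns logical matrix
  list(
    changed   = common[rowSums(!same) > 0],
    only_in_a = setdiff(ka, kb),
    only_in_b = setdiff(kb, ka)
  )
}

# Toy example: in b, one value changed and the last row was overwritten by a copy of row 3
a <- data.frame(UID = 1:4, score = c(10, 20, 30, 40))
b <- data.frame(UID = c(1, 2, 3, 3), score = c(10, 25, 30, 30))
d <- diff_imports(a, b)
d$changed    # "2"
d$only_in_a  # "4"
```

Because `match()` takes the first occurrence of each UID, it is worth ruling out duplicated UIDs (e.g. with `anyDuplicated()`) before relying on the `changed` list.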

Thanks.

P.S. As an extra measure, I am thinking of exporting the data frame to CSV, exporting the .xls to CSV as well, and then comparing the checksums. Are there any other ways?
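
The checksum idea can be done in base R with `tools::md5sum()`, which hashes files. A sketch, assuming both tables are already loaded into R (e.g. Excel's CSV export read back with `read.csv()`) and are written out again with identical `write.csv()` settings; hashing Excel's raw CSV directly against R's output would usually mismatch on quoting and number formatting even when the data agree. `df_md5` is a hypothetical helper.

```r
# Hypothetical helper: MD5 of a data frame after writing it to a temporary
# CSV with fixed settings, so identical content gives an identical hash.
df_md5 <- function(df) {
  df[] <- lapply(df, as.character)   # normalise column types before writing
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))              # remove the temp file afterwards
  write.csv(df, path, row.names = FALSE)
  unname(tools::md5sum(path))
}

x <- data.frame(UID = 1:3, value = c(1.5, 2, 3))
y <- x
y$value[3] <- 4                      # one corrupted cell
df_md5(x) == df_md5(x)  # TRUE
df_md5(x) == df_md5(y)  # FALSE
```

A matching hash says the two tables are identical; a mismatch only tells you that they differ, not where.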

1 solution

#1

I use the read.xls function from the gdata package (install it with `install.packages("gdata")`).

If I'm correct, this lets you read from an Excel file that is not open. It is a wrapper around some Perl modules, so whether it works for you may depend on having those installed.
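
A minimal sketch of calling `read.xls` defensively, assuming a gdata version that still provides `read.xls` and a Perl interpreter on the PATH. `read_xls_via_gdata` is a hypothetical wrapper and the path is the placeholder from the question; the point is to fail early with a clear message rather than partway through the conversion.

```r
# Hypothetical wrapper around gdata::read.xls that checks prerequisites first.
read_xls_via_gdata <- function(path, sheet = 1) {
  if (!file.exists(path)) stop("file not found: ", path)
  if (!requireNamespace("gdata", quietly = TRUE))
    stop("package 'gdata' is not installed; run install.packages(\"gdata\")")
  if (!nzchar(Sys.which("perl")))
    stop("no Perl interpreter found on the PATH; read.xls needs one")
  # read.xls converts the sheet to a delimited text file via Perl, then reads it
  gdata::read.xls(path, sheet = sheet)
}

# Usage (placeholder path from the question):
# demo2 <- read_xls_via_gdata("s:/demo.xls", sheet = 1)
```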

Quoting the R Data Import/Export manual:

Perl users have contributed a module OLE::SpreadSheet::ParseExcel and a program xls2csv.pl to convert Excel 95–2003 spreadsheets to CSV files. Package gdata provides a basic wrapper in its read.xls function. With suitable Perl modules installed this function can also read Excel 2007 spreadsheets.

Generally, I find the safest way to get data from Excel into R is via a csv export.
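
When going via CSV, reading every column as character stops `read.csv()` from guessing types (for example turning an ID like `001` into the number 1), so what you check is exactly what Excel wrote. A sketch, assuming the sheet has been saved from Excel as CSV; `read_csv_export` is a hypothetical helper and the small temporary file stands in for the real export.

```r
# Hypothetical helper: read a CSV exported from Excel without type guessing.
read_csv_export <- function(path) {
  read.csv(path, colClasses = "character", check.names = FALSE)
}

# Toy stand-in for the exported sheet
path <- tempfile(fileext = ".csv")
writeLines(c("UID,code", "001,A", "002,B"), path)
csv <- read_csv_export(path)
csv$UID    # "001" "002" (leading zeros kept)
nrow(csv)  # 2
unlink(path)
```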