I saved an Excel table as text (*.txt). Unfortunately, Excel doesn't let me choose the encoding, so I need to open the file in Notepad (which opens it as ANSI) and save it as UTF-8. Then, when I read it in R:
data <- read.csv("my_file.txt",header=TRUE,sep="\t",encoding="UTF-8")
it shows the name of the first column beginning with "X.U.FEFF.". I know these are the bytes reserved to tell any program that the file is in UTF-8 format. So it shouldn't appear as text! Is this a bug? Or am I missing some option? Thanks in advance!
3 solutions
#1
9
So I was going to give you instructions on how to manually open the file and check for and discard the BOM, but then I noticed this (in ?file):
As from R 3.0.0 the encoding "UTF-8-BOM" is accepted and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications).
which means that if you have a sufficiently new R interpreter,
read.csv("my_file.txt", fileEncoding="UTF-8-BOM", ...other args...)
should do what you want.
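To make this concrete, here is a minimal sketch, assuming R 3.0.0 or newer. The temporary file and its contents (columns `id` and `name`) are made up for illustration and stand in for `my_file.txt`. It writes a small tab-separated file that starts with the UTF-8 BOM bytes, checks for them by hand, and then reads the file with `fileEncoding="UTF-8-BOM"`:

```r
# Build a small tab-separated file that starts with a UTF-8 BOM (EF BB BF),
# standing in for a file saved as UTF-8 from Notepad (hypothetical contents).
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("id\tname\n1\ta\n2\tb\n"), con)
close(con)

# Manual check: compare the first three bytes of the file with the BOM.
has_bom <- identical(readBin(path, what = "raw", n = 3L),
                     as.raw(c(0xEF, 0xBB, 0xBF)))

# "UTF-8-BOM" drops the mark before the header line is parsed,
# so the first column name should come out clean.
data <- read.csv(path, header = TRUE, sep = "\t", fileEncoding = "UTF-8-BOM")
names(data)
```

Note that the argument is `fileEncoding`, not `encoding` as in the question: the former controls how the file is re-encoded while it is read, the latter only marks how the resulting strings are declared.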
#2
1
Most of the arguments in read.csv are dummy args -- including fileEncoding.
Use read.table instead:
read.table("my_file.txt", header=TRUE, sep="\t", fileEncoding="UTF-8")
#3
0
Possible solution from the comments:
Try it with the read.csv argument check.names=FALSE. Note that if you use this, you will not be able to directly reference columns with the $ notation, unless you surround the name in quotes. For instance: yourdf$"first col".
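A small sketch of the column-access point, using a made-up file whose header contains a space (the column names `first col` and `value` are assumptions for illustration). It only shows how names are kept and referenced with `check.names=FALSE`; it makes no claim about what happens to a BOM:

```r
# Hypothetical tab-separated file with a header that is not a valid R name.
path <- tempfile(fileext = ".txt")
writeLines(c("first col\tvalue", "a\t1", "b\t2"), path)

# check.names = FALSE keeps the header text exactly as it appears in the file.
yourdf <- read.csv(path, header = TRUE, sep = "\t", check.names = FALSE)
names(yourdf)

# Ways to reference a column whose name is not syntactically valid:
yourdf$"first col"      # quoted, as in the answer
yourdf$`first col`      # backticks
yourdf[["first col"]]   # double-bracket indexing
```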