Using read.delim to read a UTF-8 encoded file containing Chinese characters

Time: 2023-01-15 10:41:14

I have a UTF-8 encoded file, test.txt, whose values are tab-separated:

a   b   c
小   小   大
小   大   小
大   小   小

I try to read the data using these commands:

Sys.setlocale("LC_CTYPE", "Chinese")
data <- read.delim("test.txt",encoding="UTF-8")

But data is read as an empty data frame, which prints as:

[1] X.U.FEFF.a b          c         
<0 rows> (or 0-length row.names)
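The X.U.FEFF. prefix on the first column name is most likely a UTF-8 byte order mark (BOM, U+FEFF) at the start of test.txt, which R kept as part of the first header field and then converted into a syntactic column name. Note that encoding = "UTF-8" only marks the strings as UTF-8; it does not strip the BOM. A minimal sketch, assuming R 3.0.0 or later: it builds a similar file at a temporary path (hypothetical, standing in for test.txt) and reads it with fileEncoding = "UTF-8-BOM", which ?file documents as removing a leading BOM if present. Whether this also fixes the empty result on Windows 7 is untested here.

```r
# Build a small tab-delimited file like test.txt, prefixed with the UTF-8 BOM
# bytes EF BB BF. The temp path is hypothetical; "\u5c0f" is 小 and "\u5927" is 大.
path <- tempfile(fileext = ".txt")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
txt  <- "a\tb\tc\n\u5c0f\t\u5c0f\t\u5927\n\u5c0f\t\u5927\t\u5c0f\n\u5927\t\u5c0f\t\u5c0f\n"
writeBin(c(bom, charToRaw(enc2utf8(txt))), path)

# fileEncoding = "UTF-8-BOM" re-encodes the input from UTF-8 and drops the
# leading BOM, so the first column is named "a" rather than "X.U.FEFF.a".
data <- read.delim(path, fileEncoding = "UTF-8-BOM")
str(data)
```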

My system is Windows 7.

How to read the data correctly?

1 solution

#1

I just tried read.csv and it works fine. I also tested your code, and read.delim works out of the box, too.

> a <- read.csv('/tmp/test.txt', sep="\t", quote="", stringsAsFactors=FALSE)
> str(a)
'data.frame':   3 obs. of  3 variables:
 $ a: chr  "小" "小" "大"
 $ b: chr  "小" "大" "小"
 $ c: chr  "大" "小" "小"
> a
   a  b  c
1 小 小 大
2 小 大 小
3 大 小 小

> data <- read.delim("/tmp/test.txt", encoding="utf-8")
> data
   a  b  c
1 小 小 大
2 小 大 小
3 大 小 小
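Because the same read.delim call works here but produced X.U.FEFF.a in the question, one plausible difference (an assumption, not something verified in this answer) is whether the file starts with a UTF-8 byte order mark. A small hypothetical helper, has_utf8_bom() (not part of base R), checks the first three bytes with readBin():

```r
# Hypothetical diagnostic helper: TRUE if the file starts with the
# three-byte UTF-8 byte order mark EF BB BF, FALSE otherwise.
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Two throwaway files, one with and one without a BOM
with_bom    <- tempfile(fileext = ".txt")
without_bom <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("a\tb\tc\n")), with_bom)
writeBin(charToRaw("a\tb\tc\n"), without_bom)

has_utf8_bom(with_bom)     # TRUE
has_utf8_bom(without_bom)  # FALSE
```

Running has_utf8_bom("test.txt") on the original file would show whether fileEncoding = "UTF-8-BOM" is needed there.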

Then I tried your Sys.setlocale command, and it didn't work for me: it returned "" with a warning, whereas setting the locale to German worked. That indicates the locale name "Chinese" itself is invalid, at least on my system (Mac OS X).

# ?Sys.setlocale:
# "Attempts to set an invalid locale are ignored. There may or may not be a warning, depending on the OS."
> Sys.setlocale("LC_CTYPE", "Chinese")
[1] ""
Warning message:
In Sys.setlocale("LC_CTYPE", "Chinese") :
  OS reports request to set locale to "Chinese" cannot be honored
> Sys.setlocale("LC_TIME", "de_DE")  # Mac OS X, in UTF-8
[1] "de_DE"
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/de_DE/en_US.UTF-8"

To successfully set the locale to Chinese, you can try this (cited from here):

> Sys.setlocale("LC_ALL","zh_CN.utf-8")
> Sys.getlocale()
[1] "zh_CN.utf-8/zh_CN.utf-8/zh_CN.utf-8/C/zh_CN.utf-8/en_US.UTF-8"
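Valid locale names differ between operating systems, and Sys.setlocale() returns "" when a request is not honoured (as with the "Chinese" attempt above). A sketch with a hypothetical helper, set_first_locale(), that tries several candidate names and keeps the first one the OS accepts; the candidate list is an assumption, and which names work depends on what is installed:

```r
# Hypothetical helper: try each candidate locale name in turn and return the
# first one accepted. Sys.setlocale() returns "" when the OS rejects a name.
set_first_locale <- function(category, candidates) {
  for (loc in candidates) {
    res <- suppressWarnings(Sys.setlocale(category, loc))
    if (nzchar(res)) return(res)
  }
  ""  # none of the candidates was accepted
}

# "Chinese" is the Windows-style name from the question; "zh_CN.UTF-8" and
# "zh_CN.utf8" are Unix-style spellings that may or may not be installed.
set_first_locale("LC_CTYPE", c("Chinese", "zh_CN.UTF-8", "zh_CN.utf8"))
```

An empty string means none of the candidates worked; Sys.getlocale() shows the resulting setting.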