Displaying UTF-8 encoded Chinese characters in R

Time: 2023-01-05 21:07:36

I am trying to open a UTF-8 encoded .csv file that contains (traditional) Chinese characters in R. For some reason, R sometimes displays the data as Chinese characters and sometimes as Unicode escape codes.

For instance:

data <- read.csv("mydata.csv", encoding = "UTF-8")

data

will print Unicode escape codes, while:

data <- read.csv("mydata.csv", encoding = "UTF-8")

data[, 1]

will actually display Chinese characters.

If I turn it into a matrix, it also displays Chinese characters, but if I look at the data with View(data) or fix(data), it shows Unicode escape codes again.

I asked people who use a Mac for advice (I am on a PC running Windows 7): some of them got Chinese characters throughout, others did not. I tried saving the original data as a table and reading it into R that way, with the same result. I tried running the script in RStudio, Revolution R, and RGui. I also tried to change the locale (e.g. to Chinese), but either R did not let me change it, or the result was gibberish instead of Unicode escape codes.

My current locale is:

"LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"
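
Since the behaviour seems to differ between machines, it may help to check which encoding the R session itself is running in. The following is a minimal sketch using only base R (Sys.getlocale and l10n_info); the example value in the comment is taken from the locale string above. Code page 1252 is a single-byte Western European encoding with no Chinese characters, which is a plausible reason, though not a verified one, for the escape codes in some outputs.

```r
# Inspect the locale and native encoding of the current R session.
ctype <- Sys.getlocale("LC_CTYPE")   # e.g. "French_Switzerland.1252" in the question
info  <- l10n_info()                 # list with elements MBCS, UTF-8, Latin-1

cat("LC_CTYPE:        ", ctype, "\n")
cat("UTF-8 session:   ", info[["UTF-8"]], "\n")
cat("Latin-1 session: ", info[["Latin-1"]], "\n")
```

If the "UTF-8 session" line prints FALSE, any function that converts strings to the native encoding before displaying them cannot show Chinese text directly.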

Any help to get R to consistently display Chinese characters would be greatly appreciated...

2 answers

#1


2  

This is not a bug. It is a misunderstanding of the type conversions (between the character type and the factor type) that happen when a data.frame is constructed.

You could start with data <- read.csv("mydata.csv", encoding = "UTF-8", stringsAsFactors = FALSE), which keeps your Chinese text as the character type, so printing it should show what you expect.

@nograpes: similarly, x <- c('中華民族'); x; y <- data.frame(x, stringsAsFactors = FALSE) and everything should be OK.
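
To make the factor versus character difference concrete, here is a small sketch that builds the same one-column data frame both ways. The variable names are made up for illustration, and the string is written with \u escapes only so that the script is plain ASCII. Note that from R 4.0.0 onward stringsAsFactors = FALSE is already the default, so the argument mainly matters on older versions such as the one in the question.

```r
# The string '中華民族', written as escapes so the script itself is ASCII.
x <- c("\u4e2d\u83ef\u6c11\u65cf")

as_factor    <- data.frame(x, stringsAsFactors = TRUE)
as_character <- data.frame(x, stringsAsFactors = FALSE)

class(as_factor$x)      # "factor"
class(as_character$x)   # "character"

# The character column holds the original string unchanged,
# and Encoding() reports how R has marked it.
identical(as_character$x, x)
Encoding(as_character$x)
```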

#2


1  

In my case, the UTF-8 encoding does not work in R on my machine, but the GB* encodings do. UTF-8 does work on Ubuntu. First you need to figure out the default encoding of your OS and read the file in that encoding. Excel does not encode a file as UTF-8 properly, even when it claims to save it as UTF-8.
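
If the file really is in a GB* encoding, one option is to convert it explicitly while reading. The sketch below is only an illustration: the demo file, the column name and the choice of GBK are assumptions, and iconvlist() shows which encoding names the platform actually supports. read.csv also has a fileEncoding argument that re-encodes on the fly, but it converts to the session's native encoding, so the explicit route via iconv is used here to keep the result independent of the locale.

```r
# Write a small demo CSV in GBK, standing in for a file saved in a legacy
# Chinese encoding (GBK is an assumed example, not taken from the question).
csv_utf8 <- "name\n\u4e2d\u6587\n"               # header plus one row: '中文'
path <- tempfile(fileext = ".csv")
writeBin(iconv(csv_utf8, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]], path)

# Read the raw bytes and convert them to UTF-8, naming the source encoding.
raw_bytes <- readBin(path, what = "raw", n = file.size(path))
text_utf8 <- iconv(list(raw_bytes), from = "GBK", to = "UTF-8")

# Parse the converted text; the column stays a character vector.
chinese <- read.csv(text = text_utf8, stringsAsFactors = FALSE)
Encoding(chinese$name)
```

iconv returns NA when the bytes are not valid in the encoding named in from, which is a quick hint that the guess was wrong.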

(1) Download 'open sheet'.

(2) Open the file with it. Scroll through the encoding options until the Chinese characters display correctly in the preview window.

(3) Save it as UTF-8 (if you want UTF-8). (UTF-8 is not the solution to every problem; you HAVE TO know the default encoding of your system first.)
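
Steps (2) and (3) can also be tried inside R instead of a spreadsheet tool. This is a rough sketch under stated assumptions: the demo input is written in Big5, the shortlist of candidate encodings is hypothetical, and all file and variable names are invented for the example. As in the preview window, a human still has to judge which decoding looks right, because a wrong encoding can decode without error and simply produce the wrong characters.

```r
# Demo input: a file in Big5, standing in for a file of unknown encoding.
original <- "\u4e2d\u83ef\u6c11\u65cf"            # '中華民族'
path_in  <- tempfile(fileext = ".csv")
path_out <- tempfile(fileext = ".csv")
writeBin(iconv(original, from = "UTF-8", to = "BIG5", toRaw = TRUE)[[1]], path_in)

raw_bytes  <- readBin(path_in, what = "raw", n = file.size(path_in))
candidates <- c("UTF-8", "GBK", "BIG5")           # hypothetical shortlist

# Decode the same bytes under each candidate. NA means the bytes are invalid
# in that encoding; a non-NA result can still be wrong, so inspect it.
previews <- vapply(candidates,
                   function(enc) iconv(list(raw_bytes), from = enc, to = "UTF-8"),
                   character(1))
print(previews)

# Having judged by eye that BIG5 gives the right text, save it as UTF-8.
# useBytes = TRUE writes the UTF-8 bytes as they are, without re-encoding.
con <- file(path_out, open = "wb")
writeLines(previews[["BIG5"]], con, useBytes = TRUE)
close(con)
```

The file at path_out can then be read with read.csv(path_out, encoding = "UTF-8"), as in the question.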
