在r数据框中设置列类时，将####删除到NA错误。

I am working with a csv file which was originally formatted in excel. I want to convert the rate column into numeric and remove the "$" sign.

我正在处理一个csv文件，它最初是用excel格式格式化的。我要将rate列转换为数字，并删除“$”符号。

I read in the file with : > NImp <- read.csv("National_TV_Spots 6_30_14 to 8_31_14.csv", sep=",", header=TRUE, stringsAsFactors=FALSE, strip.white=TRUE, na.strings=c("Not Monitored"))

我在文件中读:> NImp <- read。csv(“National_TV_Spots 6 _30_14 8 _31_14。csv"， sep="，"， header=TRUE, stringsAsFactors=FALSE, strip。白色= TRUE,na。= c字符串(“不监控”))

The data frame looks like this:

数据帧是这样的:

HH.IMP..000.       ISCI                                          Creative          Program  Rate
1           NA     IT3896 Rising Costs30 (Opportunity Scholar - No Nursing)      NUVO CINEMA $0.00
2           NA     IT3896 Rising Costs30 (Opportunity Scholar - No Nursing)      NUVO CINEMA $0.00
3          141    IT14429 Rising Costs30 (Opportunity Scholar - No Nursing)            BONUS $0.00
4          476 ITES15443H     Matthew Traina (B. EECT/A. CEET) :60 (no loc) Law & Order: SVU $0.00
5           NA     IT3896 Rising Costs30 (Opportunity Scholar - No Nursing)      NUVO CINEMA $0.00

When I do do the conversion, I get an error message: > NImp$Rate <- as.numeric(gsub("$","", NImp$Rate)) Warning message: NAs introduced by coercion and all values get coerced to NAs.

当我进行转换时，会得到一个错误消息:> NImp$Rate <- as。数字(gsub(“$”、“”、“NImp$Rate”)警告消息:强制引入的NAs，所有值被强制转换为NAs。

I also tried, NImp$Rate <- as.numeric(sub("\\$","", NImp$Rate)) but again got the same warning message. However not all values became NAs - only specific ones. I opened the csv in excel to check and I realized that excel forces csv column width too narrow resulting in "####" cells. These cells are being coerced to "NA" by r.

我也尝试过，NImp$Rate <- as。数字(sub(“\$”、“”、“NImp$Rate”)，但再次得到相同的警告信息。然而，并不是所有的值都变成了NAs——只有特定的值。我在excel中打开了csv，并发现excel使csv列宽度太窄，导致了“####”单元格。这些细胞被r胁迫成NA。

I tried the option of opening the file in notepad and read the notepad file into r. But I get the same results. The values are correctly displayed in both Notepad and when I read the file into r. But when I change to numeric, everything that shows as "####" in excel, becomes NA.

我尝试了在记事本中打开文件并将记事本文件读入r的选项，但是得到了相同的结果。当我把文件读入r时，这些值都正确地显示在记事本中。但是当我把它变成数值时，所有在excel中显示为“#### ###”的东西都变成了NA。

What should I do?

我应该做什么?

Adding str(NImp)

添加str(NImp)

'data.frame':   9859 obs. of  19 variables:
$ Spot.ID         : int  13072903 13072904 13072898 13072793 13072905 13072899 13072397 13072476 13072398 13072681 ...
$ Date            : chr  "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
$ Hour            : int  0 0 0 0 0 0 1 1 1 2 ...
$ Time            : chr  "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
$ Local.Date      : chr  "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
$ Broadcast.Week  : int  1 1 1 1 1 1 1 1 1 1 ...
$ Local.Hour      : int  0 0 0 0 0 0 1 1 1 2 ...
$ Local.Time      : chr  "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
$ Market          : chr  "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" ...
$ Vendor          : chr  "NUVO" "NUVO" "AFAM" "USA" ...
$ Station         : chr  "NUVO" "NUVO" "AFAM" "USA" ...
$ M18.34.IMP..000.: int  NA NA 3 88 NA 3 NA 53 NA 37 ...
$ W18.34.IMP..000.: int  NA NA 86 66 NA 86 NA 70 NA 60 ...
$ A18.34.IMP..000.: int  NA NA 89 154 NA 89 NA 123 NA 97 ...
$ HH.IMP..000.    : int  NA NA 141 476 NA 141 NA 461 NA 434 ...
$ ISCI            : chr  "IT3896" "IT3896" "IT14429" "ITES15443H" ...
$ Creative        : chr  "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Matthew Traina (B. EECT/A. CEET) :60 (no loc)" ...
$ Program         : chr  "NUVO CINEMA" "NUVO CINEMA" "BONUS" "Law & Order: SVU" ...
$ Rate            : chr  "$0.00" "$0.00" "$0.00" "$0.00" ...

1 个解决方案

#1

When a column was set as "Currency" in Excel, the values in the thousands or greater have a comma in them as well as the dollar sign prefix. For example, a value might look like $1,200.00. The problem you were having was because you were removing the dollar signs but not the commas, so when you tried to convert to numeric you get NA.

当一个列在Excel中被设置为“货币”时，数千或更大的值在它们和美元符号前缀中都有一个逗号。例如，一个值可能看起来像$1,200.00。你遇到的问题是你去掉了美元符号而不是逗号，所以当你试图转换成数字时，你得到了NA。

as.numeric(c("0", "0", "1,200"))
[1]  0  0 NA
Warning message:
NAs introduced by coercion

You can remove the dollar signs and commas in one step using gsub. I found an example of how to do this in a comment to this answer.

您可以使用gsub一步删除美元符号和逗号。我在对这个答案的评论中找到了这样做的一个例子。

as.numeric(gsub("[$,]", "", c("$0", "$0", "$1,200")))
[1]    0    0 1200

So the code that should work for your dataset is

所以应该适用于数据集的代码是

as.numeric(gsub("[$,]", "", NImp$Rate))

#1

as.numeric(c("0", "0", "1,200"))
[1]  0  0 NA
Warning message:
NAs introduced by coercion