What does the "more columns than column names" error mean?

Time: 2021-02-14 19:47:35

I'm trying to read in a .csv file from the IRS and it doesn't appear to be formatted in any weird way.

I'm using the read.table() function, which I have used several times in the past but it isn't working this time; instead, I get this error:

data_0910<-read.table("/Users/blahblahblah/countyinflow0910.csv",header=T,stringsAsFactors=FALSE,colClasses="character")

Error in read.table("/Users/blahblahblah/countyinflow0910.csv",  : 
  more columns than column names

Why is it doing this?

For reference, the .csv files can be found at:

http://www.irs.gov/uac/SOI-Tax-Stats-County-to-County-Migration-Data-Files

(The ones I need are under the county to county migration .csv section - either inflow or outflow.)

3 answers

#1


18  

The file uses commas as separators, so you can either set sep="," or just use read.csv:

x <- read.csv(file="http://www.irs.gov/file_source/pub/irs-soi/countyinflow1011.csv")
dim(x)
## [1] 113593      9

The error arises because read.table splits fields on whitespace by default (sep=""), and some of the values contain spaces; unmatched quotes add to the problem. There are no spaces in the header, so read.table thinks that there is only one column. It then sees multiple columns in some of the rows. For example, the first two lines (header and first row):

State_Code_Dest,County_Code_Dest,State_Code_Origin,County_Code_Origin,State_Abbrv,County_Name,Return_Num,Exmpt_Num,Aggr_AGI
00,000,96,000,US,Total Mig - US & For,6973489,12948316,303495582

There are also unmatched quotes, for example the apostrophe on line 1336 (row 1335). This confuses read.table, whose default quote argument treats both " and ' as quote characters, but not read.csv, which only treats " as a quote character:

01,089,24,033,MD,Prince George's County,13,30,1040
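Both problems can be seen in isolation with a small sketch. The strings below are made-up stand-ins built from shortened versions of the rows quoted above, not the full IRS file, and the variable names are only for illustration:

```r
# Stand-in for the IRS file: a shortened header plus one data row with spaces.
header_and_row <- "State_Abbrv,County_Name,Return_Num
US,Total Mig - US & For,6973489"

# With the defaults (sep = "" means whitespace), the header is a single field
# but the data row splits into several, so read.table stops with an error.
default_result <- tryCatch(
  read.table(text = header_and_row, header = TRUE),
  error = function(e) conditionMessage(e)
)
print(default_result)

# Add a row containing an apostrophe. sep = "," splits on commas, and
# quote = "\"" stops the apostrophe from being read as an opening quote.
csv_text <- paste0(header_and_row, "\nMD,Prince George's County,13")
fixed <- read.table(text = csv_text, header = TRUE, sep = ",",
                    quote = "\"", stringsAsFactors = FALSE)
print(dim(fixed))

# read.csv uses sep = "," and quote = "\"" by default.
same <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(same$County_Name)
```

The first call captures the error message instead of stopping the script, which makes it easy to compare the default behaviour with the two working calls.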

#2


3  

For German users:

You have to change the decimal commas in your csv file to full stops (in Excel: File -> Options -> Advanced -> "Decimal separator"); then the error is solved. Otherwise, in a comma-separated file, each decimal comma is read as an extra field separator.
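If editing the file in Excel is not convenient, R can also read the decimal-comma format directly. This is a minimal sketch, assuming the file uses semicolons as field separators and commas as decimal marks; the sample text and column names are made up for illustration:

```r
# Made-up sample: ";" between fields and "," as the decimal mark
# (the format this sketch assumes).
german_text <- "county;agi
Berlin;1040,5
Hamburg;303,25"

# read.csv2 defaults to sep = ";" and dec = ",".
x <- read.csv2(text = german_text, stringsAsFactors = FALSE)
print(x$agi)

# Equivalent call with read.table and explicit arguments.
y <- read.table(text = german_text, header = TRUE, sep = ";", dec = ",",
                stringsAsFactors = FALSE)
print(is.numeric(y$agi))
```

If the file instead uses commas for both purposes, the fields cannot be told apart reliably, and changing the export settings as described above is the safer route.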

#3


1  

You may have strange characters in your header: #, %, -- or ,
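One way a header character can trigger this: read.table treats # as a comment character by default (comment.char = "#"), so a # in the header line cuts the header short while the data rows keep all their fields. A small sketch with made-up data follows; the column names are hypothetical:

```r
# Made-up data whose second column name starts with "#".
hash_text <- "id,#count,value,total
1,2,3,4
5,6,7,8"

# Default comment.char = "#": everything after "#" in the header is dropped,
# leaving fewer column names than data columns.
default_result <- tryCatch(
  read.table(text = hash_text, header = TRUE, sep = ","),
  error = function(e) conditionMessage(e)
)
print(default_result)

# Turning comment handling off keeps the whole header line.
fixed <- read.table(text = hash_text, header = TRUE, sep = ",",
                    comment.char = "")
print(names(fixed))
print(dim(fixed))
```

read.csv already sets comment.char = "" by default, which is another reason it tends to be more forgiving with files like this.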
