I know that MATLAB is not ideal for this, but I want to do some minimal work with my table-like data.dat file, which looks like this:
ID,Name,Surname,Y,M,D,Num,Loc
1672399390,A,DULKINAS,1993,01,31,3019,Šiauliai
4157844163,D,SKARBALIUS,1993,12,08,3019,Tauragė
5541091033,E,LUKOŠEVIČIUS,1992,10,25,3019,Panevėžys
2005609387,M,DUBINSKAS,1991,03,31,3019,Kaunas
2716651285,P,ŽIEDELIS,1992,02,28,3019,Vilnius
Since the data is neatly formatted and comma-separated, I decided to simply use readtable('data.dat') and work from there.
Problem 1. MATLAB doesn't tell me where the faulty line is. Because there were a couple of redundant commas, it just threw the error "Each line of a text file must have the same number of delimiters." I solved this by counting the commas in every line with other tools and then correcting the lines manually.
Problem 2. For some reason it renames the first variable, ID (which is AFAIK a valid, non-reserved variable name), to x__ID and gives the warning "Variable names were modified to make them valid MATLAB identifiers." I don't really care about this one, but it is weird.
Problem 3. The UTF-8 characters are not displayed correctly. Moreover, after trying my luck with the documentation and running readtable('data.dat','FileEncoding','UTF-8'), I get a flat-out error: "Invalid parameter name: FileEncoding." I am confused.
How should I approach this situation?
2 Answers
#1
That is probably because you are using a MATLAB release older than R2014b; the FileEncoding option was added in R2014b. If you check the documentation shipped with your installation by running doc readtable, you will probably find the option missing.
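If upgrading is not an option, one possible workaround is to branch on the release and fall back to textscan on a file opened with an explicit encoding. This is only a sketch: the format string is an assumption based on the eight columns shown in the question, and the variable names cols and T are illustrative.

```matlab
% Sketch: pick a reading strategy based on the MATLAB release.
% R2014b corresponds to MATLAB version 8.4.
if verLessThan('matlab', '8.4')
    % Older release: open the file as UTF-8 and parse it with textscan.
    % The format string is an assumption (ID, Name, Surname, Y, M, D, Num, Loc).
    fid = fopen('data.dat', 'r', 'n', 'UTF-8');
    cols = textscan(fid, '%f %s %s %f %f %f %f %s', ...
        'Delimiter', ',', 'HeaderLines', 1);   % skip the header row
    fclose(fid);
else
    % R2014b or newer: readtable accepts the FileEncoding option.
    T = readtable('data.dat', 'FileEncoding', 'UTF-8');
end
```

In the fallback branch the result is a cell array with one cell per column rather than a table, so the column names have to be tracked separately.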
The reason ID gets renamed is that readtable interprets the byte order mark (BOM) at the beginning of your Unicode file as part of the first variable name.
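If the renamed variable bothers you, one way around it is to strip the BOM before reading. The sketch below assumes the file is UTF-8, whose BOM is the three bytes EF BB BF, and writes the result to data_nobom.dat, which is a hypothetical output name.

```matlab
% Sketch: drop a leading UTF-8 byte order mark (bytes EF BB BF), if present.
fid = fopen('data.dat', 'r');
bytes = fread(fid, Inf, '*uint8')';   % raw file contents as a row vector
fclose(fid);

bom = uint8([239 187 191]);           % EF BB BF in decimal
if numel(bytes) >= 3 && isequal(bytes(1:3), bom)
    bytes = bytes(4:end);             % keep everything after the BOM
end

% data_nobom.dat is a hypothetical name for the cleaned copy.
fid = fopen('data_nobom.dat', 'w');
fwrite(fid, bytes, 'uint8');
fclose(fid);
```

Working on raw bytes leaves the rest of the file untouched, so the non-ASCII characters in the Surname and Loc columns are copied through unchanged.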
#2
In addition, to address Problem 1: as of R2015a, the error message flags the line with the extra commas. I added an extra comma to line 4 of your data file, and here is the result:
>> readtable('data.dat', 'FileEncoding', 'UTF-8')
Error using readtable (line 129)
Reading failed at line 4. All lines of a text file must have the same number of delimiters.
Line 4 has 8 delimiters, while preceding lines have 7.
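On releases before R2015a, where the error does not name the line, the comma counting can also be done inside MATLAB. This is a minimal sketch that treats the header row as the reference count; it assumes that no field contains a quoted comma, and the variable names are illustrative.

```matlab
% Sketch: report every line whose comma count differs from the header's.
fid = fopen('data.dat', 'r');
header = fgetl(fid);
expected = sum(header == ',');        % delimiter count in the header row

lineNo = 1;
txt = fgetl(fid);
while ischar(txt)                     % fgetl returns -1 at end of file
    lineNo = lineNo + 1;
    n = sum(txt == ',');
    if n ~= expected
        fprintf('Line %d has %d delimiters, expected %d\n', ...
            lineNo, n, expected);
    end
    txt = fgetl(fid);
end
fclose(fid);
```

Because the comma is an ASCII character, the count does not depend on the encoding used to open the file.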