Is it possible to force Excel to recognize UTF-8 CSV files automatically?

Time: 2022-02-24 11:02:06

I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at all levels. But opening such CSV files (containing e.g. diacritics, Cyrillic letters, Greek letters) in Excel does not achieve the expected results, showing something like Г„/Г¤, Г–/Г¶ instead. And I don't know how to force Excel to understand that the opened CSV file is encoded in UTF-8. I also tried specifying the UTF-8 BOM EF BB BF, but Excel ignores that.

Is there any workaround?

P.S. Which tools may potentially behave like Excel does?
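
The garbled characters quoted in the question (Г„/Г¤ and so on) can be reproduced outside Excel. The sketch below assumes the reader falls back to a legacy single-byte code page; Windows-1251 is my guess, chosen only because it yields this exact pattern, and the question itself does not say which code page was involved.

```python
# Reproduce the garbling: UTF-8 bytes interpreted as a single-byte code page.
original = "Ä/ä, Ö/ö"
utf8_bytes = original.encode("utf-8")

# Assumption: the consumer decodes with Windows-1251 instead of UTF-8.
garbled = utf8_bytes.decode("cp1251")
print(garbled)

# Decoding the very same bytes as UTF-8 gives the text back unchanged.
restored = utf8_bytes.decode("utf-8")
print(restored)
```

So the file itself is fine; only the decoding step on the reading side is wrong.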


UPDATE

I have to say that I've confused the community with the formulation of the question. When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user, in a fluent and transparent way. However, I used the wrong formulation by asking for it to be done automatically. That is very confusing and it clashes with VBA macro automation. There are two answers to this question that I appreciate the most: the very first answer by Alex https://*.com/a/6002338/166589, which I've accepted; and the second one by Mark https://*.com/a/6488070/166589, which appeared a little later. From the usability point of view, Excel seemed to lack good user-friendly UTF-8 CSV support, so I consider both answers correct, and I accepted Alex's answer first because it really stated that Excel was not able to do that transparently. That is what I confused with "automatically" here. Mark's answer promotes a more complicated way for more advanced users to achieve the expected result. Both answers are great, but Alex's fits my not clearly specified question a little better.


UPDATE 2

Five months after the last edit, I've noticed that Alex's answer has disappeared for some reason. I really hope it wasn't a technical issue, and I hope there is no more discussion about which answer is better now. So I'm accepting Mark's answer as the best one.

21 answers

#1


311  

Alex is correct, but as you have to export to csv, you can give the users this advice when opening the csv files:

  1. Save the exported file as a csv
  2. Open Excel
  3. Import the data using Data --> Import External Data --> Import Data
  4. Select the file type of "csv" and browse to your file
  5. In the import wizard change the File origin to "65001 UTF" (or choose the correct language character identifier)
  6. Change the Delimiter to comma
  7. Select where to import to and Finish

This way the special characters should show correctly.

#2


108  

The UTF-8 Byte-order marker will clue Excel 2007+ in to the fact that you're using UTF-8. (See this SO post).

In case anybody is having the same issues I was, .NET's UTF8 encoding class does not output a byte-order marker even when its constructor is explicitly told to. You have to use this workaround if you want to really save it with a byte-order marker.

#3


53  

The bug with the ignored BOM seems to be fixed in Excel 2013. I had the same problem with Cyrillic letters, but adding the BOM character \uFEFF did help.
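
For anyone wondering how \uFEFF relates to the EF BB BF bytes from the question: they are the same thing, before and after UTF-8 encoding. A small Python sketch of prepending it to CSV text (the sample rows are made up for illustration):

```python
import codecs

BOM = "\ufeff"  # the BOM as a character; in UTF-8 it becomes EF BB BF

# Hypothetical CSV content with Cyrillic and Greek text.
csv_text = "Column1,Column2\r\nАвиабилет,Ελληνικά\r\n"

# Prepend the BOM character, then encode the whole thing as UTF-8.
payload = (BOM + csv_text).encode("utf-8")

print(payload[:3] == codecs.BOM_UTF8)
```

Whether a given Excel version honors the BOM is a separate matter; as noted in this thread, older versions reportedly do not.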

#4


28  

We have used this workaround:

  1. Convert CSV to UTF-16
  2. Insert BOM at beginning of file
  3. Use tab as field separator
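
The three steps above could be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the function name `to_utf16_tsv` is hypothetical, and I picked little-endian UTF-16 because another answer in this thread mentions UTF-16le; the answer above does not state the byte order.

```python
import codecs
import csv
import io


def to_utf16_tsv(rows):
    """Return tab-separated rows as UTF-16LE bytes with a leading BOM."""
    buffer = io.StringIO(newline="")
    # Step 3: tab as the field separator.
    csv.writer(buffer, delimiter="\t", lineterminator="\r\n").writerows(rows)
    # Steps 1 and 2: encode as UTF-16 (little-endian here) and put the BOM first.
    return codecs.BOM_UTF16_LE + buffer.getvalue().encode("utf-16-le")


rows = [["Column1", "Column2"], ["Авиабилет", "Ελληνικά"]]
data = to_utf16_tsv(rows)
print(data[:2] == b"\xff\xfe")
```

The resulting bytes can be written to disk in binary mode (`open(path, "wb")`), so nothing re-encodes them on the way out.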

#5


23  

It is incredible that there are so many answers but none of them answers the question:

"When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user,..."

The answer marked as the accepted answer with 200+ up-votes is useless for me because I don't want to give my users a manual on how to configure Excel. Apart from that, this manual will apply to one Excel version, but other Excel versions have different menus and configuration dialogs. You would need a manual for each Excel version.

So the question is how to make Excel show UTF8 data with a simple double click?

Well, at least in Excel 2007 this is not possible if you use CSV files, because the UTF8 BOM is ignored and you will see only garbage. This is already part of the question of Lyubomyr Shaydariv:

"I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."

I have had the same experience: writing Russian or Greek data into a UTF8 CSV file with BOM results in garbage in Excel:

Content of UTF8 CSV file:

Colum1;Column2
Val1;Val2
Авиабилет;Tλληνικ

Result in Excel 2007: [screenshot not preserved]

A solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that whether comma or semicolon is used as the separator depends on the region settings in the control panel. So the same CSV file may open correctly on one computer but not on another. "CSV" means "Comma Separated Values", but for example on a German Windows the semicolon must be used as the separator by default, while comma does not work. (Here it should be named SSV = Semicolon Separated Values.) CSV files cannot be interchanged between different language versions of Windows. This is an additional problem on top of the UTF-8 problem.

Excel has existed for decades. It is a shame that Microsoft was not able to implement such a basic thing as CSV import in all these years.

If you put the same values into an HTML file and save that file as a UTF8 file with BOM and the file extension XLS, you will get the correct result.

Content of UTF8 XLS file:

<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>

Result in Excel 2007: [screenshot not preserved]

You can even use colors in HTML which Excel will show correctly.

<style>
.Head { background-color:gray; color:white; }
.Red  { color:red; }
</style>
<table border=1>
<tr><td class=Head>Colum1</td><td class=Head>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td class=Red>Авиабилет</td><td class=Red>Tλληνικ</td></tr>
</table>

Result in Excel 2007: [screenshot not preserved]

In this case only the table itself has a black border and lines. If you want ALL cells to display gridlines this is also possible in HTML:

<html xmlns:x="urn:schemas-microsoft-com:office:excel">
    <head>
        <meta http-equiv="content-type" content="text/plain; charset=UTF-8"/>
        <xml>
            <x:ExcelWorkbook>
                <x:ExcelWorksheets>
                    <x:ExcelWorksheet>
                        <x:Name>MySuperSheet</x:Name>
                        <x:WorksheetOptions>
                            <x:DisplayGridlines/>
                        </x:WorksheetOptions>
                    </x:ExcelWorksheet>
                </x:ExcelWorksheets>
            </x:ExcelWorkbook>
        </xml>
    </head>
    <body>
        <table>
            <tr><td>Colum1</td><td>Column2</td></tr>
            <tr><td>Val1</td><td>Val2</td></tr>
            <tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
        </table>
    </body>
</html>

This code even allows you to specify the name of the worksheet (here "MySuperSheet").

Result in Excel 2007: [screenshot not preserved]
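
Generating such an HTML-as-XLS file from data is straightforward. Here is a minimal Python sketch of the plain-table variant; the helper names `rows_to_html_table` and `html_xls_bytes` are hypothetical, and the HTML escaping of cell values is my own addition, since the hand-written samples above did not need it.

```python
import codecs
import html


def rows_to_html_table(rows):
    """Render rows as a bare HTML table, escaping each cell value."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def html_xls_bytes(rows):
    """UTF-8 bytes with a leading BOM, meant to be saved with an .xls extension."""
    return codecs.BOM_UTF8 + rows_to_html_table(rows).encode("utf-8")


rows = [["Colum1", "Column2"], ["Val1", "Val2"], ["Авиабилет", "Tλληνικ"]]
data = html_xls_bytes(rows)
print(data.decode("utf-8-sig"))
```

Writing `data` to a file opened with `"wb"` keeps the BOM and the encoding exactly as produced.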

#6


16  

I had the same problems with PHP-generated CSV files. Excel ignored the BOM when the separator was defined via "sep=,\n" at the beginning of the content (but of course after the BOM).

So adding a BOM ("\xEF\xBB\xBF") at the beginning of the content and setting the semicolon as separator via fputcsv($fh, $data_array, ";"); does the trick.
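
The same recipe (BOM first, semicolon as separator, no "sep=" line) carries over to other languages. A rough Python equivalent, where `export_csv` is a hypothetical helper name and the sample rows are invented:

```python
import csv
import io


def export_csv(rows, delimiter=";"):
    """Serialize rows with the given delimiter and return BOM-prefixed UTF-8 bytes."""
    text = io.StringIO(newline="")
    csv.writer(text, delimiter=delimiter).writerows(rows)
    # The "utf-8-sig" codec prepends EF BB BF when encoding.
    return text.getvalue().encode("utf-8-sig")


data = export_csv([["Name", "Note"], ["Ä", "x;y"]])
print(data[:3])
```

Fields that contain the separator are quoted by the csv module, which is the job fputcsv does in the PHP version.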

#7


9  

I have had the same issue in the past (how to produce files that Excel can read, and other tools can also read). I was using TSV rather than CSV, but the same problem with encodings came up.

I failed to find any way to get Excel to recognize UTF-8 automatically, and I was not willing or able to inflict on the consumers of the files complicated instructions on how to open them. So I encoded them as UTF-16le (with a BOM) instead of UTF-8. Twice the size, but Excel can recognize the encoding. And they compress well, so the size rarely (but sadly not never) matters.

#8


9  

Old question but heck, the simplest solution is:

  1. Open CSV in Notepad
  2. Save As -> select the right encoding
  3. Open the new file

#9


3  

This is an old question but it comes up at the top of search results. After a lot of effort I found that adding BOM characters at the beginning of the csv file helps.

I have described it briefly here: https://sites.google.com/site/ritechtips/home/the-multi-line-fields-csv-file-and-excel-import---ha

#10


3  

As I posted on http://thinkinginsoftware.blogspot.com/2017/12/correctly-generate-csv-that-excel-can.html:

Tell the software developer in charge of generating the CSV to correct it. As a quick workaround you can use gsed to insert the UTF-8 BOM at the beginning of the string:

gsed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' file.csv

This command inserts the UTF-8 BOM if it is not already present, so it is idempotent. Now you should be able to double-click the file and open it in Excel.
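
If gsed is not at hand, the same idempotent behavior can be sketched in Python. The function name `ensure_utf8_bom` and the temporary demo file are my own illustration, not part of the blog post mentioned above.

```python
import codecs
import os
import tempfile


def ensure_utf8_bom(path):
    """Prepend the UTF-8 BOM unless the file already starts with it.

    Returns True if the file was modified, False if it was left alone.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True


# Demo on a throwaway file: the second call must not add a second BOM.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
os.write(fd, "a,b\n".encode("utf-8"))
os.close(fd)
first = ensure_utf8_bom(demo_path)
second = ensure_utf8_bom(demo_path)
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
print(first, second)
```

This reads the whole file into memory, which is fine for typical exports but not a good fit for very large files.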

#11


2  

A simple VBA macro for opening UTF-8 text and csv files:

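Sub OpenTextFile()

   ' GetOpenFilename returns the Boolean False when the dialog is cancelled
   Dim filetoopen As Variant
   filetoopen = Application.GetOpenFilename("Text Files (*.txt;*.csv), *.txt;*.csv")
   If VarType(filetoopen) = vbBoolean Then Exit Sub

   ' Origin 65001 is the UTF-8 code page; Comma:=True splits fields on commas
   Workbooks.OpenText Filename:=filetoopen, _
   Origin:=65001, DataType:=xlDelimited, Comma:=True

End Sub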

Origin:=65001 is UTF-8. Comma:=True splits comma-separated .csv files into columns.

Save it in Personal.xlsb to have it always available. Customize the Excel toolbar by adding a button that calls the macro, and open files from there. You can add more formatting to the macro, like column autofit, alignment, etc.

#12


2  

Just to help users who, like me, reach this thread wanting to open the file in Excel.

I have used the wizard below and it worked fine for me when importing a UTF-8 file. Not transparent, but useful if you already have the file.

  1. Open Microsoft Excel 2007.
  2. Click on the Data menu bar option.
  3. Click on the From Text icon.
  4. Navigate to the location of the file that you want to import. Click on the filename and then click on the Import button. The Text Import Wizard - Step 1 of 3 window will now appear on the screen.
  5. Choose the file type that best describes your data - Delimited or Fixed Width.
  6. Choose 65001: Unicode (UTF-8) from the drop-down list that appears next to File origin.
  7. Click on the Next button to display the Text Import Wizard - Step 2 of 3 window.
  8. Place a checkmark next to the delimiter that was used in the file you wish to import into Microsoft Excel 2007. The Data preview window will show you how your data will appear based on the delimiter that you chose.
  9. Click on the Next button to display the Text Import Wizard - Step 3 of 3 window.
  10. Choose the appropriate data format for each column of data that you want to import. You also have the option to not import one or more columns of data if you want.
  11. Click on the Finish button to finish importing your data into Microsoft Excel 2007.

Source: https://www.itg.ias.edu/content/how-import-csv-file-uses-utf-8-character-encoding-0

#13


2  

A truly amazing list of answers, but since one pretty good one is still missing, I'll mention it here: open the csv file with Google Sheets and save it back to your local computer as an Excel file.

In contrast to Microsoft, Google has managed to support UTF-8 csv files, so opening the file there just works. And the export to Excel format also just works. So even though this may not be the preferred solution for all, it is pretty fail-safe and the number of clicks is not as high as it may sound, especially when you're already logged into Google anyway.

#14


1  

This is my working solution:

vbFILEOPEN = "your_utf8_file.csv"
Workbooks.OpenText Filename:=vbFILEOPEN, DataType:=xlDelimited, Semicolon:=True, Local:=True, Origin:=65001

The key is Origin:=65001

#15


1  

Yes it is possible. When writing the stream creating the csv, the first thing to do is this:

myStream.Write(Encoding.UTF8.GetPreamble(), 0, Encoding.UTF8.GetPreamble().Length)

#16


1  

Yes, this is possible. As previously noted by multiple users, there seems to be a problem with Excel reading the correct Byte Order Mark when the file is encoded in UTF-8. With UTF-16 it does not seem to have a problem, so it is specific to UTF-8. The solution I use for this is adding the BOM twice. For this I execute the following sed command twice:

sed -I '1s/^/\xef\xbb\xbf/' *.csv

Here the wildcard can be replaced with any file name. However, this corrupts the sep= line at the beginning of the .csv file. The .csv file will then open normally in Excel, but with an extra row containing "sep=" in the first cell. The "sep=" line can also be removed from the source .csv itself, but then the delimiter should be specified when opening the file with VBA:

Workbooks.Open(name, Format:=6, Delimiter:=";", Local:=True)

Format 6 is the .csv format. Set Local to true, in case there are dates in the file. If Local is not set to true the dates will be Americanized, which in some cases will corrupt the .csv format.

#17


1  

This does not exactly address the question, but since I stumbled across this and the above solutions didn't work for me or had requirements I couldn't meet, here is another way to add the BOM when you have access to vim:

vim -e -s +"set bomb|set encoding=utf-8|wq" filename.csv

#18


0  

This is an old question but I've just encountered a similar problem and the solution may help others:

I had the same issue: after writing CSV text data out to a file, opening the resulting .csv in Excel put all the text into a single column. After reading the above answers I tried the following, which seems to sort the problem out.

Apply an encoding of UTF-8 when you create your StreamWriter. That's it.

Example:

using (StreamWriter output = new StreamWriter(outputFileName, false, Encoding.UTF8, 2 << 22)) {
   /* ... do stuff .... */
   output.Close();
}

#19


0  

If you want to make it fully automatic, one click, or to load automatically into Excel from, say, a web page, but can't generate proper Excel files, then I would suggest looking at the SYLK format as an alternative. OK, it is not as simple as CSV, but it is text based and very easy to implement, and it supports UTF-8 with no issues.

I wrote a PHP class that receives the data and outputs a SYLK file which will open directly in Excel by just clicking the file (or will auto-launch Excel if you write the file to a web page with the correct MIME type). You can even add formatting (like bold, or formatting numbers in particular ways) and change column sizes, or auto-size columns to the text in the columns, and all in all the code is probably not more than about 100 lines.

It is dead easy to reverse engineer SYLK by creating a simple spreadsheet and saving as SYLK and then reading it with a text editor. The first block are headers and standard number formats that you will recognise (which you just regurgitate in every file you create), then the data is simply an X/Y coordinate and a value.

#20


0  

  1. Download & install LibreOffice Calc
  2. Open the csv file of your choice in LibreOffice Calc
  3. Thank the heavens that an import text wizard shows up...
  4. ...select your delimiter and character encoding options
  5. Select the resulting data in Calc and copy and paste it into Excel

#21


-1  

First save the Excel spreadsheet as Unicode text. Open the TXT file in Internet Explorer, click "Save as", and under the TXT encoding option choose the appropriate encoding, e.g. Windows Cyrillic (1251).

#1


311  

Alex is correct, but as you have to export to csv, you can give the users this advice when opening the csv files:

Alex是对的,但是由于你需要导出到csv,所以在打开csv文件时可以给用户以下建议:

  1. Save the exported file as a csv
  2. 将导出的文件保存为csv
  3. Open Excel
  4. 打开Excel
  5. Import the data using Data-->Import External Data --> Import Data
  6. 使用数据导入数据——>导入外部数据——>导入数据
  7. Select the file type of "csv" and browse to your file
  8. 选择“csv”的文件类型并浏览到您的文件
  9. In the import wizard change the File_Origin to "65001 UTF" (or choose correct language character identifier)
  10. 在导入向导中,将File_Origin更改为“65001 UTF”(或选择正确的语言字符标识符)
  11. Change the Delimiter to comma
  12. 将分隔符改为逗号。
  13. Select where to import to and Finish
  14. 选择要导入到何处并完成

This way the special characters should show correctly.

这样特殊的人物就应该正确地显示出来。

#2


108  

The UTF-8 Byte-order marker will clue Excel 2007+ in to the fact that you're using UTF-8. (See this SO post).

UTF-8字节顺序标记将提示您使用UTF-8。(看这帖子)。

In case anybody is having the same issues I was, .NET's UTF8 encoding class does not output a byte-order marker even when its constructor is explicitly told to. You have to use this workaround if you want to really save it with a byte-order marker.

如果任何人都有和我一样的问题,. net的UTF8编码类不会输出字节顺序标记,即使它的构造函数被显式地告知。如果你想用字节顺序标记来保存的话,你必须使用这个变通方法。

#3


53  

The bug with ignored BOM seems to be fixed for Excel 2013. I had same problem with Cyrillic letters, but adding BOM character \uFEFF did help.

BOM被忽略的bug似乎在Excel 2013中得到了修复。我对西里尔字母也有同样的问题,但是添加BOM字符\uFEFF确实有帮助。

#4


28  

We have used this workaround:

我们使用了这个方法:

  1. Convert CSV to UTF-16
  2. CSV转换为utf - 16
  3. Insert BOM at beginning of file
  4. 在文件开头插入BOM
  5. Use tab as field separator
  6. 使用选项卡作为字段分隔符

#5


23  

It is incredible that there are so many answers but none answers the question:

令人难以置信的是,有那么多的答案,但没有一个答案是:

"When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user,..."

“当我问这个问题时,我问了一个在Excel中打开UTF-8 CSV文件的方法,对用户来说没有任何问题……”

The answer marked as the accepted answer with 200+ up-votes is useless for me because I don't want to give my users a manual how to configure Excel. Apart from that: this manual will apply to one Excel version but other Excel versions have different menus and configuration dialogs. You would need a manual for each Excel version.

被标记为200+向上投票的公认答案对我来说毫无用处,因为我不想给我的用户一个如何配置Excel的手册。除此之外:本手册将适用于一个Excel版本,但其他Excel版本有不同的菜单和配置对话框。每个Excel版本都需要一本手册。

So the question is how to make Excel show UTF8 data with a simple double click?

所以问题是如何用简单的双击使Excel显示UTF8数据?

Well at least in Excel 2007 this is not possible if you use CSV files because the UTF8 BOM is ignored and you will see only garbage. This is already part of the question of Lyubomyr Shaydariv:

至少在Excel 2007中,如果你使用CSV文件,这是不可能的,因为UTF8 BOM被忽略了,你只会看到垃圾。Lyubomyr Shaydariv:

"I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."

“我也尝试过指定UTF-8 BOM EF BB BF,但Excel忽略了这一点。”

I make the same experience: Writing russian or greek data into a UTF8 CSV file with BOM results in garbage in Excel:

我也有同样的经历:用BOM将俄文或希腊文数据写入UTF8 CSV文件会导致Excel中的垃圾:

Content of UTF8 CSV file:

UTF8 CSV文件内容:

Colum1;Column2
Val1;Val2
Авиабилет;Tλληνικ

Result in Excel 2007:

结果在Excel 2007:

是否可以强制Excel自动识别UTF-8 CSV文件?

A solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that it depends on the region settings in control panel if comma or semicolon is used as separator. So the same CSV file may open correctly on one computer but on anther computer not. "CSV" means "Comma Separated Values" but for example on a german Windows by default semicolon must be used as separator while comma does not work. (Here it should be named SSV = Semicolon Separated Values) CSV files cannot be interchanged between different language versions of Windows. This is an additional Problem to the UTF-8 problem.

解决方案是完全不使用CSV。这种格式是由Microsoft愚蠢地实现的,如果使用逗号或分号作为分隔符,则取决于控制面板中的区域设置。所以相同的CSV文件可以在一台计算机上正确打开,但在另一台计算机上不能。“CSV”的意思是“逗号分隔值”,但是在德国的Windows上,默认的分号必须用作分隔符,而逗号不能工作。(这里应该命名为SSV =分号分隔值)CSV文件不能在Windows的不同语言版本之间交换。这是UTF-8问题的另一个问题。

Excel exists since decades. It is a shame that Microsoft was not able to implement such a basic thing as CSV import in all these years.

几十年以来Excel的存在。遗憾的是,微软在这些年里没有实现CSV导入这样的基本功能。

If you put the same values into a HTML file and save that file as UTF8 file with BOM with the file extension XLS you will get the correct result.

如果您将相同的值放入HTML文件中,并将该文件保存为带有扩展名为XLS的BOM的UTF8文件,您将得到正确的结果。

Content of UTF8 XLS file:

UTF8 XLS文件内容:

<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>

Result in Excel 2007:

结果在Excel 2007:

是否可以强制Excel自动识别UTF-8 CSV文件?

You can even use colors in HTML which Excel will show correctly.

您甚至可以在HTML中使用Excel将正确显示的颜色。

<style>
.Head { background-color:gray; color:white; }
.Red  { color:red; }
</style>
<table border=1>
<tr><td class=Head>Colum1</td><td class=Head>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td class=Red>Авиабилет</td><td class=Red>Tλληνικ</td></tr>
</table>

Result in Excel 2007:

结果在Excel 2007:

是否可以强制Excel自动识别UTF-8 CSV文件?

In this case only the table itself has a black border and lines. If you want ALL cells to display gridlines this is also possible in HTML:

在这种情况下,只有表本身具有黑色边框和线。如果您希望所有单元格显示网格线,这在HTML中也是可能的:

<html xmlns:x="urn:schemas-microsoft-com:office:excel">
    <head>
        <meta http-equiv="content-type" content="text/plain; charset=UTF-8"/>
        <xml>
            <x:ExcelWorkbook>
                <x:ExcelWorksheets>
                    <x:ExcelWorksheet>
                        <x:Name>MySuperSheet</x:Name>
                        <x:WorksheetOptions>
                            <x:DisplayGridlines/>
                        </x:WorksheetOptions>
                    </x:ExcelWorksheet>
                </x:ExcelWorksheets>
            </x:ExcelWorkbook>
        </xml>
    </head>
    <body>
        <table>
            <tr><td>Colum1</td><td>Column2</td></tr>
            <tr><td>Val1</td><td>Val2</td></tr>
            <tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
        </table>
    </body>
</html>

This code even allows to specify the name of the worksheet (here "MySuperSheet")

这段代码甚至允许指定工作表的名称(这里是“MySuperSheet”)

Result in Excel 2007:

结果在Excel 2007:

是否可以强制Excel自动识别UTF-8 CSV文件?

#6


16  

Had the same problems with PHP-generated CSV files. Excel ignored the BOM when the Separator was defined via "sep=,\n" at the beginning of the content (but of course after the BOM).

php生成的CSV文件也有同样的问题。当分隔符在内容开头通过“sep=,\n”定义时,Excel忽略了BOM(当然是在BOM之后)。

So adding a BOM ("\xEF\xBB\xBF") at the beginning of the content and setting the semicolon as separator via fputcsv($fh, $data_array, ";"); does the trick.

因此,在内容的开头添加一个BOM(“\xEF\xBB\xBF”),并通过fputcsv将分号设置为分隔符($fh, $data_array,”;就可以了。

#7


9  

I have had the same issue in the past (how to produce files that Excel can read, and other tools can also read). I was using TSV rather than CSV, but the same problem with encodings came up.

我过去也遇到过同样的问题(如何生成Excel可以读取的文件,以及其他工具也可以读取的文件)。我使用的是TSV而不是CSV,但是编码也有同样的问题。

I failed to find any way to get Excel to recognize UTF-8 automatically, and I was not willing/able to inflict on the consumers of the files complicated instructions how to open them. So I encoded them as UTF-16le (with a BOM) instead of UTF-8. Twice the size, but Excel can recognize the encoding. And they compress well, so the size rarely (but sadly not never) matters.

我没有找到任何方法可以让Excel自动识别UTF-8,而且我也不愿意/能够给文件的使用者带来复杂的如何打开它们的指令。所以我把它们编码为UTF-16le(带有BOM)而不是UTF-8。两倍的大小,但是Excel可以识别编码。而且它们压缩得很好,所以尺寸很少(但遗憾的是从来没有)重要。

#8


9  

Old question but heck, the simplest solution is:

老问题,但是,最简单的解决方法是:

  1. Open CSV in Notepad
  2. 在记事本打开CSV
  3. Save As -> select the right encoding
  4. 保存为->选择正确的编码。
  5. Open the new file
  6. 打开新文件

#9


3  

This is an old question but comes up in the search at top. I found after a lot of efforts that adding BOM characters at the beginning of csv file helps.

这是一个古老的问题,但在搜索的顶部。我发现在csv文件的开头添加BOM字符是有帮助的。

I have briefed it here: https://sites.google.com/site/ritechtips/home/the-multi-line-fields-csv-file-and-excel-import---ha

我在这里简要地介绍了一下:https://sites.google.com/site/ritechtips/home/the multi-line-fields-csv-file- excel-import---ha

#10


3  

As I posted on http://thinkinginsoftware.blogspot.com/2017/12/correctly-generate-csv-that-excel-can.html:

正如我在http://thinkinginsoftware.blogspot.com/2017/12/correct -generate-csv-that- excelcan.html上所言:

Tell the software developer in charge of generating the CSV to correct it. As a quick workaround you can use gsed to insert the UTF-8 BOM at the beginning of the string:

告诉负责生成CSV的软件开发人员纠正它。作为一个快速的解决方案,您可以使用gsed在字符串的开头插入UTF-8 BOM:

gsed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' file.csv

This command inserts the UTF-4 BOM if not present. Therefore it is an idempotent command. Now you should be able to double click the file and open it in Excel.

如果不存在,此命令将插入UTF-4 BOM。因此这是一个幂等的命令。现在您应该可以双击该文件并在Excel中打开它。

#11


2  

Simple vba macro for opening utf-8 text and csv files

简单的vba宏,用于打开utf-8文本和csv文件

Sub OpenTextFile()

   filetoopen = Application.GetOpenFilename("Text Files (*.txt;*.csv), *.txt;*.csv")
   If filetoopen = Null Or filetoopen = Empty Then Exit Sub

   Workbooks.OpenText Filename:=filetoopen, _
   Origin:=65001, DataType:=xlDelimited, Comma:=True

End Sub

Origin:=65001 is UTF-8. Comma:True for .csv files distributed in colums

产地:= 65001是utf - 8。逗号:用于在列中分布的.csv文件。

Save it in Personal.xlsb to have it always available. Personalise excel toolbar adding a macro call button and open files from there. You can add more formating to the macro, like column autofit , alignment,etc.

将其保存在个人。xlsb总是可用。个性化excel工具栏,添加宏调用按钮和打开文件。您可以向宏添加更多的格式化,如列自动拟合、对齐等。

#12


2  

Just for help users interested on opening the file on Excel that achieve this thread like me.

只是为了帮助有兴趣在Excel上打开文件的用户像我一样实现这个线程。

I have used the wizard below and it worked fine for me, importing an UTF-8 file. Not transparent, but useful if you already have the file.

我使用了下面的向导,它在导入UTF-8文件时运行良好。不是透明的,但是如果您已经有了这个文件,那么它是有用的。

  1. Open Microsoft Excel 2007.
  2. Click on the Data menu bar option.
  3. Click on the From Text icon.
  4. Navigate to the location of the file that you want to import. Click on the filename and then click on the Import button. The Text Import Wizard - Step 1 of 3 window will now appear on the screen.
  5. Choose the file type that best describes your data - Delimited or Fixed Width.
  6. Choose 65001: Unicode (UTF-8) from the drop-down list that appears next to File origin.
  7. Click on the Next button to display the Text Import Wizard - Step 2 of 3 window.
  8. Place a checkmark next to the delimiter that was used in the file you wish to import into Microsoft Excel 2007. The Data preview window will show you how your data will appear based on the delimiter that you chose.
  9. Click on the Next button to display the Text Import Wizard - Step 3 of 3 window.
  10. Choose the appropriate data format for each column of data that you want to import. You also have the option to not import one or more columns of data if you want.
  11. Click on the Finish button to finish importing your data into Microsoft Excel 2007.

Source: https://www.itg.ias.edu/content/how-import-csv-file-uses-utf-8-character-encoding-0

#13


2  

A truly amazing list of answers, but since one pretty good one is still missing, I'll mention it here: open the CSV file with Google Sheets and save it back to your local computer as an Excel file.

In contrast to Microsoft, Google has managed to support UTF-8 CSV files, so opening the file there just works. The export to Excel format also just works. So even though this may not be the preferred solution for everyone, it is pretty fail-safe, and the number of clicks is not as high as it may sound, especially when you're already logged into Google anyway.

#14


1  

This is my working solution:

vbFILEOPEN = "your_utf8_file.csv"
Workbooks.OpenText Filename:=vbFILEOPEN, DataType:=xlDelimited, Semicolon:=True, Local:=True, Origin:=65001

The key is Origin:=65001

#15


1  

Yes, it is possible. When writing the stream that creates the CSV, the first thing to do is write the UTF-8 preamble (the BOM):

myStream.Write(Encoding.UTF8.GetPreamble(), 0, Encoding.UTF8.GetPreamble().Length)
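
For comparison, here is a minimal Python sketch of the same idea: emit the BOM before any CSV data. The function name `write_csv_with_bom` is made up for illustration; the only thing it relies on is Python's standard `utf-8-sig` codec, which writes the bytes EF BB BF at the start of the file.

```python
import csv


def write_csv_with_bom(path, rows):
    """Write rows to a CSV file that starts with the UTF-8 BOM."""
    # "utf-8-sig" emits EF BB BF before the first character.
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)
```

Calling `write_csv_with_bom("export.csv", [["name", "city"], ["Änne", "Москва"]])` would produce a file whose first three bytes are the BOM. Whether a given Excel version then picks UTF-8 on double-click is exactly what this thread disagrees about, so treat it as something to test against your target version.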

#16


1  

Yes, this is possible. As previously noted by multiple users, Excel seems to have a problem reading the Byte Order Mark correctly when the file is encoded in UTF-8. With UTF-16 it does not seem to have a problem, so the issue is specific to UTF-8. The solution I use for this is adding the BOM, TWICE. For this I execute the following sed command twice:

sed -i '1s/^/\xef\xbb\xbf/' *.csv

, where the wildcard can be replaced with any file name. However, this breaks a sep= line at the beginning of the .csv file: the file will then open normally in Excel, but with an extra row containing "sep=" in the first cell. The "sep=" line can also be removed from the source .csv itself, but then the delimiter should be specified when opening the file with VBA:

Workbooks.Open Filename:=name, Format:=6, Delimiter:=";", Local:=True

Format:=6 tells Excel to use the custom character given in Delimiter as the separator. Set Local to True in case there are dates in the file. If Local is not set to True, the dates will be Americanized, which in some cases will corrupt the data.

#17


1  

This does not exactly address the question, but since I stumbled across it and the solutions above either didn't work for me or had requirements I couldn't meet, here is another way to add the BOM when you have access to vim:

vim -e -s +"set bomb|set encoding=utf-8|wq" filename.csv
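
If vim is not available, the same fix can be sketched in a few lines of Python. This is only an illustration: `ensure_utf8_bom` is a hypothetical helper name, and it assumes the file is already valid UTF-8 and small enough to read into memory. It adds the BOM only when it is missing, so running it twice does not stack two BOMs.

```python
import codecs


def ensure_utf8_bom(path):
    """Prepend the UTF-8 BOM to the file unless it already starts with one.

    Returns True if the file was modified, False if it was left untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False  # BOM already present
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True
```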

#18


0  

This is an old question, but I've just encountered a similar problem and the solution may help others:

Had the same issue where writing out CSV text data to a file, then opening the resulting .csv in Excel shifts all the text into a single column. After having a read of the above answers I tried the following, which seems to sort the problem out.

Apply an encoding of UTF-8 when you create your StreamWriter. That's it.

Example:

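// Passing Encoding.UTF8 explicitly makes StreamWriter emit the UTF-8 BOM at the start of a new file.
using (StreamWriter output = new StreamWriter(outputFileName, false, Encoding.UTF8, 2 << 22)) {
   /* ... do stuff .... */
}  // the using block flushes and closes the writer, so no explicit Close() is needed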

#19


0  

If you want to make it fully automatic, one click, or to load automatically into Excel from say a web page, but can't generate proper Excel files, then I would suggest looking at SYLK format as an alternative. OK it is not as simple as CSV but it is text based and very easy to implement and it supports UTF-8 with no issues.

I wrote a PHP class that receives the data and outputs a SYLK file which opens directly in Excel by just clicking the file (or auto-launches Excel if you write the file to a web page with the correct MIME type). You can even add formatting (such as bold, or formatting numbers in particular ways) and change column sizes, or auto-size columns to the text they contain. All in all, the code is probably not more than about 100 lines.

It is dead easy to reverse engineer SYLK by creating a simple spreadsheet and saving as SYLK and then reading it with a text editor. The first block are headers and standard number formats that you will recognise (which you just regurgitate in every file you create), then the data is simply an X/Y coordinate and a value.
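
To make the "coordinate and a value" layout concrete, here is a rough Python sketch (not the PHP class mentioned above, which is not shown in this thread). It emits only the bare minimum of records: an `ID` header, one `C;Y<row>;X<col>;K<value>` record per cell, and a closing `E`. The function name `to_sylk` is hypothetical, the doubling of semicolons inside text is my understanding of the format, and double quotes inside text are not handled. Compare the output with a file saved from Excel before relying on it.

```python
def to_sylk(rows):
    """Build a minimal SYLK document from a list of rows (lists of cell values)."""
    lines = ["ID;PWXL;N;E"]  # header record, as written by Excel itself
    for y, row in enumerate(rows, start=1):
        for x, value in enumerate(row, start=1):
            if isinstance(value, (int, float)):
                k = str(value)  # numbers are written unquoted
            else:
                # Text is quoted; a literal semicolon is doubled because
                # ";" separates fields inside a record.
                k = '"' + str(value).replace(";", ";;") + '"'
            lines.append(f"C;Y{y};X{x};K{k}")
    lines.append("E")  # end-of-file record
    return "\r\n".join(lines) + "\r\n"
```

For example, `to_sylk([["a", 1]])` yields the four records `ID;PWXL;N;E`, `C;Y1;X1;K"a"`, `C;Y1;X2;K1` and `E`. Which byte encoding Excel assumes when reading such a file is the claim made in this answer, not something the sketch verifies.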

#20


0  

  1. Download & install LibreOffice Calc
  2. Open the csv file of your choice in LibreOffice Calc
  3. Thank the heavens that an import text wizard shows up...
  4. ...select your delimiter and character encoding options
  5. Select the resulting data in Calc and copy paste to Excel

#21


-1  

First save the Excel spreadsheet as Unicode text. Then open the TXT file in Internet Explorer, click "Save as" and, under Encoding, choose the appropriate encoding, e.g. Windows Cyrillic (1251).
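
The re-encoding step can also be done without a browser. The sketch below assumes that Excel's "Unicode Text" export is UTF-16 with a BOM, which is why the source encoding defaults to `utf-16`. The function name `reencode_text` and its parameters are made up for illustration.

```python
def reencode_text(src, dst, src_encoding="utf-16", dst_encoding="cp1251"):
    """Copy a text file from one encoding to another.

    Raises UnicodeEncodeError if the text contains characters that the
    target code page cannot represent.
    """
    # newline="" keeps the original line endings untouched on both sides.
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as fout:
        fout.write(text)
```

Note that a single-byte code page such as 1251 covers only one script, so this route only works when all the data fits into that code page.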