File.listFiles() mangles Unicode names with JDK 6 (Unicode normalization issue)

Time: 2021-03-30 20:23:22

I'm struggling with a strange file name encoding issue when listing directory contents in Java 6 on both OS X and Linux: the File.listFiles() and related methods seem to return file names in a different encoding than the rest of the system.

Note that it is not merely the display of these file names that is causing me problems. I'm mainly interested in doing a comparison of file names with a remote file storage system, so I care more about the content of the name strings than the character encoding used to print output.

Here is a program to demonstrate. It creates a file with a Unicode name then prints out URL-encoded versions of the file names obtained from the directly-created File, and the same file when listed under a parent directory (you should run this code in an empty directory). The results show the different encoding returned by the File.listFiles() method.

String fileName = "Trîcky Nåme";
File file = new File(fileName);
file.createNewFile();
System.out.println("File name: " + URLEncoder.encode(file.getName(), "UTF-8"));

// Get parent (current) dir and list file contents
File parentDir = file.getAbsoluteFile().getParentFile();
File[] children = parentDir.listFiles();
for (File child: children) {
    System.out.println("Listed name: " + URLEncoder.encode(child.getName(), "UTF-8"));
}

Here's what I get when I run this test code on my systems. Note the %CC versus %C3 character representations.

OS X Snow Leopard:

File name: Tri%CC%82cky+Na%CC%8Ame
Listed name: Tr%C3%AEcky+N%C3%A5me

$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-10M3065)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)

KUbuntu Linux (running in a VM on same OS X system):

File name: Tri%CC%82cky+Na%CC%8Ame
Listed name: Tr%C3%AEcky+N%C3%A5me

$ java -version
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8.1) (6b18-1.8.1-0ubuntu1)
OpenJDK Client VM (build 16.0-b13, mixed mode, sharing)

I have tried various hacks to get the strings to agree, including setting the file.encoding system property and various LC_CTYPE and LANG environment variables. Nothing helps, nor do I want to resort to such hacks.

Unlike this (somewhat related?) question, I am able to read data from the listed files despite the odd names.

6 solutions

#1


15  

Using Unicode, there is more than one valid way to represent the same letter. The characters you're using in your Tricky Name are a "latin small letter i with circumflex" and a "latin small letter a with ring above".

You say "Note the %CC versus %C3 character representations", but looking closer what you see are the sequences

i 0xCC 0x82 vs. 0xC3 0xAE
a 0xCC 0x8A vs. 0xC3 0xA5

That is, the first is the letter i followed by 0xCC 0x82, which is the UTF-8 encoding of the Unicode \u0302 "combining circumflex accent" character, while the second is the UTF-8 encoding of \u00EE "latin small letter i with circumflex". Similarly for the other pair: the first is the letter a followed by 0xCC 0x8A, the \u030A "combining ring above" character, and the second is \u00E5 "latin small letter a with ring above". Both are valid UTF-8 encodings of valid Unicode strings, but one is in "composed" form and the other in "decomposed" form.
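To see the two byte sequences side by side, here is a minimal sketch. The class name Utf8FormsDemo and the helper utf8Hex are made up for illustration; the strings use Unicode escapes so the form of each one is unambiguous regardless of how the source file is saved.

```java
import java.nio.charset.Charset;

public class Utf8FormsDemo {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Returns the UTF-8 bytes of s as space-separated upper-case hex pairs. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(UTF8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as FFFFFFxx
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u00EE"));   // composed i with circumflex: C3 AE
        System.out.println(utf8Hex("i\u0302"));  // i + combining circumflex:   69 CC 82
        System.out.println(utf8Hex("\u00E5"));   // composed a with ring above: C3 A5
        System.out.println(utf8Hex("a\u030A"));  // a + combining ring above:   61 CC 8A
    }
}
```

The 69 and 61 bytes are just the plain ASCII letters i and a, which URLEncoder leaves unescaped; that is why only the %CC bytes stand out in the question's output.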

OS X HFS Plus volumes store strings (e.g. filenames) as "fully decomposed". On other Unix file systems, a name is stored however the filesystem driver chooses to store it, so you can't make any blanket statements across different types of filesystems.

See the Wikipedia article on Unicode Equivalence for general discussion of composed vs decomposed forms, which mentions OS X specifically.

See Apple's Tech Q&A QA1235 (in Objective-C unfortunately) for information on converting forms.

A recent email thread on Apple's java-dev mailing list could be of some help to you.

Basically, you need to normalize the decomposed form into a composed form before you can compare the strings.
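As a rough sketch of that comparison in Java (java.text.Normalizer exists since Java 6; the class name NameCompare and the helper sameName are hypothetical), both strings are brought to the same form before equals() is called. NFC is used here, but NFD would work just as well as long as both sides use the same form.

```java
import java.text.Normalizer;

public class NameCompare {
    /** True if a and b are canonically equivalent, judged by comparing their NFC forms. */
    static boolean sameName(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String composed = "Tr\u00EEcky N\u00E5me";      // precomposed characters
        String decomposed = "Tri\u0302cky Na\u030Ame";  // base letters + combining marks

        System.out.println(composed.equals(decomposed));   // false
        System.out.println(sameName(composed, decomposed)); // true
    }
}
```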

#2


2  

Solution extracted from question:

Thanks to Stephen P for putting me on the right track.

The fix first, for the impatient. If you are compiling with Java 6 you can use the java.text.Normalizer class to normalize strings into a common form of your choice, e.g.

// Normalize to "Normalization Form Canonical Decomposition" (NFD)
protected String normalizeUnicode(String str) {
    Normalizer.Form form = Normalizer.Form.NFD;
    if (!Normalizer.isNormalized(str, form)) {
        return Normalizer.normalize(str, form);
    }
    return str;
}

Since java.text.Normalizer is only available in Java 6 and later, if you need to compile with Java 5 you might have to resort to the sun.text.Normalizer implementation and something like this reflection-based hack. See also "How does this normalize function work?"

This alone is enough for me to decide I won't support compilation of my project with Java 5 :|

Here are other interesting things I learned in this sordid adventure.

  • The confusion is caused by the file names being in one of two normalization forms which cannot be directly compared: Normalization Form Canonical Decomposition (NFD) or Normalization Form Canonical Composition (NFC). The former tends to have ASCII letters followed by "modifiers" (combining marks) that add accents etc., while the latter uses single precomposed characters with no leading ASCII character. Read the wiki page Stephen P references for a better explanation.

  • The Unicode string literal in the example code (and the names received via HTTP in my real app) happened to be in NFD form, while file names returned by the File.listFiles() method are NFC. The following mini-example demonstrates the differences:

    String name = "Trîcky Nåme";
    System.out.println("Original name: " + URLEncoder.encode(name, "UTF-8"));
    System.out.println("NFC Normalized name: " + URLEncoder.encode(
        Normalizer.normalize(name, Normalizer.Form.NFC), "UTF-8"));
    System.out.println("NFD Normalized name: " + URLEncoder.encode(
        Normalizer.normalize(name, Normalizer.Form.NFD), "UTF-8"));
    

    Output:

    Original name: Tri%CC%82cky+Na%CC%8Ame
    NFC Normalized name: Tr%C3%AEcky+N%C3%A5me
    NFD Normalized name: Tri%CC%82cky+Na%CC%8Ame
    
  • If you construct a File object with a string name, the File.getName() method will return the name in whatever form you gave it originally. However, if you call File methods that discover names on their own, they seem to return names in NFC form. This is potentially a nasty gotcha. It certainly got me.

  • According to the quote below from Apple's documentation, file names are stored in decomposed (NFD) form on the HFS Plus file system:

    When working within Mac OS you will find yourself using a mixture of precomposed and decomposed Unicode. For example, HFS Plus converts all file names to decomposed Unicode, while Macintosh keyboards generally produce precomposed Unicode.

    So the File.listFiles() method helpfully (?) converts file names to the (pre)composed (NFC) form.
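For the original use case (comparing local names against a remote store), one possible approach is to normalize every name on both sides before doing any set arithmetic. This is only a sketch: the class RemoteNameDiff and its methods are invented for illustration, and the remote names are assumed to arrive as a plain String array. The local array would typically come from File.list(), which can return null, so null is treated as "no names".

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

public class RemoteNameDiff {
    /** Copies the names into a set, normalizing each one to NFC. A null array yields an empty set. */
    static Set<String> toNfcSet(String[] names) {
        Set<String> result = new HashSet<String>();
        if (names != null) {
            for (String name : names) {
                result.add(Normalizer.normalize(name, Normalizer.Form.NFC));
            }
        }
        return result;
    }

    /** Returns the (NFC-normalized) local names that have no canonically equivalent remote name. */
    static Set<String> missingRemotely(String[] localNames, String[] remoteNames) {
        Set<String> missing = toNfcSet(localNames);
        missing.removeAll(toNfcSet(remoteNames));
        return missing;
    }

    public static void main(String[] args) {
        // In a real program the local names might come from new File(".").list()
        String[] local = { "Tr\u00EEcky N\u00E5me", "plain.txt" };
        String[] remote = { "Tri\u0302cky Na\u030Ame" }; // same name, decomposed form

        System.out.println(missingRemotely(local, remote)); // [plain.txt]
    }
}
```

Normalizing at the boundary like this means the rest of the code never has to care which form a particular file system or JVM happens to return.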

#3


1  

I've seen something similar before. People who uploaded files from their Mac to a webapp used filenames with é.

a) On Mac OS that char is a normal e + "sign for ´ applied to the previous char"

b) In Windows it's a special char: é

Both are Unicode. So... as I understand it, you pass form (b) when creating the File, and at some point Mac OS converts it to form (a). If you search the internet for this double-representation issue, you may find a way to handle both situations successfully.

Hope it helps!

#4


0  

On a Unix file system, a file name really is a null-terminated byte[]. So the Java runtime has to convert from java.lang.String to byte[] during the createNewFile() operation. The char-to-byte conversion is governed by the locale. I've been testing with LC_ALL set to en_US.UTF-8 and en_US.ISO-8859-1 and got coherent results. This is with Sun (...Oracle) java 1.6.0_20. However, for LC_ALL=en_US.POSIX, the result is:

File name:   Tr%C3%AEcky+N%C3%A5me
Listed name: Tr%3Fcky+N%3Fme

3F is a question mark. It tells me that the conversion was not successful for the non-ASCII character. Then again, everything is as expected.
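The %3F effect can be imitated in plain Java, without touching the file system, by pushing the name through a charset that cannot represent the accented characters. This is only an illustration of lossy char-to-byte conversion, not a claim about what the JVM does internally when it talks to the OS; the class LossyEncodingDemo and the helper roundTrip are made up for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

public class LossyEncodingDemo {
    /** Encodes name to bytes in the given charset and decodes it back again. */
    static String roundTrip(String name, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        // getBytes(Charset) substitutes the charset's replacement byte for unmappable characters
        return new String(name.getBytes(cs), cs);
    }

    public static void main(String[] args) throws Exception {
        String name = "Tr\u00EEcky N\u00E5me";

        // UTF-8 can represent every character, so the name survives intact
        System.out.println(URLEncoder.encode(roundTrip(name, "UTF-8"), "UTF-8"));
        // prints Tr%C3%AEcky+N%C3%A5me

        // US-ASCII cannot, so each accented letter becomes '?' (0x3F)
        System.out.println(URLEncoder.encode(roundTrip(name, "US-ASCII"), "UTF-8"));
        // prints Tr%3Fcky+N%3Fme
    }
}
```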

But the reason why your two strings are different is because of the equivalence between the \u00EE character (or C3 AE in UTF-8) and the sequence i+\u0302 (69 CC 82 in UTF-8). \u0302 is a combining diacritical mark (combining circumflex accent). Some sort of normalization occurred during the file creation. I'm not sure if it's done in the Java run-time or the OS.

NOTE: It took me some time to figure it out, since the code snippet that you've posted does not have a combining diacritical mark but the equivalent precomposed character î (i.e. \u00ee). You should have embedded the Unicode escape sequence in the string literal (but it's easy to say that afterward...).
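A small sketch of what that could look like: with escapes in the literals there is no doubt about which form each string is in, and Normalizer.isNormalized can confirm it. The class LiteralFormDemo and the helper describe are hypothetical names used only for this example.

```java
import java.text.Normalizer;

public class LiteralFormDemo {
    // Precomposed characters: \u00EE and \u00E5
    static final String COMPOSED = "Tr\u00EEcky N\u00E5me";
    // Base letters followed by combining marks: \u0302 and \u030A
    static final String DECOMPOSED = "Tri\u0302cky Na\u030Ame";

    /** Reports which of the two canonical normalization forms the string is already in. */
    static String describe(String s) {
        boolean nfc = Normalizer.isNormalized(s, Normalizer.Form.NFC);
        boolean nfd = Normalizer.isNormalized(s, Normalizer.Form.NFD);
        if (nfc && nfd) {
            return "NFC and NFD"; // e.g. a pure ASCII string
        }
        if (nfc) {
            return "NFC";
        }
        if (nfd) {
            return "NFD";
        }
        return "neither";
    }

    public static void main(String[] args) {
        System.out.println(COMPOSED.length() + " chars, " + describe(COMPOSED));     // 11 chars, NFC
        System.out.println(DECOMPOSED.length() + " chars, " + describe(DECOMPOSED)); // 13 chars, NFD
    }
}
```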

#5


0  

I suspect that you just have to instruct javac what encoding to use to compile the .java file containing the special characters, since you've hardcoded them in the source file. Otherwise the platform default encoding will be used, which may not be UTF-8 at all.

You can use the javac option -encoding for this (it is a compiler option, not a VM argument).

javac -encoding UTF-8 com/example/Foo.java

This way the resulting .class file will end up containing the correct characters and you will be able to create and list the correct filename as well.

#6


-2  

An alternative solution is to use the newer java.nio.file.Path API (Java 7 and later) in place of the java.io.File API; it works perfectly.
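For reference, a directory listing with that API might look roughly like the sketch below (Java 7 or later). The class NioListing and the method listNamesNfc are invented names. Whether the NIO listing returns names in a different form than File.listFiles() is not verified here, so the sketch normalizes the names anyway rather than relying on it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class NioListing {
    /** Lists the entries of dir and returns their file names normalized to NFC. */
    static List<String> listNamesNfc(Path dir) throws IOException {
        List<String> names = new ArrayList<String>();
        // try-with-resources closes the directory stream even if iteration fails
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                String name = entry.getFileName().toString();
                names.add(Normalizer.normalize(name, Normalizer.Form.NFC));
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Lists the current directory, like the question's example
        for (String name : listNamesNfc(Paths.get("."))) {
            System.out.println("Listed name: " + name);
        }
    }
}
```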
