Using Locales with Java's toLowerCase() and toUpperCase()

Date: 2021-10-20 07:29:55

I want code that converts all the characters in a string to uppercase or lowercase in Java.

I found a method that goes something like this:

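import java.util.Locale;

// Lowercases a mixed-case sample string using an explicit locale.
public static String changeUpperToLower()
{
    String str = "CyBeRdRaGoN";
    str = str.toLowerCase(Locale.ENGLISH); // "cyberdragon"
    return str;
}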

Now I've read that using certain locales, such as Turkish, "returns ı (without dot) instead of i (with dot)."

Is it safe to use Locales like UK, US, ENGLISH, etc.? Are there any big differences between them when applied to strings?

Which is the most preferred Locale for Strings?

4 Answers

#1


51  

I think you should specify a locale explicitly.

For instance, "TITLE".toLowerCase() in a Turkish locale returns "tıtle", where 'ı' is the LATIN SMALL LETTER DOTLESS I character. To obtain correct results for locale insensitive strings, use toLowerCase(Locale.ENGLISH).
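
The behaviour quoted above can be reproduced with a minimal sketch. The class and helper names are hypothetical, and the Turkish locale is built with Locale.forLanguageTag("tr-TR"), which is an assumption of this example rather than something the quoted documentation prescribes.

```java
import java.util.Locale;

public class TurkishLowerCaseDemo {
    // Hypothetical helper: lowercases with Turkish casing rules.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr-TR"));
    }

    // Hypothetical helper: lowercases with English casing rules.
    static String lowerEnglish(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // Under Turkish rules 'I' becomes U+0131 (dotless i).
        System.out.println(lowerTurkish("TITLE").equals("t\u0131tle")); // true
        // Under English rules 'I' becomes the ordinary 'i'.
        System.out.println(lowerEnglish("TITLE"));                      // title
    }
}
```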

I refer you to the links below as a solution to your problem; they make a point worth keeping in mind for your Turkish case.

**FROM THE LINKS**

toLowerCase() respects internationalization (i18n). It performs the case conversion according to your Locale. When you call toLowerCase(), it internally calls toLowerCase(Locale.getDefault()). Because it is locale-sensitive, you should not build logic on it for strings that are meant to be interpreted independently of locale.
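
A small sketch of that equivalence, with hypothetical class and method names: the no-argument call and the explicit call with Locale.getDefault() should give the same result whatever the default locale happens to be on the machine running it.

```java
import java.util.Locale;

public class DefaultLocaleDemo {
    // True when the no-argument call matches an explicit call with the default locale.
    static boolean matchesDefaultLocale(String s) {
        return s.toLowerCase().equals(s.toLowerCase(Locale.getDefault()));
    }

    public static void main(String[] args) {
        // The tag printed here depends on the environment.
        System.out.println("Default locale: " + Locale.getDefault().toLanguageTag());
        System.out.println(matchesDefaultLocale("TITLE")); // true
    }
}
```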

import java.util.Locale;

public class ToLocaleTest {
    public static void main(String[] args) throws Exception {
        Locale.setDefault(new Locale("lt")); // set Lithuanian as the default locale
        String str = "\u00cc"; // Ì (LATIN CAPITAL LETTER I WITH GRAVE)
        System.out.println("Before case conversion is " + str
                + " and length is " + str.length()); // length is 1
        String lowerCaseStr = str.toLowerCase();
        System.out.println("Lower case is " + lowerCaseStr
                + " and length is " + lowerCaseStr.length()); // i + U+0307 + U+0300, length is 3
    }
}

In the above program, look at the string length before and after conversion: it is 1 before and 3 after. Case conversion can change the length of a string, so any logic that assumes the length stays the same will break in this scenario. A program that works in one environment may fail when it runs in another. This is a nice thing to catch in code review.
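
The length change is not limited to Lithuanian. As a minimal sketch (hypothetical names, and Locale.ROOT chosen here as a neutral locale, which is an assumption of this example), the German sharp s expands to two letters when uppercased:

```java
import java.util.Locale;

public class CaseLengthDemo {
    // Hypothetical helper: uppercases with the neutral root locale.
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String word = "stra\u00dfe"; // "strasse" spelled with U+00DF (sharp s)
        String upper = upperRoot(word);
        System.out.println(word.length());  // 6
        System.out.println(upper);          // STRASSE
        System.out.println(upper.length()); // 7
    }
}
```

Code that sizes a buffer or computes offsets from the original string should therefore measure the converted string instead.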

To make it safer, you can use the overload toLowerCase(Locale.ENGLISH) and always override the locale to English. But then your code is no longer internationalized.
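
On the question of whether UK, US and ENGLISH differ for case conversion, here is a hedged sketch. It assumes, as far as I know, that the JDK applies language-specific casing rules only for a few languages such as Turkish, Azerbaijani and Lithuanian, so the English variants and the neutral Locale.ROOT should agree. The helper names and the sample input are hypothetical.

```java
import java.util.Locale;

public class NeutralLocaleDemo {
    // Hypothetical helper for locale-independent keys such as protocol names.
    static String normalizeKey(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    // True when lowercasing with the given locale matches the root-locale result.
    static boolean sameAsRoot(String input, Locale locale) {
        return input.toLowerCase(locale).equals(normalizeKey(input));
    }

    public static void main(String[] args) {
        String input = "CONTENT-TYPE: TEXT/HTML; TITLE";
        Locale[] candidates = { Locale.ENGLISH, Locale.US, Locale.UK };
        for (Locale locale : candidates) {
            // Each line is expected to end in "true".
            System.out.println(locale.toLanguageTag() + " -> " + sameAsRoot(input, locale));
        }
    }
}
```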

So the crux is, toLowerCase() is locale specific.

reference 1
reference 2
reference 3

Dotless i (ı) is a lowercase 'i' without a dot. In Turkish, the uppercase of this character is the usual "I". There is another character, "I with dot" (İ). The lowercase of that character is the usual lowercase "i".

Have you noticed the problem? This asymmetric conversion causes serious problems in programming. We face this problem mostly in Java applications because of the (IMHO) poor implementation of the toLowerCase and toUpperCase functions.
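
The asymmetry can be made concrete with a round trip: uppercase a string and lowercase it again. This is a minimal sketch with hypothetical helper names; it assumes Locale.ROOT as the non-Turkish locale and builds the Turkish one with Locale.forLanguageTag("tr-TR").

```java
import java.util.Locale;

public class DotlessIRoundTrip {
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Uppercase then lowercase using Turkish rules on both steps.
    static String roundTripTurkish(String s) {
        return s.toUpperCase(TURKISH).toLowerCase(TURKISH);
    }

    // Uppercase then lowercase using the neutral root locale on both steps.
    static String roundTripRoot(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String dotlessI = "\u0131"; // LATIN SMALL LETTER DOTLESS I
        // Turkish rules pair dotless i with I, so the round trip is lossless.
        System.out.println(roundTripTurkish(dotlessI).equals(dotlessI)); // true
        // Root rules uppercase dotless i to I, then lowercase I to the dotted i.
        System.out.println(roundTripRoot(dotlessI).equals("i"));         // true
    }
}
```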

In Java, the String.toLowerCase() method converts characters to lowercase according to the default locale. This causes problems if your application runs in a Turkish locale, especially if you use this method on a file name or a URL that must obey a certain character set.
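
To illustrate the file-name case, here is a hedged sketch of an extension check that breaks under a Turkish default locale, next to a variant that pins the locale. All names are hypothetical, and the helper that temporarily swaps the JVM default locale exists only to make the demonstration repeatable.

```java
import java.util.Locale;

public class FileExtensionDemo {
    // Fragile: the result depends on the JVM default locale.
    static boolean isGifFragile(String fileName) {
        return fileName.toLowerCase().endsWith(".gif");
    }

    // Locale-independent: always uses the neutral root locale.
    static boolean isGif(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".gif");
    }

    // Runs the fragile check under a temporary default locale, then restores the old one.
    static boolean isGifFragileUnder(String languageTag, String fileName) {
        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(Locale.forLanguageTag(languageTag));
            return isGifFragile(fileName);
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        System.out.println(isGifFragileUnder("tr-TR", "PHOTO.GIF")); // false
        System.out.println(isGifFragileUnder("en-US", "PHOTO.GIF")); // true
        System.out.println(isGif("PHOTO.GIF"));                      // true
    }
}
```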

I have blogged about two serious examples before: compile errors in script libraries with "i" in their names, and an XSP Manager fault when an XPage is in a database with "I" in its name.

There is a long history, as I said. For instance, in some R7 versions, the router was unable to send a message to a recipient whose name started with "I". Message reporting agents did not run in a Turkish locale until R8. Anyone with a Turkish locale could not install Lotus Notes 8.5.1 (it's real!). The list goes on...

There are almost no beta testers from Turkey, and customers don't open PMRs for these problems. So these problems never become a top priority for development teams.

Even the Java team has added a special warning to the latest documentation:

This method is locale sensitive, and may produce unexpected results if used for strings that are intended to be interpreted locale independently. Examples are programming language identifiers, protocol keys, and HTML tags. For instance, "TITLE".toLowerCase() in a Turkish locale returns "tıtle", where 'ı' is the LATIN SMALL LETTER DOTLESS I character. To obtain correct results for locale insensitive strings, use toLowerCase(Locale.ENGLISH).

Please read the links; I can't post all of it here. (This is a reply to your comment.)

#2


5  

String str = "CyBeRdRaGoN";

str = str.toLowerCase(); // str = "cyberdragon"

str = str.toUpperCase(); // str = "CYBERDRAGON"

Your application will use the default locale, so if someone runs it with a Turkish locale, lowercasing "I" will give them ı (i without a dot).

#3


2  

You can create an appropriate locale for your string's language.

For example:

toUpperCase(new Locale("tr","TR"));

will do the trick for Turkish.
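
As a minimal sketch of what that call does for Turkish text (hypothetical class and helper names; the locale is built with Locale.forLanguageTag("tr-TR") here, which should be equivalent to the constructor call above):

```java
import java.util.Locale;

public class TurkishUpperCaseDemo {
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Hypothetical helper: uppercases with Turkish casing rules.
    static String upperTurkish(String s) {
        return s.toUpperCase(TURKISH);
    }

    public static void main(String[] args) {
        // Dotted i uppercases to U+0130 (capital I with dot above).
        System.out.println(upperTurkish("istanbul").equals("\u0130STANBUL")); // true
        // Dotless i (U+0131) uppercases to the plain capital I.
        System.out.println(upperTurkish("\u0131s\u0131"));                    // ISI
    }
}
```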

#4


0  

If you are using this function to check a string (e.g. in a search), it is safe to compare the strings in lowercase or uppercase form, as long as both sides are converted with the same locale. You may use it like this:

if (mViewData.list.data[i].Name.toLowerCase(new Locale("tr", "TR"))
   .contains(mViewHolder.tctSearch.getText().toString().trim()
                                      .toLowerCase(new Locale("tr", "TR")))) {
    // your code here...
}

I faced the same issue, but when searching in a ListView. I added this answer in case it helps someone with the same issue.
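
The same idea can be sketched without the Android view classes. The names below are hypothetical, and the list of city names is sample data made up for illustration. Both the text and the query are lowercased with the same Turkish locale before matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TurkishSearchDemo {
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Case-insensitive substring test using Turkish casing rules on both sides.
    static boolean containsIgnoreCase(String text, String query) {
        return text.toLowerCase(TURKISH).contains(query.trim().toLowerCase(TURKISH));
    }

    // Returns the names that contain the query, ignoring case.
    static List<String> filter(List<String> names, String query) {
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (containsIgnoreCase(name, query)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample data: the first name starts with U+0130, the last with a plain I.
        List<String> names = Arrays.asList("\u0130stanbul", "Ankara", "Isparta");
        // "IS" lowercases to dotless i + s, which only matches the name starting with plain I.
        System.out.println(filter(names, "IS"));          // [Isparta]
        // "ist" matches only the name that starts with the dotted capital.
        System.out.println(filter(names, " ist ").size()); // 1
    }
}
```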
