Isn't alphabetical order a left-to-right comparison?

Time: 2022-08-22 11:50:11

I thought that in .NET, strings were compared alphabetically, from left to right.

using System;

string[] strings = { "-1", "1", "1Foo", "-1Foo" };
Array.Sort(strings);
Console.WriteLine(string.Join(",", strings));

I'd expect this (or the two strings with a leading minus sign first):

1,1Foo,-1,-1Foo

But the result is:

1,-1,1Foo,-1Foo

It seems to be a mixture: either the minus sign is ignored, or further characters are compared even though the first characters already differ.

Edit: I've now tested OrdinalIgnoreCase and I get the expected order:

Array.Sort(strings, StringComparer.OrdinalIgnoreCase);

But even if I use InvariantCultureIgnoreCase, I get the unexpected order.
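To see the comparers side by side, here is a minimal sketch (assuming C# 9+ top-level statements) that sorts the question's array with several built-in StringComparer instances. Only the two ordinal orders are fixed by the string contents; the culture-sensitive ones depend on the current culture and on the globalization backend (NLS on .NET Framework/Windows, ICU on newer .NET on Linux), so no output is claimed for them.

```csharp
using System;
using System.Collections.Generic;

string[] input = { "-1", "1", "1Foo", "-1Foo" };

// Built-in comparers to try, each paired with a label for printing.
var comparers = new (string Name, StringComparer Comparer)[]
{
    ("CurrentCulture", StringComparer.CurrentCulture),
    ("InvariantCultureIgnoreCase", StringComparer.InvariantCultureIgnoreCase),
    ("Ordinal", StringComparer.Ordinal),
    ("OrdinalIgnoreCase", StringComparer.OrdinalIgnoreCase),
};

var results = new Dictionary<string, string>();
foreach (var (name, comparer) in comparers)
{
    // Sort a fresh copy so every comparer starts from the same input order.
    var copy = (string[])input.Clone();
    Array.Sort(copy, comparer);
    results[name] = string.Join(",", copy);
    Console.WriteLine($"{name,-28} {results[name]}");
}
```

Ordinal and OrdinalIgnoreCase should both print -1,-1Foo,1,1Foo, because '-' (U+002D) has a smaller code unit than '1' (U+0031), and a string that is a prefix of another sorts first.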

2 answers

#1


Jon Skeet to the rescue here

Specifically:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.
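The three sorts can be tried on the quote's own "coop"/"co-op" pair. This is a minimal sketch assuming that CompareOptions.None is what the quote calls word sort and CompareOptions.StringSort is string sort; the invariant culture is an arbitrary choice. Only the ordinal sign is fully determined by the code units; the other two depend on the globalization backend, so their values are printed but not claimed.

```csharp
using System;
using System.Globalization;

const string a = "co-op";
const string b = "coop";
CultureInfo culture = CultureInfo.InvariantCulture;

// Word sort: the default culture-sensitive comparison (CompareOptions.None).
int word = string.Compare(a, b, culture, CompareOptions.None);
// String sort: culture-sensitive, but symbols get no special weighting.
int stringSort = string.Compare(a, b, culture, CompareOptions.StringSort);
// Ordinal sort: raw UTF-16 code units; this option must be used on its own.
int ordinal = string.Compare(a, b, culture, CompareOptions.Ordinal);

Console.WriteLine($"word sort:    {Math.Sign(word)}");
Console.WriteLine($"string sort:  {Math.Sign(stringSort)}");
Console.WriteLine($"ordinal sort: {Math.Sign(ordinal)}");
```

The ordinal line should be -1: at index 2, '-' (U+002D) is compared with 'o' (U+006F).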

But passing StringComparer.Ordinal makes it behave as you want:

using System;

string[] strings = { "-1", "1", "10", "-10", "a", "ba", "-a" };
Array.Sort(strings, StringComparer.Ordinal);
Console.WriteLine(string.Join(",", strings));
// prints: -1,-10,-a,1,10,a,ba

Edit: about Ordinal, quoting from the MSDN page for the CompareOptions enumeration:

Ordinal: Indicates that the string comparison must use successive Unicode UTF-16 encoded values of the string (code unit by code unit comparison), leading to a fast comparison but one that is culture-insensitive. A string starting with a code unit XXXX₁₆ comes before a string starting with YYYY₁₆, if XXXX₁₆ is less than YYYY₁₆. This value cannot be combined with other CompareOptions values and must be used alone.
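The ordinal rule in the quote is simple enough to write out by hand. Below, OrdinalCompare is a hypothetical helper (not a framework API) that walks both strings one UTF-16 code unit at a time, checked against the real string.CompareOrdinal. Only the signs are compared, because string.CompareOrdinal is documented to return "less than zero" or "greater than zero", not exactly -1 or 1.

```csharp
using System;

// Hypothetical helper: compares two strings one UTF-16 code unit at a time,
// following the CompareOptions.Ordinal description.
static int OrdinalCompare(string x, string y)
{
    int n = Math.Min(x.Length, y.Length);
    for (int i = 0; i < n; i++)
    {
        // A C# char is a UTF-16 code unit, so this compares numeric values.
        if (x[i] != y[i])
            return x[i] < y[i] ? -1 : 1;
    }
    // All shared code units are equal: the shorter string sorts first.
    return x.Length.CompareTo(y.Length);
}

string[] samples = { "-1", "1", "1Foo", "-1Foo", "a", "ba", "-a", "" };
int mismatches = 0;
foreach (var x in samples)
    foreach (var y in samples)
        if (Math.Sign(OrdinalCompare(x, y)) != Math.Sign(string.CompareOrdinal(x, y)))
            mismatches++;

Console.WriteLine($"mismatches vs string.CompareOrdinal: {mismatches}");
```

No special case is needed for '-': it is just U+002D, which is smaller than every digit and letter, which is why it sorts first under Ordinal.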

There is also String.CompareOrdinal if you want an ordinal comparison of two strings.

Here's another note of interest:

When possible, the application should use string comparison methods that accept a CompareOptions value to specify the kind of comparison expected. As a general rule, user-facing comparisons are best served by the use of linguistic options (using the current culture), while security comparisons should specify Ordinal or OrdinalIgnoreCase.
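A minimal sketch of that advice, using a hypothetical IsHttpsScheme check as the security-style case: the protocol name is matched with StringComparison.OrdinalIgnoreCase, which gives the same answer on every machine, while a list meant for display is sorted with StringComparer.CurrentCulture. The display order depends on the machine's culture, so it is not shown.

```csharp
using System;

// Hypothetical check: is a URL scheme "https", regardless of letter case?
// OrdinalIgnoreCase does not depend on the current culture.
static bool IsHttpsScheme(string scheme) =>
    string.Equals(scheme, "https", StringComparison.OrdinalIgnoreCase);

// Text shown to users: a culture-aware sort is usually what people expect.
string[] names = { "zebra", "Apple", "apple" };
Array.Sort(names, StringComparer.CurrentCulture);

Console.WriteLine(IsHttpsScheme("HTTPS"));
Console.WriteLine(IsHttpsScheme("http"));
Console.WriteLine(string.Join(",", names));
```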

I guess we humans expect ordinal order when dealing with strings :)

#2


There is a small note in the String.CompareTo method documentation:

Notes to Callers:

Character sets include ignorable characters. The CompareTo(String) method does not consider such characters when it performs a culture-sensitive comparison. For example, if the following code is run on the .NET Framework 4 or later, a comparison of "animal" with "ani-mal" (using a soft hyphen, or U+00AD) indicates that the two strings are equivalent.

And a little later it states:

To recognize ignorable characters in a string comparison, call the CompareOrdinal(String, String) method.
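The quote refers to a code sample that is not reproduced here; the following is a sketch in the same spirit, not the documentation's own code. It compares "animal" with "ani\u00ADmal" both ways. Per the quote, the culture-sensitive CompareTo returns 0 on .NET Framework 4 or later; newer .NET versions, ICU-based platforms, or invariant globalization mode may behave differently, so that value is printed but not claimed. The ordinal result is deterministic.

```csharp
using System;

string plain = "animal";
string softHyphen = "ani\u00ADmal"; // U+00AD SOFT HYPHEN between "ani" and "mal"

// Culture-sensitive comparison using the current culture; the quoted docs
// say this ignores U+00AD on .NET Framework 4 or later.
int linguistic = plain.CompareTo(softHyphen);

// Ordinal comparison sees every code unit: 'm' (U+006D) vs U+00AD at index 3.
int ordinal = string.CompareOrdinal(plain, softHyphen);

Console.WriteLine($"CompareTo:      {linguistic}");
Console.WriteLine($"CompareOrdinal: {Math.Sign(ordinal)}");
```

CompareOrdinal should report the plain string as smaller (-1 after Math.Sign), since U+006D is less than U+00AD.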

These two statements seem to be consistent with the results you are seeing.
