What is the difference between utf8_general_ci and utf8_unicode_ci?

Date: 2021-11-26 20:17:17

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance?

5 answers

#1


1212  

These two collations are both for the UTF-8 character encoding. The differences are in how text is sorted and compared.

Note: Since MySQL 5.5.3 you should use utf8mb4 rather than utf8. They both refer to the UTF-8 encoding, but the older utf8 has a MySQL-specific limitation: it stores at most three bytes per character, so it cannot represent characters outside the Basic Multilingual Plane (code points above U+FFFF).
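
As a rough sketch of what moving to utf8mb4 can look like: the statements below assume a hypothetical table named articles and MySQL 5.5.3 or later. CONVERT TO CHARACTER SET rewrites every character column in the table, and indexed columns may run into index length limits afterwards, so it is worth trying on a copy first.

```sql
-- Hypothetical table name, used only for illustration.
-- See which character set and collation each column currently uses.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'articles';

-- Convert every character column in the table to utf8mb4
-- with the Unicode collation.
ALTER TABLE articles
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Make the connection use utf8mb4 as well, so 4-byte characters
-- are not lost between client and server.
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;
```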

  • Accuracy

    utf8mb4_unicode_ci is based on the Unicode standard for sorting and comparison, which sorts accurately in a very wide range of languages.

    utf8mb4_general_ci fails to implement all of the Unicode sorting rules, which will result in undesirable sorting in some situations, such as when using particular languages or characters.

  • Performance

    utf8mb4_general_ci is faster at comparisons and sorting, because it takes a bunch of performance-related shortcuts.

    On modern servers, this performance boost will be all but negligible. It was devised in a time when servers had a tiny fraction of the CPU performance of today's computers.

    utf8mb4_unicode_ci, which uses the Unicode rules for sorting and comparison, employs a fairly complex algorithm for correct sorting in a wide range of languages and when using a wide range of special characters. These rules need to take into account language-specific conventions; not everybody sorts their characters in what we would call 'alphabetical order'.

As far as Latin (i.e. "European") languages go, there is not much difference between the Unicode sorting and the simplified utf8mb4_general_ci sorting in MySQL, but there are still a few differences:

  • For example, the Unicode collation sorts "ß" like "ss", and "Œ" like "OE", as people using those characters would normally want, whereas utf8mb4_general_ci sorts them as single characters (presumably like "s" and "e" respectively).

  • Some Unicode characters are defined as ignorable, which means they shouldn't count toward the sort order and the comparison should move on to the next character instead. utf8mb4_unicode_ci handles these properly.
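
The "ß" case can be checked directly. This sketch assumes the client connection sends UTF-8 and that both collations are available on the server. Going by the MySQL manual's description, the first and third comparisons should be true and the second false, because utf8mb4_unicode_ci expands "ß" to "ss" while utf8mb4_general_ci only compares character by character.

```sql
-- Assumes the client sends its string literals as UTF-8.
SET NAMES utf8mb4;

-- COLLATE binds to the literal on its right and decides
-- which collation the whole comparison uses.
SELECT
  'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_eq_ss,
  'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_eq_ss,
  'ß' = 's'  COLLATE utf8mb4_general_ci AS general_eq_s;
```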

In non-Latin languages, such as Asian languages or languages with different alphabets, there may be a lot more differences between Unicode sorting and the simplified utf8mb4_general_ci sorting. The suitability of utf8mb4_general_ci will depend heavily on the language used. For some languages, it'll be quite inadequate.

What should you use?

There is almost certainly no reason to use utf8mb4_general_ci anymore, as we have left behind the point where CPU speed is low enough that the performance difference would be important. Your database will almost certainly be limited by other bottlenecks than this.

The difference in performance is only going to be measurable in extremely specialised situations, and if that's you, you probably already know about it. If you're experiencing slow sorting, in almost all cases it'll be an issue with your indexes/query plan. Changing your collation function should not be high on the list of things to troubleshoot.
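
To check whether a slow sort comes from the query plan rather than the collation, EXPLAIN is the usual starting point. The sketch below uses a hypothetical customers table and index name. "Using filesort" in the Extra column means the rows are sorted after retrieval instead of being read in index order.

```sql
-- Hypothetical table, column, and index names, for illustration only.
EXPLAIN SELECT id, last_name
FROM customers
ORDER BY last_name
LIMIT 50;

-- If Extra shows "Using filesort", an index on the sort column
-- may let MySQL read the rows already in order.
CREATE INDEX idx_customers_last_name ON customers (last_name);

-- Re-check the plan after adding the index.
EXPLAIN SELECT id, last_name
FROM customers
ORDER BY last_name
LIMIT 50;
```

An index is ordered by the column's own collation, so an ORDER BY that names a different collation with an explicit COLLATE clause generally cannot use it.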

In the past, some people recommended using utf8mb4_general_ci except when accurate sorting was going to be important enough to justify the performance cost. Today, that performance cost has all but disappeared, and developers are treating internationalization more seriously.

One other thing I'll add is that even if you know your application only supports the English language, it may still need to deal with people's names, which can often contain characters used in other languages in which it is just as important to sort correctly. Using the Unicode rules for everything helps add peace of mind that the very smart Unicode people have worked very hard to make sorting work properly.

#2


121  

I wanted to know the performance difference between utf8_general_ci and utf8_unicode_ci, but I could not find any benchmarks on the Internet, so I decided to run some myself.

I created a very simple table with 500000 rows:

CREATE TABLE test(
  ID INT(11) DEFAULT NULL,
  Description VARCHAR(20) DEFAULT NULL
)
ENGINE = INNODB
CHARACTER SET utf8
COLLATE utf8_general_ci;

Then I filled it with random data by running this stored procedure:

CREATE PROCEDURE randomizer()
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE random CHAR(20) ;

  theloop: loop
    SET random = CONV(FLOOR(RAND() * 99999999999999), 20, 36);

    INSERT INTO test VALUES (i+1, random);

    SET i=i+1;

    IF i = 500000 THEN
      LEAVE theloop;
    END IF;

  END LOOP theloop;
END
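
One way to run the procedure, sketched under the assumption that autocommit is on (the MySQL default) and the table starts out empty: wrapping the call in a single transaction avoids committing each of the 500000 inserts separately, which may shorten the load considerably. When creating the procedure from the mysql command-line client, the statement delimiter also has to be changed temporarily with DELIMITER so that the semicolons inside the body do not end the CREATE statement early.

```sql
-- Load the test data inside one transaction
-- instead of 500000 separate commits.
START TRANSACTION;
CALL randomizer();
COMMIT;

-- Sanity check: an initially empty table should now hold 500000 rows.
SELECT COUNT(*) AS row_count FROM test;
```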

Then I created the following stored procedures to benchmark simple SELECT, SELECT with LIKE, and sorting (SELECT with ORDER BY):

CREATE PROCEDURE benchmark_simple_select()
BEGIN
  DECLARE i INT DEFAULT 0;

  -- Run the same equality lookup 30 times.
  theloop: loop

    SELECT * FROM test WHERE Description = 'test' COLLATE utf8_general_ci;

    SET i = i + 1;

    IF i = 30 THEN
      LEAVE theloop;
    END IF;

  END LOOP theloop;

END

CREATE PROCEDURE benchmark_select_like()
BEGIN
  DECLARE i INT DEFAULT 0;

  theloop: loop

    SELECT * FROM test WHERE Description LIKE '%test' COLLATE utf8_general_ci;

    SET i = i + 1;

    IF i = 30 THEN
      LEAVE theloop;
    END IF;

  END LOOP theloop;

END

CREATE PROCEDURE benchmark_order_by()
BEGIN
  DECLARE i INT DEFAULT 0;

  theloop: loop

    SELECT * FROM test WHERE ID > FLOOR(1 + RAND() * (400000 - 1)) ORDER BY Description COLLATE utf8_general_ci LIMIT 1000;

    SET i = i + 1;

    IF i = 10 THEN
      LEAVE theloop;
    END IF;

  END LOOP theloop;

END

The stored procedures above use the utf8_general_ci collation, but during the tests I ran them with both utf8_general_ci and utf8_unicode_ci.

I called each stored procedure 5 times for each collation (5 times for utf8_general_ci and 5 times for utf8_unicode_ci) and then calculated the average values.
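
The answer does not say how the timings were taken. One possible way to time a call from a SQL session, assuming MySQL 5.6.4 or later for fractional-second support, is to record a timestamp before the call and compare it with the time afterwards. The variable name is made up for this sketch, and the measured time includes sending the result sets to the client.

```sql
-- Record the start time with microsecond precision.
SET @started = NOW(6);

CALL benchmark_simple_select();

-- Elapsed wall-clock time in milliseconds.
SELECT TIMESTAMPDIFF(MICROSECOND, @started, NOW(6)) / 1000 AS elapsed_ms;
```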

My results are:

benchmark_simple_select() with utf8_general_ci: 9957 ms
benchmark_simple_select() with utf8_unicode_ci: 10271 ms
In this benchmark, utf8_unicode_ci is slower than utf8_general_ci by 3.2%.

benchmark_select_like() with utf8_general_ci: 11441 ms
benchmark_select_like() with utf8_unicode_ci: 12811 ms
In this benchmark, utf8_unicode_ci is slower than utf8_general_ci by 12%.

benchmark_order_by() with utf8_general_ci: 11944 ms
benchmark_order_by() with utf8_unicode_ci: 12887 ms
In this benchmark, utf8_unicode_ci is slower than utf8_general_ci by 7.9%.
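
For reference, the percentages follow from the averaged timings as (unicode - general) / general. A small query reproduces the arithmetic and should come out at roughly 3.2, 12.0 and 7.9; the column aliases are made up for this sketch.

```sql
-- Relative slowdown of utf8_unicode_ci versus utf8_general_ci, in percent,
-- computed from the averaged timings quoted above.
SELECT
  ROUND(100 * (10271 - 9957)  / 9957,  1) AS simple_select_pct,
  ROUND(100 * (12811 - 11441) / 11441, 1) AS select_like_pct,
  ROUND(100 * (12887 - 11944) / 11944, 1) AS order_by_pct;
```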

#3


33  

This post describes it very nicely.

In short: utf8_unicode_ci uses the Unicode Collation Algorithm as defined in the Unicode standards, whereas utf8_general_ci is a simpler sort order which results in "less accurate" sorting results.

#4


4  

See the MySQL manual, Unicode Character Sets section:

For any Unicode character set, operations performed using the _general_ci collation are faster than those for the _unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages “ß” is equal to “ss”. utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.

So to summarize, utf8_general_ci uses a smaller and less correct (according to the standard) set of comparisons than utf8_unicode_ci, which should implement the entire standard. The general_ci set will be faster because there is less computation to do.

#5


3  

In brief:

If you need better sorting order - use utf8_unicode_ci (this is the preferred method),

but if you are primarily interested in performance - use utf8_general_ci, but know that it is a little outdated.

The differences in terms of performance are very slight.
