Using a search function with Greek characters in MySQL

Date: 2022-09-13 09:36:00

I have a search function for Ancient Greek words (Wearch.nl).
Ancient Greek words carry many accents: ῦ is not the same as ὐ, but when you type a "u" I want to get the results for ῦ and ὐ (and the other 5 variations) as well. I am using MySQL's LIKE operator to get the results.
I could search for every variant explicitly, but I hope there is a shorter and faster way.

1 answer

#1


If you are able to change the character set and collation of your column (or table), convert it to the utf8 character set with the utf8_general_ci collation (see the MySQL manual):

ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

With this collation (which is case insensitive, as denoted by _ci), accented characters have the same weight (the value used for collation) as their unaccented base letters, so they compare as equal. The MySQL manual explains:

Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.

mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
|         1 |         1 |         1 |
+-----------+-----------+-----------+
1 row in set (0.06 sec)
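
To make the idea of a "weight" concrete, here is a toy model in Python. It is not MySQL's actual weight table: it assumes, for illustration only, that the weight of a letter is the code point of its uppercase, unaccented base letter. The names toy_weight and toy_equal are made up for this sketch.

```python
import unicodedata


def toy_weight(ch: str) -> int:
    """Toy weight: code point of the uppercase, unaccented base letter."""
    # NFD puts the base letter first, followed by any combining accents.
    base = unicodedata.normalize("NFD", ch)[0]
    # Uppercasing removes the case distinction; [0] guards against the rare
    # letters whose uppercase form is more than one character.
    return ord(base.upper()[0])


def toy_equal(left: str, right: str) -> bool:
    """Compare two strings by their per-character toy weights."""
    return [toy_weight(c) for c in left] == [toy_weight(c) for c in right]


# 'a', 'A', 'À' (U+00C0) and 'á' (U+00E1) all get the same toy weight.
print([hex(toy_weight(c)) for c in "aA\u00c0\u00e1"])
# ῦ (U+1FE6) and ὐ (U+1F50) share the base letter υ, so they compare equal here.
print(toy_equal("\u1fe6", "\u1f50"))
```

Whether a given MySQL collation really treats a particular pair of Greek characters as equal is best checked with a SELECT like the one above, using the characters you care about.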

Alternatively, if you cannot alter the database configuration in this way, you could write a function that replaces accented characters with their unaccented equivalents (e.g. é -> e) and store the result in a dedicated search field (a full-text search field is recommended). Perform searches on this field and return the accented field to the application.
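
A minimal sketch of that second approach, assuming the function lives in the application and is written in Python. The in-memory SQLite database only stands in for the real MySQL table so the example runs on its own, and the names search_key, word_search, build_demo_db and find_words are hypothetical. It uses a plain LIKE prefix match rather than a full-text index, and it does not map a Latin "u" to the Greek υ; that would be a separate transliteration step.

```python
import sqlite3
import unicodedata


def search_key(text: str) -> str:
    """Return an accent-free, case-folded key for matching (not for display)."""
    # NFD splits each precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (category Mn): accents, breathings, iota subscript.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() lowercases and also maps final sigma (ς) to σ.
    return stripped.casefold()


def build_demo_db(words):
    """Create an in-memory table holding each word plus its search key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT, word_search TEXT)"
    )
    conn.executemany(
        "INSERT INTO words (word, word_search) VALUES (?, ?)",
        [(w, search_key(w)) for w in words],
    )
    return conn


def find_words(conn, query):
    """Prefix-search the key column and return the original accented words."""
    # Note: % and _ inside the query are not escaped in this sketch.
    rows = conn.execute(
        "SELECT word FROM words WHERE word_search LIKE ? ORDER BY id",
        (search_key(query) + "%",),
    )
    return [row[0] for row in rows]


conn = build_demo_db(["ὕδωρ", "ὑπέρ", "λόγος"])
# Typing a bare υ finds both words that start with an accented upsilon.
print(find_words(conn, "υ"))
```

The key is computed once when a word is stored, so each search is a single comparison against word_search instead of one LIKE per accent variant.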
