Searching for a "whole word match" in MySQL

Time: 2022-09-19 19:15:12

I would like to write an SQL query that searches for a keyword in a text field, but only if it is a "whole word match" (e.g. when I search for "rid", it should not match "arid", but it should match "a rid").

I am using MySQL.

Fortunately, performance is not critical in this application, and the database size and string size are both comfortably small, but I would prefer to do it in SQL rather than in the PHP driving it.

6 answers

#1


131  

You can use REGEXP and the [[:<:]] and [[:>:]] word-boundary markers:

SELECT *
FROM table 
WHERE keywords REGEXP '[[:<:]]rid[[:>:]]'
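
A minimal sketch of how the pattern above might be assembled in application code, assuming it is then passed to MySQL as a bound parameter. The helper name `whole_word_pattern` and its `icu` flag are hypothetical, not part of the answer. The flag is there because MySQL 8.0.4 and later use the ICU regex library, where `[[:<:]]` and `[[:>:]]` are not supported and `\b` is used instead.

```python
import re

# Characters that are special in both POSIX ERE and ICU regex syntax.
_META = re.compile(r'([.\[\]{}()*+?^$|\\])')


def whole_word_pattern(keyword: str, icu: bool = False) -> str:
    """Build a whole-word pattern for MySQL REGEXP (hypothetical helper)."""
    # Put a backslash in front of every metacharacter in the keyword.
    escaped = _META.sub(r'\\\1', keyword)
    if icu:
        # ICU-based servers: \b marks a word boundary.
        return r'\b' + escaped + r'\b'
    # Older servers: the bracketed boundary markers from the answer.
    return '[[:<:]]' + escaped + '[[:>:]]'


print(whole_word_pattern('rid'))
print(whole_word_pattern('rid', icu=True))
```

If the pattern is pasted into an SQL string literal instead of being bound as a parameter, the backslashes would need to be doubled.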

#2


24  

Found an answer that prevents the classic word boundary [[:<:]] from clashing with special characters, e.g. .@#$%^&*

Replace this:

SELECT *
FROM table 
WHERE keywords REGEXP '[[:<:]]rid[[:>:]]'

with this:

SELECT *
FROM table 
WHERE keywords REGEXP '([[:blank:][:punct:]]|^)rid([[:blank:][:punct:]]|$)'

The latter matches a blank (space or tab), a punctuation character (comma, bracket, etc.), or the start/end of the line: a more complete word-boundary match.
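
One case where the difference can matter is a keyword that itself ends in a non-word character. The sketch below uses Python's `re` as a stand-in for MySQL's regex engine (an assumption; Python has no POSIX classes, so `[[:blank:][:punct:]]` is spelled out by hand). The function names are hypothetical.

```python
import re
import string

# Stand-in for [[:blank:][:punct:]]: space, tab, and ASCII punctuation.
DELIM = '[ \\t' + re.escape(string.punctuation) + ']'


def boundary_match(keyword: str, text: str) -> bool:
    """Classic word-boundary match, similar in spirit to [[:<:]]...[[:>:]]."""
    return re.search(r'\b' + re.escape(keyword) + r'\b', text) is not None


def delimiter_match(keyword: str, text: str) -> bool:
    """Match the keyword between delimiters or the ends of the string."""
    pattern = '(?:^|' + DELIM + ')' + re.escape(keyword) + '(?:' + DELIM + '|$)'
    return re.search(pattern, text) is not None


# "c++" ends in a non-word character, so no word boundary follows it.
print(boundary_match('c++', 'I like c++ a lot'))
print(delimiter_match('c++', 'I like c++ a lot'))
print(delimiter_match('rid', 'arid'))
```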

#3


2  

SELECT *
FROM table
WHERE keywords LIKE 'rid %'
   OR keywords LIKE '% rid'
   OR keywords LIKE '% rid %'
   OR keywords =    'rid'
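
The four conditions can be tried out quickly with an in-memory SQLite database, used here only as a convenient stand-in for MySQL (an assumption; LIKE behaves the same way for this ASCII example). The table name `docs`, the sample rows, and the helper `like_whole_word` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE docs (keywords TEXT)')
rows = ['rid', 'a rid', 'arid', 'rid of it', 'get rid now', 'riddle', 'rid.']
conn.executemany('INSERT INTO docs VALUES (?)', [(r,) for r in rows])


def like_whole_word(conn, word):
    """Return rows where word appears as a space-delimited whole word."""
    sql = """
        SELECT keywords FROM docs
        WHERE keywords LIKE ? OR keywords LIKE ?
           OR keywords LIKE ? OR keywords = ?
        ORDER BY rowid
    """
    # Word at the start, at the end, in the middle, or the whole value.
    params = (word + ' %', '% ' + word, '% ' + word + ' %', word)
    return [row[0] for row in conn.execute(sql, params)]


matches = like_whole_word(conn, 'rid')
print(matches)
```

Note that 'rid.' is not returned: only spaces count as delimiters here, so punctuation would need further LIKE conditions.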

#4


1  

select * from table where Locate('rid ', FieldToSearch) > 0 
      or Locate(' rid', FieldToSearch) > 0

This finds rid where it is preceded or followed by a space. You could extend the approach to take account of .,?! and so on; not elegant, but easy.
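
A short Python sketch of what the two conditions accept, using plain substring tests as a stand-in for LOCATE (an assumption; LOCATE may also be case-insensitive depending on collation). It suggests that 'rid ' alone also matches inside "arid land", and that a bare "rid" is missed. One common variation, shown as `padded_match`, pads both sides with a space first; in MySQL that would correspond to something like LOCATE(' rid ', CONCAT(' ', FieldToSearch, ' ')) > 0. Both function names are hypothetical.

```python
def locate_match(word: str, text: str) -> bool:
    """Mirror of the two LOCATE conditions: 'word ' or ' word' occurs."""
    return (word + ' ') in text or (' ' + word) in text


def padded_match(word: str, text: str) -> bool:
    """Pad the text with spaces, then look for ' word ' as a whole."""
    return (' ' + word + ' ') in (' ' + text + ' ')


for sample in ['rid', 'a rid', 'arid land', 'a riddle']:
    print(sample, locate_match('rid', sample), padded_match('rid', sample))
```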

#5


1  

This is the best answer I've come up with myself so far:

SELECT * FROM table 
WHERE keywords REGEXP '^rid[ $]' OR keywords REGEXP ' rid[ $]'

I would have simplified it to:

SELECT *
FROM table
WHERE keywords REGEXP '[^ ]rid[ $]'

but [^ ] has the special meaning "NOT a space", rather than "line beginning or space". (Likewise, $ inside brackets is a literal dollar sign, not "end of line".)
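
The bracket behaviour can be checked with Python's `re`, which treats `^` and `$` inside a bracket expression the same way POSIX regexes do. This is only a sketch with made-up sample strings; the alternation form `(^| )rid( |$)` is one way to say "start or space, then rid, then space or end".

```python
import re

texts = ['rid', 'a rid', 'arid', 'rid of it', 'rid$']

# Inside brackets, $ is a literal dollar sign, not an end-of-line anchor.
bracket = re.compile(r'^rid[ $]')
# Alternation keeps ^ and $ as anchors.
alternation = re.compile(r'(^| )rid( |$)')

bracket_hits = [t for t in texts if bracket.search(t)]
alternation_hits = [t for t in texts if alternation.search(t)]

print(bracket_hits)
print(alternation_hits)
```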

How does REGEXP compare to multiple LIKE conditions? (Not that performance matters in this app.)

#6


1  

Use REGEXP with word boundaries. If you also want an accent-insensitive search, note that REGEXP is a single-byte operator, so having a utf8_general_ci collation is of no help: the match will not be accent-insensitive.

To get both accent-insensitive and whole-word matching, write the search word the way the (deprecated) PHP function sql_regcase() did.

In fact:

  • utf8_general_ci lets you run a case- and accent-insensitive equality search (WHERE field = value), but it does not let you require a whole-word match (word-boundary markers are not recognized).

  • LIKE gives you a case- and accent-insensitive search, but you have to spell out every combination of possible word-boundary characters by hand (word-boundary markers are not recognized).

  • The word boundaries [[:<:]] and [[:>:]] are supported in REGEXP, which is a single-byte function and so does not perform an accent-insensitive search.

The solution is to use REGEXP with word boundaries and the word modified in the way sql_regcase does.
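
A rough Python analogue of what sql_regcase() produced, as a sketch only: each ASCII letter becomes a bracket expression holding its upper- and lower-case forms. The function names are hypothetical, and accent variants are not generated here; they would have to be added by hand for the accent-insensitive case the answer describes.

```python
def regcase(word: str) -> str:
    """Rough analogue of PHP's deprecated sql_regcase(), ASCII letters only."""
    parts = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            # e.g. 'r' -> '[Rr]'
            parts.append('[' + ch.upper() + ch.lower() + ']')
        else:
            # Digits, spaces, punctuation, non-ASCII: left unchanged.
            parts.append(ch)
    return ''.join(parts)


def whole_word_regcase(word: str) -> str:
    """Wrap the expanded word in the word-boundary markers."""
    return '[[:<:]]' + regcase(word) + '[[:>:]]'


print(whole_word_regcase('rid'))
```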

Used on http://www.genovaperte.it
