Which is faster: INSTR or LIKE?

Date: 2021-09-30 13:33:51

If your goal is to test whether a string exists in a MySQL column (of type 'varchar', 'text', 'blob', etc.), which of the following is faster / more efficient / better to use, and why?

Or is there some other method that beats both of these?

INSTR( columnname, 'mystring' ) > 0

vs

columnname LIKE '%mystring%'

4 answers

#1


45  

FULLTEXT searches are absolutely going to be faster, as kibibu noted in the comments above.

However:

mysql> select COUNT(ID) FROM table WHERE INSTR(Name,'search') > 0;
+-----------+
| COUNT(ID) |
+-----------+
|     40735 | 
+-----------+
1 row in set (5.54 sec)

mysql> select COUNT(ID) FROM table WHERE Name LIKE '%search%';
+-----------+
| COUNT(ID) |
+-----------+
|     40735 | 
+-----------+
1 row in set (5.54 sec)

In my tests, they perform exactly the same. Both are case-insensitive (under MySQL's default case-insensitive collations for non-binary strings), and both generally require a full-table scan, a general no-no when dealing with high-performance MySQL.

The exception is a prefix search on an indexed column:

mysql> select COUNT(ID) FROM table WHERE Name LIKE 'search%';
+-----------+
| COUNT(ID) |
+-----------+
|         7 | 
+-----------+
1 row in set (3.88 sec)

In that case, LIKE with only a trailing (suffix) wildcard can use the index and is much faster.

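One way to check which of these predicates can use an index is to compare their EXPLAIN output. The sketch below is only an illustration: the table people, its columns, and the index name idx_name are hypothetical and do not come from the original post.

```sql
-- Hypothetical table for illustration; names are not from the original post.
CREATE TABLE people (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  KEY idx_name (name)
);

-- Leading wildcard: the B-tree index cannot narrow the search,
-- so every row (or every index entry) has to be examined.
EXPLAIN SELECT COUNT(id) FROM people WHERE name LIKE '%search%';

-- INSTR() wraps the column in a function call, so the optimizer
-- likewise cannot turn it into an index range lookup.
EXPLAIN SELECT COUNT(id) FROM people WHERE INSTR(name, 'search') > 0;

-- Trailing wildcard only: eligible for a range scan on idx_name.
EXPLAIN SELECT COUNT(id) FROM people WHERE name LIKE 'search%';
```

In the EXPLAIN output, the type column typically reads range for the last query and ALL or index (a full scan) for the first two. The exact plan depends on the MySQL version, the storage engine, and the table statistics.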

#2


11  

MySQL - INSTR vs LOCATE vs LIKE vs REGEXP

For me, INSTR and LOCATE performed the fastest:

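# Each statement evaluates the expression 100,000,000 times on string literals,
# so it measures function cost only, not table or index access.

# 5.074 sec
SELECT BENCHMARK(100000000,INSTR('foobar','foo'));

# 5.086 sec (note the argument order: LOCATE(substr, str))
SELECT BENCHMARK(100000000,LOCATE('foo','foobar'));

# 8.990 sec
SELECT BENCHMARK(100000000,'foobar' LIKE '%foo%');

# 14.433 sec
SELECT BENCHMARK(100000000,'foobar' REGEXP 'foo');

# Server version: 5.5.35-0ubuntu0.12.10.2
SELECT @@version;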

#3


10  

In the case of a "front wildcard" (i.e. a "LIKE '%...'" predicate), as seems to be the case here, INSTR and LIKE should perform roughly the same.

When the wildcard is not a "front wildcard", the LIKE approach should be faster, unless the pattern is not very selective (i.e. it matches a large share of the rows).

The type of wildcard and its selectivity matter because a predicate with INSTR() will systematically result in a table scan (the SQL engine cannot make any assumptions about the semantics of INSTR), whereas it can leverage its understanding of the semantics of the LIKE predicate to possibly use an index, so that it only has to test a reduced set of possible matches.

As suggested in a comment under the question itself, a full-text index will be much faster. The difference depends on the specific distribution of words within the text, the overall table size, and so on, but expect anything from twice as fast to perhaps 10 times as fast.

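As a rough sketch of what the full-text approach looks like in MySQL, the statements below create a FULLTEXT index and query it with MATCH ... AGAINST. The table articles, the column body, and the index name ft_body are hypothetical names chosen for illustration.

```sql
-- Hypothetical table; InnoDB supports FULLTEXT indexes from MySQL 5.6 on
-- (earlier versions need MyISAM).
CREATE TABLE articles (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body TEXT NOT NULL,
  FULLTEXT KEY ft_body (body)
);

-- Word-based search served by the FULLTEXT index.
SELECT COUNT(id)
FROM articles
WHERE MATCH(body) AGAINST('search' IN NATURAL LANGUAGE MODE);

-- Boolean mode with a trailing * matches words that start with "search".
SELECT COUNT(id)
FROM articles
WHERE MATCH(body) AGAINST('search*' IN BOOLEAN MODE);
```

Unlike LIKE '%search%', these queries match whole words (or word prefixes), so a row containing only "research" would not be counted. The two approaches are therefore not drop-in replacements for each other.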

A possible downside of using a fulltext index, in addition to the general overhead of creating such an index, is that unless one is very careful in configuring it (e.g. defining the stop word list, using specific search syntax to avoid inflectional forms, and the like), there may be cases where the results provided by the fulltext search are not as expected. For example, when searching for "SAW" (a tool to cut wood), one may get a lot of hits for records that include the verb "to see" in its various conjugated forms.
Of course, these linguistically aware features of fulltext indexes can typically be overridden, and one may also consider such features an advantage, not a drawback. I only mention this here because we are comparing it to a plain wildcard search.

#4


3  

There is little to add to razzed's test, but using REGEXP apparently does incur a much heavier processing load, contrary to what Seth points out in his comment.

The following tests assume that the query cache is enabled in my.ini (this applies to MySQL versions before 8.0, which removed the query cache):

query_cache_type = 1
query_cache_size = 64M

Tests

  • The timings are averages of three measurements (with the cache cleared intermittently):

  • LIKE

    SELECT * FROM `domain_model_offers` WHERE `description` LIKE '%inform%' LIMIT 0 , 30
    

    Initial: 0.0035s
    Cached: 0.0005s

  • REGEXP

    SELECT * FROM `domain_model_offers` WHERE `description` REGEXP 'inform' LIMIT 0 , 30
    

    Initial: 0.01s
    Cached: 0.0004s

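The answer does not say how the cache was cleared between runs. One possible way to get uncached ("initial") timings, assuming a MySQL 5.x server where the query cache still exists, is sketched below. It is not necessarily what the author did.

```sql
-- Assumes MySQL 5.x, where the query cache still exists.
-- Empty the query cache so the next run is an uncached ("initial") timing.
RESET QUERY CACHE;

-- Alternatively, bypass the query cache for a single statement.
SELECT SQL_NO_CACHE * FROM `domain_model_offers`
WHERE `description` LIKE '%inform%' LIMIT 0, 30;
```

Neither statement clears the storage engine's own buffers or the operating system's file cache, so repeated "initial" runs may still get faster as data pages warm up.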

Result

LIKE or INSTR is definitely faster than REGEXP.

Though minimal, the cache timing difference is probably sufficient to warrant further investigation.

On a properly configured MySQL system, fulltext indexing should generally be faster than, or at least on par with, a non-indexed search. So use indexing, especially on long natural-language texts, regardless of any markup interspersed in them.
