Search for varchar fields that contain all the words of another string

Time: 2023-02-10 01:33:10

I'm trying to write a small stored procedure without having to add full-text indexing just for this (SQL Server 2008).

Basically, I want to find all records where a certain field contains all the words from a parameter.

So if the field contains "This is a test field" and the parameter to my SP is "this test field", the record should be returned, and it should also be returned if the parameter is "field this test".

The table is very small (about 4,000 records) and the load will be low, so efficiency is not a big deal. Right now the only solution I can think of is to split both strings with a table-valued function and go from there.
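
A minimal sketch of that idea in T-SQL, assuming it is enough to split only the parameter: split @search into words, then keep a row only if none of the words is missing from it (a relational division with NOT EXISTS). @mytable, @search, @words and @result are hypothetical names with sample data. The loop avoids STRING_SPLIT because that function only exists from SQL Server 2016. Like the LIKE-based answers below, it matches substrings (so "is" also matches "This") and treats %, _ and [ inside the words as wildcards.

```sql
-- Hypothetical stand-in for the real table, with sample rows.
declare @mytable table (id int, myfield varchar(200));
insert into @mytable values (1, 'This is a test field'), (2, 'This is another field');

declare @search varchar(400) = 'field this test';

-- Step 1: split @search on spaces into @words (runs of spaces are skipped by ltrim).
-- @rest always ends with exactly one space, so charindex finds a word end on every pass.
declare @words table (word varchar(400) not null);
declare @rest varchar(max) = ltrim(rtrim(@search)) + ' ', @pos int;
while len(@rest) > 0   -- len ignores trailing spaces, so a lone ' ' ends the loop
begin
    set @pos = charindex(' ', @rest);
    insert into @words (word) values (left(@rest, @pos - 1));
    set @rest = ltrim(substring(@rest, @pos + 1, len(@rest)));
end

-- Step 2: relational division. Keep a row only if no search word is missing from it.
declare @result table (id int, myfield varchar(200));
insert into @result
select m.id, m.myfield
from @mytable m
where not exists (
    select 1 from @words w
    where m.myfield not like '%' + w.word + '%'
);

select * from @result;   -- only row 1 contains all three words
```

The expected result assumes a case-insensitive collation; under a case-sensitive one, "this" would not match "This". An empty or all-space @search leaves @words empty, so every row is returned.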

Any simpler ideas?

Thanks!

2 solutions

#1

Here is a solution using recursive CTEs. It actually uses two separate recursions: the first splits the search string into tokens, and the second recursively filters the records using each token in turn.

declare
    @searchString varchar(max),
    @delimiter char(1);

select
    @searchString = 'This is a test field',
    @delimiter    = ' ';

declare @tokens table(pos int, string varchar(max));

-- Recursion 1: split @searchString into tokens at each @delimiter.
;WITH Tokens(pos, start, stop) AS (
    SELECT 1, 1, CONVERT(int, CHARINDEX(@delimiter, @searchString))
    UNION ALL
    SELECT pos + 1, stop + 1, CONVERT(int, CHARINDEX(@delimiter, @searchString, stop + 1))
    FROM Tokens
    WHERE stop > 0
)
INSERT INTO @tokens
SELECT pos,
    -- the last token (no delimiter after it) runs to the end of the string
    SUBSTRING(@searchString, start, CASE WHEN stop > 0 THEN stop - start ELSE LEN(@searchString) END) AS string
FROM Tokens
OPTION (MAXRECURSION 25000);

-- Recursion 2: level n keeps only the rows of mytable that contain tokens 1..n.
-- The next token is joined in from @tokens by its position.
;with filter(ind, myfield) as (
    select 1, myfield
    from mytable
    where myfield like '%' + (select string from @tokens where pos = 1) + '%'
    union all
    select f.ind + 1, f.myfield
    from filter f
    inner join @tokens t on t.pos = f.ind + 1
    where f.myfield like '%' + t.string + '%'
)
-- Rows that reached the last level contain every token.
select * from filter
where ind = (select COUNT(1) from @tokens);

This took me about 15 seconds to search a table of 10k records for the search string 'this is a test field' (the more words in the string, the longer it takes).

Edit
If you want a fuzzy search, i.e. one that returns closely matching results even when there is no exact match, replace the final SELECT of the query with:
select * from (select max(ind) as ind, myfield from filter group by myfield) t order by ind desc

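'ind' gives the number of search words found in myfield, counted from the first word of the search string and stopping at the first word that is missing; a row that lacks the first word does not appear at all.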
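
To rank rows by how many search words they contain anywhere, not only a run starting from the first word, one non-recursive option is to join each row to the list of words and count the matches. A sketch, with hypothetical table variables @mytable and @words standing in for the real table and the already split search string; the expected counts assume a case-insensitive collation:

```sql
-- Hypothetical sample data.
declare @mytable table (id int, myfield varchar(200));
insert into @mytable values (1, 'This is a test field'), (2, 'This is another field'), (3, 'unrelated');

declare @words table (word varchar(100) not null);
insert into @words values ('this'), ('test'), ('field');

-- Left join so rows with no matching word still appear, with a count of 0
-- (count(w.word) ignores the NULLs produced by unmatched rows).
declare @scores table (id int, myfield varchar(200), matched int);
insert into @scores
select m.id, m.myfield, count(w.word) as matched
from @mytable m
left join @words w on m.myfield like '%' + w.word + '%'
group by m.id, m.myfield;

-- Best matches first: row 1 -> 3, row 2 -> 2, row 3 -> 0.
select * from @scores order by matched desc;
```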

#2

If efficiency is not a big problem, why not go with a bit of dynamic SQL? Something like:

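create procedure myproc (@var varchar(100))
as
declare @sql varchar(max)
-- One LIKE per word, joined with AND, so the words may appear in any order:
--   'this test field' -> myfield like '%this%' and myfield like '%test%' and myfield like '%field%'
-- Single quotes in @var are doubled first so the input cannot end the string literal early.
set @sql = 'select * from mytable where myfield like ''%'
    + replace(replace(@var, '''', ''''''), ' ', '%'' and myfield like ''%') + '%'''
exec (@sql)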
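
A single pattern built by turning spaces into % wildcards, such as '%this%test%field%', only matches when the words appear in that order, so it cannot satisfy the question's "field this test" case. A quick check using the sample value from the question (the variable names are illustrative, and the results assume a case-insensitive collation):

```sql
declare @field varchar(100) = 'This is a test field';

-- Patterns built the same way from two orderings of the same words.
declare @inOrder    varchar(200) = '%' + replace('this test field', ' ', '%') + '%';  -- '%this%test%field%'
declare @outOfOrder varchar(200) = '%' + replace('field this test', ' ', '%') + '%';  -- '%field%this%test%'

declare @r1 bit = case when @field like @inOrder then 1 else 0 end;
declare @r2 bit = case when @field like @outOfOrder then 1 else 0 end;

select @r1 as inOrderMatches,     -- 1
       @r2 as outOfOrderMatches;  -- 0: nothing follows 'field' in @field
```

Testing each word with its own LIKE and combining the tests with AND removes this ordering constraint.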
