PostgreSQL: Levenshtein distance on a column of comma-separated values

Date: 2022-10-22 00:19:06

I have a table with the following contents:

ID | Name    | Alias
1  | William | Will,Willo,Wolli

I would like to return the row ID if the Levenshtein distance (or metaphone; it does not matter which) between a user-supplied string and either the user's name or any of the known aliases is lower than a defined threshold.
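
The matching rule can be sketched outside the database. The following is a minimal plain-Python illustration, not PostgreSQL code: `levenshtein` is the classic dynamic-programming edit distance with a cost of 1 per insertion, deletion or substitution (the costs used by the two-argument `levenshtein()` in PostgreSQL's fuzzystrmatch), and `matches` is a hypothetical helper that applies the threshold to the name and to every comma-separated alias.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def matches(name: str, aliases: str, query: str, threshold: int) -> bool:
    """True if query is closer than threshold to the name or to any alias."""
    candidates = [name] + aliases.split(",")
    return any(levenshtein(query, c) < threshold for c in candidates)


print(levenshtein("Will", "Jill"))                        # 1
print(matches("William", "Will,Willo,Wolli", "Jill", 3))  # True
```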

I know that a possible solution is to use an additional table linking user IDs with user aliases, although I'd like to avoid it if possible.

2 solutions

#1

What you need is a string split (what PHP calls "explode"): turn each comma-separated alias list into one row per alias. It could be done like this:

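-- levenshtein() comes from the fuzzystrmatch extension
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- q explodes each comma-separated alias list into one row per alias;
-- DISTINCT collapses users that match through several aliases
SELECT DISTINCT u.id FROM users AS u LEFT JOIN
(SELECT u.id, unnest(string_to_array(u.alias, ',')) AS alias FROM users AS u) AS q
ON u.id = q.id
WHERE levenshtein(u.name, 'Jill') < 3
OR levenshtein(q.alias, 'Jill') < 3;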

http://sqlfiddle.com/#!12/494e6/5
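
The effect of the join can be modelled in a few lines of Python, as a sketch rather than a database test. The in-memory `users` list and the helper `lev` are hypothetical, and the sketch assumes every row has a non-NULL alias list, so the LEFT JOIN behaves like an inner join here. It shows why the DISTINCT is needed: two aliases of the same user (Will and Willo) are both within distance 2 of 'Jill', so the join produces the same id twice.

```python
# Hypothetical in-memory copy of the table: (id, name, alias).
users = [(1, "William", "Will,Willo,Wolli")]


def lev(a: str, b: str) -> int:
    # Unit-cost edit distance, two-row dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Subquery q: string_to_array + unnest gives one (id, alias) row per alias.
q = [(uid, alias) for uid, _, csv in users for alias in csv.split(",")]

# Join u with q on id and apply the WHERE clause to every joined row.
joined = [
    uid
    for uid, name, _ in users
    for qid, alias in q
    if qid == uid and (lev(name, "Jill") < 3 or lev(alias, "Jill") < 3)
]
print(joined)               # [1, 1]  (Will and Willo both match)
print(sorted(set(joined)))  # [1]     (what SELECT DISTINCT returns)
```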

#2

As usual, there is more than one solution:

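-- levenshtein() comes from the fuzzystrmatch extension
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

select  u.id
from    users u
where   3 > any
(
    -- name and aliases merged into one list (concat_ws skips a NULL alias),
    -- split into one row per element, then compared with the input
    select  levenshtein ( 'Willey'::text, a )
    from    regexp_split_to_table
        (
            concat_ws ( ',' , u.name::text , u.alias::text )
        ,   ','
        ) as a
);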
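
A similar Python sketch of the second query's logic, again with hypothetical data and helper names rather than a real database run: `concat_ws` ignores NULL arguments, so a user with no aliases is still compared by name, and `3 > any (...)` keeps a row as soon as one element of the split list is within distance 2. No DISTINCT is needed here because the outer query reads each user row only once.

```python
# Hypothetical sample rows (id, name, alias); user 2 has a NULL alias list.
users = [(1, "William", "Will,Willo,Wolli"), (2, "Wiley", None)]


def lev(a: str, b: str) -> int:
    # Unit-cost edit distance, two-row dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def candidates(name, alias):
    # Like concat_ws(',', name, alias): None (NULL) arguments are skipped;
    # then split on ',' like regexp_split_to_table(..., ',').
    return ",".join(p for p in (name, alias) if p is not None).split(",")


# "3 > any (...)": keep the row if at least one distance is below 3.
ids = [uid for uid, name, alias in users
       if any(3 > lev("Willey", c) for c in candidates(name, alias))]
print(ids)  # [1, 2]
```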
