PHP / MySQL: Highlighting "SOUNDS LIKE" query results

Time: 2022-01-01 11:58:59

Quick MySQL/PHP question. I'm using a "not-so-strict" search query as a fallback if no results are found with a normal search query, along the lines of:

$clauses = array();
foreach ($find_array as $word) {
    $clauses[] = "(firstname SOUNDS LIKE '$word%' OR lastname SOUNDS LIKE '$word%')";
}
if (!empty($clauses)) $filter = '(' . implode(' AND ', $clauses) . ')';
$query = "SELECT * FROM table WHERE $filter";

Now, I'm using PHP to highlight the results, like:

foreach ($find_array as $term_to_highlight){
    foreach ($result as $key => $result_string){
        $result[$key]=highlight_stuff($result_string, $term_to_highlight);
    }
}

But this method falls apart when I don't know what to highlight. Is there any way to find out what the "sound-alike" match was when running that MySQL query?

That is to say, if someone searches for "Joan" I want it to highlight "John" instead.

2 answers

#1


6  

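The SOUNDS LIKE condition just compares the SOUNDEX keys of both words (expr1 SOUNDS LIKE expr2 is the same as SOUNDEX(expr1) = SOUNDEX(expr2)), and you can use the PHP soundex() function to generate the same key for typical short names. Note that MySQL's SOUNDEX() can return a key longer than four characters for longer words, while PHP's soundex() always returns four.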

So, if you found a matching row and needed to find out which word to highlight, you can fetch both the firstname and lastname, and then use PHP to find which one matches and highlight just that word.
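
To see why a search for "Joan" finds "John", it can help to print the keys PHP produces. This is only a minimal sketch, and it assumes PHP's soundex() is an acceptable stand-in for MySQL's SOUNDEX() on short names like these:

```php
<?php
// Both names reduce to the same soundex key, which is why
// SOUNDS LIKE treats them as a match.
$searched = soundex('Joan');
$stored   = soundex('John');

var_dump($searched === $stored); // bool(true)
var_dump($searched);             // string(4) "J500"
```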

I made this code just to try this out. (Had to test my theory xD)

<?php
// A space-separated string of keywords, presumably from a search box somewhere.
$search_string = 'John Doe';

// Create a data array to contain the keywords and their matches.
// Keywords are grouped by their soundex keys.
$data = array();
foreach(explode(' ', $search_string) as $_word) {
    $data[soundex($_word)]['keywords'][] = $_word;
}

// Execute a query to find all rows matching the soundex keys for the words.
$soundex_list = "'". implode("','", array_keys($data)) ."'";
$sql = "SELECT id, firstname, lastname
        FROM   sounds_like
        WHERE  SOUNDEX(firstname) IN({$soundex_list})
        OR     SOUNDEX(lastname)  IN({$soundex_list})";
$sql_result = $dbLink->query($sql);

// Add the matches to their respective soundex key in the data array.
// This checks which word matched, the first or last name, and tags
// that word as the match so it can be highlighted later.
if($sql_result) {
    while($_row = $sql_result->fetch_assoc()) {
        foreach($data as $_soundex => &$_elem) {
            if(soundex($_row['firstname']) == $_soundex) {
                $_row['matches'] = 'firstname';
                $_elem['matches'][] = $_row;
            }
            else if(soundex($_row['lastname']) == $_soundex) {
                $_row['matches'] = 'lastname';
                $_elem['matches'][] = $_row;
            }
        }
        unset($_elem); // Drop the reference so it cannot leak into later code.
    }
}

// Print the results as a simple text list.
header('content-type: text/plain');
echo "-- Possible results --\n";

foreach($data as $_group) {
    // Print the keywords for this group's soundex key.
    $keyword_list = "'". implode("', '", $_group['keywords']) ."'";
    echo "For keywords: {$keyword_list}\n";

    // Print all the matches for this group, if any.
    if(isset($_group['matches']) && count($_group['matches']) > 0) {
        foreach($_group['matches'] as $_match) {
            // Highlight the matching word by wrapping it in dashes.
            if($_match['matches'] == 'firstname') {
                $_match['firstname'] = "-{$_match['firstname']}-";
            }
            else {
                $_match['lastname'] = "-{$_match['lastname']}-";
            }

            echo " #{$_match['id']}: {$_match['firstname']} {$_match['lastname']}\n";
        }
    }
    else {
        echo " No matches.\n";
    }
}
?>

A more generalized function, to pull out the matching soundex word from a string, could look like:

<?php
/**
 * Attempts to find the first word in the $haystack that is a soundex
 * match for the $needle. Returns the word, or false if none matches.
 */
function find_soundex_match($haystack, $needle) {
    $words = explode(' ', $haystack);
    $needle_soundex = soundex($needle);
    foreach($words as $_word) {
        if(soundex($_word) == $needle_soundex) {
            return $_word;
        }
    }
    return false;
}
?>

Which, if I am understanding it correctly, could be used in your previously posted code as:

foreach ($find_array as $term_to_highlight){
    foreach ($result as $key => $result_string){
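        $match_to_highlight = find_soundex_match($result_string, $term_to_highlight);
        // find_soundex_match() returns false when nothing matches; skip highlighting then.
        if ($match_to_highlight !== false) {
            $result[$key] = highlight_stuff($result_string, $match_to_highlight);
        }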
    }
}

This wouldn't be as efficient as the more targeted code in the first snippet, though.
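
Since highlight_stuff() is not shown in the question, here is a rough sketch of what a soundex-aware highlighter might look like. The function name highlight_soundex_matches() and the choice of <mark> tags are assumptions made for illustration, not part of the original code. It highlights every sound-alike word in the string rather than only the first one.

```php
<?php
/**
 * Hypothetical helper (not from the original post): wraps every word in
 * $text whose soundex key equals that of $term in <mark> tags.
 */
function highlight_soundex_matches(string $text, string $term): string
{
    $term_key = soundex($term);

    // Split the text into runs of letters and runs of everything else,
    // so punctuation and spacing are preserved exactly.
    return preg_replace_callback(
        '/[A-Za-z]+|[^A-Za-z]+/',
        function (array $m) use ($term_key) {
            // Escape each piece so the output is safe to print as HTML.
            $token   = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
            $is_word = ctype_alpha($m[0]);
            return ($is_word && soundex($m[0]) === $term_key)
                ? "<mark>{$token}</mark>"
                : $token;
        },
        $text
    );
}

echo highlight_soundex_matches('John Doe & Jon Smith', 'Joan'), "\n";
// <mark>John</mark> Doe &amp; <mark>Jon</mark> Smith
```

Only ASCII letters are treated as words here, because soundex() is defined for the English alphabet; names with accented characters would need extra handling.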

#2


8  

Note that SOUNDS LIKE does not work as you think it does. It is not equivalent to LIKE in MySQL, as it does not support the % wildcard.

This means your query will not find "John David" when searching for "John". This might be acceptable if this is just your fallback, but it is not ideal.
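
A quick check illustrates this, again using PHP's soundex() as an assumed stand-in for what MySQL computes. The key is built from the whole column value, so extra words change it:

```php
<?php
// soundex() skips the space and keeps encoding the letters that follow,
// so the two-word value ends up with a different key than the single word.
$full  = soundex('John David');
$first = soundex('John');

var_dump($full === $first); // bool(false)
```

Splitting the column value into words before comparing, as find_soundex_match() in answer #1 does, is one way around this on the PHP side.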

So here is a different suggestion (that might need improvement): first use PHP's soundex() function to find the soundex of the keyword you are looking for.

$soundex = soundex($word);
$soundexPrefix = substr($soundex, 0, 2); // first two characters of soundex
$sql = "SELECT lastname, firstname ".
    "FROM table WHERE SOUNDEX(lastname) LIKE '$soundexPrefix%' ".
    "OR SOUNDEX(firstname) LIKE '$soundexPrefix%'";

Now you'll have a list of first names and last names that sound vaguely similar to the search term (this might be a lot of entries, so you might want to increase the length of the soundex prefix you use for your search). You can then calculate the Levenshtein distance between the soundex of each word and your search term, and sort by that.
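
The sorting step could be sketched roughly as follows. The helper name rank_by_soundex_distance() is hypothetical, and the tie-break on the raw strings is an added assumption. The answer only says to compare soundex keys, and many names share the same key.

```php
<?php
/**
 * Hypothetical helper: orders candidate names by how close their soundex
 * key is to the key of the search term, closest first.
 */
function rank_by_soundex_distance(array $names, string $term): array
{
    $term_key = soundex($term);

    usort($names, function (string $a, string $b) use ($term_key, $term) {
        // Primary sort: edit distance between the soundex keys.
        $by_key = levenshtein(soundex($a), $term_key)
              <=> levenshtein(soundex($b), $term_key);
        if ($by_key !== 0) {
            return $by_key;
        }
        // Tie-break (an assumption): edit distance between the
        // lowercased raw strings.
        return levenshtein(strtolower($a), strtolower($term))
           <=> levenshtein(strtolower($b), strtolower($term));
    });

    return $names;
}

print_r(rank_by_soundex_distance(array('Jane', 'James', 'John'), 'Joan'));
```

With these inputs, "John" and "Jane" share the key of "Joan" and so sort ahead of "James"; the tie-break then puts "John" first because it is one edit away from "Joan".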

Second, you should look at parameterized queries in MySQL, to avoid SQL injection bugs.
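
As a rough sketch of what that could look like for the SOUNDS LIKE query from the question, using mysqli prepared statements: the function names and the table name `people` are assumptions for illustration, and the `%` is dropped because SOUNDS LIKE does not treat it as a wildcard.

```php
<?php
/**
 * Hypothetical helper: builds the SOUNDS LIKE query with "?" placeholders
 * instead of interpolating the search words into the SQL string.
 * The table name `people` is assumed.
 */
function build_sounds_like_query(array $words): array
{
    $clauses = array();
    $params  = array();
    foreach ($words as $word) {
        $clauses[] = '(firstname SOUNDS LIKE ? OR lastname SOUNDS LIKE ?)';
        $params[]  = $word; // bound to the firstname placeholder
        $params[]  = $word; // bound to the lastname placeholder
    }
    $sql = 'SELECT id, firstname, lastname FROM people';
    if ($clauses) {
        $sql .= ' WHERE ' . implode(' AND ', $clauses);
    }
    return array($sql, $params);
}

/**
 * Runs the query through a prepared statement. Assumes $db is an open
 * mysqli connection and that the mysqlnd driver is available, which
 * get_result() requires.
 */
function search_sounds_like(mysqli $db, array $words): array
{
    list($sql, $params) = build_sounds_like_query($words);
    $stmt = $db->prepare($sql);
    if ($params) {
        $stmt->bind_param(str_repeat('s', count($params)), ...$params);
    }
    $stmt->execute();
    return $stmt->get_result()->fetch_all(MYSQLI_ASSOC);
}
```

Because the words travel separately from the SQL text, a search term such as O'Brien cannot break out of the string literal.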
