What is the fastest way to do a REGEXP find and replace in a MySQL database?

Time: 2021-12-13 18:32:29

I have a table with 700,000 entries, and I need to check each entry for any of 1,000,000 words and replace every word found, e.g. hello becomes #~hello~#. A word can occur multiple times in an entry, and every occurrence needs to be replaced. I tried this in PHP, and the estimated time to complete was around 362 days. I then modified the code to use LIKE in MySQL so that I didn't have to check each of the 1,000,000 words against all 700,000 entries, but the estimated time to completion is still 29 days. This seems really high.

To complicate matters further, a "word" can consist of several words. For example, if the word is hello world, the program should replace it with #~hello world~#.

What am I missing?

The code looks something like this:

// Load every word, longest first
$words = array();
$query = "SELECT word_id, word_name FROM words ORDER BY char_length(word_name) DESC";
$result = mysqli_query($con, $query);
while ($row = mysqli_fetch_array($result)) {
  $words[] = new wordObj($row['word_id'], $row['word_name']);
}

// One LIKE query per word (wordObj is assumed to convert to its word string)
foreach ($words as $word) {
  $entries = array();
  $pattern = mysqli_real_escape_string($con, (string) $word);
  $query = "SELECT id, entry FROM entries WHERE entry LIKE '%" . $pattern . "%'";
  $result = mysqli_query($con, $query);
  if ($result) {
    while ($row = mysqli_fetch_array($result)) {
      $entries[] = new meatObj($row['id'], $row['entry']);
    }
  }
  foreach ($entries as $entry) {
    // check entry for all words and replace
  }
}

1 Solution

#1

The simplest solution is to store all the words that need to be replaced in a hash table (a PHP associative array). Then, for each entry, split it into words and check each word against the hash table.
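A rough operation count shows why the two approaches differ so much. A LIKE pattern that starts with % cannot use a B-tree index, so each of the 1,000,000 queries scans all 700,000 entries, whereas the hash table needs a single pass over the entries with one expected O(1) lookup per token. Here T, the average number of tokens per entry, is a hypothetical figure not given in the question:

$$
\text{LIKE loop:}\quad 10^{6} \times 7 \times 10^{5} = 7 \times 10^{11}\ \text{row scans}
$$

$$
\text{hash table:}\quad 7 \times 10^{5} \times T\ \text{lookups}, \qquad \frac{7 \times 10^{11}}{7 \times 10^{5}\,T} = \frac{10^{6}}{T}
$$

With T = 100, for example, that is about 10^4 times fewer operations, and each LIKE row scan additionally has to search through the whole entry text.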

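// HOW DOES IT TAKE 29 DAYS TO EXECUTE?
// Create a hash table to store all the words
$hash = array();

$query = "SELECT word_id, word_name FROM words ORDER BY char_length(word_name) DESC";
$result = mysqli_query($con, $query);
while ($row = mysqli_fetch_array($result)) {
    $hash[strtolower($row['word_name'])] = true;
}

// DO SOME QUERY HERE: read every entry once (table/column names taken from the question)
$result = mysqli_query($con, "SELECT id, entry FROM entries");
$update = mysqli_prepare($con, "UPDATE entries SET entry = ? WHERE id = ?");

while ($row = mysqli_fetch_array($result)) {
    // Split on spaces and punctuation, keeping the delimiters so the text can be rebuilt
    $delimiter = "/([ \.,\"'!\?\-_;])/";
    $tokens = preg_split($delimiter, $row['entry'], -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // replace the text: one hash lookup per token
    $final = "";
    foreach ($tokens as $token) {
        if (isset($hash[strtolower($token)])) {
            $final .= "#~" . $token . "~#";
        } else {
            $final .= $token;
        }
    }

    // UPDATE NEW ENTRY HERE (id is assumed to be an integer column)
    mysqli_stmt_bind_param($update, "si", $final, $row['id']);
    mysqli_stmt_execute($update);
}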
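The loop above looks up single tokens only, so a multi-word entry such as hello world from the question would never match. The following is a minimal sketch of one way to extend the same hash-table idea: at each word, try spans of up to N words (N being the longest phrase in the list) and wrap the longest span found in the table. The helper names buildPhraseTable and markPhrases are hypothetical, the delimiter set is copied from the answer, and it assumes phrases appear in entries with the same separators as in the word list (e.g. a single space in hello world).

```php
<?php
// Delimiter set copied from the answer above; delimiters are kept as separate tokens.
const WORD_DELIMITER = "/([ \.,\"'!\?\-_;])/";

// Hypothetical helper: builds the lookup table (lowercase phrase => true) and returns
// it together with the length, in words, of the longest phrase.
function buildPhraseTable(array $wordNames): array
{
    $phrases = [];
    $maxWords = 1;
    foreach ($wordNames as $name) {
        $phrases[strtolower($name)] = true;
        $n = count(preg_split(WORD_DELIMITER, $name, -1, PREG_SPLIT_NO_EMPTY));
        $maxWords = max($maxWords, $n);
    }
    return [$phrases, $maxWords];
}

// Hypothetical helper: wraps every word or phrase found in $phrases with #~ ~#,
// preferring the longest match that starts at each word.
function markPhrases(string $text, array $phrases, int $maxWords): string
{
    $tokens = preg_split(WORD_DELIMITER, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $count = count($tokens);
    // Delimiter tokens are single matching characters; everything else is a word.
    $isWord = function (string $t): bool {
        return preg_match(WORD_DELIMITER, $t) === 0;
    };

    $out = "";
    $i = 0;
    while ($i < $count) {
        $matchEnd = -1;
        if ($isWord($tokens[$i])) {
            // Grow the span one token at a time, up to $maxWords words, and
            // remember the last (i.e. longest) span that is in the table.
            $span = "";
            $words = 0;
            for ($j = $i; $j < $count && $words < $maxWords; $j++) {
                $span .= $tokens[$j];
                if ($isWord($tokens[$j])) {
                    $words++;
                    if (isset($phrases[strtolower($span)])) {
                        $matchEnd = $j;
                    }
                }
            }
        }
        if ($matchEnd >= 0) {
            $out .= "#~" . implode("", array_slice($tokens, $i, $matchEnd - $i + 1)) . "~#";
            $i = $matchEnd + 1;
        } else {
            $out .= $tokens[$i];
            $i++;
        }
    }
    return $out;
}

// Usage with a tiny in-memory word list instead of the words table
[$phrases, $maxWords] = buildPhraseTable(["hello", "hello world", "foo"]);
echo markPhrases("Hello world, hello there. Foo!", $phrases, $maxWords), "\n";
// prints: #~Hello world~#, #~hello~# there. #~Foo~#!
```

Each word costs at most about $maxWords lookups, so the pass stays linear in the size of the text. Unlike LIKE '%hello%', a word inside a longer word (helloworld) is not matched. In the answer's loop, $final could be computed as markPhrases($row['entry'], $phrases, $maxWords).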
