在PHP + MySQL中获取热门词汇

时间:2022-11-07 03:52:20

How do I go about getting the most popular words from multiple content tables in PHP/MySQL.

如何从PHP / MySQL中的多个内容表中获取最流行的单词。

For example, I have a table forum_post with forum post; this contains a subject and content. Besides these I have multiple other tables with different fields which could also contain content to be analysed.

例如,我有一个论坛帖子的论坛_post;这包含主题和内容。除了这些之外,我还有多个具有不同字段的其他表,这些表还可以包含要分析的内容。

I would probably myself go fetch all the content, strip (possible) html explode the string on spaces. remove quotes and comma's etc. and just count the words which are not common by saving an array whilst running through all the words.

我可能会自己去获取所有内容,条带(可能)html在空格上爆炸字符串。删除引号和逗号等,只需在运行所有单词时通过保存数组来计算不常见的单词。

My main question is if someone knows of a method which might be easier or faster.

我的主要问题是,如果有人知道一种可能更容易或更快的方法。

I couldn't seem to find any helpful answers about this it might be the wrong search patterns.

我似乎无法找到任何有用的答案,这可能是错误的搜索模式。

2 个解决方案

#1


3  

Somebody's already done it.

The magic you're looking for is a php function called str_word_count().

你正在寻找的魔法是一个名为str_word_count()的php函数。

In my example code below, if you get a lot of extraneous words from this you'll need to write custom stripping to remove them. Additionally you'll want to strip all of the html tags from the words and other characters as well.

在下面的示例代码中,如果您从中获得了大量无关的单词,则需要编写自定义剥离来删除它们。此外,您还希望从单词和其他字符中删除所有html标记。

I use something similar to this for keyword generation (obviously that code is proprietary). In short we're taking provided text, we're checking the word frequency and if the words come up in order we're sorting them in an array based on priority. So the most frequent words will be first in the output. We're not counting words that only occur once.

我使用与此类似的东西来生成关键字(显然代码是专有的)。简而言之,我们正在提供文本,我们正在检查单词频率,如果单词出现,我们将根据优先级对它们进行排序。所以最频繁的单词将在输出中排在第一位。我们不计算只出现过一次的单词。

<?php
$text = "your text.";

//Setup the array for storing word counts
$freqData = array();
foreach( str_word_count( $text, 1 ) as $words ){
// For each word found in the frequency table, increment its value by one
array_key_exists( $words, $freqData ) ? $freqData[ $words ]++ : $freqData[ $words ] = 1;
}

$list = '';
arsort($freqData);
foreach ($freqData as $word=>$count){
    if ($count > 2){
        $list .= "$word ";
    }
}
if (empty($list)){
    $list = "Not enough duplicate words for popularity contest.";   
}
echo $list;
?>

#2


0  

I see you've accepted an answer, but I want to give you an alternative that might be more flexible in a sense: (Decide for yourself :-)) I've not tested the code, but I think you get the picture. $dbh is a PDO connection object. It's then up to you what you want to do with the resulting $words array.

我看到你已经接受了答案,但我想给你一个在某种意义上可能更灵活的替代方案:(自己决定:-))我没有测试过代码,但我认为你能得到答案。 $ dbh是一个PDO连接对象。这取决于你想要用你生成的$ words数组做什么。

<?php
$words = array();

$tableName = 'party'; //The name of the table
countWordsFromTable($words, $tableName)

$tableName = 'party2'; //The name of the table
countWordsFromTable($words, $tableName)

//Example output array:
/*
$words['word'][0] = 'happy'; //Happy from table party
$words['wordcount'][0] = 5;
$words['word'][1] = 'bulldog'; //Bulldog from table party2
$words['wordcount'][1] = 15;
$words['word'][2] = 'pokerface'; //Pokerface from table party2
$words['wordcount'][2] = 2;
*/

$maxValues = array_keys($words, max($words)); //Get all keys with indexes of max values     of $words-array
$popularIndex = $maxValues[0]; //Get only one value...
$mostPopularWord = $words[$popularIndex]; 


function countWordsFromTable(&$words, $tableName) {

    //Get all fields from specific table
    $q = $dbh->prepare("DESCRIBE :tableName"); 
    $q->execute(array(':tableName' = > $tableName));
    $tableFields = $q->fetchAll(PDO::FETCH_COLUMN);

    //Go through all fields and store count of words and their content in array $words
    foreach($tableFields as $dbCol) {

        $wordCountQuery = "SELECT :dbCol as word, LENGTH(:dbCol) - LENGTH(REPLACE(:dbCol, ' ', ''))+1 AS wordcount FROM :tableName"; //Get count and the content of words from every column in db
        $q = $dbh->prepare($wordCountQuery);
        $q->execute(array(':dbCol' = > $dbCol));
        $wrds = $q->fetchAll(PDO::FETCH_ASSOC);

        //Add result to array $words
        foreach($wrds as $w) {
            $words['word'][] = $w['word'];
            $words['wordcount'][] = $w['wordcount'];
        }

    }
}
?>

#1


3  

Somebody's already done it.

The magic you're looking for is a php function called str_word_count().

你正在寻找的魔法是一个名为str_word_count()的php函数。

In my example code below, if you get a lot of extraneous words from this you'll need to write custom stripping to remove them. Additionally you'll want to strip all of the html tags from the words and other characters as well.

在下面的示例代码中,如果您从中获得了大量无关的单词,则需要编写自定义剥离来删除它们。此外,您还希望从单词和其他字符中删除所有html标记。

I use something similar to this for keyword generation (obviously that code is proprietary). In short we're taking provided text, we're checking the word frequency and if the words come up in order we're sorting them in an array based on priority. So the most frequent words will be first in the output. We're not counting words that only occur once.

我使用与此类似的东西来生成关键字(显然代码是专有的)。简而言之,我们正在提供文本,我们正在检查单词频率,如果单词出现,我们将根据优先级对它们进行排序。所以最频繁的单词将在输出中排在第一位。我们不计算只出现过一次的单词。

<?php
$text = "your text.";

//Setup the array for storing word counts
$freqData = array();
foreach( str_word_count( $text, 1 ) as $words ){
// For each word found in the frequency table, increment its value by one
array_key_exists( $words, $freqData ) ? $freqData[ $words ]++ : $freqData[ $words ] = 1;
}

$list = '';
arsort($freqData);
foreach ($freqData as $word=>$count){
    if ($count > 2){
        $list .= "$word ";
    }
}
if (empty($list)){
    $list = "Not enough duplicate words for popularity contest.";   
}
echo $list;
?>

#2


0  

I see you've accepted an answer, but I want to give you an alternative that might be more flexible in a sense: (Decide for yourself :-)) I've not tested the code, but I think you get the picture. $dbh is a PDO connection object. It's then up to you what you want to do with the resulting $words array.

我看到你已经接受了答案,但我想给你一个在某种意义上可能更灵活的替代方案:(自己决定:-))我没有测试过代码,但我认为你能得到答案。 $ dbh是一个PDO连接对象。这取决于你想要用你生成的$ words数组做什么。

<?php
$words = array();

$tableName = 'party'; //The name of the table
countWordsFromTable($words, $tableName)

$tableName = 'party2'; //The name of the table
countWordsFromTable($words, $tableName)

//Example output array:
/*
$words['word'][0] = 'happy'; //Happy from table party
$words['wordcount'][0] = 5;
$words['word'][1] = 'bulldog'; //Bulldog from table party2
$words['wordcount'][1] = 15;
$words['word'][2] = 'pokerface'; //Pokerface from table party2
$words['wordcount'][2] = 2;
*/

$maxValues = array_keys($words, max($words)); //Get all keys with indexes of max values     of $words-array
$popularIndex = $maxValues[0]; //Get only one value...
$mostPopularWord = $words[$popularIndex]; 


function countWordsFromTable(&$words, $tableName) {

    //Get all fields from specific table
    $q = $dbh->prepare("DESCRIBE :tableName"); 
    $q->execute(array(':tableName' = > $tableName));
    $tableFields = $q->fetchAll(PDO::FETCH_COLUMN);

    //Go through all fields and store count of words and their content in array $words
    foreach($tableFields as $dbCol) {

        $wordCountQuery = "SELECT :dbCol as word, LENGTH(:dbCol) - LENGTH(REPLACE(:dbCol, ' ', ''))+1 AS wordcount FROM :tableName"; //Get count and the content of words from every column in db
        $q = $dbh->prepare($wordCountQuery);
        $q->execute(array(':dbCol' = > $dbCol));
        $wrds = $q->fetchAll(PDO::FETCH_ASSOC);

        //Add result to array $words
        foreach($wrds as $w) {
            $words['word'][] = $w['word'];
            $words['wordcount'][] = $w['wordcount'];
        }

    }
}
?>