Best way to select the most viewed posts from the last n hours

Time: 2021-11-28 06:59:19

I'm using PHP and MySQL (InnoDB engine).

As the MySQL reference says, a query that filters by a comparison on one column and orders by another cannot make full use of the index we have in mind.

I have a table named News.

This table has at least 1 million records with two important columns: time_added and number_of_views.

I need to select the most viewed records from the last n hours. What is the best index for this? Is it even possible to run this kind of query very fast on a table with millions of records?

I've already done this for the "last day" case: by adding a new column (date_added), I can select the most viewed records from the last day. But if I decide to select these records from the last week, I'm in trouble again.

3 Answers

#1


1  

First, write the query:

select n.*
from news n
where time_added >= date_sub(now(), interval <n> hour)  -- MySQL interval units are singular: HOUR, not HOURS
order by number_of_views desc
limit ??;

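The best index is (time_added, number_of_views). Actually, number_of_views won't be used for this query: because time_added is matched with a range condition, the second column cannot satisfy the ORDER BY, so MySQL still has to sort the matching rows. I would include it anyway for other possible queries.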
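
As a minimal sketch of how that index could be created and checked: the index name idx_time_views, the 24-hour window, and the limit of 10 are illustrative assumptions, not values from the question.

```sql
-- Composite index: the range-filter column first, the sort column second.
-- The index name is hypothetical.
ALTER TABLE news ADD INDEX idx_time_views (time_added, number_of_views);

-- Inspect the plan. With a range condition on time_added, the Extra column
-- would typically show "Using filesort", because the ORDER BY column comes
-- after a range column in the index.
EXPLAIN
SELECT n.*
FROM news n
WHERE n.time_added >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
ORDER BY n.number_of_views DESC
LIMIT 10;
```

The index still helps by narrowing the scan to rows inside the time window; the cost that remains is sorting those rows, which grows with the size of the window.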

#2


0  

First, add the following lines to my.cnf, in the [mysqld] section:

[mysqld]
query_cache_size = 32M   # or more
query_cache_limit = 32M  # or more

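query_cache_size sets the size of the cache. (The query cache is deprecated as of MySQL 5.7.20 and removed in MySQL 8.0, so this applies only to older servers.)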

Another option to pay attention to is query_cache_limit: it sets the maximum size of a single query result that can be stored in the cache. To check the status of the cache, run:

show global status like 'Qcache%';

http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
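
To make the leftmost-prefix rule concrete for this question, here is a small sketch that assumes a composite index on news (time_added, number_of_views) already exists. The literal values are illustrative only.

```sql
-- Can use the index: the condition is on the leftmost column.
SELECT COUNT(*) FROM news
WHERE time_added >= NOW() - INTERVAL 1 DAY;

-- Can use both index columns: equality on the first, range on the second.
SELECT COUNT(*) FROM news
WHERE time_added = '2021-11-28 06:00:00'
  AND number_of_views > 1000;

-- No leftmost prefix: the index cannot be used to look up rows here
-- (at best MySQL scans the whole index instead of seeking into it).
SELECT COUNT(*) FROM news
WHERE number_of_views > 1000;
```

Running EXPLAIN on each statement is the way to confirm which access method the optimizer actually picks on a given server.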

#3


0  

You need a summary table. Since 'hour' is your granularity, something like this might work:

CREATE TABLE HourlyViews (
    the_hour DATETIME NOT NULL,
    ct SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY(the_hour)
) ENGINE=InnoDB;

It might need another column (and add it to the PK) if there is some breakdown of the items you are counting. And you might want some other things SUM'd or COUNT'd in this table.

Build and maintain this table incrementally. That is, every hour, add another row to the table. (Or you could keep it updated with INSERT .. ON DUPLICATE KEY UPDATE ...)
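
A minimal sketch of the INSERT .. ON DUPLICATE KEY UPDATE variant against the HourlyViews table defined above. Truncating NOW() to the hour with DATE_FORMAT is one way to pick the bucket and is an assumption about how the_hour is meant to be filled.

```sql
-- Count one view in the current hour's bucket.
-- The first view of an hour inserts the row; later views increment it.
INSERT INTO HourlyViews (the_hour, ct)
VALUES (DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00'), 1)
ON DUPLICATE KEY UPDATE ct = ct + 1;
```

Note that ct is declared SMALLINT UNSIGNED, which tops out at 65535, so a wider integer type may be needed if a single hour can exceed that.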

More on Summary Tables

Then change the query to use that table; it will be a lot faster.
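
A sketch of what that query could look like, assuming the summary table has been given a per-post breakdown column as suggested earlier. The table name HourlyPostViews, the news_id column, the wider ct type, the 24-hour window, and the limit of 10 are all hypothetical.

```sql
-- Hypothetical per-post variant of the summary table.
CREATE TABLE HourlyPostViews (
    the_hour DATETIME NOT NULL,
    news_id INT UNSIGNED NOT NULL,
    ct INT UNSIGNED NOT NULL,
    PRIMARY KEY (the_hour, news_id)
) ENGINE=InnoDB;

-- Most viewed posts over the last 24 hours, summed from hourly buckets.
SELECT news_id, SUM(ct) AS views
FROM HourlyPostViews
WHERE the_hour >= NOW() - INTERVAL 24 HOUR
GROUP BY news_id
ORDER BY views DESC
LIMIT 10;
```

Because the_hour leads the primary key, the range condition reads only the buckets inside the window, and changing 24 to any other number of hours needs no schema change.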
