How to track pageviews without thrashing the MySQL database

Date: 2022-04-05 02:44:26

I am trying to track pageviews in MySQL DB using the following query:

"UPDATE $table SET pageviews = pageviews + 1 WHERE page_id = 1"

This is fine for low to moderate traffic. However, under high traffic, constant writes to the DB would result in high read/write contention and eventually bring down the DB.

I have read several QAs here on * and elsewhere where MongoDB is suggested as an alternative. However, that choice isn't available and I must stick to MySQL. Furthermore, I do not have control over the storage engine, whether MyISAM or InnoDB (InnoDB performs better thanks to row-level locking, as opposed to MyISAM's table-level locking).

Considering the above scenario, what's the best possible method to track pageviews without thrashing the DB (in the DB or somewhere else)? I would really appreciate an answer that provides code fragments as a starting point (if possible).

BTW, I am using PHP.

Update: @fire has a good solution here. However, it requires memcache. I am looking for something that could be implemented easily without requiring specific infrastructure, since this is for a module that could be used in virtually any hosting environment. On second thought, the things that come to mind are some sort of cookie- or file-log-based implementation. I am not sure how such an implementation would work in practice. Any further input is really welcome.

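To make the file-log idea from the update concrete, here is a minimal sketch of how it could work without memcache or any special infrastructure: each hit appends one line to a local log, and a cron job rotates the log and rolls the counts into MySQL in one batch. The file path, table name, and column names are illustrative assumptions, not something from the question.

```php
<?php
// Per request: append one line per hit. Small appends with LOCK_EX
// are safe against concurrent requests on a local filesystem.
function track_hit($page_id) {
    file_put_contents('/tmp/pageviews.log', (int) $page_id . "\n",
                      FILE_APPEND | LOCK_EX);
}

// Cron: rotate the log first (rename is atomic on the same
// filesystem), so new hits land in a fresh file while we aggregate.
function flush_hits(PDO $db) {
    $log = '/tmp/pageviews.log';
    if (!file_exists($log)) {
        return;
    }
    $work = $log . '.work';
    rename($log, $work);

    // One UPDATE per page instead of one per hit.
    $counts = array_count_values(
        file($work, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
    );
    $stmt = $db->prepare(
        'UPDATE pages SET pageviews = pageviews + :n WHERE page_id = :id'
    );
    foreach ($counts as $page_id => $n) {
        $stmt->execute([':n' => $n, ':id' => $page_id]);
    }
    unlink($work);
}
```

The worst case on a crash is losing the one `.work` file being processed; hits arriving after the rename simply recreate the log and are picked up on the next run.
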
4 solutions

#1 (score: 15)

I would use memcached to store the count, and then sync it with the database on a cron...

// Increment (run per page view)
$page_id = 1;
$memcache = new Memcache();
$memcache->connect('localhost', 11211);

// add() is a no-op if the key already exists, so this avoids the
// non-atomic get-then-set race between concurrent requests
if (!$memcache->add('page_' . $page_id, 1)) {
    $memcache->increment('page_' . $page_id, 1);
}

// Cron (flush the counter to MySQL periodically)
if ($pageviews = (int) $memcache->get('page_' . $page_id)) {
    $sql = "UPDATE pages SET pageviews = pageviews + " . $pageviews
         . " WHERE page_id = " . (int) $page_id;
    mysql_query($sql);
    // Subtract what was flushed instead of deleting the key, so hits
    // counted between the get() above and this line aren't lost
    $memcache->decrement('page_' . $page_id, $pageviews);
}

#2 (score: 1)

I'd consider gathering raw hits with the fastest writing engine you have available:

INSERT INTO hits (page_id, hit_date) VALUES (:page_id, CURRENT_TIMESTAMP)

... and then running a periodic process, possibly a cron command-line script, that counts and stores the page-count summary you need on an hourly or daily basis:

INSERT INTO daily_stats (page_id, num_hits, day)
SELECT page_id, COUNT(*), '2012-11-29'
FROM hits
WHERE DATE(hit_date) = '2012-11-29'
GROUP BY page_id

(Queries are mere examples, tweak to your needs)

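For completeness, here is one way the raw-hits table and the cron step could be laid out. The schema, the secondary index on `hit_date`, and the aggregate-then-purge transaction are assumptions layered on top of the answer's example queries, not part of the original answer.

```sql
-- Illustrative schema for the raw-hits approach; types and index
-- choices are assumptions, tweak to your needs.
CREATE TABLE hits (
    hit_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    page_id  INT UNSIGNED NOT NULL,
    hit_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_hit_date (hit_date)
) ENGINE=InnoDB;

-- Cron step: summarize one day's hits, then purge the rows that
-- were summarized, inside a single transaction (InnoDB). The
-- half-open date range keeps both statements index-friendly.
START TRANSACTION;

INSERT INTO daily_stats (page_id, num_hits, day)
SELECT page_id, COUNT(*), '2012-11-29'
FROM hits
WHERE hit_date >= '2012-11-29' AND hit_date < '2012-11-30'
GROUP BY page_id;

DELETE FROM hits
WHERE hit_date >= '2012-11-29' AND hit_date < '2012-11-30';

COMMIT;
```

Purging the summarized rows keeps the hits table small, which matters since it takes the full write load.
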
Another typical solution is good old log parsing, feeding a script like AWStats with your web server's logs.

Clarification: My first suggestion is fairly similar to @fire's, but I didn't get into storage details. The key point is to delay the heavy processing and to store just the minimum amount of raw info in the fastest possible way.

#3 (score: 0)

Have you considered using Google Analytics?

http://analytics.google.com

#4 (score: 0)

You haven't specified the read or write rate to this table. MySQL can usually keep up quite well if you keep the indexing to an absolute minimum and the row size small. A table with a page ID and a counter column should be very fast most of the time.

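A minimal sketch of the table this paragraph describes, assuming InnoDB and a single-statement upsert so new pages don't need a separate INSERT path; the table and column names are illustrative.

```sql
-- One narrow row per page, primary key only, no secondary indexes.
CREATE TABLE page_counters (
    page_id   INT UNSIGNED NOT NULL PRIMARY KEY,
    pageviews BIGINT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- Atomic increment that also handles a first-ever hit on a page:
INSERT INTO page_counters (page_id, pageviews) VALUES (1, 1)
ON DUPLICATE KEY UPDATE pageviews = pageviews + 1;
```
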
InnoDB should be fine as well. MyISAM is liable to explode in the worst possible way if the system crashes or loses power during heavy write activity; it isn't journaled and can't always be recovered. InnoDB is much more robust.

To get maximum performance from InnoDB, you'll want to tune your server according to the standard guidelines and benchmark it aggressively to be sure you got it right. Each OS has its quirks; sometimes you can miss out on a factor-of-two performance increase by not having the right setting.

If your tracking database is small, you might want to create an instance backed by a RAM disk and replicate it to another server with a regular HD. Since you're expecting extremely high write activity, if you can endure a small loss of data in the worst possible situation like a system crash, you could simply mysqldump this database periodically to snapshot it. Dumping a memory-backed database with even a million rows should take only a minute and wouldn't interrupt writes due to MVCC.

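The periodic snapshot could be as simple as a crontab entry; the database name, credentials file, and paths below are assumptions. `--single-transaction` gives a consistent InnoDB snapshot without blocking writes, which is the MVCC behavior the paragraph relies on, and the temp-file-then-rename keeps the snapshot file valid even if a dump is interrupted.

```shell
# Snapshot the tracking database every 15 minutes.
*/15 * * * * mysqldump --defaults-extra-file=/etc/mysql/backup.cnf \
    --single-transaction tracking > /backup/tracking.sql.tmp \
    && mv /backup/tracking.sql.tmp /backup/tracking.sql
```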