Best practice for recording a large number of hits in a MySQL database

Time: 2022-02-02 03:55:09

Well, this is the thing. Let's say that my future PHP CMS needs to serve 500k visitors daily, and I need to record them all in a MySQL database (referrer, IP address, time, etc.). That means I need to insert 300-500 rows per minute and update about 50 more. The main problem is that the script would call the database every time I want to insert a new row, which is every time someone hits a page.

My question: is there any way to cache incoming hits locally first (and what is the best solution for that - APC, CSV, ...?) and then periodically send them to the database, for example every 10 minutes? Is this a good solution, and what is the best practice for this situation?

10 Answers

#1


22  

500k daily is just 5-7 queries per second. If each request is served in 0.2 sec, you will have almost 0 simultaneous queries, so there is nothing to worry about.
Even if you have 5 times more users, everything should still work fine.
You can just use INSERT DELAYED and tune your MySQL.
About tuning: http://www.day32.com/MySQL/ - there is a very useful script there (it changes nothing, it just shows you tips on how to optimize your settings).

You can use memcache or APC to write the log there first, but with INSERT DELAYED MySQL will do almost the same work, and will do it better :)

Do not use files for this. The DB will handle locks much better than PHP. It's not trivial to write effective mutexes, so let the DB (or memcache, APC) do this work.
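For illustration, here is a minimal sketch of that approach from PHP, assuming a hypothetical MyISAM hits table with referrer, ip and hit_time columns (a later answer notes that INSERT DELAYED is MyISAM-only):

    <?php
    // Hypothetical table:
    //   CREATE TABLE hits (referrer VARCHAR(255), ip VARCHAR(45), hit_time DATETIME)
    //   ENGINE=MyISAM;   -- INSERT DELAYED only works with MyISAM
    $db = new mysqli('localhost', 'user', 'pass', 'cms');

    $referrer = $db->real_escape_string(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');
    $ip       = $db->real_escape_string($_SERVER['REMOTE_ADDR']);

    // DELAYED queues the row inside the MySQL server and returns immediately,
    // so the page request does not wait for the actual write.
    $db->query("INSERT DELAYED INTO hits (referrer, ip, hit_time)
                VALUES ('$referrer', '$ip', NOW())");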

#2


18  

A frequently used solution:

You could implement a counter in memcached which you increment on each visit, and push an update to the database for every 100 (or 1000) hits.
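As a rough sketch of that pattern (assuming the Memcached PECL extension and a hypothetical page_stats table; the key name and batch size are arbitrary):

    <?php
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    // Count the hit in memory; create the key on the very first hit.
    $count = $mc->increment('hits_page_42');
    if ($count === false) {
        $mc->set('hits_page_42', 1);
        $count = 1;
    }

    // Every 1000 hits, push one aggregated UPDATE instead of 1000 INSERTs.
    if ($count % 1000 === 0) {
        $db = new mysqli('localhost', 'user', 'pass', 'cms');
        $db->query('UPDATE page_stats SET hits = hits + 1000 WHERE page_id = 42');
    }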

#3


4  

We do this by storing hits locally on each server in a CSV file, then having a minutely cron job push the entries into the database. This is to avoid needing a highly available MySQL database more than anything - the database should be able to cope with that volume of inserts without a problem.
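A sketch of the importer half of such a setup, assuming a hypothetical /var/log/cms/hits.csv written by the web servers, a matching hits table, and a crontab entry like `* * * * * php /path/to/import_hits.php`:

    <?php
    // import_hits.php - invoked by cron once a minute.
    $file = '/var/log/cms/hits.csv';
    if (!file_exists($file)) {
        exit;
    }

    // Rename first, so the web servers keep appending to a fresh file
    // while we import the snapshot.
    $snapshot = $file . '.importing';
    rename($file, $snapshot);

    $db   = new mysqli('localhost', 'user', 'pass', 'cms');
    $rows = array_map('str_getcsv', file($snapshot, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));

    // One multi-row INSERT is far cheaper than one INSERT per hit.
    $values = array();
    foreach ($rows as $row) {
        list($referrer, $ip, $time) = $row;
        $values[] = sprintf("('%s','%s','%s')",
            $db->real_escape_string($referrer),
            $db->real_escape_string($ip),
            $db->real_escape_string($time));
    }
    if ($values) {
        $db->query('INSERT INTO hits (referrer, ip, hit_time) VALUES ' . implode(',', $values));
    }
    unlink($snapshot);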

#4


3  

Save them to a directory-based database (or a flat file, it depends) somewhere, and at a certain time use a PHP script to insert/update them into your MySQL database. Your PHP script can be executed periodically using cron, so check whether your server has cron so that you can set the schedule for it, say every 10 minutes.

Have a look at this page: http://damonparker.org/blog/2006/05/10/php-cron-script-to-run-automated-jobs/. Some code has already been written in the cloud and is ready for you to use :)

#5


2  

One way would be to use the Apache access.log. You can get quite fine-grained logging by using the cronolog utility with Apache. Cronolog handles storing a very large number of rows in files, and can rotate them based on volume, day, year, etc. Using this utility keeps Apache from suffering under log writes.

Then, as said by others, use a cron-based job to analyse these logs and push whatever summarized or raw data you want into MySQL.
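As a sketch of such an analysis job (assuming Apache's combined log format, a rotated log file path, and the same hypothetical hits table as elsewhere on this page):

    <?php
    // Batch-import IP, timestamp and referrer from a rotated Apache access log.
    $db = new mysqli('localhost', 'user', 'pass', 'cms');

    // combined format: ip ident user [time] "request" status bytes "referer" "agent"
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+ "([^"]*)"/';

    $values = array();
    foreach (file('/var/log/apache2/access.log.1') as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue;
        }
        $time = DateTime::createFromFormat('d/M/Y:H:i:s O', $m[2]);
        if (!$time) {
            continue;
        }
        $values[] = sprintf("('%s','%s','%s')",
            $db->real_escape_string($m[3]),
            $db->real_escape_string($m[1]),
            $time->format('Y-m-d H:i:s'));
    }
    if ($values) {
        // One multi-row INSERT per log file instead of one query per hit.
        $db->query('INSERT INTO hits (referrer, ip, hit_time) VALUES ' . implode(',', $values));
    }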

You might consider using a dedicated database (or even a dedicated database server) for write-intensive jobs, with specific settings. For example, you may not need InnoDB storage and can keep simple MyISAM tables. You could even consider another database storage engine (as said by @Riccardo Galli).

#6


2  

If you absolutely HAVE to log directly to MySQL, consider using two databases: one optimized for quick inserts, which means no keys other than possibly an auto_increment primary key, and another with keys on everything you'd be querying for, optimized for fast searches. A timed job would copy hits from the insert-only database to the read-only one on a regular basis, and you end up with the best of both worlds. The only drawback is that your available statistics will only be as fresh as the previous "copy" run.
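A minimal sketch of the timed copy job, with hypothetical log_db.hits_raw (insert-optimized, no secondary keys), stats_db.hits (fully indexed), and a one-row stats_db.copy_state watermark table recording how far the previous run got:

    <?php
    // copy_hits.php - run from cron; copies new rows from the write-optimized
    // database into the indexed reporting database.
    $db = new mysqli('localhost', 'user', 'pass');

    // Where did the previous copy stop?
    $row    = $db->query('SELECT last_id FROM stats_db.copy_state')->fetch_row();
    $lastId = (int) $row[0];

    // Server-side copy: no row data travels through PHP.
    $db->query("INSERT INTO stats_db.hits (id, referrer, ip, hit_time)
                SELECT id, referrer, ip, hit_time
                FROM log_db.hits_raw
                WHERE id > $lastId");

    $db->query('UPDATE stats_db.copy_state
                SET last_id = (SELECT COALESCE(MAX(id), 0) FROM stats_db.hits)');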

#7


2  

I have also previously seen a system which records the data into a flat file on the local disc of each web server (be careful to do only atomic appends if using multiple processes), and periodically and asynchronously writes them into the database using a daemon process or cron job.
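The atomic-append part is straightforward in PHP as long as each hit is written in a single call - a sketch, using a hypothetical CSV path:

    <?php
    // One write call with an exclusive lock: safe when several PHP processes
    // append to the same file concurrently. Commas in the referrer are crudely
    // stripped to keep the CSV parseable.
    $line = sprintf("%s,%s,%s\n",
        str_replace(',', ' ', isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : ''),
        $_SERVER['REMOTE_ADDR'],
        date('Y-m-d H:i:s'));

    file_put_contents('/var/log/cms/hits.csv', $line, FILE_APPEND | LOCK_EX);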

This appears to be the prevailing optimum solution; your web app remains available if the audit database is down, and users don't suffer poor performance if the database is slow for any reason.

The only thing I can say is: be sure that you have monitoring on these locally-generated files - a build-up definitely indicates a problem, and your Ops engineers might not otherwise notice.

#8


0  

For a high number of write operations and this kind of data, you might find MongoDB or CouchDB more suitable.

#9


0  

Because INSERT DELAYED is only supported by MyISAM, it is not an option for many users.

We use MySQL Proxy to defer the execution of queries matching a certain signature.

This will require a custom Lua script; example scripts are here, and some tutorials are here.

The script will implement a Queue data structure for storage of query strings, and pattern matching to determine what queries to defer. Once the queue reaches a certain size, or a certain amount of time has elapsed, or whatever event X occurs, the query queue is emptied as each query is sent to the server.

#10


0  

You can use a queue strategy using beanstalk or IronQ.
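A rough sketch of the beanstalk variant, assuming the third-party Pheanstalk client library (exact class and method names depend on the library version you install) and the same hypothetical hits table:

    <?php
    require 'vendor/autoload.php';

    // Producer (inside the page request): push the hit onto a queue and return.
    $queue = Pheanstalk\Pheanstalk::create('127.0.0.1');
    $queue->useTube('hits')->put(json_encode(array(
        'referrer' => isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '',
        'ip'       => $_SERVER['REMOTE_ADDR'],
        'time'     => date('Y-m-d H:i:s'),
    )));

    // Consumer (long-running worker or cron job): drain the queue into MySQL.
    $db     = new mysqli('localhost', 'user', 'pass', 'cms');
    $worker = Pheanstalk\Pheanstalk::create('127.0.0.1');
    $worker->watch('hits');
    while ($job = $worker->reserveWithTimeout(5)) {
        $hit  = json_decode($job->getData(), true);
        $stmt = $db->prepare('INSERT INTO hits (referrer, ip, hit_time) VALUES (?, ?, ?)');
        $stmt->bind_param('sss', $hit['referrer'], $hit['ip'], $hit['time']);
        $stmt->execute();
        $worker->delete($job);
    }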
