
Time: 2022-09-19 15:48:39

What do I have to do to make 20k MySQL inserts per second possible (during peak hours; around 1k/sec during slower times)? I've been doing some research and I've seen the "INSERT DELAYED" suggestion, writing to a flat file with "fopen(file,'a')" and then running a cron job to dump the "needed" data into MySQL, etc. I've also heard you need multiple servers and "load balancers", which I've never heard of, to make something like this work. I've also been looking at these "cloud server" thing-a-ma-jigs and their automatic scalability, but I'm not sure what's actually scalable.

The application is just a tracker script, so if I have 100 websites that each get 3 million page loads a day, there will be around 300 million inserts a day. The data will be run through a script every 15-30 minutes, which will normalize the data and insert it into another MySQL table.


How do the big dogs do it? How do the little dogs do it? I can't afford a huge server anymore, so if there are multiple ways of going at it, any intuitive ways you smart people can think of... please let me know :)


8 answers



How do the big dogs do it?


Multiple servers. Load balancing.


How do the little dogs do it?


Multiple servers. Load balancing.


You really want to save up inserts and push them to the database in bulk. 20k individual inserts a second is a crapton of overhead, and simplifying that down to one big insert each second eliminates most of that.
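To make the "one big insert" idea concrete, here is a rough sketch in Python (the same shape works in PHP). It only builds the statement and the flat parameter list; the table name `hits` and the columns `site_id` and `url` are made up for illustration, and the `%s` placeholders assume a driver that uses that style, as common Python MySQL drivers do.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked(rows: Sequence[tuple], size: int) -> Iterator[Sequence[tuple]]:
    """Yield consecutive slices of `rows`, each with at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_bulk_insert(table: str, columns: List[str],
                      rows: Sequence[tuple]) -> Tuple[str, list]:
    """Return (sql, params) for a single multi-row INSERT statement."""
    # One "(%s, %s, ...)" group per row, so the whole batch is one statement.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    # Flatten the rows into one parameter list, in the same order.
    params = [value for row in rows for value in row]
    return sql, params
```

You would then call something like `cursor.execute(sql, params)` once per chunk instead of once per row. Keep chunks to a few thousand rows so a single statement stays under the server's packet size limit.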



A couple of ways:


Firstly, you will reach a point where you need to partition or shard your data to split it across multiple servers. This could be as simple as A-C on server1, D-F on server2 and so on.
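A minimal sketch of that "A-C on server1, D-F on server2" scheme, in Python for illustration. The server names, the group size of three letters, and the fallback shard for keys that don't start with a letter are all assumptions on my part.

```python
import string


def build_letter_shards(group_size: int = 3) -> dict:
    """Map each lowercase letter to a server name: a-c -> server1, d-f -> server2, ..."""
    mapping = {}
    for index, letter in enumerate(string.ascii_lowercase):
        mapping[letter] = "server{}".format(index // group_size + 1)
    return mapping


SHARDS = build_letter_shards()


def shard_for(key: str, fallback: str = "server1") -> str:
    """Pick a shard from the first character of the key (case-insensitive)."""
    first = key[:1].lower()
    # Keys that are empty or start with a non-letter go to the fallback shard.
    return SHARDS.get(first, fallback)
```

Splitting by first letter is easy to reason about but tends to be uneven (far more keys start with "s" than with "x"), so hashing the key is a common alternative once the load gets lopsided.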


Secondly, defer writing to the database. Instead write to a fast memory store using either beanstalkd or memcached directly. Have another process collect those states and write aggregated data to the database. Periodically amalgamate those records into summary data.
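Roughly what the deferred-write pattern looks like, sketched in Python. To keep it self-contained, an in-process `queue.Queue` stands in for beanstalkd or memcached here, which is a simplification; `record_hit` and `drain_and_aggregate` are hypothetical names.

```python
import queue
from collections import Counter

# Stand-in for the fast memory store (beanstalkd/memcached in a real setup).
hit_queue = queue.Queue()


def record_hit(site_id: int, url: str) -> None:
    """Called on every tracked request: enqueue and return immediately."""
    hit_queue.put((site_id, url))


def drain_and_aggregate(max_items: int = 10000) -> list:
    """Called by the collector process: pull queued hits and count duplicates.

    Returns a sorted list of (site_id, url, hit_count) rows ready to be
    written to the database in one batch.
    """
    counts = Counter()
    for _ in range(max_items):
        try:
            counts[hit_queue.get_nowait()] += 1
        except queue.Empty:
            break
    return [(site_id, url, n) for (site_id, url), n in sorted(counts.items())]
```

The request path never touches the database; only the collector does, and it writes one row per (site, url) pair instead of one per hit.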




That's impressive. Most of my data has come from massive inserts done all at once. One thing I've found is that bulk inserts do a lot better than individual inserts. Also, the design of your tables, indexes, etc. has a lot to do with insert speed. The problem with using cron and bulk inserting is the edge cases (when it goes to do the inserts).

Additionally, with flat files you can easily run into concurrency issues when writing the inserts to the file. If you are writing 1k+ inserts a second, you'll quickly run into lots of conflicts and data loss when there are issues with the file writing.
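If you do go the flat-file route, one way to keep concurrent writers from interleaving records is an advisory lock around each append. A sketch in Python, assuming a Unix-like system (`fcntl` is not available on Windows); `append_record` is a hypothetical helper.

```python
import fcntl
import os


def append_record(path: str, line: str) -> None:
    """Append one newline-terminated record while holding an exclusive lock."""
    data = (line.rstrip("\n") + "\n").encode("utf-8")
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # Blocks until no other writer holds the lock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        offset = 0
        while offset < len(data):
            # os.write may write fewer bytes than requested, so loop.
            offset += os.write(fd, data[offset:])
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

This trades throughput for safety: every writer waits its turn, so at 1k+ writes a second the lock itself can become the bottleneck.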



This is not a problem you can handle in PHP alone.


If you have 20 000 requests a second hitting your "low-budget" (as I understood by the undertone of your question) server, then it will reach its limit before most of them reach the PHP processor (and, eventually, MySQL).


If you have a traffic tracker script, you'll very likely cause problems for all the sites you track too.




PHP is not well-suited to high volume web traffic IMHO. However, the database will likely bog you down before PHP performance does - especially with PHP's connection model (it opens a new connection for every request).

I have two suggestions for you:


  1. Give SQL Relay a look: http://sqlrelay.sourceforge.net/
  2. Check out some PHP accelerators: http://en.wikipedia.org/wiki/List_of_PHP_accelerators

SQL Relay effectively allows PHP to take advantage of connection pooling, and that will give much better performance for a high volume database application.
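For anyone unfamiliar with connection pooling, the idea is to open a fixed set of connections once and hand them out per request instead of reconnecting each time. The Python sketch below shows only that idea as an in-process pool; it is not how SQL Relay itself is implemented (SQL Relay is a separate proxy), and `ConnectionPool` and its `connect` argument are hypothetical.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keep `size` connections open and lend them out one at a time."""

    def __init__(self, connect, size: int):
        # `connect` is any zero-argument callable that returns a new connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to `timeout` seconds) if every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)
```

Usage would look like `with pool.connection() as conn: ...`, with the expensive connect step paid only `size` times for the life of the process.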

PHP accelerators (generally speaking) cache the PHP opcodes, which saves the overhead of compiling the PHP code on every request.


Good Luck!



I'd recommend memcaching, too.


Write your data into a memcache and have a periodically running job aggregate it and do the inserts.


Writing to an actual file would probably DECREASE your performance since file system access is mostly slower than talking to a database that can handle writing access much more efficiently.




Writing to a file is great, but you still need to synchronize your file writes which puts you back to square one.



  • MQ system, though sometimes the DB can be faster,
  • On the MQ idea: in-memory queue. I know you said PHP, but I've seen this done quite well in Java/Servlets,
  • Depending on what it is you're tracking, you can deploy a static file into a CDN (the cloud thing you talked about) and aggregate the access logs in batch. Allows you to rent scaling out,
  • INSERT DELAYED is a good idea, but I don't know what the backlog/queue size is for that in MySQL (anyone?)
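On the CDN access-log idea from the list above, the batch step could look something like this Python sketch. It assumes the logs are in a Common Log Format style with the request line in quotes, and that the tracked site is passed as a `site` query parameter on the static file's URL; both are guesses, so adjust to whatever your CDN actually writes.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the quoted request line, e.g. "GET /t.gif?site=42 HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def count_hits_by_site(log_lines, param: str = "site") -> Counter:
    """Count requests per value of the `param` query parameter."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that don't look like a request
        query = parse_qs(urlsplit(match.group(1)).query)
        for site in query.get(param, []):
            counts[site] += 1
    return counts
```

The resulting counts are already aggregated, so the database only sees one row per site per log batch.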



Since you're tracking impressions, what if you try saving only, say, one in every 5? Then you still have a completely "random" sample, and you can just apply the percentages to the bigger dataset.
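A quick sketch of the sampling idea in Python: keep each hit with probability 1 in 5, then multiply the stored count by 5 to estimate the real total. `SAMPLE_EVERY`, `should_record` and `estimate_total` are hypothetical names.

```python
import random

SAMPLE_EVERY = 5  # keep roughly one hit in every five


def should_record(rng=random) -> bool:
    """Decide at request time whether this hit gets stored."""
    return rng.randrange(SAMPLE_EVERY) == 0


def estimate_total(sampled_count: int) -> int:
    """Scale the stored count back up to an estimate of the true total."""
    return sampled_count * SAMPLE_EVERY
```

At these volumes the estimate is tight (the relative error shrinks as the counts grow), but low-traffic pages with only a handful of hits will be noisy.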




