Best practices: PHP, tracking millions of impressions per day

Time: 2022-09-19 15:48:39

What do I have to do to make 20k MySQL inserts per second possible during peak hours (around 1k/sec during slower times)? I've been doing some research and I've seen the "INSERT DELAYED" suggestion, writing to a flat file with fopen(file, 'a') and then running a cron job to dump the "needed" data into MySQL, etc. I've also heard you need multiple servers and "load balancers", which I've never heard of, to make something like this work. I've also been looking at these "cloud server" thing-a-ma-jigs and their automatic scalability, but I'm not sure what's actually scalable.

The application is just a tracker script, so if I have 100 websites that get 3 million page loads a day, there will be around 300 million inserts a day. The data will be run through a script every 15-30 minutes, which will normalize it and insert it into another MySQL table.

How do the big dogs do it? How do the little dogs do it? I can't afford a huge server anymore, so if there are multiple ways of going at it, please let me know any intuitive ways you smart people can think of :)

8 Answers

#1


5  

How do the big dogs do it?

Multiple servers. Load balancing.

How do the little dogs do it?

Multiple servers. Load balancing.

You really want to save up inserts and push them to the database in bulk. 20k individual inserts a second is a crapton of overhead, and simplifying that down to one big insert each second eliminates most of that.

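A minimal sketch of the "save up and push in bulk" idea, assuming a hypothetical impressions table and PDO; the rows would come from whatever buffer (file, memcache, queue) you accumulate them in:

```php
<?php
// Flush a buffer of impression rows as a single multi-row INSERT.
// $rows is an array of [site_id, url, created_at] tuples collected elsewhere.
function flush_impressions(PDO $db, array $rows): void
{
    if (empty($rows)) {
        return;
    }

    // One statement: INSERT INTO ... VALUES (?,?,?),(?,?,?),...
    $placeholders = implode(',', array_fill(0, count($rows), '(?,?,?)'));
    $sql = "INSERT INTO impressions (site_id, url, created_at) VALUES $placeholders";

    $params = [];
    foreach ($rows as $row) {
        array_push($params, $row[0], $row[1], $row[2]);
    }

    $db->prepare($sql)->execute($params);
}
```

One 1,000-row statement per second is vastly cheaper than 1,000 single-row statements; just keep each batch under MySQL's max_allowed_packet.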

#2


5  

A couple of ways:

Firstly, you will reach a point where you need to partition or shard your data to split it across multiple servers. This could be as simple as A-C on server1, D-F on server2 and so on.

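For the range-based split, a toy shard picker might look like this (hostnames are placeholders; hashing the key is a common alternative that spreads load more evenly):

```php
<?php
// Route a site to a database host by the first letter of its name,
// in the spirit of "A-C on server1, D-F on server2".
function pick_shard(string $siteName, array $ranges): string
{
    $first = strtoupper($siteName[0]);
    foreach ($ranges as $range => $host) {      // e.g. 'A-C' => 'db1.example.com'
        [$lo, $hi] = explode('-', $range);
        if ($first >= $lo && $first <= $hi) {
            return $host;
        }
    }
    return end($ranges);                        // fallback: last shard
}

$ranges = ['A-C' => 'db1.example.com', 'D-F' => 'db2.example.com', 'G-Z' => 'db3.example.com'];
$host   = pick_shard('example.org', $ranges);   // 'E' falls in D-F -> db2.example.com
```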

Secondly, defer writing to the database. Instead write to a fast memory store using either beanstalkd or memcached directly. Have another process collect those states and write aggregated data to the database. Periodically amalgamate those records into summary data.

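A rough sketch of the deferred-write pattern with beanstalkd, assuming the Pheanstalk client (v3-style API shown; method names differ between versions) and a hypothetical impressions tube. The tracker only enqueues; a separate worker drains the queue, aggregates, and does the actual INSERTs:

```php
<?php
use Pheanstalk\Pheanstalk;

// Tracker side: enqueue the hit and return immediately.
$queue = new Pheanstalk('127.0.0.1');
$queue->useTube('impressions')->put(json_encode([
    'site' => $_GET['site'] ?? '',
    'url'  => $_GET['url']  ?? '',
    'ts'   => time(),
]));

// Worker side (separate long-running process): drain and aggregate.
$worker = new Pheanstalk('127.0.0.1');
$worker->watch('impressions')->ignore('default');
$counts = [];
while (($job = $worker->reserve(1)) !== false) {   // 1s timeout once the tube is empty
    $hit = json_decode($job->getData(), true);
    $key = $hit['site'] . '|' . date('Y-m-d H:i', $hit['ts']);
    $counts[$key] = ($counts[$key] ?? 0) + 1;
    $worker->delete($job);
}
// $counts now holds per-site, per-minute totals ready for one bulk INSERT.
```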

#3


2  

That's impressive. Most of my data has come from massive bulk inserts. One thing I find is that bulk inserts do a lot better than individual inserts. Also, the design of your tables, indexes, etc. has a lot to do with insert speed. The problem with using cron and bulk inserting is the edge cases (what happens at the moment the job actually runs the inserts).

Additionally, with flat files you can easily run into concurrency issues when writing the inserts to the file. If you're writing 1k+ inserts a second, you'll quickly hit lots of conflicts and data loss whenever the file writing has problems.

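If you do go the flat-file route, an advisory lock around each append at least prevents interleaved lines; a minimal sketch (rotation, fsync and error handling omitted):

```php
<?php
// Append one impression as a line of JSON; LOCK_EX serialises concurrent writers.
function log_impression(string $file, array $hit): bool
{
    $line = json_encode($hit) . "\n";
    return file_put_contents($file, $line, FILE_APPEND | LOCK_EX) !== false;
}

log_impression('/var/spool/tracker/impressions.log', [
    'site' => 42,
    'url'  => '/pricing',
    'ts'   => time(),
]);
```

Under heavy concurrency the lock itself becomes a bottleneck, which is exactly the conflict/loss problem described above.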

#4


1  

This is not a problem you can handle in PHP alone.

If you have 20,000 requests a second hitting your "low-budget" server (as I understood from the undertone of your question), it will reach its limit before most of them even reach the PHP processor (and, eventually, MySQL).

If you have a traffic tracker script, you'll very likely cause problems for all the sites you track too.

#5


1  

PHP is not well-suited to high-volume web traffic, IMHO. However, the database will likely bog you down before PHP's performance does - especially with PHP's connection model (it opens a new connection for every request).

I have two suggestions for you:

  1. Give SQL Relay a look: http://sqlrelay.sourceforge.net/

  2. Check out some PHP accelerators: http://en.wikipedia.org/wiki/List_of_PHP_accelerators

SQL Relay effectively allows PHP to take advantage of connection pooling, and that will give much better performance for a high-volume database application.

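SQL Relay ships its own PHP API, so the following is not SQL Relay itself; it is only a sketch of the built-in way to get a similar effect (reusing connections instead of opening a new one per request) via PDO persistent connections. DSN and credentials are placeholders:

```php
<?php
// Reuse an existing connection from this worker process if one is available,
// instead of paying the full connect cost on every request.
$db = new PDO(
    'mysql:host=127.0.0.1;dbname=tracker',  // placeholder DSN
    'tracker_user',                         // placeholder credentials
    'secret',
    [PDO::ATTR_PERSISTENT => true]
);
```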

PHP accelerators (generally speaking) cache the PHP opcodes, which saves the overhead of interpreting the PHP code on every request.

Good Luck!

#6


1  

I'd recommend memcaching, too.

Write your data into a memcache and have a periodically running job aggregate it and do the inserts.

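A minimal sketch of that pattern with the Memcached extension, assuming a hypothetical per-site, per-minute key scheme; the periodic job then reads the finished minutes, bulk-inserts the totals, and deletes the keys:

```php
<?php
// Tracker side: bump an in-memory counter instead of hitting MySQL.
$siteId = 42;                                   // e.g. from the tracking pixel's query string
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key = sprintf('hits:%d:%s', $siteId, date('YmdHi'));
if ($mc->increment($key) === false) {
    // Key did not exist yet; create it (a race here loses at most one hit).
    $mc->add($key, 1);
}
```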

Writing to an actual file would probably DECREASE your performance, since file system access is usually slower than talking to a database, which can handle write access much more efficiently.

#7


0  

Writing to a file is great, but you still need to synchronize your file writes which puts you back to square one.

Suggestions:

  • MQ system, though sometimes the DB can be faster,

  • On the MQ idea: in-memory queue. I know you said PHP, but I've seen this done quite well in Java/Servlets,

  • Depending on what it is you're tracking, you can deploy a static file into a CDN (the cloud thing you talked about) and aggregate the access logs in batch. Allows you to rent scaling out,

  • INSERT DELAYED is a good idea, but I don't know what the backlog/queue size for that is in MySQL (anyone?). See the sketch after this list.
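
For reference, INSERT DELAYED is plain SQL syntax; a quick sketch (note it only works on MyISAM-style engines such as MyISAM, MEMORY and ARCHIVE, and recent MySQL versions deprecate it and silently fall back to a normal INSERT):

```php
<?php
// Hand the row to the server's delayed-insert queue and return without waiting.
$db  = new PDO('mysql:host=127.0.0.1;dbname=tracker', 'tracker_user', 'secret'); // placeholders
$sql = sprintf(
    "INSERT DELAYED INTO impressions (site_id, url, created_at) VALUES (%d, %s, NOW())",
    42,
    $db->quote('/pricing')
);
$db->exec($sql);
```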

#8


0  

Since you're tracking impressions, what if you try only saving, say, one in every 5? Then you still have a completely "random" sample, and you can just apply the percentages to the bigger dataset.

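A sketch of the sampling idea; record_impression is a hypothetical stand-in for whatever actually does the INSERT, and $siteId/$url are assumed to come from the tracking request:

```php
<?php
// Record roughly 1 in every 5 hits, then multiply the stored counts by 5 when reporting.
$sampleRate = 5;
if (mt_rand(1, $sampleRate) === 1) {
    record_impression($siteId, $url);   // hypothetical helper that writes the row
}
```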
