How do you handle bandwidth billing on a shared server in Apache?

Date: 2023-01-27 12:27:00

What solutions do you have in place for handling bandwidth billing for your vhosts in a shared Apache environment? If you are using log parsing, does your solution scale well when the logs become very large? Is anyone using any sort of module out there for this?

6 solutions

#1


1  

There exist certain modules for Apache 1.x and 2.x that allow you to set a maximum on the transfer amount; most of them keep track using the scoreboard file that Apache generates (when mod_status is enabled with ExtendedStatus On). The one I still have bookmarked from when I was looking is mod_curb; however, it is not complete, and at the moment it appears to work only server-wide, not for individual virtual hosts.

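As a minimal sketch, this is the scoreboard prerequisite such modules rely on, assuming Apache 2.4 and its stock module path; adjust the path and access control to your layout:

    # Enable the extended scoreboard that accounting modules read
    LoadModule status_module modules/mod_status.so
    ExtendedStatus On

    # Optional: expose the scoreboard for inspection, restricted to localhost
    <Location "/server-status">
        SetHandler server-status
        Require local
    </Location>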

Apache modules can be set up as outbound filters, so you could write a custom module that sits at the end of the chain and adds up all the outgoing packets; using the data that APR provides, you can then add the total to a counter for that specific domain/sub-domain. After that, you have a choice of what to do with the data.

For a specific example, take a look at mod_deflate, which ships with Apache, to see how it sits at the end of the chain and compresses everything but the headers the server sends out. This should give you a good start.

As for log-based processing, it becomes slower the more logs exist; that is just the nature of the beast. When we were using a log-based solution, we had a custom Perl script that ran every 15 minutes. Eventually it took longer than 15 minutes to parse, and since we had proper locking, after a while multiple instances of the log-processing script were running, all waiting on each other. We ended up rewriting it around a simple call to tail -F, which let Perl parse each request as it came in; while not entirely efficient, it worked. The upside was that we could now update traffic statistics in near real time, so clients were notified sooner rather than later when they went over their limits.

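Below is a minimal sketch of that tail -F approach, in Python rather than Perl. It assumes a log format whose first field is the vhost and whose last field is the byte count (something like "%v ... %O"); the log path, flush interval, and the final print are placeholders for whatever your billing backend needs.

    import subprocess
    import time
    from collections import defaultdict

    LOG_PATH = "/var/log/apache2/access.log"   # assumed location
    FLUSH_EVERY = 60                           # seconds between counter flushes

    def follow(path):
        """Yield lines as Apache appends them; tail -F survives log rotation."""
        proc = subprocess.Popen(
            ["tail", "-F", "-n", "0", path],
            stdout=subprocess.PIPE, text=True,
        )
        yield from proc.stdout

    def main():
        totals = defaultdict(int)
        last_flush = time.time()
        for line in follow(LOG_PATH):
            # Assumes vhost is the first field and the byte count is the last
            fields = line.split()
            if len(fields) < 2:
                continue
            try:
                totals[fields[0]] += int(fields[-1])
            except ValueError:
                pass  # "-" (no body) or a malformed line; skip it
            # The flush piggybacks on incoming lines; a busy server hits it often
            if time.time() - last_flush >= FLUSH_EVERY:
                for vhost, nbytes in sorted(totals.items()):
                    print(vhost, nbytes)  # stand-in for the billing-system write
                totals.clear()
                last_flush = time.time()

    if __name__ == "__main__":
        main()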

#2


1  

You could go the poor man's route and use Webalizer or Awstats. Both of these will give you an idea of traffic based on the access logs, and both can be run on a per-virtual-host basis. In the case of Awstats, I know that once you start doing 10 GB+ of traffic daily, it starts to consume resources. You can always nice it, but then you'll get your data next week rather than when you actually need it. In the past with Webalizer I've had to use some hackery to get it to handle large access logs, by chunking the logs into smaller pieces that it could manage (see the sketch below). It didn't provide as many useful metrics from what I've done with it, but I've also never needed to save a server from it :)

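One way that chunking hack can look, as a rough sketch: the paths here are placeholders, and it assumes Webalizer is configured with Incremental yes so the chunks accumulate into one report.

    # Split a huge access log into million-line chunks, then feed them in order
    split -l 1000000 /var/log/apache2/access.log /tmp/access.chunk.
    for chunk in /tmp/access.chunk.*; do
        webalizer -c /etc/webalizer.conf "$chunk"
    done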

#3


1  

If a virtual host does not have its own IP, there is no easier way than logfile parsing. Just use mod_logio to calculate the actual bytes transferred; mod_logio handles broken connections, compressed data, etc. correctly. You should be able to parse the logs in real time using piped logs. Use BufferedLogs to scale further (just check that the parser handles partial lines correctly when buffering is on). The parser should save its data periodically (say, every minute) somewhere; just avoid locking issues, as parsing must not slow down httpd. If httpd connections are spending time in the L state in server-status, you are too slow. Once you have the numbers, you can aggregate them further and then save the data to the billing system.

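A minimal configuration sketch of that setup follows. The %I and %O format strings come from mod_logio (bytes in and out on the wire, including headers); the parser path is a hypothetical long-running process reading the piped log on stdin.

    LoadModule logio_module modules/mod_logio.so

    # %v = vhost, %I = bytes received incl. headers, %O = bytes sent
    LogFormat "%v %I %O" traffic

    # Batch log writes for throughput; the parser must expect bursts of lines
    BufferedLogs On

    # Feed every request to the long-running parser via a piped log
    CustomLog "|/usr/local/bin/traffic-parser" traffic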

If you also save the billing logs to a file, you can correct and double-check the real-time traffic calculations. If you restart httpd, you can end up missing some lines, but generally losing a couple hundred requests is acceptable, as that is less than a second's worth of traffic on a high-volume site.

There are modules that try to meter and limit bandwidth, like mod_cband and mod_bw, but they don't work when you have the same vhost on multiple machines. I guess they would work OK on a smaller scale.

If you have an IP per vhost, you could try IP-based methods, such as feeding firewall counters into a traffic calculator. A simple way is to use iptables, as sketched below.

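A sketch of the classic iptables accounting trick: a rule with no target does nothing except count the packets and bytes that match it. The address 203.0.113.10 is a placeholder for one vhost's dedicated IP.

    # One pair of counter-only rules per dedicated vhost IP
    iptables -I INPUT  -d 203.0.113.10
    iptables -I OUTPUT -s 203.0.113.10

    # Read the exact byte counters for the billing run, then zero them
    iptables -L -v -n -x
    iptables -Z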

#4


0  

Although we use IIS rather than Apache, we do use log-file analysis for bandwidth billing (and bandwidth profiling/analysis). We use a custom application to load the data collected in the log files in one-hour increments, and to act on any required notifications or bandwidth overuse.

The log-file loader runs as a low-priority process so as not to interrupt operation of the server. Even on high-usage servers with a large number of sites, processing takes less than 15 minutes, so we don't see scalability as a problem with this methodology.

There may be better ways of doing this, but this is perfectly adequate for what we need. I look forward to viewing the other responses.

#5


0  

It can be easily achieved with mod_cband. We've rewritten the module to fix a few bugs, provide true redundancy on restarts and incorporate FTP and Mail statistics.

http://www.howtoforge.com/mod_cband_apache2_bandwidth_quota_throttling

#6


0  

Well, mod_cband would be great, except that when I'm using it, max_connections (the overall total for every client combined) crawls upward until it hits the maximum value I've set. When it reaches that value, it just stays there and leaves all my clients receiving a constant "503 Service Temporarily Unavailable" error.

For example, I set "CbandSpeed 1000Mbps 500 1200", and the server connection count crawls up to 1200 in about 8 hours, then stays there. At that point I count the total number of connections under Remote Clients in the mod_cband status window and see around 50. I've also run ps aux and see about the same number (~50) of open httpd processes, which is normal, except for the fact that nobody can access the site at all because of the 503 errors.

Any ideas what could be wrong, or can this be fixed?
