Need an alternative to filters/observers in a Ruby on Rails project

Posted: 2023-02-06 08:16:04

Rails has a nice set of filters (before_validation, before_create, after_save, etc) as well as support for observers, but I'm faced with a situation in which relying on a filter or observer is far too computationally expensive. I need an alternative.

The problem: I'm logging web server hits to a large number of pages. What I need is a trigger that will perform an action (say, send an email) when a given page has been viewed more than X times. Due to the huge number of pages and hits, using a filter or observer will result in a lot of wasted time because, 99% of the time, the condition it tests will be false. The email does not have to be sent out right away (i.e. a 5-10 minute delay is acceptable).

What I am instead considering is implementing some kind of process that sweeps the database every 5 minutes or so and checks to see which pages have been hit more than X times, recording that state in a new DB table, then sending out a corresponding email. It's not exactly elegant, but it will work.
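Concretely, the sweep I have in mind would look something like this (all names here are illustrative; `notified` stands in for the new DB table recording pages that have already triggered an email):

```ruby
# Hypothetical sweep logic, run every ~5 minutes. hit_counts maps a
# page path to its total hits; notified records pages already handled.
THRESHOLD = 100

def pages_to_notify(hit_counts, notified, threshold = THRESHOLD)
  hit_counts.select { |page, hits| hits > threshold && !notified.include?(page) }.keys
end

notified = ["/already/emailed"]
counts   = { "/hot/page" => 150, "/quiet/page" => 3, "/already/emailed" => 999 }

pages_to_notify(counts, notified).each do |page|
  # Mailer.deliver_threshold_alert(page)  # hypothetical; fires at most once per page
  notified << page
end
```

The `notified` set is what keeps the email from going out again on every subsequent sweep once a page crosses the threshold.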

Does anyone else have a better idea?

4 Answers

#1


Rake tasks are nice! But you will end up writing more custom code for each background job you add. Check out the Delayed Job plugin http://blog.leetsoft.com/2008/2/17/delayed-job-dj

DJ is an asynchronous priority queue that relies on one simple database table. According to the DJ website, you can create a job using the Delayed::Job.enqueue() method shown below.

class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end    
end  

Delayed::Job.enqueue( NewsletterJob.new("blah blah", Customers.find(:all).collect(&:email)) )
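Applied to this question's use case, a job along the same lines could move the email send off the request path entirely (ThresholdAlertJob and PageMailer are hypothetical names for illustration, not part of DJ's API):

```ruby
# Hypothetical job modeled on the NewsletterJob example above;
# PageMailer is an assumed mailer class, not something DJ provides.
class ThresholdAlertJob < Struct.new(:page_path, :hit_count)
  def perform
    PageMailer.deliver_threshold_alert(page_path, hit_count)
  end
end

# Enqueued from wherever the threshold check runs, e.g.:
# Delayed::Job.enqueue(ThresholdAlertJob.new("/popular/page", 1024))
```

Since DJ workers poll the jobs table on their own schedule, the 5-10 minute delay you said was acceptable comes for free.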

#2


I was once part of a team that wrote a custom ad server, which has the same requirements: monitor the number of hits per document, and do something once they reach a certain threshold. This server was going to be powering an existing very large site with a lot of traffic, and scalability was a real concern. My company hired two Doubleclick consultants to pick their brains.

Their opinion was: The fastest way to persist any information is to write it in a custom Apache log directive. So we built a site where every time someone would hit a document (ad, page, all the same), the server that handled the request would write a SQL statement to the log: "INSERT INTO impressions (timestamp, page, ip, etc) VALUES (x, 'path/to/doc', y, etc);" -- all output dynamically with data from the webserver. Every 5 minutes, we would gather these files from the web servers, and then dump them all in the master database one at a time. Then, at our leisure, we could parse that data to do anything we well pleased with it.

Depending on your exact requirements and deployment setup, you could do something similar. The computational requirement to check if you're past a certain threshold is still probably even smaller (guessing here) than executing the SQL to increment a value or insert a row. You could get rid of both bits of overhead by logging hits (special format or not), and then periodically gather them, parse them, input them to the database, and do whatever you want with them.
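A minimal sketch of that gather-and-parse step (the tab-separated log format used here is an assumption for illustration, simpler than the SQL-statement format described above):

```ruby
# Aggregate hits per page from collected log lines; each line is
# assumed to be "timestamp<TAB>path<TAB>ip", one line per hit.
def hit_counts(log_lines)
  log_lines.each_with_object(Hash.new(0)) do |line, counts|
    _timestamp, path, _ip = line.chomp.split("\t")
    counts[path] += 1
  end
end

lines = [
  "2023-02-06T08:00:01Z\t/pages/1\t10.0.0.1",
  "2023-02-06T08:00:02Z\t/pages/1\t10.0.0.2",
  "2023-02-06T08:00:03Z\t/pages/2\t10.0.0.1",
]
hit_counts(lines)  # => {"/pages/1"=>2, "/pages/2"=>1}
```

The request path does nothing but append a line to a log, and all the counting and thresholding happens in a batch every few minutes, which is exactly the trade-off the consultants recommended.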

#3


When saving your Hit model, update a redundant column in your Page model that stores a running total of hits. This costs you two extra queries, so each hit may take twice as long to process, but you can then decide whether to send the email with a simple if.
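In plain Ruby the idea is roughly this (the threshold, the alerted flag, and the mailer call are illustrative; in Rails this would be a counter column on Page updated from a Hit save callback):

```ruby
class Page
  HIT_THRESHOLD = 100  # hypothetical threshold

  attr_reader :hit_count

  def initialize
    @hit_count = 0
    @alerted   = false
  end

  def alerted?
    @alerted
  end

  # Called when a Hit is saved: bump the running total, then the "simple if".
  def record_hit!
    @hit_count += 1
    if @hit_count > HIT_THRESHOLD && !@alerted
      @alerted = true
      # Mailer.deliver_threshold_alert(self)  # hypothetical mailer call
    end
  end
end
```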

Your original solution isn't bad either.

#4


I have to write something here so that SO code-highlights the first line.

class ApplicationController < ActionController::Base
  before_filter :increment_fancy_counter

  private

  def increment_fancy_counter
    # somehow increment the counter here
  end
end

# lib/tasks/fancy_counter.rake
namespace :fancy_counter do
  task :process do
    # somehow process the counter here
  end
end

Have a cron job run rake fancy_counter:process as often as you want it to run.
