How do I run inefficient code only at compile time when using mod_perl?

Date: 2022-11-25 00:23:52

I've been benchmarking the performance of a framework I'm writing in Perl, and I'm getting a 50% decrease in requests per second compared with our existing codebase (some hit is understandable, because we're going from procedural spaghetti code to an OOP MVC framework).

The application is running under mod_perl, and I've added Moose and all my framework code to the startup.pl script, which by itself doubled my requests per second. I'm looking to improve this number further, to get it as close as possible to the existing figure. One could argue that this is premature optimisation, but there are a couple of glaring inefficiencies that I'd like to fix to see how they affect performance.

Like most frameworks, I have a configuration file and a dispatcher. The config part is handled by Config::General, so a bit of IO and parsing is involved to get my config file loaded into the app. The biggest problem I see here is that I'm doing this for EVERY REQUEST that comes in!

Running Devel::DProf on my app points to Config::General::BEGIN and a number of related IO modules as one of the major slow spots other than Moose. So what I'd like to do, and what makes a lot more sense in hindsight, is take advantage of mod_perl's persistence and the startup.pl preloading to load the config file only once: when the server starts.

The problem is that I'm not too familiar with how this would work.

Currently each project has a PerlHandler bootstrapping class which is pretty lean and looks like this:

use MyApp; 
MyApp->new(config_file => '/path/to/site.config')->run();

MyApp.pm inherits from the framework Project module, which has this code:

my $config = Config::General->new(
                -ConfigFile => $self->config_file,
                -InterPolateVars => 1,
             );    

$self->config({$config->getall});

To only do this at compile time, both my bootstrap and Project base modules will have to change (I think), but I'm pretty unsure as to what changes to make and still keep the code nice and lean. Can anyone point me in the right direction here?

UPDATE

I tried the approach from ysth's answer of putting a BEGIN block in each project module. So I now have:

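package MyApp::bootstrap;
use strict;
use warnings;
use MyApp;
use Config::General;

my $config;
BEGIN
{
    # parsed once, when this module is compiled (e.g. from startup.pl)
    $config = {Config::General->new(...)->getall};
}

sub handler {
    # ..etc.
    MyApp->new(config => $config)->run();
}

1;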

This quick change alone gave me a 50% increase in requests per second, confirming my suspicion that the config file was a major bottleneck worth fixing. The benchmark figure for the existing codebase on our crotchety old dev machine is 60rps, and my framework has gone from 30rps to 45rps with this change alone. For those who say Moose is slow and has a compile-time hit: compiling all my Moose code at start-up gave me the same 50% increase as parsing my config file at start-up did.

The only problem I have now is that this violates the DRY principle, since the same Config::General->new code is in every BEGIN block, with only the path to the config file differing. I have a few different strategies to limit this, but I just wanted to post the results of this change.
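
A minimal sketch of one such strategy, assuming a hypothetical helper module MyFramework::ConfigCache with a load_config function (neither exists in the framework described above): each config file is parsed at most once per process and cached by path, so the Config::General->new call lives in a single place.

```perl
package MyFramework::ConfigCache;    # hypothetical module name
use strict;
use warnings;
use Config::General;

# path => parsed config hash ref. Lives as long as the process does
# (the Apache parent, if first called from startup.pl before forking).
my %cache;

sub load_config {
    my ($path) = @_;
    # Parse only on the first request for this path; reuse afterwards
    $cache{$path} ||= { Config::General->new(
        -ConfigFile      => $path,
        -InterPolateVars => 1,
    )->getall };
    return $cache{$path};
}

1;
```

Each bootstrap's BEGIN block would then shrink to a single call such as MyFramework::ConfigCache::load_config('/path/to/site.config'), with the path as the only per-project detail.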

6 solutions

#1


10  

Assuming your applications don't change the config at all, move it into a BEGIN block:

# this code goes at file scope
use Config::General;

my $config;
BEGIN {
    $config = { Config::General->new( ... )->getall };
}

# when creating a new instance
$self->config( $config );

And make sure all your modules are compiled in startup.pl.
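
A minimal startup.pl sketch, assuming the module names from the question (MyApp, MyApp::bootstrap) and that startup.pl is loaded with PerlRequire in httpd.conf. Modules loaded here are compiled once in the Apache parent, so their file-scoped BEGIN blocks (including the config parsing) run before the children are forked. Strictly speaking, a plain file-scoped assignment would also run only once, when the module is first loaded; BEGIN just moves it into compilation.

```perl
# startup.pl - loaded once by the Apache parent, e.g. via
#   PerlRequire /path/to/startup.pl
use strict;
use warnings;

use Moose ();             # compile Moose itself up front, without importing
use Config::General ();
use MyApp ();             # the application classes
use MyApp::bootstrap ();  # runs the BEGIN block that parses the config

1;
```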

You could get fancier, and have a singleton class provide the config hash, but you don't need to.
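
For the fancier variant, a sketch of a singleton config class, assuming a hypothetical class name MyApp::Config with hypothetical instance and get methods: the file is parsed on the first call to instance (ideally from startup.pl), and every later call returns the same object.

```perl
package MyApp::Config;    # hypothetical class name
use strict;
use warnings;
use Carp qw(croak);
use Config::General;

my $instance;    # one object per process, reused by every request

sub instance {
    my ($class, $path) = @_;
    return $instance if $instance;
    croak 'config path required on first call' unless defined $path;
    my %conf = Config::General->new(
        -ConfigFile      => $path,
        -InterPolateVars => 1,
    )->getall;
    $instance = bless { conf => \%conf }, $class;
    return $instance;
}

# Read-only accessor for a top-level config key
sub get {
    my ($self, $key) = @_;
    return $self->{conf}{$key};
}

1;
```

After the first call, any code can use MyApp::Config->instance() without knowing the path; a second path passed later is simply ignored.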

#2


4  

If you can make your Moose classes immutable, that might give you another speed boost.
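
For reference, making a Moose class immutable is one call at the end of the class definition, after which Moose can inline the constructor and destructor instead of building objects through the slower generic path. A minimal sketch with a hypothetical class and attribute:

```perl
package MyApp::Request;    # hypothetical class
use Moose;

has 'config' => (is => 'ro', isa => 'HashRef', required => 1);

# Done once, after all attributes and methods are declared
__PACKAGE__->meta->make_immutable;
no Moose;

1;
```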

#3


3  

A module's import sub is executed at compile time (when the module is loaded with use), so we could use that to reduce or eliminate the repetition (the DRY violation) in ysth's answer.

In the following example we use an import method to read the configuration file with the arguments given to us and then push that configuration into the calling package.

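The caveat is that any package variable $config in the calling package will be overwritten by this (a lexical my $config is not affected; to use the stored value under strict, declare it with our $config).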

# Foo_Config.pm
package Foo_Config;
use strict;
use warnings;
use English qw(-no_match_vars);
use Config::General;

sub import {
   my ($self, @cfg) = @ARG;
   my $call_pkg     = caller;
   my $config       = {Config::General->new(@cfg)->getall};
   do{ # this will create the $config variable in the calling package.
       no strict 'refs';
       ${$call_pkg . '::config'} = $config;
   };
   return;
}
1;

# MyApp.pm
package MyApp;
# will execute Foo_Config->import('/path/to/site.config') at compile time.
use Foo_Config '/path/to/site.config';
our $config;    # the hash ref that Foo_Config::import stored
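
If overwriting the caller's $config is a concern, one hedged alternative keeps the parsed hashes inside Foo_Config itself, keyed by calling package, and hands them out through an accessor. The config_for method is a hypothetical name, not part of the answer above.

```perl
package Foo_Config;
use strict;
use warnings;
use Config::General;

my %config_for;    # calling package => parsed config hash ref

sub import {
    my ($class, @cfg) = @_;
    return unless @cfg;    # a plain "use Foo_Config;" loads nothing
    my $call_pkg = caller;
    $config_for{$call_pkg} = { Config::General->new(@cfg)->getall };
    return;
}

# Hypothetical accessor, e.g. Foo_Config->config_for(__PACKAGE__)
sub config_for {
    my ($class, $pkg) = @_;
    return $config_for{$pkg};
}

1;
```

Nothing is written into the calling package's symbol table, so no existing variable can be clobbered; the trade-off is an extra method call wherever the config is needed.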

#4


1  

I had the same problem with an HTML::Mason installation and found this to work rather well. In httpd.conf:

PerlRequire handler.pl
<FilesMatch "\.mhtml$">
  SetHandler perl-script
  PerlHandler YourModule::Mason
</FilesMatch>

In your handler.pl file, you define all of your static items, such as your config, database handles, etc., in the YourModule::Mason package. Because handler.pl is loaded with PerlRequire, it is compiled when Apache starts rather than on each request (each new child process or thread still has some inherent start-up overhead). YourModule::Mason then has a handler method that handles each request.
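
A hedged sketch of what handler.pl might contain, assuming HTML::Mason's ApacheHandler and hypothetical paths for the config file, component root and data directory; only YourModule::Mason and its handler sub come from the answer. Everything at file scope runs once, when PerlRequire loads the file at server start-up.

```perl
# handler.pl - loaded once via "PerlRequire handler.pl"
package YourModule::Mason;
use strict;
use warnings;
use HTML::Mason::ApacheHandler;
use Config::General;

# Static, application-wide items, built once at server start-up.
# All paths below are hypothetical placeholders.
our %Config = Config::General->new('/path/to/site.config')->getall;

my $ah = HTML::Mason::ApacheHandler->new(
    comp_root => '/path/to/htdocs',
    data_dir  => '/path/to/mason-data',
);

# Called by Apache for every request matching the FilesMatch block
sub handler {
    my ($r) = @_;
    return $ah->handle_request($r);
}

1;
```

Components can then read %YourModule::Mason::Config without re-parsing the file.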

I will admit that there may be some magic that is happening in HTML::Mason that is helping me with this, but it works for me, maybe for you?

#5


0  

A common way of speeding up such things with few changes is to simply use global variables and cache state in them between invocations of the same Apache process:

use Config::General;
use vars qw($config);
# ...
# getall returns a list, so wrap it in {} to keep a hash ref
$config = { Config::General->new( ... )->getall }
    unless ref($config) eq 'HASH'; # add more suitable test here

It's not very clean, it can lead to obscure bugs (although in my experience "my $var" leads to more of them) and it sometimes eats a lot of memory, but many repeated, expensive initialization statements can be avoided this way. The advantage over doing the work only in a BEGIN {} block is that you can also re-initialize based on other events without restarting Apache or killing your process (e.g. by including the modification time of the config file on disk in the test above).
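
A sketch of the file-timestamp idea, assuming a hypothetical package MyApp::ConfigReload with a get_config helper and a single config file per process: the parsed hash is kept in package variables and re-parsed only when the file's modification time changes, at the cost of one stat() per call.

```perl
package MyApp::ConfigReload;    # hypothetical package
use strict;
use warnings;
use Config::General;

# Persist between requests served by the same process
our ($config, $loaded_mtime);

sub get_config {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    die "Cannot stat $path: $!" unless defined $mtime;
    # Re-parse on first use, or when the file changed on disk
    if (!$config || $mtime != $loaded_mtime) {
        $config       = { Config::General->new(-ConfigFile => $path)->getall };
        $loaded_mtime = $mtime;
    }
    return $config;
}

1;
```

Modification times usually have one-second resolution, so two edits within the same second may not both be noticed.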

Watch out for the gotchas though: an easy way to break in

#6


-2  

JackM has the right idea.

By loading all of your classes and instantiating your application-level objects (in your case, the configuration) in the "mother" Apache process, you don't have to compile them each time a new worker spawns, since they're already compiled and in memory. The very meticulous among us add a "use" line for every module their application uses regularly. If you don't load your packages and modules in the mother ship, each worker not only takes the performance hit of loading the modules itself but also misses out on the copy-on-write memory sharing that modern operating systems provide.

It is really the other half of the difference between mod_perl and CGI, the first half being mod_perl's persistent Perl interpreter versus CGI respawning perl for each invocation.
