Handling GZip-encoded content transparently with WWW::Mechanize

I am using WWW::Mechanize and currently handling HTTP responses with the 'Content-Encoding: gzip' header in my code by first checking the response headers and then using IO::Uncompress::Gunzip to get the uncompressed content.
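
For reference, the manual approach described above can be sketched with the core IO::Compress::Gzip and IO::Uncompress::Gunzip modules. This is a minimal sketch under stated assumptions: gunzip_if_needed is a hypothetical helper, it only recognises a single gzip or x-gzip encoding (not stacked encodings), and in real code its arguments would come from something like $mech->response->header('Content-Encoding') and $mech->response->content.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: return the uncompressed body when the
# Content-Encoding value says gzip, otherwise return the body unchanged.
sub gunzip_if_needed {
    my ($content_encoding, $raw) = @_;
    return $raw
        unless defined $content_encoding
            && $content_encoding =~ /^\s*(?:x-)?gzip\s*$/i;

    my $plain;
    gunzip(\$raw => \$plain)
        or die "gunzip failed: $GunzipError\n";
    return $plain;
}

# Offline round-trip demo with a locally compressed string (no network).
my $html = '<html><body><form name="f"></form></body></html>';
my $gz;
gzip(\$html => \$gz) or die "gzip failed: $GzipError\n";
print gunzip_if_needed('gzip', $gz), "\n";
```

The drawback the question goes on to describe is that the result lives in a separate variable, so Mechanize's own parsing methods never see it.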

However, I would like to do this transparently, so that WWW::Mechanize methods like form(), links(), etc. work on and parse the uncompressed content. Since WWW::Mechanize is a subclass of LWP::UserAgent, I would prefer to use LWP::UserAgent's handlers to do this.

While I have been partly successful (for example, I can print the uncompressed content), I am unable to do this transparently, in a way that lets me simply call:

    $mech->forms();

In summary: How do I "replace" the content inside the $mech object so that from that point onwards, all WWW::Mechanize methods work as if the Content-Encoding never happened?

I would appreciate any help. Thanks!

3 answers

#1

WWW::Mechanize::GZip, I think.

#2

It looks to me like you can replace it using the $res->content( $bytes ) method.

By the way, I found this stuff by looking at the source of LWP::UserAgent, then HTTP::Response, then HTTP::Message.
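
A sketch of that idea wired into the LWP::UserAgent handler mechanism the question asks about. Assumptions: decode_in_place is a hypothetical name; the response_done phase and its ($response, $ua, $handler) callback arguments come from LWP::UserAgent's add_handler documentation; and it is assumed that LWP runs this handler inside request(), before WWW::Mechanize updates its page state. Whether this is still needed depends on your versions, since current LWP/Mechanize already decode via decoded_content.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);   # only used by the offline demo

# response_done callback (hypothetical name). LWP::UserAgent calls it with
# ($response, $ua, $handler) once the whole response has been received.
sub decode_in_place {
    my ($response) = @_;
    my $enc = $response->header('Content-Encoding');
    return unless defined $enc && length $enc;

    # Undo only the Content-Encoding; charset => 'none' leaves the result
    # as bytes, which is what content() expects.
    my $bytes = $response->decoded_content(charset => 'none');
    unless (defined $bytes) {
        warn "could not decode Content-Encoding '$enc': $@";
        return;
    }
    $response->content($bytes);   # replace the raw body in place
    $response->remove_header('Content-Encoding', 'Content-Length');
    return;
}

# Registration, assuming a WWW::Mechanize object in $mech:
#   $mech->add_handler(response_done => \&decode_in_place);

# Offline demo: a gzip-encoded response built locally, no network needed.
my $html = '<html><body><form name="f"></form></body></html>';
my $gz;
gzip(\$html => \$gz) or die "gzip failed: $GzipError\n";
my $res = HTTP::Response->new(200, 'OK',
    ['Content-Type' => 'text/html', 'Content-Encoding' => 'gzip'], $gz);
decode_in_place($res);
print $res->content, "\n";
```

HTTP::Message also has a decode() method that performs essentially these steps in one call (returning false if decoding fails), so the handler body could be reduced to $response->decode.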

#3

Decoding is built into LWP::UserAgent (via HTTP::Message's decoded_content), and thus into Mechanize. One MAJOR caveat, to save you some hair:

- To debug, make sure you check the error variable $@ after the call to decoded_content:

    my $html = $r->decoded_content;   # undef on failure; the reason is left in $@
    die $@ if $@;
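
HTTP::Message's decoded_content also accepts a raise_error option, which turns this silent failure into an exception. A small sketch, assuming HTTP::Message (which provides HTTP::Response) is installed; the Content-Encoding value x-made-up is a hypothetical, deliberately unsupported encoding used only to force a failure, and no network access is needed.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical, deliberately unsupported Content-Encoding to force a failure.
my $r = HTTP::Response->new(200, 'OK',
    ['Content-Type' => 'text/html', 'Content-Encoding' => 'x-made-up'],
    '<html></html>');

# Default behaviour: undef is returned and the reason is left in $@.
my $html = $r->decoded_content;
print "decoded_content failed: $@\n" unless defined $html;

# With raise_error => 1 the same failure is thrown as an exception.
my $ok = eval { $r->decoded_content(raise_error => 1); 1 };
print "exception caught: $@\n" unless $ok;
```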

Better yet, look through the source of HTTP::Message and make sure all the support packages it loads are installed.
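
A quick way to act on that advice: try to load each optional module that HTTP::Message pulls in at run time and list the ones that fail. The module list is an assumption based on reading HTTP::Message's source (it varies between versions; newer releases also try IO::Uncompress::Brotli for br), and missing_modules is a hypothetical helper name. HTTP::Message::decodable() performs a similar probe, but only for the compression modules.

```perl
use strict;
use warnings;

# Optional modules HTTP::Message may load at run time while decoding
# (assumed list; check the source of your installed version).
my @support = qw(
    IO::Uncompress::Gunzip
    IO::Uncompress::Inflate
    IO::Uncompress::RawInflate
    IO::Uncompress::Bunzip2
    IO::HTML
    Encode
);

# Hypothetical helper: try to load each module and return the names of
# those that fail to load.
sub missing_modules {
    my @missing;
    for my $mod (@_) {
        (my $file = "$mod.pm") =~ s{::}{/}g;   # IO::HTML -> IO/HTML.pm
        push @missing, $mod unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(@support);
if (@missing) {
    print "missing: @missing\n";
} else {
    print "all decoding support modules found\n";
}
```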

In my case, decoded_content returned undef while content was raw binary, and I went on a wild goose chase. LWP sets the error flag ($@) when it fails to decode, but Mechanize just ignores it (it doesn't check for it or log the incident as its own error/warning).

In my case $@ said: "Can't find IO/HTML.pm ..". The require had been eval'ed, so nothing else reported it.

After diving into the source, I found that the built-in decoding process is long, meticulous, and arduous, covering just about every scenario and making plenty of guesses (thank you, Gisle!).

If you are paranoid, explicitly set the default Accept-Encoding header, used with every request, when calling new():

    my $browser = WWW::Mechanize->new(default_headers => HTTP::Headers->new(
        'Accept-Encoding' => scalar HTTP::Message::decodable()));

#1