How can I use Perl to collect data from a website that uses AJAX?

This might seem a bit backwards, but I want to use Perl (and curl, if possible) to get data from a site that uses Ajax to fill an HTML shell with information. How do I make these JavaScript calls to get the data I need?

The website is here: http://www.jigsaw.com/showContactUpdateTab.xhtml?companyId=224230

2 solutions

#1

Remember that AJAX calls are ordinary HTTP requests, so you should always be able to perform them yourself.

Open Firebug or Web Inspector on the website you're talking about and you'll see some XHR calls:

XHR finished loading:
    "http://www.jigsaw.com/dwr/interface/UserActionAPI.js"
    "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getMostPurchasedContacts.dwr"
    "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyGraveyardedContacts.dwr"
    "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyAddedContacts.dwr"
    "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyTitleChangedContacts.dwr"

Yay! Now you know where to get that data. Their scripts send HTTP POST requests to the URLs above, so if you simply open them in your browser (which sends a GET request), you'll see various engine errors.

When you sniff their AJAX POST requests (via the Web Inspector debugger, for example), you'll see a body like this:

    callCount=1
    page=/showContactUpdateTab.xhtml?companyId=224230
    httpSessionId=F5E7EC4A45DFCE87B969A9F4FA06C361
    scriptSessionId=D020EFF4333283B907402687182D03E034
    c0-scriptName=UserActionAPI
    c0-methodName=getRecentlyGraveyardedContacts
    c0-id=0
    c0-param0=number:224230
    c0-param1=boolean:false
    c0-param2=boolean:false
    batchId=1

I'm pretty sure they're generating a bunch of security session IDs to deter data miners. You may need to dive into their JavaScript code to learn how those IDs are generated.
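
For what it's worth, replaying one of those POSTs from Perl could look roughly like the sketch below. It is only a sketch under a few assumptions: that the spaces in the captured body were originally line breaks (one key=value pair per line), that the server accepts a text/plain content type, and that the session IDs copied from the capture are placeholders you would replace with fresh ones. The helper names dwr_body and call_dwr are made up for this example. It uses HTTP::Tiny only because that module ships with Perl 5.14 and later.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build the POST body seen in the capture: one key=value pair per line.
# ASSUMPTION: the spaces in the captured body were originally newlines.
sub dwr_body {
    my (%arg) = @_;
    my @pairs = (
        [ 'callCount'       => 1 ],
        [ 'page'            => $arg{page} ],
        [ 'httpSessionId'   => $arg{http_session_id} ],
        [ 'scriptSessionId' => $arg{script_session_id} ],
        [ 'c0-scriptName'   => 'UserActionAPI' ],
        [ 'c0-methodName'   => $arg{method} ],
        [ 'c0-id'           => 0 ],
        [ 'c0-param0'       => "number:$arg{company_id}" ],
        [ 'c0-param1'       => 'boolean:false' ],
        [ 'c0-param2'       => 'boolean:false' ],
        [ 'batchId'         => 1 ],
    );
    return join '', map { "$_->[0]=$_->[1]\n" } @pairs;
}

# POST the body to the endpoint for the given method name.
# ASSUMPTION: text/plain is an acceptable content type for this endpoint.
sub call_dwr {
    my (%arg) = @_;
    my $url = "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.$arg{method}.dwr";
    return HTTP::Tiny->new->request('POST', $url, {
        headers => { 'Content-Type' => 'text/plain' },
        content => dwr_body(%arg),
    });
}

my %call = (
    method            => 'getRecentlyGraveyardedContacts',
    company_id        => 224230,
    page              => '/showContactUpdateTab.xhtml?companyId=224230',
    # Placeholders copied from the capture; real values must be fresh.
    http_session_id   => 'F5E7EC4A45DFCE87B969A9F4FA06C361',
    script_session_id => 'D020EFF4333283B907402687182D03E034',
);

# Print the body so it can be compared with what the browser sent.
print dwr_body(%call);

# Uncomment to actually send the request:
# my $res = call_dwr(%call);
# print $res->{success} ? $res->{content} : "$res->{status} $res->{reason}\n";
```

If the printed body matches what the inspector shows but the server still rejects the call, the session IDs are the most likely culprit.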

#2

Some applications have code in place to check that the client is a real AJAX client. They simply check for the presence of the header X-Requested-With: XMLHttpRequest, so the check is easy to circumvent:

curl -H 'X-Requested-With: XMLHttpRequest' ...

use HTTP::Request::Common;
GET $url, 'X-Requested-With' => 'XMLHttpRequest', ...

Of course, you might still have to deal with the usual hurdles: required session cookies, nonce parameters, and the occasional extra complexity. Firebug, or the equivalent tool in other browsers, will help you reverse-engineer the required headers and parameters.
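
To tie the header and the session cookie together, here is a rough LWP sketch. It assumes the site sets its session cookie when the HTML shell is loaded and that the AJAX endpoint answers a plain GET; neither is confirmed for any particular site. The sub names ajax_agent and fetch_ajax are hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(GET);

# Build a user agent that remembers cookies between requests and adds
# the X-Requested-With header to every request it sends.
sub ajax_agent {
    my $ua = LWP::UserAgent->new(
        cookie_jar => HTTP::Cookies->new,   # in-memory cookie jar
        timeout    => 15,
    );
    $ua->default_header('X-Requested-With' => 'XMLHttpRequest');
    return $ua;
}

# Hypothetical two-step flow: load the HTML shell first so the server can
# set its session cookie, then request the AJAX URL with that cookie.
# ASSUMPTION: the AJAX endpoint accepts GET; many (like DWR) need POST.
sub fetch_ajax {
    my ($ua, $shell_url, $ajax_url) = @_;

    my $shell = $ua->request(GET($shell_url));
    die 'shell page failed: ' . $shell->status_line . "\n"
        unless $shell->is_success;

    my $res = $ua->request(GET($ajax_url));
    die 'AJAX call failed: ' . $res->status_line . "\n"
        unless $res->is_success;

    return $res->decoded_content;
}

my $ua = ajax_agent();

# Show the header that will accompany every request from this agent.
print 'X-Requested-With: ', $ua->default_header('X-Requested-With'), "\n";

# Uncomment to run against real URLs of your choice:
# print fetch_ajax($ua, $shell_url, $ajax_url);
```

Nonce parameters are not handled here; if the site uses them, you would have to extract them from the shell page before making the second request.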
