Scraping AJAX requests with random strings appended to URL

Time: 2022-10-07 16:41:07

I am trying to monitor cricket scores on scorespro.com/cricket by making the same AJAX requests the browser makes. Analysing the network traffic in Google Chrome, I can see my browser making requests of the form:

http://www.scorespro.com/cricket/ajax.php?g_sort=league&date=2014-10-02&mut=1412265716&sut=0&(some_random_number)
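
A URL of this shape can be assembled as in the minimal sketch below. It rests on assumptions, not on anything the site documents. First, `mut` appears to be a Unix timestamp: 1412265716 decodes to 2014-10-02 16:01:56 UTC, the same day as `date`. Second, the trailing number looks like a throwaway cache-buster, the kind of value `Math.random()` produces. The function name `buildAjaxUrl` is hypothetical.

```javascript
// Sketch: assemble a URL shaped like the one captured in Chrome.
// Assumptions: `mut` is a Unix timestamp in seconds, and the trailing value
// is a throwaway cache-buster (hypothetically a Math.random() result).
function buildAjaxUrl(date, mut, sut = 0, cacheBuster = Math.random()) {
  const base = 'http://www.scorespro.com/cricket/ajax.php';
  // Ordinary key=value parameters, encoded by URLSearchParams.
  const query = new URLSearchParams({
    g_sort: 'league',
    date,
    mut: String(mut),
    sut: String(sut),
  });
  // The cache-buster is a bare value with no "key=", which URLSearchParams
  // cannot produce, so it is appended by hand.
  return `${base}?${query.toString()}&${cacheBuster}`;
}

// Same date/mut as the captured request, with a fixed suffix for readability.
console.log(buildAjaxUrl('2014-10-02', 1412265716, 0, 0.5));
// → http://www.scorespro.com/cricket/ajax.php?g_sort=league&date=2014-10-02&mut=1412265716&sut=0&0.5
```

If `mut` really is a timestamp, `Math.floor(Date.now() / 1000)` would be the obvious guess for a current value. That is untested against the site.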

When I click on the response in Google Chrome, I can see the data that was received. However, when I request the same URL myself, no data comes back. Why is that happening (is it to do with the random string?), and how can I get around it?

1 answer

#1

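Is doing this from JavaScript a requirement? If it is, bear in mind that a script running on your own page generally can't read responses from www.scorespro.com. The browser's same-origin policy blocks cross-origin reads unless the other site sends CORS headers allowing them. Have you considered abstracting the requests by calling a script on a server you control?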

For example, your server could host a PHP script called "grabber.php". It fetches the data with cURL and presents itself like a browser, with a User-Agent, a Referer and a cookie jar:

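<?php
// grabber.php: fetch the scorespro AJAX endpoint server-side and relay it as JSON.

// Random suffix imitating the cache-buster the site's own JavaScript appends
// (rand() with these bounds needs 64-bit PHP).
$r = '0.' . rand(1000000000000000, 9000000000000000);

// date and mut are copied from a request captured in Chrome. mut appears to be a
// Unix timestamp, so both probably need updating to get current scores.
$url = 'http://www.scorespro.com/cricket/ajax.php?g_sort=league&date=2014-10-03&mut=1412328280&sut=0&' . $r;
$useragent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:32.0) Gecko/20100101 Firefox/32.0';
$referer = 'http://www.scorespro.com/cricket/';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);          // return the body instead of printing it
curl_setopt($ch, CURLOPT_REFERER, $referer);             // look like a request from the cricket page
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);         // ...made by an ordinary browser
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);          // follow redirects
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookie.txt'); // send cookies saved by earlier runs
curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookie.txt');  // save any cookies the site sets
$response = curl_exec($ch);

if ($response === false) {
    // Log the cURL error; the client then receives "payload": false
    error_log('grabber.php: ' . curl_error($ch));
}
curl_close($ch);

header('Content-Type: application/json');
// JSON_INVALID_UTF8_SUBSTITUTE (PHP 7.2+) stops json_encode() failing on non-UTF-8 bytes
echo json_encode(array('payload' => $response), JSON_INVALID_UTF8_SUBSTITUTE);

exit();

?>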
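
If your server runs Node.js rather than PHP, the same grabber idea might look like the sketch below. It assumes Node 18+, which has a global `fetch`. The function name `grab` is hypothetical. It sends the same User-Agent and Referer as the PHP version and wraps the body as `{"payload": ...}`. Unlike the cURL version it keeps no cookie jar, and it treats non-2xx responses as failures.

```javascript
// Hypothetical Node.js (18+) counterpart of grabber.php.
// It does NOT persist cookies; add that if the site turns out to need a session.
const REFERER = 'http://www.scorespro.com/cricket/';
const USER_AGENT =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:32.0) Gecko/20100101 Firefox/32.0';

// fetchImpl is injectable so the function can be exercised without network access.
async function grab(url, fetchImpl = globalThis.fetch) {
  let payload = false; // mirrors PHP, where a failed curl_exec() yields false
  try {
    const res = await fetchImpl(url, {
      headers: { 'User-Agent': USER_AGENT, Referer: REFERER },
      redirect: 'follow', // like CURLOPT_FOLLOWLOCATION
    });
    // Unlike curl_exec(), non-2xx responses are treated as failures here.
    if (res.ok) payload = await res.text();
  } catch (err) {
    console.error('grab failed:', err.message);
  }
  return JSON.stringify({ payload });
}
```

To serve it, an `http.createServer` handler could `await grab(url)` and write the returned string with a `Content-Type: application/json` header.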

You could then call that page with a simple AJAX request:

<script src="//code.jquery.com/jquery-1.11.0.min.js"></script>

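<script type="text/javascript">
// Ask your own server (not scorespro.com) for the data. This avoids the
// same-origin restriction if this page is served from yourserver.com;
// otherwise grabber.php would have to send CORS headers.
$.ajax({
    url: 'http://yourserver.com/grabber.php',
    dataType: 'json',
    type: 'GET',
    success: function (data, textStatus, jqXHR) {
        // grabber.php wraps the scraped response as {"payload": ...}
        if (data['payload']) {
            alert(data['payload']);
        } else {
            alert('oops');
        }
    },
    error: function (jqXHR, textStatus, errorThrown) {
        // Network failure, non-2xx status, or a response that isn't valid JSON
        alert('Request failed: ' + textStatus);
    }
});
</script>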
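
The jQuery request above can also be written with the standard `fetch()` API, as in this sketch. The endpoint is the answer's placeholder. The names `showScores` and `notify` are hypothetical. `notify` defaults to `alert()` and can be swapped out, for example to update the page instead. Any failure, including a falsy payload, a network error or invalid JSON, ends in the same 'oops' message.

```javascript
// Sketch: the jQuery call rewritten with fetch(). fetchImpl and notify are
// injectable so the logic can run outside a browser.
async function showScores(
  endpoint = 'http://yourserver.com/grabber.php',
  fetchImpl = globalThis.fetch,
  notify = (msg) => alert(msg),
) {
  let message = 'oops';
  try {
    const res = await fetchImpl(endpoint, { method: 'GET' });
    if (res.ok) {
      const data = await res.json(); // {"payload": ...} from the grabber
      if (data && data.payload) message = data.payload;
    }
  } catch {
    // Network errors and invalid JSON fall through to the 'oops' message.
  }
  notify(message);
  return message;
}
```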

Of course, if you go with this approach you'd have to decide how the grabber script gets the cricket-site URLs it needs to request. You could pass them in from JavaScript, or get them directly within the PHP script, depending on your requirements.
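
If the URLs are passed in from JavaScript, the grabber should validate them. Otherwise anyone could use it as an open proxy to fetch arbitrary URLs, including internal addresses, through your server. This sketch accepts only the cricket AJAX endpoint. `ALLOWED_HOST` and `ALLOWED_PATH` are assumptions taken from the question's URL, and `isAllowedTarget` is a hypothetical name.

```javascript
// Sketch: only let the grabber fetch the cricket AJAX endpoint.
// ALLOWED_HOST and ALLOWED_PATH are assumptions taken from the question's URL.
const ALLOWED_HOST = 'www.scorespro.com';
const ALLOWED_PATH = '/cricket/ajax.php';

function isAllowedTarget(raw) {
  let url;
  try {
    url = new URL(raw); // throws a TypeError on malformed input
  } catch {
    return false;
  }
  return (
    (url.protocol === 'http:' || url.protocol === 'https:') &&
    url.username === '' &&
    url.password === '' &&
    url.hostname === ALLOWED_HOST && // exact match, so look-alike hosts are rejected
    url.port === '' && // default port only
    url.pathname === ALLOWED_PATH
  );
}
```

The handler would check the client-supplied URL with this function and answer with an error, without fetching anything, when it returns false. In PHP the same checks can be written with `parse_url`.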
