Implementing a crawler in PHP to grab a website's images: an example

<?php

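//Scrape images from the a67 site, listing page http://www.xiamov.com/list/1/p.2 (you can scrape images from other sites the same way)

//Open the connection (target site: dedecms--a67)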

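//Remove the execution time limit so a long crawl is not cut off

set_time_limit(0);
//fsockopen(): the 1st argument is the host, the 2nd is the port (normally 80 for HTTP), the 3rd receives the error number, the 4th receives the error message string, and the 5th is the connection timeout in seconds. Below it is 30 seconds: if no connection to the remote host is made within 30 seconds, fsockopen() fails and returns false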

$conn=fsockopen("www.xiamov.com",80,$errno,$errstr,30);

if(!$conn){

	die("Connection failed: $errstr ($errno)");

}

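//Speak the protocol: write the HTTP request by hand

//HTTP/1.0 is used so the server cannot reply with "Transfer-Encoding: chunked", which would mix chunk-size lines into the body

$httpstr="GET /list/1/p.2 HTTP/1.0\r\n";

$httpstr.="Host: www.xiamov.com\r\n";

$httpstr.="Connection: Close\r\n\r\n";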

//Send the HTTP request; the server should then respond

fwrite($conn,$httpstr,strlen($httpstr));

//Read everything the a67 site sends back (status line, headers and HTML body)

$res="";

while(!feof($conn)){

	$res.=fread($conn,1024);

}

fclose($conn);

//file_put_contents("D:/1.txt", $res);

//echo $res;

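//Find the image resources on the page (the src of each <img>)

//A typical image tag on the listing page looks like this:
//<img alt="邪恶小分队下载" title="邪恶小分队下载" src="http://img.xiamov.com/vod/2016-07/578e2cc52da30.jpg">

$reg1='/<img alt="[^"]*" title="[^"]*" src="([^"]*)"/i'; //Matches the <img> tag above and captures the src value. [^"]* means "keep matching any character that is not a double quote", a very common idiom worth learning. The pattern only works when the attributes appear in exactly this order: alt, title, src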

preg_match_all($reg1,$res,$arr1);

//Loop over $arr1[1] (the values captured by the group) to get each image URL

foreach($arr1[1] as $imgurl){

	//echo"<br/>".$imgurl;

	$imguri=str_replace("http://img.xiamov.com","",$imgurl); //strip the scheme and host, leaving only the path, e.g. /vod/2016-07/578e2cc52da30.jpg

	//echo"<br/>".$imguri;

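	//Send another request, this time for the image. Note that the host has changed: it is now img.xiamov.com

	$conn=fsockopen("img.xiamov.com",80,$errno,$errstr,30);

	if(!$conn){

		continue; //skip this image if the connection fails

	}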

	//Build the request string $httpstr (HTTP/1.0 again, so the image bytes are not chunk-encoded)

	$httpstr="GET $imguri HTTP/1.0\r\n";

	$httpstr.="Host: img.xiamov.com\r\n";

	$httpstr.="Connection: Close\r\n\r\n";

	//Send the image request

	fwrite($conn,$httpstr,strlen($httpstr));

	$res2="";

	while(!feof($conn)){

		$res2.=fread($conn,1024);

	}

	fclose($conn);

	//Debug: dump $res2 to see what came back

	//file_put_contents("D:/1.txt", $res2);

	//exit();

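	//Cut the image data out of $res2 and save it as an image file

	//The response headers end at the first blank line ("\r\n\r\n"); the image bytes start right after it

	$pos=strpos($res2,"\r\n\r\n");

	if($pos===false){

		continue; //no complete header block in the response, so skip this image

	}

	$imgres=substr($res2,$pos+4);  //+4 skips the "\r\n\r\n" separator itself, which is exactly 4 bytes long

	$fileinfo=pathinfo($imguri);

	if(!is_dir("./myimages")){

		mkdir("./myimages",0777,true); //file_put_contents() fails if the target directory does not exist

	}

	file_put_contents("./myimages/".$fileinfo['basename'], $imgres);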

	//die;

	sleep(2);  //wait 2 seconds before the next request so the server is not flooded

}

die("Finished fetching images");
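The script locates the end of the headers with `strpos($res2,"\r\n\r\n")` and skips 4 bytes because that separator is 4 bytes long. It never looks at the status line, though, so a 404 page or a redirect would be saved as if it were an image, and a reply sent with `Transfer-Encoding: chunked` (possible when the request says HTTP/1.1) would have chunk-size lines mixed into the saved bytes. Below is a minimal sketch of a more careful parser, not part of the original program. The function names `parse_http_response` and `decode_chunked` are hypothetical, and it assumes the whole response has already been read into one string, as the `while(!feof($conn))` loop does.

```php
<?php
// Sketch (hypothetical helper, not from the original script): parse a raw
// HTTP response string into status code, headers and body.
function parse_http_response(string $raw): ?array
{
    // The header block ends at the first empty line; "\r\n\r\n" is 4 bytes.
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return null; // no complete header block
    }
    $head = substr($raw, 0, $pos);
    $body = substr($raw, $pos + 4);

    $lines = explode("\r\n", $head);
    // The first line is the status line, e.g. "HTTP/1.0 200 OK".
    if (!preg_match('#^HTTP/\d(?:\.\d)? (\d{3})#', $lines[0], $m)) {
        return null;
    }

    $headers = [];
    foreach (array_slice($lines, 1) as $line) {
        $colon = strpos($line, ':');
        if ($colon === false) {
            continue; // skip malformed header lines
        }
        // Header names are case-insensitive, so store them lower-cased.
        $name = strtolower(trim(substr($line, 0, $colon)));
        $headers[$name] = trim(substr($line, $colon + 1));
    }

    // A server answering an HTTP/1.1 request may send the body in chunks.
    $te = $headers['transfer-encoding'] ?? '';
    if (stripos($te, 'chunked') !== false) {
        $body = decode_chunked($body);
    }

    return ['status' => (int)$m[1], 'headers' => $headers, 'body' => $body];
}

// Reassemble a "Transfer-Encoding: chunked" body. Each chunk is
// "<size in hex>[;extensions]\r\n<data>\r\n"; a chunk of size 0 ends the body.
function decode_chunked(string $body): string
{
    $out = '';
    $offset = 0;
    $len = strlen($body);
    while ($offset < $len && ($eol = strpos($body, "\r\n", $offset)) !== false) {
        $sizeField = trim(explode(';', substr($body, $offset, $eol - $offset))[0]);
        $size = (int)hexdec($sizeField);
        if ($size === 0) {
            break; // last chunk (any trailer headers are ignored)
        }
        $out .= substr($body, $eol + 2, $size);
        $offset = $eol + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $out;
}
```

Inside the crawler loop, the `substr($res2,$pos+4)` step could then be replaced by calling `parse_http_response($res2)` and writing the `body` element with `file_put_contents()` only when the result is not `null` and `status` is 200. Redirects (3xx) are skipped rather than followed in this sketch.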

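The program above fetches some of the images from the film list on the A67 movie site. The same technique can scrape images from other websites too, as long as the regular expression and the hostnames are adapted to each site's markup.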
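Adapting the crawler to another site mostly means replacing two site-specific assumptions: `$reg1` requires the attributes in the exact order `alt`, `title`, `src`, and the `str_replace()` call assumes every image lives on `http://img.xiamov.com`. Below is a hedged sketch of a looser approach, assuming plain `http://` image URLs: it matches `src` anywhere inside an `<img>` tag and uses PHP's built-in `parse_url()` to get the host, port and path for each request. `extract_img_targets` is a hypothetical name, and relative URLs such as `/a.jpg` are simply skipped rather than resolved against the page address.

```php
<?php
// Sketch (hypothetical helper, not from the original script): find every
// <img ... src="..."> on a page regardless of attribute order, and split each
// absolute http:// URL into the host, port and path a raw request needs.
function extract_img_targets(string $html): array
{
    // \s before "src" stops the match from picking up attributes like data-src.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/i';
    preg_match_all($pattern, $html, $matches);

    $targets = [];
    foreach ($matches[2] as $url) {
        $parts = parse_url($url);
        // The fsockopen() approach speaks plain HTTP only, so keep absolute http:// URLs.
        if ($parts === false || !isset($parts['host'])
            || strtolower($parts['scheme'] ?? '') !== 'http') {
            continue;
        }
        $path = $parts['path'] ?? '/';
        if (isset($parts['query'])) {
            $path .= '?' . $parts['query'];
        }
        $targets[] = [
            'host' => $parts['host'],
            'port' => $parts['port'] ?? 80,
            'path' => $path,
        ];
    }
    return $targets;
}
```

In the crawler's `foreach`, each entry would supply `fsockopen($t['host'], $t['port'], $errno, $errstr, 30)` and the request line `"GET {$t['path']} HTTP/1.0\r\n"` with a matching `Host:` header (for a port other than 80 the `Host` header should also carry `:port`).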

秒客网
