I have a small problem with Scrapy and Pillow. I know there are many similar questions, but I have tried everything I could find and nothing works.
I use Scrapy to parse many websites, more than 100,000 web pages. I've created a pipeline that checks whether a page contains an image and, if it does, downloads the picture and creates a thumbnail at the same path. I do it this way so that if thumbnail creation fails, I still have the "big" version of the image.
Here is some code:
import os
import urllib  # Python 2: provides urllib.urlretrieve

from PIL import Image
from scrapy.exceptions import DropItem
from slugify import slugify

# `settings` is the Scrapy project settings object (its import is not shown in this post)

class DownloadImageOnDisk( object ):

    def process_item( self, item, spider ):
        try:
            # If the page has an image
            if item[ 'image' ]:
                img = item[ 'image' ]
                # Get the image extension (dropping any query string)
                ext = img.split( '.' )
                ext = ext[ -1 ].split('?')
                ext = ext[0]
                key = self.remove_accents( item[ 'imagetitle' ] ).encode( 'utf-8', 'replace' )
                path = settings[ 'IMG_PATH' ] + item[ 'website' ] + '/' + key + '.' + ext
                # Create the directory
                if not os.path.exists( settings['IMG_PATH'] + item['website'] ):
                    os.makedirs( settings[ 'IMG_PATH' ] + item[ 'website' ] )
                # Check that the image does not already exist
                if not os.path.isfile( path ):
                    # Download the big image
                    urllib.urlretrieve( img, path )
                    if os.path.isfile( path ):
                        # Create the thumbnail
                        self.optimize_image( path )
                item[ 'image' ] = item[ 'website' ] + '/' + key + '.' + ext
            return item
        except Exception as exc:
            # Note: this swallows every error, and the method then returns None
            pass

    # Slugify the title used in the path
    def remove_accents( self, input_str ):
        try:
            return slugify( input_str )
        except Exception as exc:
            raise DropItem( exc )

    # Create the thumbnail (overwrites the file at `path`)
    def optimize_image( self, path ):
        try:
            image = Image.open( path )
            image.thumbnail( ( 200,200 ), Image.ANTIALIAS )
            image.save( path, optimize=True, quality=85 )
        except IOError as exc:
            raise DropItem( exc )
        except Exception as exc:
            raise DropItem( exc )
But sometimes, not regularly (about one item in 100, I think), I get this error:
cannot identify image file '/PATH/NAME.jpg'
It is raised in the optimize_image function. When I check on disk, the image file does exist.
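A file can exist on disk and still not be an image Pillow can read. One possible cause, which is only an assumption here and not confirmed by the post, is that the server occasionally returns something else (an HTML error page, a truncated body) and it gets saved under a .jpg name. A minimal sketch for checking what the saved file actually starts with, written for Python 3 with the standard library only; the helper name sniff_image_type is hypothetical:

```python
def sniff_image_type(path):
    """Guess the image type from the file's leading magic bytes.

    Returns 'jpeg', 'png', 'gif', or None when none of these
    signatures match (e.g. an HTML error page saved as .jpg).
    """
    with open(path, "rb") as fh:
        head = fh.read(16)
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    return None
```

Running this on one of the failing paths would show whether the download step, rather than Pillow, is where things go wrong. It only checks the first bytes, so a file that starts correctly but is cut short would still pass.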
I really don't understand.
If you have any suggestions, I'd appreciate them.
Thanks in advance.
1 solution
#1
Not sure, but it seems to be resolved with:
import requests
import io
...
response = requests.get( img )
image = Image.open(io.BytesIO(response.content))
image.thumbnail( ( 200,200 ), Image.ANTIALIAS )
image.save( path, optimize=True, quality=85 )
I am continuing my tests.
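For anyone trying the same approach, here is a slightly more defensive sketch of the in-memory step, assuming Python 3 and a recent Pillow. The function name thumbnail_from_bytes is hypothetical, and forcing JPEG output is my assumption, not something from the code above. It uses Image.LANCZOS because Image.ANTIALIAS was removed in Pillow 10.

```python
import io

from PIL import Image


def thumbnail_from_bytes(data, dest_path, size=(200, 200)):
    """Decode image bytes in memory and write a JPEG thumbnail.

    Raises ValueError if `data` cannot be decoded as an image.
    Returns the (width, height) of the thumbnail that was saved.
    """
    try:
        image = Image.open(io.BytesIO(data))
        image.load()  # force a full decode so truncated data fails here
    except OSError as exc:
        raise ValueError("payload is not a readable image: %s" % exc)
    # JPEG cannot store palette or alpha modes, so normalise to RGB
    if image.mode != "RGB":
        image = image.convert("RGB")
    # thumbnail() resizes in place and keeps the aspect ratio
    image.thumbnail(size, Image.LANCZOS)
    image.save(dest_path, format="JPEG", optimize=True, quality=85)
    return image.size
```

With requests, `data` would be `response.content`, ideally after calling `response.raise_for_status()` so that an HTTP error page is never passed to Pillow. Catching the ValueError in the pipeline lets you decide per item whether to drop it or keep it without a thumbnail.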