This post was last edited by 应真先生 on 2019-8-20 02:32
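I'm crawling the images from every thread on a hanfu forum. Each thread's title is saved as title and its image URLs as image_url. ImagesPipeline does read the image links, but the title doesn't seem to come through on the item: I overrode file_path myself, and it apparently can't read title. Could you guys help me see where I went wrong?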
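For context, a custom image pipeline only runs if it is registered in settings.py. Below is a minimal sketch of that registration, not the actual settings from this project: the package name `hanfu` and the output directory `images` are assumptions for illustration, while `ITEM_PIPELINES` and `IMAGES_STORE` are the standard Scrapy setting names.

```python
# settings.py (sketch; the module path and directory below are hypothetical)

# Register the custom subclass rather than the stock ImagesPipeline,
# otherwise the overridden file_path is never called.
ITEM_PIPELINES = {
    "hanfu.pipelines.A52HanfuPipeline": 300,
}

# Root directory under which ImagesPipeline stores downloaded files.
IMAGES_STORE = "images"
```

If the stock `scrapy.pipelines.images.ImagesPipeline` is listed here instead of the subclass, the overridden `file_path` is never used, which would look the same as title not being read.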
Code link: https://pan.baidu.com/s/1IjwVfzFJq5bR0VBWKtZlfw (extraction code: xf71)
This is the pipeline:
[Python]
# -*- coding: utf-8 -*-
import hashlib

from scrapy import Request
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.python import to_bytes

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


class HanfuPipeline(object):
    def process_item(self, item, spider):
        return item


class A52HanfuPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Request every image in the item and pass the item along in meta
        for image_url in item.get('image_url', []):
            yield Request(image_url, meta={'item': item})

    def file_path(self, request, response=None, info=None):
        item = request.meta['item']
        # 'item' + item raises TypeError (str + Item), so print them as separate arguments
        print('item', item)
        # Assumes item['title'] is a list (e.g. from .extract()); take the first entry
        title = item['title'][0]
        print('title: ' + title)
        title = title.replace(" ", "")
        # File extension taken from the last segment of the URL
        end = request.url.split('/')[-1].split('.')[-1]
        image_guid = hashlib.sha1(to_bytes(request.url)).hexdigest()
        image_name = "%s.%s" % (image_guid, end)
        file_name = u'full/{0}/{1}'.format(title, image_name)
        print(file_name)
        return file_name
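The path-building part of file_path can be tried on its own, without running a crawl. The following is only a sketch under a few assumptions: `build_image_path` is a hypothetical helper name, it accepts title either as a plain string or as a list, and it additionally strips characters that are not valid in directory names and falls back to `jpg` when the URL has no extension, neither of which the pipeline above does.

```python
import hashlib
import re


def build_image_path(title, url):
    """Return 'full/<title>/<sha1-of-url>.<ext>' for one image URL."""
    # Accept either a plain string or a list such as the result of .extract().
    if isinstance(title, (list, tuple)):
        title = title[0] if title else ""
    # Drop whitespace and characters that are not allowed in directory names.
    folder = re.sub(r'[\s\\/:*?"<>|]+', "", title) or "untitled"
    # Take the extension from the last path segment, ignoring any query string.
    last_segment = url.split("?", 1)[0].rsplit("/", 1)[-1]
    ext = last_segment.rsplit(".", 1)[-1] if "." in last_segment else "jpg"
    # Hash the full URL so that each image gets a stable, unique file name.
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/{0}/{1}.{2}".format(folder, image_guid, ext)
```

Inside the pipeline, file_path could then presumably be reduced to `return build_image_path(request.meta['item']['title'], request.url)`, which keeps the naming logic easy to check in isolation.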