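When scraping images from wallhaven, line 23 of my code always fails with IndexError: list index out of range. After some digging, it looks like the cause is that an empty result is being returned. I tried adding cookies, but that did not fix it. Any help would be appreciated.
[Python]
# -*- coding:utf-8 -*-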
import requests
import time
from lxml import etree
li_list = []
start_time = time.time()
for page in range(1, 23):
    url = f'https://wallhaven.cc/search?q=car&categories=111&purity=111&sorting=relevance&order=desc&page={page}'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Mobile Safari/537.36'
    }
    page_text = requests.get(url=url, headers=headers).text
    tree = etree.HTML(page_text)
    now_li_list = tree.xpath('//a[@class="preview"]/@href')
    print(f'Page {page}: found {len(now_li_list)} images!!')
    print('Fetching detail pages....')
    for image_url in now_li_list:
        detail_page = requests.get(url=image_url, headers=headers).text
        detail_tree = etree.HTML(detail_page)
        image_src_url = detail_tree.xpath('//img[@id="wallpaper"]/@src')[0]
        li_list.append(image_src_url)
        print(f'Added {image_url[-10:]}..')
        time.sleep(0.2)
end_time = time.time()
print(f'Collected {len(li_list)} images in total....')
print(f'Took {end_time - start_time} seconds...')
with open('url.txt', 'a', encoding='utf-8') as f:
    for url in li_list:
        f.write(url + '\n')
print('Saved!')
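Best answer
Print the status code and the response text for every request, and set a timeout. If that still does not help, wrap the body of the for loop in a try block, look at the error, and work out the cause from there.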
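Here is a minimal sketch of that advice. It assumes the empty XPath result comes from a response that is not the expected detail page, for example a non-200 status; the thread does not confirm the actual cause. The names `fetch_first_match` and `wallpaper_src`, and the `retries`, `timeout` and `delay` parameters, are hypothetical and exist only for illustration. The helper prints the status code, passes a timeout, catches exceptions, and returns None instead of indexing an empty list.

```python
import time


def fetch_first_match(get, extract, url, retries=3, timeout=10, delay=1.0):
    """Return the first value extracted from url, or None if nothing matched.

    get     -- callable(url, timeout=...) returning an object with
               .status_code and .text (e.g. a thin wrapper around requests.get)
    extract -- callable(html_text) returning a list of matches
    """
    for attempt in range(1, retries + 1):
        try:
            resp = get(url, timeout=timeout)
            print(f'{url}: attempt {attempt} -> HTTP {resp.status_code}')
            if resp.status_code == 200:
                matches = extract(resp.text)
                if matches:
                    return matches[0]
                # 200 but nothing matched: show the start of the body
                print(f'no match; body starts with: {resp.text[:200]!r}')
        except Exception as exc:  # timeouts, connection errors, parse errors
            print(f'{url}: attempt {attempt} raised {exc!r}')
        if attempt < retries:
            time.sleep(delay * attempt)  # wait a little longer each time
    return None


def wallpaper_src(image_url, headers):
    """Wire the helper to requests + lxml, using the XPath from the question."""
    import requests
    from lxml import etree

    def get(url, timeout):
        return requests.get(url, headers=headers, timeout=timeout)

    def extract(html):
        return etree.HTML(html).xpath('//img[@id="wallpaper"]/@src')

    return fetch_first_match(get, extract, image_url)
```

In the original inner loop, `image_src_url = wallpaper_src(image_url, headers)` followed by `if image_src_url is None: continue` would skip a failed page instead of crashing, and the printed status code and body should show what the server actually sent back.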