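I'm still learning, so this has no multithreading yet; downloads are slow because of the site's response speed, the image sizes and similar factors.
Suggestions and improvements from the experts here are welcome.
The site and the pages it crawls are set by the url variable in the code; you can change url and the number of pages to crawl yourself.
The source code is as follows: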
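import os

import requests
from lxml import etree

headers1 = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
}
save_dir = os.path.join('img', 'wallhaven')
os.makedirs(save_dir, exist_ok=True)  # create the output folder if it is missing


def repage(url1, n):
    """Open one wallpaper detail page and save its full-size image."""
    response1 = requests.get(url=url1, headers=headers1, timeout=30)
    tree1 = etree.HTML(response1.text)
    src_url = tree1.xpath('//*[@id="wallpaper"]/@src')
    response1.close()
    for j in src_url:
        response_img = requests.get(j, timeout=30)
        print(response_img)
        ext = os.path.splitext(j)[1] or '.jpg'  # keep the extension from the image URL
        with open(os.path.join(save_dir, str(n) + ext), mode="wb") as f:
            f.write(response_img.content)  # write the image as binary
        print("Done!")
        response_img.close()


n = 0  # image counter used as the file name; set once so later pages do not overwrite earlier ones
for x in range(2, 5):  # pages to crawl: 2, 3 and 4
    url = f"https://wallhaven.cc/toplist?page={x}"
    response = requests.get(url=url, headers=headers1, timeout=30)
    # print(response.text)
    tree = etree.HTML(response.text)
    detail_urls = tree.xpath('//*[@id="thumbs"]/section[1]/ul/li/figure/a/@href')
    response.close()
    for i in detail_urls:
        n = n + 1
        repage(i, n)  # fetch the image behind this thumbnail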
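
Since the post mentions that downloads are slow without multithreading, one possible direction is a thread pool from the standard library. The following is only a rough sketch under assumptions, not a tested replacement for the script: `download_all`, `filename_for` and the `fetch` callback are hypothetical names that do not appear in the original code, and the caller is assumed to have already collected the full-size image URLs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_for(image_url):
    """Hypothetical helper: use the last URL path segment as the file name."""
    return PurePosixPath(urlparse(image_url).path).name


def download_all(image_urls, fetch, out_dir, max_workers=4):
    """Fetch the URLs concurrently and save each one under out_dir.

    `fetch(url) -> bytes` is supplied by the caller (an assumption of this
    sketch), so the threading logic stays independent of the HTTP library.
    Returns (saved_paths, failures).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be matched up.
        futures = {pool.submit(fetch, u): u for u in image_urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                data = future.result()
            except Exception as exc:  # one failed download should not stop the rest
                failed.append((url, exc))
                continue
            target = out_dir / filename_for(url)
            target.write_bytes(data)  # files are written in the main thread
            saved.append(target)
    return saved, failed
```

With requests, `fetch` could be something like `lambda u: requests.get(u, timeout=30).content`. Keeping `max_workers` small is probably wise so the site is not flooded with requests.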