First, take a look at the girls featured on the site.
Let me state up front: this is for learning only, with no malicious scraping intended.
In that learning spirit I only wrote code that scrapes a single page, and it requests one image link per minute.
If you want to scrape multiple pages, just add a page-number loop yourself (see the sketch after the main code).
To speed things up you can change time.sleep(), but I don't recommend it.
You also need to create the save path inside the project yourself; in the code it is 'img//'+'唯美女生//' (a sketch for creating it follows right below).
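If you'd rather not create the folders by hand, here is a minimal sketch, using only the standard os module, that creates the save directory the script expects (it assumes the script is run from the project root, so the folders end up at img/唯美女生 relative to it):

[Python]
import os

# Create the save directory used by the scraper below.
# exist_ok=True means no error if the folder is already there.
os.makedirs(os.path.join('img', '唯美女生'), exist_ok=True)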
[Python]
import time
import requests
import re

# First page of the image listing
url_web = "https://www.vmgirls.com/page/1/"
head = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
}
response = requests.get(url=url_web, headers=head)
# print(response.text)

# Pull the link of every article on the listing page
url_obj = re.compile(r'<a class="media-content" target="_blank" href="(?P<url>.*?)" title=".*?" ', re.S)
list_url = url_obj.finditer(response.text)

for i in list_url:
    n = 0
    url_type = i.group("url")
    # print(url_type)
    response1 = requests.get(url_type, headers=head)
    # print(response1.text)

    # Pull the title and source URL of every image inside the article
    src1 = re.compile(r'<img alt="(?P<id>.*?)" src="(?P<url_src>.*?)" alt=""/></a>', re.S)
    src2 = src1.finditer(response1.text)
    for y in src2:
        img_id = y.group("id")
        url_src = y.group("url_src")
        response2 = requests.get(url_src, headers=head)
        # Save the image; the folder 'img//'+'唯美女生//' has to exist already
        with open('img//' + '唯美女生//' + img_id + str(n) + '.jpg', mode="wb") as f:
            f.write(response2.content)
            n = n + 1
            print("over!", n)
        # One image request per minute, to keep the load on the site low
        time.sleep(60)

response.close()
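As for the page loop mentioned at the top, a rough sketch could look like the following. The page count of 5 is only an example, and the article/image extraction inside the loop stays exactly the same as in the code above:

[Python]
import time
import requests

head = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
}

# Loop over listing pages 1..5 (5 is just an example count)
for page in range(1, 6):
    url_web = "https://www.vmgirls.com/page/{}/".format(page)
    response = requests.get(url=url_web, headers=head)
    # ... same article/image extraction and download code as above ...
    response.close()
    # Pause between listing pages as well, to stay polite
    time.sleep(60)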
The result is shown in the screenshot below.