Scraping uniform buyer-show photos of young women with Python

modys posted on 2021-5-8 15:50

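About a month ago some expert posted this site on the forum's 水漫金山 chat board, and I wrote a scraper for it the same day. Today I had some free time and ran it again to see whether anything had been updated. It came back empty: the site's content had changed.
So I just rewrote a whole-site crawler with Scrapy, but I'm still not posting it, so that you all don't crawl the site to death.
Below is a copy cut down to a single-category scraper. If you want the rest, modify it yourselves!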

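# Optional proxy pool (the author's own ip_proxy module, not included in the post):
# import random
# from ip_proxy import ips
# ip_add = random.choice(ips())
import os
import re
import time
from urllib.parse import urljoin

import requests
from lxml import etree

SAVE_DIR = './zhifu'
os.makedirs(SAVE_DIR, exist_ok=True)

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}

# List pages 1-3 of one category (id=3 in the URL appears to select the category)
for i in range(1, 4):
    url = 'https://www.ikmjx.com/index.php?g=portal&m=list&a=index&id=3&p=' + str(i)
    r = requests.get(url=url, headers=headers, timeout=10).text
    tree = etree.HTML(r)
    div_list = tree.xpath('/html/body/main/div/div/div')
    for li in div_list:
        hrefs = li.xpath('./div/a/@href')    # xpath() always returns a list
        titles = li.xpath('./div/a/@title')
        if not hrefs or not titles:
            continue  # skip blocks that are not post entries
        src = urljoin('https://www.ikmjx.com', hrefs[0])
        # Remove characters that Windows does not allow in file names
        title = re.sub(r'[\\/:*?"<>|]', '', titles[0])
        req = requests.get(url=src, headers=headers, timeout=10).text
        tree1 = etree.HTML(req)
        div1_list = tree1.xpath('/html/body/main/div/div/div/div/p')
        a = 0  # image counter within one post
        for p in div1_list:
            for img in p.xpath('./img/@src'):
                a += 1
                img_url = urljoin(src, img)  # also handles relative src values
                img_data = requests.get(url=img_url, headers=headers, timeout=10).content
                img_path = os.path.join(SAVE_DIR, title + '_' + str(a) + '.jpg')
                with open(img_path, 'wb') as fp:
                    fp.write(img_data)
                # print(img_path, 'downloaded!')
                time.sleep(1)  # pause between downloads to go easy on the server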
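lxml's `xpath()` returns a list even when only one attribute matches, and post titles can contain characters that are not valid in file names. A minimal sketch of three small helpers for those steps follows; the names `safe_filename`, `first` and `absolute` are hypothetical (not part of the original script), and the set of illegal characters is an assumption based on Windows naming rules.

```python
import re
from urllib.parse import urljoin

# Characters Windows forbids in file names, plus ASCII control characters
# (an assumed set covering the common cases, not an exhaustive rule list).
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title: str, fallback: str = "untitled") -> str:
    """Replace illegal characters with '_' and trim leading/trailing spaces and dots."""
    cleaned = _ILLEGAL.sub("_", title).strip(" .")
    return cleaned or fallback


def first(results: list, default: str = "") -> str:
    """Return the first XPath result, or `default` if nothing matched."""
    return results[0] if results else default


def absolute(base: str, href: str) -> str:
    """Resolve a possibly relative href against the page it came from."""
    return urljoin(base, href)


if __name__ == "__main__":
    print(safe_filename("a/b?c"))  # a_b_c
    print(first([]) == "")         # True
    print(absolute("https://www.ikmjx.com/index.php?p=1", "/a/1.html"))
    # https://www.ikmjx.com/a/1.html
```

Inside the loop, something like `title = safe_filename(first(li.xpath('./div/a/@title')))` would then cover the title step, and an empty result falls back to a placeholder name instead of raising an error.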

anandyuan posted on 2021-5-8 16:09

Now this is real motivation to learn Python.

七秒的记忆 posted on 2021-5-8 16:06

Studying this.

chamner123 posted on 2021-5-8 16:02

This has sparked a strong interest in learning Python for me.

mnbjkl1024 posted on 2021-5-8 15:57

Great for learning, but scraping images like these seems pointless to me.

jay2020 posted on 2021-5-8 16:09

No thanks, it would distract me from studying.

lwz373146809 posted on 2021-5-8 16:11

Not bad, the young ladies look nice.

相位猛冲 posted on 2021-5-8 16:18

Could you post a ready-to-run version?

52changew posted on 2021-5-8 16:27

Came to take a look and learn. Thanks for sharing!

wapjwang posted on 2021-5-8 16:53

I don't know how to use it.


吾爱破解 (Wuai Pojie) - 52pojie.cn's Archiver
