[Original] [Python] A beginner learns Python: scraping meizitu images with BeautifulSoup

执念i_ posted on 2018-6-27 18:59

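I'm learning Python, and today I studied the BeautifulSoup library. It is a very practical library and feels much nicer than regular expressions: a single find_all call gets straight to the elements you need.
Scraping meizitu images gives me unbeatable motivation to keep studying!!!
Hehe, here are today's results to share with everyone.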
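
To illustrate the find_all point, here is a minimal sketch on a made-up HTML snippet. The markup, the alt texts, and the example.com URLs are assumptions for illustration only, not the real site's page, and the helper name extract_images is hypothetical. It uses Python's built-in html.parser rather than html5lib so that it runs without an extra install.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page layout may differ.
html = """
<div id="picture">
  <p><img alt="photo one" src="http://example.com/1.jpg"></p>
  <p><img alt="photo two" src="http://example.com/2.jpg"></p>
</div>
"""

def extract_images(markup):
    """Return (alt, src) pairs for every <img> inside <div id="picture">."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    # find_all returns every matching tag, so no regular expression is needed
    for div in soup.find_all(name="div", attrs={"id": "picture"}):
        for img in div.find_all("img"):
            pairs.append((img["alt"], img["src"]))
    return pairs

print(extract_images(html))
```

The same two find_all calls would probably need a fairly fragile regular expression to reproduce, which is the convenience being described.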

Meizitu main site: http://www.meizitu.com/

Environment:
Python 3.6
Modules used:
BeautifulSoup (bs4)
requests
os

Here is the code:
from bs4 import BeautifulSoup
import requests
import os

# Collect the image links from one gallery page
def get_img_url(folders, img):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
    }
    req = requests.get(img, headers=headers).content  # request the gallery page
    soup = BeautifulSoup(req, 'html5lib')  # parse with BeautifulSoup
    soup = soup.find_all(name='div', attrs={'id': 'picture'})  # locate the div that holds the images
    for m in soup:  # walk the images and read each link and name
        m = m.find_all('img')
        for i in m:
            name = i['alt']
            src = i['src']
            folder = folders
            download_img(src, name, folder)

# Download one image
def download_img(img_url, img_name, folder):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
    }
    file = requests.get(img_url, headers=headers)  # request the image
    # Create the image/<folder> directory if it does not exist yet
    if not os.path.exists('image' + '/' + str(folder)):
        os.makedirs('image' + '/' + str(folder))
    filename = 'image' + '/' + str(folder) + '/' + str(img_name) + '.jpg'  # build the file path
    print(filename)
    with open(filename, 'wb') as fp:  # the with block closes the file even if the write fails
        fp.write(file.content)

# Fetch a list page and collect the gallery links
def get_url(page):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
    }
    url = 'http://www.meizitu.com/a/more_' + str(page) + '.html'
    response = requests.get(url, headers=headers).content
    soup = BeautifulSoup(response, 'html5lib')
    soup = soup.find_all(name='div', attrs={"class": "con"})
    for i in soup:
        m = i.find_all('h3')
        for n in m:
            name1 = n.a.string  # gallery title, used as the folder name
            html1 = n.a['href']  # gallery page URL
            get_img_url(name1, html1)

# Run
get_url(1)  # the number is the list page to crawl; change it to whichever page you want
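
One thing the download step above does not handle: the file and folder names come straight from the page's alt text and link text, which may contain characters such as / or ? that are not valid in file names. The following is only a sketch of one way to guard against that. The helpers safe_name and build_path, the fallback name, and the root directory default are all hypothetical names introduced for illustration, not part of the script above.

```python
import os
import re

def safe_name(text, fallback="untitled"):
    """Replace characters that are unsafe in file names with underscores."""
    # Collapse runs of \ / : * ? " < > | and whitespace into a single underscore
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", str(text)).strip("_")
    return cleaned or fallback

def build_path(folder, img_name, root="image"):
    """Join root, folder and file name into one path without touching the disk."""
    return os.path.join(root, safe_name(folder), safe_name(img_name) + ".jpg")

print(build_path("set a/b", "pic: 01?"))
```

With something like this, the path passed to open could be built by build_path(folder, img_name) instead of by string concatenation, and os.makedirs(os.path.dirname(path), exist_ok=True) could replace the separate existence check.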

Source download: https://www.lanzouj.com/i1aw32d
Tip: rating costs nothing, and your ratings are my biggest motivation.

lilihuakai posted on 2018-7-30 22:25

、林缺 posted on 2018-7-1 23:28
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you...

You need to install html5lib: pip install html5lib
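
If installing html5lib is not an option, one possible workaround is to fall back to the html.parser parser that ships with Python and that BeautifulSoup also accepts. The sketch below only chooses a parser name; pick_parser is a hypothetical helper, not something from bs4 or from the original script.

```python
import importlib.util

def pick_parser(preferred="html5lib", fallback="html.parser"):
    """Return the preferred parser name if its package is importable, else the fallback."""
    # find_spec returns None when a top-level package is not installed
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback

print(pick_parser())
```

The result could then be passed as the second argument, for example BeautifulSoup(req, pick_parser()). Different parsers can build slightly different trees from malformed HTML, so the results may not be identical to those from html5lib.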

、林缺 posted on 2018-7-1 23:28

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?

Xw丶小威 posted on 2018-6-27 19:08

Not seeing if __name__ == "__main__":
just makes me uncomfortable
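
For anyone unfamiliar with it, the guard mentioned here can be sketched roughly as follows. The get_url below is a hypothetical stand-in that only prints, so the example runs without network access; in the real script it would be the function from the first post.

```python
def get_url(page):
    # Hypothetical stand-in for the crawler entry point in the first post
    print(f"would crawl list page {page}")
    return page

def main():
    # Keep the start-up logic in one function so it is easy to call or test
    return get_url(1)

if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported
    main()
```

The practical benefit is that another script can import the functions without immediately starting a crawl.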

流言 posted on 2018-6-27 19:12

You might as well build an image-aggregation plugin instead

13169456869 posted on 2018-6-27 19:14

Thanks for sharing

汆肉米线 posted on 2018-6-27 19:16

A courtesy reply

懿蓑烟雨 posted on 2018-6-27 19:38

Thanks, OP

舰长大人 posted on 2018-6-27 20:08

This... looks pretty good! (manual smirk)

zuiai125520 posted on 2018-6-27 20:25

Nice. I'm learning Python too, but I have had no time lately because of exams

jshon posted on 2018-6-28 00:25

Thanks to the OP for sharing!

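eniac posted on 2018-6-28 12:55

Scraping meizitu images really is the best motivation. I think the same approach could be used to scrape other kinds of images too, which would be very useful. Thanks, OP!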

吾爱破解 - 52pojie.cn's Archiver
