Scraping images from the 昵图网 (Nipic) design/photography gallery

LEOIRON · Posted 2021-3-21 13:10

Last edited by LEOIRON on 2021-4-9 12:22

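As a beginner I wrote a small crawler, mostly by imitating existing examples, that downloads photos from the 昵图网 (Nipic) gallery. The downloaded images carry a watermark.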

[Python]

import os

import requests
from lxml import etree

url1 = "http://www.nipic.com/photo/shengwu/yulei/index.html"  # target design/photography gallery URL
root = "f:/Pyspider/pics/"  # change this to your own path

data1 = requests.get(url1).text
s1 = etree.HTML(data1)
# Detail-page link of every item in the gallery's result list
pics_href = s1.xpath("/html/body/div[@class='new-layout-width mbt-area clearfix layout-width']/div[@class='fl new-search-main']/div[@class='new-search-result overflow-hidden']/ul/li/a/@href")

os.makedirs(root, exist_ok=True)  # create the target folder once, parents included

i = 1
for pic_href in pics_href:
    try:
        url2 = pic_href
        data2 = requests.get(url2).text
        s2 = etree.HTML(data2)
        # The image element on the detail page has the id "J_worksImg"
        pic_list = s2.xpath("//*[@id='J_worksImg']/@src")
        pic_url = pic_list[0]
        path = root + pic_url.split('/')[-1]  # name the file after the last URL segment
        if not os.path.exists(path):
            r = requests.get(pic_url)
            with open(path, 'wb') as f:  # the with block closes the file by itself
                f.write(r.content)
            print('Image {} saved'.format(i))
            i += 1
        else:
            print('This image already exists')
    except Exception:
        print('Invalid image URL')
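
The loop above hands each extracted `href` and `src` value straight to `requests.get`. If the page delivers those links without a scheme (for example starting with `//`) or as relative paths, `requests` will refuse them. A minimal sketch of normalising such links with the standard library's `urllib.parse.urljoin` follows; the helper name `absolute_url` and the `/show/1.html` paths are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin


def absolute_url(page_url, link):
    """Resolve a link against the URL of the page it was found on.

    Absolute links are returned unchanged; protocol-relative ("//host/...")
    and relative links inherit the missing parts from page_url.
    """
    return urljoin(page_url, link)


# Gallery URL from the post; the link values below are made-up examples.
page = "http://www.nipic.com/photo/shengwu/yulei/index.html"

print(absolute_url(page, "//www.nipic.com/show/1.html"))
print(absolute_url(page, "/show/1.html"))
print(absolute_url(page, "https://www.nipic.com/show/1.html"))
```

In the crawler this would mean writing `url2 = absolute_url(url1, pic_href)` and `pic_url = absolute_url(url2, pic_list[0])`, which works whether or not the site includes the scheme.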

糊涂虫晓晓 · Posted 2021-3-24 20:24

Tested it. Every single image ends up in the "invalid image URL" error branch.

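LEOIRON · Posted 2021-4-9 11:01

糊涂虫晓晓 posted on 2021-3-24 20:24
Tested it. Every single image ends up in the "invalid image URL" error branch.

I had a look: the indentation of the last few lines got messed up when I copied the code 😂. It should work now that this is fixed. Also make sure the gallery URL is correct.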
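
When every image lands in the same error branch, the catch-all `except` in the loop hides the real cause: a bad URL, an XPath that matched nothing, and a failed write all produce the same message. A small sketch of one way to surface the cause, using a hypothetical helper `run_step` that is not part of the original script:

```python
def run_step(name, func, *args):
    """Call func(*args); on failure report which step failed and why.

    Returns the function's result, or None if it raised an exception.
    """
    try:
        return func(*args)
    except Exception as exc:
        print('{} failed: {}: {}'.format(name, type(exc).__name__, exc))
        return None


# An empty XPath result makes pic_list[0] raise IndexError:
first = run_step('extract image src', lambda items: items[0], [])
# prints: extract image src failed: IndexError: list index out of range
```

Wrapping the request, the XPath lookup, and the file write separately shows which of them is actually failing.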

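糊涂虫晓晓 · Posted 2022-2-17 15:38

Try this code. I modified yours slightly and it no longer raises errors. It only downloads the first page, though; there is no pagination.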

[Python]

import os

import requests
from lxml import html

url1 = "https://www.nipic.com/design/ziran/fengguang/index.html"  # target design/photography gallery URL
root = "f:/Pyspider/pics/"  # change this to your own path

data1 = requests.get(url1).text
etree = html.etree
s1 = etree.HTML(data1)
# Detail-page link of every item in the gallery's result list
pics_href = s1.xpath(
    "/html/body/div[@class='new-layout-width mbt-area clearfix layout-width']/div[@class='fl new-search-main']/div[@class='new-search-result overflow-hidden']/ul/li/a/@href")

os.makedirs(root, exist_ok=True)  # create the target folder once, parents included

i = 1
for pic_href in pics_href:
    try:
        # The extracted links carry no scheme, so prepend one
        url2 = "https:" + pic_href
        data2 = requests.get(url2).text
        s2 = etree.HTML(data2)
        pic_list = s2.xpath("//*[@id='J_worksImg']/@src")
        pic_url = pic_list[0]
        path = root + pic_url.split('/')[-1]  # name the file after the last URL segment
        if not os.path.exists(path):
            r = requests.get("https:" + pic_url)
            with open(path, 'wb') as f:  # the with block closes the file by itself
                f.write(r.content)
            print('Image {} saved'.format(i))
            i += 1
        else:
            print('This image already exists')
    except Exception:
        continue  # silently skip any item that fails
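
The version above only processes the first gallery page. One possible way to extend it is to generate a list of page URLs and run the same loop over each one. The sketch below is only that: it assumes the site selects a page through a query parameter called `page`, which is a guess that has not been verified against the site. Check the real pattern in the browser's address bar and adjust `param` (or the whole function) accordingly. The name `page_urls` is likewise hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(first_page_url, pages, param="page"):
    """Build URLs for pages 1..pages by setting one query parameter.

    ASSUMPTION: the site paginates via "?page=N". This is not confirmed
    by the thread and must be checked against the real site.
    """
    parts = urlsplit(first_page_url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    urls = []
    for n in range(1, pages + 1):
        query[param] = str(n)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


for url in page_urls("https://www.nipic.com/design/ziran/fengguang/index.html", 3):
    print(url)
```

Each generated URL would take the place of `url1`, with the existing extraction and download loop nested inside an outer loop over the pages.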
