吾爱破解 - 52pojie.cn

[Python, repost] Scraping images from the Nipic (昵图网) design/photography galleries

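LEOIRON posted on 2021-3-21 13:10
Last edited by LEOIRON on 2021-4-9 12:22

I'm a beginner and wrote this small crawler by following existing examples. It downloads photos from a Nipic gallery page. The downloaded images are watermarked.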

[Python]
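import os

import requests
from lxml import etree

# URL of the target design/photography gallery page
url1 = "http://www.nipic.com/photo/shengwu/yulei/index.html"
root = "f://Pyspider//pics//"  # change this to your own directory

data1 = requests.get(url1).text
s1 = etree.HTML(data1)
# Links to the detail page of every picture listed in the gallery
pics_href = s1.xpath("/html/body/div[@class='new-layout-width mbt-area clearfix layout-width']/div[@class='fl new-search-main']/div[@class='new-search-result overflow-hidden']/ul/li/a/@href")

# Create the download directory once, before the loop
os.makedirs(root, exist_ok=True)

i = 1
for pic_href in pics_href:
    try:
        # Open the detail page and read the src of the preview image
        url2 = pic_href
        data2 = requests.get(url2).text
        s2 = etree.HTML(data2)
        pic_list = s2.xpath("//*[@id='J_worksImg']/@src")
        pic_url = pic_list[0]
        # Name the local file after the last segment of the image URL
        path = root + pic_url.split('/')[-1]
        if not os.path.exists(path):
            r = requests.get(pic_url)
            with open(path, 'wb') as f:  # the with block closes the file
                f.write(r.content)
            print('Image {} saved'.format(i))
            i += 1
        else:
            print('This image already exists')
    except Exception as e:
        # Report the cause instead of hiding it behind a bare except
        print('Image URL error:', e)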
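
The script builds the save path by string concatenation and takes the file name from the last `/`-separated piece of the image URL. A minimal sketch of the same step using only the standard library is below. `local_path_for` and `ensure_dir` are hypothetical helper names, not part of the original script, and the example URLs in the comments are placeholders.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit


def local_path_for(pic_url: str, root: str) -> Path:
    """Build the local save path for an image URL (hypothetical helper)."""
    # Use only the path part of the URL, so a query string such as
    # "?x=1" does not end up in the file name.
    name = PurePosixPath(urlsplit(pic_url).path).name
    if not name:
        raise ValueError("no file name in URL: {!r}".format(pic_url))
    return Path(root) / name


def ensure_dir(root: str) -> None:
    """Create the download directory if it is missing (hypothetical helper)."""
    # Unlike os.mkdir, this also creates missing parent directories
    # and does not fail when the directory already exists.
    Path(root).mkdir(parents=True, exist_ok=True)
```

In the loop this would replace the `root + pic_url.split('/')[-1]` line and the `os.path.exists(root)` check. Whether the site's image URLs ever carry a query string has not been checked here.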

[Screenshot: open a gallery page like this one and copy its URL]

[Screenshot: the script running]


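糊涂虫晓晓 posted on 2021-3-24 20:24
I tested it: every picture fails with "Image URL error" (图片地址异常).

OP | LEOIRON posted on 2021-4-9 11:01
Quoting 糊涂虫晓晓, posted on 2021-3-24 20:24:
I tested it: every picture fails with "Image URL error".

I took a look. The indentation at the end of the code got broken when I copied it 😂. It should work now that I've fixed it. Also make sure the gallery URL is correct.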
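
If every link still fails after the indentation fix, one thing worth checking is the form of the extracted links. This is an assumption, not something confirmed for the page at the time of posting: if the `href` and `src` values are protocol-relative (they start with `//`), `requests` cannot fetch them as-is because they have no scheme. A minimal sketch with `urllib.parse.urljoin`, which resolves such links against the page URL, is below. `absolute_url` is a hypothetical helper and the `/show/0000000.html` paths are placeholders.

```python
from urllib.parse import urljoin


def absolute_url(page_url: str, href: str) -> str:
    """Resolve an href found on page_url into a full URL (hypothetical helper)."""
    return urljoin(page_url, href.strip())


page = "https://www.nipic.com/photo/shengwu/yulei/index.html"

# Placeholder hrefs for illustration; not taken from the live site.
print(absolute_url(page, "//www.nipic.com/show/0000000.html"))        # protocol-relative
print(absolute_url(page, "/show/0000000.html"))                       # site-relative
print(absolute_url(page, "https://www.nipic.com/show/0000000.html"))  # already absolute
```

All three calls print `https://www.nipic.com/show/0000000.html`, so the same helper works whichever form the page uses.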
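糊涂虫晓晓 posted on 2022-2-17 15:38
Try this code. I modified yours slightly and it no longer raises errors. It only downloads the first page, because it has no pagination.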
[Python]
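import os

import requests
from lxml import html

# URL of the target design/photography gallery page
url1 = "https://www.nipic.com/design/ziran/fengguang/index.html"
root = "f:/Pyspider/pics/"  # change this to your own directory

data1 = requests.get(url1).text
etree = html.etree
s1 = etree.HTML(data1)
# Links to the detail page of every picture on the first gallery page
pics_href = s1.xpath(
    "/html/body/div[@class='new-layout-width mbt-area clearfix layout-width']/div[@class='fl new-search-main']/div[@class='new-search-result overflow-hidden']/ul/li/a/@href")

# Create the download directory once, before the loop
os.makedirs(root, exist_ok=True)

i = 1
for pic_href in pics_href:
    try:
        # Prepend the scheme, which the extracted href does not include
        url2 = "https:" + pic_href
        data2 = requests.get(url2).text
        s2 = etree.HTML(data2)
        pic_list = s2.xpath("//*[@id='J_worksImg']/@src")
        pic_url = pic_list[0]
        # Name the local file after the last segment of the image URL
        path = root + pic_url.split('/')[-1]
        if not os.path.exists(path):
            # The image src also lacks a scheme
            r = requests.get("https:" + pic_url)
            with open(path, 'wb') as f:  # the with block closes the file
                f.write(r.content)
            print('Image {} saved'.format(i))
            i += 1
        else:
            print('This image already exists')
    except Exception:
        # Skip this picture on any error and move on to the next one
        continue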
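
The code above stops after the first gallery page. One way pagination could be added is sketched below, with loud assumptions: the thread does not say how Nipic numbers its list pages, so the page-URL rule is passed in as a function rather than hard-coded. `iter_gallery_links`, `page_url_for` and `fetch_links` are all hypothetical names. `fetch_links` stands for the existing "download the list page and run the XPath" step.

```python
from typing import Callable, Iterator, List


def iter_gallery_links(
    page_url_for: Callable[[int], str],
    fetch_links: Callable[[str], List[str]],
    max_pages: int = 5,
) -> Iterator[str]:
    """Yield detail-page links from consecutive list pages.

    page_url_for maps a page number (starting at 1) to a list-page URL.
    fetch_links downloads that page and returns its detail links.
    Both are hypothetical hooks supplied by the caller.
    """
    seen = set()
    for page_no in range(1, max_pages + 1):
        found_new = False
        for link in fetch_links(page_url_for(page_no)):
            if link in seen:
                continue  # skip links already yielded earlier
            seen.add(link)
            found_new = True
            yield link
        if not found_new:
            # A page with no new links is treated as the end of the gallery.
            break


# Example wiring with a made-up URL pattern (replace it with the real one):
# for pic_href in iter_gallery_links(
#         lambda n: "https://example.com/gallery/index.html?page={}".format(n),
#         fetch_links):
#     ...  # existing per-picture download code
```

`max_pages` caps the number of requests, so a wrong URL pattern cannot loop indefinitely. The early `break` covers sites that return an empty or repeated page past the last one.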