吾爱破解 - 52pojie.cn

[Python Original] Scraping pixiv's (P站) daily illustration ranking with Python (second revision)

judgecx posted on 2020-7-26 15:01
Last edited by judgecx on 2020-7-26 19:05

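Don't ask why the site won't load: you need a proxy to get past the firewall.
Improved the naming of downloaded images: they are now numbered 1 to 500 by rank.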
import requests

for x in range(1, 11):
    # Ranking page x of the daily illustration ranking (50 entries per page)
    url = 'https://www.pixiv.net/ranking.php?mode=daily&content=illust&p=' + str(x)
    # Download the ranking page once per page, not once per image
    rg = requests.get(url, timeout=60)
    for i in range(0, 50):
        # Link to the artwork page of entry i
        in_url = 'https://www.pixiv.net' + rg.text.split('<div class="ranking-image-item"><a href="')[i + 1].split('"')[0]
        # The artwork page embeds the original-size image URL after "original":"
        rgi = requests.get(in_url, timeout=60)
        img_url = rgi.text.split('original":"')[1].split('"')[0]
        # i.pximg.net refuses image requests without a pixiv Referer, so send the artwork page
        user = {'Referer': in_url}
        rgid = requests.get(img_url, headers=user, timeout=60)
        # Save as <overall rank>.<extension>, i.e. 1.jpg ... 500.png
        img = rgid.content
        img_num = (x - 1) * 50 + i + 1
        with open('./' + str(img_num) + '.' + img_url.split('.')[-1], 'wb') as f:
            f.write(img)
        print(img_url)
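
The script above finds each entry by splitting the raw ranking HTML on `<div class="ranking-image-item"><a href="` and taking everything up to the next quote. A minimal sketch of the same extraction with a regular expression, which collects all links in one pass instead of re-splitting the page for every entry. The markup is assumed from the split strings in the script (not checked against the live page); the sample HTML, the second id and the `artwork_links`/`artwork_id` helpers are made up for illustration.

```python
import re

# Same marker the script splits on (assumed markup, taken from the script)
ENTRY_RE = re.compile(r'<div class="ranking-image-item"><a href="([^"]+)"')


def artwork_links(html):
    """Return absolute artwork URLs in page order (hypothetical helper)."""
    return ['https://www.pixiv.net' + path for path in ENTRY_RE.findall(html)]


def artwork_id(url):
    """The numeric illustration id is the part after /artworks/."""
    return url.split('/artworks/')[1]


# Made-up two-entry snippet standing in for a ranking page
sample = (
    '<div class="ranking-image-item"><a href="/artworks/83206041" class="work">'
    '<div class="ranking-image-item"><a href="/artworks/12345678" class="work">'
)
links = artwork_links(sample)
print(links)
print([artwork_id(u) for u in links])
```

`findall` returns an empty list when the marker is missing, so a layout change shows up as zero results rather than an `IndexError` halfway through a page.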

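The first version I posted was capped at 50 images; that limit is gone now, and the images are full-size originals.
@kof21411's version uses the ranking JSON API, but the images it saved were not full size (they came out smaller), so I'm posting this version.
This one scrapes the daily illustration ranking.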
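
Both scripts build the ranking URL by hand and number files by overall rank. A minimal sketch of the same URL built with `urllib.parse.urlencode`, plus the 1 to 500 numbering. The parameter names `mode`, `content`, `p` and `format` are the ones the scripts use, and `mode=daily&content=illust` is the daily illustration ranking; the `ranking_url` and `rank_to_position` helpers are hypothetical, and other `mode`/`content` values are not covered by this post.

```python
from urllib.parse import urlencode

BASE = 'https://www.pixiv.net/ranking.php'


def ranking_url(page, as_json=False):
    """Daily-illustration ranking URL for one page (hypothetical helper)."""
    params = {'mode': 'daily', 'content': 'illust', 'p': page}
    if as_json:
        # format=json is what the API version appends to get JSON instead of HTML
        params['format'] = 'json'
    return BASE + '?' + urlencode(params)


def rank_to_position(page, index):
    """Overall rank of entry `index` (0-based) on `page` (1-based), 50 per page."""
    return (page - 1) * 50 + index + 1


print(ranking_url(1))
print(ranking_url(2, as_json=True))
print(rank_to_position(10, 49))
```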

API version, contributed by @kof21411:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}

pages = 1
img_num = 0
while True:
    # Ranking page in JSON form (format=json), 50 entries per page
    url = 'https://www.pixiv.net/ranking.php?mode=daily&content=illust&p=%s&format=json' % pages
    rspJson = requests.get(url=url, headers=headers, timeout=60).json()
    # Past the last page the response carries an "error" key: stop there
    if 'error' in rspJson:
        break
    for content in rspJson['contents']:
        # Turn the 240x480 thumbnail URL into the original-image URL
        img_url = content['url']
        img_url = img_url.replace('c/240x480/img-master', 'img-original')
        img_url = img_url.replace('_master1200', '')
        # i.pximg.net only serves images with a pixiv Referer, so send the artwork page
        user = {
            'Referer': 'https://www.pixiv.net/artworks/' + str(content['illust_id'])
        }
        rgid = requests.get(img_url, headers=user, timeout=60)
        # Thumbnails are always .jpg but the original may be .png: retry on 404
        if rgid.status_code == 404 and img_url.endswith('.jpg'):
            img_url = img_url[:-4] + '.png'
            rgid = requests.get(img_url, headers=user, timeout=60)
        print(img_url)
        # Save as 1.jpg, 2.png, ... in ranking order
        img_type = img_url.split('.')[-1]
        img_num = img_num + 1
        with open('./' + str(img_num) + '.' + img_type, 'wb') as f:
            f.write(rgid.content)
    pages = pages + 1
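
The API version gets full-size images by rewriting the thumbnail URL from the JSON (`c/240x480/img-master` becomes `img-original`, and `_master1200` is dropped). The thumbnail always ends in `.jpg`, but an original can be a `.png`, as the `83206041_p0.png` URL in the traceback later in this thread shows. A minimal sketch that returns both candidate URLs to try in order; the `original_candidates` helper and the sample thumbnail URL (built from that traceback's path) are illustrative assumptions.

```python
def original_candidates(thumb_url):
    """Possible original-image URLs for a ranking thumbnail URL (hypothetical helper).

    Applies the same two replacements as the API script, then offers the
    URL as-is first and the other of .jpg/.png as a fallback.
    """
    url = thumb_url.replace('c/240x480/img-master', 'img-original')
    url = url.replace('_master1200', '')
    stem, _, ext = url.rpartition('.')
    candidates = [url]
    for alt in ('jpg', 'png'):
        if alt != ext:
            candidates.append(stem + '.' + alt)
    return candidates


thumb = ('https://i.pximg.net/c/240x480/img-master/img/2020/07/25/11/00/02/'
         '83206041_p0_master1200.jpg')
for url in original_candidates(thumb):
    print(url)
```

In the download loop, request each candidate with the Referer header and keep the first response that is not a 404.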

初见悲风 posted on 2020-7-26 18:00
After running for a while it threw an error. Could the OP take a look?
Traceback (most recent call last):
  File "D:\Python\install\lib\site-packages\urllib3\connectionpool.py", line 662, in urlopen
    self._prepare_proxy(conn)
  File "D:\Python\install\lib\site-packages\urllib3\connectionpool.py", line 948, in _prepare_proxy
    conn.connect()
  File "D:\Python\install\lib\site-packages\urllib3\connection.py", line 394, in connect
    ssl_context=context,
  File "D:\Python\install\lib\site-packages\urllib3\util\ssl_.py", line 370, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "D:\Python\install\lib\ssl.py", line 423, in wrap_socket
    session=session
  File "D:\Python\install\lib\ssl.py", line 870, in _create
    self.do_handshake()
  File "D:\Python\install\lib\ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
OSError: [Errno 0] Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Python\install\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "D:\Python\install\lib\site-packages\urllib3\connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "D:\Python\install\lib\site-packages\urllib3\util\retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='i.pximg.net', port=443): Max retries exceeded with url: /img-original/img/2020/07/25/11/00/02/83206041_p0.png (Caused by ProxyError('Cannot connect to proxy.', OSError(0, 'Error')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/Python/Program/爬取pixiv.py", line 12, in <module>
    rgid = requests.get(img_url,headers=user)
  File "D:\Python\install\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "D:\Python\install\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "D:\Python\install\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\Python\install\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "D:\Python\install\lib\site-packages\requests\adapters.py", line 510, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='i.pximg.net', port=443): Max retries exceeded with url: /img-original/img/2020/07/25/11/00/02/83206041_p0.png (Caused by ProxyError('Cannot connect to proxy.', OSError(0, 'Error')))
OP | judgecx posted on 2020-7-26 19:04
初见悲风 posted on 2020-7-26 18:00
After running for a while it threw an error. Could the OP take a look?
Traceback (most recent call last):
  File "D:\Python\install\lib\ ...

I've improved it; copy it again and rerun. There is now both an API version and a page-link version.
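
The traceback in this exchange ends in `requests.exceptions.ProxyError` ("Cannot connect to proxy") while fetching from `i.pximg.net`: the local proxy dropped the connection and the script gave up. A minimal sketch of a `requests.Session` that retries failed connections with exponential backoff through urllib3's `Retry`; the retry count, backoff factor and status list are assumptions, and this only helps with intermittent failures, not with a proxy that is down.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=5, backoff=1.0):
    """Session that retries connection errors (proxy errors included) with backoff.

    retries/backoff are illustrative defaults, not values from the thread.
    """
    retry = Retry(
        total=retries,
        backoff_factor=backoff,                      # waits grow exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # also retry on these HTTP statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    return session


session = make_session()
# In the scripts, use session.get(...) in place of requests.get(...), e.g.:
# rgid = session.get(img_url, headers=user, timeout=60)
```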
马可菠萝蜜 posted on 2020-7-26 15:13
The moment I saw P站 (pixiv) I rolled right in, no questions asked.

Er..... it's all just lovely illustrations.
OP | judgecx posted on 2020-7-26 15:15
马可菠萝蜜 posted on 2020-7-26 15:13
The moment I saw P站 (pixiv) I rolled right in, no questions asked.

Er..... it's all just lovely illustrations.

Haha, I get it, I get it.
luzhiyao posted on 2020-7-26 15:23
Could you post usage instructions? I have an idle server with a big hard drive that would be perfect for this.
OP | judgecx posted on 2020-7-26 15:34
luzhiyao posted on 2020-7-26 15:23
Could you post usage instructions? I have an idle server with a big hard drive that would be perfect for this.

Just run it through a proxy and you're done; there's nothing complicated about it.
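
"Run it through a proxy" in requests terms: a minimal sketch of pointing the scripts at a local proxy through `Session.proxies`. The address `http://127.0.0.1:7890` and the `proxied_session` helper are hypothetical placeholders; use the host and port your own proxy client listens on.

```python
import requests

# Hypothetical local proxy address: replace with your proxy client's host:port
PROXY = 'http://127.0.0.1:7890'


def proxied_session(proxy=PROXY):
    """Session whose HTTP and HTTPS traffic goes through `proxy`."""
    session = requests.Session()
    session.proxies.update({'http': proxy, 'https': proxy})
    return session


session = proxied_session()
# Use session.get(...) wherever the scripts call requests.get(...), e.g.
# rsp = session.get(url, headers=headers, timeout=60)
# A one-off request can pass the mapping directly instead:
# requests.get(url, proxies={'https': PROXY}, timeout=60)
```

Alternatively, leave the code unchanged and set the `HTTPS_PROXY`/`HTTP_PROXY` environment variables before running; requests reads them by default.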
ericcch posted on 2020-7-26 15:40
Nice, nice, just what I needed to learn from.
龗魂 posted on 2020-7-26 15:47
I think it's better to save the images as originals.
烟火小兽 posted on 2020-7-26 15:52
马可菠萝蜜 posted on 2020-7-26 15:13
The moment I saw P站 (pixiv) I rolled right in, no questions asked.

Er..... it's all just lovely illustrations.

Well well, I rolled in too.
zyhxhw posted on 2020-7-26 16:02
Why won't the site open?
zhou0v0 posted on 2020-7-26 16:02
Nice, nice. Bookmarked.