# Python crawler script (forum paste header removed)
"""爬取彼岸桌面美女系列指定页图片 — download wallpapers from one listing page of netbian.com.

The user is prompted for a page number ``k``; every detail page linked from
``/meinv/index_{k}.htm`` is visited and its preview image saved to
``4k壁纸{k}/pic_{n}.jpg``.
"""
import os
import re
import time

import requests
from bs4 import BeautifulSoup

# Spoofed desktop UA; the site serves GBK-encoded HTML.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 Edg/92.0.902.78",
    "Connection": "close",
}
BASE_URL = 'http://www.netbian.com'


def _fetch_html(url):
    """GET *url* and return its body decoded as GBK (the site's encoding)."""
    resp = requests.get(url, headers=HEADERS, timeout=10)
    resp.encoding = 'gbk'
    try:
        return resp.text
    finally:
        resp.close()


def main():
    """Prompt for a page number, then download every wallpaper on that page."""
    k = input("你要爬取第几页")
    html = _fetch_html(f'{BASE_URL}/meinv/index_{k}.htm')

    # 从首页中获取子页面的路径 (extract detail-page paths from the listing page).
    # NOTE(review): a regex is fragile for HTML — BeautifulSoup is already
    # imported and could do this; the regex is kept as it matches the site today.
    paths = re.findall(r'<a href="(.*?)" title=".*?" target="_blank">', html)
    # The first match is a banner/ad link and the last is pagination — drop both.
    # Iterating the trimmed list (instead of a hard-coded range(19)) copes with
    # pages that list more or fewer thumbnails.
    paths = paths[1:-1]

    out_dir = '4k壁纸%s' % k
    os.makedirs(out_dir, exist_ok=True)  # create once, up front

    n = 1  # running picture counter used in the output file names
    for path in paths:
        child_page = BeautifulSoup(_fetch_html(BASE_URL + path), "html.parser")
        pic_div = child_page.find("div", attrs={"class": "pic"})
        if pic_div is None:  # detail page without the expected container
            continue
        for img in pic_div.find_all("img"):
            src = img.get("src")  # 获取下载链接 (image download URL)
            tu = requests.get(src, headers=HEADERS, timeout=10)
            try:
                # `with` guarantees the file handle is closed (the original
                # leaked it); read .content BEFORE closing the response.
                with open(os.path.join(out_dir, "pic_%s.jpg" % n), mode="wb") as f:
                    f.write(tu.content)
            finally:
                tu.close()
            time.sleep(1)  # be polite to the server
            print("下载了%s张壁纸" % n)
            n += 1


if __name__ == "__main__":
    main()
# Author's note: my first crawler as a beginner — feedback on shortcomings is welcome, thanks.