Scraping 4K wallpapers from the 彼岸桌面 (netbian.com) "meinv" series with Python

Riors7 posted on 2021-8-29 11:40

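import os
import re
import time
import requests
from bs4 import BeautifulSoup

n = 1  # running count of downloaded images
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 Edg/92.0.902.78",
    "Connection": "close",
}
k = input("Which page do you want to scrape? ")
url1 = f'http://www.netbian.com/meinv/index_{k}.htm'
url2 = 'http://www.netbian.com'

resp = requests.get(url1, headers=headers)
resp.encoding = 'gbk'
html = resp.text
main_page = BeautifulSoup(html, "html.parser")
# Not used below, but fails early if the list container is missing
alist = main_page.find("div", attrs={"class": "list"}).find_all("a", attrs={"target": "_blank"})

# Collect the detail-page paths from the list page
url = re.findall('<a href="(.*?)" title=".*?" target="_blank">', html)
# The post shows "del(url)" here, which would delete the whole list.
# The index was presumably lost in forum formatting; dropping the
# first match is an assumption.
del url[0]
url.pop()  # drop the last match

folder = './4k壁纸%s' % k  # folder name means "4K wallpapers"
os.makedirs(folder, exist_ok=True)

# The post loops over range(19); iterating the list also copes with shorter pages
for path in url:
    url3 = url2 + path
    resp2 = requests.get(url3, headers=headers)
    resp2.encoding = 'gbk'
    child_page = BeautifulSoup(resp2.text, "html.parser")
    clist = child_page.find("div", attrs={"class": "pic"}).find_all("img")
    for q in clist:
        q1 = q.get("src")  # image download link
        tu = requests.get(q1, headers=headers)
        with open(folder + "/pic_%s.jpg" % n, mode="wb") as f:
            f.write(tu.content)
        tu.close()
        time.sleep(1)  # pause between downloads
        print("Downloaded %s wallpapers" % n)
        n += 1

This scrapes the images from a chosen page of the 彼岸桌面 "meinv" series.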
This is the first crawler I have written as a beginner. Please point out anything that could be improved, thanks.
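
The link-extraction step above can be tried offline. This is a minimal sketch that assumes the list page contains anchors of the shape the regex expects; `SAMPLE_HTML`, `LINK_PATTERN`, and `detail_urls` are hypothetical names made up for illustration, and the real page markup may differ.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.netbian.com"

# Hypothetical list-page fragment, not copied from the real site.
SAMPLE_HTML = (
    '<div class="list"><ul>'
    '<li><a href="/desk/1001.htm" title="sample one" target="_blank">one</a></li>'
    '<li><a href="/desk/1002.htm" title="sample two" target="_blank">two</a></li>'
    '</ul></div>'
)

# Same pattern as in the post: capture the href of each matching anchor.
LINK_PATTERN = re.compile(r'<a href="(.*?)" title=".*?" target="_blank">')


def detail_urls(html, base=BASE_URL):
    """Return absolute detail-page URLs for every anchor the pattern matches."""
    return [urljoin(base, path) for path in LINK_PATTERN.findall(html)]


print(detail_urls(SAMPLE_HTML))
```

Using `urljoin` instead of string concatenation keeps the result valid whether the captured path is relative or already absolute.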

lihu5841314 posted on 2021-8-29 14:51

It looks like this cannot actually fetch the 4K versions, only the regular-resolution images.
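
One way to check this is to read the pixel dimensions from the downloaded bytes. The sketch below parses the JPEG frame header using only the standard library. `jpeg_size` and `is_4k` are hypothetical helpers, and treating 3840x2160 as the 4K threshold is an assumption.

```python
import struct


def jpeg_size(data):
    """Return (width, height) from JPEG bytes, or None if no frame header is found."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD9:  # markers without a length field
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF markers carry the frame size; C4, C8 and CC are not frame headers
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            if pos + 9 > len(data):
                return None
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length  # skip this segment
    return None


def is_4k(data, min_width=3840, min_height=2160):
    """True if the image is at least min_width x min_height (assumed 4K threshold)."""
    size = jpeg_size(data)
    return size is not None and size[0] >= min_width and size[1] >= min_height


# Synthetic header for demonstration only: SOI, an APP0 stub, then an SOF0 segment.
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 4) + b"\x00\x00"
    + b"\xff\xc0" + struct.pack(">HBHHB", 11, 8, 2160, 3840, 1) + b"\x01\x11\x00"
)
print(jpeg_size(sample), is_4k(sample))
```

In the script this could be called as `is_4k(tu.content)` right after each download.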

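sikro posted on 2021-8-29 13:43

There is a small bug: entering 1, or a very large number, as the page number raises an error. I have modified the script for you.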

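import os
import re
import time
import requests
from bs4 import BeautifulSoup

n = 1  # running count of downloaded images
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 Edg/92.0.902.78",
    "Connection": "close",
}
url2 = 'http://www.netbian.com'

while True:
    k = input("Which page do you want to scrape? ")
    # Page 1 has no numeric suffix; later pages use index_<k>.htm
    if int(k) > 1:
        url1 = f'http://www.netbian.com/meinv/index_{k}.htm'
    else:
        url1 = 'http://www.netbian.com/meinv/index.htm'
    try:
        resp = requests.get(url1, headers=headers)
        resp.encoding = 'gbk'
        html = resp.text
        main_page = BeautifulSoup(html, "html.parser")
        # Raises AttributeError if the page has no div.list, which is
        # presumably what happens when the page number is out of range
        alist = main_page.find("div", attrs={"class": "list"}).find_all("a", attrs={"target": "_blank"})
        break
    except Exception:
        print("Page number too large, please enter it again:")

# Collect the detail-page paths from the list page
url = re.findall('<a href="(.*?)" title=".*?" target="_blank">', html)
# The post shows "del(url)" here, which would delete the whole list.
# The index was presumably lost in forum formatting; dropping the
# first match is an assumption.
del url[0]
url.pop()  # drop the last match

folder = './4k壁纸%s' % k  # folder name means "4K wallpapers"
os.makedirs(folder, exist_ok=True)

# The post loops over range(19); iterating the list also copes with shorter pages
for path in url:
    url3 = url2 + path
    resp2 = requests.get(url3, headers=headers)
    resp2.encoding = 'gbk'
    child_page = BeautifulSoup(resp2.text, "html.parser")
    clist = child_page.find("div", attrs={"class": "pic"}).find_all("img")
    for q in clist:
        q1 = q.get("src")  # image download link
        tu = requests.get(q1, headers=headers)
        with open(folder + "/pic_%s.jpg" % n, mode="wb") as f:
            f.write(tu.content)
        tu.close()
        time.sleep(1)  # pause between downloads
        print("Downloaded %s wallpapers" % n)
        n += 1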
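
The page-number handling above can be isolated into a small function that is easy to test without touching the network. This is only a sketch: `list_page_url` is a hypothetical helper, and rejecting values below 1 is a design choice that differs from the script, which maps them to the first page.

```python
BASE_URL = "http://www.netbian.com/meinv"


def list_page_url(page):
    """Return the list-page URL for a 1-based page number."""
    page = int(page)  # raises ValueError for non-numeric input
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if page == 1:
        # The first page has no numeric suffix
        return f"{BASE_URL}/index.htm"
    return f"{BASE_URL}/index_{page}.htm"


print(list_page_url("1"))
print(list_page_url("3"))
```

Converting and validating the input in one place means bad input fails before any request is sent. It also stops the "page number too large" message from covering unrelated errors.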

orb001 posted on 2021-8-29 12:29

Thanks for sharing.

HUHU666 posted on 2021-8-29 12:38

How do I use this? I have already installed anaconda and python!

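南归不NG posted on 2021-8-29 13:18

HUHU666 posted on 2021-8-29 12:38
How do I use this? I have already installed anaconda and python!

Install the third-party packages that the script imports with pip install (you can search Baidu for how to install third-party packages with pip).
Then run python xx.py in cmd, where xx is the name of your code file.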
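
To see which packages still need installing, a small check like the one below can be run first. It is a sketch using only the standard library. `REQUIRED` and `missing_packages` are hypothetical names, and the mapping covers the two third-party imports in the script (`bs4` is installed under the pip name `beautifulsoup4`).

```python
import importlib.util

# import name -> pip package name
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4"}


def missing_packages(required=REQUIRED):
    """Return the pip names of modules that cannot be found in this environment."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Run: pip install " + " ".join(missing))
else:
    print("All third-party packages are installed.")
```

Save it under any file name and run it with python from cmd, in the same way as the crawler itself.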

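HUHU666 posted on 2021-8-29 13:50

南归不NG posted on 2021-8-29 13:18
Install the third-party packages that the script imports with pip install (you can search Baidu for how to install third-party packages with pip).
Then run python xx.py in cmd (xx ...

I still can't get it to work. Could you please add the specific steps? Also, I have anaconda and python installed, but when I save the file as **.py the icon is white, not the Python icon~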

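sikro posted on 2021-8-29 14:00

HUHU666 posted on 2021-8-29 13:50
I still can't get it to work. Could you please add the specific steps? Also, I have anaconda and python installed, but when I save the file as **.py, the icon ...

Ignore the icon. Just run python **.py in a terminal.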

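Riors7 posted on 2021-8-29 14:28

sikro posted on 2021-8-29 13:43
There is a small bug: entering 1, or a very large number, as the page number raises an error. I have modified the script for you.

import os

Thank you very much!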

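Riors7 posted on 2021-8-29 14:30

sikro posted on 2021-8-29 13:43
There is a small bug: entering 1, or a very large number, as the page number raises an error. I have modified the script for you.

import os

Thank you very much!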


吾爱破解 - 52pojie.cn's Archiver
