emmm, let's make a scraper for WeChat official account images

cdsgg posted on 2020-12-2 21:22

This post was last edited by cdsgg on 2020-12-2 21:33

import datetime
import os
import re

import requests
from bs4 import BeautifulSoup

# Request headers: a mobile Safari user agent plus the cookie from the original post.
headers = {
    'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1',
    'cookie': 'tvfe_boss_uuid=4427f26b6d83d5d7; pgv_pvid=8192465356; pgv_pvi=2750494720; RK=cfw14pvSFY; ptcz=026939cd8bdd917551be81f3d0d2563bdb9e2d0805f4c83de8df0ea6af457e49; eas_sid=i1e690x1l8v2I68559J4e8K995; LW_sid=W1C6S0u1y8a2A6E864o8L480Z0; LW_uid=51H6V041L8i2n6Q8M4S8e4k0D0; uin_cookie=o0878530130; ied_qq=o0878530130; o_cookie=878530130; pac_uid=1_878530130; luin=o0878530130; lskey=000100000f95a236a0b3f6a309a1f6e4809612024104f9a476a9b0803995ce53ec225971d5d95f3164c7df7a; rewardsn=; wxtokenkey=777'}

# Matches an image URL and captures only the format named by wx_fmt (png, jpeg, gif, ...).
pat = r"https://.*?wx_fmt=(\w+)"

a = 0  # running counter, used as the file name of each saved image

while True:
    url = input("Enter url: ")
    curr_time = datetime.datetime.now()
    print(curr_time)

    # One folder per minute, named like 202012022122.
    path = curr_time.strftime('%Y%m%d%H%M')
    print(path)
    if os.path.exists(path):
        print("The folder for this timestamp already exists")
    else:
        os.mkdir(path)
        print("Folder created! Saving images")
    dirname = os.path.join(os.getcwd(), path)
    print(dirname)
    # with open(os.path.join(dirname, 'a.txt'), 'w') as f:
    #     f.write(url)

    req = requests.get(url=url, headers=headers).content.decode()
    soup = BeautifulSoup(req, 'lxml')

    for i in soup.find_all('img'):
        # The script reads the image address from the data-src attribute.
        imglist = i.get('data-src')
        print(imglist)
        for j in re.findall(pat, str(imglist)):
            print(j)
            try:
                # Download first so a failed request leaves no empty file behind.
                ig = requests.get(imglist, headers=headers).content
                with open(os.path.join(dirname, '%s.%s' % (a, j)), 'wb') as f:
                    f.write(ig)
                a = a + 1
            except Exception as e:
                print(e)
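
The script above takes the file extension from the wx_fmt parameter of each data-src address. As a rough sketch of the same step without a regular expression, the query string can be parsed with the standard library. The helper name image_format and the sample addresses in the usage note are made up for illustration and are not from the original post.

```python
from urllib.parse import parse_qs, urlparse


def image_format(src):
    """Return the wx_fmt value of an image address, or None if there is none.

    `src` is the data-src attribute of an <img> tag; it may be None when the
    tag has no such attribute.
    """
    if not src:
        return None
    # parse_qs maps each query parameter to a list of its values.
    values = parse_qs(urlparse(src).query).get('wx_fmt')
    return values[0] if values else None
```

For a hypothetical address such as https://mmbiz.qpic.cn/mmbiz_png/abc/640?wx_fmt=png&tp=webp this picks out only the wx_fmt value, so any parameters that follow it do not end up in the file name.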

Finished build on Lanzou Cloud: https://wwa.lanzouj.com/izsmEiztcaj

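jiguanlang posted on 2020-12-2 21:52

cdsgg posted on 2020-12-2 21:44
Sure, OK, send it over and I'll take a look

Link: https://pan.baidu.com/s/1nIeGbdx22O11uYlx-ZIHvA Extraction code: cmfv Copy this text and open the Baidu Netdisk mobile app for easier access

I only need to scrape these 8 columns for the rows whose location is Shandong (including the Bohai Sea and Yellow Sea areas). The start and end page numbers should be entered by the user. Save the result as Excel or CSV.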

cdsgg posted on 2020-12-2 22:01

jiguanlang posted on 2020-12-2 21:52
Link: https://pan.baidu.com/s/1nIeGbdx22O11uYlx-ZIHvA Extraction code: cmfv Copy this text and open the Baidu Netdisk mobile A ...

How about doing it over a remote session?

yzqhj posted on 2020-12-2 21:24

OP, how do you get the web URL of a WeChat article?

jiguanlang posted on 2020-12-2 21:36

Could you help me write a scraper?

cdsgg posted on 2020-12-2 21:37

jiguanlang posted on 2020-12-2 21:36
Could you help me write a scraper?

What kind?

jiguanlang posted on 2020-12-2 21:40

cdsgg posted on 2020-12-2 21:37
What kind?

https://www.52pojie.cn/thread-1318728-1-1.html
This bounty thread

cdsgg posted on 2020-12-2 21:41

jiguanlang posted on 2020-12-2 21:40
https://www.52pojie.cn/thread-1318728-1-1.html
This bounty thread

It's on an intranet = = how am I supposed to do that

jiguanlang posted on 2020-12-2 21:42

cdsgg posted on 2020-12-2 21:41
It's on an intranet = = how am I supposed to do that

Would it work if I provide the saved page source?

cdsgg posted on 2020-12-2 21:44

jiguanlang posted on 2020-12-2 21:42
Would it work if I provide the saved page source?

Sure, OK, send it over and I'll take a look

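jrzhao posted on 2020-12-2 21:56

For requests, consider using a session to speed things up. WeChat official account content sits under the same domain, so a session cuts the time spent opening and closing connections.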
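
A minimal sketch of that suggestion, assuming the image addresses and their extensions have already been collected as (url, extension) pairs. The names make_session and save_images, the start parameter, and the 10-second timeout are hypothetical choices for illustration, not part of the original script.

```python
import os


def make_session(headers):
    """Build one requests.Session that sends the same headers on every request."""
    # Imported here so save_images below can also be tried with a stub session.
    import requests

    session = requests.Session()
    session.headers.update(headers)
    return session


def save_images(session, image_urls, dirname, start=0):
    """Download (url, extension) pairs through one shared session.

    `session` only needs a requests-style get(url, timeout=...) method.
    Files are named start.ext, start+1.ext, ... in download order.
    Returns the number of files written.
    """
    saved = 0
    for url, ext in image_urls:
        try:
            resp = session.get(url, timeout=10)
            resp.raise_for_status()
        except Exception as e:
            # Skip the failed image and carry on with the rest.
            print(e)
            continue
        filename = os.path.join(dirname, '%d.%s' % (start + saved, ext))
        with open(filename, 'wb') as f:
            f.write(resp.content)
        saved += 1
    return saved
```

In the script from the first post this would replace the per-image requests.get call: create the session once with make_session(headers), then pass it to save_images together with the target folder. A Session keeps connections in a pool, so repeated requests to the same host can reuse an open connection.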


吾爱破解 - 52pojie.cn's Archiver
