[Python repost] Batch-scraping WeChat Official Account articles and downloading their audio and video

susheng · posted on 2022-6-18 21:27

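I previously posted a thread that scraped the 10 most-read 2021 articles from the 吾爱破解 (52pojie) forum's official account: https://www.52pojie.cn/thread-1590044-1-1.html. This time the script skips the read counts and instead batch-downloads the account's articles together with their audio and video. Here is the code: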
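import html
import re
import time

import requests
import urllib3

# verify=False is used below; silence the resulting InsecureRequestWarning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Assumption: the post never shows `headers`; a browser-like User-Agent is assumed to be enough
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0 Safari/537.36'}

def trimName(name):
    # Assumed helper (also not shown in the post): replace characters Windows forbids in file names
    return re.sub(r'[\\/:*?"<>|\r\n]+', '_', name).strip()

def video(res, headers, date):
    # The video id in the page looks like "wxv_" followed by 19 characters
    vid = re.search(r'wxv_.{19}', res.text)
    if vid:
        vid = vid.group(0)
        print('Video id', vid)
        url = f'https://mp.weixin.qq.com/mp/videoplayer?action=get_mp_video_play_url&preview=0&vid={vid}'
        data = requests.get(url, headers=headers, timeout=10).json()
        video_url = data['url_info'][0]['url']  # take the first stream listed
        video_data = requests.get(video_url, headers=headers, timeout=60)
        name = trimName(data['title'])
        print('Downloading video: ' + name + '.mp4')
        with open(date + '___' + name + '.mp4', 'wb') as f:
            f.write(video_data.content)

def audio(res, headers, date, title):
    # An article may embed several voice clips; number the files 1, 2, ...
    aids = re.findall(r'"voice_id":"(.*?)"', res.text)
    time.sleep(2)  # pause once per article to avoid hammering the server
    for tmp, voice_id in enumerate(aids, start=1):
        url = f'https://res.wx.qq.com/voice/getvoice?mediaid={voice_id}'
        audio_data = requests.get(url, headers=headers, timeout=30)
        name = date + '___' + trimName(title) + '___' + str(tmp) + '.mp3'
        print('Downloading audio: ' + name)
        with open(name, 'wb') as f5:
            f5.write(audio_data.content)

url = input('Enter the article URL: ')
response = requests.get(url, headers=headers)
# Batch mode: every article the input article links to (e.g. an index post) is downloaded too
urls = re.findall(r'<a target="_blank" href="(https?://mp\.weixin\.qq\.com/s\?.*?)"', response.text)
urls.append(url)
print('Total articles', len(urls))

for n, mp_url in enumerate(urls, start=1):
    # hrefs in the page are HTML-escaped (&amp;), hence html.unescape
    res = requests.get(html.unescape(mp_url), proxies={'http': None, 'https': None},
                       verify=False, headers=headers)
    # Make lazy-loaded images work offline; fix protocol-relative links without
    # touching ones that already have a scheme (a plain str.replace would produce "https:https://")
    content = res.text.replace('data-src', 'src')
    content = re.sub(r'(?<!:)//res\.wx\.qq\.com', 'https://res.wx.qq.com', content)
    try:
        title = re.search(r"var msg_title = '(.*?)'", content).group(1)
        ct = re.search(r'var ct = "(\d+)";', content).group(1)  # publish time, Unix seconds
        date = time.strftime('%Y-%m-%d', time.localtime(int(ct)))
        print(date, title)
        audio(res, headers, date, title)
        video(res, headers, date)
        with open(date + '_' + trimName(title) + '.html', 'w', encoding='utf-8') as f:
            f.write(content)
    except Exception as err:
        # Parsing or a download failed: keep the raw HTML under a numbered (not random) name
        print('Failed:', mp_url, err)
        with open(f'failed_{n}.html', 'w', encoding='utf-8') as f:
            f.write(content)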
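To see what each regular expression in the script pulls out of a page, here is a minimal offline sketch that bundles them into one hypothetical `parse_article` function and runs it on a made-up snippet (not a real WeChat page). It assumes dates should be shown in China Standard Time (UTC+8) rather than the machine's local zone that `time.localtime` uses.

```python
import html
import re
from datetime import datetime, timedelta, timezone

# Assumption: show dates in China Standard Time (UTC+8) instead of the machine's local zone
CST = timezone(timedelta(hours=8))

# Same link pattern the script uses to find other articles linked from the input article
LINK_RE = re.compile(r'<a target="_blank" href="(https?://mp\.weixin\.qq\.com/s\?.*?)"')


def parse_article(page: str) -> dict:
    """Hypothetical helper: collect every field the script extracts from one article page."""
    title = re.search(r"var msg_title = '(.*?)'", page)
    ct = re.search(r'var ct = "(\d+)";', page)
    vid = re.search(r'wxv_.{19}', page)
    return {
        'title': title.group(1) if title else None,
        'date': datetime.fromtimestamp(int(ct.group(1)), CST).strftime('%Y-%m-%d') if ct else None,
        'voice_ids': re.findall(r'"voice_id":"(.*?)"', page),
        'video_id': vid.group(0) if vid else None,
        # hrefs are HTML-escaped in the page (&amp;), so unescape them before requesting
        'links': [html.unescape(u) for u in LINK_RE.findall(page)],
    }


# Made-up snippet for illustration only, not copied from a real article
sample = '''
var msg_title = 'Demo title'.html(false);
var ct = "1655558820";
{"voice_id":"MzA_voice1"},{"voice_id":"MzA_voice2"}
vid: "wxv_2446758364919250944",
<a target="_blank" href="https://mp.weixin.qq.com/s?__biz=abc&amp;mid=1">next</a>
'''

info = parse_article(sample)
print(info['date'], info['title'], info['video_id'])
# → 2022-06-18 Demo title wxv_2446758364919250944
```

A fixed UTC+8 zone keeps file names independent of where the script runs. Note that `wxv_.{19}` grabs any 19 characters after the prefix, so it only works while video ids keep exactly that length.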
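Result: the script saves the downloaded audio and video files in the current directory; the saved article HTML files can then be converted to PDF with Python.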
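The post leaves the HTML-to-PDF step open. One possible route is sketched below, assuming the third-party WeasyPrint package is installed (`pip install weasyprint`); pdfkit, a wrapper around wkhtmltopdf, is an alternative. The helper names are hypothetical. WeasyPrint runs no JavaScript, and the sketch assumes the article body may be hidden with inline `visibility: hidden` until the page's scripts run, so it un-hides it first.

```python
from pathlib import Path


def pdf_path_for(html_file: Path) -> Path:
    """2022-06-18_Title.html -> 2022-06-18_Title.pdf in the same folder."""
    return html_file.with_suffix('.pdf')


def unhide(markup: str) -> str:
    # Assumption: content hidden until page JavaScript runs would otherwise render blank,
    # because WeasyPrint executes no JavaScript
    return markup.replace('visibility: hidden', 'visibility: visible')


def convert_all(folder: str = '.') -> list:
    from weasyprint import HTML  # third-party, assumed installed: pip install weasyprint
    done = []
    for f in sorted(Path(folder).glob('*.html')):
        markup = unhide(f.read_text(encoding='utf-8'))
        # Remote images are downloaded while rendering, so this step needs network access
        HTML(string=markup).write_pdf(str(pdf_path_for(f)))
        done.append(pdf_path_for(f))
    return done
```

Run `convert_all('.')` in the folder the download script wrote to; it returns the paths of the PDFs it created. The WeasyPrint import sits inside the function so the helpers can be used without it.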

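susheng · posted on 2022-9-10 08:15

Quoting 古月2004 (posted on 2022-9-8 23:26):
I'm using the wechat_down program you provided. It can't recognize the list contents, and I want to download the audio files in it.

Use this new one: https://wwn.lanzouf.com/idEy207moo9c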

lucool · posted on 2023-9-15 22:41

Why can't I batch-download? The videos won't download either:
https://mp.weixin.qq.com/s?__biz=MzAxNzQ4MjU3MA==&mid=2247526987&idx=1&sn=aebc5021d9127d78908175cf5321c192&chksm=9be6fc6eac917578efff36005f7e2d801e6baa184f787ca3e518a7908a7cf9553f1e1a035c55&scene=132&exptype=timeline_recommend_article_extendread_samebiz#wechat_redirect

冬天冷了多穿点 · posted on 2022-6-18 23:32

Looks good, thanks for sharing.

cnljm · posted on 2022-6-18 23:58

Thanks for sharing. Is there a ready-to-use version?

snow城 · posted on 2022-6-19 00:23

Thanks for sharing. A ready-to-use version would be even better.

KatharsisKing · posted on 2022-6-19 00:31

Note: this post was hidden automatically because the author is banned or the content was deleted.

潋天堂 · posted on 2022-6-19 03:36

Can you also scrape images from mini programs?

a3322a · posted on 2022-6-19 07:11

Thanks for sharing, I'll give it a test.

飘浮 · posted on 2022-6-19 07:25

Could the OP release a ready-to-use version?

alfriend · posted on 2022-6-19 08:14

Thanks for sharing.

elroy23 · posted on 2022-6-19 08:26

Being able to choose what to download would make it perfect.
