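Qingting FM audiobook batch downloader with account login: original source code

天空宫阙 posted on 2020-1-20 14:56

Last edited by 天空宫阙 on 2020-8-31 12:46

Target site: https://www.qingting.fm/

This is a hands-on Python scraping exercise. I worked against the PC (desktop web) site; the mobile side may expose more convenient endpoints, and discussion is welcome. Since this is only practice, I simply captured the desktop site's traffic.

Main topics: 1. logging in with a POST request; 2. basic use of the HMAC-MD5 algorithm.

Login in this example is very simple: the form fields are posted as-is, with no encryption and no unknown parameters.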

Python implementation. Note that this is a method of a class, so it is incomplete and cannot be run on its own:

def login(self, user_id, password):
    data = {
        'account_type': '5',
        'device_id': 'web',
        'user_id': user_id,
        'password': password
    }
    response = self.session.post(self.login_url, data=data)
    if response.status_code == 200:
        temp = response.json()
        errorno = temp['errorno']
        errormsg = temp['errormsg']
        if errorno == 0:
            print('login successful!','登录成功！')
            data = temp['data']
            self.qingting_id = data['qingting_id']
            self.access_token = data['access_token']
        else:
            print('Login failed','登录失败')
            print(errormsg)

After a successful login we keep access_token and qingting_id. Together they act as the logged-in marker and, if the account is a member, as the membership marker.
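
The response handling can also be pulled out into a small pure function, which makes it easy to try without touching the network. This is only a sketch: the names extract_credentials and LoginError are hypothetical, the sample payload is made up, and the field names (errorno, errormsg, data, qingting_id, access_token) are the ones used in the login method.

```python
class LoginError(Exception):
    """Raised when the login response reports a non-zero errorno (hypothetical type)."""


def extract_credentials(payload):
    """Return (qingting_id, access_token) from a decoded login response.

    `payload` is the dict produced by response.json(); the key names follow
    the login method in the post.
    """
    if payload.get('errorno') != 0:
        # Surface the server's message instead of silently continuing
        raise LoginError(payload.get('errormsg', 'unknown error'))
    data = payload['data']
    return data['qingting_id'], data['access_token']


# Made-up payload shaped like a successful response, for illustration only
sample = {
    'errorno': 0,
    'errormsg': '',
    'data': {'qingting_id': 'id-123', 'access_token': 'tok-abc'},
}
print(extract_credentials(sample))
```

Raising an exception rather than printing lets the caller decide whether a failed login should stop the whole run.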

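The real audio address is obtained by requesting a URL like this:
https://audio.qingting.fm/audiostream/redirect/294280/11604885
where 294280 is the album id and 11604885 is the id of the current audio.

The request also carries some parameters: access_token and qingting_id (both come from the response of a successful login; my capture was taken without logging in, so both were empty), t, which is a timestamp, and device_id=MOBILESITE (constant).
The key parameter is sign (I tried leaving sign out and got a signature error back).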

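To find which JS file generates the sign, try a global search. I searched globally for device_id.

In main.(a long hash).js I found the function that generates sign (you have to pick the right match yourself: it is the one with device_id: "MOBILESITE").
Searching for other keywords should lead there just as well.

Here sign is the variable u, which is derived from the variable c through a series of hashing steps.
We can print u and c in the console.

So sign is actually computed from the other request parameters.
At first I assumed it was plain MD5 and was stuck for a long time (I even stepped into the function to see how it was implemented and got completely lost).
In fact the code already says what it uses:
createHmac("md5", "fpMn12&38f_2e")
I looked up HMAC and found it is a standard, ready-made algorithm that can be combined with different hash functions, MD5 being one of them, and that it requires a secret key.

Everything is spelled out here: HMAC-MD5 with the key fpMn12&38f_2e.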
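
To make the difference between plain MD5 and HMAC-MD5 concrete, here is a small sketch using only Python's standard hashlib and hmac modules. The message string is a made-up placeholder, not a real request; the key is the one quoted from the JS.

```python
import hashlib
import hmac

key = "fpMn12&38f_2e".encode("utf-8")           # key quoted from the JS
message = "/example/path?t=0".encode("utf-8")   # placeholder message, illustration only

# Plain MD5 depends on the message alone
plain_md5 = hashlib.md5(message).hexdigest()

# HMAC-MD5 mixes the secret key into the digest
hmac_md5 = hmac.new(key, message, digestmod="md5").hexdigest()

# Both are 32 hex characters, which is why they are easy to confuse at a glance
print(plain_md5)
print(hmac_md5)
```

Because both digests have the same length and alphabet, comparing against the console output is the quickest way to tell which one the site uses.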

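I tried it on an online hashing site and, sure enough, the result matched what the console had printed.

In Python you need to import the hmac library (it is part of the standard library).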

import hmac
import time

base_url = "https://audio.qingting.fm"
bookid = "294280"
id = "11590788"
access_token = ""
qingting_id =""
timestamp = str(round(time.time()*1000))
data = f"/audiostream/redirect/{bookid}/{id}?access_token={access_token}&device_id=MOBILESITE&qingting_id={qingting_id}&t={timestamp}"
message = data.encode('utf-8')
key = "fpMn12&38f_2e".encode('utf-8')
sign = hmac.new(key, message, digestmod='MD5').hexdigest()
whole_url = base_url+data+"&sign="+sign
print(whole_url)
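
The same steps can be wrapped in a function so the signature is easy to reproduce and check. This is a sketch under assumptions: the names sign_path and build_signed_url are hypothetical, and the timestamp is made injectable purely so that two calls with the same inputs give the same URL.

```python
import hmac
import time

BASE_URL = "https://audio.qingting.fm"
SIGN_KEY = "fpMn12&38f_2e".encode("utf-8")


def sign_path(path_and_query):
    """HMAC-MD5 hex digest of the path plus query string."""
    return hmac.new(SIGN_KEY, path_and_query.encode("utf-8"), digestmod="md5").hexdigest()


def build_signed_url(bookid, program_id, access_token="", qingting_id="", timestamp_ms=None):
    """Build the redirect URL; pass timestamp_ms to get a reproducible result."""
    if timestamp_ms is None:
        timestamp_ms = round(time.time() * 1000)
    # Parameter order matters: the signature covers this exact string
    path = (
        f"/audiostream/redirect/{bookid}/{program_id}"
        f"?access_token={access_token}&device_id=MOBILESITE"
        f"&qingting_id={qingting_id}&t={timestamp_ms}"
    )
    return f"{BASE_URL}{path}&sign={sign_path(path)}"


# Fixed, arbitrary timestamp so the printed URL is the same on every run
url = build_signed_url("294280", "11590788", timestamp_ms=1579503360000)
print(url)
```

Note that the signed string is the path and query without the host, and that sign itself is appended only after hashing.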

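Once we can fetch one audio file, the rest is fetching all of them, and for that we only need the id of every audio.

I request this endpoint:
info_api = f'https://i.qingting.fm/capi/channel/{self.bookid}/programs/{self.version}?curpage={str(page)}&pagesize=30&order=asc'
version is in the page source of the audiobook's home page; to turn pages, just change curpage.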
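
Paging can be sketched as a small generator. The helper names page_count and page_urls are hypothetical, VERSION_PLACEHOLDER stands in for the real version token, and the total is assumed to come from the endpoint's response (the full source reads it from data.total).

```python
import math

PAGE_SIZE = 30  # matches pagesize=30 in the endpoint


def page_count(total, page_size=PAGE_SIZE):
    """Number of pages needed to cover `total` programs, rounding up."""
    return math.ceil(int(total) / page_size)


def page_urls(bookid, version, total):
    """Yield one listing URL per page, with curpage starting at 1."""
    for page in range(1, page_count(total) + 1):
        yield (
            f"https://i.qingting.fm/capi/channel/{bookid}/programs/{version}"
            f"?curpage={page}&pagesize={PAGE_SIZE}&order=asc"
        )


# 61 programs need three pages of 30
urls = list(page_urls("294280", "VERSION_PLACEHOLDER", total=61))
print(len(urls))
for u in urls:
    print(u)
```

Rounding up with math.ceil avoids requesting an empty extra page when the total is an exact multiple of the page size.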

Complete source code
import requests
import re
import hmac
import time
from tqdm import tqdm
from bs4 import BeautifulSoup
import os
import json
import sys
import urllib3
urllib3.disable_warnings()

class QingTing:
    def __init__(self, user_id, password, bookurl, ifLogin):
        self.ifLogin = ifLogin
        self.user_id = user_id
        self.password = password
        self.session = requests.session()
        self.session.headers.update({'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'})
        self.login_url = "https://u2.qingting.fm/u2/api/v4/user/login"
        self.qingting_id = ''
        self.access_token = ''
        self.bookurl = bookurl
        # self.bookurl = 'https://www.qingting.fm/channels/257790'
        self.bookid = self.bookurl.split('/')[-1]
        self.version = ''
        self.qingtinghost = 'https://audio.qingting.fm'
        self.save_path = ''
        self.bookname = ''

    def login(self, user_id, password):
        data = {
            'account_type': '5',
            'device_id': 'web',
            'user_id': user_id,
            'password': password
        }
        response = self.session.post(self.login_url, data=data, verify=False)
        if response.status_code == 200:
            temp = response.json()
            errorno = temp['errorno']
            errormsg = temp['errormsg']
            if errorno == 0:
                print('login successful!','登录成功！')
                data = temp['data']
                self.qingting_id = data['qingting_id']
                self.access_token = data['access_token']
            else:
                print('Login failed','登录失败')
                print(errormsg)
                time.sleep(10)
                sys.exit(0)

    def __get_version(self):
        # Read the book title and the "version" token from the book's home page
        response = self.session.get(url=self.bookurl, verify=False)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'lxml')
            # select_one returns a single tag; select() returns a list, which has no .string
            temp_bookname = soup.select_one('div.album-info-root > div.top > div.info.right > h1').string
            # Replace characters that are not allowed in Windows file names
            replaced_pattern = r'[\\/:*?"<>|]'
            self.bookname = re.sub(replaced_pattern, ' ', temp_bookname)
            if not os.path.exists(self.bookname):
                os.makedirs(self.bookname)
            matched = re.search(r'"version":"(\w+)"', response.text)
            if matched:
                self.version = matched.group(1)

    def __get_total_page(self):
        self.__get_version()
        page = 1
        info_api = f'https://i.qingting.fm/capi/channel/{self.bookid}/programs/{self.version}?curpage={str(page)}&pagesize=30&order=asc'
        response = self.session.get(info_api, verify=False)
        if response.status_code == 200:
            temp = response.json()
            total = temp['data']['total']
            # Round up so an exact multiple of 30 does not produce an empty extra page
            total_page = (int(total) + 29) // 30
            return total, total_page

    def get_book_info(self):
        total, total_page = self.__get_total_page()
        print(self.bookname, '{} episodes in total'.format(total))
        for page in range(1, total_page + 1):
            info_api = f'https://i.qingting.fm/capi/channel/{self.bookid}/programs/{self.version}?curpage={str(page)}&pagesize=30&order=asc'
            response = self.session.get(info_api, verify=False)
            programs = response.json()['data']['programs']
            for program in programs:
                # print(program['id'],program['title'])
                yield program

    def get_src(self, id):
        # Build the signed redirect URL for one audio id
        bookid = self.bookid
        access_token = self.access_token
        qingting_id = self.qingting_id
        timestamp = str(round(time.time()*1000))
        data = f"/audiostream/redirect/{bookid}/{id}?access_token={access_token}&device_id=MOBILESITE&qingting_id={qingting_id}&t={timestamp}"
        message = data.encode('utf-8')
        key = "fpMn12&38f_2e".encode('utf-8')
        sign = hmac.new(key, message, digestmod='MD5').hexdigest()
        whole_url = self.qingtinghost + data + "&sign=" + sign
        return whole_url

    def downloadFILE(self, url, name):
        resp = self.session.get(url=url, stream=True, verify=False)
        if resp.headers.get('Content-Type') == 'audio/mpeg':
            content_size = int(int(resp.headers['Content-Length'])/1024)
            with open(name, "wb") as f:
                print("Pkg total size is:", content_size, 'k,start...')
                for data in tqdm(iterable=resp.iter_content(1024), total=content_size, unit='k', desc=name):
                    f.write(data)
                print(name, "download finished!")
        else:
            # A non-audio response is a JSON error body
            error = resp.json()
            print('No permission to download; please log in with an account that has purchased this audio.')
            print('errorno:', error['errorno'], error['errormsg'])

    def run(self):
        if self.ifLogin:
            self.login(self.user_id, self.password)
        programs = self.get_book_info()
        count = 0
        for program in programs:
            count += 1
            try:
                id = program['id']
                title = str(count).zfill(4) + ' ' + program['title'] + '.m4a'
                if not self.bookname == '':
                    title = os.path.join(self.bookname, title)
                whole_url = self.get_src(id)
                self.downloadFILE(whole_url, title)
            except Exception as e:
                # Log the failure and continue with the next episode
                print(e)
                with open('log.txt', 'a', encoding='utf-8') as f:
                    f.write(str(count) + str(e) + '\n')

def get_config_info():
    with open('config.json', 'r', encoding='utf-8') as f:
        return json.load(f)

if __name__ == "__main__":
    # pyinstaller -F -i ico.ico QingTingFM.py
    config = get_config_info()
    bookurl = input('Enter the home page link of the audio to download (e.g. https://www.qingting.fm/channels/257790): ')
    # fullmatch rejects trailing characters such as "/" or ")", which would corrupt the book id
    isvalid = re.fullmatch(r'https://www.qingting.fm/channels/\d+', bookurl)
    if isvalid:
        # ifLogin in config.json decides whether to log in before downloading
        q = QingTing(config["user_id"], config["password"], bookurl, config["ifLogin"])
        q.run()
    else:
        print("The home page link format is invalid")
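
One detail worth noting: the book title is cleaned of characters that Windows does not allow in file names, but each episode title is used as-is when the file name is built. A minimal sketch of a shared helper follows; the name safe_filename is hypothetical, the sample title is made up, and the character set is the same one applied to the book title.

```python
import re

# Characters that Windows rejects in file names (same set as the book-title pattern)
ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(index, title, ext=".m4a"):
    """Build a zero-padded name such as '0001 Title.m4a' with illegal characters replaced."""
    cleaned = ILLEGAL_CHARS.sub(" ", title).strip()
    return f"{str(index).zfill(4)} {cleaned}{ext}"


# Made-up episode title containing several illegal characters
print(safe_filename(1, 'Chapter 1: What/Why?'))
```

Replacing with a space rather than deleting keeps words in the title separated.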

Config file
{
    "ifLogin": 1,
    "user_id": "135########",
    "password": "pwd########"
}

Source download: https://www.lanzoux.com/i2xT2g85tmf
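
If config.json is missing a key, the script fails later with a less obvious error. A possible sketch of a loader that checks the three keys up front is shown here; the name load_config is hypothetical, and the demonstration writes a temporary file with placeholder values rather than real credentials.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("ifLogin", "user_id", "password")


def load_config(path="config.json"):
    """Load the JSON config and fail early if a required key is missing."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {', '.join(missing)}")
    return config


# Demonstration with a temporary file holding placeholder values
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"ifLogin": 0, "user_id": "", "password": ""}, f)
    demo = load_config(demo_path)
print(demo["ifLogin"])
```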

That's all. If you found the post useful, please leave a free rating as encouragement.

受气包 posted on 2020-8-29 12:50

Came in full of excitement, left completely baffled!

allennt posted on 2020-7-4 21:41

Thanks for sharing!!

lp-cg posted on 2020-1-20 15:08

Thanks for sharing

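天空宫阙 posted on 2020-1-20 15:12

Any input that the regex 'https://www.qingting.fm/channels/\d+' does not match makes the program exit immediately. For example, typing https://www.qingting.fm/channels/257790 as https://www.qingting.fm/channels/257790/ or https://www.qingting.fm/channels/257790), or leaving out https, will not work. Please copy the link exactly.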
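
For anyone who would rather have the script tolerate these variations than reject them, a possible sketch follows. The helper channel_id_from_url is hypothetical and is not part of the posted source; it accepts a trailing slash, a stray closing parenthesis, or a missing scheme, and returns the numeric channel id.

```python
import re

# Optional scheme, fixed host and path, numeric id, optional trailing slash
CHANNEL_RE = re.compile(r'(?:https?://)?www\.qingting\.fm/channels/(\d+)/?')


def channel_id_from_url(text):
    """Return the channel id as a string, or None if the input is not a channel link."""
    cleaned = text.strip().rstrip(')')
    match = CHANNEL_RE.fullmatch(cleaned)
    return match.group(1) if match else None


for candidate in (
    'https://www.qingting.fm/channels/257790',
    'https://www.qingting.fm/channels/257790/',
    'https://www.qingting.fm/channels/257790)',
    'www.qingting.fm/channels/257790',
    'https://www.qingting.fm/',
):
    print(candidate, '->', channel_id_from_url(candidate))
```

Extracting the id with a capture group also avoids relying on split('/')[-1], which returns an empty string when the link ends with a slash.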

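bigwit posted on 2020-1-20 15:13

Thanks for sharing

bigwit posted on 2020-1-20 15:18

天空宫阙 posted on 2020-1-20 15:12
Any input that the regex 'https://www.qingting.fm/channels/\d+' does not match makes the program exit immediately. For example, https://www. ...

I used the example link directly.

lijt16 posted on 2020-1-20 15:20

I guessed there was some encryption involved, but I still don't understand this stuff. A beginner says thanks for sharing.

天空宫阙 posted on 2020-1-20 15:23

bigwit posted on 2020-1-20 15:18
I used the example link directly.

Then it may be an environment problem. I suggest installing Python.

supnet posted on 2020-1-20 15:27

Thank you very much

bigwit posted on 2020-1-20 15:29

天空宫阙 posted on 2020-1-20 15:23
Then it may be an environment problem. I suggest installing Python.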

I'm running the source now and it reports this error:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you
login successful! 登录成功！
Traceback (most recent call last):
  File "/Users/huai/Downloads/qingtingfmpy/QingTingFM.py", line 162, in <module>
    q.run()
  File "/Users/huai/Downloads/qingtingfmpy/QingTingFM.py", line 135, in run
    for program in programs:
  File "/Users/huai/Downloads/qingtingfmpy/QingTingFM.py", line 90, in get_book_info
    total,total_page = self.__get_total_page()
  File "/Users/huai/Downloads/qingtingfmpy/QingTingFM.py", line 77, in __get_total_page
    self.__get_version()
  File "/Users/huai/Downloads/qingtingfmpy/QingTingFM.py", line 64, in __get_version
    soup = BeautifulSoup(response.text,'lxml')
  File "/Users/huai/Downloads/qingtingfmpy/venv/lib/python3.8/site-packages/bs4/__init__.py", line 225, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

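天空宫阙 posted on 2020-1-20 15:32

Last edited by 天空宫阙 on 2020-1-20 15:33

bigwit posted on 2020-1-20 15:29
I'm running the source now and it reports this error:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you r ...

The lxml parser library is missing; install it with pip.
Use pip to install every imported package you don't have yet. When I have time I'll put together a requirements file.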
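
Until a requirements file exists, a quick way to see what is missing is to probe for each third-party module. This is a sketch only: the helper missing_packages is hypothetical, and the mapping lists the third-party imports from the posted source plus lxml, which BeautifulSoup needs for the 'lxml' parser even though the script never imports it directly.

```python
import importlib.util

# import name -> pip package name
THIRD_PARTY = {
    "requests": "requests",
    "tqdm": "tqdm",
    "bs4": "beautifulsoup4",
    "lxml": "lxml",
    "urllib3": "urllib3",
}


def missing_packages(modules=THIRD_PARTY):
    """Return the pip names of modules that cannot be found in this environment."""
    return [pip_name for name, pip_name in modules.items()
            if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    # Print a ready-to-run install command for whatever is absent
    print("pip install " + " ".join(missing))
else:
    print("all third-party imports are available")
```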


吾爱破解 - 52pojie.cn's Archiver
