-=========================================================-
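1. Requested ID: Alpaca_
2. Personal email: 2513150647@qq.com
3. Original technical article:
Downloading iQiyi videos to local disk with Python
I built this project last year. I haven't done much reverse engineering this year, so I'm submitting this one for the review.
# Reverse engineering iQiyi video downloads
1. Find the m3u8 video link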
![image-20221105124907109](C:\Users\Alpaca\AppData\Roaming\Typora\typora-user-images\image-20221105124907109.png)
2. Video details page:
URL: https://pcw-api.iqiyi.com/video/video/playervideoinfo?tvid=5175273237810200&locale=cn_s&callback=Qa63062b19104c9508ca7d2398ce6c645
3. Request parameters for the video content:
URL: https://cache.video.iqiyi.com/dash
| Parameter | Sample value | Varies | Description |
| :------------ | :----------------------------------------------------------: | ---: | ------------------------------------ |
| tvid | 1615261206566600 | yes | |
| bid | 600 | | Video quality: 600 is 1080p, 500 is the 超清 (super-definition) tier, 300 is the 高清 (high-definition) tier |
| vid | bfd2f90a885d99e1af2133aad94b5447 | yes | |
| src | 01010031010000000000 | | |
| vt | 0 | | |
| rs | 1 | | |
| uid | 3332594213163264 | yes | |
| ori | pcw | | |
| ps | 1 | | |
| k_uid | 952a32c9f139f8f46f704f8ed42a250f | yes | |
| pt | 0 | | |
| d | 0 | | |
| s | | | |
| lid | | | |
| cf | | | |
| ct | | | |
| authKey | 2def2a2637c02f92608fb4b22689fb6a | yes | |
| k_tag | 1 | | |
| dfp | a0a115d368ece94755b4879a16e6070a28e7e4b74a0f2a51d86fb28ddd00845023 | yes | |
| locale | zh_cn | | |
| prio | {"ff":"f4v","code":2} | | |
| pck | 1bVnzm2TIa7AdztPlTGAm1ANo50vrLZMvfZUJVbDS11ym3EX23xNogu3TBIrlQL9OgdDz8b | yes | |
| k_err_retries | 0 | | |
| up | | | |
| sr | 1 | | |
| qd_v | 2 | | |
| tm | 1667623225114 | yes | |
| qdy | a | | |
| qds | 0 | | |
| k_ft1 | 706436220846084 | | |
| k_ft4 | 1161084347621396 | | |
| k_ft5 | 262145 | | |
| k_ft7 | 4 | | |
| bop | {"version":"10.0","dfp":"a0a115d368ece94755b4879a16e6070a28e7e4b74a0f2a51d86fb28ddd00845023"} | yes | |
| ut | 1 | | |
| vf | 83e3771bfeb804dd1312a66d81e248d6 | yes | |
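The table splits into values that stay constant and values that change per request. Below is a minimal sketch of that split. `build_params` is a hypothetical helper name, only a few of the constant rows are repeated, and whether the server still accepts these sample values is untested. vf is deliberately left out because it is computed last, from the other parameters.

```python
# Rows marked as varying in the table. vf is not listed: it is computed
# afterwards from the finished query string.
VARYING_KEYS = ("tvid", "vid", "uid", "k_uid", "authKey", "dfp", "pck", "tm", "bop")

# A few of the constant rows, copied from the table (abbreviated for this sketch).
STATIC_PARAMS = {
    "bid": "600", "src": "01010031010000000000", "vt": "0", "rs": "1",
    "ori": "pcw", "ps": "1", "k_tag": "1", "sr": "1",
    "qd_v": "2", "qdy": "a", "qds": "0", "ut": "1",
}


def build_params(varying):
    """Merge per-request values into the constant ones; fail loudly if one is missing."""
    missing = [key for key in VARYING_KEYS if key not in varying]
    if missing:
        raise KeyError(f"missing varying parameters: {missing}")
    params = dict(STATIC_PARAMS)
    params.update({key: str(varying[key]) for key in VARYING_KEYS})
    return params
```

Failing on a missing key is a design choice for the sketch: a request with a stale or absent varying value is unlikely to be worth sending.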
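4. Parameter analysis:
1. tvid: can be found in the page source (search for `"v":{"tvid":`)
```python
# page_source: the HTML text of the video page
re.search(r'"tvid":(?P<tvid>.*?),', page_source).group('tvid')
```
2. vid: can also be found in the page source
```python
# page_source: the HTML text of the video page
re.findall(r'"bid":600,"vid":"(.*?)"', page_source)
```
3. uid: the value of the cookie named P00003
4. k_uid: the value of the cookie named QY_PUSHMSG_ID (really just a 32-character pseudo-random string)
5. authKey: the code that generates it can be found in pcweb.wonder.6e664f99.js
6. dfp: can be found in the cookies of the .iqiyi.com domain
7. pck: after a long dig through the JS code, it turned out to be simply the value of the P00001 cookie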
![image-20221106215502037](教程.assets/image-20221106215502037.png)
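8. tm: judging by the sample value (1667623225114, 13 digits), this appears to be a millisecond Unix timestamp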
9. k_ft1-7: constant
10. bop: {"version":"10.0","dfp":"<the dfp value>"}
11. vf: generated by a function inside a webpack bundle (more than 70,000 lines)
The webpack dispatcher (module loader) lives in pcweb.wonder.xxxxx.js
![image-20221129133521282](C:\Users\Alpaca\AppData\Roaming\Typora\typora-user-images\image-20221129133521282.png)
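The encryption function itself is in mmc.authkey.xxxxx.js. Delete all the code before the dispatcher, then use the open-source webpack extraction tool to pull the function out. Project page: [渔滒 / webpack_ast · GitCode](https://gitcode.net/zjq592767809/webpack_ast)
Run in a terminal: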
``` node webpack_mixer.js -l pcweb.wonder.xxxx.js -m mmc.authkey.xxxx.js -o webout.js```
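This produces a webout.js file containing the exported functions. The export is very long; we only need to keep the functions under keys 1043 and 545, which are the ones the vf code depends on.
Usage: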
```js
const { JSDOM } = require("jsdom");
const dom = new JSDOM("<html><head></head><body><p>hello world</p></body></html>")
global.window = dom.window;
global.document = window.document;
global.self = window;
const n = require("./webout.js");
const i = n('545')
// Pass in the URL value, in the form /dash?...........&ut=1
console.log(i.mmc('/dash?...........&ut=1'))
```
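After calling it, you will find that the return value differs from the vf seen in the browser. That is because the code detects the Node environment. The fix is to change the aU function so that it returns 6 (originally it is one very long switch-case statement, and it returns 6 when the environment is a browser).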
![image-20221129134541802](C:\Users\Alpaca\AppData\Roaming\Typora\typora-user-images\image-20221129134541802.png)
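To tie the parameter analysis together, here is a rough sketch that pulls tvid and vid out of the page source with the regexes above and maps the cookies to uid, k_uid and pck. The function names are made up for illustration. dfp is passed in explicitly because the text does not name the cookie that holds it, treating tm as a millisecond timestamp is an assumption, and authKey and vf are left out since they come from the JS described above. `build_dash_path` assumes standard percent-encoding for the string handed to `mmc`, which I have not verified against the site.

```python
import json
import re
import time
from urllib.parse import urlencode


def collect_varying_params(page_source, cookies, dfp):
    """Collect the per-request values that come from the page source and the cookies."""
    tvid = re.search(r'"tvid":(?P<tvid>.*?),', page_source).group('tvid')
    vid = re.findall(r'"bid":600,"vid":"(.*?)"', page_source)[0]
    return {
        "tvid": tvid,
        "vid": vid,
        "uid": cookies["P00003"],
        "k_uid": cookies["QY_PUSHMSG_ID"],
        "dfp": dfp,
        "pck": cookies["P00001"],
        # Assumption: tm is the current time in milliseconds.
        "tm": str(int(time.time() * 1000)),
        # Compact JSON, matching the shape shown for bop in the table.
        "bop": json.dumps({"version": "10.0", "dfp": dfp}, separators=(",", ":")),
    }


def build_dash_path(params):
    """Build the '/dash?...' string that is passed to mmc (encoding is an assumption)."""
    return "/dash?" + urlencode(params)
```

The returned path ends with whatever key was inserted last, so `ut` should be added last to match the `...&ut=1` form shown in the usage snippet.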
5. Response analysis:
1. ```python
['data']['program']['video'][0]['m3u8']
# m3u8 file address
```
2. ```python
['data']['program']['video'][0]['duration']
# video duration
```
3. ```python
['data']['program']['video'][0]['vsize']
# video size
```
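The three lookups above can be wrapped in one small function that tolerates a response without the expected structure. This is only a sketch: `extract_video_info` is a hypothetical name, and the sample values in the usage line are invented.

```python
def extract_video_info(resp_json):
    """Return m3u8, duration and vsize of the first video entry, or None if it is absent."""
    try:
        video = resp_json['data']['program']['video'][0]
    except (KeyError, IndexError, TypeError):
        return None
    if not isinstance(video, dict):
        return None
    return {key: video.get(key) for key in ('m3u8', 'duration', 'vsize')}


# Usage with an invented response body:
sample = {'data': {'program': {'video': [{'m3u8': '#EXTM3U', 'duration': 60, 'vsize': 1024}]}}}
info = extract_video_info(sample)
```

Returning None rather than raising makes it easy for the caller to report a failed request for one video and move on.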
At this point we have the m3u8 video address.
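With a cookie as well, we can try downloading with Python. There are many ways to get the cookie; mine is simple and crude: read Chrome's local cookie file directly. The file's location differs between Chrome versions, so you will need to find it yourself.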
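```python
import os
import sqlite3


def get_local_cookie_dic(cookie_url):  # Build a cookie dict for the given host
    # pull_the_key, get_string and DecryptString are helpers from the original project (not shown here)
    UserDataDir = os.environ['LOCALAPPDATA'] + r'\Google\Chrome\User Data'
    LocalStateFilePath = UserDataDir + r'\Local State'
    CookiesFilePath = UserDataDir + r'\Default\Cookies'  # location varies between Chrome versions
    try:
        con = sqlite3.connect(CookiesFilePath)
        res = con.execute('select host_key,name,encrypted_value from cookies').fetchall()
        con.close()
    except sqlite3.Error:
        return
    cookie = {}
    key = pull_the_key(get_string(LocalStateFilePath))  # get the decryption key
    for host_key, name, encrypted_value in res:
        if host_key != cookie_url or name == 'ptag':
            continue  # skip other hosts and the ptag cookie
        cookie[name] = DecryptString(key, encrypted_value)  # decrypt the cookie value
    return cookie
```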
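The helpers `pull_the_key`, `get_string` and `DecryptString` are not shown in this post. For orientation, here is a sketch of what such helpers commonly do for Chrome 80 and later on Windows, as far as I know: the AES key sits base64-encoded in `Local State` under `os_crypt.encrypted_key` with a literal `DPAPI` prefix, and each cookie value is `v10` followed by a 12-byte nonce and the AES-GCM ciphertext with its tag. The function names are hypothetical, the two decrypt steps need the third-party packages `pywin32` and `cryptography` (imported lazily), newer Chrome releases may use a different scheme, and none of this is the original project's code.

```python
import base64
import json


def wrapped_key_from_local_state(local_state_text):
    """Return the DPAPI-protected AES key stored in Chrome's Local State JSON."""
    encrypted_key = json.loads(local_state_text)["os_crypt"]["encrypted_key"]
    raw = base64.b64decode(encrypted_key)
    if not raw.startswith(b"DPAPI"):
        raise ValueError("unexpected key format")
    return raw[5:]  # strip the literal b"DPAPI" prefix


def unwrap_key(wrapped_key):
    """Windows only: unwrap the AES key with DPAPI (requires pywin32)."""
    import win32crypt  # lazy import so the pure helpers work on any platform
    return win32crypt.CryptUnprotectData(wrapped_key, None, None, None, 0)[1]


def split_cookie_blob(encrypted_value):
    """Split a 'v10' cookie blob into (nonce, ciphertext_with_tag)."""
    if encrypted_value[:3] != b"v10":
        raise ValueError("not a v10 AES-GCM cookie value")
    return encrypted_value[3:15], encrypted_value[15:]


def decrypt_cookie(key, encrypted_value):
    """Decrypt one cookie value with AES-GCM (requires the cryptography package)."""
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    nonce, data = split_cookie_blob(encrypted_value)
    return AESGCM(key).decrypt(nonce, data, None).decode("utf-8")
```

Only `unwrap_key` is tied to Windows; the parsing helpers can be exercised anywhere, which makes it easier to check them against a copied `Local State` file.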
With the cookie in hand, we can try downloading the video. The code here comes from an earlier project of mine and is carried over as-is.
```python
import re
import time

import requests

# Both functions are methods (note `self`) from the original project. iqiyi_video_table,
# iqiyi_setting_table, all_params, headers and cookie_dic also come from it and are not shown.


def get_download_url_list(self):  # Fetch the m3u8 data and download every segment
    video_type = 'movie'
    not_download_list = iqiyi_video_table().search_not_download(video_type)  # videos not yet downloaded
    for video_info in not_download_list:
        # Path the video is saved to
        path = f'{iqiyi_setting_table().get_setting("Video_Path")}/{video_type}/{video_info["video_title"]}.mp4'
        params = all_params(video_info['tvid'], video_info['url'])  # request parameters for the dash endpoint
        download_list_resp = requests.get('https://cache.video.iqiyi.com/dash', headers=headers, cookies=cookie_dic,
                                          params=params)
        m3u8 = download_list_resp.json()['data']['program']['video'][0]['m3u8']
        download_url_list = re.findall(r'EXTINF:\d+,\n(.*?)\n#', m3u8, re.S)
        if not download_url_list:  # no segment links could be extracted
            self.download_error_sign.emit(f'{video_info["video_title"]}: failed to get the m3u8 download links')
            return
        for index, url in enumerate(download_url_list):
            self.video_download(url, path)  # append this segment to the output file
            self.download_success_sign.emit(index, len(download_url_list), video_type, video_info['video_title'])
        iqiyi_video_table().update_state(video_info['tvid'], int(time.time() * 1000), 1)


def video_download(self, url, path):
    """
    :param url: direct URL of the video segment
    :param path: path the video is saved to
    """
    # headers must be passed by keyword: the second positional argument of requests.get is params
    video_count = requests.get(url, headers=headers, cookies=cookie_dic)
    with open(path, mode='ab') as video_file:
        video_file.write(video_count.content)
```
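The regex in the download code expects integer durations after `EXTINF:` and a line starting with `#` after every segment URL. A line-based parse avoids both assumptions. The sketch below is an alternative I am suggesting, not what the project used; `parse_segment_urls` is a hypothetical name, the sample playlist is invented, and I have not checked whether iQiyi's playlists actually contain fractional durations.

```python
def parse_segment_urls(m3u8_text):
    """Return the URI line that follows each #EXTINF tag in an m3u8 playlist."""
    urls = []
    expect_uri = False
    for line in m3u8_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#EXTINF:'):
            expect_uri = True  # the next non-tag line is this segment's URI
        elif expect_uri and not line.startswith('#'):
            urls.append(line)
            expect_uri = False
    return urls


# Usage with an invented playlist:
sample_playlist = (
    "#EXTM3U\n"
    "#EXT-X-TARGETDURATION:10\n"
    "#EXTINF:10,\n"
    "http://example.com/a.ts\n"
    "#EXTINF:9.5,\n"
    "http://example.com/b.ts\n"
    "#EXT-X-ENDLIST\n"
)
segment_urls = parse_segment_urls(sample_playlist)
```

The result can be fed to the same segment loop as `download_url_list` above.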
The downloaded file at the end:
=============================================================================================================================================================