吾爱破解 - 52pojie.cn

[Study Notes] [Crawler] Scraping videos from a target AcFun list page

Derik posted on 2024-11-12 13:50
(For learning purposes only. If this infringes any rights, please send me a private message.)

[Python]
import os
import re
import json
import requests
from tqdm import tqdm  # progress bar
from bs4 import BeautifulSoup

# A fixed desktop Chrome User-Agent is sent with every request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36'
}


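# Get the m3u8 playlist URL and the video title from a video page
def get_m3u8_list(url):
    r = requests.get(url, headers=headers)
    # The page embeds its metadata as a JSON object assigned to window.pageInfo;
    # [:-1] drops the trailing semicolon of that assignment
    info = re.findall(r'window\.pageInfo = window\.videoInfo = (.*?)window\.videoResource =', r.text, re.S)[0].strip()[:-1]
    info = json.loads(info)
    # ksPlayJsonHevc is itself a JSON string, so it needs a second json.loads
    play_json = json.loads(info["currentVideoInfo"]["ksPlayJsonHevc"])
    m3u8_url = play_json['adaptationSet'][0]['representation'][0]['url']
    # Strip characters that are not allowed in file names
    name = re.sub(r'[|?<>/\\:*"]', '', info["title"])
    return m3u8_url, name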


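# Extract the segment (.ts) entries from the m3u8 playlist
def get_ts_files(url):
    r = requests.get(url, headers=headers)
    # Drop the '#'-prefixed tag lines; what remains is one segment path per line
    ts_files = re.sub('#.*', '', r.text).split()
    return ts_files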


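# Download the segments and write them into a single output file
def download_combine(ts_files, name):
    path = os.getcwd()
    # 'wb' starts from an empty file, so a re-run does not append to an old download.
    # Segments are concatenated as-is: the result is an MPEG-TS stream saved with an .mp4 extension.
    with open(os.path.join(path, f'{name}.mp4'), 'wb') as f:
        for ts in tqdm(ts_files):
            url = 'https://tx-safety-video.acfun.cn/mediacloud/acfun/acfun_video/' + ts
            content = requests.get(url, headers=headers).content
            f.write(content)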


# Collect the video page links from the index (list) page
def get_index_links(index_url):
    r = requests.get(index_url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    links = soup.find_all('div', class_="list-content-item")
    links_list = []
    for link in links:
        url = "https://www.acfun.cn" + link.a.get('href')
        links_list.append(url)
    return links_list



def main(index_url):
    links = get_index_links(index_url)
    for url in links:
        m3u8_url, name = get_m3u8_list(url)
        ts_files = get_ts_files(m3u8_url)
        download_combine(ts_files, name)


if __name__ == '__main__':
    url = "https://www.acfun.cn/v/list135/index.htm?sortField=rankScore&duration=all&date=default&page=1"
    main(url)
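
The script above prepends a hardcoded CDN prefix to every segment entry. As a minimal sketch of an alternative, assuming the playlist lists its segments relative to the playlist's own URL (not verified against what AcFun currently returns), each entry can be resolved with `urllib.parse.urljoin` instead. The helper name `parse_segments`, the sample playlist, and the `cdn.example.com` URLs are all hypothetical and only for illustration.

```python
from urllib.parse import urljoin


def parse_segments(playlist_text, playlist_url):
    """Return absolute segment URLs from the text of an m3u8 media playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Skip blank lines and tag/comment lines, which start with '#'
        if not line or line.startswith('#'):
            continue
        # urljoin leaves absolute URLs untouched and resolves relative ones
        # against the playlist URL
        segments.append(urljoin(playlist_url, line))
    return segments


# Hypothetical playlist, for illustration only
sample = """#EXTM3U
#EXT-X-TARGETDURATION:5
#EXTINF:5.0,
seg-0.ts?sign=abc
#EXTINF:5.0,
https://cdn.example.com/other/seg-1.ts
#EXT-X-ENDLIST
"""

urls = parse_segments(sample, "https://cdn.example.com/video/index.m3u8?token=x")
for u in urls:
    print(u)
```

If this assumption holds for the real playlists, `download_combine` would no longer need the fixed prefix and could request each returned URL directly.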


rose521rain posted on 2024-11-12 16:31
Tried it and it downloads fine. Nice work!