吾爱破解 - 52pojie.cn

[Solved] Scraper: the 人人影视 JSON is nested so many levels deep. Is there a good way to pinpoint the MP4 magnet links?

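d8349565 posted on 2020-11-14 01:41
Last edited by d8349565 on 2020-11-14 11:53

As the title says, I'm looking for guidance!

I only want the names and links shown here:
[Attached image: image.png]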
from lxml import etree
import re
import requests
import jsonpath


UA伪装 = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36'}

def bianhao(keyword):
    # Search the site and map each result title to its resource id.
    url = f'http://www.rrys2020.com/search?keyword={keyword}'
    response = requests.get(url=url, headers=UA伪装).text
    tree = etree.HTML(response)
    name = tree.xpath('//strong[@class="list_title"]//text()')
    a = tree.xpath('//div[@class="t f14"]//@href')
    # The third '/'-separated piece of each href is used as the id.
    编号 = [i.split("/")[2] for i in a]
    # 类别=[i.split("/")[1] for i in a]
    输出 = dict(zip(name, 编号))
    return 输出

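def daima(编号):
    # Fetch the resource's index_json and pull the 'code' value out of the first link.
    url = f'http://www.rrys2020.com/resource/index_json/rid/{编号}/channel/movie'
    response = requests.get(url=url, headers=UA伪装).text
    # The body is JavaScript ('var index_info=...'), so drop the assignment prefix.
    response = response.replace('var index_info=', '')
    # 响应数据 = 响应数据.replace(');','')
    # json = json.loads(response)
    # The payload is parsed as HTML here rather than decoded as JSON.
    tree = etree.HTML(response)
    daima = tree.xpath('//a/@href')[0]
    # Keep what follows the first '=' and strip the escaped quotes (\").
    daima = daima.split("=")[1].replace('\\"', '')
    return daima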

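def url_get(daima):
    # Query the detail API and print every magnet link found anywhere in the JSON.
    url = f'http://got002.com/api/v1/static/resource/detail?code={daima}'
    response = requests.get(url=url, headers=UA伪装).json()
    # response = requests.get(url=url, headers=UA伪装).json().get('data').get('list')
    # '$..address' collects every 'address' value at any depth;
    # 'or []' guards against a falsy result when nothing matches.
    输出 = jsonpath.jsonpath(response, '$..address') or []
    for i in 输出:
        if 'magnet:' in i:
            print(i)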


a=bianhao('黑袍纠察队')
print(a)
b=list(a.values())[0]
c=daima(b)
url_get(c)
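
The `$..address` lookup in `url_get` walks the whole JSON tree and collects every `address` value. A minimal sketch of the same idea using only the standard library is below. The helper name `find_values` and the sample data are made up for illustration; the sample merely mimics the `data.list[].items.MP4[].files[].address` shape, and the real API response may differ.

```python
def find_values(node, key):
    """Yield every value stored under `key` at any depth of a decoded JSON tree."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the value itself may hold more matches.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Made-up sample, shaped like data.list[].items.MP4[].files[].address
sample = {
    "data": {
        "list": [
            {
                "season_cn": "Season 1",
                "items": {
                    "MP4": [
                        {
                            "name": "ep1",
                            "files": [
                                {"address": "ed2k://example"},
                                {"address": "magnet:?xt=urn:btih:AAAA"},
                            ],
                        }
                    ]
                },
            }
        ]
    }
}

# Keep only the string addresses that are magnet links.
magnets = [
    a for a in find_values(sample, "address")
    if isinstance(a, str) and a.startswith("magnet:")
]
print(magnets)
```

This returns the links but loses which episode each one belongs to, which is why walking the structure level by level can be the better fit when names are needed too.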


OP | d8349565 posted on 2020-11-14 11:52
Thanks, everyone. I thought of a good approach this morning; here is the code:
from lxml import etree
import requests


UA伪装 = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36'}

def bianhao(keyword):
    # Search the site and map each result title to its resource id.
    url = f'http://www.rrys2020.com/search?keyword={keyword}'
    response = requests.get(url=url, headers=UA伪装).text
    tree = etree.HTML(response)
    name = tree.xpath('//strong[@class="list_title"]//text()')
    a = tree.xpath('//div[@class="t f14"]//@href')
    # The third '/'-separated piece of each href is used as the id.
    编号 = [i.split("/")[2] for i in a]
    # 类别=[i.split("/")[1] for i in a]
    输出 = dict(zip(name, 编号))
    return 输出

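def daima(编号):
    # Fetch the resource's index_json and pull the 'code' value out of the first link.
    url = f'http://www.rrys2020.com/resource/index_json/rid/{编号}/channel/movie'
    response = requests.get(url=url, headers=UA伪装).text
    # The body is JavaScript ('var index_info=...'), so drop the assignment prefix.
    response = response.replace('var index_info=', '')
    # 响应数据 = 响应数据.replace(');','')
    # json = json.loads(response)
    # The payload is parsed as HTML here rather than decoded as JSON.
    tree = etree.HTML(response)
    daima = tree.xpath('//a/@href')[0]
    # Keep what follows the first '=' and strip the escaped quotes (\").
    daima = daima.split("=")[1].replace('\\"', '')
    return daima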

def url_get(daima):
    # Query the detail API and print episode names and addresses, season by season.
    url = f'http://got002.com/api/v1/static/resource/detail?code={daima}'
    season = requests.get(url=url, headers=UA伪装).json().get('data').get('list')
    for s in season:
        第几季 = s['season_cn']
        print(第几季)
        # 'MP4' can be replaced with APP, HDTV, WEB-720P or WEB-1080P.
        集 = s.get('items').get('MP4') or []
        # data.list[0].items.MP4[0].name
        name = [e.get('name') for e in 集]
        # data.list[0].items.MP4[0].files[0].address
        # (the code takes files[1], not the files[0] shown in the path above)
        address = [e.get('files')[1].get('address') for e in 集]

        print(name)
        print('-' * 120)
        print(address)
        print('*' * 120)

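if __name__ == '__main__':
    a = bianhao('无垠的太空')
    # List the search results with 1-based numbers.
    for 序号, i in enumerate(a, start=1):
        print(f'{序号}、{i}')

    num = int(input('Enter the number of the result to look up: '))
    b = list(a.values())[num - 1]
    c = daima(b)
    url_get(c)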
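
The code above picks the address by position (`files[1]`), which breaks if an episode lists its files in a different order or has only one. A hedged sketch of a more defensive variant follows: it selects the entry whose address starts with `magnet:`. The helper names `magnet_of` and `collect` and the sample data are hypothetical; only the key names (`season_cn`, `items`, `name`, `files`, `address`) come from the code above.

```python
def magnet_of(episode):
    """Return the first magnet address listed for an episode, or None."""
    for f in episode.get("files") or []:
        address = f.get("address") or ""
        if address.startswith("magnet:"):
            return address
    return None


def collect(seasons, fmt="MP4"):
    """Map season label -> list of (episode name, magnet) pairs for one format."""
    result = {}
    for season in seasons:
        # A season may lack the requested format entirely.
        episodes = (season.get("items") or {}).get(fmt) or []
        result[season.get("season_cn")] = [
            (ep.get("name"), magnet_of(ep)) for ep in episodes
        ]
    return result


# Made-up data in the shape of the 'list' value used above.
seasons = [
    {
        "season_cn": "Season 1",
        "items": {
            "MP4": [
                {"name": "ep1", "files": [
                    {"address": "ed2k://example-1"},
                    {"address": "magnet:?xt=urn:btih:AAAA"},
                ]},
                {"name": "ep2", "files": [
                    {"address": "magnet:?xt=urn:btih:BBBB"},
                ]},
                {"name": "ep3", "files": None},
            ]
        },
    },
    {"season_cn": "Season 2", "items": {"HDTV": []}},
]

result = collect(seasons)
for label, episodes in result.items():
    print(label, episodes)
```

Keeping name and link together in one pair also avoids printing two parallel lists that have to be matched up by eye.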

莫丶莫欺少年穷 posted on 2020-11-14 03:56
chen4321 posted on 2020-11-14 07:10
In VS Code or PyCharm you can inspect a variable and have the IDE generate the JSON path for you.
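
Once an IDE has produced a path such as `data.list[0].items.MP4[0].name`, it can be followed programmatically. A small sketch, assuming paths use only dotted keys and `[n]` indexes; the `resolve` helper and the sample data are made up for illustration:

```python
import re

# One token is either a key (no dots or brackets) or a bracketed list index.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(obj, path):
    """Follow a path like 'data.list[0].items.MP4[0].name' through dicts and lists."""
    for key, index in _TOKEN.findall(path):
        obj = obj[int(index)] if index else obj[key]
    return obj


# Made-up sample in the shape the path implies.
sample = {
    "data": {
        "list": [
            {
                "items": {
                    "MP4": [
                        {
                            "name": "ep1",
                            "files": [
                                {"address": "ed2k://example"},
                                {"address": "magnet:?xt=urn:btih:AAAA"},
                            ],
                        }
                    ]
                }
            }
        ]
    }
}

print(resolve(sample, "data.list[0].items.MP4[0].name"))
print(resolve(sample, "data.list[0].items.MP4[0].files[1].address"))
```

A missing key or index raises `KeyError` or `IndexError`, which is usually what you want while exploring an unfamiliar response.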
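Eaglecad posted on 2020-11-14 08:13
Use the sed command and match on it.
kidneyissource posted on 2020-11-14 08:38
The structure isn't going to change anyway. It's only a bit complicated the first time; after that, can't you just use a loop?
E飞翔 posted on 2020-11-14 08:52
Just read the values straight from the JSON.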
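Hangjau posted on 2020-11-14 09:07
JSON can be read directly. If it's a string, you can use a regex. If you can guarantee the element is unique on the page, just locate it with a global XPath search.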
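
As a sketch of the "read the JSON directly" route for a response like the `var index_info=...` body handled in `daima`: strip the JavaScript assignment, decode the rest with `json.loads`, then take the `code` parameter from the link. This assumes the remainder is valid JSON, which the thread does not confirm; the `html` key, the example URL and the helper name `parse_js_assignment` are made up.

```python
import json
import re
from urllib.parse import parse_qs, urlsplit


def parse_js_assignment(text, var_name="index_info"):
    """Decode the JSON literal assigned in a snippet like 'var index_info={...};'."""
    prefix = f"var {var_name}="
    body = text.strip()
    if body.startswith(prefix):
        body = body[len(prefix):]
    # Drop a trailing semicolon, if any, before decoding.
    return json.loads(body.rstrip().rstrip(";"))


# Made-up response body; the real key names are unknown.
raw = r'var index_info={"html":"<a href=\"http://example.invalid/detail?code=abc123\">open</a>"};'

info = parse_js_assignment(raw)
# After decoding, the escaped quotes are gone and the value is ordinary HTML.
href = re.search(r'href="([^"]+)"', info["html"]).group(1)
code = parse_qs(urlsplit(href).query)["code"][0]
print(code)
```

Decoding first means the `\"` escapes no longer need to be stripped by hand, and `parse_qs` keeps working if the link gains extra query parameters.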
oudaidai posted on 2020-11-14 09:13
Use a regex.
super谦 posted on 2020-11-14 09:25
magnet_text = re.findall('address:"magnet(.*?)"', text)
Find the matches, then loop over them and prepend "magnet" to each one. That should do it.
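
A hedged variation on the regex idea: in raw JSON text the key is normally quoted (`"address":"magnet:..."`), so the pattern probably needs those quotes, and capturing the `magnet:` prefix as part of the group avoids re-attaching it afterwards. The sample text below is made up.

```python
import re

# Match a quoted "address" key whose string value starts with magnet:
MAGNET_RE = re.compile(r'"address"\s*:\s*"(magnet:[^"]+)"')

# Made-up JSON text with one ed2k and one magnet address.
text = '{"files":[{"address":"ed2k://example"},{"address": "magnet:?xt=urn:btih:AAAA"}]}'

magnets = MAGNET_RE.findall(text)
print(magnets)
```

This works on the undecoded response text, so any JSON escape sequences inside a link stay escaped; decoding with `json.loads` first avoids that.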
realgreenhand posted on 2020-11-14 10:09
A regular expression plus a for loop.