6 Minute English scraper v2.0 for the BBC app (com.iyuba.bbcstub)
Last edited by 三滑稽甲苯 on 2020-9-30 20:04
[Screenshot: BBC app]
[Screenshot: script]
Code:
from os import listdir, makedirs, remove
from time import sleep

from requests import get

# {} takes the smallest id fetched so far; use 0 for the latest episodes
url = 'http://apps.iyuba.cn/minutes/titleNewApi.jsp?maxid={}&pages=1&pageNum=20&parentID=1&type=android&format=json'
# Characters that are not allowed in file names
banned = {'/', '\\', ':', '?', '*', '"', '<', '>', '|'}


class Category:
    def __init__(self, n):
        self.num = n
        self.title = None
        self.title_cn = None
        self.sound = None
        self.pic = None
        self.readcount = None
        self.id = None

    def show(self):
        # Print one listing line and save the cover image to Homepage/
        print(f'#{self.num}/{self.readcount}/{self.title}/{self.title_cn}')
        with open(f'Homepage/#{self.num}-{self.title}.jpg', 'wb') as f:
            img = get(self.pic).content
            f.write(img)

    def download(self):
        # Save the episode audio to Download/
        with open(f'Download/{self.title}.mp3', 'wb') as f:
            mp3 = get('http://static.iyuba.cn/sounds/minutes/' + self.sound).content
            f.write(mp3)


def fetch_page(max_id):
    # Fetch one page of episodes, show each one, and append it to target
    r = get(url.format(max_id))
    for item in r.json()['data']:
        epi = Category(len(target))
        title = item['Title']
        for b in banned:
            title = title.replace(b, '')
        epi.title = title
        epi.title_cn = item['Title_cn']
        epi.sound = item['Sound']
        epi.pic = item['Pic']
        epi.readcount = item['ReadCount']
        epi.id = item['BbcId']
        epi.show()
        target.append(epi)


for name in ('Homepage', 'Download'):
    makedirs(name, exist_ok=True)
target = []
print('#/Read Count/Title_EN/Title_CN(Picture in Homepage/)')
fetch_page(0)
print('Input number to get one, and "next" to get a next page.')
while True:
    t = input('I want #')
    if t == 'next':
        # The last episode listed is assumed to carry the smallest id so far
        fetch_page(target[-1].id if target else 0)
    else:
        try:
            n = int(t)
        except ValueError:
            break  # any non-numeric input ends the session
        if 0 <= n < len(target):
            target[n].download()
        else:
            print('No such number.')
print('Cleaning cache...')
for item in listdir('Homepage'):
    remove('Homepage/' + item)
sleep(1)
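
The script removes characters that are not allowed in file names from each title before it builds the cover image and audio paths. A minimal standalone sketch of that step follows. The helper names `safe_title` and `episode_paths` are illustrative only and do not appear in the script itself.

```python
# Characters removed from titles, the same set the script keeps in `banned`
BANNED = {'/', '\\', ':', '?', '*', '"', '<', '>', '|'}


def safe_title(title):
    """Return the title with every banned character removed."""
    return ''.join(ch for ch in title if ch not in BANNED)


def episode_paths(num, title):
    """Return (cover image path, audio path) in the layout the script uses."""
    clean = safe_title(title)
    return f'Homepage/#{num}-{clean}.jpg', f'Download/{clean}.mp3'
```

Note that the audio path depends only on the cleaned title, so two episodes whose titles differ only in banned characters would be written to the same file in Download.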
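Note: the third-party requests library is required.
How to use: run the code with Python and enter the number of the audio you want (the matching cover images are in the 'Homepage' folder that the script creates). The audio is downloaded automatically to the 'Download' directory that the script creates. Enter "next" to list the next page; any other non-numeric input ends the script and clears the 'Homepage' cache. (The script scrapes the latest episodes from 'BBC 6 Minute English'.)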
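
The prompt accepts three kinds of input: an episode number, `next`, and anything else, which ends the session. The sketch below shows one way to express that dispatch. `parse_command` is a hypothetical helper, and the range check is a defensive addition in this sketch.

```python
def parse_command(text, listed):
    """Classify one line typed at the 'I want #' prompt.

    listed is the number of episodes shown so far.
    Returns ('next', None), ('download', n), ('invalid', n) or ('quit', None).
    """
    if text == 'next':
        return ('next', None)
    try:
        n = int(text)
    except ValueError:
        return ('quit', None)   # non-numeric input ends the session
    if 0 <= n < listed:
        return ('download', n)
    return ('invalid', n)       # number outside the current listing
```

With 20 episodes listed, `parse_command('3', 20)` selects episode #3, while `parse_command('25', 20)` is reported as invalid instead of raising an `IndexError`.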
Demo video (I was lazy and recorded it in Termux), the .py file, and future updates:
https://www.lanzoux.com/b00zqpndi
Password: 4yhk
LOG
9.30 [v2.0] Added the 'next page' feature
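
The 'next page' feature relies on the `maxid` parameter of the listing URL: according to the comment in the code, it takes the smallest id fetched so far, or 0 for the latest episodes. Below is a rough sketch of computing that value. `next_page_url` is a hypothetical helper, and it assumes that every `BbcId` can be converted with `int()`, which has not been verified against the API.

```python
URL = ('http://apps.iyuba.cn/minutes/titleNewApi.jsp'
       '?maxid={}&pages=1&pageNum=20&parentID=1&type=android&format=json')


def next_page_url(items):
    """Build the listing URL for the page that follows the given items.

    items is the list of dicts collected so far from the 'data' field.
    Assumption: every BbcId is an int or a numeric string.
    """
    if not items:
        return URL.format(0)    # nothing fetched yet: ask for the latest page
    smallest = min(int(item['BbcId']) for item in items)
    return URL.format(smallest)
```

Taking the minimum over all items avoids depending on the order in which the server returns episodes. Whether the server treats `maxid` as exclusive is not stated in the post, so a new page could repeat the boundary episode.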
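Is this a mobile app?
flypds wrote on 2020-8-15 08:05:
Is this a mobile app?
It is a Python script that targets the BBC app.
Thanks for sharing, OP.
What is this app called? Can I find it in an app store?
天空宫阙 wrote on 2020-8-15 08:56:
What is this app called? Can I find it in an app store?
It is simply called BBC. It is available in the Huawei app store.
That reminds me of VOA, and DW also has plenty of good material. It is free to download, but you need to get around the firewall, which can be quite a hassle.
Thanks for sharing, I learned something.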