This post was last edited by hj170520 on 2020-5-8 21:47
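The site being scraped is http://www.skesl.com
Judging by its own introduction it's a good site; it's mainly about teaching you how to learn English well.
I only scraped part of it, mainly the conversation audio plus some of the Basic_ESL_Course.
I'll see how useful these turn out to be, and if they're good I'll download the whole lot. Reportedly the site sells this set of audio files for several hundred yuan.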
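Downloading "the whole lot" first requires knowing which file numbers actually exist, since the conversation ranges in the script below leave gaps (222 to 300, for example). Here is a minimal sketch that probes each URL with a HEAD request so no audio body is transferred. It assumes the server answers 200 for files that exist and an error status such as 404 for those that don't. `find_existing`, `head_ok` and the `probe` parameter are hypothetical names introduced here, and it uses the standard library's `urllib.request` rather than `requests` so it runs without extra packages.

```python
import urllib.request
from typing import Callable, Iterable, List

# Assumed URL prefix, taken from the conversation downloader in this post
BASE = "http://www.skesl.com/audio871/audioconv/"


def head_ok(url: str) -> bool:
    """Send a HEAD request and report whether the server answered 200."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # HTTPError (e.g. 404), URLError and timeouts are all OSError subclasses
        return False


def find_existing(numbers: Iterable[int],
                  probe: Callable[[str], bool] = head_ok,
                  base: str = BASE) -> List[int]:
    """Return the numbers n for which <base><n>.mp3 passes probe."""
    return [n for n in numbers if probe(base + str(n) + ".mp3")]
```

For example, `find_existing(range(1, 1000))` would return the conversation numbers that respond, and that list could replace the hand-picked `range(a, b)` bounds. The `probe` parameter exists so the logic can be checked without touching the network.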
import os

import requests

# Request headers copied from the author's browser session
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'Cookie': '_ga=GA1.2.1150797413.1588604111; _gid=GA1.2.504643454.1588811433; _gat_gtag_UA_104534194_1=1; __gads=ID=74567312d013fd4d:T=1588608220:S=ALNI_MYgmuuZj8Sb3ObnOfq2SzP2DUwTwQ',
    'Referer': 'http://www.skesl.com/esl/lesson/about-me',
    'Host': 'www.skesl.com'}


def download(url_audio, path):
    """Save one MP3 to path; return False (and write nothing) on an HTTP error."""
    res = requests.get(url_audio, headers=HEADERS, timeout=30)
    if res.status_code != 200:
        # Without this check a 404 error page would be saved under an .mp3 name
        print("Skipped {} (HTTP {})".format(url_audio, res.status_code))
        return False
    with open(path, 'wb') as f:
        f.write(res.content)
    return True


def conversation(a, b):
    """Download conversation audio numbered a to b-1 into ./audioconv."""
    count_conversation = 0
    url = "http://www.skesl.com/audio871/audioconv/"
    os.makedirs('./audioconv', exist_ok=True)
    for i in range(a, b):
        print("Downloading 'conversation' audio no. {}".format(i))
        if download(url + str(i) + ".mp3", './audioconv/' + str(i) + '.mp3'):
            count_conversation += 1
    print("Finished {} 'conversation' audio files".format(count_conversation))


def Basic_ESL_Course():
    """Download Basic ESL Course lessons 01 to 90 into ./audiob."""
    count_Basic_ESL_Course = 0
    url = "http://www.skesl.com/audio871/audiob/"
    os.makedirs('./audiob', exist_ok=True)
    for i in range(1, 91):
        # Server file names are zero-padded (01.mp3 ... 90.mp3); keep the same local names
        name = str(i).zfill(2) + ".mp3"
        print("Downloading 'Basic_ESL_Course' audio no. {}".format(i))
        if download(url + name, './audiob/' + name):
            count_Basic_ESL_Course += 1
    print("Finished {} 'Basic_ESL_Course' audio files".format(count_Basic_ESL_Course))


if __name__ == '__main__':
    conversation(47, 222)
    conversation(301, 701)
    Basic_ESL_Course()
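Fetching several hundred files one at a time is slow, and an interrupted run starts over from scratch. Below is a minimal sketch of a parallel, resumable variant. It is not the author's code. `download_all` and the `fetch` hook are hypothetical names, and it assumes `fetch(url)` returns the file's bytes, or `None` when the file is missing. Files already on disk are skipped, and each file is written to a `.part` name first, so a half-finished download is never mistaken for a complete one.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

# Hypothetical hook: fetch(url) -> file bytes, or None if missing on the server
Fetch = Callable[[str], Optional[bytes]]


def download_all(jobs: List[Tuple[str, str]], fetch: Fetch,
                 workers: int = 4) -> List[str]:
    """Download (url, path) pairs in parallel, skipping paths that already exist.

    Returns the paths written during this run, in job order.
    """
    def one(job: Tuple[str, str]) -> Optional[str]:
        url, path = job
        if os.path.exists(path):
            return None  # finished on an earlier run
        data = fetch(url)
        if data is None:
            return None  # missing on the server
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        tmp = path + ".part"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # rename only after the write has completed
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as jobs
        return [p for p in pool.map(one, jobs) if p is not None]
```

With the script above, `fetch` could wrap `requests.get(url, headers=headers, timeout=30)` and return `res.content` only when `res.status_code == 200`. Keeping `workers` small is a deliberate choice so the site isn't hammered with requests.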
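I was really just looking for good English-learning material, and picked up some Python along the way.
I'm studying for IELTS, so I probably won't be writing much Python for a while. I'm so sick of it.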