Last edited by hksnow on 2019-8-2 22:52
[Screenshot of the program]
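It downloads on a single thread, so it is less likely to freeze, and it sets the download directory automatically, so you always know where the files ended up.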
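The script builds its download directory as `<current directory>/<comic title>/<chapter name>`. Chapter names come straight from the page, so on Windows a name containing a character such as `?` or `:` would make `os.makedirs` fail. Below is a minimal sketch of building that path with such characters replaced. `safe_name` and `chapter_dir` are hypothetical helpers, not part of the original script, and the character set is assumed to be the usual Windows reserved list.

```python
import os
import re

# Characters Windows does not allow in file or folder names (assumed list)
_INVALID = re.compile(r'[\\/:*?"<>|]')


def safe_name(name: str) -> str:
    """Replace characters that are invalid in Windows paths with '_'."""
    cleaned = _INVALID.sub('_', name).strip()
    # Windows also rejects names that end in a dot or a space
    cleaned = cleaned.rstrip('. ')
    return cleaned or '_'


def chapter_dir(base: str, title: str, chapter: str) -> str:
    """Build base/<title>/<chapter> and create it if it does not exist yet."""
    path = os.path.join(base, safe_name(title), safe_name(chapter))
    os.makedirs(path, exist_ok=True)
    return path
```

In the script, `chapter_dir(os.getcwd(), title, chapters_name)` could stand in for the hand-built `path2`.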
The code is rather messy; it uses regular expressions to pull the information out of the pages.
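To show what the regular expressions in `download()` extract, here is a sketch run against a made-up HTML fragment. The fragment is an assumption modelled on the patterns the script searches for (`pageImage`, `chapterPath`, and a JavaScript array of image names), not a copy of the real 36mh.com page, and `img.example.com` is a placeholder host. On the real page the script takes the third bracketed array (`chapterImages[2]`); this fragment contains only one. `ast.literal_eval` replaces `eval` here because it only parses literals and never runs code.

```python
import ast
import re

# Hypothetical chapter-page fragment; the real page has much more markup
SAMPLE = '''
<body class="clearfix">
<script>
var chapterImages = ["0001.jpg","0002.jpg"];
var chapterPath = "images/comic/123/456/";
var pageImage = "http://img.example.com/images/default.png";
</script>
<div class="chapter-view">
'''


def parse_chapter_page(page: str):
    """Return (host, chapter_path, image_names) from a chapter page."""
    text = re.findall('<body class="clearfix">(.*?)<div class="chapter-view">', page, re.S)[0]
    # Every [...] literal in the block; this sample has exactly one
    arrays = re.findall(r'(\[[^\]]*\])', text, re.S)
    images = ast.literal_eval(arrays[0])
    # 'http://host/...'.split('/') gives ['http:', '', 'host', ...]
    host = re.findall('pageImage = "(.*?)";', text, re.S)[0].split('/')[2]
    chapter_path = re.findall('chapterPath = "(.*?)"', text, re.S)[0]
    return host, chapter_path, images


host, chapter_path, images = parse_chapter_page(SAMPLE)
for name in images:
    # Same URL scheme as download(): http://host/ + chapterPath + file name
    print('http://' + host + '/' + chapter_path + name)
```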
```python
# -*- coding: utf-8 -*-
import ast
import os
import re

import requests

# Browser-like User-Agent so the site serves the normal page
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'
}


def download(url, chapters_name, title, path):
    """Download every page image of one chapter into `path`."""
    html = requests.get(url, headers=headers)
    # Cut out the part of the page that holds the inline JavaScript variables
    text = re.findall('<body class="clearfix">(.*?)<div class="chapter-view">', html.text, re.S)[0]
    # All JavaScript array literals in that block; the image list is the third one
    chapterImages = re.findall(r'(\[[^\]]*\])', text, re.S)
    # Image host, taken from the pageImage URL (http://host/...)
    host = re.findall('pageImage = "(.*?)";', text, re.S)[0].split('/')[2]
    chapterPath = re.findall('chapterPath = "(.*?)"', text, re.S)[0]
    # literal_eval parses the array without executing page content (safer than eval)
    chapterImages_list = ast.literal_eval(chapterImages[2])
    for x in chapterImages_list:
        print(x)
        download_url = 'http://' + host + '/' + chapterPath + x
        print(download_url)
        file1 = requests.get(download_url, headers=headers)
        # 'wb' instead of 'ab', so re-running does not append duplicate bytes
        with open(os.path.join(path, x), 'wb') as code:
            code.write(file1.content)


def get_chapter(url):
    """Read the comic's index page and download every chapter listed on it."""
    html = requests.get(url, headers=headers)
    html.encoding = 'utf-8'
    # Chapter-list block; newlines and spaces are stripped, hence '<ahref=' below
    text = re.findall('<div class="chapter-body clearfix">(.*?)<div class="chapter-category clearfix">',
                      html.text, re.S)[0].replace('\n', '').replace(' ', '')
    title = re.findall('<h1><span>(.*?)</span></h1>', html.text, re.S)[0]
    # (relative chapter URL, chapter name) pairs
    data = re.findall('<li><ahref="(.*?)"class=""><span>(.*?)</span>', text, re.S)
    print(title)
    for x in data:
        url = 'https://www.36mh.com' + x[0]
        chapters_name = x[1]
        print(chapters_name)
        # ./<comic title>/<chapter name>, created automatically
        path2 = os.path.join(os.getcwd(), title, chapters_name)
        os.makedirs(path2, exist_ok=True)
        download(url, chapters_name, title, path2)


if __name__ == '__main__':
    print('Enter the comic URL, e.g. https://www.36mh.com/manhua/yukuaideshiyi/')
    url = input()
    get_chapter(url)
```
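`get_chapter()` removes all newlines and spaces from the chapter list before matching, which is why its pattern reads `<ahref=` and `"class=""` with no spaces. Here is a sketch of that step on a made-up list fragment. The markup and the `/manhua/yukuaideshiyi/1.html`-style links are hypothetical, shaped only to fit the script's pattern. One side effect is visible in the result: spaces inside chapter names are removed as well.

```python
import re

# Hypothetical chapter-list markup shaped like what get_chapter() expects
SAMPLE = '''
<div class="chapter-body clearfix">
  <ul>
    <li><a href="/manhua/yukuaideshiyi/1.html" class=""><span>第1话 上</span></a></li>
    <li><a href="/manhua/yukuaideshiyi/2.html" class=""><span>第2话</span></a></li>
  </ul>
<div class="chapter-category clearfix">
'''


def parse_chapter_list(page: str, base: str = 'https://www.36mh.com'):
    """Return a list of (absolute chapter URL, chapter name) pairs."""
    block = re.findall('<div class="chapter-body clearfix">(.*?)<div class="chapter-category clearfix">',
                       page, re.S)[0]
    # Strip newlines and spaces, as the script does
    block = block.replace('\n', '').replace(' ', '')
    pairs = re.findall('<li><ahref="(.*?)"class=""><span>(.*?)</span>', block, re.S)
    return [(base + href, name) for href, name in pairs]


for url, name in parse_chapter_list(SAMPLE):
    print(name, url)
```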
This comic:
https://www.36mh.com/manhua/yukuaideshiyi/
I'm off to give it a proper read.