Async coroutines - 91 Hanman (Korean manhwa) scraper (exe)
Last night I spent two hours reviewing async coroutines and wanted to put them into practice while everything was still fresh. Who knew my code would have a bug: no matter what I did, the async part never kicked in.
After repeated testing I found that the logic around the returned calls was handled badly. The script ran, but it never actually downloaded anything asynchronously, so I had no choice but to rewrite it. That ate up quite a bit of time, but I also learned a lot; fixing bugs is its own kind of fun.
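Roughly speaking, the pitfall looks like this (a minimal sketch, not the actual buggy code): if every coroutine is awaited one at a time inside the loop, the program still runs, but each download finishes before the next one starts, so nothing happens concurrently. Wrapping them in tasks first and awaiting them together is what lets the event loop interleave the requests:

import asyncio

async def fake_download(i):
    await asyncio.sleep(1)  # stands in for a real aiohttp request
    return i

async def slow_way():
    # runs fine, but takes ~5 s: each await waits for the previous "download" to finish
    for i in range(5):
        await fake_download(i)

async def fast_way():
    # takes ~1 s: all five coroutines are scheduled as tasks and run concurrently
    tasks = [asyncio.ensure_future(fake_download(i)) for i in range(5)]
    await asyncio.wait(tasks)

asyncio.get_event_loop().run_until_complete(fast_way())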
A side note (rant)
Yesterday a forum member asked why there was no download, so I spent some time learning how to package the script, which was a pain. Around noon I wanted to do the packaging inside a virtual environment, and since I was afraid I had forgotten how to use virtualenv, I went and searched for it first. When I tried to create the environment I got: 'mkvirtualenv' is not recognized as an internal or external command, operable program or batch file.
Yet virtualenv test could create an environment; I just couldn't activate and enter it. That really stumped me.
I'm too stubborn to let something like that slide. It didn't actually block anything, but it annoyed me, so I tinkered for a while and found the answer on Stack Overflow.
The fix in the end was pip install virtualenvwrapper-win. I didn't know what it was at the time, but it's the package that provides the mkvirtualenv and workon commands on Windows; once it was installed, everything worked normally. What a hassle.
The code:
# _*_ coding: utf-8 _*_
# @Time: 2023/3/28 23:25
# @Author: 🎈
# @File: 91hanman
import aiohttp
import asyncio
import aiofiles
import requests
from urllib.parse import quote
from lxml import etree
import os
from re import sub
base_url = 'https://9y03.xyz'
kw = input('Enter the search keyword to download: ')


def get_page(url):
    """Fetch a page synchronously and return its HTML text."""
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
        'cookie': 'SL_G_WPT_TO=zh; SL_GWPT_Show_Hide_tmp=1; SL_wptGlobTipTmp=1',
        'referer': 'https://9y03.xyz/',
    }
    try:
        # (connect, read) timeouts in seconds
        with requests.get(url, headers=headers, timeout=(20, 50)) as response:
            if response.status_code == 200:
                return response.text
            else:
                return 'code error!'
    except requests.exceptions.RequestException:
        return


def parse(html):
    """Parse the search page for the comic's detail-page link, then yield each chapter title and chapter URL."""
    dom_tree = etree.HTML(html)
    r_url = dom_tree.xpath('//p[@class="comic__title"]/a/@href')[0]
    # name = dom_tree.xpath('//p[@class="comic__title"]/a/text()')[0]
    detail_url = base_url + r_url
    # print(detail_url)  # out: https://9y03.xyz/comic/mimijiaohua
    detail_page = get_page(detail_url)
    detail_page_tree = etree.HTML(detail_page)
    lis = detail_page_tree.xpath('//div[@class="chapter__list clearfix"]/ul/li')
    for li in lis:
        chapter_link = li.xpath('./a/@href')[0]
        chapter_name = li.xpath('./a/text()')[0].strip()
        # print(chapter_name, chapter_link)
        mango_url = base_url + chapter_link  # out: https://9y03.xyz/chapter/3683, the link to chapter 1
        yield chapter_name, mango_url


def get_chapter(title, pics_url):
    """
    Get all image URLs in a chapter.
    :return: yields a list of (img_url, img_name) pairs --> generator
    """
    pic_page_html = get_page(pics_url)
    pic_page_tree = etree.HTML(pic_page_html)
    divs = pic_page_tree.xpath('/html/body/div[2]/div[5]')
    for div in divs:
        img_url = div.xpath('./div/img/@data-original')
        img_name = div.xpath('./div/img/@alt')
        # print(img_name, img_url)  # out: two lists
        yield list(zip(img_url, img_name))  # awkward to handle as separate lists, so pair them up


async def download(url, name, folder_path):
    """
    Save one image of a chapter.
    url: URL of the image
    name: file name of the image
    folder_path: folder of the chapter the image belongs to
    """
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            img_content = await response.content.read()
            # await asyncio.sleep(0)
            file_path = os.path.join(folder_path, name)
            async with aiofiles.open(file_path, mode='wb') as f:
                await f.write(img_content)


async def main():
    url = f'https://9y03.xyz/index.php/search?key={quote(kw)}'
    html = get_page(url)
    chapter_data = parse(html)
    for chapter_title, chapter_url in chapter_data:
        folder_path = f'{kw}\\'
        title = sub(r'[\\/:\*\?"<>\|]', '', chapter_title)  # strip characters that are illegal in file names
        folder_path = os.path.join(folder_path, title)
        os.path.exists(folder_path) or os.makedirs(folder_path)
        img_data = get_chapter(chapter_title, chapter_url)  # all image URLs and names for this chapter
        tasks = []
        print(f'Downloading >>> {title} <<<')
        for data in img_data:
            for pic_url, pic_name in data:
                tasks.append(asyncio.ensure_future(download(pic_url, pic_name, folder_path)))
        if tasks:  # asyncio.wait() raises ValueError on an empty set
            await asyncio.wait(tasks)


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Async really is the way to go; the speed difference is visible to the naked eye, and the plain sequential downloader can be retired.
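One caveat: main() fires off every image in a chapter at the same time. If the site ever starts dropping connections, the usual fix is to gate the downloads with a semaphore. A minimal sketch of that optional tweak, reusing the download() above (the limit of 10 is an arbitrary choice, not something the site requires):

async def download_limited(sem, url, name, folder_path):
    """Same as download(), but waits for a free slot before starting."""
    async with sem:
        await download(url, name, folder_path)

# in main(): create the semaphore once per run, then schedule the limited version, e.g.
# sem = asyncio.Semaphore(10)
# tasks.append(asyncio.ensure_future(download_limited(sem, pic_url, pic_name, folder_path)))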
A lot of people have asked why the window closes the moment they type something. That's because this site can't be reached directly from mainland China; you need a proxy/VPN.
(Valid for 30 days) Packaged exe download: 91hanman.exe, extraction code: i5sx