[Book118 Document Download Tool] Downloading Book118 (原创力文档) documents with Python

zz443470785 posted on 2023-10-16 12:30

Last edited by zz443470785 on 2023-10-16 12:31

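## Notes
> Better to teach someone to fish than to hand them a fish. I often see people on the forum posting requests for Book118 (原创力文档) downloads, and since I have been learning Python lately, I wrote a Book118 document downloader in Python.
- This code is my own original work; please credit the source when reposting.
- For learning and discussion on this forum only. Please do not abuse it. Thanks!
- The code has not been thoroughly tested; if you run into problems, leave a comment below.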

## Source code
```python
"""
-*- coding: utf-8 -*-
File:     原创力文档下载.py
Author:   zhaozhao
IDE:      PyCharm
Purpose:  Download Book118 (原创力文档) documents (only the freely previewable part is supported)
"""
import json
import os
import re
import time

import requests
from PIL import Image
from tqdm import tqdm


def get_html(url):
    """Fetch a page and return its text decoded as UTF-8."""
    html = requests.get(url)
    html.encoding = 'utf-8'
    return html.text


def get_params(url):
    """Build one query-parameter dict per getPreview request (pages are requested in steps of 6)."""
    html = get_html(url)
    # These patterns match JavaScript fields (and their Chinese comments) in the page source.
    aid = re.findall(pattern='aid: (.*?),', string=html, flags=re.S)[0]
    pages = re.findall(pattern='preview_page: (.*?),', string=html, flags=re.S)[0]
    view_token = re.findall(pattern="view_token: '(.*?)' //预览的token", string=html, flags=re.S)[0]
    params = []
    for page in range(1, int(pages) + 1, 6):
        param = {
            'project_id': '1',
            'aid': aid,
            'view_token': view_token,
            'page': page}
        params.append(param)
    return params


def img_to_pdf(folder_path, pdf_file_path):
    """Merge the PNG/JPG images in folder_path into a single PDF, ordered by page number."""
    png_files = [os.path.join(folder_path, file) for file in os.listdir(folder_path)
                 if file.lower().endswith(('.png', '.jpg'))]
    try:
        # Sort numerically by the first number in the file name, so 10.png comes after 9.png.
        png_files.sort(key=lambda x: int(re.findall(r"\d+", os.path.basename(x))[0]))
    except IndexError:
        png_files.sort()
    # Converting to RGB avoids save errors for image modes the PDF writer may not accept.
    images = [Image.open(file).convert('RGB') for file in png_files]
    images[0].save(pdf_file_path, "pdf", save_all=True, append_images=images[1:])


def main():
    url = input("Enter the document URL: ")
    path = input("Enter the save directory: ")
    title = re.findall(pattern="title: '(.*?)', //文档标题", string=get_html(url), flags=re.S)[0]
    name = title.split('.')[0]  # document title without its file extension
    img_path = os.path.join(path, name)
    os.makedirs(img_path, exist_ok=True)
    headers = {'Accept': '*/*',
               'Accept-Encoding': 'gzip, deflate, br',
               'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
               'Connection': 'keep-alive',
               'DNT': '1',
               'Host': 'openapi.book118.com',
               'Referer': 'https://max.book118.com/',
               'sec-ch-ua': '"Chromium";v="104", " Not A;Brand";v="99", "Microsoft Edge";v="104"',
               'sec-ch-ua-platform': '"Windows"',
               'Sec-Fetch-Dest': 'script',
               'Sec-Fetch-Mode': 'no-cors',
               'Sec-Fetch-Site': 'same-site',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36 Edg/104.0.1293.54'}
    for param in tqdm(get_params(url), desc="Downloading", unit="batch", colour='green', ncols=100):
        html = requests.get(url='https://openapi.book118.com/getPreview.html', headers=headers, params=param)
        html.encoding = 'utf-8'
        res = re.findall(pattern=r'"data":(.*?),"pages"', string=html.text, flags=re.S)[0]
        # Strip the backslash escapes, then parse the page-number -> image-URL mapping.
        # json.loads parses the mapping without executing server-supplied text.
        res = json.loads(res.replace('\\', ''))
        for k, v in res.items():
            img = requests.get('https:' + v).content
            with open(os.path.join(img_path, k + '.png'), 'wb') as f:
                f.write(img)
            # print("Page {} downloaded".format(k))
        time.sleep(3)  # pause between batches
    img_to_pdf(img_path, os.path.join(img_path, name + '.pdf'))
    print("Document downloaded successfully!")


if __name__ == '__main__':
    main()
```
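The batching and response-parsing steps in the script can be tried offline. The sketch below is illustrative only: `batch_starts` and `parse_preview` are hypothetical helper names, and `SAMPLE` (including its `jsonpReturn(...)` wrapper and `example.com` URLs) is a made-up response body shaped the way the script's `"data":(.*?),"pages"` pattern implies, not a reply captured from the server. It also assumes the data field is valid JSON, so `json.loads` is used to decode it.

```python
import json
import re


def batch_starts(total_pages, step=6):
    """Return the first page number of each request, like range(1, pages + 1, 6) in the script."""
    return list(range(1, total_pages + 1, step))


def parse_preview(body):
    """Extract the page-number -> image-URL mapping from a getPreview-style response body."""
    match = re.search(r'"data":(.*?),"pages"', body, flags=re.S)
    if match is None:
        raise ValueError("no data field found in response")
    # json.loads decodes the escaped slashes (\/) on its own.
    pages = json.loads(match.group(1))
    # Skip empty URLs and prepend the scheme, since the URLs are protocol-relative.
    return {int(k): 'https:' + v for k, v in pages.items() if v}


# Hypothetical sample body, invented for illustration.
SAMPLE = r'jsonpReturn({"status":200,"data":{"1":"\/\/example.com\/a.png","2":"\/\/example.com\/b.png"},"pages":{"preview":"2"}});'

print(batch_starts(14))       # [1, 7, 13]
print(parse_preview(SAMPLE))  # {1: 'https://example.com/a.png', 2: 'https://example.com/b.png'}
```

With a 14-page preview, the script would therefore issue three requests, starting at pages 1, 7 and 13.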
## Result

zz443470785 posted on 2024-8-11 08:23

The previous link has expired, so here is an updated one.
https://www.123pan.com/s/SFzojv-M179A.html (extraction code: 52pj)

zz443470785 posted on 2024-4-10 22:43

Since some people don't have a Python environment, I have packaged the source code as an exe. Download: https://www.123pan.com/s/SFzojv-eVS9A.html (extraction code: Affm)

zz443470785 posted on 2024-1-15 22:28

破解人生 posted on 2024-1-15 13:30
Enter the document URL: https://max.book118.com/html/2023/0523/5142322142010212.shtm
Enter the save directory: E:\pyt ...

You don't have the requests library installed.
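
A quick way to check for this kind of error, as a sketch: the script imports three third-party modules (`requests`, `PIL`, `tqdm`), and the snippet below lists any that cannot be found. `missing_modules` is a hypothetical helper name, not part of the original script.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party modules imported by the download script.
missing = missing_modules(["requests", "PIL", "tqdm"])
if missing:
    # Note: the PIL module is provided by the "pillow" package.
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

Anything reported missing can be installed with pip, for example `pip install requests pillow tqdm`.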

随风万里 posted on 2023-10-17 17:14

Could you post the tool itself?

zz443470785 posted on 2024-6-14 16:19

铭焱 posted on 2024-6-14 14:40
Could you write a detailed tutorial?

It's simple: the script prompts you for input when you run it.

DZDJ posted on 2023-10-19 14:38

I'm going to learn some Python first.

zz443470785 posted on 2023-10-17 21:26

随风万里 posted on 2023-10-17 17:14
Could you post the tool itself?

The source code above is the tool; just run it in a Python environment.

z376409017 posted on 2023-10-17 22:30

It doesn't run for me. Please just package it as a tool. Thanks.

pastorcd posted on 2023-10-17 22:32

Thanks for sharing, OP.

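ql0w0lp posted on 2023-10-18 00:18

zz443470785 posted on 2023-10-17 21:26
The source code above is the tool; just run it in a Python environment.

I don't follow. Could you explain how to use it? I've never used Python.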

chenbaker posted on 2023-10-18 10:32

The source code works. Thanks a lot for sharing.

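zz443470785 posted on 2023-10-18 21:04

ql0w0lp posted on 2023-10-18 00:18
I don't follow. Could you explain how to use it? I've never used Python.

Search Baidu for how to install PyCharm. It's easy, and there are plenty of tutorial videos on Bilibili.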

lingwushexi posted on 2023-10-19 16:13

Thanks for sharing, OP.


吾爱破解 - 52pojie.cn's Archiver
