Parsing magazines on 期刊网 (qikan.com.cn) and generating PDFs (full analysis walkthrough + source code)

逗啊逗 · Posted 2025-2-16 22:29

Last edited by 逗啊逗 on 2025-2-16 22:47

If you just want the finished product, skip straight to the code and the demo results.

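Preface

I came across this site while looking for magazines. The requirement was to download magazines and save them as PDFs, and since I needed several issues, a program was the way to go. I have been looking into AI-assisted programming lately, so I tried having DeepSeek help write the Python code. I have a programming background, but my Python is at the level of a day or two of self-study, so the whole process may be a useful reference.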

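Analysis & workflow

The workflow actually matters more than the coding: AI can implement most of the code for you, but the workflow is something you mostly have to think through yourself.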

Analyzing the site

# Magazine URL
http://www.qikan.com.cn/original/2AB3ECB3-1F0E-4A06-AF1F-F08670F42B80/2025/03.html#book7/page1

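The URL carries a lot of information, so note it down for now. While reading a magazine, I found that only five or six pages can be previewed, and clicking a page opens a high-resolution image. Open the developer tools and look at the Network tab.

Here you can see that an API is called and the request returns 20 tile images, which are stitched together into the high-resolution page image.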
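To make the tile idea concrete, here is a minimal sketch of reading a tile's position from its file name. It assumes the PAGE_ROW_COL.jpg naming with zero-padded numbers that the source code later in this post relies on; the function name `parse_tile_name` and the example URL are hypothetical.

```python
import os
from urllib.parse import urlsplit


def parse_tile_name(tile_url):
    """Return (page, row, col) from a tile URL whose file name is PAGE_ROW_COL.jpg."""
    # Drop the query string and directories, keep only the file name
    name = os.path.basename(urlsplit(tile_url).path)
    stem, _ext = os.path.splitext(name)
    page, row, col = (int(part) for part in stem.split("_"))
    return page, row, col


# Hypothetical URL: only the file-name pattern is taken from the post's code.
print(parse_tile_name("http://example.com/tiles/0005_0004_0003.jpg?v=1"))  # (5, 4, 3)
```

With 5 rows and 4 columns, the row index runs from 0 to 4 and the column index from 0 to 3, which gives the 20 tiles per page.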

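# Tile-list API URL
http://www.qikan.com.cn/FReader/h5/handle/originalapi.ashx?year=2025&issue=03&codename=nafc&page=5&types=getbigimages&_=1739702586107

Looking at the API, the parameters are year, issue, codename, page, types, and _, and they are quite self-explanatory: year (publication year), issue (issue number), codename (the magazine's code name), page (page number), types (fixed to getbigimages, which plainly means "get big images"), plus a timestamp at the end. What we need now is to check whether the page code or other APIs expose these parameter values.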
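As a rough sketch of how these parameters fit together, the snippet below rebuilds the API URL shown above. The function name `build_tile_api_url` is hypothetical, and using the current time in milliseconds for `_` is an assumption (the post only says it is a timestamp; the source code in this post omits it entirely).

```python
import time
from urllib.parse import urlencode

API_BASE = "http://www.qikan.com.cn/FReader/h5/handle/originalapi.ashx"


def build_tile_api_url(year, issue, codename, page, timestamp_ms=None):
    """Build the query URL that returns the tile list for one page."""
    if timestamp_ms is None:
        # Assumption: "_" is a millisecond timestamp, as browsers often send
        timestamp_ms = int(time.time() * 1000)
    params = {
        "year": year,
        "issue": issue,
        "codename": codename,
        "page": page,
        "types": "getbigimages",  # fixed value, per the analysis above
        "_": timestamp_ms,
    }
    return f"{API_BASE}?{urlencode(params)}"


print(build_tile_api_url("2025", "03", "nafc", 5, timestamp_ms=1739702586107))
```

Passing year and issue as strings keeps the leading zero in values such as "03".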

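Opening the page source, all the needed parameters are clearly there, so this step also went smoothly. That leaves the problem mentioned earlier: only five or six pages can be previewed. The usual approach would be to register an account, purchase the issue, and simulate a login to download it, but as luck would have it, the tile-list API turned out to have no restrictions at all, which I found quite shocking~

That concludes the analysis. Let's now lay out the coding workflow: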

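Coding workflow

Get the magazine URL
Use regular expressions to extract the parameters for the tile-list API URL: year, issue, codename, and pagecount (the total page count, used to iterate over page)
Download the tile images page by page
Stitch the tiles into a complete full-size image
Merge the full-size images into a PDF
Delete the downloaded images (optional; they can be kept)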
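The parameter-extraction step in the workflow above can be sketched as follows. The HTML excerpt is hypothetical (its shape is inferred from the regular expression used in the source code of this post, and the pagecount value is made up), and the pattern here uses `\w+` for the variable name, a slightly stricter variant than the one in the post.

```python
import re

# Hypothetical excerpt of the page source; the values are for illustration only.
SAMPLE_HTML = '''
<script>
var guid = "2AB3ECB3-1F0E-4A06-AF1F-F08670F42B80";
var year = "2025";
var issue = "03";
var codename = "nafc";
var pagecount = "84";
var other = "ignored";
</script>
'''

WANTED = {"guid", "year", "issue", "codename", "pagecount"}
# Matches lines of the form: var name = "value";
VAR_PATTERN = re.compile(r'var\s+(\w+)\s*=\s*"([^"]*)"\s*;')


def extract_params(html):
    """Collect the wanted JavaScript string variables from the page source."""
    return {name: value for name, value in VAR_PATTERN.findall(html) if name in WANTED}


params = extract_params(SAMPLE_HTML)
print(params["year"], params["issue"], params["codename"], params["pagecount"])
```

Checking that all required keys are present before downloading avoids a KeyError later if the page layout changes.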

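Source code

If the workflow is clear and feasible enough, code at this level can be handed over to AI entirely; what remains is debugging and tweaking. Here is the actual code: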

import os
import re
import requests
from PIL import Image
from fpdf import FPDF
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class Qikan:

    default_img_start = [
        [[0,0],[468,0],[936,0],[1404,0]],
        [[0,468],[468,468],[936,468],[1404,468]],
        [[0,936],[468,936],[936,936],[1404,936]],
        [[0,1404],[468,1404],[936,1404],[1404,1404]],
        [[0,1872],[468,1872],[936,1872],[1404,1872]],
    ]

    default_img_total_size = {'width':1597, 'height':2255}

    web_headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0'
    }

    def __init__(self):
        self.session = requests.Session()
        retries = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
        self.session.mount('http://', HTTPAdapter(max_retries=retries))
        self.session.mount('https://', HTTPAdapter(max_retries=retries))

    def http_get(self, url, headers=None):
        try:
            response = self.session.get(url, headers=headers or self.web_headers, timeout=10)
            response.raise_for_status()
            return response.content
        except requests.exceptions.RequestException as e:
            print(f"HTTP GET error: {e}")
            return None

    def download_image(self, url, destination):
        content = self.http_get(url)
        if content:
            os.makedirs(os.path.dirname(destination), exist_ok=True)
            with open(destination, 'wb') as f:
                f.write(content)
            return destination
        return None

    def get_magazine_issue(self, url):
        content = self.http_get(url)
        if not content:
            return None

        variables_to_match = ['guid', 'year', 'issue', 'codename', 'pagecount']
        result = {}

        # Extract the JavaScript variables from the page source
        html = content.decode('utf-8')
        pattern = re.compile(r'var\s+([^=]+)\s*=\s*"([^"]+)"\s*;')
        matches = pattern.findall(html)
        for var_name, var_value in matches:
            var_name = var_name.strip()
            if var_name in variables_to_match:
                result[var_name] = var_value.strip()

        # Extract the title
        title_match = re.search(r'<p class="maga-tc-title">(.*?)</p>', html)
        if title_match:
            result['title'] = title_match.group(1)

        return result

    def download_image_list(self, info):
        import json  # only used here, to parse the API response

        path_temp = f"./{info['year']}_{info['issue']}_{info['codename']}"
        os.makedirs(path_temp, exist_ok=True)

        for page in range(1, int(info['pagecount']) + 1):
            print(f"\rDownloading page {page}  ", end="")
            url = f"http://www.qikan.com.cn/FReader/h5/handle/originalapi.ashx?year={info['year']}&issue={info['issue']}&codename={info['codename']}&page={page}&types=getbigimages"
            content = self.http_get(url)
            if content:
                # The response is assumed to be a JSON array of tile URLs;
                # adjust the parsing if the actual format differs
                img_urls = json.loads(content.decode('utf-8'))
                for img_url in img_urls:
                    filename = os.path.basename(img_url.split('?')[0])
                    self.download_image(img_url, os.path.join(path_temp, filename))

                self.splicing_img(path_temp, page)
                print(f"\rFinished page {page}  ", end="")

    def splicing_img(self, path, page):
        page_t = str(page).zfill(4)
        merged_image = Image.new('RGB', (self.default_img_total_size['width'], self.default_img_total_size['height']))

        for row in range(5):
            row_t = str(row).zfill(4)
            for col in range(4):
                col_t = str(col).zfill(4)
                img_path = os.path.join(path, f"{page_t}_{row_t}_{col_t}.jpg")
                try:
                    img = Image.open(img_path)
                    x, y = self.default_img_start[row][col]
                    width, height = img.size
                    merged_image.paste(img, (x, y, x + width, y + height))
                except Exception as e:
                    print(f"Error processing {img_path}: {e}")

        output_path = os.path.join(path, f"{page_t}.jpg")
        merged_image.save(output_path)

    def create_pdf(self, path, page_all, name=None):
        # Read the pixel size of the first stitched page
        first_page = str(1).zfill(4)
        first_img_path = os.path.join(path, f"{first_page}.jpg")

        try:
            with Image.open(first_img_path) as img:
                original_width, original_height = img.size
        except Exception as e:
            print(f"❌ Failed to read the size of the first image: {str(e)}")
            return

        # Scale to A4 width (210 mm) and derive the height from the aspect ratio
        a4_width_mm = 210  # A4 paper width
        mm_per_pixel = a4_width_mm / original_width  # millimetres per pixel
        page_width = a4_width_mm
        page_height = original_height * mm_per_pixel  # keeps the aspect ratio

        # Create the document with the custom page size; every page reuses
        # the size computed from the first image
        page_size = (page_width, page_height)
        pdf = FPDF(unit="mm", format=page_size)
        pdf.set_auto_page_break(False)

        for page in range(1, page_all + 1):
            page_t = str(page).zfill(4)
            img_path = os.path.join(path, f"{page_t}.jpg")

            pdf.add_page()  # uses the document-level page size set above

            try:
                pdf.image(img_path, 0, 0, page_width, page_height)
            except Exception as e:
                print(f"Failed to add image {img_path}: {str(e)}")
                continue

        output_path = f"./{name}.pdf" if name else f"{path}.pdf"
        pdf.output(output_path)
        print(f"\nFile saved to: {output_path}")

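    # Helper: physical size of an image in millimetres
    def _get_physical_size(self, img_path):
        try:
            with Image.open(img_path) as img:
                # DPI from the image metadata (falls back to 72)
                dpi = img.info.get('dpi', (72, 72))
                width_inch = img.width / dpi[0]
                height_inch = img.height / dpi[1]
                return (width_inch * 25.4, height_inch * 25.4)  # inches to millimetres
        except Exception as e:
            print(f"⚠️ Failed to get the physical size: {str(e)}")
            return None

    def download_magazine(self, url):
        url = url.replace('/magdetails/', '/original/').replace('/m/', '/')
        info = self.get_magazine_issue(url)
        # All four values are needed to build the API URL and the output path
        required = ('year', 'issue', 'codename', 'pagecount')
        if not info or any(key not in info for key in required):
            print("\nParsing failed")
            return

        print(f"\nMagazine title: {info.get('title', '')}")
        print(f"Total pages: {info.get('pagecount', 0)}\n")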

        self.download_image_list(info)
        self.create_pdf(
            path=f"./{info['year']}_{info['issue']}_{info['codename']}",
            page_all=int(info['pagecount']),
            name=info.get('title', None)
        )

if __name__ == "__main__":
    qikan = Qikan()
    url = input("Enter the magazine URL: ")
    qikan.download_magazine(url)

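Results

Possible improvements

Error handling and logging
Batch downloading
Multiprocessing
Graphical interface

The source code is admittedly rough, but it solves the actual problem. The improvements above could be considered later, but as a tool for the original need it is already good enough.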
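As one possible direction for the batch-download idea, here is a small sketch. Note that it swaps in a thread pool rather than the multiprocessing mentioned above, on the assumption that the work is mostly waiting on the network. `download_many` and `fake_download` are hypothetical names; in real use, `download_one` could wrap something like `Qikan().download_magazine(url)`, with a fresh instance per issue.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_many(urls, download_one, max_workers=3):
    """Run download_one(url) for every URL on a small thread pool.

    Returns a dict mapping each URL to "ok" or to an error message, so one
    failing issue does not abort the rest of the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
                results[url] = "ok"
            except Exception as exc:
                results[url] = f"failed: {exc}"
    return results


# Stand-in for a real per-issue download; hypothetical, no network access.
def fake_download(url):
    if "bad" in url:
        raise ValueError("could not parse page")
    return url


demo_results = download_many(["issue-01", "issue-02", "bad-issue"], fake_download)
print(demo_results)
```

Since download_magazine in this post prints errors instead of raising them, it would need to raise (or return a status) for the per-URL result to be meaningful.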

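Closing

I hope the approach or the tool above is a useful reference for everyone. Cheers~

Gripes

The forum's image feature for Markdown is really hard to use; I'll write a tool for it when I get the chance~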

逗啊逗 · Posted 2025-2-25 23:52

Last edited by 逗啊逗 on 2025-2-25 23:58

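Bug fix

Symptom

As reported by @B1GYang, stitching goes wrong for back issues.

Cause

Testing confirmed that some older issues do have this problem. The cause is that their resolution differs from that of the latest issues, so the hard-coded tile offsets and canvas size no longer fit. The code now reads the image sizes dynamically.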
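The idea of sizing the canvas dynamically can be sketched without any image library: given each tile's pixel size, compute where it goes and how large the canvas must be. The function name `compute_layout` is hypothetical, and the example tile sizes are only those implied by the hard-coded table in the first version (468-pixel steps on a 1597 x 2255 canvas).

```python
def compute_layout(tile_sizes):
    """Compute paste offsets and canvas size from measured tile sizes.

    tile_sizes maps (row, col) -> (width, height) in pixels.
    Returns (offsets, (canvas_width, canvas_height)), where offsets maps
    (row, col) -> (x, y) of the tile's top-left corner.
    """
    offsets = {}
    canvas_width = 0
    y = 0
    for row in sorted({r for r, _c in tile_sizes}):
        x = 0
        row_height = 0
        for col in sorted(c for r, c in tile_sizes if r == row):
            width, height = tile_sizes[(row, col)]
            offsets[(row, col)] = (x, y)
            x += width                          # next tile starts where this one ends
            row_height = max(row_height, height)
        canvas_width = max(canvas_width, x)     # widest row sets the canvas width
        y += row_height                         # next row starts below the tallest tile
    return offsets, (canvas_width, y)


# 4 columns x 5 rows; the last column and last row hold the smaller leftover tiles.
sizes = {
    (r, c): (468 if c < 3 else 193, 468 if r < 4 else 383)
    for r in range(5) for c in range(4)
}
offsets, canvas = compute_layout(sizes)
print(canvas)           # (1597, 2255)
print(offsets[(4, 3)])  # (1404, 1872)
```

For an older issue with a different resolution, the same function yields different offsets and canvas size, which is what the hard-coded table could not do.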

Fixed source code

import os
import re
import requests
from PIL import Image
from fpdf import FPDF
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class Qikan:

    web_headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0'
    }

    def __init__(self):
        self.session = requests.Session()
        retries = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
        self.session.mount('http://', HTTPAdapter(max_retries=retries))
        self.session.mount('https://', HTTPAdapter(max_retries=retries))

    def http_get(self, url, headers=None):
        try:
            response = self.session.get(url, headers=headers or self.web_headers, timeout=10)
            response.raise_for_status()
            return response.content
        except requests.exceptions.RequestException as e:
            print(f"HTTP GET error: {e}")
            return None

    def download_image(self, url, destination):
        content = self.http_get(url)
        if content:
            os.makedirs(os.path.dirname(destination), exist_ok=True)
            with open(destination, 'wb') as f:
                f.write(content)
            return destination
        return None

    def get_magazine_issue(self, url):
        content = self.http_get(url)
        if not content:
            return None

        variables_to_match = ['guid', 'year', 'issue', 'codename', 'pagecount']
        result = {}

        # Extract the JavaScript variables from the page source
        html = content.decode('utf-8')
        pattern = re.compile(r'var\s+([^=]+)\s*=\s*"([^"]+)"\s*;')
        matches = pattern.findall(html)
        for var_name, var_value in matches:
            var_name = var_name.strip()
            if var_name in variables_to_match:
                result[var_name] = var_value.strip()

        # Extract the title
        title_match = re.search(r'<p class="maga-tc-title">(.*?)</p>', html)
        if title_match:
            result['title'] = title_match.group(1)

        return result

    def download_image_list(self, info):
        import json  # only used here, to parse the API response

        path_temp = f"./{info['year']}_{info['issue']}_{info['codename']}"
        os.makedirs(path_temp, exist_ok=True)

        for page in range(1, int(info['pagecount']) + 1):
            print(f"\rDownloading page {page}  ", end="")
            url = f"http://www.qikan.com.cn/FReader/h5/handle/originalapi.ashx?year={info['year']}&issue={info['issue']}&codename={info['codename']}&page={page}&types=getbigimages"
            content = self.http_get(url)
            if content:
                # The response is assumed to be a JSON array of tile URLs;
                # adjust the parsing if the actual format differs
                img_urls = json.loads(content.decode('utf-8'))
                for img_url in img_urls:
                    filename = os.path.basename(img_url.split('?')[0])
                    self.download_image(img_url, os.path.join(path_temp, filename))

                self.splicing_img(path_temp, page)
                print(f"\rFinished page {page}  ", end="")

    def get_image_paths(self, directory, page):
        """ Return the paths of all tiles for the given page, sorted by (row, col) """
        page_t = str(page).zfill(4)
        image_paths = []
        for file_name in os.listdir(directory):
            if file_name.startswith(f"{page_t}_") and file_name.endswith(".jpg"):
                image_paths.append(os.path.join(directory, file_name))
        image_paths.sort(key=self.extract_row_col)
        return image_paths

    def extract_row_col(self, file_path):
        """ Extract the row and column numbers from a PAGE_ROW_COL.jpg file name """
        base_name = os.path.basename(file_path)
        parts = base_name.split('_')
        row = int(parts[1])
        col = int(parts[2].split('.')[0])
        return (row, col)

    def splicing_img(self, path, page):
        # Collect the tile paths for this page, sorted by (row, col)
        image_paths = self.get_image_paths(path, page)
        if not image_paths:
            print("No images found for the specified page.")
            return

        # Group the tiles by row; the order within a row follows the sorted paths
        rows = {}
        for img_path in image_paths:
            row, _col = self.extract_row_col(img_path)
            rows.setdefault(row, []).append(img_path)

        # Open every tile and measure each row: the width is the sum of the
        # tile widths, the height is that of the tallest tile in the row
        layout = []
        for row in sorted(rows):
            tiles = []
            for img_path in rows[row]:
                try:
                    tiles.append(Image.open(img_path))
                except Exception as e:
                    print(f"Error processing {img_path}: {e}")
            row_width = sum(img.width for img in tiles)
            row_height = max((img.height for img in tiles), default=0)
            layout.append((tiles, row_width, row_height))

        # Size the canvas from the measured tiles instead of hard-coded values
        total_width = max(row_width for _tiles, row_width, _height in layout)
        total_height = sum(row_height for _tiles, _width, row_height in layout)
        if total_width == 0 or total_height == 0:
            print("No readable images found for the specified page.")
            return
        merged_image = Image.new('RGB', (total_width, total_height))

        # Paste the tiles left to right, top to bottom
        current_y = 0
        for tiles, _row_width, row_height in layout:
            current_x = 0
            for img in tiles:
                merged_image.paste(img, (current_x, current_y))
                current_x += img.width
            current_y += row_height

        # Save the stitched page image
        output_path = os.path.join(path, f"{str(page).zfill(4)}.jpg")
        merged_image.save(output_path)

    def create_pdf(self, path, page_all, name=None):
        # Read the pixel size of the first stitched page
        first_page = str(1).zfill(4)
        first_img_path = os.path.join(path, f"{first_page}.jpg")

        try:
            with Image.open(first_img_path) as img:
                original_width, original_height = img.size
        except Exception as e:
            print(f"❌ Failed to read the size of the first image: {str(e)}")
            return

        # Scale to A4 width (210 mm) and derive the height from the aspect ratio
        a4_width_mm = 210  # A4 paper width
        mm_per_pixel = a4_width_mm / original_width  # millimetres per pixel
        page_width = a4_width_mm
        page_height = original_height * mm_per_pixel  # keeps the aspect ratio

        # Create the document with the custom page size; every page reuses
        # the size computed from the first image
        page_size = (page_width, page_height)
        pdf = FPDF(unit="mm", format=page_size)
        pdf.set_auto_page_break(False)

        for page in range(1, page_all + 1):
            page_t = str(page).zfill(4)
            img_path = os.path.join(path, f"{page_t}.jpg")

            pdf.add_page()  # uses the document-level page size set above

            try:
                pdf.image(img_path, 0, 0, page_width, page_height)
            except Exception as e:
                print(f"Failed to add image {img_path}: {str(e)}")
                continue

        output_path = f"./{name}.pdf" if name else f"{path}.pdf"
        pdf.output(output_path)
        print(f"\nFile saved to: {output_path}")

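    # Helper: physical size of an image in millimetres
    def _get_physical_size(self, img_path):
        try:
            with Image.open(img_path) as img:
                # DPI from the image metadata (falls back to 72)
                dpi = img.info.get('dpi', (72, 72))
                width_inch = img.width / dpi[0]
                height_inch = img.height / dpi[1]
                return (width_inch * 25.4, height_inch * 25.4)  # inches to millimetres
        except Exception as e:
            print(f"⚠️ Failed to get the physical size: {str(e)}")
            return None

    def download_magazine(self, url):
        url = url.replace('/magdetails/', '/original/').replace('/m/', '/')
        info = self.get_magazine_issue(url)
        # All four values are needed to build the API URL and the output path
        required = ('year', 'issue', 'codename', 'pagecount')
        if not info or any(key not in info for key in required):
            print("\nParsing failed")
            return

        print(f"\nMagazine title: {info.get('title', '')}")
        print(f"Total pages: {info.get('pagecount', 0)}\n")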

        self.download_image_list(info)
        self.create_pdf(
            path=f"./{info['year']}_{info['issue']}_{info['codename']}",
            page_all=int(info['pagecount']),
            name=info.get('title', None)
        )

if __name__ == "__main__":
    qikan = Qikan()
    url = input("Enter the magazine URL: ")
    qikan.download_magazine(url)

Results:

coliuer · Posted 2025-2-17 15:42

Building on the OP's work, I added a command that automatically deletes the images

import os
import re
import requests
from PIL import Image
from fpdf import FPDF
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import shutil
 
 
class Qikan:
 
    default_img_start = [
        [[0, 0], [468, 0], [936, 0], [1404, 0]],
        [[0, 468], [468, 468], [936, 468], [1404, 468]],
        [[0, 936], [468, 936], [936, 936], [1404, 936]],
        [[0, 1404], [468, 1404], [936, 1404], [1404, 1404]],
        [[0, 1872], [468, 1872], [936, 1872], [1404, 1872]],
    ]
 
    default_img_total_size = {'width': 1597, 'height': 2255}
 
    web_headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0'
    }
 
    def __init__(self):
        self.session = requests.Session()
        retries = Retry(total=3, backoff_factor=1,
                        status_forcelist=[500, 502, 503, 504])
        self.session.mount('http://', HTTPAdapter(max_retries=retries))
        self.session.mount('https://', HTTPAdapter(max_retries=retries))
 
    def http_get(self, url, headers=None):
        try:
            response = self.session.get(
                url, headers=headers or self.web_headers, timeout=10)
            response.raise_for_status()
            return response.content
        except requests.exceptions.RequestException as e:
            print(f"HTTP GET error: {e}")
            return None
 
    def download_image(self, url, destination):
        content = self.http_get(url)
        if content:
            os.makedirs(os.path.dirname(destination), exist_ok=True)
            with open(destination, 'wb') as f:
                f.write(content)
            return destination
        return None
 
    def get_magazine_issue(self, url):
        content = self.http_get(url)
        if not content:
            return None
 
        variables_to_match = ['guid', 'year', 'issue', 'codename', 'pagecount']
        result = {}
 
        # Extract the JavaScript variables from the page source
        html = content.decode('utf-8')
        pattern = re.compile(r'var\s+([^=]+)\s*=\s*"([^"]+)"\s*;')
        matches = pattern.findall(html)
        for var_name, var_value in matches:
            var_name = var_name.strip()
            if var_name in variables_to_match:
                result[var_name] = var_value.strip()

        # Extract the title
        title_match = re.search(r'<p class="maga-tc-title">(.*?)</p>', html)
        if title_match:
            result['title'] = title_match.group(1)
 
        return result
 
    def download_image_list(self, info):
        import json  # only used here, to parse the API response

        path_temp = f"./{info['year']}_{info['issue']}_{info['codename']}"
        os.makedirs(path_temp, exist_ok=True)

        for page in range(1, int(info['pagecount']) + 1):
            print(f"\rDownloading page {page}  ", end="")
            url = f"http://www.qikan.com.cn/FReader/h5/handle/originalapi.ashx?year={info['year']}&issue={info['issue']}&codename={info['codename']}&page={page}&types=getbigimages"
            content = self.http_get(url)
            if content:
                # The response is assumed to be a JSON array of tile URLs;
                # adjust the parsing if the actual format differs
                img_urls = json.loads(content.decode('utf-8'))
                for img_url in img_urls:
                    filename = os.path.basename(img_url.split('?')[0])
                    self.download_image(img_url, os.path.join(path_temp, filename))

                self.splicing_img(path_temp, page)
                print(f"\rFinished page {page}  ", end="")
 
    def splicing_img(self, path, page):
        page_t = str(page).zfill(4)
        merged_image = Image.new(
            'RGB', (self.default_img_total_size['width'], self.default_img_total_size['height']))
 
        for row in range(5):
            row_t = str(row).zfill(4)
            for col in range(4):
                col_t = str(col).zfill(4)
                img_path = os.path.join(path, f"{page_t}_{row_t}_{col_t}.jpg")
                try:
                    img = Image.open(img_path)
                    x, y = self.default_img_start[row][col]
                    width, height = img.size
                    merged_image.paste(img, (x, y, x + width, y + height))
                except Exception as e:
                    print(f"Error processing {img_path}: {e}")
 
        output_path = os.path.join(path, f"{page_t}.jpg")
        merged_image.save(output_path)
 
    def create_pdf(self, path, page_all, name=None):
        # Read the pixel size of the first stitched page
        first_page = str(1).zfill(4)
        first_img_path = os.path.join(path, f"{first_page}.jpg")

        try:
            with Image.open(first_img_path) as img:
                original_width, original_height = img.size
        except Exception as e:
            print(f"❌ Failed to read the size of the first image: {str(e)}")
            return

        # Scale to A4 width (210 mm) and derive the height from the aspect ratio
        a4_width_mm = 210  # A4 paper width
        mm_per_pixel = a4_width_mm / original_width  # millimetres per pixel
        page_width = a4_width_mm
        page_height = original_height * mm_per_pixel  # keeps the aspect ratio

        # Create the document with the custom page size; every page reuses
        # the size computed from the first image
        page_size = (page_width, page_height)
        pdf = FPDF(unit="mm", format=page_size)
        pdf.set_auto_page_break(False)

        for page in range(1, page_all + 1):
            page_t = str(page).zfill(4)
            img_path = os.path.join(path, f"{page_t}.jpg")

            pdf.add_page()  # uses the document-level page size set above

            try:
                pdf.image(img_path, 0, 0, page_width, page_height)
            except Exception as e:
                print(f"Failed to add image {img_path}: {str(e)}")
                continue

        output_path = f"./{name}.pdf" if name else f"{path}.pdf"
        pdf.output(output_path)
        print(f"\nFile saved to: {output_path}")
 
    # Helper: physical size of an image in millimetres
    def _get_physical_size(self, img_path):
        try:
            with Image.open(img_path) as img:
                # DPI from the image metadata (falls back to 72)
                dpi = img.info.get('dpi', (72, 72))
                width_inch = img.width / dpi[0]
                height_inch = img.height / dpi[1]
                return (width_inch * 25.4, height_inch * 25.4)  # inches to millimetres
        except Exception as e:
            print(f"⚠️ Failed to get the physical size: {str(e)}")
            return None

    def download_magazine(self, url):
        url = url.replace('/magdetails/', '/original/').replace('/m/', '/')
        info = self.get_magazine_issue(url)
        # All four values are needed to build the API URL and the output path
        required = ('year', 'issue', 'codename', 'pagecount')
        if not info or any(key not in info for key in required):
            print("\nParsing failed")
            return

        print(f"\nMagazine title: {info.get('title', '')}")
        print(f"Total pages: {info.get('pagecount', 0)}\n")
 
        self.download_image_list(info)
        self.create_pdf(
            path=f"./{info['year']}_{info['issue']}_{info['codename']}",
            page_all=int(info['pagecount']),
            name=info.get('title', None)
        )
        shutil.rmtree(f"./{info['year']}_{info['issue']}_{info['codename']}")
 
 
if __name__ == "__main__":
    qikan = Qikan()
    url = input("Enter the magazine URL: ")
    qikan.download_magazine(url)

The effect is that the cached image files are deleted, so after the run you are left with just a PDF file
Thanks to the OP
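One caveat with deleting unconditionally: if PDF creation fails, the downloaded tiles are gone too. A more cautious variant, sketched below, removes the tile folder only once the PDF actually exists. The helper name `cleanup_tiles` and the demo paths are hypothetical.

```python
import os
import shutil
import tempfile


def cleanup_tiles(tile_dir, pdf_path):
    """Delete tile_dir only when pdf_path exists and is not empty.

    Returns True if the directory was removed, False if it was kept.
    """
    if os.path.isfile(pdf_path) and os.path.getsize(pdf_path) > 0:
        shutil.rmtree(tile_dir, ignore_errors=True)
        return True
    return False


# Demo in a throwaway directory (all paths here are made up for illustration).
work = tempfile.mkdtemp()
tiles = os.path.join(work, "2025_03_nafc")
os.makedirs(tiles)
pdf = os.path.join(work, "demo.pdf")

first = cleanup_tiles(tiles, pdf)    # no PDF yet, so the tiles are kept
with open(pdf, "wb") as f:
    f.write(b"%PDF-1.4 demo")
second = cleanup_tiles(tiles, pdf)   # the PDF exists now, so the tiles are removed
tiles_left = os.path.exists(tiles)
print(first, second, tiles_left)     # False True False
shutil.rmtree(work, ignore_errors=True)
```

In the script above, the PDF path depends on whether a title was found, so the check would need the same path that create_pdf writes to.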

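MRXZ1994 · Posted 2025-2-17 00:14

Newbie question: how do I actually use this code?

Fredyjujun · Posted 2025-3-4 22:53

The OP's tool is really impressive; I already saw it in action in another thread. But I'm a Python beginner. Could the OP release a ready-made version for beginners? I just can't get the hang of Python

wyza121 · Posted 2025-3-18 21:08

Could someone help convert it to an EXE? Many thanks

shui22 · Posted 2025-2-16 22:57

Thanks for sharing

雾都孤尔 · Posted 2025-2-16 23:06

Learned something; this is a good approach. Thanks for sharing.

hbu126 · Posted 2025-2-16 23:55

Nice method: checking the parameters for a specific site. Thanks for sharing

zjtzjt · Posted 2025-2-17 07:00

Thanks for sharing; I read magazines now and then

shitoujiandaobu · Posted 2025-2-17 07:12

Learned something; nice approach

sg6465038 · Posted 2025-2-17 07:39

MRXZ1994 posted on 2025-2-17 00:14
Newbie question: how do I actually use this code?

Just run it in PyCharm

Owenpojie · Posted 2025-2-17 08:03

This is written in quite some detail, thanks for sharing

wangdanq · Posted 2025-2-17 08:42

Thanks to the OP for sharing; I'll study it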
