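In an earlier post I showed how to batch-download WeChat Official Account articles along with their stats (reads, likes, "Wow" (在看) count, and comments). Today I'm sharing how to take the downloaded articles, exported as PDFs, and merge them into a single PDF with a bookmark for each article. Code: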
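```python
import html
import os

from PyPDF2 import PdfFileReader, PdfFileWriter  # legacy camelCase API, PyPDF2 < 3.0

OUTPUT = "公众号文章合集.pdf"

file_writer = PdfFileWriter()
num = 0  # pages written so far = 0-based start page of the next article
for root, dirs, files in os.walk('.'):
    for name in sorted(files):  # sort so the merge order is predictable
        # Skip non-PDFs and the merged file left over from a previous run
        if not name.endswith(".pdf") or name == OUTPUT:
            continue
        print(name)
        file_reader = PdfFileReader(os.path.join(root, name))
        for page in range(file_reader.getNumPages()):
            file_writer.addPage(file_reader.getPage(page))
        # Bookmark only after the pages exist; it points at the article's first page
        title = os.path.splitext(html.unescape(name))[0]
        file_writer.addBookmark(title, num, parent=None)
        num += file_reader.getNumPages()

with open(OUTPUT, 'wb') as f:
    file_writer.write(f)
```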
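The only subtle part of the merge loop is the bookmark page index: each article's bookmark must point at the index of its first page in the merged file, which equals the total number of pages written before it. The stdlib-only sketch below isolates that bookkeeping so it can be checked without PyPDF2. `plan_bookmarks` is a hypothetical helper, and its (filename, page count) input stands in for what the real script reads from each PDF.

```python
import html
import os


def plan_bookmarks(files):
    """Return (title, start_page) pairs for files merged in the given order.

    `files` is a list of (filename, page_count) tuples (hypothetical input);
    start_page is the 0-based index of the file's first page in the merged PDF.
    """
    plan = []
    offset = 0  # pages already written to the merged document
    for name, page_count in files:
        # Same title cleanup as the script: decode HTML entities, drop ".pdf"
        title = os.path.splitext(html.unescape(name))[0]
        plan.append((title, offset))
        offset += page_count
    return plan


if __name__ == "__main__":
    demo = [("a&amp;b.pdf", 3), ("second.pdf", 1), ("third.pdf", 2)]
    for title, start in plan_bookmarks(demo):
        print(f"{title}: page {start + 1}")  # PDF viewers count pages from 1
```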
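The result is shown in the screenshot below, using Mo Yan's (莫言) Official Account as an example: clicking a bookmark in the left-hand panel jumps to the corresponding article.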
You can also export the PDF's bookmarks to a CSV file that opens directly in Excel. Code:
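```python
import csv

from PyPDF2 import PdfFileReader  # legacy camelCase API, PyPDF2 < 3.0


def bookmark_export(reader, lines):
    """Recursively flatten the outline tree into [title, 1-based page] rows."""
    rows = []
    for line in lines:
        if isinstance(line, dict):
            # '/Page' is an indirect page reference, so map it to a 0-based index
            rows.append([line['/Title'], reader.getDestinationPageNumber(line) + 1])
        else:
            # A nested list holds child bookmarks; keep the rows it returns
            rows += bookmark_export(reader, line)
    return rows


with open('公众号文章合集.pdf', 'rb') as f:
    reader = PdfFileReader(f)
    rows = bookmark_export(reader, reader.getOutlines())

# utf-8-sig writes a BOM so Excel detects UTF-8 and shows Chinese titles correctly
with open('公众号文章合集.csv', 'w', encoding='utf-8-sig', newline='') as f:
    csv.writer(f).writerows(rows)  # csv quotes titles that contain commas
```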
The exported result is shown below:
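PyPDF2's `getOutlines()` returns bookmarks as a list in which a nested list holds the children of the entry just before it. An export therefore has to collect the rows returned by each recursive call, and titles containing commas need proper CSV quoting rather than plain string concatenation. The sketch below shows both on plain dicts; the `title`/`page` keys and the extra depth column are hypothetical stand-ins for PyPDF2's outline objects, not its actual API.

```python
import csv
import io


def flatten_outline(outline, depth=0):
    """Yield (depth, title, page) for each bookmark, parents before children.

    Mimics the shape returned by PyPDF2's getOutlines(): a list in which a
    nested list holds the children of the preceding entry. Each entry here is
    a plain dict with hypothetical 'title' and 'page' keys.
    """
    for item in outline:
        if isinstance(item, list):
            yield from flatten_outline(item, depth + 1)
        else:
            yield depth, item["title"], item["page"]


def outline_to_csv(outline):
    """Render the flattened outline as CSV text with a depth column first."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes any field containing a comma
    for depth, title, page in flatten_outline(outline):
        writer.writerow([depth, title, page])
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        {"title": "Article 1", "page": 1},
        [{"title": "Section 1.1, part A", "page": 2}],
        {"title": "Article 2", "page": 5},
    ]
    print(outline_to_csv(sample))
```

Keeping depth in its own column, instead of indenting titles, leaves the sheet sortable and filterable in Excel.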