Scraping Baidu Yuedu (only works for the free preview; nothing can be done beyond that)

lihu5841314 posted on 2021-7-11 15:37

import json
import re
import time

import requests
from tqdm import tqdm

# "Font encryption": the title 罪恶穿越 comes back as \u7f6a\u6076\u7a7f\u8d8a

url1 = 'https://yuedu.baidu.com/ebook/413d24361a37f111f1855be5?fr=booklist'  # book table-of-contents page
bok = url1.split("?")[0].split("/")[-1]  # book ID: last path segment, query string dropped
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36',
    'Referer': 'https://yuedu.baidu.com/'
}
resp_m = requests.get(url1, headers=headers).text

# Get "m" from the bdjsonUrl embedded in the page source
ex = re.compile(r"'bdjsonUrl', 'https://wenku.baidu.com/content/(.*?)\?m=(?P<prm>.*?)'", re.S)
m = ex.search(resp_m).group('prm')
# Get the total chapter count from the page source
cn = re.search(r"bookInfo\['chapterCount'\] = parseInt\('(?P<cn>.*?)'\)", resp_m, re.S).group("cn")
cn1 = int(cn)
tim = str(int(time.time()))
url = 'https://wenku.baidu.com/content/' + bok  # chapter request endpoint

for n in tqdm(range(1, cn1 + 1)):
    params = {
        'm': m,
        'type': 'json',
        'cn': n,  # chapter number, from 1 up to the total chapter count
        '_': 0,
        't': tim,
        'token': 'b732fd00f8f311d416b01cc0a0698cce',
    }
    print(n)
    resp = requests.get(url, headers=headers, params=params)
    # Turn the \uXXXX escapes into readable characters before parsing
    text_resp = resp.text.encode('utf-8').decode("unicode_escape")
    text_resp = json.loads(text_resp)['c']
    book_charper = []
    for charper in text_resp[:-1]:
        charper1 = charper['c']
        book_charper.append(charper1)
    book_char = "\n".join(book_charper)
    # Append each chapter to the output file
    with open('1.txt', 'a', encoding="utf-8") as f:
        f.write(book_char)
        f.write('\n')

print("Download complete")
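
The script derives the book ID from the last path segment of the table-of-contents URL. A minimal sketch of that step using the standard library's URL parser, assuming the ID is always the final path segment; the helper name book_id_from_url is made up for illustration and is not part of the original script:

```python
from urllib.parse import urlparse


def book_id_from_url(url: str) -> str:
    """Return the last non-empty path segment of the URL, ignoring the query string."""
    path = urlparse(url).path  # e.g. "/ebook/413d24361a37f111f1855be5"
    return path.rstrip("/").split("/")[-1]


# The table-of-contents URL from the post above.
book_id = book_id_from_url("https://yuedu.baidu.com/ebook/413d24361a37f111f1855be5?fr=booklist")
print(book_id)
```

Parsing the URL first keeps the query string (here ?fr=booklist) out of the ID without splitting on "?" by hand.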

Ercilan posted on 2021-7-11 19:00

What font encryption?

lihu5841314 posted on 2021-7-11 20:01

Ercilan posted on 2021-7-11 19:00
What font encryption?

The simplest way to handle it is just to transcode the text.

CCQc posted on 2021-7-11 21:38

52 crawlers! Thanks for sharing, I'm learning from this. The forum is better with you in it.

Ercilan posted on 2021-7-11 23:02

lihu5841314 posted on 2021-7-11 20:01
The simplest way to handle it is just to transcode the text.

Actually, that doesn't count as encryption. It's just an encoding.
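
To illustrate the point, here is a small sketch showing that the \uXXXX text quoted in the script's comment is ordinary Unicode escaping that standard tools reverse without any key. The variable names are made up for the example, and nothing here talks to Baidu's servers:

```python
import json

# The escaped form quoted in the comment at the top of the script.
escaped = r"\u7f6a\u6076\u7a7f\u8d8a"

# Option 1: the same codec the script relies on.
via_codec = escaped.encode("ascii").decode("unicode_escape")

# Option 2: a JSON parser resolves \uXXXX escapes inside string literals by itself.
via_json = json.loads('"' + escaped + '"')

print(via_codec, via_json)
```

Both routes give back the readable title. One caveat: the unicode_escape codec treats its input bytes as Latin-1, so it can mangle text that already contains non-ASCII characters, whereas a JSON parser decodes the escapes as part of normal parsing.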


吾爱破解 - 52pojie.cn's Archiver
