python下载番茄小说

xuyanhenry · 发表于 2024-7-28 19:38

本帖最后由 xuyanhenry 于 2024-9-16 19:16 编辑

新人第一次发帖，如有不对欢迎指出，大佬勿喷
由于番茄小说本来就是免费的所以拿来练手
需要输入番茄网页端对应网址后面一串的id。例：https://fanqienovel.com/page/6982529841564224526后面的6982529841564224526来进行下载。
可能因为过于频繁的访问导致短时间的访问的报错

引用:
1.由于番茄对于网页版的限制导致后面大部分章节无法全部观看因此参考油猴插件

https://greasyfork.org/zh-CN/scripts/486817-%E7%95%AA%E8%8C%84%E5%B0%8F%E8%AF%B4%E7%BD%91%E9%A1%B5%E7%89%88-%E5%85%8D%E8%B4%B9%E9%98%85%E8%AF%BB-%E6%94%AF%E6%8C%81%E8%A7%A3%E9%94%81vip

并对代码进行大部分的引用
2.由于番茄字体有加密，参考

https://blog.csdn.net/SM_zeng/article/details/140564469

此文章对字体进行解析，没有仔细检查可能有部分出错

源码：

<blockquote>import requests
import json
import re
from parsel import Selector
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, as_completed
from lxml import etree
from tqdm import tqdm
import time
dic_data = {
   "58344": "d", "58345": "在", "58346": "干", "58347": "特", "58348": "家", "58349": "军", "58350": "然",
   "58351": "表", "58352": "场", "58353": "4", "58354": "要", "58355": "只", "58356": "v", "58357": "和",
   "58359": "6", "58360": "别", "58361": "还", "58362": "g", "58363": "现", "58364": "儿", "58365": "岁",
   "58368": "此", "58369": "象", "58370": "月", "58371": "3", "58372": "出", "58373": "战", "58374": "工",
   "58375": "相", "58376": "。", "58377": "男", "58378": "直", "58379": "失", "58380": "世", "58381": "F",
   "58382": "都", "58383": "平", "58384": "文", "58385": "什", "58386": "V", "58387": "o", "58388": "将",
   "58389": "真", "58390": "工", "58391": "那", "58392": "当", "58394": "会", "58395": "立", "58396": "此",
   "58397": "山", "58398": "是", "58399": "十", "58400": "张", "58401": "学", "58402": "气", "58403": "大",
   "58404": "爱", "58405": "两", "58406": "命", "58407": "全", "58408": "后", "58409": "东", "58410": "性",
   "58411": "通", "58412": "被", "58413": "1", "58414": "它", "58415": "乐", "58416": "接", "58417": "而",
   "58418": "感", "58419": "车", "58420": "山", "58421": "公", "58422": "了", "58423": "常", "58424": "以",
   "58425": "何", "58426": "可", "58427": "话", "58428": "先", "58429": "p", "58430": "j", "58431": "叫",
   "58432": "轻", "58433": "m", "58434": "十", "58435": "以", "58436": "着", "58437": "变", "58438": "尔",
   "58439": "快", "58440": "上", "58441": "个", "58442": "说", "58443": "小", "58444": "色", "58445": "里",
   "58446": "安", "58447": "花", "58448": "远", "58449": "7", "58450": "难", "58451": "师", "58452": "放",
   "58453": "十", "58454": "报", "58455": "认", "58456": "亩", "58457": "道", "58458": "s", "58460": "克",
   "58461": "地", "58462": "度", "58463": "上", "58464": "好", "58465": "机", "58466": "u", "58467": "民",
   "58468": "写", "58469": "把", "58470": "万", "58471": "同", "58472": "水", "58473": "新", "58474": "没",
   "58475": "书", "58476": "申", "58477": "吃", "58478": "像", "58479": "斯", "58480": "5", "58481": "为",
   "58482": "v", "58483": "白", "58484": "几", "58485": "日", "58486": "教", "58487": "看", "58488": "但",
   "58489": "第", "58490": "加", "58491": "候", "58492": "作", "58493": "上", "58494": "拉", "58495": "住",
   "58496": "有", "58497": "法", "58498": "r", "58499": "事", "58500": "应", "58501": "位", "58502": "利",
   "58503": "你", "58504": "声", "58505": "身", "58506": "国", "58507": "问", "58508": "马", "58509": "女",
   "58510": "他", "58511": "y", "58512": "比", "58513": "父", "58514": "", "58515": "a", "58516": "h",
   "58517": "n", "58518": "c", "58519": "x", "58520": "边", "58521": "美", "58522": "对", "58523": "所",
   "58524": "金", "58525": "活", "58526": "回", "58527": "意", "58528": "到", "58529": "之", "58530": "从",
   "58531": "j", "58532": "知", "58533": "又", "58534": "内", "58535": "冈", "58536": "点", "58537": "o",
   "58538": "一", "58539": "定", "58540": "8", "58541": "r", "58542": "b", "58543": "正", "58544": "或",
   "58545": "夫", "58546": "向", "58547": "德", "58548": "听", "58549": "更", "58551": "得", "58552": "告",
   "58553": "并", "58554": "本", "58555": "q", "58556": "过", "58557": "记", "58558": "上", "58559": "让",
   "58560": "打", "58561": "f", "58562": "人", "58563": "就", "58564": "者", "58565": "去", "58566": "原",
   "58567": "满", "58568": "体", "58569": "做", "58570": "经", "58571": "k", "58572": "走", "58573": "如",
   "58574": "孩", "58575": "c", "58576": "g", "58577": "给", "58578": "使", "58579": "物", "58581": "最",
   "58582": "笑", "58583": "部", "58585": "员", "58586": "等", "58587": "受", "58588": "k", "58589": "行",
   "58590": "", "58591": "条", "58592": "果", "58593": "动", "58594": "光", "58595": "门", "58596": "头",
   "58597": "见", "58598": "往", "58599": "白", "58600": "解", "58601": "成", "58602": "外", "58603": "天",
   "58604": "能", "58605": "干", "58606": "名", "58607": "其", "58608": "发", "58609": "总", "58610": "母",
   "58611": "的", "58612": "死", "58613": "手", "58614": "入", "58615": "路", "58616": "进", "58617": "心",
   "58618": "来", "58619": "h", "58620": "时", "58621": "力", "58622": "多", "58623": "开", "58624": "已",
   "58625": "许", "58626": "d", "58627": "至", "58628": "由", "58629": "很", "58630": "界", "58631": "n",
   "58632": "小", "58633": "与", "58634": "之", "58635": "想", "58636": "代", "58637": "么", "58638": "分",
   "58639": "牛", "58640": "口", "58641": "再", "58642": "妈", "58643": "望", "58644": "次", "58645": "西",
   "58646": "风", "58647": "种", "58648": "带", "58649": "J", "58651": "实", "58652": "情", "58653": "才",
   "58654": "这", "58656": "F", "58657": "我", "58658": "神", "58659": "格", "58660": "长", "58661": "觉",
   "58662": "间", "58663": "年", "58664": "眼", "58665": "无", "58666": "不", "58667": "亲", "58668": "关",
   "58669": "结", "58670": "0", "58671": "友", "58672": "信", "58673": "下", "58674": "却", "58675": "重",
   "58676": "已", "58677": "老", "58678": "2", "58679": "音", "58680": "字", "58681": "m", "58682": "呢",
   "58683": "明", "58684": "之", "58685": "前", "58686": "高", "58687": "p", "58688": "b", "58689": "目",
   "58690": "太", "58691": "。", "58692": "9", "58693": "起", "58694": "穆", "58695": "她", "58696": "也",
   "58697": "w", "58698": "用", "58699": "方", "58700": "子", "58701": "英", "58702": "每", "58703": "理",
   "58704": "便", "58705": "四", "58706": "数", "58707": "期", "58708": "中", "58709": "c", "58710": "外",
   "58711": "样", "58712": "a", "58713": "海", "58714": "们", "58715": "任"
}


def get_text_from_xpath(book_id, css):
   url = f"https://fanqienovel.com/page/{book_id}"
   response = requests.get(url)
   if response.status_code == 200:
       selector = Selector(response.text)
       elements = selector.css(css).get()
       if elements:
           return elements
   return None

def get_content_from_fanqienovel(item_id, retries=5):
   url = f"https://fanqienovel.com/api/reader/full?itemId={item_id}"
   headers = {
       "Content-Type": "application/json",
       "Accept": "application/json, text/plain, */*",
       "Cookie": "novel_web_id=7357767624615331362;"
   }
   for attempt in range(retries):
       try:
           response = requests.get(url, headers=headers)
           if response.status_code == 200:
               return response.json().get('data', {}).get('chapterData', {}).get('content', '无法解析')
       except requests.exceptions.RequestException as e:
           print(f"Attempt {attempt + 1} failed: {e}")
           time.sleep(2)  # 等待2秒后重试
   return "无法解析"

def get_content_from_dushuge(book_name, chapter_name):
   search_url = f"http://www.dushuge.com/hsdgiohsdigohsog.php?ie=gbk&q={book_name}"
   response = requests.get(search_url)
   if response.status_code != 200:
       return "无法解析"
   
   soup = BeautifulSoup(response.content, 'html.parser')
   book_href = ""
   for a in soup.find_all('a', href=True):
       if book_name in a.text:
           book_href = a['href']
           break

   if not book_href:
       return "无法解析"
   
   book_url = f"http://www.dushuge.com{book_href}"
   response = requests.get(book_url)
   if response.status_code != 200:
       return "无法解析"
   
   soup = BeautifulSoup(response.content, 'html.parser')
   chapter_href = ""
   for dd in soup.find_all('dd'):
       a = dd.find('a')
       if a and chapter_name in a.text:
           chapter_href = a['href']
           break
   
   if not chapter_href:
       return "无法解析"
   
   chapter_url = f"http://www.dushuge.com{chapter_href}"
   response = requests.get(chapter_url)
   if response.status_code != 200:
       return "无法解析"

   soup = BeautifulSoup(response.content, 'html.parser')
   content_div = soup.find('div', id='content')
   if content_div:
       return content_div.get_text(separator="\n").replace(" ", "").strip()
   
   return "无法解析"

def remove_p_tags(content):
   content = re.sub(r'<p>', '', content)
   content = re.sub(r'</p>', '\n\n', content)
   return content

def fetch_content(item):
   item_id = item['ID']
   chapter_name = item['title']
   content = get_content_from_fanqienovel(item_id)
   if content == "无法解析":
       content = get_content_from_dushuge(book_name, chapter_name)
   
   if content != "无法解析":
       content = remove_p_tags(content)
       processed_content = f"{chapter_name}\n\n"
       for index in content:
           try:
               word = dic_data[str(ord(index))]
               processed_content += word
           except:
               processed_content += index
       processed_content += '\n\n'
       return item, processed_content
   else:
       return item, f"Failed to fetch content for {chapter_name}.\n"

def download_novels(item_id_list, book_name, output_file):
   total_chapters = len(item_id_list)
   with open(output_file, 'w', encoding='utf-8') as file, tqdm(total=total_chapters, desc="Progress", unit="chapter") as pbar:
       all_contents = []
       with ThreadPoolExecutor(max_workers=5) as executor:                             #可以更改线程数
           futures = {executor.submit(fetch_content, item): item for item in item_id_list}
           for future in as_completed(futures):
               item, chapter_content = future.result()
               index = item_id_list.index(item)
               all_contents.append((index, chapter_content))
               pbar.update(1)
               
               if len(all_contents) >= 20:                                        # 每20章保存一次
                   all_contents.sort(key=lambda x: x[0])
                   for _, content in all_contents:
                       file.write(content)
                   all_contents.clear()

       # 保存剩余内容
       if all_contents:
           all_contents.sort(key=lambda x: x[0])
           for _, content in all_contents:
               file.write(content)

def get_chapter_list(book_id):
   url = f"https://fanqienovel.com/api/reader/directory/detail?bookId={book_id}"
   response = requests.get(url)
   if response.status_code == 200:
       data = response.json()
       chapter_list_with_volume = data.get('data', {}).get('chapterListWithVolume', [])
       
       item_id_list = []
       for volume in chapter_list_with_volume:
           for chapter in volume:
               item_id = chapter.get('itemId')
               title = chapter.get('title')
               if item_id and title:
                   item_id_list.append({"ID": item_id, "title": title})
       
       return item_id_list
   else:
       print("Failed to fetch data")
       return []

# Main execution
while True:
   book_id = input("Please enter the book ID: ")
   css = '.page-header-info .info-name h1::text'
   item_id_list = get_chapter_list(book_id)
   text_content = get_text_from_xpath(book_id, css)
   if text_content:
       book_name = text_content.strip()
       print("book name:",book_name)
   else:
       book_name = "Unknown_Book"

   output_file = f"E:/爬虫/{book_name}.txt"                        #要改

   download_novels(item_id_list, book_name, output_file)
   print(f"All chapters saved to {output_file}")

</blockquote>

这代码的爬取部分由于是参考油猴脚本的按其逻辑不是从番茄的的数据库拿到的数据，还是通过别的网站的资源。

最近新看到大佬的程序

https://www.52pojie.cn/thread-1950372-1-1.html

发现cookie中改变novel_web_id来直接得到所有内容。估计番茄是通过判断novel_web_id来发送数据的，因此增加了部分代码通过爆破的方式来获得成功的id（这样就可以保证只要番茄这部分判断代码不变就能够成功获取资源）换掉了原来通过别的源的方式来获取资源

<blockquote>import requests
import json
import re
from parsel import Selector
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, as_completed
from lxml import etree
from tqdm import tqdm
import time


id = 7357767624615331361
dic_data = {
    "58344": "d", "58345": "在", "58346": "干", "58347": "特", "58348": "家", "58349": "军", "58350": "然",
    "58351": "表", "58352": "场", "58353": "4", "58354": "要", "58355": "只", "58356": "v", "58357": "和",
    "58359": "6", "58360": "别", "58361": "还", "58362": "g", "58363": "现", "58364": "儿", "58365": "岁",
    "58368": "此", "58369": "象", "58370": "月", "58371": "3", "58372": "出", "58373": "战", "58374": "工",
    "58375": "相", "58376": "。", "58377": "男", "58378": "直", "58379": "失", "58380": "世", "58381": "F",
    "58382": "都", "58383": "平", "58384": "文", "58385": "什", "58386": "V", "58387": "o", "58388": "将",
    "58389": "真", "58390": "工", "58391": "那", "58392": "当", "58394": "会", "58395": "立", "58396": "此",
    "58397": "山", "58398": "是", "58399": "十", "58400": "张", "58401": "学", "58402": "气", "58403": "大",
    "58404": "爱", "58405": "两", "58406": "命", "58407": "全", "58408": "后", "58409": "东", "58410": "性",
    "58411": "通", "58412": "被", "58413": "1", "58414": "它", "58415": "乐", "58416": "接", "58417": "而",
    "58418": "感", "58419": "车", "58420": "山", "58421": "公", "58422": "了", "58423": "常", "58424": "以",
    "58425": "何", "58426": "可", "58427": "话", "58428": "先", "58429": "p", "58430": "j", "58431": "叫",
    "58432": "轻", "58433": "m", "58434": "十", "58435": "以", "58436": "着", "58437": "变", "58438": "尔",
    "58439": "快", "58440": "上", "58441": "个", "58442": "说", "58443": "小", "58444": "色", "58445": "里",
    "58446": "安", "58447": "花", "58448": "远", "58449": "7", "58450": "难", "58451": "师", "58452": "放",
    "58453": "十", "58454": "报", "58455": "认", "58456": "亩", "58457": "道", "58458": "s", "58460": "克",
    "58461": "地", "58462": "度", "58463": "上", "58464": "好", "58465": "机", "58466": "u", "58467": "民",
    "58468": "写", "58469": "把", "58470": "万", "58471": "同", "58472": "水", "58473": "新", "58474": "没",
    "58475": "书", "58476": "申", "58477": "吃", "58478": "像", "58479": "斯", "58480": "5", "58481": "为",
    "58482": "v", "58483": "白", "58484": "几", "58485": "日", "58486": "教", "58487": "看", "58488": "但",
    "58489": "第", "58490": "加", "58491": "候", "58492": "作", "58493": "上", "58494": "拉", "58495": "住",
    "58496": "有", "58497": "法", "58498": "r", "58499": "事", "58500": "应", "58501": "位", "58502": "利",
    "58503": "你", "58504": "声", "58505": "身", "58506": "国", "58507": "问", "58508": "马", "58509": "女",
    "58510": "他", "58511": "y", "58512": "比", "58513": "父", "58514": "", "58515": "a", "58516": "h",
    "58517": "n", "58518": "c", "58519": "x", "58520": "边", "58521": "美", "58522": "对", "58523": "所",
    "58524": "金", "58525": "活", "58526": "回", "58527": "意", "58528": "到", "58529": "之", "58530": "从",
    "58531": "j", "58532": "知", "58533": "又", "58534": "内", "58535": "冈", "58536": "点", "58537": "o",
    "58538": "一", "58539": "定", "58540": "8", "58541": "r", "58542": "b", "58543": "正", "58544": "或",
    "58545": "夫", "58546": "向", "58547": "德", "58548": "听", "58549": "更", "58551": "得", "58552": "告",
    "58553": "并", "58554": "本", "58555": "q", "58556": "过", "58557": "记", "58558": "上", "58559": "让",
    "58560": "打", "58561": "f", "58562": "人", "58563": "就", "58564": "者", "58565": "去", "58566": "原",
    "58567": "满", "58568": "体", "58569": "做", "58570": "经", "58571": "k", "58572": "走", "58573": "如",
    "58574": "孩", "58575": "c", "58576": "g", "58577": "给", "58578": "使", "58579": "物", "58581": "最",
    "58582": "笑", "58583": "部", "58585": "员", "58586": "等", "58587": "受", "58588": "k", "58589": "行",
    "58590": "", "58591": "条", "58592": "果", "58593": "动", "58594": "光", "58595": "门", "58596": "头",
    "58597": "见", "58598": "往", "58599": "白", "58600": "解", "58601": "成", "58602": "外", "58603": "天",
    "58604": "能", "58605": "干", "58606": "名", "58607": "其", "58608": "发", "58609": "总", "58610": "母",
    "58611": "的", "58612": "死", "58613": "手", "58614": "入", "58615": "路", "58616": "进", "58617": "心",
    "58618": "来", "58619": "h", "58620": "时", "58621": "力", "58622": "多", "58623": "开", "58624": "已",
    "58625": "许", "58626": "d", "58627": "至", "58628": "由", "58629": "很", "58630": "界", "58631": "n",
    "58632": "小", "58633": "与", "58634": "之", "58635": "想", "58636": "代", "58637": "么", "58638": "分",
    "58639": "牛", "58640": "口", "58641": "再", "58642": "妈", "58643": "望", "58644": "次", "58645": "西",
    "58646": "风", "58647": "种", "58648": "带", "58649": "J", "58651": "实", "58652": "情", "58653": "才",
    "58654": "这", "58656": "F", "58657": "我", "58658": "神", "58659": "格", "58660": "长", "58661": "觉",
    "58662": "间", "58663": "年", "58664": "眼", "58665": "无", "58666": "不", "58667": "亲", "58668": "关",
    "58669": "结", "58670": "0", "58671": "友", "58672": "信", "58673": "下", "58674": "却", "58675": "重",
    "58676": "已", "58677": "老", "58678": "2", "58679": "音", "58680": "字", "58681": "m", "58682": "呢",
    "58683": "明", "58684": "之", "58685": "前", "58686": "高", "58687": "p", "58688": "b", "58689": "目",
    "58690": "太", "58691": "。", "58692": "9", "58693": "起", "58694": "穆", "58695": "她", "58696": "也",
    "58697": "w", "58698": "用", "58699": "方", "58700": "子", "58701": "英", "58702": "每", "58703": "理",
    "58704": "便", "58705": "四", "58706": "数", "58707": "期", "58708": "中", "58709": "c", "58710": "外",
    "58711": "样", "58712": "a", "58713": "海", "58714": "们", "58715": "任"
}




def get_text_from_xpath(book_id, css):
    url = f"https://fanqienovel.com/page/{book_id}"
    response = requests.get(url)
    if response.status_code == 200:
        selector = Selector(response.text)
        elements = selector.css(css).get()         #书名
        if elements:
            print("elements",elements)
            return elements
    return None


def change_userid():
    global id
    testid = 7101675619141976584
    useid=id
    while True:
        test_headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE',
            'cookie' : f'novel_web_id={useid}'
        }
        try:
            # 发送请求并处理数据
            res = requests.get(f'https://fanqienovel.com/api/reader/full?itemId={testid}', headers=test_headers)
            data = json.loads(res.text)['data']
            content = data['chapterData']['content']
            # print(content)
            if len(content) > 1000:
                print("成功的id", useid)
                id=useid
                return
        except Exception as e:
            print(f"请求失败，ID={useid}, 错误: {e}")
            time.sleep(1)  # 等待1秒再重试
            continue
        
        useid += 1  


def get_context_from_fanqie(item_id,retries=5):


    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE',
        'cookie':f'novel_web_id={id}'
    }
    try:
        # 发送请求并处理数据
        


        while True:
            response = requests.get(f'https://fanqienovel.com/api/reader/full?itemId={item_id}', headers=headers)
            data = json.loads(response.text)['data']
            if 'chapterData' in data:
                break
            elif retries>0:
                retries=retries-1
                time.sleep(1)
            elif retries<=0:
                break


        content = data['chapterData']['content']
        return content


    except Exception as e:
            return "无法解析"


def remove_p_tags(content):
    content = re.sub(r'<p>', '', content)
    content = re.sub(r'</p>', '\n\n', content)
    return content


def fetch_content(item):
    item_id = item['ID']
    chapter_name = item['title']
    content=get_context_from_fanqie(item_id)


    if content != "无法解析":
        content = remove_p_tags(content)
        processed_content = f"{chapter_name}\n\n"
        for index in content:
            try:
                word = dic_data[str(ord(index))]
                processed_content += word
            except:
                processed_content += index
        processed_content += '\n\n'
        return item, processed_content
    else:
        return item, f"Failed to fetch content for {chapter_name}.\n"


def download_novels(item_id_list, book_name, output_file):
    total_chapters = len(item_id_list)
    with open(output_file, 'w', encoding='utf-8') as file, tqdm(total=total_chapters, desc="Progress", unit="chapter") as pbar:
        all_contents = []
        with ThreadPoolExecutor(max_workers=5) as executor:                             #可以更改线程数
            futures = {executor.submit(fetch_content, item): item for item in item_id_list}
            for future in as_completed(futures):
                item, chapter_content = future.result()
                index = item_id_list.index(item)
                all_contents.append((index, chapter_content))
                pbar.update(1)
                
                if len(all_contents) >= 20:                                        # 每20章保存一次
                    all_contents.sort(key=lambda x: x[0])
                    for _, content in all_contents:
                        file.write(content)
                    all_contents.clear()


        # 保存剩余内容
        if all_contents:
            all_contents.sort(key=lambda x: x[0])
            for _, content in all_contents:
                file.write(content)


def get_chapter_list(book_id):
    url = f"https://fanqienovel.com/api/reader/directory/detail?bookId={book_id}"
    response = requests.get(url)
    if response.status_code == 200:
        data = response.json()
        chapter_list_with_volume = data.get('data', {}).get('chapterListWithVolume', [])
        
        item_id_list = []
        for volume in chapter_list_with_volume:
            for chapter in volume:
                item_id = chapter.get('itemId')
                title = chapter.get('title')
                if item_id and title:
                    item_id_list.append({"ID": item_id, "title": title})
        
        return item_id_list
    else:
        print("Failed to fetch data")
        return []
    


    








# Main execution
while True:
    change_userid()
    book_id = input("Please enter the book ID: ")
    css = '.page-header-info .info-name h1::text'
    item_id_list = get_chapter_list(book_id)
    text_content = get_text_from_xpath(book_id, css)
    if text_content:
        book_name = text_content.strip()
        print("book name:",book_name)
    else:
        book_name = "Unknown_Book"


    output_file = f"E:/爬虫/{book_name}.txt"                        #要改


    download_novels(item_id_list, book_name, output_file)
    print(f"All chapters saved to {output_file}")</blockquote>

注：次逻辑暂时已经不能用了，若要下载请看本人的新文章，暂还处于维护状态

rufan321 · 发表于 2024-7-28 20:18

很想下载7猫，番茄等小说，但win7不支持

xuyanhenry · 发表于 2024-8-6 11:49

sailor22 发表于 2024-8-4 11:03
现在各种加密乱序乱码

应该是部分书籍的网页代码形式稍微有些变化，把对标签的移除替换部分修改增加一下就好了

[Python] 纯文本查看 复制代码

def remove_p_tags(content):
    content=content.replace('\r', '')
    content = re.sub(r'^(.*?)<p', '<p', content, flags=re.DOTALL)
    content = re.sub(r'<p[^>]*>', '<p>', content)
    content = re.sub(r'<p>', '', content)
    content = re.sub(r'</p>', '\n\n', content)
    content = re.sub(r'<[^>]+>', '', content)
    return content

bucaiGitHub · 发表于 2024-7-28 20:25

代码的话可以使用【代码】功能，会更好一些

Arcticlyc · 发表于 2024-7-28 20:32

使用论坛插入代码功能显示代码吧

xuyanhenry · 发表于 2024-7-28 20:37

rufan321 发表于 2024-7-28 20:18
很想下载7猫，番茄等小说，但win7不支持

七猫有油猴插件专门下载的https://greasyfork.org/zh-CN/scripts/479460-%E4%B8%83%E7%8C%AB%E5%85%A8%E6%96%87%E5%9C%A8%E7%BA%BF%E5%85%8D%E8%B4%B9%E8%AF%BB

qianjue · 发表于 2024-7-28 22:18

最近在看土狗小说很有用，学习了。

amwquhwqas128 · 发表于 2024-7-28 23:01

楼主强，学习了你的代码让我受益很多

liyunchao · 发表于 2024-7-28 23:08

牛哇6666666

fuwo · 发表于 2024-7-29 00:09

很有用，学习了！谢谢分享！

O2H2O · 发表于 2024-7-29 11:23

俺需要静心消化一下，谢谢楼主分享！

帐号		自动登录	找回密码
密码			注册[Register]

[Python 原创] python下载番茄小说

免费评分

本帖被以下淘专辑推荐:

免费评分