Getting started with Python: scraping novels from 精彩阅读网 - 1 - 吾爱破解 - 52pojie.cn

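HaNnnng posted on 2018-11-11 23:27

Getting started with Python: scraping novels from 精彩阅读网 --- 1

Last edited by HaNnnng on 2018-11-14 00:51

This is my first post, a record of how I'm learning to write web scrapers.

It's a very simple example that scrapes a novel from 精彩阅读网 (jingcaiyuedu.com). To scrape a different novel, manually change the URL on line 9 of the script.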

This is a procedural scraper; tomorrow I'll rework it into an object-oriented version.

# -*- coding: utf-8 -*-
__author__ = 'lwh'
__date__ = '2018/11/10 15:12'

import requests
import re

# Fetch the book's index page
url = 'http://www.jingcaiyuedu.com/book/317834.html'
response = requests.get(url)
response.encoding = 'utf-8'
html = response.text
# Get the novel's title (re.findall returns a list, so take the first match)
title = re.findall(r'<meta property="og:novel:book_name" content="(.*?)"/>', html)[0]

# Get the chapter data: each chapter's URL and name
dl = re.findall(r'<dl class="panel-body panel-chapterlist"><dd class="col-md-3">.*?</dl>', html, re.S)[0]
chapter_info_list = re.findall(r'href="(.*?)">(.*?)<', dl)

# Open the output file; the with block closes it once the loop finishes
with open('%s.txt' % title, "w", encoding='utf-8') as f:
    # Download each chapter in turn
    for chapter_url, chapter_title in chapter_info_list:
        chapter_url = 'http://www.jingcaiyuedu.com%s' % chapter_url
        response = requests.get(chapter_url)
        response.encoding = 'utf-8'
        html = response.text
        # Extract the chapter body
        chapter_content = re.findall(r'<div class="panel-body" id="htmlContent">(.*?)</div>', html, re.S)[0]
        # Strip leftover HTML tags and spaces
        chapter_content = chapter_content.replace('<br />', '')
        chapter_content = chapter_content.replace('<br>', '')
        chapter_content = chapter_content.replace('<p>', '')
        chapter_content = chapter_content.replace('</p>', '')
        chapter_content = chapter_content.replace(' ', '')

        f.write(chapter_title)
        f.write('\n')
        f.write(chapter_content)
        f.write('\n\n\n\n\n')
        print(chapter_url)
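
The extraction steps in the script can also be tried offline, without hitting the site. The following is only a rough sketch: the function names (`parse_chapters`, `extract_content`) and the sample HTML are made up for illustration, and the markup is assumed to look like what the regular expressions in the script expect. Unlike indexing a `re.findall` result, these helpers return an empty result instead of raising when the page layout does not match.

```python
import re

# Assumed markup, modelled on the patterns used in the script above.
CHAPTER_LIST_RE = re.compile(r'<dl class="panel-body panel-chapterlist">.*?</dl>', re.S)
CHAPTER_LINK_RE = re.compile(r'href="(.*?)">(.*?)<')
CONTENT_RE = re.compile(r'<div class="panel-body" id="htmlContent">(.*?)</div>', re.S)
TAG_RE = re.compile(r'</?(?:br|p)\s*/?>')  # matches <br>, <br/>, <br />, <p>, </p>


def parse_chapters(index_html):
    """Return a list of (relative_url, chapter_title) tuples, or [] if no chapter list is found."""
    block = CHAPTER_LIST_RE.search(index_html)
    if block is None:
        return []
    return CHAPTER_LINK_RE.findall(block.group(0))


def extract_content(chapter_html):
    """Return the chapter text with <br> and <p> tags removed, or '' if the content div is missing."""
    match = CONTENT_RE.search(chapter_html)
    if match is None:
        return ''
    return TAG_RE.sub('', match.group(1)).strip()


# Hypothetical sample pages, invented for illustration only.
sample_index = (
    '<dl class="panel-body panel-chapterlist">'
    '<dd class="col-md-3"><a href="/book/1/1.html">Chapter 1</a></dd>'
    '<dd class="col-md-3"><a href="/book/1/2.html">Chapter 2</a></dd>'
    '</dl>'
)
sample_chapter = '<div class="panel-body" id="htmlContent"><p>Hello<br />world</p></div>'

chapters = parse_chapters(sample_index)
content = extract_content(sample_chapter)
print(chapters)
print(content)
```

Keeping the parsing in small functions like these makes it easy to check the regular expressions against a saved copy of a page before running the full download loop.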

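ofo posted on 2018-11-12 00:00

You must be new here. Your code contains an Alipay red-envelope code. Hide it quickly, or you'll have registered your account for nothing.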

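HaNnnng posted on 2018-11-12 00:57

gxl1208 posted on 2018-11-12 00:42
I'd like to learn programming now. I have no background at all and want to find out more about this area.

You could start by picking one language to learn. I'd recommend Python: it's easy to pick up, and once you have the basics down you can quickly put it to use for all sorts of things.
I only started this summer myself, and have been learning on and off for two or three months.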

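HaNnnng posted on 2018-11-12 00:07

ofo posted on 2018-11-12 00:00
You must be new here. Your code contains an Alipay red-envelope code. Hide it quickly, or you'll have registered your account for nothing.

Oh, I hadn't noticed. That text is something 精彩阅读网 inserts into its pages; I only wanted to strip it out of the scraped content.
Thanks for the heads-up.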
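
For what it's worth, stripping site-inserted promotional text can be done with a single regular-expression substitution. This is only a sketch: the actual promotional text is not shown in this thread, so the `[PROMO]...[/PROMO]` marker, the `PROMO_RE` name, and the `strip_promo` helper below are all invented placeholders.

```python
import re

# Placeholder pattern: the real promotional text is not shown in the thread,
# so this marker is invented purely for illustration.
PROMO_RE = re.compile(r'\[PROMO\].*?\[/PROMO\]', re.S)


def strip_promo(text, pattern=PROMO_RE):
    """Remove every match of `pattern` from `text`."""
    return pattern.sub('', text)


cleaned = strip_promo('Chapter text.[PROMO]site advertisement[/PROMO] More text.')
print(cleaned)
```

Matching the surrounding structure with a pattern, rather than calling `str.replace` on the exact promotional string, also means the promotional string itself never has to appear in the posted code.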

unkownc posted on 2018-11-12 00:09

Thanks for sharing, OP.

dy18 posted on 2018-11-12 00:15

Here to learn.

gxl1208 posted on 2018-11-12 00:42

I'd like to learn programming now. I have no background at all and want to find out more about this area.

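nanguoxiansheng posted on 2018-11-12 00:55

ofo posted on 2018-11-12 00:00
You must be new here. Your code contains an Alipay red-envelope code. Hide it quickly, or you'll have registered your account for nothing.

Now that's a sharp reply!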

wkzwxs posted on 2018-11-12 01:01

Bookmarking this for now; I'll need it later.

adq135158 posted on 2018-11-12 08:24

Sure enough, the experts see the craft while the rest of us just watch the show.
