[Beginner] Scraping the Quality Software (精品软件) board on 52pojie (吾爱)

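niebaohua posted on 2018-11-19 12:26

Last edited by niebaohua on 2018-11-19 20:51

Python beginner here, I barely know anything......
I wrote this code by copying what I saw in a video tutorial.
Feel free to help me improve it.
This shouldn't break any rules, right?..

Throw me some free ratings, please :lol:lol:lol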

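import requests
from lxml import etree

# The header name is "User-Agent"; the whole browser string is its value
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
url = "https://www.52pojie.cn/"  # base URL; thread links on the board page are relative

number = int(input("Enter the number of pages to scrape: "))
for i in range(1, number + 1):
    next_url = "https://www.52pojie.cn/forum-16-%d.html" % i
    response = requests.get(next_url, headers=headers, timeout=10)
    # gb18030 is a superset of GBK and GB2312, so it also decodes pages served in those encodings
    html = response.content.decode("gb18030")
    text = etree.HTML(html)
    # Thread links on the board page carry class="s xst"
    word_href = text.xpath('//a[@class="s xst"]/@href')
    word_title = text.xpath('//a[@class="s xst"]/text()')
    # Next page: <a href="forum-16-2.html" class="nxt">. I first tried following this link directly, but it seemed to go wrong
    print("****" * 20 + "Page " + str(i) + "****" * 20)
    for href, title in zip(word_href, word_title):
        print(title + "-----" * 5 + (url + href))
    #last_url = url+next_page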
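The comment in the script mentions trying to follow the "next page" link (`class="nxt"`) directly. Below is a minimal sketch of one way to do that with lxml, assuming the board page keeps the structure quoted in that comment (thread links with `class="s xst"`, a next link with `class="nxt"`). `parse_listing`, `crawl` and `BASE_URL` are hypothetical names, not anything from the original post. One deliberate swap: instead of zipping two separate XPath result lists, it loops over the `<a>` elements themselves and reads each title with `string()`, so a title can never drift out of line with its own link.

```python
from urllib.parse import urljoin

from lxml import etree

BASE_URL = "https://www.52pojie.cn/"  # hypothetical constant, same base as in the script


def parse_listing(html, base_url=BASE_URL):
    """Parse one board page into ([(title, absolute_url), ...], next_page_url or None)."""
    tree = etree.HTML(html)
    threads = []
    # Iterate over the <a> elements so each title stays paired with its own href
    for a in tree.xpath('//a[@class="s xst"]'):
        title = a.xpath("string()").strip()  # string() also collects text inside nested tags
        threads.append((title, urljoin(base_url, a.get("href"))))
    # Assumption: the next-page link looks like <a href="forum-16-2.html" class="nxt">
    nxt = tree.xpath('//a[@class="nxt"]/@href')
    next_url = urljoin(base_url, nxt[0]) if nxt else None
    return threads, next_url


def crawl(start_url, max_pages, fetch):
    """Follow next-page links for up to max_pages pages, yielding (title, url) pairs.

    fetch is any callable that takes a URL and returns that page's decoded HTML.
    """
    url = start_url
    for _ in range(max_pages):
        if url is None:  # no next-page link: the previous page was the last one
            break
        threads, url = parse_listing(fetch(url))
        yield from threads
```

For a real run, `fetch` could be something like `lambda u: requests.get(u, headers=headers, timeout=10).content.decode("gb18030")`, reusing a `headers` dict whose key is `"User-Agent"`. Passing `fetch` in as a parameter keeps the page-walking logic testable without network access.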

时空之外 posted on 2018-11-19 14:25

niebaohua posted on 2018-11-19 14:21
How should I fill it in? I just copied it straight from the web page. Please advise.

niebaohua posted on 2018-11-19 14:21

时空之外 posted on 2018-11-19 14:19
You got the headers wrong, bro.

How should I fill it in? I just copied it straight from the web page. Please advise.
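The headers mistake being pointed out is splitting the browser string across key and value, as in `{"Mozilla/5.0": "(Windows NT 10.0; ...)"}`: that makes `Mozilla/5.0` the header *name*, so no `User-Agent` header is set at all. A minimal sketch to see the difference, assuming the `requests` library used in the post: `requests.Request(...).prepare()` builds a request without sending it, so its headers can be inspected offline. `UA`, `wrong_headers` and `build_request` are hypothetical names used only for this illustration.

```python
import requests

UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36")

# Wrong: the browser name became the header name, the rest became its value
wrong_headers = {"Mozilla/5.0": UA[len("Mozilla/5.0 "):]}
# Right: "User-Agent" is the header name, the whole browser string is the value
headers = {"User-Agent": UA}


def build_request(url, hdrs):
    """Prepare (but do not send) a GET request so its headers can be inspected."""
    return requests.Request("GET", url, headers=hdrs).prepare()


page = "https://www.52pojie.cn/forum-16-1.html"
bad = build_request(page, wrong_headers)
good = build_request(page, headers)
print("User-Agent" in bad.headers)   # False: only a header literally named "Mozilla/5.0"
print(good.headers["User-Agent"])    # the full browser string
```

When the wrong dict goes through `requests.get`, the session's default `python-requests/...` User-Agent is sent instead, plus the stray `Mozilla/5.0` header, so the site does not see a browser.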

fake posted on 2018-11-19 12:44

Learned something from this, OP.

浩蛋 posted on 2018-11-19 12:46

Pretty slick.

niebaohua posted on 2018-11-19 12:55

浩蛋 posted on 2018-11-19 12:46
Pretty slick.

Thanks for the compliment, though honestly I don't really understand these modules either :lol

hjdx001 posted on 2018-11-19 13:01

Python is this powerful? Makes me want to learn it too.

淮左名都 posted on 2018-11-19 13:04

Bookmarked, will study this.

或许。 posted on 2018-11-19 13:06

Awesome, bro.

浮尘云烟 posted on 2018-11-19 13:10

Don't understand it, but bumping to support.

yjian415 posted on 2018-11-19 13:46

Nice, awesome stuff.

madson posted on 2018-11-19 14:17

I don't understand it either, but bumping for you.


吾爱破解 (52pojie) - 52pojie.cn's Archiver
