fanvalen posted on 2020-7-18 00:40

Another Qishu novel catalog scraper, with the results viewable as a spreadsheet

This post was last edited by fanvalen on 2020-7-18 20:58

# coding=utf-8
import re

import openpyxl
import requests

# The workbook must already exist and contain a sheet named "Sheet1".
book = openpyxl.load_workbook("d:\\qishu.xlsx")
sheet1 = book["Sheet1"]

hd={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

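# Download list pages 1..1233 and append the raw HTML to one text file.
for p in range(1, 1234):
    url = "http://m.iqishu.la/full/" + str(p) + ".html"

    try:
        dat = requests.get(url, headers=hd, timeout=60)
    except requests.RequestException:
        # Retry once; if the retry also fails, skip this page instead of crashing.
        try:
            dat = requests.get(url, headers=hd, timeout=60)
        except requests.RequestException:
            dat = None

    if dat is not None and dat.status_code == 200:
        # The with-block closes the file, so no explicit close() is needed.
        with open("d:\\qishu.txt", "a+", encoding="utf-8") as f:
            f.write(dat.text)

    print(p)  # progress: the page that was just processed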




# Read back everything that was downloaded.
with open("d:\\qishu.txt", encoding="utf-8") as f:
    s = f.read()

# Each matched entry yields a 4-tuple: (p1 text, link href, link text, p3 text).
pat = r'<div class="full_content"><p class="p1">(.*?)</p><p class="p2"> <a href="(.*?)" class="blue">(.*?)</a></p><p class="p3"><a>(.*?)</a></p></div>'
r = re.findall(pat, s)




# Append each match as a new row, one captured group per column.
for i in range(len(r)):
    x = r[i]
    print(x)
    row = sheet1.max_row + 1
    for b in range(len(x)):
        sheet1.cell(row, b + 1).value = x[b]




book.save("d:\\qishu.xlsx")
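
The download loop in the script retries a failed request only once. Below is a minimal sketch of a more general retry helper. The names `fetch_with_retry` and `flaky_fetch` are made up for illustration and are not part of the original script. The fetch function is passed in as a parameter so the logic can be tried without touching the network; in the real script it could be something like `lambda u: requests.get(u, headers=hd, timeout=60)`.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=0.0):
    """Call fetch(url) up to `attempts` times; return None if every attempt raises."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt + 1 < attempts and delay > 0:
                time.sleep(delay)
    return None


# Simulated fetch (hypothetical): fails twice, then succeeds.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("simulated failure")
    return "<html>ok</html>"


result = fetch_with_retry(flaky_fetch, "http://example.invalid/1.html")
print(result, len(calls))
```

Returning `None` once all attempts are used up lets the caller skip that page and carry on with the next one, so a single bad page does not stop the whole run.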

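
To see what the regex step produces before writing anything to Excel, here is a small sketch that runs the same pattern on a hand-made HTML fragment. The sample markup and its values are invented for the demonstration and are not real data from the site. The nested loop prints the row and column each value would be written to, assuming a sheet that already has one header row.

```python
import re

# Same structure as the pattern used in the script, split across lines for readability.
PATTERN = re.compile(
    r'<div class="full_content"><p class="p1">(.*?)</p>'
    r'<p class="p2"> <a href="(.*?)" class="blue">(.*?)</a></p>'
    r'<p class="p3"><a>(.*?)</a></p></div>'
)


def extract_rows(html):
    """Return one 4-tuple per entry: (p1 text, link href, link text, p3 text)."""
    return PATTERN.findall(html)


# Invented sample markup with two entries, only to exercise the pattern.
SAMPLE = (
    '<div class="full_content"><p class="p1">a1</p>'
    '<p class="p2"> <a href="/x/1.html" class="blue">a2</a></p>'
    '<p class="p3"><a>a3</a></p></div>'
    '<div class="full_content"><p class="p1">b1</p>'
    '<p class="p2"> <a href="/x/2.html" class="blue">b2</a></p>'
    '<p class="p3"><a>b3</a></p></div>'
)

rows = extract_rows(SAMPLE)

# Row 1 is assumed to be a header, so data starts at row 2; columns start at 1.
for row_number, fields in enumerate(rows, start=2):
    for column, value in enumerate(fields, start=1):
        print(row_number, column, value)
```

Each tuple becomes one spreadsheet row and each captured group one column, which is what indexing `r[i]` and `x[b]` does in the script.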

fanvalen posted on 2020-7-18 00:57

Download link for the catalog spreadsheet:
https://fanvalen.lanzouj.com/i04d8eon9di

春雨忆江南 posted on 2020-7-18 08:01

Thanks for sharing.