carole1102 posted on 2019-11-30 22:48

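Working overtime all night and bored, so I scraped a novel to read

Heard that Doupo Cangqiong (斗破苍穹, "Battle Through the Heavens") is "terrifying as such" (恐怖如斯), so I scraped it to take a look.....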

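import re
import time

import requests

hds = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'
}


def get_txt(url, f):
    # Download one chapter page; give up on it after 10 s or on any network error
    try:
        res = requests.get(url, headers=hds, timeout=10)
    except requests.RequestException:
        return
    # Skip the chapter if the server doesn't answer with 200
    if res.status_code != 200:
        return
    # Pull out every <p>...</p> paragraph; re.S lets a paragraph span several lines
    contents = re.findall('<p>(.*?)</p>', res.content.decode('utf-8'), re.S)
    for content in contents:
        f.write(content + '\n')


if __name__ == '__main__':
    # Request chapter pages 2 through 1646
    urls = ['http://www.doupoxs.com/doupocangqiong/{}.html'.format(i) for i in range(2, 1647)]
    # 'a+' appends, so running the script again adds to the existing file
    with open(r'e:\book.txt', 'a+', encoding='utf-8') as f:
        for url in urls:
            get_txt(url, f)
            time.sleep(1)  # wait 1 s between requests to go easy on the server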
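One thing to watch: any chapter that fails or doesn't return 200 is just skipped, so book.txt can end up with silent gaps. Below is a minimal sketch, assuming you want to keep the same `<p>` regex but also decode HTML entities like `&nbsp;` and collect the failed URLs for a second pass. `extract_paragraphs`, `scrape`, and the injected `fetch`/`write` callables are hypothetical names, not part of the original script; passing `fetch` in means the parsing can be tested without hitting the site.

```python
import html
import re
import time
from typing import Callable, Iterable, List, Optional

# Same pattern as the script above: non-greedy match of each <p>...</p>,
# with re.S so one paragraph may span several lines of HTML source.
P_TAG = re.compile(r'<p>(.*?)</p>', re.S)


def extract_paragraphs(page: str) -> List[str]:
    """Return the text of every <p> element, with HTML entities decoded and
    surrounding whitespace (including decoded &nbsp;) stripped."""
    return [html.unescape(p).strip() for p in P_TAG.findall(page)]


def scrape(urls: Iterable[str],
           fetch: Callable[[str], Optional[str]],
           write: Callable[[str], None],
           delay: float = 1.0) -> List[str]:
    """Fetch each URL in order, write its paragraphs, return the URLs that failed.

    `fetch` (hypothetical) returns the page HTML, or None if the request failed.
    """
    failed = []
    for url in urls:
        page = fetch(url)
        if page is None:
            failed.append(url)  # remember the gap instead of dropping it silently
        else:
            for para in extract_paragraphs(page):
                write(para + '\n')
        time.sleep(delay)  # throttle every request
    return failed
```

With requests, `fetch` could be a small wrapper that calls `requests.get(url, headers=hds, timeout=10)` and returns `res.content.decode('utf-8')` on status 200 (None otherwise), and `write` could be `f.write`. Feeding the returned list back into `scrape` gives you a retry pass, though the retried chapters would land at the end of the file rather than in order.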

autist posted on 2019-11-30 23:48

A newbie who started learning just two days ago, watching wide-eyed

zxshouxian posted on 2019-12-1 17:52

Nice, impressive