Crawler: Fetching "The Overnight Global News You Need to Know" and Sending It to WeChat
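Last edited by txq0211 on 2023-3-15 11:14

I'm a hobbyist and haven't written a crawler in a long time. The world economy has been unpredictable lately, and when I came across a column called "The Overnight Global News You Need to Know" (你需要知道的隔夜全球要闻), I couldn't resist writing a crawler to fetch it so I can read it every day.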
Step 1: Find a general-purpose API endpoint
Step 2: Test the endpoint
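While testing the endpoint, it helps to check the shape of the JSON before relying on it. The code in this post reads `data.telegram.data` and then `descr` (the article HTML) and `time` (a Unix timestamp). The sketch below assumes `telegram.data` is a list of search hits and takes the first one; `latest_item` and the sample payload are hypothetical, made up for illustration, and it fails with a readable message instead of a bare `KeyError`:

```python
import json


def latest_item(payload: dict) -> tuple:
    """Return (descr, time) of the first search hit.

    Assumed shape: data -> telegram -> data -> [ {"descr": ..., "time": ...}, ... ]
    """
    try:
        items = payload["data"]["telegram"]["data"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {exc!r}") from exc
    if not items:
        raise ValueError("no search results returned")
    first = items[0]
    return first["descr"], first["time"]


# Hypothetical sample standing in for response.text
sample = json.loads(
    '{"data": {"telegram": {"data": [{"descr": "<p>1、A。2、B。</p>", "time": 1678838400}]}}}'
)
descr, ts = latest_item(sample)
print(descr, ts)
```

Printing the raw `response.text` once (or `json.dumps(..., ensure_ascii=False, indent=2)`) is usually enough to confirm which keys are really there.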
Step 3: Write the code and test it repeatedly
import requests
import json
import re
import time
from lxml import etree

if __name__ == '__main__':
    url = 'https://www.cls.cn/api/sw?app=CailianpressWeb&os=web&sv=7.7.5&sign=bf0f367462d8cd70917ba5eab3853bce'
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0"}
    # Search the telegram feed for the column by its title
    data = {"type": "telegram", "keyword": "你需要知道的隔夜全球要闻", "page": 0, "rn": 20,
            "os": "web", "sv": "7.7.5", "app": "CailianpressWeb"}
    response = requests.post(url=url, headers=headers, data=data)
    # data -> telegram -> data is assumed to be a list of hits; [0] takes the first one
    item = json.loads(response.text)['data']['telegram']['data'][0]
    news = item['descr']        # HTML body of the article
    timeStamp = item['time']    # Unix timestamp
    timeArray = time.localtime(timeStamp)
    formatTime = time.strftime("%Y年%m月%d日", timeArray)
    # Split on "1、", "2、", ...; news[0] is the text before the first item (the headline)
    news = re.split(r'\d+、', news)
    title = ''.join(etree.HTML(news[0]).xpath('//text()'))
    print(formatTime, title)
    # The split removed the numbers, so add them back by position
    for i in range(1, len(news)):
        new = '%s、%s' % (i, news[i])
        print(new)
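Because `re.split(r'\d+、', ...)` throws away the numbers it matches, element 0 of the result is the text before "1、" (used as the title) and the items have to be renumbered by position. A minimal offline sketch of that idea, on a made-up sample string (no request, no HTML):

```python
import re

text = "隔夜要闻1、美股收涨。2、油价下跌。3、金价上涨。"

# Splitting on "number + 、" removes the numbers themselves
parts = re.split(r"\d+、", text)
headline, items = parts[0], parts[1:]

print(headline)
# Re-add the numbers by position, starting from 1
for i, item in enumerate(items, start=1):
    print(f"{i}、{item}")
```

One caveat of this approach: any other number immediately followed by 、 inside an item would also be treated as a split point.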
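Step 4: Iterate and improve (revised 2023-3-14)
My first version had a problem: I wanted a regex substitution to insert a line break before the number in each "。number、" sequence, but in the output the numbers had been replaced by a literal `\d+`. Thanks to @Arcticlyc for pointing this out and improving the regex substitution.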
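The pitfall above comes from how `re.sub` treats its replacement string: only escapes such as `\n` and group references such as `\1` or `\g<0>` are special there, and `\d+` is not a pattern in that position (since Python 3.7 an unknown escape like `\d` in the replacement raises `re.error`). Capturing the number in a group and referring back to it keeps the original digits. A small comparison on a made-up sample string:

```python
import re

text = "隔夜要闻1、美股收涨。2、油价下跌。"

# Capture "number + 、" as group 1 and put it back after a newline
fixed = re.sub(r"(\d+、)", r"\n\1", text)
print(fixed)

# Same result without a group: \g<0> stands for the whole match
also_fixed = re.sub(r"\d+、", r"\n\g<0>", text)

# \d is not special in a replacement string; Python 3.7+ rejects it with re.error
try:
    re.sub(r"\d+、", r"\n\d+、", text)
except re.error as exc:
    print("re.error:", exc)
```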
If you spot anything lacking, feedback is welcome.
import requests
import json
import re
import time
from lxml import etree

if __name__ == '__main__':
    url = 'https://www.cls.cn/api/sw?app=CailianpressWeb&os=web&sv=7.7.5&sign=bf0f367462d8cd70917ba5eab3853bce'
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0"}
    data = {"type": "telegram", "keyword": "你需要知道的隔夜全球要闻", "page": 0, "rn": 20,
            "os": "web", "sv": "7.7.5", "app": "CailianpressWeb"}
    response = requests.post(url=url, headers=headers, data=data)
    # First search hit (assumes data -> telegram -> data is a list)
    item = json.loads(response.text)['data']['telegram']['data'][0]
    news = item['descr']
    timeStamp = item['time']
    timeArray = time.localtime(timeStamp)
    formatTime = time.strftime("%Y年%m月%d日", timeArray)
    # Insert a line break before every "number、"; \1 re-inserts the matched number unchanged
    news = re.sub(r'(\d+、)', r'\n\1', news)
    # Strip the HTML tags and keep only the text
    formatNews = ''.join(etree.HTML(news).xpath('//text()'))
    print(formatTime, formatNews)
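The `''.join(etree.HTML(news).xpath('//text()'))` line strips the HTML tags from `descr` and keeps only the text nodes. If you would rather not depend on lxml, here is a rough stdlib stand-in built on `html.parser` (a swapped-in technique, not what the post uses); `TextExtractor` and `html_to_text` are hypothetical names and the sample string is made up:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect all text nodes, roughly like xpath('//text()') in lxml."""

    def __init__(self) -> None:
        # convert_charrefs=True is the default, so entities like &amp; are decoded
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


sample = "<p>隔夜要闻</p><p>\n1、<strong>美股</strong>收涨&amp;回暖。</p>"
print(html_to_text(sample))
```

`html.parser` is more lenient and less featureful than lxml, which is fine for short fragments like this but not a general replacement.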
Step 5: Send to WeChat (revised 2023-3-15)
I found a new WeChat automation library, wxauto, that supports the latest WeChat 3.9+.
Repository: https://github.com/cluic/wxauto
import requests
import json
import re
import time
from lxml import etree
from wxauto import *  # provides WeChat and WxUtils used below

if __name__ == '__main__':
    url = 'https://www.cls.cn/api/sw?app=CailianpressWeb&os=web&sv=7.7.5&sign=bf0f367462d8cd70917ba5eab3853bce'
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0"}
    data = {"type": "telegram", "keyword": "你需要知道的隔夜全球要闻", "page": 0, "rn": 20, "os": "web", "sv": "7.7.5",
            "app": "CailianpressWeb"}
    response = requests.post(url=url, headers=headers, data=data)
    # First search hit (assumes data -> telegram -> data is a list)
    item = json.loads(response.text)['data']['telegram']['data'][0]
    news = item['descr']
    timeStamp = item['time']
    timeArray = time.localtime(timeStamp)
    formatTime = time.strftime("%Y年%m月%d日", timeArray)
    news = re.sub(r'(\d+、)', r'\n\1', news)
    formatNews = ''.join(etree.HTML(news).xpath('//text()'))
    # Get the currently running WeChat client
    wx = WeChat()
    # Get the session (chat) list
    wx.GetSessionList()
    # Send a message to a contact (using `文件传输助手`, the File Transfer helper, as an example)
    msg = formatTime + '\n' + formatNews
    who = '文件传输助手'
    WxUtils.SetClipboard(msg)  # copy the content to the clipboard, like Ctrl + C
    wx.ChatWith(who)  # open the `文件传输助手` chat window
    wx.SendClipboard()  # send the clipboard content, like Ctrl + V
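Reply: re.sub(r'(\d+、)', r'\n\1', news)

Reply: I'd suggest the OP simply use push_plus (https://www.pushplus.plus/) to send the message straight to your own WeChat with a POST request, then host the script on a server and run it on a schedule every day. Roughly like this (replace the token with your own):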
import requests
import json
import re
import time
from lxml import etree

if __name__ == '__main__':
    url = 'https://www.cls.cn/api/sw?app=CailianpressWeb&os=web&sv=7.7.5&sign=bf0f367462d8cd70917ba5eab3853bce'
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0"}
    data = {"type": "telegram", "keyword": "你需要知道的隔夜全球要闻", "page": 0, "rn": 20, "os": "web", "sv": "7.7.5",
            "app": "CailianpressWeb"}
    response = requests.post(url=url, headers=headers, data=data)
    # First search hit (assumes data -> telegram -> data is a list)
    item = json.loads(response.text)['data']['telegram']['data'][0]
    news = item['descr']
    timeStamp = item['time']
    timeArray = time.localtime(timeStamp)
    formatTime = time.strftime("%Y年%m月%d日", timeArray)
    # news[0] is the headline text before "1、"; the rest are the items (numbers removed by the split)
    news = re.split(r'\d+、', news)
    title = ''.join(etree.HTML(news[0]).xpath('//text()'))
    param = {"title": f"{formatTime} {title}",
             # Rendered as HTML: each item starts on a new line, prefixed with ☀
             "content": f"{'<br>☀'.join(news)}",
             "template": "html",
             "token": "*******************",  # replace with your own pushplus token
             }
    requests.post(url="https://www.pushplus.plus/send", headers=headers, data=param)
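If the script runs on a server as suggested, keep in mind that `time.localtime()` converts using the machine's own time zone, so the date in the title can be off by a day when the server is not on China time. A sketch that pins the conversion to UTC+8, assuming you want the date as it appears on cls.cn; `CHINA_TZ` and `format_cn_date` are hypothetical names:

```python
from datetime import datetime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # fixed UTC+8 offset, no DST


def format_cn_date(timestamp: int) -> str:
    """Format a Unix timestamp as YYYY年MM月DD日 in UTC+8, whatever the server's zone."""
    dt = datetime.fromtimestamp(timestamp, tz=CHINA_TZ)
    return f"{dt.year}年{dt.month:02d}月{dt.day:02d}日"


print(format_cn_date(1678838400))  # 2023年03月15日
```

`format_cn_date(timeStamp)` could then take the place of the `time.localtime` / `time.strftime` pair.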
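Replies:
Thanks for sharing!
This feels hard; I still have a long way to go.
(Quoting Arcticlyc, 2023-3-14 22:31: "With this, all the numbers become 1.") Nice, that works.
(Quoting txq0211, 2023-3-14 22:35: "With this, all the numbers become 1.")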