Emmm, let's make a scraper for WeChat Official Account images
Last edited by cdsgg on 2020-12-2 21:33
import re
import datetime
import os

import requests
from bs4 import BeautifulSoup

# Request headers copied from a mobile browser session.
headers = {
    'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1',
    'cookie': 'tvfe_boss_uuid=4427f26b6d83d5d7; pgv_pvid=8192465356; pgv_pvi=2750494720; RK=cfw14pvSFY; ptcz=026939cd8bdd917551be81f3d0d2563bdb9e2d0805f4c83de8df0ea6af457e49; eas_sid=i1e690x1l8v2I68559J4e8K995; LW_sid=W1C6S0u1y8a2A6E864o8L480Z0; LW_uid=51H6V041L8i2n6Q8M4S8e4k0D0; uin_cookie=o0878530130; ied_qq=o0878530130; o_cookie=878530130; pac_uid=1_878530130; luin=o0878530130; lskey=000100000f95a236a0b3f6a309a1f6e4809612024104f9a476a9b0803995ce53ec225971d5d95f3164c7df7a; rewardsn=; wxtokenkey=777'}

a = 0  # running counter, used as the image file name
while True:
    url = input("Enter the url: ")
    curr_time = datetime.datetime.now()
    print(curr_time)
    # One folder per minute, named like 202012022133
    path = curr_time.strftime('%Y%m%d%H%M')
    print(path)
    if os.path.exists(path):
        print("The folder for this timestamp already exists")
    else:
        os.mkdir(path)
        print("Folder created! Saving images")
    dirname = os.path.join(os.getcwd(), path)
    print(dirname)
    # with open(os.path.join(dirname, 'a.txt'), 'w') as f:
    #     f.write(url)
    req = requests.get(url=url, headers=headers).content.decode()
    soup = BeautifulSoup(req, 'lxml')
    img = soup.find_all('img')
    for i in img:
        # The image address is read from the data-src attribute.
        imglist = i.get('data-src')
        print(imglist)
        if not imglist:
            continue  # this <img> tag has no data-src, nothing to download
        # wx_fmt carries the image format; capture only that value, not any
        # query parameters that follow it.
        pat = r"https://.*?wx_fmt=(\w+)"
        rel = re.findall(pat, imglist)
        for j in rel:
            print(j)
            try:
                # Download first so a failed request leaves no empty file behind.
                ig = requests.get(imglist, headers=headers).content
                with open(os.path.join(dirname, '%s.%s' % (a, j)), 'wb') as f:
                    f.write(ig)
                a = a + 1
            except Exception as e:
                print(e)
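For anyone who wants to try the file-extension step on its own: here is a rough sketch that reads wx_fmt with the standard library's URL parser instead of a regex. It assumes, as the pattern in the script does, that the image address carries the format in a wx_fmt query parameter. The function name image_extension, the "jpg" fallback, and the sample URLs are made up for illustration and are not part of the original script.

```python
from urllib.parse import urlparse, parse_qs


def image_extension(src, default="jpg"):
    """Return the wx_fmt query parameter of src, or default if it is missing."""
    query = parse_qs(urlparse(src).query)
    values = query.get("wx_fmt")
    return values[0] if values else default


# Hypothetical URLs, shaped like the data-src values the script prints.
print(image_extension("https://example.com/mmbiz_jpg/abc/640?wx_fmt=jpeg"))         # jpeg
print(image_extension("https://example.com/mmbiz_png/abc/640?wx_fmt=png&tp=webp"))  # png
print(image_extension("https://example.com/mmbiz/abc/640"))                         # jpg
```

Parsing the query string this way means extra parameters after wx_fmt do not end up in the file name.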
Finished build, on Lanzou Cloud: https://wwa.lanzouj.com/izsmEiztcaj
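Quote: cdsgg posted on 2020-12-2 21:44
Sure, OK. Send it over and I'll take a look.
Link: https://pan.baidu.com/s/1nIeGbdx22O11uYlx-ZIHvA Extraction code: cmfv Copy this text, then open the Baidu Netdisk mobile app for easier access.
I only need to scrape the data in these 8 columns where the location is Shandong (including the Bohai Sea and Yellow Sea areas). The start and end page numbers should be user-supplied. Save the result in Excel or CSV format.

Quote: jiguanlang posted on 2020-12-2 21:52
Link: https://pan.baidu.com/s/1nIeGbdx22O11uYlx-ZIHvA Extraction code: cmfv Copy this text, then open the Baidu Netdisk mobile A ...
How about doing it over a remote session?

OP, how do you get the web URL of a WeChat article?

Could some expert help me write a scraper?

Quote: jiguanlang posted on 2020-12-2 21:36
Could some expert help me write a scraper?
What kind?

Quote: cdsgg posted on 2020-12-2 21:37
What kind?
https://www.52pojie.cn/thread-1318728-1-1.html
It's this bounty thread.

Quote: jiguanlang posted on 2020-12-2 21:40
https://www.52pojie.cn/thread-1318728-1-1.html
It's this bounty thread.
It's on an intranet = = how am I supposed to do that?

Quote: cdsgg posted on 2020-12-2 21:41
It's on an intranet = = how am I supposed to do that?
Would it work if I gave you the saved source of the page?

Quote: jiguanlang posted on 2020-12-2 21:42
Would it work if I gave you the saved source of the page?
Sure, OK. Send it over and I'll take a look.

For requests, I'd suggest using a Session to speed things up. WeChat Official Account content is served from the same domain, so a Session cuts the time spent opening and closing connections.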
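A minimal sketch of the Session suggestion above, for reference. It is not the OP's code: the function name fetch_all and the 10-second timeout are assumptions made for illustration. The only idea it shows is that a single requests.Session keeps connections open and reuses them for later requests to the same host.

```python
import requests


def fetch_all(urls, headers=None, timeout=10):
    """Download every URL through one Session and return the raw bodies."""
    bodies = []
    # One Session for the whole batch: connections to the same host are pooled
    # and reused instead of being opened and closed for every request.
    with requests.Session() as session:
        if headers:
            session.headers.update(headers)  # sent with every request below
        for u in urls:
            resp = session.get(u, timeout=timeout)
            resp.raise_for_status()  # fail loudly on HTTP errors
            bodies.append(resp.content)
    return bodies
```

In the script this would mean creating the Session once before the loop and calling session.get wherever requests.get is used now. The saving only applies to requests that go to the same host.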