吾爱破解 - 52pojie.cn

[Solved] Help scraping a certain Wenquan site

Zhili.An posted on 2024-3-25 20:33
Last edited by Zhili.An on 2024-3-26 09:35

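I have recently been scraping a certain Wenquan site. Each full-size page image can only be fetched once, but its thumbnail can be fetched as many times as you like,
so I set out to scrape the thumbnails instead, and ran into a problem.
Below, "image" always means the thumbnail.
In the browser the image stays accessible, whether I open it in a new tab (as long as the cookie is kept) or resend the request; either way it loads.
[image.png]
From Python, however, the response status is 200 but the saved image is corrupted.
The code is as follows: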
[Python]
import requests
session = requests.session()
url ="https://lib-xjtu.wqxuetang.com/deep/page/imgs/3225567/7?width=160&k=eyJ1IjoiRVpv..."

headers = {
    "Host": "lib-xjtu.wqxuetang.com",
    "Connection": "keep-alive",
    "Pragma": "no-cache",
    'Cache-Control': "no-cache",
    "sec-ch-ua": '"Chromium";v="122", "Not(A:Brand";v="24", "Microsoft Edge";v="122"',
    "sec-ch-ua-mobile": "?0",
    "RequestID": "0",
    "sec-ch-ua-platform": "Windows",
    "DNT": "1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0",
    "Accept": "*/*",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Referer": "https://lib-xjtu.wqxuetang.com/deep/read/pdf?bid=3225567",
    "Cookie": "acw_tc=0b6e704617113690830561493e06fd50da72d2bb7ea2f760909e5e54bcaef3; _gid=177223...."
    
    }
try:
    response =  session.get(url, headers=headers)
    print(response)
    if response.status_code == 200:
        with open('1.jpg', 'wb') as f:
            f.write(response.content)
            print('Download complete')
except Exception as e:
    print(e)


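Also, the file Python saves is about the same size as the original image, yet it cannot be opened.

The cookies and everything else are complete and correct, so this is baffling. Thanks in advance for any help!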
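
A file that has roughly the right size but will not open can be checked by looking at its first few bytes. The following is only a minimal sketch: `sniff_format` is a hypothetical helper name, and it recognizes just a handful of well-known signatures. If a file that should be a JPEG comes back as "gzip" or "unknown", that would suggest the saved bytes are still content-encoded rather than a real image (Brotli data, for instance, has no fixed signature, so it would show up as "unknown").

```python
def sniff_format(data: bytes) -> str:
    """Guess the file format from the leading magic bytes (hypothetical helper)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\x1f\x8b"):
        return "gzip"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # Anything else, including compressed data without a fixed signature.
    return "unknown"
```

To use it on the file written by the script above, read the first 16 bytes of `1.jpg` in binary mode and pass them to `sniff_format`.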


Time丨Brand posted on 2024-3-25 20:44
"Accept-Encoding": "gzip, deflate, br" means the response may come back gzip-compressed, so you might need to decompress it, e.g. gzip.decompress(pic_gzip)
qfxldhw posted on 2024-3-25 20:51
[Python]
import requests

url = "https://lib-xjtu.wqxuetang.com/deep/page/imgs/3225567/7?width=160&k=eyJ1IjoiRVpv"

headers = {
    "authority": "lib-xjtu.wqxuetang.com",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "zh-CN,zh;q=0.9",
    "cache-control": "max-age=0",
    "cookie": "acw_tc=0bdd346e17113706029986224edd4465a233e52e8a3b8f17bdbd735f23f69c; SERVERID=f164105ccbc961f51f901041b71e3b0d|1711370603|1711370603; SERVERCORSID=f164105ccbc961f51f901041b71e3b0d|1711370603|1711370603",
    "sec-ch-ua": '"Chromium";v="122", "Not A Brand";v="24", "Google Chrome";v="122"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "Windows",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
}

response = requests.get(url, headers=headers)

if response.status_code == 200:
    with open('1.jpg', 'wb') as file:
        file.write(response.content)
    print("Download complete")
else:
    print("Download failed, status code:", response.status_code)
It downloads now, but the downloaded image is too blurry to read.
Zhili.An (OP) posted on 2024-3-25 21:07
Quoting qfxldhw (2024-3-25 20:51):
import requests

url = "https://lib-xjtu.wqxuetang.com/deep/page/imgs/322 ...

What was the cause? It doesn't look like much is different.
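qfxldhw posted on 2024-3-25 21:10
Quoting Zhili.An (2024-3-25 21:07):
What was the cause? It doesn't look like much is different.

"Accept-Encoding": "gzip, deflate, br" is the problem the poster above mentioned. Delete that header.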
sai609 posted on 2024-3-25 21:11
What does "each page image can only be fetched once" mean? Only once per IP?
Zhili.An (OP) posted on 2024-3-25 21:18
Quoting qfxldhw (2024-3-25 21:10):
"Accept-Encoding": "gzip, deflate, br" is the problem the poster above mentioned. Delete that header.

Oh, I see. Thanks, I'll try it tomorrow.
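
The suggestion to delete the Accept-Encoding header can be sketched roughly as below. The helper names `without_accept_encoding` and `fetch_image` are made up for illustration, and the 10-second timeout is an arbitrary choice. A likely reason this helps, stated as an assumption rather than something confirmed in this thread: once the hand-written header is gone, requests sends its own default Accept-Encoding, which should only advertise encodings it is able to decode, whereas "br" copied from a browser asks for Brotli, which requests decodes only if a Brotli package is installed.

```python
def without_accept_encoding(headers: dict) -> dict:
    """Return a copy of headers with any Accept-Encoding entry removed (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


def fetch_image(url: str, headers: dict, path: str) -> None:
    """Download url to path, letting requests choose its own Accept-Encoding."""
    # Imported here so the helper above stays usable without requests installed.
    import requests

    response = requests.get(url, headers=without_accept_encoding(headers), timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    with open(path, "wb") as f:
        f.write(response.content)
```

Calling `fetch_image(url, headers, "1.jpg")` with the `url` and `headers` from the first post would be the equivalent of the original script minus that one header.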
Mr.Jimmy posted on 2024-3-25 21:19
Notice: this post was hidden by an administrator or moderator
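Zhili.An (OP) posted on 2024-3-25 21:20
Quoting Time丨Brand (2024-3-25 20:44):
"Accept-Encoding": "gzip, deflate, br" means the response may come back gzip-compressed, so you might need to decompress it, e.g. gzip.decompress(pic_gzip)

OK, I'll try it tomorrow. Thanks!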
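
For the gzip.decompress idea quoted above, a rough sketch of decoding a raw body according to its Content-Encoding value might look like this. `decode_body` is a hypothetical helper, not part of requests. As far as I know, requests already undoes gzip and deflate on `response.content`, so manual decompression should only matter when working with the raw, undecoded bytes. Brotli is deliberately left unsupported here because the standard library has no decoder for it.

```python
import gzip
import zlib
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Decode raw bytes according to a Content-Encoding value (hypothetical helper)."""
    enc = (content_encoding or "").strip().lower()
    if enc in ("", "identity"):
        return raw
    if enc == "gzip":
        return gzip.decompress(raw)
    if enc == "deflate":
        try:
            # Standard form: deflate data inside a zlib wrapper.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send raw deflate data without the zlib wrapper.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {enc}")
```

The second argument would normally come from `response.headers.get("Content-Encoding")`.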
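Zhili.An (OP) posted on 2024-3-25 21:21
Quoting sai609 (2024-3-25 21:11):
What does "each page image can only be fetched once" mean? Only once per IP?

For the readable full-size page images, each link works only once and then expires. Thumbnails have no such limit.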