Newbie asking for help: how do I download dynamically loaded images like data:image?

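miracle1989 posted on 2024-8-17 00:01

The HTML I get back when requesting the page from code never contains any data:image value. I suspect the front end fills it in dynamically, so Python's BS4 cannot see it. Is there any way to handle this other than using selenium?
The HTML (as seen in the browser) is as follows: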
<div class="post-content">
<img src="data:image/jpeg;base64,/9j/2wCEABALDZ" data-xuid="1" data-xkrkllgl="https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg" alt="photo_2024-08-16_11-52-27.jpg" title="photo_2024-08-16_11-52-27.jpg" data-action="zoom">
</div>

import requests
from bs4 import BeautifulSoup
import base64

url = 'xxxx'

# Fetch the page content
response = requests.get(url)
html_content = response.text

# Parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Find an img tag whose src contains base64 data
img_tag = soup.find('img', src=lambda src: src and 'base64' in src)
print(img_tag)

# Check whether the img tag exists
if img_tag and 'src' in img_tag.attrs:
    # Take the base64 payload that follows the "data:image/...;base64," prefix
    encrypted_base64 = img_tag['src'].split(',', 1)[1]
else:
    print('No img tag containing base64 was found')
    exit()
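
The markup quoted above suggests the real image location already sits in a `data-*` attribute of the HTML, so a browser may not be needed to find it. Below is a minimal sketch using only the standard library's `html.parser`. It assumes the attribute name (`data-xkrkllgl` here) may differ between pages, so it collects any `data-*` attribute on an `<img>` whose value is an http(s) URL. The names `DataUrlCollector` and `collect_image_urls` are made up for illustration.

```python
from html.parser import HTMLParser


class DataUrlCollector(HTMLParser):
    """Collect http(s) URLs stored in data-* attributes of <img> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            # Assumption: the attribute name may vary, so match on the
            # "data-" prefix and on the value looking like a URL.
            if (name.startswith("data-") and value
                    and value.startswith(("http://", "https://"))):
                self.urls.append(value)


def collect_image_urls(html_text):
    """Return every URL found in data-* attributes of img tags."""
    parser = DataUrlCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls


# The snippet from the post, used here as test input.
sample_html = (
    '<div class="post-content">'
    '<img src="data:image/jpeg;base64,/9j/2wCEABALDZ" data-xuid="1" '
    'data-xkrkllgl="https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg" '
    'alt="photo_2024-08-16_11-52-27.jpg" data-action="zoom">'
    '</div>'
)
print(collect_image_urls(sample_html))
```

Matching on the value rather than the attribute name is a design guess; if the name turns out to be stable, filtering on it directly would be simpler.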

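十万菠萝拍黄瓜 posted on 2024-8-17 02:39

It's encrypted: AES-CBC with Pkcs7 padding. The key is f5d965df75336270 and the iv is 97b60394abc2fbe1. For ab2b64, just ask an AI. Request 2024081616535950939.jpeg, convert the response to base64 first, then decrypt it.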
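
For readers unfamiliar with the "Pkcs7" part: after AES-CBC decryption, the value of the last byte tells how many padding bytes to strip. Here is a minimal pure-Python sketch of that one step, with a hypothetical helper name `pkcs7_unpad`. In practice a library routine such as PyCryptodome's `Crypto.Util.Padding.unpad` would normally do this.

```python
def pkcs7_unpad(data: bytes, block_size: int = 16) -> bytes:
    """Strip PKCS#7 padding from already-decrypted data."""
    if not data or len(data) % block_size != 0:
        raise ValueError("data length must be a non-zero multiple of the block size")
    pad_len = data[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding length")
    # Every padding byte must equal the padding length itself.
    if data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return data[:-pad_len]


# 5 data bytes padded up to one 16-byte block with eleven 0x0b bytes.
padded = b"hello" + bytes([11]) * 11
print(pkcs7_unpad(padded))  # b'hello'
```

A padding error at this stage usually means the key, the IV, or the input bytes are not what the encryptor used.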

三滑稽甲苯 posted on 2024-8-17 07:26

Analyze the JS scripts and find where the decryption happens.
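
One way to start that search is to scan the page's script sources for crypto-related identifiers. A rough sketch follows, assuming the scripts have already been saved as text. The keyword list is a guess at what such code commonly contains, not something taken from this site, and both `find_crypto_hints` and the sample script are hypothetical.

```python
import re

# Guessed keywords; adjust to whatever the target scripts actually use.
CRYPTO_HINTS = re.compile(r"CryptoJS|AES|decrypt|\biv\b|Pkcs7", re.IGNORECASE)


def find_crypto_hints(js_text):
    """Return (line_number, stripped_line) pairs mentioning a crypto-looking keyword."""
    hits = []
    for number, line in enumerate(js_text.splitlines(), start=1):
        if CRYPTO_HINTS.search(line):
            hits.append((number, line.strip()))
    return hits


# Made-up script text, only to show the shape of the output.
sample_js = """\
function render(el) { el.src = load(el.dataset.url); }
var out = CryptoJS.AES.decrypt(payload, key, { iv: iv });
console.log("done");
"""
for number, line in find_crypto_hints(sample_js):
    print(number, line)
```

On minified scripts the hits will be long single lines, so setting a breakpoint in the browser's developer tools at the matched location is usually the next step.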

miracle1989 posted on 2024-8-17 08:46

十万菠萝拍黄瓜 posted on 2024-8-17 02:39
It's encrypted: AES-CBC with Pkcs7 padding. The key is f5d965df75336270 and the iv is 97b60394abc2fbe1. For ab2b64, just ask an AI. Request 202408161653 ...

I tried it based on your hints and it still doesn't work. Could you take a look when you have time?

import os

from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64
import requests
from bs4 import BeautifulSoup

def get_response(url, timeout=10):
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Connection": "Keep-Alive",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
    }
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # Check whether the request succeeded
        response.encoding = 'utf-8'
        return response
    except requests.exceptions.Timeout:
        print(f"Request timed out: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error: {e.response.status_code}, {url}")
    except requests.exceptions.RequestException as e:
        print(f"Request exception: {e}, {url}")
    return None

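def fetch_pic_urls(url):
    # Only fetch real URLs; bail out if the request failed
    response = get_response(url) if url.startswith('http') else None
    if response is None:
        return []
    html_content = response.text
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find all img tags that carry a data-xkrkllgl attribute
    img_tags = soup.find_all('img', attrs={'data-xkrkllgl': True})

    # Extract and return the values of the data-xkrkllgl attribute
    pic_urls = [img['data-xkrkllgl'] for img in img_tags]
    return pic_urls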

def decrypt_image_url(encrypted_url, key, iv):
    # Convert key and iv to byte strings
    key = bytes.fromhex(key)
    iv = bytes.fromhex(iv)

    # Base64-decode the encrypted URL
    encrypted_data = base64.b64decode(encrypted_url)

    # Create an AES decryptor in CBC mode
    cipher = AES.new(key, AES.MODE_CBC, iv)

    # Decrypt the data
    decrypted_padded = cipher.decrypt(encrypted_data)

    # Remove the padding
    decrypted_data = unpad(decrypted_padded, AES.block_size)

    # Convert the decrypted data to a string
    decrypted_url = decrypted_data.decode('utf-8')

    return decrypted_url

def download_image(url, save_dir='.', timeout=10):
    response = get_response(url, timeout)
    if response and response.status_code == 200:
        # Extract the file name from the URL
        filename = os.path.basename(url)
        # Make sure the save directory exists
        if not os.path.exists(save_dir):
            os.makedirs(save_dir)
        # Build the full file path
        file_path = os.path.join(save_dir, filename)
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f'Image downloaded successfully to {file_path}.')
    else:
        print('Failed to download image.')

def main():
    key = 'f5d965df75336270'
    iv = '97b60394abc2fbe1'
    url = 'xxxx'
    encrypted_urls = fetch_pic_urls(url)

    for encrypted_url in encrypted_urls:
        decrypted_url = decrypt_image_url(encrypted_url, key, iv)
        download_image(decrypted_url)

if __name__ == '__main__':
    main()

十万菠萝拍黄瓜 posted on 2024-8-17 09:20

This post was last edited by 十万菠萝拍黄瓜 on 2024-8-17 09:22

Here is an example for a single image; just adapt it.
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64
from io import BytesIO
import requests

def decrypt_image(encrypted_base64):
    # Key and IV are used as 16 ASCII bytes each (AES-128)
    key = b"f5d965df75336270"
    iv = b"97b60394abc2fbe1"
    encrypted_data = base64.b64decode(encrypted_base64)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    decrypted_padded = cipher.decrypt(encrypted_data)
    # Strip the Pkcs7 padding
    decrypted_data = unpad(decrypted_padded, AES.block_size)
    return decrypted_data

def ab2b64(t):
    # Convert the raw response bytes to base64
    binary_data = BytesIO(t)
    data = binary_data.read()
    b64encoded = base64.b64encode(data)
    return b64encoded

def main():
    url = 'https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg'
    # The downloaded file content is the ciphertext
    res = requests.get(url).content
    b64 = ab2b64(res)
    s = decrypt_image(b64)
    with open('1.jpg', 'wb') as f:
        f.write(s)
    print('111')

if __name__ == '__main__':
    main()
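
Two details seem to separate this working example from the earlier attempt. This is a reading of the two listings, not a confirmed diagnosis. First, the key and IV are used here as 16 ASCII bytes, whereas hex-decoding the same 16-character string yields only 8 bytes, which is not a valid AES key length (AES accepts 16, 24, or 32 bytes). Second, it is the downloaded file content that gets decrypted, not the URL string. The standard-library sketch below illustrates the first point and adds a sanity check for decrypted output; `looks_like_jpeg` is a hypothetical helper.

```python
import base64

KEY_TEXT = "f5d965df75336270"

# Treating the text as ASCII gives 16 bytes; hex-decoding it gives 8.
ascii_key = KEY_TEXT.encode("ascii")
hex_key = bytes.fromhex(KEY_TEXT)
print(len(ascii_key), len(hex_key))  # 16 8

AES_KEY_SIZES = (16, 24, 32)
print(len(ascii_key) in AES_KEY_SIZES)  # True
print(len(hex_key) in AES_KEY_SIZES)    # False


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check: JPEG files start with the bytes FF D8 FF."""
    return data[:3] == b"\xff\xd8\xff"


# First 8 base64 characters of the img src quoted earlier in the thread.
placeholder_prefix = base64.b64decode("/9j/2wCE")
print(looks_like_jpeg(placeholder_prefix))  # True
```

Running `looks_like_jpeg` on the decrypted bytes before writing the file is a quick way to tell a successful decryption from garbage output.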

wasm2023 posted on 2024-8-17 09:47

If the image is generated by wasm and the element only has a canvas id, how can I locate where it is generated?

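puz_zle posted on 2024-8-17 12:04

wasm2023 posted on 2024-8-17 09:47
If the image is generated by wasm and the element only has a canvas id, how can I locate where it is generated?

Analyzing the API endpoints is less work than that.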

wasm2023 posted on 2024-8-17 14:34

puz_zle posted on 2024-8-17 12:04
Analyzing the API endpoints is less work than that.

I couldn't find an endpoint.

吾爱破解 - 52pojie.cn's Archiver
