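miracle1989 posted on 2024-8-17 00:01

Beginner question: how do I download dynamically loaded data:image pictures?

The HTML I get back from my request never contains any data:image value. I suspect the front end fills the image in dynamically, so Python's BS4 cannot see it. Is there any way to handle this other than Selenium?
The HTML looks like this: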
<div class="post-content">
<img src="" data-xuid="1" data-xkrkllgl="https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg" alt="photo_2024-08-16_11-52-27.jpg" title="photo_2024-08-16_11-52-27.jpg" data-action="zoom">
</div>

import sys

import requests
from bs4 import BeautifulSoup

url = 'xxxx'

# Fetch the page
response = requests.get(url)
html_content = response.text

# Parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Find an img tag whose src contains a base64 data URI
img_tag = soup.find('img', src=lambda src: src and 'base64' in src)
print(img_tag)

# Check whether such an img tag exists
if img_tag and 'src' in img_tag.attrs:
    # Take the base64 payload that follows the "data:image/...;base64," prefix
    encrypted_base64 = img_tag['src'].split(',', 1)[1]
else:
    print('No img tag containing base64 was found')
    sys.exit()
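
In the markup quoted above, the `src` attribute is empty and the image address sits in a `data-xkrkllgl` attribute instead, which would explain why a search for `base64` in `src` finds nothing in the raw HTML. Below is a minimal sketch of pulling that attribute out with only the standard library's `html.parser` (swapped in here for BS4 so the example has no third-party dependency). The names `LazyImageCollector` and `collect_image_urls` are hypothetical, and the attribute name is assumed to stay `data-xkrkllgl`; it may well differ between pages or sites.

```python
from html.parser import HTMLParser

# Sample markup copied from the post above.
SAMPLE_HTML = '''<div class="post-content">
<img src="" data-xuid="1" data-xkrkllgl="https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg" alt="photo_2024-08-16_11-52-27.jpg" title="photo_2024-08-16_11-52-27.jpg" data-action="zoom">
</div>'''


class LazyImageCollector(HTMLParser):
    """Collect the value of one data-* attribute from every <img> tag."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name  # assumed attribute name, e.g. data-xkrkllgl
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != 'img':
            return
        for name, value in attrs:
            if name == self.attr_name and value:
                self.urls.append(value)


def collect_image_urls(html, attr_name='data-xkrkllgl'):
    """Return the attribute values found on <img> tags, in document order."""
    parser = LazyImageCollector(attr_name)
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == '__main__':
    for image_url in collect_image_urls(SAMPLE_HTML):
        print(image_url)
```

The address recovered this way points at the encrypted file, so the bytes behind it still have to be decrypted before they are a viewable image.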

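十万菠萝拍黄瓜 posted on 2024-8-17 02:39

The image is encrypted with AES-CBC and PKCS7 padding. The key is f5d965df75336270 and the IV is 97b60394abc2fbe1. For ab2b64, just ask an AI. Request 2024081616535950939.jpeg, convert the response to base64 first, then decrypt it.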
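
For readers unfamiliar with the padding scheme named in this reply: PKCS7 appends N bytes, each with value N, so that the plaintext fills a whole number of 16-byte AES blocks, and after decryption those N bytes have to be stripped. The sketch below illustrates only that padding step in plain Python. `pkcs7_pad` and `pkcs7_unpad` are illustrative helper names, not part of any library; in practice `Crypto.Util.Padding.unpad` from PyCryptodome does the same job.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Validate and strip PKCS7 padding from decrypted data."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError('input is not a whole number of blocks')
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError('invalid padding length')
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError('invalid padding bytes')
    return padded[:-pad_len]


if __name__ == '__main__':
    padded = pkcs7_pad(b'abc')
    print(len(padded), padded[-1])
    print(pkcs7_unpad(padded))
```

Data that is already block-aligned still gains a full extra block of padding, which is why the encrypted file is always slightly larger than the original image.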

三滑稽甲苯 posted on 2024-8-17 07:26

Analyze the JS scripts and find where the decryption happens.

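miracle1989 posted on 2024-8-17 08:46

十万菠萝拍黄瓜 posted on 2024-8-17 02:39
Encrypted: AES-CBC-Pkcs7, the key is f5d965df75336270, the IV is 97b60394abc2fbe1, for ab2b64 just ask an AI, request 202408161653 ...

I tried it based on your hints, but it still doesn't work. Could you take a look when you have time?

import os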

from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64
import requests
from bs4 import BeautifulSoup


def get_response(url, timeout=10):
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Connection": "Keep-Alive",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
    }
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # Check that the request succeeded
        response.encoding = 'utf-8'
        return response
    except requests.exceptions.Timeout:
        print(f"Request timed out: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error: {e.response.status_code}, {url}")
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}, {url}")
    return None

def fetch_pic_urls(url):
    # Only handle http(s) URLs, and bail out if the request failed
    if not url.startswith('http'):
        return []
    response = get_response(url)
    if response is None:
        return []
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find every img tag that has a data-xkrkllgl attribute
    img_tags = soup.find_all('img', attrs={'data-xkrkllgl': True})

    # Extract and return the data-xkrkllgl attribute values
    pic_urls = [img['data-xkrkllgl'] for img in img_tags]
    return pic_urls



def decrypt_image_url(encrypted_url, key, iv):
    # Convert the key and IV to byte strings
    key = bytes.fromhex(key)
    iv = bytes.fromhex(iv)

    # Base64-decode the encrypted URL
    encrypted_data = base64.b64decode(encrypted_url)

    # Create an AES decryptor in CBC mode
    cipher = AES.new(key, AES.MODE_CBC, iv)

    # Decrypt the data
    decrypted_padded = cipher.decrypt(encrypted_data)

    # Remove the padding
    decrypted_data = unpad(decrypted_padded, AES.block_size)

    # Convert the decrypted data to a string
    decrypted_url = decrypted_data.decode('utf-8')

    return decrypted_url

def download_image(url, save_dir='.', timeout=10):
    response = get_response(url, timeout)
    if response and response.status_code == 200:
        # Take the file name from the URL
        filename = os.path.basename(url)
        # Make sure the output directory exists
        os.makedirs(save_dir, exist_ok=True)
        # Build the full file path
        file_path = os.path.join(save_dir, filename)
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f'Image downloaded successfully to {file_path}.')
    else:
        print('Failed to download image.')


def main():
    key = 'f5d965df75336270'
    iv = '97b60394abc2fbe1'
    url = 'xxxx'
    encrypted_urls = fetch_pic_urls(url)

    for encrypted_url in encrypted_urls:
        decrypted_url = decrypt_image_url(encrypted_url, key, iv)
        download_image(decrypted_url)


if __name__ == '__main__':
    main()
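
A likely reason this attempt fails, offered as an inference rather than something the thread states outright: the key and IV are 16-character strings, and `bytes.fromhex` turns 16 hex digits into only 8 bytes, which is not a valid AES key length (16, 24, or 32 bytes). Encoding the same text as ASCII gives 16 bytes, which fits AES-128 and a 16-byte CBC IV. The script also decrypts the URL string itself, whereas the hint says to request the `.jpeg` and decrypt the downloaded bytes. The sketch below only compares the two key interpretations, using the standard library; the constant names are hypothetical.

```python
KEY_TEXT = 'f5d965df75336270'
IV_TEXT = '97b60394abc2fbe1'

AES_KEY_SIZES = (16, 24, 32)  # valid AES key lengths in bytes
AES_BLOCK_SIZE = 16           # CBC mode needs an IV of exactly one block

# Interpretation 1: treat the text as hex digits (what the script above does).
key_from_hex = bytes.fromhex(KEY_TEXT)

# Interpretation 2: treat each character as one ASCII byte.
key_from_ascii = KEY_TEXT.encode('ascii')
iv_from_ascii = IV_TEXT.encode('ascii')

if __name__ == '__main__':
    print('hex key length:', len(key_from_hex),
          'valid:', len(key_from_hex) in AES_KEY_SIZES)
    print('ascii key length:', len(key_from_ascii),
          'valid:', len(key_from_ascii) in AES_KEY_SIZES)
    print('ascii IV length:', len(iv_from_ascii),
          'valid:', len(iv_from_ascii) == AES_BLOCK_SIZE)
```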

十万菠萝拍黄瓜 posted on 2024-8-17 09:20

This post was last edited by 十万菠萝拍黄瓜 on 2024-8-17 09:22


Here is a single-image example; adapt it as needed.
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64
import requests

def decrypt_image(encrypted_base64):
    # The key and IV are the 16 ASCII characters themselves, not hex-decoded values
    key = b"f5d965df75336270"
    iv = b"97b60394abc2fbe1"
    encrypted_data = base64.b64decode(encrypted_base64)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    decrypted_padded = cipher.decrypt(encrypted_data)
    # Strip the PKCS7 padding
    decrypted_data = unpad(decrypted_padded, AES.block_size)
    return decrypted_data

def ab2b64(t):
    # Base64-encode the raw response bytes
    return base64.b64encode(t)

def main():
    url = 'https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg'
    # Download the encrypted file as raw bytes
    res = requests.get(url).content
    b64 = ab2b64(res)
    s = decrypt_image(b64)
    # Write the decrypted image to disk
    with open('1.jpg', 'wb') as f:
        f.write(s)
    print('111')


if __name__ == '__main__':
    main()
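
One observation on the example above: `ab2b64` base64-encodes the downloaded bytes and `decrypt_image` immediately base64-decodes them again, so the pair appears to be a no-op kept only to mirror the site's JavaScript. If that reading is right, the raw response bytes could presumably be handed to the cipher directly. The sketch below checks the round trip with the standard library only; `roundtrip` is a hypothetical helper name.

```python
import base64


def ab2b64(data: bytes) -> bytes:
    """Base64-encode raw bytes, as in the example above."""
    return base64.b64encode(data)


def roundtrip(data: bytes) -> bytes:
    """Encode with ab2b64, then decode the way decrypt_image does."""
    return base64.b64decode(ab2b64(data))


if __name__ == '__main__':
    sample = bytes(range(256))  # every possible byte value once
    print(roundtrip(sample) == sample)
```

Keeping the encode step does no harm; it only costs a little extra memory for large images.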

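wasm2023 posted on 2024-8-17 09:47

If the image is generated by wasm and the element only has a canvas id, how can I locate where it is generated?

puz_zle posted on 2024-8-17 12:04

wasm2023 posted on 2024-8-17 09:47
If the image is generated by wasm and the element only has a canvas id, how can I locate where it is generated?

Analyzing the API requests is less work than that.

wasm2023 posted on 2024-8-17 14:34

puz_zle posted on 2024-8-17 12:04
Analyzing the API requests is less work than that.

I couldn't find the API endpoint.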