anime-pictures: a scraper for good-looking wallpapers

s936505608 posted on 2023-8-24 15:14

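Today I found another wallpaper site, https://anime-pictures.net/posts?page=0&lang=zh_CN. It updates quickly and the quality is good, so I went ahead and wrote a scraper for it. No more running short of material for my WeChat official account!
Run the script and pick the folder where the images should be stored. As before, the names of downloaded images are recorded in log.txt in the root of that folder, so when you re-run the script, images that were already downloaded are skipped. Images smaller than 200 KB are skipped as well; you can adjust that threshold freely.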
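The two skip rules described above (already listed in log.txt, or smaller than 200 KB) can be sketched on their own, without any network access. This is only an illustration: the helper names `load_downloaded` and `should_download` and the constant `MIN_SIZE_BYTES` are made up here and do not appear in the script.

```python
import os
import tempfile

MIN_SIZE_BYTES = 200 * 1024  # the 200 KB threshold from the post; adjust freely


def load_downloaded(log_file_path):
    """Return the set of image names stored in a comma-separated log file."""
    if not os.path.exists(log_file_path):
        return set()
    with open(log_file_path, 'r', encoding='utf-8') as log_file:
        # Entries are written as 'name,' so the last split piece is empty.
        return {name for name in log_file.read().split(',') if name}


def should_download(image_name, file_size, downloaded, min_size=MIN_SIZE_BYTES):
    """Apply both skip rules: already logged, or smaller than min_size."""
    if image_name in downloaded:
        return False
    return file_size >= min_size


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, 'log.txt')
        with open(log_path, 'a', encoding='utf-8') as log_file:
            log_file.write('a.jpg,b.png,')
        seen = load_downloaded(log_path)
        print(should_download('a.jpg', 500 * 1024, seen))  # already logged
        print(should_download('c.jpg', 100 * 1024, seen))  # under the threshold
        print(should_download('c.jpg', 500 * 1024, seen))  # new and large enough
```

Using a set makes the "already downloaded" check cheap even when the log grows long. The author's full script follows.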
import requests
from bs4 import BeautifulSoup
import os
import urllib.parse
from concurrent.futures import ThreadPoolExecutor, as_completed
from tkinter import Tk, filedialog

def download_image(img_info, log_file_path):
    img_url, image_path = img_info
    image_name = os.path.basename(image_path)

    # Append the file name to log.txt so later runs skip it
    with open(log_file_path, 'a') as log_file:
        log_file.write(f'{image_name},')
    if os.path.exists(image_path):
        print(f'Skipped image: {image_name}')
    else:
        # Get the file size from the response headers
        response = requests.head(img_url)
        file_size = int(response.headers.get('Content-Length', 0))

        if file_size < 200 * 1024:  # 200 KB
            print(f'Skipped image: {image_name} (File size too small)')
        else:
            image_data = requests.get(img_url).content
            with open(image_path, 'wb') as f:
                f.write(image_data)
            print(f'Downloaded image: {image_name}')



def download_images(url, save_path):
    # Create the folder that will hold the images
    os.makedirs(save_path, exist_ok=True)

    page = 0
    log_file_path = os.path.join(save_path, 'log.txt')

    # Read the list of already-downloaded files
    downloaded_files = []
    if os.path.exists(log_file_path):
        with open(log_file_path, 'r') as log_file:
            downloaded_files = log_file.read().split(',')

    with ThreadPoolExecutor(max_workers=2) as executor:
        while True:
            page_url = url + f'?page={page}&lang=en'

            response = requests.get(page_url)
            soup = BeautifulSoup(response.text, 'html.parser')

            img_tags = soup.find_all('img', class_='svelte-1ibbyvk')

            # Stop when a page has no matching images
            if not img_tags:
                break

            download_tasks = []
            for img_tag in img_tags:
                img_src = img_tag['src']
                # Keep the part after 'previews/' and drop the '_cp' preview suffix
                img_path = img_src.split('previews/')[-1].replace('_cp', '')
                img_url = urllib.parse.urljoin('https://images.anime-pictures.net/', img_path)

                image_name = os.path.basename(img_path)
                image_path = os.path.join(save_path, image_name)

                if image_name not in downloaded_files:
                    download_tasks.append((img_url, image_path))

            futures = [executor.submit(download_image, task, log_file_path) for task in download_tasks]
            for future in as_completed(futures):
                try:
                    future.result()
                except Exception as e:
                    print(f'Error occurred: {str(e)}')

            page += 1
            print(f'Now downloading page: {page}')

# Use a folder-selection dialog to choose the save path
root = Tk()
root.withdraw()
save_path = filedialog.askdirectory(title='Select save folder')

if save_path:
    url = 'https://anime-pictures.net/posts'  # replace with your link
    download_images(url, save_path)
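
For anyone studying the approach: the script turns each thumbnail `src` into a full-size image URL by keeping the path after `previews/`, dropping the `_cp` suffix, and joining the result onto `https://images.anime-pictures.net/`. Below is a minimal sketch of just that step. The helper name `preview_to_full` and the sample `src` are made up for illustration, taking the last piece of the `split` result is an assumption, and the site's real URL layout has not been verified here.

```python
import os
import urllib.parse

IMAGE_HOST = 'https://images.anime-pictures.net/'  # host used in the script


def preview_to_full(img_src):
    """Map a preview src to (full image URL, local file name)."""
    # Assumption: the useful path is whatever follows 'previews/'.
    img_path = img_src.split('previews/')[-1].replace('_cp', '')
    img_url = urllib.parse.urljoin(IMAGE_HOST, img_path)
    return img_url, os.path.basename(img_path)


# Hypothetical preview src, invented for this example only.
sample_src = '//example.invalid/previews/abc/abc123_cp.jpg'
print(preview_to_full(sample_src))
```

If the site changes its thumbnail naming or the `svelte-...` class on the `img` tags, this is the part of the script that would need updating first.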

Jonathanzjy posted on 2023-8-24 19:32

Thanks for sharing! I'll study this!

wmda posted on 2023-8-24 15:29

Not bad, has
litigation

5584444 posted on 2023-8-24 15:39

Too bad I don't know how to use Python. A foolproof EXE would be great.

kenxy posted on 2023-8-24 16:11

These images are really large in file size!

下载小王子 posted on 2023-8-24 16:32

Learning from the approach. I'll download it and give it a try.

xiaopeng128 posted on 2023-8-24 17:25

Learning from the approach. I'll download it and give it a try.

bohong65 posted on 2023-8-24 17:35

What kind of official account needs this sort of material? :lol Share it with us.

percdd posted on 2023-8-24 17:36

Learned something. Too bad I don't know how to write programs.

CcCharlotte posted on 2023-8-24 18:28

Thanks, expert. I'll study the approach.


吾爱破解 - 52pojie.cn's Archiver
