Downloading Palace Museum wallpapers with Python multithreading

surepj posted on 2023-8-13 17:28

A simple multithreading practice example from a Python beginner. The images are 1280*800 (not high resolution).
https://img.dpm.org.cn/Uploads/Picture/2023/07/30/s64c6545c12498.jpg
https://img.dpm.org.cn/Uploads/Picture/2022/11/08/s6369f4aae4920.jpg
The code is as follows:
import time
import os
import threading
import requests
from lxml import etree
from multiprocessing.dummy import Pool

headers = {
    "Accept": "*/*",
    "Referer": "https://www.dpm.org.cn/lights/royal.html",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
}

count = 1  # running image number, shared by all worker threads
count_lock = threading.Lock()  # guards reads and updates of count

def get_img_urls(url):
    response = requests.get(url, headers=headers)
    response.encoding = 'utf-8'
    # print(response.text)
    html = etree.HTML(response.text)
    imgs = html.xpath('//div[@class="pic"]/a/img')
    multi_pool = Pool(16)  # create a pool of 16 threads
    for i in imgs:
        # xpath() returns a list, so take the first match
        title = i.xpath('./@title')[0]
        src = i.xpath('./@src')[0]
        multi_pool.apply_async(down_load, args=(title, src))
    multi_pool.close()
    multi_pool.join()

def down_load(name, url):
    global count
    res = requests.get(url, headers=headers)
    path = './故宫壁纸'  # download path (the '故宫壁纸' folder under the current directory)
    os.makedirs(path, exist_ok=True)  # safe even if several threads get here at once
    with count_lock:  # take a unique number for this image
        num = count
        count += 1
    print(f'Downloading >>> {num}_{name}.jpg ...')
    with open(f'{path}/{num}_{name}.jpg', 'wb') as f:
        f.write(res.content)
    print(f'Image_{num} downloaded!')
    time.sleep(0.1)

if __name__ == '__main__':
    star_t = time.time()
    url = "https://www.dpm.org.cn/lights/royal/p/{}.html"
    for page in range(1, 11):  # download the images on pages 1-10
        get_img_urls(url.format(page))

    print(f'All Done in {time.time()-star_t:.3f} seconds')

https://s1.ax1x.com/2023/08/13/pPKF7ZQ.png
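
A shared counter like `count` needs care when several threads touch it. Another option is to avoid shared state entirely by giving each image its number before it is handed to the pool. Below is a minimal sketch of that idea, not the code from the post: `save_all`, `save_one`, and the `fetch` parameter are names made up for this illustration, and it uses `concurrent.futures.ThreadPoolExecutor` instead of `multiprocessing.dummy.Pool`. The `fetch` callable is assumed to take a URL and return the image bytes.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def save_all(items, fetch, out_dir, max_workers=4):
    """Save each (title, url) pair as '<index>_<title>.jpg' under out_dir.

    `fetch` is a hypothetical callable: it takes a URL and returns bytes.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    def save_one(index, title, url):
        # The index was fixed before submission, so no lock is needed here.
        target = out / f"{index}_{title}.jpg"
        target.write_bytes(fetch(url))
        return target.name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(save_one, index, title, url)
            for index, (title, url) in enumerate(items, start=1)
        ]
        # result() re-raises any exception raised inside a worker,
        # so failed downloads do not disappear silently.
        return [future.result() for future in futures]
```

With `requests`, `fetch` could be something like `lambda url: requests.get(url, headers=headers).content`. Passing it in as a parameter also makes it possible to try the function with a fake fetcher and no network access.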

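BLUE7777777 posted on 2023-8-14 21:12

surepj posted on 2023-8-14 08:58
Thanks for the reminder. I didn't brute-force the site for data, and it's only a few pages of image data, so it should be fine, right?

A multithreaded Python crawler hits the server with a large number of requests, which can easily be treated as a network attack. On the mainland your internet access is registered under your real name, and so is your phone, so you are easy to trace either way. Of course, fetching a few images is no big deal and usually won't raise any alarm!
If you run dozens or hundreds of threads, though, it will attract attention.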
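
To make the point about thread counts concrete, here is a minimal sketch of keeping a crawler gentle: a small pool plus a minimum gap between request starts. `RateLimiter` and `polite_map` are hypothetical names made up for this illustration, and the defaults `max_workers=2` and `min_interval=0.5` are arbitrary assumptions, not known-safe thresholds for any site.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space out calls to wait() by at least `min_interval` seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock,
        # then sleep outside the lock until that slot arrives.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def polite_map(func, items, max_workers=2, min_interval=0.5):
    """Run func(item) for every item with few threads and paced start times."""
    limiter = RateLimiter(min_interval)

    def task(item):
        limiter.wait()
        return func(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input items.
        return list(pool.map(task, items))
```

Here `func` would be a one-argument wrapper around whatever does the actual download, and `items` the list of image URLs. However many worker threads there are, the server sees at most one new request per `min_interval`.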

eWVhaA posted on 2023-8-13 18:03

A friendly reminder... don't go crawling org.cn and gov.cn sites carelessly.

andycjdx posted on 2023-8-13 20:31

Nice, thanks for sharing. These are all large high-resolution images!

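earlc posted on 2023-8-13 21:16

eWVhaA posted on 2023-8-13 18:03
A friendly reminder... don't go crawling org.cn and gov.cn sites carelessly.

Haha, exactly. It's safer to practice on some personal sites.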

moruye posted on 2023-8-13 21:36

ffrank posted on 2023-8-13 22:13

Thanks, I'll study this.

5584444 posted on 2023-8-13 22:40

eWVhaA posted on 2023-8-13 18:03
A friendly reminder... don't go crawling org.cn and gov.cn sites carelessly.

Is it really that scary?

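surepj posted on 2023-8-14 08:58

eWVhaA posted on 2023-8-13 18:03
A friendly reminder... don't go crawling org.cn and gov.cn sites carelessly.

Thanks for the reminder. I didn't brute-force the site for data, and it's only a few pages of image data, so it should be fine, right?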

nccdap posted on 2023-8-14 09:29

Learning makes you better. Thanks.

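ppxj0 posted on 2023-8-14 09:36

eWVhaA posted on 2023-8-13 18:03
A friendly reminder... don't go crawling org.cn and gov.cn sites carelessly.

If I did crawl it, how could they find out who it was?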


吾爱破解 - 52pojie.cn's Archiver
