[Python Original] Simple Baidu Image Scraping with Grayscale Conversion (Part 2)

科迈罗 posted on 2024-6-25 20:40
Last edited by 科迈罗 on 2024-6-25 20:42

I stumbled across a post I wrote a few years ago, Simple Baidu Image Scraping with Grayscale Conversion. I ran it again and, sure enough, it no longer worked, so on a whim I decided to rework it.

As before, I used F12 (the browser DevTools) to watch the traffic. The request that actually returns the results is the html-type file under the XHR filter, and paging is still driven by the pn parameter; the only difference is that the request now carries a few more parameters than before, as shown in the screenshots below.


(Attached screenshots of the captured request: 批注 2024-06-24 213958.jpg, 批注 2024-06-24 214033.jpg)
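
If you want to sanity-check the endpoint before running the full script, a minimal probe like the one below should do. This is only a sketch and assumes the acjson interface will accept a trimmed-down parameter set; if it refuses, copy the full parameter dict from the complete code further down.

[Python]
import requests

# Minimal probe of the acjson endpoint: fetch a single result page (pn=30) and
# count how many "https://img..." URLs appear in the raw response body.
# The parameter values are assumptions taken from one captured session.
url = 'https://image.baidu.com/search/acjson'
headers = {
    'Host': 'image.baidu.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
}
params = {
    'tn': 'resultjson_com', 'ipn': 'rj', 'fp': 'result',
    'word': 'cat', 'queryWord': 'cat',
    'ie': 'utf-8', 'oe': 'utf-8', 'nc': '1',
    'pn': '30', 'rn': '30',
}
res = requests.get(url, headers=headers, params=params, timeout=15)
res.raise_for_status()
text = res.content.decode('utf-8')
links = [s for s in text.split('"') if 'https://img' in s]
print(f'HTTP {res.status_code}, candidate image URLs on this page: {len(links)}')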

With those parameters added and a few details cleaned up, the full code is as follows:

[Python]
import requests
import os
import random
from PIL import Image
import numpy
import time

# pool of User-Agent strings; one is picked at random for each session
user_agent = ['Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
              'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
              'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)',
              'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,'
              ' like Gecko) Chrome/126.0.0.0 Safari/537.36',
              'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML,'
              ' like Gecko) Chrome/17.0.963.56 Safari/535.11',
              'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)',
              'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)',
              'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0;'
              ' .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)',
              'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)',
              'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML,'
              ' like Gecko) Version/5.1 Safari/534.50']


class GetBaiduImg(object):

    def __init__(self, result, n):
        """
        :param result: 输入百度图片搜索的关键词
        :param n: 输入要爬取的页数
        """
        self.keyword = result
        self.num = n
        self.headers = {'Host': 'image.baidu.com', 'user-agent': random.choice(user_agent)}   # pick a random User-Agent
        self.urls = list()

    def getPages(self):
        try:
            param = {'tn': 'resultjson_com', 'logid': '11034116407827324644', 'ipn': 'rj',
                     'ct': '201326592', 'is': '', 'fp': 'result', 'fr': '', 'word': self.keyword, 'queryWord': self.keyword,
                     'cl': '2', 'lm': '-1', 'ie': 'utf-8', 'oe': 'utf-8', 'adpicid': '', 'st': '',
                     'z': '', 'ic': '', 'hd': '', 'latest': '', 'copyright': '',
                     's': '', 'se': '', 'tab': '', 'width': '', 'height': '',
                     'face': '', 'istype': '', 'qc': '', 'nc': '1', 'expermode': '', 'nojc': '',
                     'isAsync': '', 'pn': '0', 'rn': '30', 'gsm': 'e0000005a', '1719236329137': ''}
            start_url = 'https://image.baidu.com/search/acjson'
            for i in range(30, self.num * 30 + 30, 30):
                param['pn'] = i
                res = requests.get(start_url, headers=self.headers, params=param, timeout=15)   # timeout so a stalled request cannot hang the crawl
                res.raise_for_status()
                res.encoding = res.apparent_encoding
                response = res.content.decode('utf-8')
                for item in response.split('"'):   # the text in front of the URLs changed, so split on quotes and keep anything that looks like an image link
                    if "https://img" in item:
                        self.urls.append(item)
            return set(self.urls)
        except requests.RequestException as e:
            print('mistake info==>', str(e))
            return set()

    def downloadJpg(self, datalist, direct='./baidu'):
        if not os.path.exists(direct):
            os.mkdir(direct)

        for i, data in enumerate(datalist, start=1):    # reworked the loop and the skip logic
            filename = f'{direct}/{self.keyword}_{i}.jpg'
            print(f'Downloading img {filename}')
            try:
                resp = requests.get(data, headers=self.headers, timeout=15)
                resp.raise_for_status()                 # don't save error pages as .jpg files
                with open(filename, 'wb') as f:         # context manager so the file handle is closed properly
                    f.write(resp.content)
                time.sleep(3)   # added a delay between downloads
            except Exception as exp:
                print("mistake info==>", str(exp))

    def convertColor(self, direct):
        # Uses the grayscale gradients to build a relief-style image lit by a
        # virtual light source, rather than saving a plain grayscale copy.
        for i in os.listdir(direct):
            im = numpy.array(Image.open(f'{direct}/{i}').convert('L')).astype('float')   # grayscale pixels as a float array
            print(f"converting img [{i}] with ", im.shape, im.dtype)
            depth = 10                                  # strength of the relief effect (0-100)
            grad = numpy.gradient(im)                   # brightness gradients along rows and columns
            grad_x, grad_y = grad
            grad_x = grad_x * depth / 100
            grad_y = grad_y * depth / 100
            A = numpy.sqrt(grad_x ** 2 + grad_y ** 2 + 1)
            uni_x = grad_x / A                          # components of the unit surface normal
            uni_y = grad_y / A
            uni_z = 1 / A
            vec_el = numpy.pi / 2.2                     # elevation angle of the light source
            vec_az = numpy.pi / 4                       # azimuth angle of the light source
            dx = numpy.cos(vec_el) * numpy.cos(vec_az)
            dy = numpy.cos(vec_el) * numpy.sin(vec_az)
            dz = numpy.sin(vec_el)
            b = 255 * (dx * uni_x + dy * uni_y + dz * uni_z)   # brightness = surface normal dotted with the light direction
            a = b.clip(0, 255)
            im = Image.fromarray(a.astype('uint8'))
            im.save(f'{direct}/[灰度照]{i}')


if __name__ == "__main__":
    result = input('input a search keyword:')
    n = int(input('input a num of pages:'))   # int() instead of eval() for safer input handling
    baiduimg = GetBaiduImg(result=result, n=n)
    datalist = baiduimg.getPages()
    baiduimg.downloadJpg(datalist=datalist, direct='./baidu')
    baiduimg.convertColor('./baidu')
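
One possible refinement, offered only as a sketch: the acjson endpoint returns JSON, so instead of splitting the raw text on double quotes you could parse it with res.json() and pull the image URLs out field by field. The field names used here (data, thumbURL) are assumptions based on what my own session returned, so confirm them in DevTools before swapping this helper in for the splitting loop in getPages().

[Python]
import requests


def extract_image_urls(start_url, headers, param):
    """Fetch one result page and return image URLs parsed from the JSON payload.

    Assumed field names: the payload holds a 'data' list whose items carry a
    'thumbURL' key - verify both in DevTools before relying on this helper.
    """
    res = requests.get(start_url, headers=headers, params=param, timeout=15)
    res.raise_for_status()
    try:
        items = res.json().get('data', [])
    except ValueError:      # payload was not valid JSON, fall back to an empty list
        items = []
    return [item['thumbURL'] for item in items
            if isinstance(item, dict) and item.get('thumbURL')]

downloadJpg and convertColor would not need to change, since the helper still hands back a plain list of URLs.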


Everyone is welcome to discuss and improve this.


sihua3000 replied on 2024-6-25 21:49
Baidu doesn't lock things down too hard, so you can still pull some data out; the Meituan side is brutal, with all kinds of rate limits and account bans.