吾爱破解 - 52pojie.cn

[Python / Repost] Baidu Wenku crawler for batch downloading

hksnow posted on 2019-8-15 13:12
This post was last edited by hksnow on 2019-8-15 13:13

Foreword
[Screenshot: 2019-08-15_130226.png]
I needed to check my work against an exercise book's answer key; I found the answers on Baidu and wanted to download them.
Code
[Python]
import requests
import json
import os
#from concurrent import futures

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}
# Example of the API endpoint being called:
# https://wk.baidu.com/bigque/jsonp/naapi/answer/getwapflowanswerview?na_uncheck=1&pn=1&rn=5&answer_id=fa3ab7c30c22590102029d3f&sign=meow

def download_file(file_url, download_path):
    # Fetch one answer image and save it as <file_num>.jpg in download_path.
    global file_num
    html = requests.get(file_url, headers=headers)
    file_name = download_path + '\\' + str(file_num) + '.jpg'
    with open(file_name, 'wb') as code:
        code.write(html.content)
    file_num = file_num + 1

def start(link):
    global thread
    # First request: get the basic info (total page count and title).
    answer_id = link.split('/')[-1]
    url = 'https://wk.baidu.com/bigque/jsonp/naapi/answer/getwapflowanswerview?na_uncheck=1&pn=1&rn=5&answer_id=' + answer_id + '&sign=meow'
    html = requests.get(url, headers=headers)
    json_data = json.loads(html.text)
    img_totals = json_data['data']['answer_info']['pages']
    title = json_data['data']['answer_info']['title']
    page_nums = int(img_totals) // 5        # the API returns 5 image URLs per request (rn=5)
    last_page_img_totals = int(img_totals) % 5
    # Following requests: fetch the image URLs page by page and download them.
    path = os.getcwd() + '\\' + title
    os.makedirs(path)
    for n in range(0, page_nums + 1):
        url = 'https://wk.baidu.com/bigque/jsonp/naapi/answer/getwapflowanswerview?na_uncheck=1&pn=' + str(n) + '&rn=5&answer_id=' + answer_id + '&sign=meow'
        html = requests.get(url, headers=headers)
        json_data = json.loads(html.text)
        answer_urls_list = json_data['data']['answer_urls']
        #print(answer_urls_list)
        if n == page_nums:
            num_lists = range(0, last_page_img_totals)
            print('Last page!')
        else:
            num_lists = range(0, 5)
        for x in num_lists:
            img_url = answer_urls_list[x]
            #print(img_url)
            #thread.submit()
            download_file(img_url, path)

if __name__ == "__main__":
    #thread = futures.ThreadPoolExecutor(max_workers = 5)  # see the threaded sketch below
    file_num = 1
    # Placeholder link (same shape as the supported links noted below);
    # replace it with the wk.baidu.com answer link you actually want.
    url = 'https://wk.baidu.com/bigque/book/xxxxxxxxxx22590102029d3f'
    start(url)
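
The commented-out concurrent.futures lines suggest a threaded version was planned. Below is a rough sketch of my own (not part of the original script) of how the downloads could be handed to a small thread pool: download_one and download_all are hypothetical helpers, the headers dict is the same one used above, and the idea is to first collect every image URL in start() and then call download_all once at the end.

from concurrent import futures
import os
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}

def download_one(img_url, download_path, index):
    # Each task carries its own index, so no shared counter is needed
    # and file names cannot collide between threads.
    resp = requests.get(img_url, headers=headers)
    with open(os.path.join(download_path, str(index) + '.jpg'), 'wb') as f:
        f.write(resp.content)

def download_all(all_img_urls, path):
    # Download every collected image URL with a pool of 5 workers
    # instead of fetching them one by one; the with-block waits for
    # all submitted tasks to finish before returning.
    with futures.ThreadPoolExecutor(max_workers=5) as pool:
        for i, img_url in enumerate(all_img_urls, start=1):
            pool.submit(download_one, img_url, path, i)

Passing the index into each task also sidesteps the global file_num counter, which would not be safe to update from several threads at once.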



Only links like https://wk.baidu.com/bigque/book/xxxxxxxxxx22590102029d3f are supported.
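
For reference, start() simply takes the last path segment of the link as answer_id, which is why only links of that shape work. With the placeholder id above (not a real document), a call would look like this:

link = 'https://wk.baidu.com/bigque/book/xxxxxxxxxx22590102029d3f'
print(link.split('/')[-1])   # -> xxxxxxxxxx22590102029d3f, sent to the API as answer_id
start(link)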


[Screenshot: 2019-08-15_131153.png]


KobeBryantmentu posted on 2019-8-15 13:24
I'll give it a try; thumbs up for posting the source code.
www1678 posted on 2019-8-15 13:25
薄荷叶1996 posted on 2019-8-15 13:34
cdwdz posted on 2019-8-15 13:38
Thanks for sharing, thank you.
淮左名都 posted on 2019-8-15 13:48
Marked, I'll study it...
virgo915 posted on 2019-8-15 13:53
Good stuff, you have my support.
fudashuai posted on 2019-8-15 13:57
Thanks for sharing this with everyone!
wty1641 posted on 2019-8-15 14:10
mark..........
s98 posted on 2019-8-15 14:12
Impressive work!