吾爱破解 - 52pojie.cn

[Python, repost] Scraping videos from a Kuaishou profile page with Python

TZ糖纸 posted on 2021-4-8 14:37
import json
import re
import os
import requests
import urllib.request
from multiprocessing import Pool

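requestUrl = 'https://video.kuaishou.com/graphql'
# Raw string so the backslash in the Windows path is not read as an escape.
folder_path = r'D:\kuaishou'
# Profile id, i.e. the last segment of https://video.kuaishou.com/profile/<userId>
userId=''
# Cookie header value for video.kuaishou.com
cookie = ''
# Leave empty: an empty cursor requests the first page.
pcursor = ''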

def post(userId,Cookie,pcursor):
    data = {"operationName":"visionProfilePhotoList","variables":{"userId":userId,"pcursor":pcursor,"page":"profile"},"query":"query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      type\n      author {\n        id\n        name\n        following\n        headerUrl\n        headerUrls {\n          cdn\n          url\n          __typename\n        }\n        __typename\n      }\n      tags {\n        type\n        name\n        __typename\n      }\n      photo {\n        id\n        duration\n        caption\n        likeCount\n        realLikeCount\n        coverUrl\n        coverUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrl\n        liked\n        timestamp\n        expTag\n        animatedCoverUrl\n        stereoType\n        videoRatio\n        __typename\n      }\n      canAddComment\n      currentPcursor\n      llsid\n      status\n      __typename\n    }\n    hostName\n    pcursor\n    __typename\n  }\n}\n"}
    failed = {'msg': 'failed...'}
    headers = {
        'Host':'video.kuaishou.com',
        'Connection':'keep-alive',
        # Content-Length is left out: requests computes it from the request body.
        'accept':'*/*',
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.68',
        'content-type':'application/json',
        'Origin':'https://video.kuaishou.com',
        'Sec-Fetch-Site':'same-origin',
        'Sec-Fetch-Mode':'cors',
        'Sec-Fetch-Dest':'empty',
        'Referer':'https://video.kuaishou.com/profile/' + userId,
        'Accept-Language':'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
        'Cookie':Cookie,

    }
    r = requests.post(requestUrl, data=json.dumps(data), headers=headers)  
    r.encoding = 'UTF-8'
    html = r.text
    return html
def down(feeds):
    for feed in feeds:
        #print(feed['photo']['caption'])
        #print(feed['photo']['photoUrl'])
        author = feed['author']['name']
        # Drop "~" and the characters Windows forbids in file names.
        filename = re.sub(r'[~\\/:*?"<>|\r\n]', '', feed['photo']['caption'])
        author_dir = folder_path + '/' + author + '/'
        # Create the per-author folder if needed (safe when several workers race).
        os.makedirs(author_dir, exist_ok=True)
        filepath = author_dir + filename + '.mp4'
        if not os.path.exists(filepath):
            urllib.request.urlretrieve(feed['photo']['photoUrl'], filename=filepath)
            print(filepath + ", download complete")
        else:
            print(filepath + ", already exists, skipping")

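if __name__ == "__main__":
    p = Pool(10)
    while True:
        result = post(userId,cookie,pcursor)
        data = json.loads(result)
        # Fields can come back as null (for example when the cookie or userId
        # is rejected), so fall back to empty values instead of indexing None.
        profile = (data.get('data') or {}).get('visionProfilePhotoList') or {}
        pcursor = profile.get('pcursor')
        feeds = profile.get('feeds') or []
        for feed in feeds:
            print(feed['photo']['caption'])
        p.apply_async(down, args=(feeds,))
        #down(feeds)
        # Stop when no next cursor or no feeds came back.
        if not pcursor or not feeds:
            break

    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')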

弗里德曼 posted on 2021-4-8 16:51
D:\kuaishou\KS.PY:66: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if pcursor is '':
D:\kuaishou\KS.PY:66: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if pcursor is '':
Traceback (most recent call last):
  File "D:\kuaishou\KS.PY", line 60, in <module>
    pcursor = data['data']['visionProfilePhotoList']['pcursor']
TypeError: 'NoneType' object is not subscriptable
D:\kuaishou\KS.PY:66: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if pcursor is '':
D:\kuaishou\KS.PY:66: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if pcursor is '':

What is going wrong here?
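The two messages in this log can be reproduced without any network access. The sketch below is a minimal one and rests on an assumption: the post does not show what the server returned, so the body with a null `data` field is hypothetical. It shows that the TypeError comes from indexing into `None`, and that the warning goes away once the cursor is compared by value with `==`.

```python
import json

# Hypothetical body: the post does not show the server's actual response.
payload = json.loads('{"data": null}')

error = None
try:
    pcursor = payload['data']['visionProfilePhotoList']['pcursor']
except TypeError as exc:
    # Same failure as in the traceback: payload['data'] is None.
    error = str(exc)
    pcursor = ''

# "==" compares string values; "is" compares object identity, which is
# why Python emits the SyntaxWarning for `pcursor is ''`.
finished = (pcursor == '')
print(error)
print(finished)
```

If the real response looks like this, the script most likely needs a valid `userId` and `cookie` before the lookup on line 60 can succeed.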
lmyperson posted on 2021-4-8 17:24
{'data': {'visionProfilePhotoList': {'result': 2, 'llsid': None, 'webPageArea': None, 'feeds': [], 'hostName': None, 'pcursor': None, '__typename': 'VisionProfilePhotoList'}}}
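The output above is what these two lines print after I fill in the cookie:
data = json.loads(result)
print(data)
OP, did I misconfigure something, or is it something else?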
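A response like this one parses fine but carries no videos, so it helps to report that explicitly instead of looping on. The sketch below uses the dictionary from this post; `check_page` is a hypothetical helper, and the thread does not say what `result: 2` means, so the code only checks whether any feeds came back.

```python
# Response copied from the post above.
response = {'data': {'visionProfilePhotoList': {
    'result': 2, 'llsid': None, 'webPageArea': None, 'feeds': [],
    'hostName': None, 'pcursor': None,
    '__typename': 'VisionProfilePhotoList'}}}

def check_page(data):
    """Return a short status string for one parsed response (hypothetical helper)."""
    profile = (data.get('data') or {}).get('visionProfilePhotoList')
    if not profile:
        return 'no visionProfilePhotoList in response'
    if not profile.get('feeds'):
        # The meaning of the result code is not documented in the thread.
        return 'no feeds returned (result=%s); check userId and cookie' % profile.get('result')
    return '%d feeds, next pcursor=%r' % (len(profile['feeds']), profile.get('pcursor'))

status = check_page(response)
print(status)
```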
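rmouse posted on 2021-4-8 14:47
OP | TZ糖纸 posted on 2021-4-8 14:50

Of course!
gentlespider posted on 2021-4-8 15:01
Now that Kuaishou has a web version, scraping it is so much easier. Learned something, thanks.
OP | TZ糖纸 posted on 2021-4-8 15:06
gentlespider posted on 2021-4-8 15:01
Now that Kuaishou has a web version, scraping it is so much easier. Learned something, thanks.

It is wide open, with no protection at all.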
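弗里德曼 posted on 2021-4-8 15:08
pcursor = ''  What am I supposed to fill in here?
gentlespider posted on 2021-4-8 15:38
弗里德曼 posted on 2021-4-8 15:08
pcursor = ''  What am I supposed to fill in here?

This is the pagination cursor. When it is empty, the first page is returned. The value for the next page comes from the current page's response, and the OP's script already reads it:
pcursor = data['data']['visionProfilePhotoList']['pcursor']
OP | TZ糖纸 posted on 2021-4-8 15:39
弗里德曼 posted on 2021-4-8 15:08
pcursor = ''  What am I supposed to fill in here?

You do not need to fill this in; empty means the default (first) page. It is assigned automatically for the next request, because each response carries the cursor for the following page.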
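The cursor hand-off described in these two replies can be sketched without the network. `fake_fetch`, `iter_feeds`, and the three canned pages are hypothetical stand-ins for the real POST request; only the way `pcursor` is passed from one response to the next request mirrors the script.

```python
# Hypothetical canned pages, keyed by the cursor that requests them.
PAGES = {
    '': {'feeds': ['video-1', 'video-2'], 'pcursor': 'c1'},
    'c1': {'feeds': ['video-3'], 'pcursor': 'c2'},
    'c2': {'feeds': ['video-4'], 'pcursor': ''},
}

def fake_fetch(pcursor):
    """Stand-in for post(userId, cookie, pcursor): returns one page."""
    return PAGES[pcursor]

def iter_feeds(fetch):
    """Yield every feed, following pcursor until no next cursor is returned."""
    pcursor = ''  # an empty cursor requests the first page
    while True:
        page = fetch(pcursor)
        yield from page['feeds']
        pcursor = page['pcursor']  # cursor for the next request
        if not pcursor:
            break

all_feeds = list(iter_feeds(fake_fetch))
print(all_feeds)
```

Keeping the loop in a generator separates paging from downloading, so the same loop could feed a process pool or a plain `for` loop.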
kiopc posted on 2021-4-8 15:54
This cannot scrape comments, can it?
huiker231 posted on 2021-4-8 16:01
Can you scrape short videos from other platforms too?