[Original] Scraping Douyin gif stickers

沧浪之水濯我心 posted on 2022-2-11 23:08

Last edited by 沧浪之水濯我心 on 2022-2-12 00:12

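One day I came across a sticker video. To download its stickers, you search Douyin for “渣渣表情包”, enter a certain id, and get the stickers.
I tapped in and found that downloading a sticker means watching an ad video first. Am I supposed to put up with that?
So I went straight to capturing the traffic with Thor.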

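I'll skip the process and go straight to the conclusions:
1. In this feature (something like a mini program), every sticker author has a number, and searching by that number lets you browse all of that author's gifs.
2. The gif addresses can be read directly from the captured traffic.
PS: the feature also lets you find stickers by searching image titles.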
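
To illustrate conclusion 2, here is a minimal sketch of turning the `img` field of a captured response into a full gif address. The base URL and the "https" check come from the script in this post; the helper name `to_gif_url` and the sample relative path are made up for illustration.

```python
# Image host that the script in this post prepends to relative paths.
BASE_URL = "https://zhage1.yayashijue.com/"


def to_gif_url(img: str) -> str:
    """Return a full gif URL for an `img` value (hypothetical helper)."""
    # Some entries already carry a full https URL; use those unchanged.
    if "https" in img:
        return img
    # Otherwise treat the value as a path relative to the image host.
    return BASE_URL + img


# The path below is a made-up example, not a real file.
print(to_gif_url("images/2/2022/02/example.gif"))
```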

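Approach for the code:
1. I simply loop over the author numbers (0-999); I have not worked out the full range of author numbers this feature uses.
2. Fetch every sticker from page 0 through page i, until the pages run out.
3. (I also wrote a function for searching by sticker title; it is commented out in the code, and you can enable and call it yourself.)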
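
Steps 1 and 2 amount to a nested loop: for each author number, request pages until one comes back empty. Below is a minimal sketch of that loop. `fetch_page` is a hypothetical stand-in that fakes the server with an in-memory dict, so the sketch runs without network access; it is not the real request.

```python
from typing import Callable, Dict, List

# Fake data standing in for the server: author number -> list of pages.
FAKE_PAGES: Dict[int, List[List[str]]] = {
    0: [["a.gif", "b.gif"], ["c.gif"]],
    1: [["d.gif"]],
}


def fetch_page(upid: int, page: int) -> List[str]:
    """Hypothetical fetcher: items on one page, or [] past the last page."""
    pages = FAKE_PAGES.get(upid, [])
    return pages[page] if page < len(pages) else []


def collect_all(upids: range,
                fetch: Callable[[int, int], List[str]]) -> Dict[int, List[str]]:
    """Walk every author number, reading pages until an empty one appears."""
    result: Dict[int, List[str]] = {}
    for upid in upids:
        items: List[str] = []
        page = 0
        while True:
            batch = fetch(upid, page)
            if not batch:  # an empty page means this author is exhausted
                break
            items.extend(batch)
            page += 1
        if items:
            result[upid] = items
    return result


print(collect_all(range(0, 3), fetch_page))
```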

The code is below:
Download link for big spenders (change the file extension to .py yourself):

import urllib.request
import requests
import json
import os
from requests.packages.urllib3.exceptions import InsecureRequestWarning
import socket
# Set the default socket timeout to 30 s
socket.setdefaulttimeout(30)
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

# Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 15_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 aweme_19.4.0 JsSdk/2.0 NetType/4G Channel/App Store ByteLocale/zh Region/CN AppTheme/light BytedanceWebview/d8a21c6 Aweme/19.4.0 Mobile ToutiaoMicroApp/2.40.0.1',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'https://tmaservice.developer.toutiao.com/?appid=ttca16bcab552fe68201&version=2.0.0',
    'Connection': 'keep-alive',
    'Host': 'zaza.guojiangdong.com.cn',
}

# Search by title
# def SearchByTitle():
#     keyword = input("Enter the sticker title:\n")
#     data = {
#         'do': 'sousuo',
#         'keyword': keyword,
#         'page': '1',
#         'i': '2',
#         'c': 'entry',
#         'a': 'toutiaoapp',
#         'v': '1.0.0',
#         'm': 'mm_qu'
#     }
#     search_result = requests.post('https://zaza.guojiangdong.com.cn/app/index.php', headers=headers, data=data)
#     js = json.loads(search_result.text)

# Search by id. I loop over every author; you can change this to fetch one specific author's stickers.
def SearchById():
    # upid = input("Enter the sticker author's number:\n")
    # range(0, 1000) covers author numbers 0-999
    for upid in range(0, 1000):
        i = 0
        while i != -1:
            data = {
                'do': 'upimage',
                'upid': upid,
                'tid': '0',
                'page': i,
                'i': '2',
                'c': 'entry',
                'a': 'toutiaoapp',
                'v': '1.0.0',
                'm': 'mm_qu'
            }
            search_result = requests.post('https://zaza.guojiangdong.com.cn/app/index.php', headers=headers, data=data)
            js = json.loads(search_result.text)

            # In my tests the first page returns a different JSON structure from the other pages
            try:
                listimg = js['listimg']
            except (TypeError, KeyError):
                listimg = js

            try:
                i += 1
                GetGifById(upid, listimg)
            except Exception:
                # An error here means we have reached the last page
                i = -1

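# Extract the gif addresses
def GetGifById(upid, js):
    # Read the first entry so that an empty page raises an error;
    # without this, an empty result would not raise and the caller would never stop
    code = js[0]['img']
    for item in js:
        img = item['img']
        name = item['name']
        if "https" in img:
            url = img
        else:
            url = "https://zhage1.yayashijue.com/" + img

        GifDownload(upid, name, url)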

# Get the desktop folder
def get_desk_p():
    return os.path.join(os.path.expanduser('~'), "Desktop")

# Create a "表情包" (stickers) folder on the desktop, with one subfolder per author number, and download into it
def GifDownload(upid, name, url):
    desktop = os.path.join(get_desk_p(), "表情包")
    folder_name = str(upid)
    filepath = os.path.join(desktop, folder_name)
    if not os.path.isdir(filepath):
        os.makedirs(filepath)
    img_name = name + ".gif"
    filename = os.path.join(filepath, img_name)

    try:
        urllib.request.urlretrieve(url, filename)
    except socket.timeout:
        # Retry up to two more times after a timeout
        count = 1
        while count <= 2:
            try:
                urllib.request.urlretrieve(url, filename)
                break
            except socket.timeout:
                count += 1
        if count > 2:
            print("downloading this gif failed!")

if __name__ == "__main__":
    SearchById()
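
The commented-out SearchByTitle function in the listing stops right after parsing the response. As a rough sketch, its form data could be built by a small helper like the one below. The field values are copied from that commented-out code; the helper name `build_title_search_data` and its `page` parameter are assumptions for illustration, and since the post does not show what the response looks like, handling it is left out.

```python
def build_title_search_data(keyword: str, page: int = 1) -> dict:
    """Build the POST form data for a title search (hypothetical helper)."""
    return {
        'do': 'sousuo',      # "sousuo" is pinyin for "search"
        'keyword': keyword,
        'page': str(page),   # the commented-out code sends the page as a string
        'i': '2',
        'c': 'entry',
        'a': 'toutiaoapp',
        'v': '1.0.0',
        'm': 'mm_qu',
    }


data = build_title_search_data("cat")
print(data['do'], data['keyword'], data['page'])
```

The resulting dict would be passed as `data=` to the same `requests.post` call that SearchById uses.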

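As mentioned above, stickers can also be fetched by searching for their titles. I wrote a Shortcut for this: it creates an album named “表情包” in your photo library and downloads stickers matching a title into that album. (iOS users only)
Shortcut link: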

Finally, a few of the scraped images:
https://zhage1.yayashijue.com/images/2/2022/02/m3TT3kpwC408PKMHS1428tiZI38e81.gif
https://zhage1.yayashijue.com/images/2/2022/02/xBYzmy6gGg1q00000hNQMy10n6bGR0.gif
https://zhage1.yayashijue.com/images/2/2022/02/tzBej9PZ8jEkD6kdtP4PPPj228Dd86.gif
https://zhage1.yayashijue.com/images/2/2022/02/ebag88fsz4909ksI48D8KAI9EZcD00.gif
https://zhage1.yayashijue.com/images/2/2022/02/DsgeK9E27EGzfkEFeFG9Hefex79X7s.gif

沧浪之水濯我心 posted on 2022-2-12 00:09

After more testing, I found that urllib.request.urlretrieve occasionally waits far too long.
Fix:
import socket
# Set the timeout to 30 s
socket.setdefaulttimeout(30)

Then replace urllib.request.urlretrieve(url, filename) with the following code:

try:
    urllib.request.urlretrieve(url, filename)
except socket.timeout:
    count = 1
    while count <= 2:
        try:
            urllib.request.urlretrieve(url, filename)
            break
        except socket.timeout:
            count += 1
    if count > 2:
        print("downloading this gif failed!")
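
The retry logic above could also be factored into a small helper so that the download call is written only once. A minimal sketch: `retry_on_timeout` is a hypothetical name, and the demo uses a fake download function that times out twice before succeeding, so it runs without network access.

```python
import socket
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_timeout(func: Callable[[], T], attempts: int = 3) -> T:
    """Call func, retrying on socket.timeout, for at most `attempts` calls."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except socket.timeout:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do


# Demo: a fake download that times out twice, then succeeds.
calls = {"n": 0}


def fake_download() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise socket.timeout("simulated timeout")
    return "ok"


result = retry_on_timeout(fake_download)
print(result, calls["n"])
```

In the script this would be called as `retry_on_timeout(lambda: urllib.request.urlretrieve(url, filename))`, which gives the same total of three attempts as the code above.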

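Libra_c posted on 2022-2-11 23:15

I don't understand it, but I'm thoroughly impressed. Awesome!

诅咒者之魂 posted on 2022-2-12 01:06

Thanks for sharing

pzm102 posted on 2022-2-12 03:38

Thanks for sharing

richardzxq posted on 2022-2-12 05:37

Thanks for sharing!

龍謹 posted on 2022-2-12 07:16

Thanks to the OP for sharing the approach. Actually, I'd like to scrape the ones on WeChat.

CCQc posted on 2022-2-12 07:38

Thanks for sharing

李佑辰 posted on 2022-2-12 09:24

Nice, nice, I learned something

xia4166 posted on 2022-2-12 09:30

Thanks for sharing!!! I'll download it and give it a try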

吾爱破解 - 52pojie.cn's Archiver
