Bilibili Anime Series (a newcomer's initiation post)

zac7 posted on 2019-8-14 15:04

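I only registered on 52pojie today, although I have been reading the site every day.
For various reasons I did not create an account until now. As a newcomer, I can't wait to discuss and exchange ideas with everyone.
I learned Python in 2018 but did not use it at all in 2019, so I have forgotten most of it and want to pick it back up. Hopefully we can all improve together. For my first post I chose a Bilibili scraper; it is not particularly difficult.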
"""
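1. Identify the actual request that returns the anime list data. Build the request with the requests package and get the JSON response. (20+10)
2. Scrape the first page and print two fields to the console: anime title and thumbnail URL. (20)
3. For the first ten pages, add two fields: view count and post time. Sort by post time in descending order and save as csv or xlsx. (20+20)
Hint: drop optional parameters that are useless and hard to construct; the request still works without them.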
request_url=https://api.bilibili.com/x/web-interface/newlist?callback=jqueryCallback_bili_42885308963105384&rid=33&type=0&pn=1&ps=20&jsonp=jsonp&_=1546829716288
"""

The comment above lists the requirements I was given for this exercise, along with the points allotted to each part.
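The hint about dropping optional parameters can be sketched as follows. This is a minimal illustration, not part of the original exercise: it takes the request_url quoted in the requirements and removes `callback`, `jsonp` and `_`, on the assumption that these are only the JSONP wrapper and a cache-busting timestamp. The helper name `trim_params` and the `DROPPED` set are made up for this example. It runs offline and sends no request.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# The full URL captured from the browser, as quoted in the requirements.
REQUEST_URL = (
    "https://api.bilibili.com/x/web-interface/newlist"
    "?callback=jqueryCallback_bili_42885308963105384"
    "&rid=33&type=0&pn=1&ps=20&jsonp=jsonp&_=1546829716288"
)

# Assumption: these parameters only serve the JSONP wrapper and cache busting.
DROPPED = {"callback", "jsonp", "_"}


def trim_params(url, dropped=DROPPED):
    """Split a URL into (base_url, params) and discard the dropped parameters."""
    parts = urlsplit(url)
    params = {k: v for k, v in parse_qsl(parts.query) if k not in dropped}
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, params


base_url, params = trim_params(REQUEST_URL)
print(base_url)
print(params)
# The trimmed URL that would actually be requested:
print(base_url + "?" + urlencode(params))
```

What remains is `rid`, `type`, `pn` and `ps`, the same four parameters the scraper passes through `params=`; paging then only means changing `pn`.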
# -*- coding: utf-8 -*-
# @Time : 2019/1/7 10:53
# @Author: Zachary

import requests
import json
import time
import csv
# from lxml import etree
# from urllib import request


class Bilispider(object):
    def __init__(self):
        self.url = 'https://api.bilibili.com/x/web-interface/newlist?'
        # Spoofed request headers
        self.header = {
            'Referer': 'https://www.bilibili.com/v/anime/serial/?spm_id_from=333.334.b_7072696d6172795f6d656e75.8',
            'Host': 'api.bilibili.com',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
        }
        # Create the CSV file and write the header row once
        self.save_header()

    def get_html(self):
        # Request the first ten pages
        for i in range(1, 11):
            print(f'Fetching page {i}')
            # Only the required query parameters; pn is the page number
            params = {
                'rid': 33,
                'type': 0,
                'pn': i,
                'ps': 20
            }
            # Throttle the crawl: wait 5 seconds before each request
            time.sleep(5)
            # Fetch the response body as text
            resp = requests.get(url=self.url, params=params, headers=self.header).text
            self.parse_html(resp)

    def parse_html(self, resp):
        # Parse the JSON response
        html_dict = json.loads(resp)
        # print(html_dict)
        # The video entries are under data -> archives
        data = html_dict['data']['archives']
        rows = []
        # Iterate over the entries of this page
        for content in data:
            title = content['title']  # title
            img = content['pic']  # thumbnail URL
            view = content['stat']['view']  # view count
            ctime = content['ctime']  # post time (Unix timestamp)
            otherStyleTime = ''
            if ctime:
                # Convert the post time to a formatted date string
                timeArray = time.localtime(ctime)
                otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
            rows.append([title, img, view, otherStyleTime])
            print(f'Title: {title}\nThumbnail: {img}\nViews: {view}\nPosted: {otherStyleTime}')
        # Append this page's rows to the CSV file
        self.save_csv(rows)

    def save_csv(self, rows):
        # Append to the local file; 'a' mode keeps the header and earlier pages
        with open('bili番剧.csv', mode='a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow(row)

    def save_header(self):
        # 'w' mode starts a fresh file on every run
        header = ['Title', 'Thumbnail URL', 'Views', 'Posted at']
        with open('bili番剧.csv', mode='w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow(header)

    def run(self):
        self.get_html()


if __name__ == '__main__':
    s = Bilispider()
    s.run()
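Requirement 3 also asks for the rows to be sorted by post time in descending order before saving. One possible way to do that, sketched under assumptions: collect all entries first, sort on the raw `ctime` Unix timestamp, and only then format and write. The sample entries, the function names `to_sorted_rows` and `write_csv`, and the column names are invented for illustration; the dictionaries only mimic the keys that `parse_html` reads (`title`, `pic`, `stat.view`, `ctime`). An in-memory buffer stands in for the CSV file.

```python
import csv
import io
import time

# Hypothetical entries shaped like items of data['archives']; values are made up.
archives = [
    {"title": "A", "pic": "http://example.com/a.jpg", "stat": {"view": 10}, "ctime": 1546800000},
    {"title": "B", "pic": "http://example.com/b.jpg", "stat": {"view": 30}, "ctime": 1546829716},
    {"title": "C", "pic": "http://example.com/c.jpg", "stat": {"view": 20}, "ctime": 1546810000},
]


def to_sorted_rows(items):
    """Sort by the raw ctime timestamp, newest first, then format each row."""
    ordered = sorted(items, key=lambda item: item["ctime"], reverse=True)
    rows = []
    for item in ordered:
        posted = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(item["ctime"]))
        rows.append([item["title"], item["pic"], item["stat"]["view"], posted])
    return rows


def write_csv(rows, f):
    """Write a header row followed by the data rows to an open text file."""
    writer = csv.writer(f)
    writer.writerow(["title", "pic", "view", "ctime"])
    writer.writerows(rows)


rows = to_sorted_rows(archives)
buffer = io.StringIO()  # stands in for open('...csv', 'w', newline='')
write_csv(rows, buffer)
print(buffer.getvalue())
```

Sorting on the integer timestamp rather than the formatted string avoids depending on the date format, and writing once at the end means the file is never left half-sorted.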

胡椒粉 posted on 2019-8-15 09:51

I can't follow it, haha

绝版coco posted on 2019-8-14 18:47

Learned something!

zac7 posted on 2019-8-14 22:07

绝版coco posted on 2019-8-14 18:47
Learned something!

Let's learn from each other.

beifangnongfu posted on 2019-8-14 23:29

I don't understand it yet, but I'm here to learn.

zac7 posted on 2019-8-15 10:46

胡椒粉 posted on 2019-8-15 09:51
I can't follow it, haha

It's commented throughout, mate. Go brush up on the basics a bit more.

吾爱破解 - 52pojie.cn's Archiver