吾爱破解 - 52pojie.cn


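[Solved] How do I clean up the data I loop over so it looks tidier when saved to a spreadsheet? I'd also like help with crawling multiple pages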

c672569644 posted on 2022-12-5 22:43
Last edited by c672569644 on 2022-12-10 00:13

import requests
import csv
# url = 'https://stock.xueqiu.com/v5/stock/screener/fund/list.json?type=18&parent_type=1&order=desc&order_by=percent&page=1&size=30'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.62',
           'Cookie': 's=ci12owalds;device_id=bf170c1a5ad2e8f024c7daaa8ea85226; '
                     'Hm_lvt_1db88642e346389874251b5a1eded6e3=1664207247,1664350847; xq_a_token=df4b782b118f7f9cabab6989b39a24cb04685f95; xqat=df4b782b118f7f9cabab6989b39a24cb04685f95; xq_r_token=3ae1ada2a33de0f698daa53fb4e1b61edf335952; xq_id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTY3MjE4Njc1MSwiY3RtIjoxNjY5ODkyMDk3MDA1LCJjaWQiOiJkOWQwbjRBWnVwIn0.R7xVHyE3IoYwNn_YuJIIsuApsxdEf0e-cFwXSzaJRaMWLoHVBua77D5y3SBKbd7EUEZx7BbSn_Ip9JgIO2F66vVWxvMEc1hO6IyAIy-Sz3KXHF4FOeJsAevLtXRV2JW2MfQZW2KaPXpNJSFpy7t15ER-1K4jI9wd9kpYPsl8c3du3m4pSp7TKd-fhMwXFYseOIlASUIg-Mp-zdzUDbPIjm6vV9enbnK_30Cg-jnsXFVb3QUnijVYVAVRuX5kLFQbXKpMUnW4KorKVPf0TNZgM7Hx0UshevE0n3tWLsBErV_W3NJ_lc6NYtTQraxSmBUFaVcahBi1xbilHPwjsBkh9w; u=691669892154166',
           'Origin': 'https://xueqiu.com',
           'Referer': 'https://xueqiu.com/hq', }
url = 'https://stock.xueqiu.com/v5/stock/screener/fund/list.json?type=18&parent_type=1&order=desc&order_by=percent&page=1&size=30'
response = requests.get(url=url, headers=headers)
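json_data = response.json()
data1 = json_data['data']['list']
# Open the file once, outside the loop, and let `with` close it.
with open('123.csv', mode='a', encoding='utf-8', newline='') as f:
    csv_writer = csv.writer(f)
    for i in data1:
        print(i)
        csv_writer.writerow(i.values())  # raw values, one record per row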


雨陌 posted on 2022-12-6 00:30
import requests
import openpyxl

MAX_PAGE = 3
IS_SETTING_TITLE = False

# url = 'https://stock.xueqiu.com/v5/stock/screener/fund/list.json?type=18&parent_type=1&order=desc&order_by=percent&page=1&size=30'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.62',
           'Cookie': 's=ci12owalds;device_id=bf170c1a5ad2e8f024c7daaa8ea85226; '
                     'Hm_lvt_1db88642e346389874251b5a1eded6e3=1664207247,1664350847; xq_a_token=df4b782b118f7f9cabab6989b39a24cb04685f95; xqat=df4b782b118f7f9cabab6989b39a24cb04685f95; xq_r_token=3ae1ada2a33de0f698daa53fb4e1b61edf335952; xq_id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTY3MjE4Njc1MSwiY3RtIjoxNjY5ODkyMDk3MDA1LCJjaWQiOiJkOWQwbjRBWnVwIn0.R7xVHyE3IoYwNn_YuJIIsuApsxdEf0e-cFwXSzaJRaMWLoHVBua77D5y3SBKbd7EUEZx7BbSn_Ip9JgIO2F66vVWxvMEc1hO6IyAIy-Sz3KXHF4FOeJsAevLtXRV2JW2MfQZW2KaPXpNJSFpy7t15ER-1K4jI9wd9kpYPsl8c3du3m4pSp7TKd-fhMwXFYseOIlASUIg-Mp-zdzUDbPIjm6vV9enbnK_30Cg-jnsXFVb3QUnijVYVAVRuX5kLFQbXKpMUnW4KorKVPf0TNZgM7Hx0UshevE0n3tWLsBErV_W3NJ_lc6NYtTQraxSmBUFaVcahBi1xbilHPwjsBkh9w; u=691669892154166',
           'Origin': 'https://xueqiu.com',
           'Referer': 'https://xueqiu.com/hq', }
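wb = openpyxl.Workbook()
wk = wb.active  # the default worksheet
for page in range(1, MAX_PAGE + 1):
    # Only the page parameter changes from one page to the next.
    url = f'https://stock.xueqiu.com/v5/stock/screener/fund/list.json?type=18&parent_type=1&order=desc&order_by=percent&page={page}&size=30'
    response = requests.get(url=url, headers=headers)
    json_data = response.json()
    data = json_data['data']['list']
    if not data:
        break  # an empty page means there is nothing left to fetch
    if not IS_SETTING_TITLE:
        # Use the keys of the first record as the header row, once.
        wk.append(list(data[0].keys()))
        IS_SETTING_TITLE = True
    for v in data:
        # openpyxl cannot store a list in a cell, so join lists into one string.
        wk.append([','.join(map(str, i)) if isinstance(i, list) else str(i) for i in v.values()])
wb.save("stock.xlsx")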
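If you would rather stay with the csv module from the original post, the same idea might look roughly like this. It is only a sketch: `flatten`, `write_rows` and the sample records are made-up names and data for illustration, standing in for whatever `json_data['data']['list']` actually contains.

```python
import csv
import io


def flatten(value):
    """Turn one JSON value into something that fits in a single cell."""
    if isinstance(value, list):
        return ','.join(str(item) for item in value)
    if value is None:
        return ''
    return value


def write_rows(records, fileobj):
    """Write a list of dicts as CSV: one header row, then one row per record."""
    if not records:
        return 0
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, extrasaction='ignore')
    writer.writeheader()
    for record in records:
        writer.writerow({key: flatten(record.get(key)) for key in fieldnames})
    return len(records)


# Made-up sample records (assumption), shaped like a list of dicts.
sample = [
    {'symbol': 'AAA', 'percent': 1.5, 'tags': ['x', 'y']},
    {'symbol': 'BBB', 'percent': None, 'tags': []},
]
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open('stock.csv', 'w', encoding='utf-8-sig', newline='')` instead of the `StringIO` buffer. The `utf-8-sig` encoding adds a byte order mark, which usually helps Excel display Chinese text correctly.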


cloud2010 posted on 2022-12-6 06:50

Look at how the URL differs from page to page. Sending a request to each page's URL lets you scrape multiple pages.
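For the URL in this thread, the only part that changes between pages appears to be the `page` query parameter. A small sketch of building the per-page URLs, where `page_url` is a hypothetical helper name and the other parameters are copied from the URL in the original post:

```python
from urllib.parse import urlencode

BASE_URL = 'https://stock.xueqiu.com/v5/stock/screener/fund/list.json'


def page_url(page, size=30):
    """Build the list URL for one page; only `page` differs between pages."""
    params = {
        'type': 18,
        'parent_type': 1,
        'order': 'desc',
        'order_by': 'percent',
        'page': page,
        'size': size,
    }
    return f'{BASE_URL}?{urlencode(params)}'


# Print the URLs for the first three pages.
for page in range(1, 4):
    print(page_url(page))
```

Each of these URLs can then be requested in turn with the same headers.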

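Look at the structure of the response. Parsing it with regular expressions or some other method lets you pull out exactly the data you want.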
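Since this endpoint returns JSON, plain dictionary access is probably enough and no regular expression is needed. A sketch of keeping only the columns you care about. The field names (`symbol`, `name`, `percent`) and the sample payload are assumptions for illustration, so check the real response for the actual keys:

```python
def pick_fields(records, fields):
    """Keep only the chosen keys from each record; missing keys become None."""
    return [[record.get(field) for field in fields] for record in records]


# Hypothetical payload shaped like response.json(); field names are assumptions.
payload = {'data': {'list': [
    {'symbol': 'AAA', 'name': 'Fund A', 'percent': 1.5, 'extra': 'ignored'},
    {'symbol': 'BBB', 'name': 'Fund B'},
]}}

rows = pick_fields(payload['data']['list'], ['symbol', 'name', 'percent'])
print(rows)
```

Each inner list is one spreadsheet row, ready to be appended to a worksheet or written with a CSV writer.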
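sht281 posted on 2022-12-6 08:13
cao777 posted on 2022-12-6 08:47
If you want it to look good in a spreadsheet, the data pretty much has to be in JSON format.
O2H2O posted on 2022-12-6 09:22
Why doesn't the page open when I paste the URL straight into the browser's address bar?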
https://stock.xueqiu.com/v5/stock/screener/fund/list.json?type=18&parent_type=1&order=desc&order_by=percent&page=1&size=30
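OP | c672569644 posted on 2022-12-6 17:51
O2H2O posted on 2022-12-6 09:22
Why doesn't the page open when I paste the URL straight into the browser's address bar?
https://stock.xueqiu.com/v5/stock/screener/fund/lis ...

I don't know about that either. You could start your own thread and ask.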
OP | c672569644 posted on 2022-12-6 17:54
雨陌 posted on 2022-12-6 00:30
import requests
import openpyxl

Thank you, much appreciated!
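雨陌 posted on 2022-12-6 20:39
O2H2O posted on 2022-12-6 09:22
Why doesn't the page open when I paste the URL straight into the browser's address bar?
https://stock.xueqiu.com/v5/stock/screener/fund/lis ...

You need to send the cookie along with the request, because this is an internal API.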
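One way to avoid pasting a cookie by hand is to let the client collect cookies itself by visiting the site before calling the API. The sketch below uses only the standard library rather than requests. Whether a plain visit to the site is enough to get a cookie this API accepts is an assumption. If it is not, copying the Cookie header from the browser, as in the scripts in this thread, is still the fallback. `make_opener` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

HOME_URL = 'https://xueqiu.com/hq'  # the Referer used in this thread
USER_AGENT = 'Mozilla/5.0'


def make_opener():
    """Return an opener that remembers cookies across requests, plus its jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-Agent', USER_AGENT), ('Referer', HOME_URL)]
    return opener, jar


def fetch_json(opener, api_url, timeout=10):
    """Visit the site first so it can set cookies, then call the API."""
    opener.open(HOME_URL, timeout=timeout).close()
    with opener.open(api_url, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


opener, jar = make_opener()
print(len(jar))  # the jar stays empty until a request has been made
```

Calling `fetch_json(opener, url)` with the list URL from this thread would then send whatever cookies the first visit stored.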
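O2H2O posted on 2022-12-7 09:48
雨陌 posted on 2022-12-6 20:39
You need to send the cookie along with the request, because this is an internal API.

Got it, thanks. I still have a lot to learn~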