Scraping the Forbes Billionaires Ranking
Last edited by illuminate123 on 2022-9-21 18:51
There are about 2,500 records in total. The scraped results look like this:
![Scraped results](assets/image-20220921185042795.png)
```
import csv

import parsel
import requests

# Create the CSV file and write the header row first.
with open('福布斯富豪榜.csv', 'w', encoding='utf_8_sig', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['rank', 'english_name', 'chinese_name', 'wealth_value', 'wealth_source', 'country', 'age'])

# Mind the capitalization of the keys in headers.
headers={
'Cookie':'acw_tc=0bc1598f16626875705425258e66473deef35a301a9a96feee4c846eb71426; Hm_lvt_aa8b760f41278f94669da4685a1ce4fa=1662687577; XSRF-TOKEN=eyJpdiI6InRKaUhwTzkrXC81VXJlVVFQT3UzNkZ3PT0iLCJ2YWx1ZSI6IlMydkkxUkt1M2tiQ1FIZ2lTVTRZeUwyS09TNjJqNUJxejVxbjZ4SmcrUnlSZTFWaTFETGlFRERSSXFrbTIwVjIiLCJtYWMiOiIzYWY1ZjczNWVjN2Q4NjE4NjdlYjIyYzk5MzJlYTM0MDQ2YWZhMzM0OGEyMzQ4NjdkMjM1YmExNzg5MTcyZGU5In0=; laravel_session=eyJpdiI6Im1seHhRZGtcL0gxZWdcL1RaVzJaRmJaQT09IiwidmFsdWUiOiJBN2tDM1JMODkxOERvMGZjd1RWUE5kam5Cd2puanBtQWNVWVRTNGZkVzlLWHZMcUdHeHNYMlppQUVpNWlQejdhIiwibWFjIjoiMDkxNDE3MzY4MzI4M2Q0ZWIyZDZjMzI2ZDRhZTJkMDQyMTE1NThkODQwOTczZmRlZGUzNmJmOTBlYWU3MjNmYyJ9; Hm_lpvt_aa8b760f41278f94669da4685a1ce4fa=1662687601',
'Referer':'https://cn.bing.com/',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.27'
}
# verify=False skips certificate checks, so silence the resulting warning.
requests.packages.urllib3.disable_warnings()
url = 'https://www.forbeschina.com/lists/1781'
response = requests.get(url=url, headers=headers, verify=False)
# print(response.text)  # uncomment to inspect the raw HTML
selector = parsel.Selector(response.text)

# One XPath per column. The positions td[1]..td[7] are assumed to follow
# the order of the CSV header; check them against the page's table.
rank = selector.xpath('//*[@id="data-view"]/tbody/tr/td[1]/text()').getall()
english_name = selector.xpath('//*[@id="data-view"]/tbody/tr/td[2]/text()').getall()
chinese_name = selector.xpath('//*[@id="data-view"]/tbody/tr/td[3]/text()').getall()
wealth_value = selector.xpath('//*[@id="data-view"]/tbody/tr/td[4]/text()').getall()
wealth_source = selector.xpath('//*[@id="data-view"]/tbody/tr/td[5]/text()').getall()
country = selector.xpath('//*[@id="data-view"]/tbody/tr/td[6]/text()').getall()
age = selector.xpath('//*[@id="data-view"]/tbody/tr/td[7]/text()').getall()

# Append the data rows; open the file once rather than once per row.
with open('福布斯富豪榜.csv', 'a', encoding='utf_8_sig', newline='') as f:
    csv_writer = csv.writer(f)
    for i in range(len(country)):
        csv_writer.writerow([rank[i], english_name[i], chinese_name[i], wealth_value[i], wealth_source[i], country[i], age[i]])
```
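The script builds one list per column and then writes the rows by index. A minimal sketch of that write step as a reusable helper is below. The function name `write_rows` and its parameters are made up for illustration and are not part of the original script. It uses `zip()` so that a column that comes back shorter than the others cannot raise an `IndexError`.

```python
import csv
from typing import Sequence


def write_rows(path: str, header: Sequence[str], columns: Sequence[Sequence[str]]) -> int:
    """Write a header row, then one row per index across the column lists.

    Returns the number of data rows written. zip() stops at the shortest
    column, so unequal column lengths truncate instead of crashing.
    """
    rows = list(zip(*columns))
    # utf_8_sig writes a BOM so Excel detects UTF-8;
    # newline='' prevents blank lines between rows on Windows.
    with open(path, 'w', encoding='utf_8_sig', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

With the lists from the script above, the call would be something like `write_rows('福布斯富豪榜.csv', [...header names...], [rank, english_name, chinese_name, wealth_value, wealth_source, country, age])`. If the returned count is lower than `len(rank)`, at least one column had missing cells and the rows may be misaligned.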
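Khaoss posted on 2022-9-27 16:04
Newbie question:
Where does the value of 'Cookie' in headers come from?
The 'Cookie' value is the session data the site stored in your browser. Press F12 to open the developer tools, select the page request in the Network tab, and copy the Cookie request header from there.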
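For anyone who would rather not paste the whole string into `headers`, here is a rough sketch of splitting a copied Cookie header into a dict. The helper name `parse_cookie_header` and the cookie values are placeholders invented for this example; use the string from your own browser.

```python
from typing import Dict


def parse_cookie_header(raw: str) -> Dict[str, str]:
    """Split a 'name=value; name2=value2' Cookie header into a dict."""
    cookies: Dict[str, str] = {}
    for part in raw.split(';'):
        # partition() splits on the first '=' only, so '=' padding
        # inside a value (common in base64 tokens) is preserved.
        name, sep, value = part.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder values, not real tokens.
raw_cookie = 'XSRF-TOKEN=abc123==; laravel_session=xyz789'
cookies = parse_cookie_header(raw_cookie)
```

The resulting dict can be passed to `requests.get(url, headers=headers, cookies=cookies)` in place of the 'Cookie' entry in `headers`. Session cookies expire, so expect to copy a fresh value from time to time.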
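What's the point of scraping this, kidnapping billionaires??
What can you actually do with the scraped data? Can you see the ranking in real time? Still, the scraping process is worth using as a reference.
The first image is broken.
What do I use to run this?
It should be called the Forbes billionaire hunting list.
The truly top-tier rich have all gone invisible.
Did you see me on it? If not, I'll work a bit harder.
Is this really worth scraping?
Can you make something fun instead? This is useless, haha.
So far out of reach, I can only watch from a distance.
Why am I not on the list?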