Scraping live viewer counts from Douyu, Huya, and Penguin Esports in 70 lines of code
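This post was last edited by 2079898548 on 2020-8-24 21:52
I recently learned a bit about multithreading but didn't know what to scrape, so in the end I wrote a scraper for live-stream viewer counts.
To start with, I just used two threads.
It scrapes the live viewer counts for LoL and Honor of Kings (王者荣耀).
Each of the two games comes out higher at some times and lower at others.
I'm not trying to start a flame war.
Scraped just now, in the evening
Scraped at noon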
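# -*- coding: utf-8 -*-
import threading

import requests
from lxml import etree

# Category pages for League of Legends on each platform
lolURL = {
    "huya": "https://www.huya.com/g/lol",
    "qier": "https://egame.qq.com/livelist?layoutid=lol",
    "douyu": "https://www.douyu.com/g_LOL",
}
# Category pages for Honor of Kings on each platform
wzURL = {
    "huya": "https://www.huya.com/g/2336",
    "qier": "https://egame.qq.com/livelist?layoutid=1104466820",
    "douyu": "https://www.douyu.com/g_wzry",
}
headers = {
    'accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
    # 'br' is left out: requests only decodes Brotli when a brotli package is installed
    'accept-encoding': 'gzip, deflate',
    'accept-language': 'zh-CN,zh;q=0.9',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
}

# Running totals per game, in units of 万 (10,000)
lol_people = 0
wz_people = 0


def game_clear(people, game):
    """Add one platform's total to the running total for the given game."""
    global lol_people, wz_people
    if game == 'lol':
        lol_people += people
    elif game == 'wz':
        wz_people += people


def clean(people_list, game):
    """Strip the '万' unit and thousands separators, then sum the figures."""
    people = 0.0
    for r in people_list:
        # Every figure is treated as a number of 万; a figure shown
        # without the '万' suffix would therefore be over-counted.
        r = float(r.replace('万', '').replace(',', '').strip())
        people += r
    game_clear(people, game)


def game_people(URL, game):
    """Fetch each platform's category page and add up the figures for `game`."""
    for name, url in URL.items():
        req = requests.get(url=url, headers=headers, timeout=10).text
        req_xp = etree.HTML(req)
        if name == 'huya':
            people_list = req_xp.xpath('//*[@class="js-num"]/text()')
            clean(people_list, game)
        elif name == 'qier':
            people_list = req_xp.xpath('//*[@class="popular"]/text()')
            clean(people_list, game)
        elif name == 'douyu':
            people_list = req_xp.xpath('//*[@class="DyListCover-hot is-template"]/text()')
            clean(people_list, game)
        else:
            print('Program error: unknown platform %s' % name)


if __name__ == '__main__':
    # Pass the function and its arguments separately. Thread(game_people(...))
    # would run the call in the main thread and hand its return value (None)
    # to Thread, so nothing would actually run concurrently.
    t1 = threading.Thread(target=game_people, args=(lolURL, "lol"))
    t2 = threading.Thread(target=game_people, args=(wzURL, "wz"))
    t1.start()
    t2.start()
    # Wait for both threads to finish before reading the totals
    t1.join()
    t2.join()
    print('League of Legends: %.2f x 10,000 viewers' % lol_people)
    print('Honor of Kings: %.2f x 10,000 viewers' % wz_people)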
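The clean() step in the code above strips '万' and sums every figure as if it were a number of 万 (10,000s). Below is a minimal sketch of a unit-aware alternative, under the assumption (not verified against the live pages) that a platform may show smaller rooms as plain numbers without the '万' suffix. The names parse_count and total_in_wan are hypothetical and are not part of the original script.

```python
def parse_count(text):
    """Convert a displayed figure such as '1.5万' or '9,876' to a plain number."""
    cleaned = text.replace(',', '').strip()
    if not cleaned:
        # Empty or whitespace-only text nodes contribute nothing
        return 0.0
    if cleaned.endswith('万'):
        # '万' means 10,000, so scale the number in front of it
        return float(cleaned[:-1]) * 10000
    return float(cleaned)


def total_in_wan(texts):
    """Sum a list of displayed figures and return the total in units of 万."""
    return sum(parse_count(t) for t in texts) / 10000


if __name__ == '__main__':
    print('%.2f' % total_in_wan(['1.5万', '9,876', ' ']))
```

With this approach a room showing 9,876 adds about 0.99 万 to the total instead of 9,876 万.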
Could the experts here please take a look at this multithreading and tell me whether there is anything wrong with it?
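One possible variation on the threading question, sketched under assumptions: rather than accumulating into global variables, each worker can return its total and the main thread can collect the results through concurrent.futures.ThreadPoolExecutor from the standard library. The names run_concurrently and fake_total are hypothetical; fake_total is only a stand-in for a scraping function that returns a total instead of updating a global, and it makes no network requests.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(func, jobs):
    """Run func(*args) for each named job in a worker thread; return {name: result}."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {name: pool.submit(func, *args) for name, args in jobs.items()}
        # result() blocks until the job is done and re-raises any exception it hit
        return {name: future.result() for name, future in futures.items()}


def fake_total(urls, game):
    # Stand-in for a real scraper: pretend each URL contributed 1.0 to the total
    return float(len(urls))


if __name__ == '__main__':
    totals = run_concurrently(fake_total, {
        'lol': ({'huya': 'url-a', 'douyu': 'url-b'}, 'lol'),
        'wz': ({'huya': 'url-c'}, 'wz'),
    })
    print(totals)
```

Because the results come back as return values, there is no shared state to protect, and an exception in one worker surfaces in the main thread instead of being lost.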
Getting to 200 points is going to take a long time.
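chaoiqun posted on 2020-8-24 22:44
lxml module failed to import,
Installing lxml online sometimes fails. In that case, search (for example on Baidu) for the corresponding lxml whl file, download it to your computer, then open a command window and install it with
pip install xxxx.whl
where xxxx.whl is the filename of the lxml whl file you downloaded.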
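Before or after installing, it can help to confirm which of the script's third-party modules Python can actually find. The following is a minimal sketch using importlib.util.find_spec from the standard library; the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(['requests', 'lxml'])
    if missing:
        print('Not installed:', ', '.join(missing))
    else:
        print('requests and lxml are both importable')
```

Run it with the same Python interpreter that runs the scraper, since a package installed for one interpreter is not visible to another.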
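2079898548 posted on 2020-8-30 12:02
Personally, I rarely use F12 to find things. I just GET the site and write the response into an HTML file, then find it there. Then I search for '万', compare, and then in the code I can ...
Searching by text seems hard to work with, though. This time you could search for digits or '万', but what if you need to locate a button? Searching wouldn't be convenient then, would it? Or do you have another method?
Just now I saved two copies of the HTML, one from the class attribute I located in Elements and one from the class attribute found on the Sources page using your method, and the two turned out to be different. That has this beginner confused: are there different versions of the source?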
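On the difference between the two copies: the Elements panel shows the page's DOM after scripts have run, whereas requests.get returns only the HTML the server sent, so class names and content can differ between them. A scraper built on requests has to match the raw HTML. The search-the-raw-response approach described in the quoted post can be sketched roughly as follows; find_contexts is a hypothetical helper name and the sample string is made up for illustration, not copied from any of the sites.

```python
def find_contexts(html, keyword, width=40):
    """Return the text surrounding each occurrence of keyword in html."""
    if not keyword:
        return []
    contexts = []
    start = html.find(keyword)
    while start != -1:
        lo = max(0, start - width)
        hi = start + len(keyword) + width
        # Keep some text on both sides so the enclosing tag and class are visible
        contexts.append(html[lo:hi])
        start = html.find(keyword, start + len(keyword))
    return contexts


if __name__ == '__main__':
    # Made-up sample; in practice this would be the .text of a requests response
    sample = '<li><i class="js-num">1.5万</i></li><li><i class="js-num">4.5万</i></li>'
    for snippet in find_contexts(sample, '万', width=25):
        print(snippet)
```

The printed snippets show which tag and class attribute wrap each figure in the raw HTML, which is what the XPath expressions need to target.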
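Replying actively to earn points, hahaha
It seems Douyu doesn't show the viewer count; what it shows is a "heat" (热度) value.
I don't know about the others, but the number on Douyu is heat, not a viewer count.
import requests
import threading
from lxml import etree
Why do these show errors? Could an expert please advise?
Support. I feel the viewer numbers on these streaming platforms are still inflated.
lxml module failed to import, https://pic.downk.cc/item/5f43c6bf160a154a673f0d6a.jpg
I can't understand any of it.
Thanks to the OP for sharing.
Hirsch posted on 2020-8-24 22:21
import requests
import threading
from lxml import etree
Why do these show errors? Could an expert please advise?
Because you haven't installed the required libraries.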