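A beginner just learning Python web scraping, sharing my code: Tencent (tx) recruitment data
Last edited by 3651535042 on 2019-8-26 20:42
If you have free rating points to spare, please give this post a rating, thanks. I'm saving up 吾爱币 (forum coins) to exchange for tutorials!
I've only been learning Python scraping for a week, so the code isn't very good; experts can just skip this post.
The script scrapes Tencent's job listings. Libraries used: requests, threading, json (the site's data is loaded via AJAX, so the response has to be parsed as JSON), and pandas (to save the data as CSV).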
import requests
from lxml import etree  # imported in the original post but not used below
import time
from threading import Thread
import json
import pandas as pd

headers = {
    # Request headers
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
}
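
# Fetch one page and return the response body as text
def get_html(url):
    try:
        r = requests.get(url, headers=headers)
        r.encoding = 'utf8'
        print('开始采集')  # "starting collection"
        time.sleep(2)  # pause between requests
        return r.text
    except EnvironmentError as e:
        # On a failed request the exception object itself goes back to the caller
        return e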
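
# Parse the JSON response and append the job fields to the CSV file
def get_xpath(html):
    data = json.loads(html)
    names = []
    locations = []
    bgns = []
    products = []
    cates = []
    times = []
    for i in data['Data']['Posts']:
        # Job title
        name = i['RecruitPostName']
        # Location
        location = i['LocationName']
        # Business group (BG)
        bgn = i['BGName']
        # Product
        product = i['ProductName']
        # Job category
        cate = i['CategoryName']
        # Last update time (renamed so it does not shadow the time module)
        update_time = i['LastUpdateTime']
        names.append(name)
        locations.append(location)
        bgns.append(bgn)
        products.append(product)
        cates.append(cate)
        times.append(update_time)
    tp = pd.DataFrame({
        '职位': names,        # position
        '地址': locations,    # location
        '分级': bgns,         # business group
        '部门': products,     # product / department
        '类型': cates,        # category
        '发布时间': times     # last update time
    })
    # Append to the CSV file without an index column or a header row
    tp.to_csv('腾讯招聘.csv', encoding='utf8', mode='a', index=False, header=False)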

# Scrape the pages with index start_url up to (but not including) end_url
def main(start_url, end_url):
    for i in range(start_url, end_url):
        url = 'https://careers.tencent.com/tencentcareer/api/post/Query?timestamp=1566200594583&countryId=&cityId=&bgIds=&productId=&categoryId=40001001,40001002,40001003,40001004,40001005,40001006,40002001,40002002,40003001,40003002,40003003,40004,40005001,40005002,40006,40007,40008,40009,40010,40011&parentCategoryId=&attrId=&keyword=&pageIndex={}&pageSize=10&language=zh-cn&area=cn'.format(i)
        data = get_html(url=url)
        # get_html hands back an exception object on failure, so only parse real text
        if isinstance(data, str):
            get_xpath(html=data)

if __name__ == '__main__':
    # Multithreading: split the page range across five threads
    thread = []
    t1 = Thread(target=main, args=(0, 100))
    t2 = Thread(target=main, args=(100, 200))
    t3 = Thread(target=main, args=(200, 300))
    t4 = Thread(target=main, args=(300, 400))
    t5 = Thread(target=main, args=(400, 492))
    thread += [t1, t2, t3, t4, t5]
    for i in thread:
        i.start()
    for i in thread:
        i.join()
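
For anyone adapting the threaded part: the script above has five threads appending to the same CSV file. One alternative, sketched here and not what the original post does, is to let the worker threads only collect rows and write the file once at the end. `fake_fetch_rows`, `collect_rows`, `write_csv` and the two column names are all made up for illustration, and the fake function stands in for "download one page and parse it" so the sketch runs without touching the network.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def fake_fetch_rows(page):
    """Hypothetical stand-in for 'download one page and parse it'.

    Returns two made-up rows per page so the sketch runs offline.
    """
    return [[f"job-{page}-{n}", "Shenzhen"] for n in range(2)]


def collect_rows(pages, fetch_rows, workers=5):
    """Run fetch_rows over pages in a thread pool; return all rows in page order."""
    rows = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input pages
        for page_rows in pool.map(fetch_rows, pages):
            rows.extend(page_rows)
    return rows


def write_csv(rows, out):
    """Write the header and all rows once, from a single thread."""
    writer = csv.writer(out)
    writer.writerow(["name", "location"])
    writer.writerows(rows)


rows = collect_rows(range(3), fake_fetch_rows)
buffer = io.StringIO()  # in-memory file; a real run would open a .csv file instead
write_csv(rows, buffer)
print(buffer.getvalue())
```

Writing from one place also makes it easy to emit a header row exactly once, which the append-mode version skips.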
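If there's anything wrong with the code, feel free to comment.

cc78947 posted on 2019-8-25 11:43
except EnvironmentError as e:
    return e
There's no need to return in the exception handler; the rest is written quite well, you're even using ...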
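Then what should I save it as? So far I only know that data can be saved as CSV or JSON.

except EnvironmentError as e:
    return e
There's no need to return in the exception handler; the rest is written quite well. You're even using pd already, so saving to CSV is a bit of a waste.

A thumbs-up for the OP! Keep it up!

Sharing to learn, improving together.

OP, do you have any tutorials to share? School is about to start, so I'll have plenty of time to study.

Nice, you've taken the first step.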

Not bad, very good.

Learned something, thumbs up.

Thanks for sharing, expert.
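
On the reply above about not returning from the exception handler: one possible reading, and only a sketch, is to report the failure and hand back `None` so the caller can skip that page instead of receiving an exception object. `fetch_or_none`, `failing_fetch` and `working_fetch` are hypothetical names; the two fetchers are stand-ins so the example runs offline. Catching `OSError` matches the original code, since `EnvironmentError` is an alias of `OSError` in Python 3 and the exceptions raised by requests derive from it.

```python
def fetch_or_none(fetch, url):
    """Call fetch(url); return its text, or None if the request failed."""
    try:
        return fetch(url)
    except OSError as exc:  # EnvironmentError is an alias of OSError in Python 3
        print(f"request failed for {url}: {exc}")
        return None


def failing_fetch(url):
    """Hypothetical fetcher that always fails, to exercise the error path."""
    raise ConnectionError("simulated network failure")


def working_fetch(url):
    """Hypothetical fetcher that returns a tiny JSON body."""
    return '{"Data": {"Posts": []}}'


for fetch in (working_fetch, failing_fetch):
    text = fetch_or_none(fetch, "https://example.invalid/page")
    if text is None:
        continue  # skip this page instead of passing an exception on to the parser
    print(len(text))
```

With this shape the parsing step only ever sees a string, so a single failed page does not take the whole thread down.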