A beginner scrapes the fund codes of all hybrid funds

a5228172 posted on 2020-10-23 20:15

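If this breaks the rules, please contact me and I will delete it - -
Only the fund codes are scraped; none of the other fields are. The Excel output is not handled well either: I don't know how to deal with the single column.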
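import random
import re

import pandas as pd
import requests

# Source page: http://fund.eastmoney.com/HH_jzzzl.html
def main():
    aa1 = []  # fund codes collected from every page
    datalist = {}
    for i in range(1, 23):
        baseurl = "http://fund.eastmoney.com/Data/Fund_JJJZ_Data.aspx?t=1&lx=3&letter=&gsid=&text=&sort=zdf,desc&page=" + str(i) + ",200&dt=1603365267841&atfc=&onlySale=0"
        html = getdata(baseurl)
        # html = duquwenjian()  # replay the cached response instead of requesting
        if html is None:
            # The request failed or returned a non-200 status; skip this page
            print(f'Page {i} failed')
            continue
        aa1.extend(jiexidata(html))
        print(f'Page {i}')
    datalist['基金代码'] = aa1  # column header, "fund code"
    # pandas 2.0+ can no longer write legacy .xls files, so write .xlsx (needs openpyxl)
    save(datalist, "123.xlsx")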
def getdata(baseurl):
    # Pool of browser User-Agent strings; one is picked at random per request
    user_agent_list = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)',
        'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36',
    ]

    headers = {
        "User-Agent": random.choice(user_agent_list)
    }

    try:
        response = requests.get(baseurl, headers=headers, timeout=1)
        if response.status_code == 200:
            # Cache the latest response so duquwenjian() can replay it offline
            with open("123.txt", "w+", encoding="utf-8") as f:
                f.write(response.text)
            return response.text
    except requests.RequestException:
        # Covers timeouts, connection errors and other requests failures
        return None
    return None  # non-200 status
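def jiexidata(html):
    # Capture everything between "datas:[" and "],count" in the response
    m = re.search(r"datas:\[(.*)\],count", html, re.S)
    if m is None:
        return []
    # Each record is one bracketed group of quoted, comma-separated fields
    rows = re.findall(r"\[(.*?)\]", m.group(1), re.S)
    a1 = []  # fund codes
    # The same record also carries the fund name, today's unit NAV and
    # cumulative NAV, and yesterday's unit NAV and cumulative NAV;
    # none of these are extracted yet.
    for row in rows:
        b = [x.strip('"') for x in row.split(",")]
        a1.append(b[0])  # the first field is the fund code
    return a1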
def duquwenjian():
    # Read back the response cached by getdata(); the with-block closes the file
    with open("123.txt", "r", encoding="utf-8") as f:
        a = f.read()
    return a
def save(a, savepath):
    students = pd.DataFrame(a)
    # print(list(a.keys()))
    # index=False drops pandas' row-number column, leaving only the data column
    students.to_excel(savepath, index=False)

if __name__ == '__main__':
    main()
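
The script above keeps only the first field of each record. As a rough sketch of how the other fields could be kept as well, the snippet below parses a hand-written sample payload rather than a live response. It assumes that each record inside `datas` is a bracketed list of double-quoted strings (which is what the regex and substitutions in the post suggest) and that fields 0 and 1 are the fund code and fund name. The sample text, `parse_rows`, `rows_to_csv` and the column names are all made up for illustration and are not taken from the site.

```python
import csv
import io
import json
import re

# Hypothetical, hand-written payload that imitates the shape the post's regex expects.
SAMPLE = (
    'var db={chars:["a","b"],datas:[["000001","Sample Fund A","1.2340","3.4560"],'
    '["000002","Sample Fund B","0.9870","1.1110"]],count:["2","0","0","0"]}'
)


def parse_rows(text):
    """Return each record of the `datas` array as a list of strings."""
    match = re.search(r"datas:\[(.*)\],count", text, re.S)
    if match is None:
        return []
    # Re-wrap the captured records in brackets so they form one JSON array.
    return json.loads("[" + match.group(1) + "]")


def rows_to_csv(rows):
    """Write code and name (assumed to be fields 0 and 1) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["fund_code", "fund_name"])
    for row in rows:
        writer.writerow(row[:2])
    return buffer.getvalue()


rows = parse_rows(SAMPLE)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Parsing the records as JSON means a comma inside a quoted field does not split that field, and the codes stay strings, so leading zeros such as `000001` survive. If the real records turn out not to be valid JSON, the regex splitting used in the post would be the fallback.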

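Pengpo posted on 2021-1-23 21:06

Sun_Dream posted on 2020-12-8 12:19
1-month, 3-month, 6-month and 1-year returns, fund manager, fund company, inception date, fund size, redemption status: how do you scrape these, OP ...

I'd like to know too. Has the poster above figured it out?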

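Sun_Dream posted on 2020-12-8 12:19

How do you scrape information such as the 1-month, 3-month, 6-month and 1-year returns, fund manager, fund company, inception date, fund size and redemption status? OP, I need the data from Tiantian Fund (天天基金).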

woyaoshangshiqi posted on 2020-10-23 21:00

Very nice, bookmarked.

Ranger233 posted on 2020-10-23 23:02

Another day of turning off the lights and drinking cold water...

huazang110 posted on 2020-10-23 23:05

A whole field of green, straight back to square one. Another month played for nothing.

xushifu posted on 2020-10-23 23:13

Taking big losses every day; the retail "leeks" get harvested wave after wave.

任逍遥 posted on 2020-10-24 17:47

Just passing by, holding Nuoan (诺安) together with Yinhe (银河).

wanshiz posted on 2020-10-25 07:54

Thanks, OP, this is really nice.

elvisluciker999 posted on 2020-10-26 10:54

You've already done a great job. I'm still learning too.


吾爱破解 - 52pojie.cn's Archiver
