Python: scraping the Kugou Top 500 with BS4 and saving it to a specified directory
Help, experts: I want to save this data to a txt file. I've spent half a day on it and found plenty of example code, but none of it does what I need.
The desired result is to change the keys in the data dict (rank, singer, song, time) to Chinese and then save the output as a txt file in a specified directory. Below are two code listings (I couldn't get the files to upload, so I pasted the code directly): the first is the original, the second is the modified version.

Original version:
import requests
from bs4 import BeautifulSoup
import time
import xlwt  # imported but unused in this script

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                  'like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.44'
}

def get_info(url):
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    ranks = soup.select('span.pc_temp_num')
    titles = soup.select('div.pc_temp_songlist > ul > li > a')
    times = soup.select('span.pc_temp_tips_r > span')
    for rank, title, time in zip(ranks, titles, times):
        data = {
            'rank': rank.get_text().strip(),
            'singer': title.get_text().split('-')[0].strip(),  # title text is "singer - song"
            'song': title.get_text().split('-')[1].strip(),
            'time': time.get_text().strip()
        }
        print(data)

if __name__ == '__main__':
    urls = ['http://www.kugou.com/yy/rank/home/{} - 8888.html'.format(str(i)) for i in range(1, 2)]
    for url in urls:
        get_info(url)
        time.sleep(1)
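One detail worth noting in the parsing: str.split() returns a list, so you must index into it before calling .strip(); calling .strip() directly on the split result raises an AttributeError. A quick illustration with a made-up title string (the song name is hypothetical, just to show the format):

# str.split() returns a list, so index before stripping
title_text = '周杰伦 - 晴天'      # hypothetical title text from the song list
parts = title_text.split('-')
print(parts)             # ['周杰伦 ', ' 晴天']
print(parts[0].strip())  # '周杰伦'  -> singer
print(parts[1].strip())  # '晴天'   -> song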
Modified version (dictionary keys changed to Chinese):

import requests
from bs4 import BeautifulSoup
import time
import xlwt  # imported but unused in this script

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                  'like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.44'
}

def get_info(url):
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    ranks = soup.select('span.pc_temp_num')
    titles = soup.select('div.pc_temp_songlist > ul > li > a')
    times = soup.select('span.pc_temp_tips_r > span')
    for rank, title, time in zip(ranks, titles, times):
        data = {
            '排名': rank.get_text().strip(),
            '歌手': title.get_text().split('-')[0].strip(),  # "singer - song", so the singer is index 0
            '歌曲': title.get_text().split('-')[1].strip(),
            '时间': time.get_text().strip()
        }
        print(data)

if __name__ == '__main__':
    urls = ['http://www.kugou.com/yy/rank/home/{} - 8888.html'.format(str(i)) for i in range(1, 2)]
    for url in urls:
        get_info(url)
        time.sleep(1)
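With the Chinese keys, each printed record then looks roughly like this (the values here are hypothetical, not actual chart data):

{'排名': '1', '歌手': '周杰伦', '歌曲': '晴天', '时间': '4:29'}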
1. Change the function signature to: def get_info(url, file):
2. Before print(data), add:
       with open(file, 'a', encoding='utf-8') as f:
           f.write(str(data) + '\n')
3. Below urls, add:
       # save path
       file = 'your-desktop-directory-path/kugou.txt'
4. Change the call to: get_info(url, file)
5. Also, your link has spaces between the {} and 8888; check that. (A consolidated sketch with all five changes applied follows below.)
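Putting the five changes together, here is a minimal sketch of the saving version. Two assumptions on top of the list above: the output path is a placeholder you must replace with your own directory, and the loop variable is renamed to song_time so it no longer shadows the time module.

import requests
from bs4 import BeautifulSoup
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                  'like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.44'
}

def get_info(url, file):
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    ranks = soup.select('span.pc_temp_num')
    titles = soup.select('div.pc_temp_songlist > ul > li > a')
    times = soup.select('span.pc_temp_tips_r > span')
    for rank, title, song_time in zip(ranks, titles, times):  # renamed to avoid shadowing the time module
        data = {
            '排名': rank.get_text().strip(),
            '歌手': title.get_text().split('-')[0].strip(),
            '歌曲': title.get_text().split('-')[1].strip(),
            '时间': song_time.get_text().strip()
        }
        # append each record as one line of the txt file (item 2)
        with open(file, 'a', encoding='utf-8') as f:
            f.write(str(data) + '\n')
        print(data)

if __name__ == '__main__':
    # spaces around the dash removed, as pointed out in item 5;
    # range(1, 2) fetches only the first page, same as the original post
    urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html'.format(i) for i in range(1, 2)]
    # save path (placeholder: replace with your own directory)
    file = 'your-desktop-directory-path/kugou.txt'
    for url in urls:
        get_info(url, file)
        time.sleep(1)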
Is anyone still using bs4????

Quoting adx123456 (2023-3-23 04:06): "Is anyone still using bs4????"
Has bs4 fallen out of fashion?

Quoting 撒旦の恶 (2023-3-23 02:47): "1. Change the function signature to: def get_info(url, file): 2. Before print(data), add: with open(fil ..."
Thanks for the answer!

Quoting adx123456 (2023-3-23 04:06): "Is anyone still using bs4????"
Aww, it's for a school course; I'm not a professional~

Quoting lu1108 (2023-3-23 08:53): "Aww, it's for a school course; I'm not a professional~"
School, huh. No wonder!!!! bs4 hasn't had an update in years.