Web scraping newbie asking for help
I've been teaching myself web scraping for a while and wanted to practise on something, so I tried scraping the bus routes of my own city. After finishing the script it keeps throwing an error and I can't figure out where the problem is. Could the experts here please take a look? Much appreciated~~~ Code below:
import requests
from lxml import etree

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) \
Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3863.400'
}

items = []

def parse_car_num(c):
    tree = etree.HTML(c)
    car_name = tree.xpath('//div[@class="info"]/h1/text()')
    run_time = tree.xpath('//ul[@class="bus-desc"]/li/text()')
    ticket_info = tree.xpath('//ul[@class="bus-desc"]/li/text()')
    up_car_route = tree.xpath('//div[@class="bus-lzlist mb15"]/ol/li/a/text()')
    down_car_route = tree.xpath('//div[@class="bus-lzlist mb15"]/ol/li/a/text()')
    item = {
        '公交路线': car_name,
        '运行时间': run_time,
        '票价信息': ticket_info,
        '去程信息': up_car_route,
        '回程信息': down_car_route
    }
    items.append(item)

def parse_num(b):
    r = requests.get(url=b, headers=headers)
    parse_car_num(r.text)

def parse_car_list(a):
    tree = etree.HTML(a)
    car_list = tree.xpath('//div[@class="bus-layer depth w120"]/div/div/a/@href')
    for href_list in car_list:
        href_list1 = 'https://xiaogan.8684.cn' + href_list
        parse_num(href_list1)

def parse_page():
    url = 'https://xiaogan.8684.cn/'
    r = requests.get(url, headers=headers)
    parse_car_list(r.text)

def main():
    parse_page()
    fp = open('孝感公交.txt' 'w+', encoding='utf8')
    for item in items:
        fp.write(str(item))
    fp.close()

if __name__ == '__main__':
    main()
The error:
Traceback (most recent call last):
  File "C:/Users/Administrator/PycharmProjects/untitled5/cxm_test/python/unintest/test_a.py", line 104, in <module>
    main()
  File "C:/Users/Administrator/PycharmProjects/untitled5/cxm_test/python/unintest/test_a.py", line 97, in main
    fp = open('孝感公交.txt' 'w+', encoding='utf8')
FileNotFoundError: No such file or directory: '孝感公交.txtw+'
I tried it and it does run; it's just that the returned list is empty. The cause is that the line fp = open('孝感公交.txt' 'w+', encoding='utf8') is missing a comma: fp = open('孝感公交.txt' <a comma is missing here> 'w+', encoding='utf8')
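For reference, the corrected line looks like this:

    fp = open('孝感公交.txt', 'w+', encoding='utf8')

Without that comma, Python joins the two adjacent string literals into one, so open() receives '孝感公交.txtw+' as the filename with no mode argument, falls back to read mode, and fails with exactly the FileNotFoundError shown in the traceback.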
yc19951005 posted on 2020-6-10 17:21
I tried it and it does run; it's just that the returned list is empty. The cause is that the line fp = open('孝感公交.txt' 'w+', encoding='utf8' ...
Thanks, so it was just a missing comma. The returned list is indeed empty though; clearly I still have a lot to learn and need to keep at it.
chenduizhang posted on 2020-6-10 23:44
Thanks, so it was just a missing comma. The returned list is indeed empty though; clearly I still have a lot to learn ...
It might be that the xpath paths are wrong, or something else.
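In case it helps narrow that down, here is a minimal debugging sketch (not part of the original script, and assuming the site is still reachable and that the class names copied from the post still match the live HTML). It re-runs the same xpath expressions and prints how many nodes each one matches, so you can see which selector comes back empty:

    import requests
    from lxml import etree

    headers = {'user-agent': 'Mozilla/5.0'}

    # Step 1: how many links does the home-page xpath actually find?
    r = requests.get('https://xiaogan.8684.cn/', headers=headers)
    tree = etree.HTML(r.text)
    hrefs = tree.xpath('//div[@class="bus-layer depth w120"]/div/div/a/@href')
    print('links found on the home page:', len(hrefs))

    # Step 2: follow the first link and test the detail-page xpaths on it
    if hrefs:
        detail = requests.get('https://xiaogan.8684.cn' + hrefs[0], headers=headers)
        dtree = etree.HTML(detail.text)
        print('car_name matches:', len(dtree.xpath('//div[@class="info"]/h1/text()')))
        print('bus-desc matches:', len(dtree.xpath('//ul[@class="bus-desc"]/li/text()')))
        print('route matches:', len(dtree.xpath('//div[@class="bus-lzlist mb15"]/ol/li/a/text()')))

If the first count is zero, the home-page selector is the problem; if it is non-zero but the later counts are zero, the links found there probably point to category or list pages rather than individual line pages, so an extra level of link-following (or different xpaths) would be needed.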