How to format a web log file in Python 3
How can I read a W3C-format web log file in Python 3 and format it?
zdnyp posted on 2019-10-15 16:52: Please post some code and sample data.
1. Log file excerpt
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2017-03-31 07:25:24
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
2017-03-31 07:25:24 124.205.208.142 GET /otype.asp classid=4%20and%201=2 90 - 124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 200 0 0 64
2017-03-31 07:25:24 124.205.208.142 GET /include/css.css - 90 - 124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 304 0 0 1
2017-03-31 07:25:24 124.205.208.142 GET /include/data.js - 90 - 124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 304 0 0 1
2017-03-31 07:25:24 124.205.208.142 GET /include/date.js - 90 - 124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 304 0 0 1
2017-03-31 07:25:24 124.205.208.142 GET /images/1.gif - 90 - 124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 304 0 0 0
2017-03-31 07:25:24 124.205.208.142 GET /images/gaobei_top2.gif - 90 - 124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 304 0 0 2
2017-03-31 07:25:24 124.205.208.142 GET /images/menu-bg.gif - 90 - 124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 304 0 0 2
2. Requirement: after formatting, the output should look like this
Date: 2017-03-31 Time: 07:25:24 Visitor IP: 124.205.208.142 ... and so on
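One way to get this kind of output is to let the log describe itself: the #Fields line names every column, so each record can be turned into a name-to-value mapping. The following is only a minimal sketch under that assumption. parse_w3c_log, format_record and the LABELS table are hypothetical names introduced for illustration, and the question gives display labels only for date, time and visitor IP.

```python
# Hypothetical display labels; only these three appear in the requested output.
LABELS = {"date": "Date", "time": "Time", "c-ip": "Visitor IP"}


def parse_w3c_log(lines):
    """Yield one dict per record, keyed by the names from the #Fields line."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; only #Fields is needed for parsing.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        # Values are space-separated; in the excerpt, spaces inside the
        # User-Agent appear as "+", so a plain split is enough here.
        yield dict(zip(fields, line.split()))


def format_record(record):
    """Render the labelled fields of one record as 'Label: value' pairs."""
    return " ".join(
        f"{label}: {record[name]}" for name, label in LABELS.items() if name in record
    )


# Two lines copied from the log excerpt above.
fields_line = (
    "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port "
    "cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken"
)
record_line = (
    "2017-03-31 07:25:24 124.205.208.142 GET /include/css.css - 90 - "
    "124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+"
    "(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 304 0 0 1"
)
formatted = [format_record(r) for r in parse_w3c_log([fields_line, record_line])]
print(formatted[0])
# prints: Date: 2017-03-31 Time: 07:25:24 Visitor IP: 124.205.208.142
```

Keying on the field names rather than on column positions means the sketch should keep working if the server is configured to log a different set of columns, as long as a #Fields line precedes the records.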
guoyingjjjj posted on 2019-10-15 17:00: 1. Log file excerpt (as above)
# filePath (path to the log file) and title_List (column titles) are not
# defined in this snippet and must be set beforehand
Key_List = title_List
Info_list = []
# Store each line of the file in a list, dropping the trailing newline
with open(filePath, 'r') as open_file:
    Analysis_List = [line.rstrip('\n') for line in open_file]
# Remove comment lines (starting with "#") and empty lines
Analysis_List = [line for line in Analysis_List if line and not line.startswith('#')]
# Build the info list: split each record on single spaces into its field values
for line in Analysis_List:
    vlue_list = line.split(' ')
    Info_list.append(vlue_list)
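The code above fills Info_list but never uses Key_List. A plausible next step, sketched here, is to pair each title with the value in the same column. The contents of title_List are not shown in the post, so the titles below are placeholders, and format_rows is a hypothetical helper.

```python
def format_rows(key_list, info_list):
    """Join each row's values with the titles in the same column positions."""
    lines = []
    for values in info_list:
        # zip stops at the shorter sequence, so supplying only the first
        # few titles formats only the first few columns.
        pairs = [f"{key}: {value}" for key, value in zip(key_list, values)]
        lines.append(" ".join(pairs))
    return lines


# Placeholder titles (assumed); the values are the first five columns of
# two records from the log excerpt above.
Key_List = ["Date", "Time", "Server IP", "Method", "URI"]
Info_list = [
    ["2017-03-31", "07:25:24", "124.205.208.142", "GET", "/otype.asp"],
    ["2017-03-31", "07:25:24", "124.205.208.142", "GET", "/include/css.css"],
]
for text in format_rows(Key_List, Info_list):
    print(text)
```

Because zip truncates to the shorter input, passing just three titles would produce the three-field output asked for in the question without touching the parsing code.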
Use a regular expression.
datalist = perline.split()
Split each line on whitespace; the first three elements are exactly the three values you need.
guoyingjjjj posted on 2019-10-15 17:07: open_file = open(filePath, 'r') (code as above)
Just use a regular expression; it is quite simple.
guoyingjjjj posted on 2019-10-15 17:00: 1. Log file excerpt (as above)
Read the file into a list using with open, then split each line and take the first three indexes.
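The two suggestions made in the replies, splitting on whitespace and using a regular expression, can be sketched side by side. This is an illustration only: first_three_split, first_three_regex and the pattern are hypothetical, and the pattern assumes the date, time and IPv4 formats seen in the excerpt. One caveat: according to the #Fields line, the third column is s-ip (the server address), while the client address is c-ip in the ninth column; the two happen to be identical in this excerpt.

```python
import re

# Assumed pattern: ISO date, hh:mm:ss time and dotted IPv4 address at line start.
RECORD_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\d{1,3}(?:\.\d{1,3}){3}) "
)


def first_three_split(line):
    """Return the first three space-separated values, or None for comments."""
    if not line.strip() or line.startswith("#"):
        return None
    return tuple(line.split()[:3])


def first_three_regex(line):
    """Return the date, time and IP captured by the pattern, or None."""
    match = RECORD_START.match(line)
    return match.groups() if match else None


# One directive line and one record copied from the log excerpt.
lines = [
    "#Version: 1.0",
    "2017-03-31 07:25:24 124.205.208.142 GET /images/1.gif - 90 - "
    "124.205.208.142 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+"
    "(KHTML,+like+Gecko)+Chrome/52.0.2743.82+Safari/537.36 304 0 0 0",
]
for line in lines:
    by_split = first_three_split(line)
    if by_split is not None:
        # Labels follow the thread; see the s-ip / c-ip caveat above.
        print("Date: {} Time: {} Visitor IP: {}".format(*by_split))
# prints: Date: 2017-03-31 Time: 07:25:24 Visitor IP: 124.205.208.142
```

With a real file, the list could come from something like `with open(path) as f: lines = f.read().splitlines()`, where path is a placeholder. The split version is shorter; the regex version additionally rejects lines that do not start with a well-formed date, time and address.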