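Recently I built a quiz mini program, and once it was finished the question bank had to be loaded into the database. To do that I wrote a small Python example that converts a Word document into an Excel file.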
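The output looks like this: [screenshot of the resulting Excel file, not reproduced]
The original Word document: [screenshot, not reproduced]
The code is below: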
import re
from collections import OrderedDict

import pandas as pd  # writing .xlsx through to_excel also needs openpyxl installed
from docx import Document  # provided by the python-docx package
# Load the Word question bank; a raw string keeps the backslashes literal
doc = Document(r"E:\睿丰凌通\建靠题库\1111.docx")
paragraphs = doc.paragraphs
# Question line: from the "." after the question number up to the closing "。"
title_rule = re.compile(r"\..*。")
# Option line: an option letter, a ".", then the option text
option_rule = re.compile(r"[ABCDE]\.\w.*")
# Answer line, e.g. "【答案】B" (defined for later use, not applied below)
daan_rule_search = re.compile(r"【答案】.*")
title2options = OrderedDict()  # question text -> list of option strings
options = None  # option list of the most recent question, None until one is seen
# Only the first 55 paragraphs are scanned; widen the slice to cover the whole document
for paragraph in doc.paragraphs[0:55]:
    text = paragraph.text
    title_match = title_rule.search(text)
    option_match = option_rule.search(text)
    if title_match:
        # The match starts at the "." after the question number, so drop that dot
        line = title_match.group().lstrip(".")
        options = title2options.setdefault(line, [])
    elif option_match and options is not None:
        # One paragraph may hold several options, e.g. "A.xxx B.yyy";
        # \w* stops at spaces and punctuation, so longer option text is cut there
        options.extend(re.findall(r"[ABCDE]\.\w*", option_match.group()))
# Build one row per question: [title, option A, option B, ...]
result = []
for title, options in title2options.items():
    result.append([title, *options])
print("题目", result)
# Size the header to the longest row; a fixed five-column header raises a
# ValueError whenever the longest row does not have exactly five cells
max_options_len = max((len(row) - 1 for row in result), default=0)
# Column headers: 题目 = question, 选项A, 选项B, ... = option A, option B, ...
columns = ["题目"] + [f"选项{chr(ord('A') + i)}" for i in range(max_options_len)]
df = pd.DataFrame(result, columns=columns)
df.to_excel("result.xlsx", index=False)
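
The grouping logic can be tried without a Word file. The sketch below is only an illustration, not the code from the post: the function name parse_questions, the stricter patterns, and the sample lines are all made up here, and it assumes each question starts with a number and a dot, options follow on later lines, and the answer sits on its own "【答案】" line. Anchoring the question pattern to a leading number keeps an option such as "B.水泥。" from being mistaken for a question.

```python
import re
from collections import OrderedDict

# Hypothetical patterns for an assumed layout; adjust them to the real document.
TITLE_RULE = re.compile(r"^\d+\.(.*。)")         # "1.<question text>。"
OPTION_RULE = re.compile(r"[ABCDE]\.\S+")        # "A.<option text>" up to whitespace
ANSWER_RULE = re.compile(r"【答案】\s*([A-E]+)")   # "【答案】B"


def parse_questions(lines):
    """Group option and answer lines under the most recent question line."""
    questions = OrderedDict()
    current = None  # entry of the question currently being filled
    for text in lines:
        text = text.strip()
        answer = ANSWER_RULE.search(text)
        title = TITLE_RULE.match(text)
        if answer and current is not None:
            current["answer"] = answer.group(1)
        elif title:
            current = questions.setdefault(
                title.group(1), {"options": [], "answer": ""}
            )
        elif current is not None:
            # A single line may carry several options separated by spaces
            current["options"].extend(OPTION_RULE.findall(text))
    return questions


# Invented sample lines standing in for paragraph.text values
sample_lines = [
    "1.混凝土的强度等级由立方体抗压强度确定。",
    "A.正确 B.错误",
    "【答案】A",
    "2.下列属于水硬性胶凝材料的是。",
    "A.石灰",
    "B.水泥。",
    "C.石膏 D.水玻璃",
    "【答案】B",
]

parsed = parse_questions(sample_lines)
for title, entry in parsed.items():
    print(title, entry["options"], entry["answer"])
```

With a real document, the list passed in would be [p.text for p in doc.paragraphs], and each entry could be flattened into a row (title, options, answer) before building the DataFrame.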