python 在excle表格统计与制作方面的实践
本帖最后由 zohoChou 于 2022-6-24 19:49 编辑年前老师让我整个excel统计全年级的成绩,差点没搞死我,于是痛定思痛,自己搞了个python出来,力求一个文件解决一切。。。。。。
【2022.06.24已更新】
吾爱首发完整解析(csdn上面的也是我写的,但是极其简略,别说什么抄袭改写了)
注,这次的没发csdn
我回来了!!!
因为学校把51假期挪到了暑假,没法鸽了(狗头
你木见过的船新版本等你来探……
# 1.0.1版……
### 事实证明意外还是有的……
## 更新内容:增加了判定缺考的功能。(其他零分的情况默认为因故缺考,不计入有效科目内)
## 只更新了精简注释版
## 目录架构更改:需要在目录下新建一个unattended.xlsx的文件
__________________
例:|AAA | 1 | 一共两列,一列是姓名,一列记录那个人缺考的科目数量。
--------------------
# 假的更新日志(溜……# 1.0.0(最后一版,如果没有意外……)
## 1.把几乎所有的变量全都进行了规范化命名,再也不是for i, for j, for k了 ……
### 注:函数外变量为完整命名,函数内为简化命名,如:each_class_stu -> class_stu 等。
## 2.重写40%以上逻辑特别是当时自己写的取总人数,绕了一大圈……
### (逻辑上还可以优化,但是由于担心以后维护时看不懂,专门把逻辑写的很简单易懂【有小部分冗余】)。
## 3.回归初心,以pandas为主体,openpyxl专门格式化输出。### (因没看懂官方文档,均为pandas输出后使用openpyxl重读并重新输出,未使用openpyxl直接协同pandas)
## 4.重写注释,增加函数说明
小贴士:pandas与openpyxl有诸多不同,不要混了
传参一定不要反复逮着i用,极容易出现参数值混乱……
从未写过这么复杂的python……
注:实际上传时删掉了学校信息,可能少了十几个字
原始目录架构
https://s1.ax1x.com/2022/04/23/LfaPXD.png
原始文件:
结束运行后的目录架构
https://s1.ax1x.com/2022/04/23/LfdzQK.png
处理后的Sheet1
处理生成的Sheet2(排名已处理)
处理生成的汇总表
上代码!
```
# coding=utf-8
# 第一句很有必要!因为很多系统(特别是Windows,中文编码容易出问题)
# 为了执行cmd命令
import os
from time import sleep
# 核心模块
import openpyxl
import pandas as pd
from openpyxl.styles import Alignment, Border, Font, NamedStyle, Side
df_all_score = pd.DataFrame(columns=["班级",
"学号",
"姓名",
"不及格门数",
"平均分",
"总分",
"排名",
"级部排名"])
# 这是提前定义的,储存所有学生信息的dataframe
file_dir = os.path.split(os.path.realpath(__file__)) + "\\"
# 储存表格的目录(当前目录)
def clean(mood: int):
"""清除之前的生成文件\n
mood=0 -> 清除所有\n
mood=1 -> 仅清除Pandas的生成\n
因程序测试时每次都会输出*_*.xlsx,为方便起见,每次生成前先删除旧的生成文件。"""
if(mood):
# 直接利用0/1判断False/True
os.system("del /f /q /s " + file_dir + "*pandas*.xlsx")
print("已清除之前的生成", end="\n\n")
sleep(3)
else:
os.system("del /f /q /s " + file_dir + "*_*.xlsx")
print("已清除之前的生成", end="\n\n")
sleep(3)
def read_xlsx(class_i: str):
"""读表格并返回表格路径、表格内容"""
# 根据表格名称,<10与>=10两种情况(已在最底下的for循环里面处理为str_i)
full_path = file_dir + "临床2021" + class_i
sheet = pd.read_excel((full_path + ".xlsx"), header=1)
# header = 1 意为把表格的第二行作为标题。(忽略了第一行的读取,最后还得重新生成第一行……)
#为姓名起始序号
# 请注意,python的计数比较别扭,都是从0开始。上面的注释便于以后查看、改写。
return full_path, sheet
def format_each(df: pd.core.frame.DataFrame, r: int, c: int):
"""处理单元格的NaN问题"""
each_grade = df.iloc
if(pd.isna(each_grade)):
# 注:只能使用pd.isna进行判断,除非使用try,int()等一律失效
df.iloc = 0
each_grade = 0
each_grade = int(each_grade)
# 方便下面的再次取值(如果有的话),防止部分成绩的储存格式为文本型
df.iloc = each_grade
return each_grade
def get_sum_average_grade(class_stu: int, sheet: pd.core.frame.DataFrame):
"""计算有效课程数量与总分,并求有效平均分"""
# 完全可以使用loc[]进行取值,但是最原始的表格里面(pandas读取之前)有合并单元格的一个大标题
# 还牵扯到了中文编码的识别问题……建议不要用。
for row in range(class_stu):
sum_grade = 0
# 重新计算个人总分
eff_count = 0# effective_count
# 我们学校要求的是用平均成绩计算,且不把0分科目(一般都是缓考造成的)计算在内
# 所以说需要计算有意义的科目【这里应该能简写,希望大佬指点】
for column in range(11):
each_subject_garde = format_each(sheet, row, column)
if (each_subject_garde != 0):
# 如果成绩不为 0 则计数
sum_grade += each_subject_garde# 计算总成绩
eff_count += 1
# eff_count 有分数课程的总数
sheet.iloc = sum_grade
if(eff_count == 0):
sheet.iloc = 0# Avoid ZeroErr
else:
sheet.iloc = (sum_grade / eff_count)# 输出平均分
def rank_grades(class_stu: int, sheet1: pd.core.frame.DataFrame):
"""成绩排序"""
original_list = []
for i in range(class_stu):
original_list.append(sheet1.iloc)# 储存的是原始成绩
ranked_list = sorted(original_list, reverse=True)# 排序后的原始成绩
rank_list = # 名次(用来对应原始姓名)
# 演示:
# original_list:
# ranked_list:
# rank_list:
# final_list:
# (最后通过对应ranked_list与rank_list的对应关系让final_list变成原来顺序的纯名次)
# 一开始尝试过用字典或列表嵌套的方式,都失败了……
"""生成rank_list"""
# 实际排名
rank = 1
# 用ranked_list排序,第一个一定是第一名啊
count = 0
# 计算同分数段有多少人
for i in range(class_stu - 1):
if(ranked_list > ranked_list):
# 正常情况
rank = rank + count + 1
count = 0
# 别忘了count清零
else:
# 名次相等的情况
rank = rank
count += 1
rank_list.append(rank)
"""这里将生成final_list,通过对应使其变为顺序
不变的名次排序,然后直接输出到表格即可"""
final_list = original_list# 初始化final_list
# 两个循环匹配相同项
for i in range(len(original_list)):
j = 0
while (original_list != ranked_list):
j += 1
final_list = rank_list
for i in range(class_stu):
# 更改到表格的内容
sheet1.iloc = final_list
def sheet2_produce(sum: int, sheet1: pd.core.frame.DataFrame, class_stu: str):
"""根据Sheet1的内容生成Sheet2数据"""
class_stu = class_stu + "班"
sheet2 = sheet1.filter(
items=["学号", "姓名", "不及格门数", "平均分", "总分", "排名"])
# 删除原来的不相关列即可
sheet2.drop(,
inplace=True)
# 插入班级列,方便对应,不再使用学号进行对应。
sheet2.insert(0, "班级", class_stu)
sheet2.sort_values(by="平均分",
inplace=True,
kind="mergesort",
ascending=False)
# 注:inplace可免于自赋值
# kind="mergesort"使其有序排序,避免学号混乱。请自行搜索mergesort。
# 排序依据平均分,升序。
return sheet2
def pandas_xlsx_write(path: str, sheet1: pd.core.frame.DataFrame, sheet2: pd.core.frame.DataFrame):
"""pandas数据处理结束后写入表格"""
with pd.ExcelWriter(path + "_out_pandas.xlsx") as writer:
# 使用with语句可以避免Sheet2输出时覆盖Sheet1导致最后表格里只有一个sheet2的情况
sheet1.to_excel(writer,
sheet_name="Sheet1",
index=False,
engine="openpyxl")
sheet2.to_excel(writer,
sheet_name="Sheet2",
index=False,
engine="openpyxl")
print("Pandas汇总各班成绩完成")
def format_out_sheet(sheet_path: str, class_stu: int, str_num: str, dir: str):
"""用Openpyxl格式化文档并输出\n
同时为生成汇总表准备\n
"""
wb = openpyxl.load_workbook(sheet_path + "_out_pandas.xlsx")
ws1 = wb["Sheet1"]
thin = Side(border_style="thin", color="000000")
# 定义边框样式
normal_style = NamedStyle(
name="normal_style",
font=Font(name="宋体", size=9),
alignment=Alignment(horizontal="center",
vertical="center",
wrapText=False),
border=Border(top=thin,
left=thin,
right=thin,
bottom=thin))
# 定义一种样式方便后面直接使用
wb.add_named_style(normal_style)
ws1.insert_rows(0, 1)
ws1["A1"] = "班级成绩表 2021-2022-1 2021级临床" + str_num + "班"
# 插入一行
ws1.merge_cells("A1:T1")
ws1["A1"].font = Font(name="宋体", size=11)
ws1["A1"].alignment = Alignment(horizontal="center",
vertical="center",
wrapText=False)
ws1.row_dimensions.height = 20
# 合并单元格、设置字体、居中、行高
for i in list("APQRST"):
ws1.merge_cells(i + "2:" + i + "7")
for i in range(3, 8):
i = str(i)
ws1.merge_cells("B" + i + ":D" + i)
for i in list("ABCDEFGHIJKLMNOPQRST"):
for j in range(2, class_stu + 10):
ws1.style = normal_style
ws1.alignment = Alignment(horizontal="center",
vertical="center",
wrapText=True)
# 各种合并单元格。最后的wrapText代表是否允许换行。
for i in list("BCEFGHIJKLMNO"):
ws1.column_dimensions.width = (10 - 0.36)
# 注:实际上width属性写5,输出为4.36(height没问题)
# 所有的好像都是设置为(x + 1),实际为(x + 0.36)
ws1.column_dimensions["A"].width = (5 - 0.36)
for i in list("DPQRST"):
ws1.column_dimensions.width = (9 - 0.36)
for i in range(class_stu):
for j in list("EFGHIJKLMNO"):
if(int(ws1.value) < 60):
# 注:如果把if前置到for j那里则会出现格式失效的情况
# 原因不明,暂时用此法避免
ws1.font = Font(color="00FF0000")
# 颜色只能设置RGB值
ws2 = wb["Sheet2"]
for i in list("ABCDEFG"):
for j in range(1, class_stu + 2):
# 我真是受够了这个地方,openpyxl和pandas不一样
# openpyxl计数从1开始,pandas从0开始。
# 还有题头是否包含等各种问题……
ws2.style = normal_style
ws2.column_dimensions["A"].width = (5 - 0.36)
ws2.column_dimensions["B"].width = (10 - 0.36)
ws2.column_dimensions["C"].width = (10 - 0.36)
for i in list("EFG"):
ws2.column_dimensions.width = (9 - 0.36)
for i in range(2, class_stu + 1):
if (int(ws2["D" + str(i)].value) > 0):
# 突出不及格科目
ws2["D" + str(i)].font = Font(color="00FF0000", size=11)
wb.save(dir + "_out_openpyxl.xlsx")
# openpyxl保存时就不用with……
print("Openpyxl格式化完毕")
print(str_num + "班已结束\n")
def concat_all(each_sheet: pd.core.frame.DataFrame, df: pd.core.frame.DataFrame):
"""输出总汇总表"""
df = pd.concat(, axis=0, ignore_index=True)
# 连接每个表的Sheet2
return df
def sort_all_out(df_all: pd.core.frame.DataFrame, sum_stu: int, file_dir: str):
"""用pandas排序汇总表"""
df_all.sort_values(by="平均分",
inplace=True,
kind="mergesort",
ascending=False)
# 前面有写,略。
# 后面的逻辑和前面我自己写的排序模块基本一致
# 思路相同,略。
df_all.iloc = 1
rank = 1
count = 0
for i in range(sum_stu - 1):
if(df_all.iloc > df_all.iloc):
rank = rank + count + 1
count = 0
else:
count += 1
df_all.iloc = rank
df_all.to_excel(file_dir + "汇总_out_pandas.xlsx",
sheet_name="个人成绩汇总表",
index=False)
# 这个index=False一旦漏了就跟白排版了没啥区别了。
print("Pandas汇总全年级排名完毕")
def format_sumed_sheet(df_dir: str):
"""格式化汇总表并输出"""
wb = openpyxl.load_workbook(df_dir + "汇总_out_pandas.xlsx")
ws = wb["个人成绩汇总表"]
thin = Side(border_style="thin", color="000000")
normal_style = NamedStyle(
name="normal_style",
font=Font(name="宋体", size=9),
alignment=Alignment(horizontal="center",
vertical="center",
wrapText=False),
border=Border(top=thin,
left=thin,
right=thin,
bottom=thin))
wb.add_named_style(normal_style)
for i in list("ABCDEFGH"):
for j in range(1, ws.max_row + 1):
ws.style = normal_style
if ((i == "D") and (j > 1) and (int(ws.value) > 0)):
# 和前面思路一致,融合到一句话里了而已
ws.font = Font(color="00FF0000", size=11)
ws.column_dimensions["A"].width = (5 - 0.36)
ws.column_dimensions["B"].width = (10 - 0.36)
ws.column_dimensions["C"].width = (10 - 0.36)
for i in list("EFGH"):
ws.column_dimensions.width = (9 - 0.36)
# 设置列宽
wb.save(df_dir + "汇总_out_openpyxl.xlsx")
print("Openpyxl格式化排名完毕")
for class_int in range(1, 17):
"""说句大实话,可以直接把这些东西都放在一起的
完全可以避免这么搞。事实证明这么搞
除了增加了维护难度之外(传参的排查bug要人小命),
并没什么优点(模块化除外)
并且,我为了实现整体的逻辑统一与美观
被迫写了不少冗余代码,就为了让整体代码模式化
"""
# 注:全部变量名基本都改了一遍
# 现在的都是原始变量名为复杂的全称,函数内部的一般为简称
if(class_int == 1):
clean(0)
if(class_int < 10):
class_str = "0" + str(class_int)
else:
class_str = str(class_int)
full_path_no_ext, original_sheet = read_xlsx(class_str)
each_class_stu = len(original_sheet) - 7
get_sum_average_grade(each_class_stu, original_sheet)
rank_grades(each_class_stu, original_sheet)
original_sheet2 = sheet2_produce(
each_class_stu, original_sheet, class_str)
df_all_score = concat_all(original_sheet2, df_all_score)
pandas_xlsx_write(full_path_no_ext, original_sheet, original_sheet2)
format_out_sheet(full_path_no_ext, each_class_stu,
class_str, full_path_no_ext)
if(class_int == 16):
# 说白了就是班级的全部汇总完了就行了
# 完全可以放到大循环for的外面,我只是想利用一下现有的参数
# 又担心出了循环,参数可能失效的问题。
sort_all_out(df_all_score, len(df_all_score), file_dir)
format_sumed_sheet(file_dir)
clean(1)
print("\n")
```
精简版代码(无冗余注释,逻辑较清晰)
```
# coding=utf-8
import os
from time import sleep
import openpyxl
import pandas as pd
from openpyxl.styles import Alignment, Border, Font, NamedStyle, Side
df_all_score = pd.DataFrame(columns=["班级",
"学号",
"姓名",
"不及格门数",
"平均分",
"总分",
"排名",
"级部排名"])
# 储存所有学生信息的dataframe
file_dir = os.path.split(os.path.realpath(__file__))[0] + "\\"
# 当前目录
unattended = pd.read_excel(file_dir + "unattended.xlsx", header=None)
unattended_stu = {}
for each_stu in range(len(unattended)):
unattended_stu[unattended.iloc[each_stu, 0]] = unattended.iloc[each_stu, 1
def clean(mood: int):
"""清除之前的生成文件\n
mood=0 -> 清除所有\n
mood=1 -> 仅清除Pandas的生成"""
if(mood):
os.system("del /f /q /s " + file_dir + "*pandas*.xlsx")
print("已清除之前的生成", end="\n\n")
sleep(3)
else:
os.system("del /f /q /s " + file_dir + "*_*.xlsx")
print("已清除之前的生成", end="\n\n")
sleep(3)
def read_xlsx(class_i: str):
"""读表格并返回表格路径、表格内容"""
full_path = file_dir + "临床2021" + class_i
sheet = pd.read_excel((full_path + ".xlsx"), header=1)
#为姓名起始序号
return full_path, sheet
def format_each(df: pd.core.frame.DataFrame, r: int, c: int):
"""处理单元格的NaN问题"""
each_grade = df.iloc[r + 5, c + 4
if(pd.isna(each_grade)):
df.iloc[r + 5, c + 4] = 0
each_grade = 0
each_grade = int(each_grade)
df.iloc[r + 5, c + 4] = each_grade
return each_grade
def get_sum_average_grade(class_stu: int, sheet: pd.core.frame.DataFrame):
"""计算有效课程数量与总分,并求有效平均分"""
for row in range(class_stu):
sum_grade = 0
eff_count = 0
unattended_count = 0
for column in range(11):
each_subject_garde = format_each(sheet, row, column)
if (each_subject_garde != 0):
sum_grade += each_subject_garde
eff_count += 1
else:
if(sheet.iloc[row + 5, column + 2] in unattended_stu):
while(unattended_count <= int(unattended_stu[sheet.iloc[row + 5, column + 2]])):
eff_count += 1
unattended_count += 1
sheet.iloc[row + 5, 17] = sum_grade
if(eff_count == 0 or sum_grade == 0):
sheet.iloc[row + 5, 16] = 0# ZeroErr
else:
sheet.iloc[row + 5, 16] = round((sum_grade / eff_count),2)
def rank_grades(class_stu: int, sheet1: pd.core.frame.DataFrame):
"""成绩排序"""
original_list = []
for i in range(class_stu):
original_list.append(sheet1.iloc[i + 5, 16])
ranked_list = sorted(original_list, reverse=True)
rank_list = [1
# 演示:
# original_list:
# ranked_list:
# rank_list:
# final_list:
"""生成rank_list"""
rank = 1
count = 0
for i in range(class_stu - 1):
if(ranked_list[i] > ranked_list[i + 1]):
rank = rank + count + 1
count = 0
else:
rank = rank
count += 1
rank_list.append(rank)
final_list = original_list
for i in range(len(original_list)):
j = 0
while (original_list[i] != ranked_list[j]):
j += 1
final_list[i] = rank_list[j
for i in range(class_stu):
sheet1.iloc[i + 5, 19] = final_list[i
def sheet2_produce(sum: int, sheet1: pd.core.frame.DataFrame, class_stu: str):
"""根据Sheet1的内容生成Sheet2数据"""
class_stu = class_stu + "班"
sheet2 = sheet1.filter(
items=["学号", "姓名", "不及格门数", "平均分", "总分", "排名"])
sheet2.drop([0, 1, 2, 3, 4, sum + 5, sum + 6],
inplace=True)
sheet2.insert(0, "班级", class_stu)
sheet2.sort_values(by="平均分",
inplace=True,
kind="mergesort",
ascending=False)
return sheet2
def pandas_xlsx_write(path: str, sheet1: pd.core.frame.DataFrame, sheet2: pd.core.frame.DataFrame):
"""pandas数据处理结束后写入表格"""
with pd.ExcelWriter(path + "_out_pandas.xlsx") as writer:
sheet1.to_excel(writer,
sheet_name="Sheet1",
index=False,
engine="openpyxl")
sheet2.to_excel(writer,
sheet_name="Sheet2",
index=False,
engine="openpyxl")
print("Pandas汇总各班成绩完成")
def format_out_sheet(sheet_path: str, class_stu: int, str_num: str, dir: str):
"""用Openpyxl格式化文档并输出\n
同时为生成汇总表准备\n
"""
wb = openpyxl.load_workbook(sheet_path + "_out_pandas.xlsx")
ws1 = wb["Sheet1"
thin = Side(border_style="thin", color="000000")
normal_style = NamedStyle(
name="normal_style",
font=Font(name="宋体", size=9),
alignment=Alignment(horizontal="center",
vertical="center",
wrapText=False),
border=Border(top=thin,
left=thin,
right=thin,
bottom=thin))
wb.add_named_style(normal_style)
ws1.insert_rows(0, 1)
ws1["A1"] = "潍坊医学院 班级成绩表 2021-2022-1 2021级临床" + str_num + "班"
ws1.merge_cells("A1:T1")
ws1["A1"].font = Font(name="宋体", size=11)
ws1["A1"].alignment = Alignment(horizontal="center",
vertical="center",
wrapText=False)
ws1.row_dimensions[1].height = 20
for i in list("APQRST"):
ws1.merge_cells(i + "2:" + i + "7")
for i in range(3, 8):
i = str(i)
ws1.merge_cells("B" + i + ":D" + i)
for i in list("ABCDEFGHIJKLMNOPQRST"):
for j in range(2, class_stu + 10):
ws1[i + str(j)].style = normal_style
ws1[i + str(2)].alignment = Alignment(horizontal="center",
vertical="center",
wrapText=True)
for i in list("BCEFGHIJKLMNO"):
ws1.column_dimensions[i].width = (10 - 0.36)
# 注:实际上width属性设置为(x + 1),实际为(x + 0.36)
ws1.column_dimensions["A"].width = (5 - 0.36)
for i in list("DPQRST"):
ws1.column_dimensions[i].width = (9 - 0.36)
for i in range(class_stu):
for j in list("EFGHIJKLMNO"):
if(int(ws1[j + str(i + 8)].value) < 60):
ws1[j + str(i + 8)].font = Font(color="00FF0000")
ws2 = wb["Sheet2"
for i in list("ABCDEFGH"):
for j in range(1, class_stu + 2):
# openpyxl计数从1开始,pandas从0开始。
ws2[i + str(j)].style = normal_style
ws2.column_dimensions["A"].width = (5 - 0.36)
ws2.column_dimensions["B"].width = (10 - 0.36)
ws2.column_dimensions["C"].width = (10 - 0.36)
for i in list("EFGH"):
ws2.column_dimensions[i].width = (9 - 0.36)
for i in range(2, class_stu + 1):
if (int(ws2["D" + str(i)].value) > 0):
# 突出不及格科目
ws2["D" + str(i)].font = Font(color="00FF0000", size=11)
wb.save(dir + "_out_openpyxl.xlsx")
print("Openpyxl格式化完毕")
print(str_num + "班已结束\n")
def concat_all(each_sheet: pd.core.frame.DataFrame, df: pd.core.frame.DataFrame):
"""输出总汇总表"""
df = pd.concat([df, each_sheet], axis=0, ignore_index=True)
return df
def sort_all_out(df_all: pd.core.frame.DataFrame, sum_stu: int, file_dir: str):
"""用pandas排序汇总表"""
df_all.sort_values(by="平均分",
inplace=True,
kind="mergesort",
ascending=False)
df_all.iloc[0, 7] = 1
rank = 1
count = 0
for i in range(sum_stu - 1):
if(df_all.iloc[i, 4] > df_all.iloc[i + 1, 4]):
rank = rank + count + 1
count = 0
else:
count += 1
df_all.iloc[i + 1, 7] = rank
df_all.to_excel(file_dir + "汇总_out_pandas.xlsx",
sheet_name="个人成绩汇总表",
index=False)
df_all.close()
print("Pandas汇总全年级排名完毕")
def format_sumed_sheet(df_dir: str):
"""格式化汇总表并输出"""
wb = openpyxl.load_workbook(df_dir + "汇总_out_pandas.xlsx")
ws = wb["个人成绩汇总表"
thin = Side(border_style="thin", color="000000")
normal_style = NamedStyle(
name="normal_style",
font=Font(name="宋体", size=9),
alignment=Alignment(horizontal="center",
vertical="center",
wrapText=False),
border=Border(top=thin,
left=thin,
right=thin,
bottom=thin))
wb.add_named_style(normal_style)
for i in list("ABCDEFGH"):
for j in range(1, ws.max_row + 1):
ws[i + str(j)].style = normal_style
if ((i == "D") and (j > 1) and (int(ws[i + str(j)].value) > 0)):
ws[i + str(j)].font = Font(color="00FF0000", size=11)
ws.column_dimensions["A"].width = (5 - 0.36)
ws.column_dimensions["B"].width = (10 - 0.36)
ws.column_dimensions["C"].width = (10 - 0.36)
for i in list("EFGH"):
ws.column_dimensions[i].width = (9 - 0.36)
wb.save(df_dir + "汇总_out_openpyxl.xlsx")
print("Openpyxl格式化排名完毕")
for class_int in range(1, 17):
"""为了实现整体的逻辑统一与美观
被迫写了不少冗余代码
"""
if(class_int == 1):
clean(0)
if(class_int < 10):
class_str = "0" + str(class_int)
else:
class_str = str(class_int)
full_path_no_ext, original_sheet = read_xlsx(class_str)
each_class_stu = len(original_sheet) - 7
get_sum_average_grade(each_class_stu, original_sheet)
rank_grades(each_class_stu, original_sheet)
original_sheet2 = sheet2_produce(each_class_stu,
original_sheet, class_str)
df_all_score = concat_all(original_sheet2, df_all_score)
pandas_xlsx_write(full_path_no_ext, original_sheet, original_sheet2)
format_out_sheet(full_path_no_ext, each_class_stu,
class_str, full_path_no_ext)
if(class_int == 16):
sort_all_out(df_all_score, len(df_all_score), file_dir)
format_sumed_sheet(file_dir)
clean(1)
print("\n")
```
本帖最后由 zohoChou 于 2022-4-8 13:00 编辑
还有,重要提示,我使用的是windows10系统,自己手动把编码改成了utf-8,一般电脑默认是gbk,可能出现编码问题!!!
下面的更新我全都改成了openpyxl写的,反而没有python的pandas与numpy(发现这俩玩意对可视化太不友好了……)
咕咕咕,来更新了(本来打算把那个整合全年级成绩的模块(函数,叫习惯了)也一起发出来…………
但是,咕咕使我没有动力…………不想动弹…………
完成后的成果如下图:
注:懒得写注释了……抽空补上
生成出来的表里面都是公式和引用!
(对于宽松的要求来说够用了,但是如果要严谨的话请自行复制+粘贴值(我也没找到好办法……
还有,汇总出来的表没法用,引用炸了……正在积极寻找办法中(咕咕咕,有啥没看明白的可以问我,共同讨论共同提升
```
import os
from time import sleep
import openpyxl
os.system("taskkill /f /t /im EXCEL.EXE")
file_dir = "D:\\Desktop\\成绩\\"
if(os.path.exists(file_dir + "临床202101_out3.xlsx")):
os.system("cls")
os.system("del /f /q /s D:\\Desktop\\成绩\\*_out*")
os.system("del /f /q /s D:\\Desktop\\成绩\\*汇总*")
print("已清除之前的生成", end="\n\n")
sleep(3)
def get_sum(num: str, lrange):
"""注:因为openpyxl无法承载函数嵌套,故出此下策"""
A_Z_range = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
sum_grades = 0
for k in range(4, 16):
# 注:有的值是NaN,要人小命,调试了N多遍
try:
int(lrange + num].value)
except:
lrange + num] = 0
sum_grades += int(lrange + num].value)
return sum_grades
def Starting(path: str):
original_wb = openpyxl.load_workbook(path + ".xlsx", data_only=True)
original_ws = original_wb["sheet1"]
original_ws.title = "Sh"
original_ws.title = "Sheet1"
# 注:原表名为sheet1,直接改为Sheet1会变成Sheet11(可能是bug,故以此绕过bug判定)
sum_people = original_ws.max_row - 9
for l in range(sum_people):
temp = str(l + 8)
original_ws["R" + temp] = get_sum(temp, original_ws)
original_ws["Q" + temp] = "=R" + temp + \
"/COUNTIF(E" + temp + ":O" + temp + ",\"<>0\")"
original_ws["T" + temp] = "=RANK(Q" + temp + ",$Q$8:$Q$" + \
str(sum_people + 7) + ",0)"
changed_ws2 = original_wb.copy_worksheet(original_ws)
changed_ws2.title = "Sheet2"
changed_ws2.unmerge_cells("A1:T1")
temp = ["A", "P", "Q", "R", "S", "T"]
# 大坑记:openpyxl的删除行只是清除内容,不是真正的删除!
# 所以说需要手动排版!!!
for m in temp:
changed_ws2.unmerge_cells(m + "2:" + m + "7")
for n in range(3, 8):
n = str(n)
changed_ws2.unmerge_cells("B" + n + ":D" + n)
changed_ws2.delete_cols(19, 1)
changed_ws2.delete_cols(5, 11)
changed_ws2.delete_cols(4, 1)
changed_ws2.delete_cols(1, 1)
changed_ws2.delete_rows(changed_ws2.max_row-1, 2)
changed_ws2.delete_rows(3, 5)
changed_ws2.delete_rows(1, 1)
changed_ws2.column_dimensions["A"].width = 12
changed_ws2.row_dimensions.height = 14.3
for o in range(sum_people):
changed_ws2["D" + str(o + 2)] = "=Sheet1!Q" + str(o + 8)
changed_ws2["F" + str(o + 2)] = "=Sheet1!T" + str(o + 8)
original_wb.save(file_path + "_out3.xlsx")
print("success3")
return sum_people, original_ws, changed_ws2
def class_judge(num: int):
if(20211150001 <= num <= 20211150050):
return "1班"
elif(20211150051 <= num <= 20211150100):
return "2班"
elif(20211150101 <= num <= 20211150150):
return "3班"
elif(20211150151 <= num <= 20211150200):
return "4班"
elif(20211150201 <= num <= 20211150250):
return "5班"
elif(20211150251 <= num <= 20211150300):
return "6班"
elif(20211150301 <= num <= 20211150349):
return "7班"
elif(20211150350 <= num <= 20211150399):
return "8班"
elif(20211150400 <= num <= 20211150449):
return "9班"
elif(20211150450 <= num <= 20211150499):
return "10班"
elif(20211150500 <= num <= 20211150549):
return "11班"
elif(20211150550 <= num <= 20211150599):
return "12班"
elif(20211150600 <= num <= 20211150648):
return "13班"
elif(20211150649 <= num <= 20211150700):
return "14班"
elif(20211150701 <= num <= 20211150753):
return "15班"
elif(20211150754 <= num <= 20211150806):
return "16班"
else:
print(num)
sum_wb = openpyxl.Workbook()
sum_wb["Sheet"].title = "个人成绩汇总"
sum_ws = sum_wb["个人成绩汇总"]
sum_ws.append(["学号", "班级", "姓名", "不及格门数", "平均分", "总分", "级部排名", "班级内排名"])
for i in range(1, 17):
if (i < 10):
file_path = file_dir + "临床20210" + str(i)
str_i = "0" + str(i)
else:
file_path = file_dir + "临床2021" + str(i)
str_i = str(i)
people, ws1, ws2 = Starting(file_path)
sum_last_all = 0
for j in range(people):
# 神奇的事情,“ValueError: Cells cannot be copied from other worksheets” ……没办法
sum_wb["个人成绩汇总"].append([
ws2["A" + str(2 + j)].value,
class_judge(int(ws2["A" + str(2 + j)].value)),
ws2["B" + str(2 + j)].value,
ws2["C" + str(2 + j)].value,
"=\"" + file_path + "\"" + str_i +
"_out3.xlsx]Sheet1!$Q$8" + str(i),
ws2["E" + str(2 + j)].value,
"=RANK(E" + str(sum_last_all + j + 2) + ",E1:E805)",
"=[临床2021" + str_i + "_out3.xlsx]Sheet1!$T$8" + str(i)])
sum_last_all += people
print(str(i) + "班完成")
sum_wb.save(file_dir + "_汇总.xlsx")
print("汇总")
```
我回来了!!!
因为学校把51假期挪到了暑假,没法鸽了(狗头
你木见过的船新版本等你来探……
# 假的更新日志(溜……
# 1.0.0(最后一版,如果没有意外……)
## 1.把几乎所有的变量全都进行了规范化命名,再也不是for i, for j, for k了 ……
### 注:函数外变量为完整命名,函数内为简化命名,如:each_class_stu -> class_stu 等。
## 2.重写40%以上逻辑特别是当时自己写的取总人数,绕了一大圈……
### (逻辑上还可以优化,但是由于担心以后维护时看不懂,专门把逻辑写的很简单易懂【有小部分冗余】)。## 3.回归初心,以pandas为主体,openpyxl专门格式化输出。
### (因没看懂官方文档,均为pandas输出后使用openpyxl重读并重新输出,未使用openpyxl直接协同pandas)
## 4.重写注释,增加函数说明
小贴士:
pandas与openpyxl有诸多不同,不要混了
传参一定不要反复逮着i用,极容易出现参数值混乱……
从未写过这么复杂的python……
注:实际上传时删掉了学校信息,可能少了十几个字
原始目录架构
https://s1.ax1x.com/2022/04/23/LfaPXD.png
原始文件:
结束运行后的目录架构
https://s1.ax1x.com/2022/04/23/LfdzQK.png
处理后的Sheet1
处理生成的Sheet2(排名已处理)
处理生成的汇总表
上代码!
```
# coding=utf-8
# 第一句很有必要!因为很多系统(特别是Windows,中文编码容易出问题)
# 为了执行cmd命令
import os
from time import sleep
# 核心模块
import openpyxl
import pandas as pd
from openpyxl.styles import Alignment, Border, Font, NamedStyle, Side
df_all_score = pd.DataFrame(columns=["班级",
"学号",
"姓名",
"不及格门数",
"平均分",
"总分",
"排名",
"级部排名"])
# 这是提前定义的,储存所有学生信息的dataframe
file_dir = os.path.split(os.path.realpath(__file__)) + "\\"
# 储存表格的目录(当前目录)
def clean(mood: int):
"""清除之前的生成文件\n
mood=0 -> 清除所有\n
mood=1 -> 仅清除Pandas的生成\n
因程序测试时每次都会输出*_*.xlsx,为方便起见,每次生成前先删除旧的生成文件。"""
if(mood):
# 直接利用0/1判断False/True
os.system("del /f /q /s " + file_dir + "*pandas*.xlsx")
print("已清除之前的生成", end="\n\n")
sleep(3)
else:
os.system("del /f /q /s " + file_dir + "*_*.xlsx")
print("已清除之前的生成", end="\n\n")
sleep(3)
def read_xlsx(class_i: str):
"""读表格并返回表格路径、表格内容"""
# 根据表格名称,<10与>=10两种情况(已在最底下的for循环里面处理为str_i)
full_path = file_dir + "临床2021" + class_i
sheet = pd.read_excel((full_path + ".xlsx"), header=1)
# header = 1 意为把表格的第二行作为标题。(忽略了第一行的读取,最后还得重新生成第一行……)
#为姓名起始序号
# 请注意,python的计数比较别扭,都是从0开始。上面的注释便于以后查看、改写。
return full_path, sheet
def format_each(df: pd.core.frame.DataFrame, r: int, c: int):
"""处理单元格的NaN问题"""
each_grade = df.iloc
if(pd.isna(each_grade)):
# 注:只能使用pd.isna进行判断,除非使用try,int()等一律失效
df.iloc = 0
each_grade = 0
each_grade = int(each_grade)
# 方便下面的再次取值(如果有的话),防止部分成绩的储存格式为文本型
df.iloc = each_grade
return each_grade
def get_sum_average_grade(class_stu: int, sheet: pd.core.frame.DataFrame):
"""计算有效课程数量与总分,并求有效平均分"""
# 完全可以使用loc[]进行取值,但是最原始的表格里面(pandas读取之前)有合并单元格的一个大标题
# 还牵扯到了中文编码的识别问题……建议不要用。
for row in range(class_stu):
sum_grade = 0
# 重新计算个人总分
eff_count = 0# effective_count
# 我们学校要求的是用平均成绩计算,且不把0分科目(一般都是缓考造成的)计算在内
# 所以说需要计算有意义的科目【这里应该能简写,希望大佬指点】
for column in range(11):
each_subject_garde = format_each(sheet, row, column)
if (each_subject_garde != 0):
# 如果成绩不为 0 则计数
sum_grade += each_subject_garde# 计算总成绩
eff_count += 1
# eff_count 有分数课程的总数
sheet.iloc = sum_grade
if(eff_count == 0):
sheet.iloc = 0# Avoid ZeroErr
else:
sheet.iloc = (sum_grade / eff_count)# 输出平均分
def rank_grades(class_stu: int, sheet1: pd.core.frame.DataFrame):
"""成绩排序"""
original_list = []
for i in range(class_stu):
original_list.append(sheet1.iloc)# 储存的是原始成绩
ranked_list = sorted(original_list, reverse=True)# 排序后的原始成绩
rank_list = # 名次(用来对应原始姓名)
# 演示:
# original_list:
# ranked_list:
# rank_list:
# final_list:
# (最后通过对应ranked_list与rank_list的对应关系让final_list变成原来顺序的纯名次)
# 一开始尝试过用字典或列表嵌套的方式,都失败了……
"""生成rank_list"""
# 实际排名
rank = 1
# 用ranked_list排序,第一个一定是第一名啊
count = 0
# 计算同分数段有多少人
for i in range(class_stu - 1):
if(ranked_list > ranked_list):
# 正常情况
rank = rank + count + 1
count = 0
# 别忘了count清零
else:
# 名次相等的情况
rank = rank
count += 1
rank_list.append(rank)
"""这里将生成final_list,通过对应使其变为顺序
不变的名次排序,然后直接输出到表格即可"""
final_list = original_list# 初始化final_list
# 两个循环匹配相同项
for i in range(len(original_list)):
j = 0
while (original_list != ranked_list):
j += 1
final_list = rank_list
for i in range(class_stu):
# 更改到表格的内容
sheet1.iloc = final_list
def sheet2_produce(sum: int, sheet1: pd.core.frame.DataFrame, class_stu: str):
"""根据Sheet1的内容生成Sheet2数据"""
class_stu = class_stu + "班"
sheet2 = sheet1.filter(
items=["学号", "姓名", "不及格门数", "平均分", "总分", "排名"])
# 删除原来的不相关列即可
sheet2.drop(,
inplace=True)
# 插入班级列,方便对应,不再使用学号进行对应。
sheet2.insert(0, "班级", class_stu)
sheet2.sort_values(by="平均分",
inplace=True,
kind="mergesort",
ascending=False)
# 注:inplace可免于自赋值
# kind="mergesort"使其有序排序,避免学号混乱。请自行搜索mergesort。
# 排序依据平均分,升序。
return sheet2
def pandas_xlsx_write(path: str, sheet1: pd.core.frame.DataFrame, sheet2: pd.core.frame.DataFrame):
"""pandas数据处理结束后写入表格"""
with pd.ExcelWriter(path + "_out_pandas.xlsx") as writer:
# 使用with语句可以避免Sheet2输出时覆盖Sheet1导致最后表格里只有一个sheet2的情况
sheet1.to_excel(writer,
sheet_name="Sheet1",
index=False,
engine="openpyxl")
sheet2.to_excel(writer,
sheet_name="Sheet2",
index=False,
engine="openpyxl")
print("Pandas汇总各班成绩完成")
def format_out_sheet(sheet_path: str, class_stu: int, str_num: str, dir: str):
"""用Openpyxl格式化文档并输出\n
同时为生成汇总表准备\n
"""
wb = openpyxl.load_workbook(sheet_path + "_out_pandas.xlsx")
ws1 = wb["Sheet1"]
thin = Side(border_style="thin", color="000000")
# 定义边框样式
normal_style = NamedStyle(
name="normal_style",
font=Font(name="宋体", size=9),
alignment=Alignment(horizontal="center",
vertical="center",
wrapText=False),
border=Border(top=thin,
left=thin,
right=thin,
bottom=thin))
# 定义一种样式方便后面直接使用
wb.add_named_style(normal_style)
ws1.insert_rows(0, 1)
ws1["A1"] = "班级成绩表 2021-2022-1 2021级临床" + str_num + "班"
# 插入一行
ws1.merge_cells("A1:T1")
ws1["A1"].font = Font(name="宋体", size=11)
ws1["A1"].alignment = Alignment(horizontal="center",
vertical="center",
wrapText=False)
ws1.row_dimensions.height = 20
# 合并单元格、设置字体、居中、行高
for i in list("APQRST"):
ws1.merge_cells(i + "2:" + i + "7")
for i in range(3, 8):
i = str(i)
ws1.merge_cells("B" + i + ":D" + i)
for i in list("ABCDEFGHIJKLMNOPQRST"):
for j in range(2, class_stu + 10):
ws1.style = normal_style
ws1.alignment = Alignment(horizontal="center",
vertical="center",
wrapText=True)
# 各种合并单元格。最后的wrapText代表是否允许换行。
for i in list("BCEFGHIJKLMNO"):
ws1.column_dimensions.width = (10 - 0.36)
# 注:实际上width属性写5,输出为4.36(height没问题)
# 所有的好像都是设置为(x + 1),实际为(x + 0.36)
ws1.column_dimensions["A"].width = (5 - 0.36)
for i in list("DPQRST"):
ws1.column_dimensions.width = (9 - 0.36)
for i in range(class_stu):
for j in list("EFGHIJKLMNO"):
if(int(ws1.value) < 60):
# 注:如果把if前置到for j那里则会出现格式失效的情况
# 原因不明,暂时用此法避免
ws1.font = Font(color="00FF0000")
# 颜色只能设置RGB值
ws2 = wb["Sheet2"]
for i in list("ABCDEFG"):
for j in range(1, class_stu + 2):
# 我真是受够了这个地方,openpyxl和pandas不一样
# openpyxl计数从1开始,pandas从0开始。
# 还有题头是否包含等各种问题……
ws2.style = normal_style
ws2.column_dimensions["A"].width = (5 - 0.36)
ws2.column_dimensions["B"].width = (10 - 0.36)
ws2.column_dimensions["C"].width = (10 - 0.36)
for i in list("EFG"):
ws2.column_dimensions.width = (9 - 0.36)
for i in range(2, class_stu + 1):
if (int(ws2["D" + str(i)].value) > 0):
# 突出不及格科目
ws2["D" + str(i)].font = Font(color="00FF0000", size=11)
wb.save(dir + "_out_openpyxl.xlsx")
# openpyxl保存时就不用with……
print("Openpyxl格式化完毕")
print(str_num + "班已结束\n")
def concat_all(each_sheet: pd.core.frame.DataFrame, df: pd.core.frame.DataFrame):
"""输出总汇总表"""
df = pd.concat(, axis=0, ignore_index=True)
# 连接每个表的Sheet2
return df
def sort_all_out(df_all: pd.core.frame.DataFrame, sum_stu: int, file_dir: str):
"""用pandas排序汇总表"""
df_all.sort_values(by="平均分",
inplace=True,
kind="mergesort",
ascending=False)
# 前面有写,略。
# 后面的逻辑和前面我自己写的排序模块基本一致
# 思路相同,略。
df_all.iloc = 1
rank = 1
count = 0
for i in range(sum_stu - 1):
if(df_all.iloc > df_all.iloc):
rank = rank + count + 1
count = 0
else:
count += 1
df_all.iloc = rank
df_all.to_excel(file_dir + "汇总_out_pandas.xlsx",
sheet_name="个人成绩汇总表",
index=False)
# 这个index=False一旦漏了就跟白排版了没啥区别了。
print("Pandas汇总全年级排名完毕")
def format_sumed_sheet(df_dir: str):
"""格式化汇总表并输出"""
wb = openpyxl.load_workbook(df_dir + "汇总_out_pandas.xlsx")
ws = wb["个人成绩汇总表"]
thin = Side(border_style="thin", color="000000")
normal_style = NamedStyle(
name="normal_style",
font=Font(name="宋体", size=9),
alignment=Alignment(horizontal="center",
vertical="center",
wrapText=False),
border=Border(top=thin,
left=thin,
right=thin,
bottom=thin))
wb.add_named_style(normal_style)
for i in list("ABCDEFGH"):
for j in range(1, ws.max_row + 1):
ws.style = normal_style
if ((i == "D") and (j > 1) and (int(ws.value) > 0)):
# 和前面思路一致,融合到一句话里了而已
ws.font = Font(color="00FF0000", size=11)
ws.column_dimensions["A"].width = (5 - 0.36)
ws.column_dimensions["B"].width = (10 - 0.36)
ws.column_dimensions["C"].width = (10 - 0.36)
for i in list("EFGH"):
ws.column_dimensions.width = (9 - 0.36)
# 设置列宽
wb.save(df_dir + "汇总_out_openpyxl.xlsx")
print("Openpyxl格式化排名完毕")
for class_int in range(1, 17):
"""说句大实话,可以直接把这些东西都放在一起的
完全可以避免这么搞。事实证明这么搞
除了增加了维护难度之外(传参的排查bug要人小命),
并没什么优点(模块化除外)
并且,我为了实现整体的逻辑统一与美观
被迫写了不少冗余代码,就为了让整体代码模式化
"""
# 注:全部变量名基本都改了一遍
# 现在的都是原始变量名为复杂的全称,函数内部的一般为简称
if(class_int == 1):
clean(0)
if(class_int < 10):
class_str = "0" + str(class_int)
else:
class_str = str(class_int)
full_path_no_ext, original_sheet = read_xlsx(class_str)
each_class_stu = len(original_sheet) - 7
get_sum_average_grade(each_class_stu, original_sheet)
rank_grades(each_class_stu, original_sheet)
original_sheet2 = sheet2_produce(
each_class_stu, original_sheet, class_str)
df_all_score = concat_all(original_sheet2, df_all_score)
pandas_xlsx_write(full_path_no_ext, original_sheet, original_sheet2)
format_out_sheet(full_path_no_ext, each_class_stu,
class_str, full_path_no_ext)
if(class_int == 16):
# 说白了就是班级的全部汇总完了就行了
# 完全可以放到大循环for的外面,我只是想利用一下现有的参数
# 又担心出了循环,参数可能失效的问题。
sort_all_out(df_all_score, len(df_all_score), file_dir)
format_sumed_sheet(file_dir)
clean(1)
print("\n")
```
精简版代码(无冗余注释,逻辑较清晰)
```
# coding=utf-8
import os
from time import sleep
import openpyxl
import pandas as pd
from openpyxl.styles import Alignment, Border, Font, NamedStyle, Side
df_all_score = pd.DataFrame(columns=["班级",
"学号",
"姓名",
"不及格门数",
"平均分",
"总分",
"排名",
"级部排名"])
# 储存所有学生信息的dataframe
file_dir = os.path.split(os.path.realpath(__file__)) + "\\"
# 当前目录
def clean(mood: int):
"""清除之前的生成文件\n
mood=0 -> 清除所有\n
mood=1 -> 仅清除Pandas的生成"""
if(mood):
os.system("del /f /q /s " + file_dir + "*pandas*.xlsx")
print("已清除之前的生成", end="\n\n")
sleep(3)
else:
os.system("del /f /q /s " + file_dir + "*_*.xlsx")
print("已清除之前的生成", end="\n\n")
sleep(3)
def read_xlsx(class_i: str):
"""读表格并返回表格路径、表格内容"""
full_path = file_dir + "临床2021" + class_i
sheet = pd.read_excel((full_path + ".xlsx"), header=1)
#为姓名起始序号
return full_path, sheet
def format_each(df: pd.core.frame.DataFrame, r: int, c: int):
"""处理单元格的NaN问题"""
each_grade = df.iloc
if(pd.isna(each_grade)):
df.iloc = 0
each_grade = 0
each_grade = int(each_grade)
df.iloc = each_grade
return each_grade
def get_sum_average_grade(class_stu: int, sheet: pd.core.frame.DataFrame):
"""计算有效课程数量与总分,并求有效平均分"""
for row in range(class_stu):
sum_grade = 0
eff_count = 0
for column in range(11):
each_subject_garde = format_each(sheet, row, column)
if (each_subject_garde != 0):
sum_grade += each_subject_garde
eff_count += 1
sheet.iloc = sum_grade
if(eff_count == 0):
sheet.iloc = 0# ZeroErr
else:
sheet.iloc = (sum_grade / eff_count)
def rank_grades(class_stu: int, sheet1: pd.core.frame.DataFrame):
"""成绩排序"""
original_list = []
for i in range(class_stu):
original_list.append(sheet1.iloc)
ranked_list = sorted(original_list, reverse=True)
rank_list =
# 演示:
# original_list:
# ranked_list:
# rank_list:
# final_list:
"""生成rank_list"""
rank = 1
count = 0
for i in range(class_stu - 1):
if(ranked_list > ranked_list):
rank = rank + count + 1
count = 0
else:
rank = rank
count += 1
rank_list.append(rank)
final_list = original_list
for i in range(len(original_list)):
j = 0
while (original_list != ranked_list):
j += 1
final_list = rank_list
for i in range(class_stu):
sheet1.iloc = final_list
def sheet2_produce(sum: int, sheet1: pd.core.frame.DataFrame, class_stu: str):
"""根据Sheet1的内容生成Sheet2数据"""
class_stu = class_stu + "班"
sheet2 = sheet1.filter(
items=["学号", "姓名", "不及格门数", "平均分", "总分", "排名"])
sheet2.drop(,
inplace=True)
sheet2.insert(0, "班级", class_stu)
sheet2.sort_values(by="平均分",
inplace=True,
kind="mergesort",
ascending=False)
return sheet2
def pandas_xlsx_write(path: str, sheet1: pd.core.frame.DataFrame, sheet2: pd.core.frame.DataFrame):
"""pandas数据处理结束后写入表格"""
with pd.ExcelWriter(path + "_out_pandas.xlsx") as writer:
sheet1.to_excel(writer,
sheet_name="Sheet1",
index=False,
engine="openpyxl")
sheet2.to_excel(writer,
sheet_name="Sheet2",
index=False,
engine="openpyxl")
print("Pandas汇总各班成绩完成")
def format_out_sheet(sheet_path: str, class_stu: int, str_num: str, dir: str):
"""用Openpyxl格式化文档并输出\n
同时为生成汇总表准备\n
"""
wb = openpyxl.load_workbook(sheet_path + "_out_pandas.xlsx")
ws1 = wb["Sheet1"]
thin = Side(border_style="thin", color="000000")
normal_style = NamedStyle(
name="normal_style",
font=Font(name="宋体", size=9),
alignment=Alignment(horizontal="center",
vertical="center",
wrapText=False),
border=Border(top=thin,
left=thin,
right=thin,
bottom=thin))
wb.add_named_style(normal_style)
ws1.insert_rows(0, 1)
ws1["A1"] = "班级成绩表 2021-2022-1 2021级临床" + str_num + "班"
ws1.merge_cells("A1:T1")
ws1["A1"].font = Font(name="宋体", size=11)
ws1["A1"].alignment = Alignment(horizontal="center",
vertical="center",
wrapText=False)
ws1.row_dimensions.height = 20
for i in list("APQRST"):
ws1.merge_cells(i + "2:" + i + "7")
for i in range(3, 8):
i = str(i)
ws1.merge_cells("B" + i + ":D" + i)
for i in list("ABCDEFGHIJKLMNOPQRST"):
for j in range(2, class_stu + 10):
ws1.style = normal_style
ws1.alignment = Alignment(horizontal="center",
vertical="center",
wrapText=True)
for i in list("BCEFGHIJKLMNO"):
ws1.column_dimensions.width = (10 - 0.36)
# 注:实际上width属性设置为(x + 1),实际为(x + 0.36)
ws1.column_dimensions["A"].width = (5 - 0.36)
for i in list("DPQRST"):
ws1.column_dimensions.width = (9 - 0.36)
for i in range(class_stu):
for j in list("EFGHIJKLMNO"):
if(int(ws1.value) < 60):
ws1.font = Font(color="00FF0000")
ws2 = wb["Sheet2"]
for i in list("ABCDEFG"):
for j in range(1, class_stu + 2):
# openpyxl计数从1开始,pandas从0开始。
ws2.style = normal_style
ws2.column_dimensions["A"].width = (5 - 0.36)
ws2.column_dimensions["B"].width = (10 - 0.36)
ws2.column_dimensions["C"].width = (10 - 0.36)
for i in list("EFG"):
ws2.column_dimensions.width = (9 - 0.36)
for i in range(2, class_stu + 1):
if (int(ws2["D" + str(i)].value) > 0):
# 突出不及格科目
ws2["D" + str(i)].font = Font(color="00FF0000", size=11)
wb.save(dir + "_out_openpyxl.xlsx")
print("Openpyxl格式化完毕")
print(str_num + "班已结束\n")
def concat_all(each_sheet: pd.core.frame.DataFrame, df: pd.core.frame.DataFrame):
"""输出总汇总表"""
df = pd.concat(, axis=0, ignore_index=True)
return df
def sort_all_out(df_all: pd.core.frame.DataFrame, sum_stu: int, file_dir: str):
"""用pandas排序汇总表"""
df_all.sort_values(by="平均分",
inplace=True,
kind="mergesort",
ascending=False)
df_all.iloc = 1
rank = 1
count = 0
for i in range(sum_stu - 1):
if(df_all.iloc > df_all.iloc):
rank = rank + count + 1
count = 0
else:
count += 1
df_all.iloc = rank
df_all.to_excel(file_dir + "汇总_out_pandas.xlsx",
sheet_name="个人成绩汇总表",
index=False)
print("Pandas汇总全年级排名完毕")
def format_sumed_sheet(df_dir: str):
"""格式化汇总表并输出"""
wb = openpyxl.load_workbook(df_dir + "汇总_out_pandas.xlsx")
ws = wb["个人成绩汇总表"]
thin = Side(border_style="thin", color="000000")
normal_style = NamedStyle(
name="normal_style",
font=Font(name="宋体", size=9),
alignment=Alignment(horizontal="center",
vertical="center",
wrapText=False),
border=Border(top=thin,
left=thin,
right=thin,
bottom=thin))
wb.add_named_style(normal_style)
for i in list("ABCDEFGH"):
for j in range(1, ws.max_row + 1):
ws.style = normal_style
if ((i == "D") and (j > 1) and (int(ws.value) > 0)):
ws.font = Font(color="00FF0000", size=11)
ws.column_dimensions["A"].width = (5 - 0.36)
ws.column_dimensions["B"].width = (10 - 0.36)
ws.column_dimensions["C"].width = (10 - 0.36)
for i in list("EFGH"):
ws.column_dimensions.width = (9 - 0.36)
wb.save(df_dir + "汇总_out_openpyxl.xlsx")
print("Openpyxl格式化排名完毕")
for class_int in range(1, 17):
"""为了实现整体的逻辑统一与美观
被迫写了不少冗余代码
"""
if(class_int == 1):
clean(0)
if(class_int < 10):
class_str = "0" + str(class_int)
else:
class_str = str(class_int)
full_path_no_ext, original_sheet = read_xlsx(class_str)
each_class_stu = len(original_sheet) - 7
get_sum_average_grade(each_class_stu, original_sheet)
rank_grades(each_class_stu, original_sheet)
original_sheet2 = sheet2_produce(each_class_stu,
original_sheet, class_str)
df_all_score = concat_all(original_sheet2, df_all_score)
pandas_xlsx_write(full_path_no_ext, original_sheet, original_sheet2)
format_out_sheet(full_path_no_ext, each_class_stu,
class_str, full_path_no_ext)
if(class_int == 16):
sort_all_out(df_all_score, len(df_all_score), file_dir)
format_sumed_sheet(file_dir)
clean(1)
print("\n")
```
https://blog.csdn.net/zwliang98/article/details/104043328?spm=1001.2101.3001.6661.1&utm_medium=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7ECTRLIST%7ETopBlog-1.topblog&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7ECTRLIST%7ETopBlog-1.topblog&utm_relevant_index=1
headerCellStyle = NamedStyle(name = 'headerCellStyle')
headerCellStyle.alignment = Alignment(horizontal = 'center', vertical = 'center') # 水平垂直居中
border = Side(border_style = 'thin', color = '000000') # 线框样式
headerCellStyle.border = Border(left = border, top = border, right = border, bottom = border) # 设置单元格边框样式
openpyxl 可以修改格式 厉害,我最近也是在学习python,看到你的解说,我觉得这个软件确实不错 新人学习下,感觉蛮不错的,详细注释的代码很不错!^O^ jcorner 发表于 2022-3-16 10:53
厉害,我最近也是在学习python,看到你的解说,我觉得这个软件确实不错
这属于pandas基本的使用(其实把pandas这个极其强大的模块用成这样是不对的,他的强大之处在于直接在python环境下进行各种统计、生成图表等操作,而不是重新变成excel)
python的大数据分析很大一部分是运用了pandas模块,建议提高英文水平,找找官网看。
还有,菜鸟教程你值得拥有。以及,《python从入门到精通》是个垃圾。。。在我刚刚接触python这门编程语言时就发现了诸多错误。避坑!
最后,加油!我也是一步步踩坑踩过来的(笑哭 同求方法 等大神来指导 pandas也不太会用 xuanyuanhs 发表于 2022-3-16 10:56
新人学习下,感觉蛮不错的,详细注释的代码很不错!^O^
其实一般不会写这么详细的,除了我自己开发的这个成绩排序的模块,因为实在写的太乱,思路太绕,必须得写注释,否则下面大佬写的那段就是一般的文件的样子 hacker922 发表于 2022-3-16 11:04
同求方法 等大神来指导 pandas也不太会用
我尝试过xlrt什么的,然而全失败了……最后只能自己手动格式刷
再实在不行我手撸一个函数……(只会写函数,不会写模块。。。溜……
无语凝噎……
zohoChou 发表于 2022-3-16 11:02
这属于pandas基本的使用(其实把pandas这个极其强大的模块用成这样是不对的,他的强大之处在于直接在pyth ...
求菜鸟教程 正在学python从入门到精通,看的头疼 赶紧有错误不说,还不好理解 zohoChou 发表于 2022-3-16 11:06
我尝试过xlrt什么的,然而全失败了……最后只能自己手动格式刷
再实在不行我手撸一个函数……(只会写函 ...
{:1_937:}哈哈哈 我也是 弄不来就处理出来 自己再手动 蹲坑多年还是要多来学习大神