new.xlsx contains a single column, "id", with the values 1, 0, 3.
old.xlsx contains a single column, "id", with the values 1, 2, 3.
The code on Windows 11:
import pandas as pd
from pathlib import Path
# Define parameters
# Path to files https://gist.github.com/VankatPe ... 42b6029c3b92f20862a
path_old = Path(r'D:\Program Files\Pycharm_professional_2019.3.3_Portable\bin\Pycharm\config\scratches\old.xlsx')
path_new = Path(r'D:\Program Files\Pycharm_professional_2019.3.3_Portable\bin\Pycharm\config\scratches\new.xlsx')
# List of key column(s)
key = ['id']
# Sheets to read in
sheet = 'Sheet1'
# Read in the two Excel files and fill NA
old = pd.read_excel(path_old).fillna(0)
new = pd.read_excel(path_new).fillna(0)
# Set index
old = old.set_index(key)
new = new.set_index(key)
# Save the output
writer.save()
print('\nDone.\n')
The code on Linux:
import pandas as pd
from pathlib import Path
# Define parameters
# Path to files https://gist.github.com/VankatPe ... 42b6029c3b92f20862a
path_old = Path(r'/home/asd/Downloads/test/old.xlsx')
path_new = Path(r'/home/asd/Downloads/test/new.xlsx')
# List of key column(s)
key = ['id']
# Sheets to read in
sheet = 'Sheet1'
# Read in the two Excel files and fill NA
old = pd.read_excel(path_old).fillna(0)
new = pd.read_excel(path_new).fillna(0)
# Set index
old = old.set_index(key)
new = new.set_index(key)
# Save the output
writer.save()
print('\nDone.\n')
Here is the key point: on Linux it raises an error:
Traceback (most recent call last):
  File "/home/asd/.config/JetBrains/PyCharmCE2020.1/scratches/scratch_27.py", line 25, in <module>
    df_all_changes = pd.concat([old, new], axis='columns', keys=['old','new'], join='inner')
  File "/home/asd/archiconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/asd/archiconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 307, in concat
    return op.get_result()
  File "/home/asd/archiconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 528, in get_result
    indexers[ax] = obj_labels.get_indexer(new_labels)
  File "/home/asd/archiconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3442, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
ChatGPT:
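Judging from the error message, the problem appears to be caused by non-unique index values during reindexing. Please check whether the old and new DataFrames have the same column names; if so, rename the columns in one of them to ensure uniqueness. You can also try deduplicating the index column of both DataFrames so that every index value is unique, for example: df.drop_duplicates().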
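Before renaming anything, it may help to confirm whether the index really contains duplicates. Note that df.drop_duplicates() compares the values in the columns, not the index labels, so after set_index it will not by itself remove repeated ids. Below is a minimal sketch. It assumes (this is only a guess, not something the post confirms) that blank cells in the Linux copy of the workbook are turned into repeated 0 ids by fillna(0). The DataFrame `raw` is a hypothetical stand-in for pd.read_excel(path_new).

```python
import pandas as pd

key = ["id"]

# Hypothetical stand-in for pd.read_excel(path_new): two blank cells in "id"
raw = pd.DataFrame({"id": [1, None, 3, None]})

# Same steps as the posted script: fill NA with 0, then set the index
indexed = raw.fillna(0).set_index(key)

# False here means the index has repeated labels, which is what the
# InvalidIndexError message complains about
print(indexed.index.is_unique)

# List the id values that occur more than once
duplicate_ids = indexed.index[indexed.index.duplicated(keep=False)].unique().tolist()
print(duplicate_ids)
```

Running the same two checks on the real old and new frames, right after set_index, should show which file (if either) carries the repeated ids.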
e/asd/archiconda3/lib/python3.7/site-packages/openpyxl/worksheet/header_footer.py:48: UserWarning: Cannot parse header or footer so it will be ignored
warn("""Cannot parse header or footer so it will be ignored""")
Traceback (most recent call last):
  File "/home/asd/Downloads/test/定稿的代码.py", line 2694, in <module>
    old=old.set_index(key, drop_duplicates=True)
  File "/home/asd/archiconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
TypeError: set_index() got an unexpected keyword argument 'drop_duplicates'
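set_index() has no drop_duplicates argument, which is why the call above raises a TypeError. A possible alternative, sketched under the assumption that keeping only the first row for each id is acceptable, is to drop duplicate keys before setting the index. Passing verify_integrity=True makes set_index raise if any duplicate remains. The sample frames, the `value` column, and the helper name `to_unique_index` are hypothetical and only serve the illustration.

```python
import pandas as pd

key = ["id"]

# Hypothetical sample data; the real frames come from pd.read_excel
old = pd.DataFrame({"id": [1, 2, 2, 3], "value": ["a", "b", "c", "d"]})
new = pd.DataFrame({"id": [1, 0, 0, 3], "value": ["a", "x", "y", "d"]})


def to_unique_index(df, key):
    """Keep the first row per key, then use the key column(s) as the index."""
    deduplicated = df.drop_duplicates(subset=key, keep="first")
    # verify_integrity=True raises ValueError if duplicates are still present
    return deduplicated.set_index(key, verify_integrity=True)


old_u = to_unique_index(old, key)
new_u = to_unique_index(new, key)

# The same concat call as in the traceback, now on uniquely indexed frames
df_all_changes = pd.concat([old_u, new_u], axis="columns", keys=["old", "new"], join="inner")
print(df_all_changes)
```

Silently dropping rows can hide real data problems, so it is probably worth inspecting the duplicated ids first and only then deciding which row to keep.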