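The encoding that chardet detects is not reliable enough for reading a CSV with pandas: even when the read succeeds, the Chinese text comes out garbled.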
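The garbling can be reproduced without pandas or chardet. A minimal sketch, assuming the file is GB-encoded (the sample text and the helper `first_strict_match` are made up for illustration): iso-8859-1 maps every byte to some character, so decoding with it never raises and simply produces mojibake, whereas a strict codec such as utf-8 raises, which is what lets a fallback loop move on to the next candidate.

```python
# Hypothetical sample: two CSV rows of Chinese text, encoded the way a GB file would be.
text = "姓名,城市\n张三,北京\n"
raw = text.encode("gb18030")

# iso-8859-1 (latin1) accepts every byte value, so this "succeeds" but yields mojibake.
garbled = raw.decode("iso-8859-1")


def first_strict_match(data, candidates):
    """Return (codec, text) for the first candidate that decodes data without error."""
    for name in candidates:
        try:
            return name, data.decode(name)
        except (UnicodeError, LookupError):
            # UnicodeError: wrong codec for these bytes; LookupError: unknown codec name.
            continue
    return None, None


# utf-8 rejects these bytes, gb18030 decodes them; iso-8859-1 is kept as the last resort.
codec, decoded = first_strict_match(raw, ["utf-8", "gb18030", "iso-8859-1"])
print(codec, decoded == text, garbled == text)
```

The order of the candidates matters: with iso-8859-1 at the front of the list, the loop would stop at the first attempt and return the garbled string.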
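import chardet
import pandas as pd

def detect_encoding(input_file, sample_size=100000):
    # Sample a large chunk rather than one line: an ASCII-only first row says nothing about the rest.
    with open(input_file, 'rb') as f:
        data = f.read(sample_size)
    return chardet.detect(data)['encoding']  # can be None

def read_csv(input_file, cellNum=None):
    detected = detect_encoding(input_file)
    # Strict codecs first (the order is a heuristic). iso-8859-1 (= latin1) decodes any
    # byte sequence without error, so it must come last or Chinese text turns into mojibake.
    # gb18030 is a superset of GB2312; 'ANSI' may raise LookupError outside Windows.
    encodes = ['utf-8-sig', detected, 'gb18030', 'big5', 'utf-16', 'ANSI', 'iso-8859-1']
    for encode in encodes:
        if not encode:
            continue
        try:
            with open(input_file, 'r', encoding=encode) as f:
                return pd.read_csv(f, header=None, usecols=cellNum)
        except (UnicodeError, LookupError):
            continue
    raise ValueError('no candidate encoding could decode ' + str(input_file))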
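header=None: read the first row as data instead of using it as the column names.
usecols=[...]: the columns to read; for example, [0, 2] reads only column 0 and column 2.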
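To make the two parameters concrete, here is a small sketch that reads an in-memory CSV instead of a file. The sample rows and the variable names are invented for illustration; only `header` and `usecols` come from the notes above.

```python
import io

import pandas as pd

# Hypothetical three-column CSV held in memory, so no encoding guesswork is involved.
csv_text = "姓名,城市,年龄\n张三,北京,30\n李四,上海,25\n"

# header=None: the first row stays in the data and the columns are labeled by position.
all_rows = pd.read_csv(io.StringIO(csv_text), header=None)

# usecols=[0, 2]: keep only the first and third columns.
subset = pd.read_csv(io.StringIO(csv_text), header=None, usecols=[0, 2])

print(all_rows.shape)
print(list(subset.columns))
```

Because the first row is treated as data, a column that mixes the header text with numbers is read as strings; dropping header=None would let pandas use that row as column names instead.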