Batch-converting WPS files to TXT

fryant posted on 2023-5-17 21:25

Batch-converting WPS files to TXT with Python
Here's the code~
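import os
import chardet

# Directory to scan (adjust to your own path)
file_path = r"C:\Users\Fryant\Desktop\data\policy\wps"

# Walk every file under the directory
for root, dirs, files in os.walk(file_path):
    for file in files:
        # Only handle files with a .wps extension
        if file.endswith(".wps"):
            # Read the raw bytes of the WPS file
            with open(os.path.join(root, file), "rb") as f:
                content = f.read()

            # Detect the encoding; chardet returns None when it cannot tell,
            # so fall back to UTF-8 in that case
            encoding = chardet.detect(content)["encoding"] or "utf-8"

            # Decode the bytes; undecodable bytes are replaced instead of raising
            content_str = content.decode(encoding, errors="replace")

            # Save the text next to the source file as .txt
            with open(os.path.join(root, file[:-4] + ".txt"), "w", encoding="utf-8") as f:
                f.write(content_str)
print("Processing complete")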
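
A side note on the file name handling above: `file.endswith(".wps")` is case-sensitive, so a file named `REPORT.WPS` would be skipped, and `file[:-4]` hardcodes the extension length. Below is a minimal sketch of one alternative using `pathlib`. The helper name `txt_target` is hypothetical and not part of the original script.

```python
from pathlib import Path

def txt_target(path):
    """Return the .txt output path for a .wps file (any letter case), else None."""
    p = Path(path)
    # Compare the extension case-insensitively
    if p.suffix.lower() != ".wps":
        return None
    # Swap only the final extension, leaving the rest of the name untouched
    return p.with_suffix(".txt")
```

Inside the loop, `txt_target(os.path.join(root, file))` could then replace both the `endswith` check and the slice.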

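TR小米哥 posted on 2023-5-22 16:29

I don't see any obvious errors or shortcomings, but there is room for some improvement and optimization.

1. When processing a large number of files, consider using multithreading or an asynchronous approach to improve throughput.

2. When working with files, use a context manager (the with statement) so that each file is closed properly after use.

3. For encoding detection, you could try the third-party library chardet2, which is said to be more accurate than chardet.

4. When saving the content to a txt file, use a context manager and specify the encoding explicitly. Also consider using os.path.splitext() to split the file name from its extension, to avoid hardcoding the extension length.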
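```
import os
import concurrent.futures

try:
    # Detector suggested in point 3; assumed to expose a chardet-compatible detect()
    import chardet2 as detector
except ImportError:
    import chardet as detector

# Directory to scan (adjust to your own path)
file_path = r"C:\Users\Fryant\Desktop\data\policy\wps"

def process_file(file):
    # Only handle files with a .wps extension
    if file.endswith(".wps"):
        # Read the raw bytes of the WPS file
        with open(file, "rb") as f:
            content = f.read()

        # Detect the encoding, falling back to UTF-8 if detection returns None
        encoding = detector.detect(content)["encoding"] or "utf-8"

        # Decode the bytes; undecodable bytes are replaced instead of raising
        content_str = content.decode(encoding, errors="replace")

        # splitext returns (root, ext); keep the root and append .txt
        with open(os.path.splitext(file)[0] + ".txt", "w", encoding="utf-8") as f:
            f.write(content_str)

# Process files on a thread pool
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Walk every file under the directory
    for root, dirs, files in os.walk(file_path):
        for file in files:
            file = os.path.join(root, file)
            executor.submit(process_file, file)

print("Processing complete")
```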
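
One caveat with the threaded version above: an exception raised inside a task started with `executor.submit` is stored on the returned `Future` and is not shown unless `result()` is called, so a file that fails to convert would fail silently. Below is a minimal sketch of collecting those errors. The names `run_all` and `worker` are hypothetical, chosen only for illustration.

```python
import concurrent.futures

def run_all(worker, items):
    """Run worker(item) on a thread pool; return (results, errors) keyed by item."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Remember which item each future belongs to
        futures = {executor.submit(worker, item): item for item in items}
        for future in concurrent.futures.as_completed(futures):
            item = futures[future]
            try:
                # result() re-raises any exception from the worker thread
                results[item] = future.result()
            except Exception as exc:
                errors[item] = exc
    return results, errors
```

With something like this, `run_all(process_file, paths)` would return the failed paths alongside their exceptions, so they could be printed before the final completion message.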

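龍謹 posted on 2023-5-18 06:55

Thanks for sharing. I'll study it and put it to use.

pdfhvy141 posted on 2023-5-18 07:29

Thanks for sharing, OP. Could you post a finished build?

wobzhidao posted on 2023-5-18 08:30

This is great, it saves a lot of trouble.

mac666 posted on 2023-5-18 08:33

Thanks for sharing.

Easonll posted on 2023-5-18 08:49

Can it convert images too?

yulinsoft posted on 2023-5-18 08:49

lingwushexi posted on 2023-5-18 09:01

Thanks for sharing, I'll learn from this.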

fryant posted on 2023-5-19 14:56

yulinsoft posted on 2023-5-18 08:49
Doesn't work,
Traceback (most recent call last):
File "C:%users\liwei\Desktop\wps2txt.py", line 20, i ...

Try changing encoding to 'utf-8', 'ascii', or 'latin-1'~
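
A minimal sketch of that suggestion, trying each candidate encoding in turn instead of editing the script by hand. The helper name `decode_with_fallback` is hypothetical and not part of the original script. Note that 'latin-1' maps every possible byte, so it never fails, but it can produce garbled text for documents that are really in another encoding. For Chinese documents, putting 'gb18030' ahead of it in the list may be worth trying, though that is an assumption about the input files.

```python
def decode_with_fallback(data, encodings=("utf-8", "ascii", "latin-1")):
    """Try each encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Reached only if no listed encoding worked (e.g. latin-1 was left out)
    return data.decode("utf-8", errors="replace"), "utf-8 (with replacement)"
```

In the script, `content_str, used = decode_with_fallback(content)` could take the place of the `chardet` detection and the decode step.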

fryant posted on 2023-5-19 14:57

Easonll posted on 2023-5-18 08:49
Can it convert images too?

Images aren't supported~


吾爱破解 - 52pojie.cn's Archiver
