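A folder contains many Word documents written in Traditional Chinese, and they all need to be batch-converted to Simplified Chinese: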
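The OpenCC library can do this. OpenCC (Open Chinese Convert) is an open-source library for converting between Traditional and Simplified Chinese. It aims for high-quality results by converting whole phrases as well as single characters. Its core is written in C++, and bindings or ports exist for Python, Java, JavaScript and other languages, so developers with different backgrounds can integrate it into their own applications.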
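Roughly speaking, an OpenCC conversion configuration such as `t2s` first splits the text into the longest phrases it finds in a phrase dictionary and only then falls back to single-character mappings. The toy sketch below illustrates that idea in plain Python. The dictionaries `PHRASES` and `CHARS` and the function `toy_t2s` are hypothetical stand-ins, not OpenCC's data files or API. The example relies on the fact that 乾 usually becomes 干 in Simplified Chinese, while the name 乾隆 keeps 乾.

```python
# Toy illustration only: PHRASES and CHARS are hypothetical mini dictionaries,
# not OpenCC's real data files.
PHRASES = {"乾隆": "乾隆"}                    # phrase entries take priority
CHARS = {"乾": "干", "漢": "汉", "體": "体"}  # single-character fallback
MAX_PHRASE_LEN = max(len(p) for p in PHRASES)


def toy_t2s(text: str) -> str:
    """Convert text by greedy longest match: phrases first, then single characters."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest phrase that starts at position i.
        for size in range(min(MAX_PHRASE_LEN, len(text) - i), 1, -1):
            piece = text[i:i + size]
            if piece in PHRASES:
                out.append(PHRASES[piece])
                i += size
                break
        else:
            # No phrase matched: fall back to the character table.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(toy_t2s("乾隆"))      # 乾隆 (the phrase entry wins)
print(toy_t2s("乾燥"))      # 干燥 (character fallback)
print(toy_t2s("漢字字體"))  # 汉字字体
```

Because a converter of this kind works on phrases rather than on isolated characters, code that edits a document in place should not assume the output always has exactly the same length as the input.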
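Enter the following prompt in ChatGPT:
Write a Python script that converts Traditional Chinese to Simplified Chinese, following these steps:
1. Open the folder "F:\aivideo";
2. Use the win32com library to read every .docx document in it;
3. Use the OpenCC library to convert the Traditional characters in each .docx document to Simplified characters;
4. Convert only the characters and keep the original layout unchanged: images, tables, formatting and everything else in the original Word documents must stay exactly as they are.
Note: print a message to the screen at every step.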
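To keep the images, tables and formatting unchanged during conversion, the script must modify only the text itself and leave everything else in the document alone. In particular, assigning a new string to a whole paragraph's `Range.Text` in Word replaces the entire range, which can delete inline images and flatten mixed character formatting, so it is safer to rewrite only the characters that actually change.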
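Touching only the text can be made concrete by comparing a paragraph's text before and after conversion and collecting just the positions that changed. Each of those positions can then be rewritten individually (in Word, through the paragraph range's `Characters` collection), while unchanged characters and whatever formatting or objects they carry stay as they are. A minimal pure-Python sketch of the comparison step; `changed_positions` is a hypothetical helper name, and `"\r"` stands for Word's paragraph mark:

```python
def changed_positions(original: str, converted: str):
    """Return (1-based index, new_char) pairs for every character that differs.

    Returns None when the lengths differ, because positions in `original`
    would then no longer line up with positions in `converted`.
    """
    if len(original) != len(converted):
        return None
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(original, converted), start=1)
        if old != new
    ]


# Paragraph text as Word reports it, ending with the paragraph mark "\r".
print(changed_positions("繁體中文\r", "繁体中文\r"))  # [(2, '体')]
```

Returning `None` on a length mismatch lets the caller skip or report that paragraph instead of overwriting it blindly. The 1-based indices match the way Word's collections are indexed.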
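In a Python environment, install OpenCC with pip. The script also imports `win32com`, which is provided by the pywin32 package. Because it drives Microsoft Word through COM, it only runs on Windows with Word installed:
```shell
pip install opencc
pip install pywin32
```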
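Since the pip package name (pywin32) differs from the import name (`win32com`), a small standard-library check can print which packages are still missing before the conversion starts. This is a sketch; `REQUIRED` and `missing_packages` are hypothetical names:

```python
import importlib.util

# Import name -> pip package that provides it.
REQUIRED = {"opencc": "opencc", "win32com": "pywin32"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose top-level module cannot be found."""
    return [pkg for mod, pkg in required.items() if importlib.util.find_spec(mod) is None]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```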
Source code:
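```python
import os

from opencc import OpenCC
from win32com import client


def convert_text_traditional_to_simplified(text, opencc):
    """Convert text from Traditional Chinese to Simplified Chinese using OpenCC."""
    return opencc.convert(text)


def positions_line_up(chars, targets):
    """Check that Word's Characters collection matches the Python string at each target."""
    try:
        return all(chars.Item(i).Text == old for i, old, _ in targets)
    except Exception:  # e.g. index out of range when fields shift positions
        return False


def process_word_document(word, doc_path, output_path, opencc):
    print(f"Reading file: {doc_path}")
    doc = word.Documents.Open(doc_path)
    try:
        # doc.Paragraphs also contains the paragraphs inside tables,
        # so a separate loop over doc.Tables is not needed.
        for paragraph in doc.Paragraphs:
            rng = paragraph.Range
            original_text = rng.Text
            converted_text = convert_text_traditional_to_simplified(original_text, opencc)
            if original_text == converted_text:
                continue
            if len(original_text) != len(converted_text):
                print(f"  Skipped (length changed): {original_text[:30]!r}")
                continue
            # Rewrite only the characters that changed, so every other character
            # keeps its formatting and inline images in the paragraph are untouched.
            targets = [
                (i, old, new)
                for i, (old, new) in enumerate(zip(original_text, converted_text), start=1)
                if old != new
            ]
            chars = rng.Characters
            if not positions_line_up(chars, targets):
                print(f"  Skipped (positions do not line up): {original_text[:30]!r}")
                continue
            for i, _, new in targets:
                chars.Item(i).Text = new
        print(f"Saving converted file to: {output_path}")
        doc.SaveAs(output_path)
    finally:
        doc.Close(False)  # close without saving changes to the original


def process_folder(folder_path):
    folder_path = os.path.abspath(folder_path)  # Word needs absolute paths
    if not os.path.isdir(folder_path):
        print(f"Folder does not exist: {folder_path}")
        return
    # Skip Word's "~$" lock files and the output of earlier runs.
    docx_files = [
        f for f in os.listdir(folder_path)
        if f.lower().endswith(".docx") and not f.startswith(("~$", "converted_"))
    ]
    if not docx_files:
        print("No .docx files found in the folder")
        return
    opencc = OpenCC("t2s")  # Traditional -> Simplified
    # Start Word once for the whole batch instead of once per file.
    print("Starting Word...")
    word = client.Dispatch("Word.Application")
    word.Visible = False
    try:
        for docx_file in docx_files:
            docx_path = os.path.join(folder_path, docx_file)
            output_path = os.path.join(folder_path, f"converted_{docx_file}")
            process_word_document(word, docx_path, output_path, opencc)
            print(f"Conversion finished: {output_path}")
    finally:
        word.Quit()


if __name__ == "__main__":
    # Folder to process
    process_folder(r"F:\aivideo")
```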
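A .docx file is a ZIP package whose embedded images are normally stored under `word/media/`. A rough sanity check after the batch run is therefore to compare how many media files each original and its `converted_` copy contain. This only counts files and says nothing about layout; `media_count` and `check_images_kept` are hypothetical helper names:

```python
import zipfile


def media_count(docx_path):
    """Count the files stored under word/media/ (embedded images etc.) in a .docx package."""
    with zipfile.ZipFile(docx_path) as zf:
        return sum(
            1 for name in zf.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        )


def check_images_kept(original_path, converted_path):
    """Print and return whether both documents contain the same number of media files."""
    before = media_count(original_path)
    after = media_count(converted_path)
    ok = before == after
    print(f"{converted_path}: media files {before} -> {after}", "OK" if ok else "MISMATCH")
    return ok
```

For example, `check_images_kept(r"F:\aivideo\example.docx", r"F:\aivideo\converted_example.docx")`, where `example.docx` is a placeholder file name.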
The Word document after conversion: