# Word to PDF with Python on Linux
For custom requirements, feel free to contact me by private message.
I also have a PDF-to-Docx solution that works well; contact me by private message if you are interested.

## Introduction

Goal: convert Word documents to PDF with a solution that can be deployed on Linux.

Test file: 1.docx (0118)

On Linux, WPS + pywpsrpc is recommended.

## Comparison

### win32com

Code:

```python
from pathlib import Path

from win32com.client import Dispatch, constants, gencache

docx_path = str(Path('1.docx').absolute())
pdf_path = str(Path('1.pdf').absolute())

# Generate the Word type-library wrapper so that `constants` is populated
gencache.EnsureModule('{00020905-0000-0000-C000-000000000046}', 0, 8, 4)
wd = Dispatch('Word.Application')
doc = wd.Documents.Open(docx_path, ReadOnly=1)
doc.ExportAsFixedFormat(
    pdf_path,
    constants.wdExportFormatPDF,
    Item=constants.wdExportDocumentWithMarkup,
    CreateBookmarks=constants.wdExportCreateHeadingBookmarks,
)
wd.Quit(constants.wdDoNotSaveChanges)  # quit Word without saving the source
```

Result: [screenshots of pages 1 and 2]

Pros: concise code, good output.
Cons: requires Microsoft Word; does not run on Linux.

### comtypes

Install:

```bash
pip install comtypes
```

Code:

```python
from pathlib import Path

import comtypes.client

word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(str(Path('1.docx').absolute()))
wdFormatPDF = 17  # value from the WdSaveFormat enumeration
doc.SaveAs(str(Path('1.pdf').absolute()), FileFormat=wdFormatPDF)
doc.Close()
word.Quit()
```

Result: [screenshots of pages 1 and 2]

Pros: concise code, good output.
Cons: requires Microsoft Word; does not run on Linux.

The value 17 for wdFormatPDF comes from the WdSaveFormat enumeration.

### docx2pdf

docx2pdf works by driving Microsoft Word, so Word must be installed. It uses win32com on Windows and JXA on macOS.

Install:

```bash
pip install docx2pdf
```

Command line:

```bash
docx2pdf 1.docx
```

Code:

```python
from docx2pdf import convert

convert('1.docx')  # writes 1.pdf next to the input file
print('Conversion finished')
```

Result: [screenshots of pages 1 and 2]

Pros: fast, good output.
Cons: requires Microsoft Word; does not run on Linux.

### AbiWord

On Linux:

Install:

```bash
sudo apt-get install abiword
```

Command line:

```bash
abiword -t pdf 1.docx
```

Code:

```python
import subprocess

file = '1.docx'
# -t pdf: convert the input file to PDF; check=True raises if abiword fails
subprocess.run(['abiword', '-t', 'pdf', file], check=True)
print('Conversion finished')
```

For details on the command-line arguments, see: Command line options.

Result: [screenshots of pages 1, 2 and 3]

Pros: small footprint, easy to install.
Cons: no Windows support; the export may be incomplete.

### LibreOffice

#### Windows

Install LibreOffice, then add `C:\Program Files\LibreOffice\program` to the PATH environment variable.

Command line:

```
soffice.exe --convert-to pdf 1.docx
```

Code:

```python
import subprocess

file = '1.docx'
# soffice.exe must be on PATH; the PDF is written to the current directory
subprocess.run(['soffice.exe', '--convert-to', 'pdf', file], check=True)
print('Conversion finished')
```

#### Linux

Find the right files on the "Index of libreoffice" mirror listing (x.x.x is the concrete version, e.g. 7.1.8):

- LibreOffice_x.x.x_Linux_x86-64_deb.tar.gz
- LibreOffice_x.x.x_Linux_x86-64_deb_sdk.tar.gz
- LibreOffice_x.x.x_Linux_x86-64_deb_langpack_zh-CN.tar.gz

Install:

```bash
mkdir -p ~/download/libreoffice/deb
cd ~/download/libreoffice
wget http://mirrors.ustc.edu.cn/tdf/libreoffice/stable/7.1.8/deb/x86_64/LibreOffice_7.1.8_Linux_x86-64_deb.tar.gz
wget http://mirrors.ustc.edu.cn/tdf/libreoffice/stable/7.1.8/deb/x86_64/LibreOffice_7.1.8_Linux_x86-64_deb_sdk.tar.gz
wget http://mirrors.ustc.edu.cn/tdf/libreoffice/stable/7.1.8/deb/x86_64/LibreOffice_7.1.8_Linux_x86-64_deb_langpack_zh-CN.tar.gz
tar -zxvf LibreOffice_7.1.8_Linux_x86-64_deb.tar.gz -C deb
tar -zxvf LibreOffice_7.1.8_Linux_x86-64_deb_sdk.tar.gz -C deb
tar -zxvf LibreOffice_7.1.8_Linux_x86-64_deb_langpack_zh-CN.tar.gz -C deb
# the archives were extracted into deb/, so the package paths start with deb/
sudo dpkg -i deb/LibreOffice_7.1.8.1_Linux_x86-64_deb/DEBS/*.deb
sudo dpkg -i deb/LibreOffice_7.1.8.1_Linux_x86-64_deb_sdk/DEBS/*.deb
sudo dpkg -i deb/LibreOffice_7.1.8.1_Linux_x86-64_deb_langpack_zh-CN/DEBS/*.deb
```

Command line:

```bash
soffice --headless --convert-to pdf 1.docx
# or
libreoffice --headless --convert-to pdf 1.docx
```

Result: [screenshots of pages 1, 2 and 3]

Delete the downloaded packages:

```bash
cd ~
rm -r ~/download/libreoffice
```

TODO: uninstall. The packages have to be removed one by one, and there are many of them:

```bash
dpkg -l | grep LibreOffice
sudo dpkg -r xxx
```

#### Command-line parameters

For details, see LibreOffice command line parameters. To set the output location (a directory, not a file name), use `--outdir`.

### ~~wvPDF~~

Install:

```bash
sudo apt-get install wv texlive-base texlive-latex-base ghostscript
```

Command line:

```bash
wvPDF 1.docx 1.pdf
```

Result: this document could not be exported. wv is built to read the legacy binary .doc format, which may explain the failure with a .docx file.

### ~~aspose-words~~

Install:

```bash
pip install aspose-words
```

Code:

```python
import aspose.words as aw

doc = aw.Document('1.docx')
doc.save('1.pdf')  # the output format is inferred from the .pdf extension
print('Conversion finished')
```

Result: [screenshots of pages 1, 2 and 3]

### ~~unoconv~~

Install:

```bash
sudo apt-get install unoconv
```

Command line:

```bash
unoconv -fpdf 1.docx
# or
doc2pdf 1.docx
```

Result: [screenshots of pages 1, 2 and 3]

### ShowAPI (易源数据) API

Code:

```python
import base64

import requests

with open('1.docx', 'rb') as f:
    encoded = base64.b64encode(f.read())

url = 'https://word2pdf.showapi.com/word2pdf'
appcode = 'your AppCode'
headers = {
    'Authorization': 'APPCODE ' + appcode,
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
}
# verify=False disables TLS certificate verification
r = requests.post(url, data={'base64': encoded}, headers=headers, verify=False)
print(r)
print(r.json()['showapi_res_body']['url'])  # URL returned by the API
```

Result: [screenshots of pages 1, 2, 3 and 4]

### 九云图 (9yuntu) API

Create a conversion job:

```python
import requests

url = 'https://api.docsdk.com/v2/jobs'
apiKey = 'your apiKey'
data = {
    'tasks': {
        'ImportURL': {
            'operation': 'import/url',
            'url': 'https://image2.9yuntu.cn/resources/api/九云图API使用说明.docx'
        },
        'ConvertFile': {
            'input': ['ImportURL'],
            'operation': 'convert',
            'output_format': 'pdf'
        },
        'ExportResult': {
            'input': ['ConvertFile'],
            'operation': 'export/url'
        }
    }
}
headers = {'Authorization': 'Bearer ' + apiKey}
# json= serializes the body and sets Content-Type: application/json
r = requests.post(url, json=data, headers=headers)
print(r)
result = r.json()
print(result)
print(result['data']['status'])
print(result['data']['job_id'])
```

Poll for the conversion result (paste the job_id from the previous step):

```python
import requests

jobId = 'job_id from the previous step'
url = f'https://api.docsdk.com/v2/jobs/{jobId}'
apiKey = 'your apiKey'
headers = {'Authorization': 'Bearer ' + apiKey}
r = requests.get(url, headers=headers)
print(r)
result = r.json()
print(result)
status = result['data']['status']
print(status)
if status == 'finished':
    tasks = result['data']['tasks']
    for task in tasks:
        # guard against tasks that carry no result files
        files = (task.get('result') or {}).get('files', [])
        print(task['id'], task['status'], str(task['percent']) + '%')
        for index, file in enumerate(files):
            print(index, file['filename'], file.get('url', 'no download URL'))
```

Install the Python SDK:

```bash
pip install docsdk
```

Wrapper using the SDK:

```python
import docsdk

docsdk.configure(api_key='your apiKey')
payload = {
    'tasks': {
        'ImportURL': {
            'operation': 'import/url',
            'url': 'https://image2.9yuntu.cn/resources/api/九云图API使用说明.docx'
        },
        'ConvertFile': {
            'input': ['ImportURL'],
            'operation': 'convert',
            'output_format': 'pdf',
        },
        'ExportResult': {
            'input': ['ConvertFile'],
            'operation': 'export/url'
        }
    }
}
res = docsdk.Job.create(payload=payload)  # create the conversion job
print(res)
job_id = res['id']
res = docsdk.Job.wait(job_id)  # wait for the conversion result
file = res['tasks'][-1]['result']['files'][0]
filename = docsdk.download(url=file['url'], filename=file['filename'])  # download the file
print(filename)
```

Result: [screenshots of pages 1 and 2]

### WPS + pywpsrpc

This method requires X11.

Check whether X11 is installed:

```bash
dpkg -l | grep xserver-xorg-core
```

Check whether X11 forwarding is enabled for SSH:

```bash
cat /etc/ssh/sshd_config | grep X11Forwarding
```

Check the architecture:

```bash
dpkg --print-architecture
```

Download WPS Office 2019 for Linux for your architecture. Mine is amd64, so I chose X64.

Install. The package I actually installed was https://wdl1.pcfg.cache.wpscdn.com/wpsdl/wpsoffice/download/linux/10920/wps-office_11.1.0.10920.XA_amd64.deb; the commands below use the newer link:

```bash
mkdir -p ~/download/wps
cd ~/download/wps
wget https://wdl1.cache.wps.cn/wps/download/ep/Linux2019/10920/wps-office_11.1.0.10920_amd64.deb
sudo dpkg -i wps-office_11.1.0.10920_amd64.deb
sudo apt-get install qt5-default
pip3 install pywpsrpc
wget https://raw.githubusercontent.com/timxx/pywpsrpc/master/examples/rpcwpsapi/convertto/convertto.py
```

Open WPS once (its interface should be in Chinese):

```bash
wps 1.docx
```

When the dialog pops up, accept it. If nothing pops up, X11 needs to be installed; see "Linux安装X11实现GUI".

At this point the following error may be reported; simply run it again:

```
Convert failed:
Details: Can't get the application
ErrCode: 0x80000008
```
Install headless mode (a dummy X video driver):

```bash
sudo apt install xserver-xorg-video-dummy
vim dummy.conf
```

Enter the following content:

```
Section "Monitor"
    Identifier "dummy_monitor"
    HorizSync 28.0-80.0
    VertRefresh 48.0-75.0
    Modeline "1920x1080" 172.80 1920 2040 2248 2576 1080 1081 1084 1118
EndSection

Section "Device"
    Identifier "dummy_card"
    VideoRam 256000
    Driver "dummy"
EndSection

Section "Screen"
    Identifier "dummy_screen"
    Device "dummy_card"
    Monitor "dummy_monitor"
    SubSection "Display"
    EndSubSection
EndSection
```

Start the virtual display:

```bash
X :0 -config dummy.conf
```

Open a second SSH session and run the conversion without a GUI:

```bash
export DISPLAY=localhost:0.0
echo $DISPLAY
python3 convertto.py -f pdf 1.docx
```

Result: [screenshots of pages 1 and 2]

Delete the downloaded package:

```bash
cd ~
rm -r ~/download/wps
```

Uninstall:

```bash
dpkg -l | grep wps-office
sudo dpkg -r wps-office
sudo apt-get --purge remove wps-office
```

#### Supervisor daemon

Use Supervisor to keep the virtual X server running as a daemon. Supervisor itself must already be installed.

Create the X11 configuration file xorg.conf:

```bash
sudo vim /usr/share/X11/xorg.conf.d/xorg.conf
```

Fill it with the same content as dummy.conf above. For the meaning of each option, see xorg.conf.

Create the Supervisor program configuration file:

```bash
sudo vim /etc/supervisor/conf.d/X11.conf
```

Fill in:

```ini
[program:X11]
command=X :0 -config /usr/share/X11/xorg.conf.d/xorg.conf
autostart=true
autorestart=true
startsecs=10
stdout_logfile=/tmp/X11.stdout.log
stderr_logfile=/tmp/X11.stderr.log
```

Reload the configuration and update the programs:

```bash
supervisorctl reread
supervisorctl update
```

Check the process status:

```bash
ps -ef | grep Xorg
supervisorctl status | grep X11
```

#### Wrapper

```python
import argparse
import os
from pathlib import Path

from pywpsrpc.common import S_OK
from pywpsrpc.rpcwpsapi import createWpsRpcInstance, wpsapi

os.environ['DISPLAY'] = ':0.0'  # point at the virtual X display started above

formats = {
    'doc': wpsapi.wdFormatDocument,
    'docx': wpsapi.wdFormatXMLDocument,
    'rtf': wpsapi.wdFormatRTF,
    'html': wpsapi.wdFormatHTML,
    'pdf': wpsapi.wdFormatPDF,
    'xml': wpsapi.wdFormatXML,
}


class ConvertException(Exception):
    def __init__(self, text, hr):
        super().__init__(text)
        self.text = text
        self.hr = hr

    def __str__(self):
        return 'Convert failed:\nDetails: {}\nErrCode: {}'.format(self.text, hex(self.hr & 0xFFFFFFFF))


def convert_to(paths, format='pdf'):
    hr, rpc = createWpsRpcInstance()
    if hr != S_OK:
        raise ConvertException('Can not create the rpc instance', hr)

    hr, app = rpc.getWpsApplication()
    if hr != S_OK:
        raise ConvertException('Can not get the application', hr)

    app.Visible = False  # no GUI needed
    try:
        docs = app.Documents
        for path in paths:
            # absolute path, so WPS saves next to the source file
            path = Path(path).absolute()
            if not path.is_file():
                continue  # skip paths that are not files
            hr, doc = docs.Open(str(path), ReadOnly=True)
            if hr != S_OK:
                raise ConvertException('can not open file {}'.format(path.name), hr)
            try:
                new_file = str(path.with_suffix('.' + format))
                hr = doc.SaveAs2(new_file, FileFormat=formats[format])
                if hr != S_OK:
                    raise ConvertException('convert_file failed', hr)
            finally:
                doc.Close(wpsapi.wdDoNotSaveChanges)
    finally:
        app.Quit()  # always shut WPS down, even after an error


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('path', nargs='+')
    args = parser.parse_args()
    paths = args.path
    print(paths)
    try:
        convert_to(paths)
        print('Conversion finished')
    except ConvertException as e:
        print(e)
```

## Pitfalls

1. Error `NotImplementedError: docx2pdf is not implemented for linux as it requires Microsoft Word to be installed`: docx2pdf cannot run on Linux.

2. Error `unoconv: Cannot find a suitable pyuno library and python binary combination in /usr/lib/libreoffice`: run `sudo vim /usr/bin/unoconv` and change the first line from `#!/usr/bin/env python3` to `#!/usr/bin/python3`.

3. Error `The program 'libreoffice' is currently not installed. To run 'libreoffice' please ask your administrator to install the package 'libreoffice-common'`:

   ```bash
   sudo apt-get install libreoffice-common
   ```

4. Error `Error: source file could not be loaded`:

   ```bash
   sudo apt-get install libreoffice-writer
   ```

5. With the WPS approach, the test script converts correctly, but the deployed code reports `Can not get the application`.
   - Check that PATH contains the directory WPS is installed in, /usr/bin:

     ```python
     import os

     path_dirs = os.environ['PATH'].split(os.pathsep)
     print(path_dirs)
     print('/usr/bin' in path_dirs)
     ```

   - Concurrent multithreaded conversion may not be supported.
   - X11 may not work for a different user, or for a user without sudo privileges; see "Can't start X11 applications after "su" or "su -" to another user".

6. With the WPS approach, the error `convert_file failed ErrCode: 0x80010105` means there is no write permission.

7. WPS may have a bug that requires switching to multi-module mode. To enable it: wps → settings button at the top right → Settings → Others → Change window manage mode… → select [Multi-Module Mode].

8. `Errors were encountered while processing`:

   ```bash
   cd /var/lib/dpkg
   sudo mv info info.bak
   sudo mkdir info
   sudo apt-get update
   ```

## References

- pywpsrpc GitHub
- wpsrpc-sdk GitHub
- wps_cpp
- WPS 开放平台 (WPS Open Platform)
- WPS C++ 集成源码 (WPS C++ integration source code)
- WPSOffice二次开发帮助文档 (WPS Office extension development help)
- docx2pdf GitHub
- Converting docx to pdf with pure python (on linux, without libreoffice)
- LibreOffice command line parameters
- How to convert Word (doc) to PDF in linux
- unoconv: Cannot find a suitable pyuno library and python binary combination · Issue #49 · unoconv/unoconv
- AbiWord vs LibreOffice 2022 Comparison
- Linux 下的LibreOffice安装 (Installing LibreOffice on Linux)
- pywpsrpc Run on Server
- Linux deb 软件包管理 (Managing deb packages on Linux)
- Linux安装X Window服务——远程显示GUI (Installing the X Window service on Linux to display a GUI remotely)
- linux服务器通过X11实现图形化界面显示 (Displaying a GUI from a Linux server via X11)
- Error: source file could not be loaded
- Python 中docx转pdf (docx to PDF in Python)
- comtypes Documentation
- comtypes GitHub
- ubuntu 下使用python操作wps文档和表格 (Working with WPS documents and spreadsheets from Python on Ubuntu)
- python 设置linux环境变量 (Setting Linux environment variables from Python)
- python 如何设置linux环境变量? (How to set Linux environment variables in Python?)
- What is the $DISPLAY environment variable?
- Python实现的进程管理神器——Supervisor (Supervisor, a process manager written in Python)
- Where is the X.org config file? How do I configure X there?
- 并发执行文件转换的程序获取不到application (A program running file conversions concurrently cannot get the application)
- Can't start X11 applications after "su" or "su -" to another user
- xserver - What are xhost and xhost +si?
- How can I run /usr/bin/Xorg without sudo?
- /usr/bin/xauth: file /…/.Xauthority does not exist
- Word转PDF-阿里云 (Word to PDF, Alibaba Cloud)
- Requests Documentation
- Python图片转base64 (Converting images to base64 in Python)
- 九云图 - API文档 (9yuntu API documentation)
- docsdk GitHub
Copyright 2018-2019 实验室设备网. All rights reserved.