Working with PDFs in Python: Inserting, Deleting, and Reordering Pages

2024-06-28 09:23 | Source: compiled from the web

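This article is the third in a series on working with PDFs in Python:

Reading and Splitting Pages

Adding Images and Watermarks

Inserting, Deleting, and Reordering Pages (you are here)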

Introduction

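This article is the third part of a series on working with PDFs in Python. In the previous articles we showed how to read PDF documents with Python. So far you have learned how to work with existing PDFs and how to read and extract content such as text and images. We have also covered splitting a document into individual pages and adding watermarks and barcodes.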

In this article we go one step further and demonstrate several ways to rearrange a PDF document.

Deleting Pages with pdfrw

Deleting Pages with PyMuPDF

Inserting Pages with PyMuPDF

Splitting Even and Odd Pages with PyPDF2

Deleting Pages with pdfrw

Deleting individual pages from a PDF file takes only two steps:

Read the PDF as the input file

Write the selected pages to a new PDF as the output file

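The example below removes the first two pages of a PDF document. Using the pdfrw library, the file is first read with the PdfReader() class. Every page except the first and second is then added to the output file with the addpage() method, and the result is finally written to disk.

Figure 1 shows the output when the code is run on a four-page PDF file.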

```python
#!/usr/bin/env python3
# Remove the first two pages (cover sheet) from the PDF

from pdfrw import PdfReader, PdfWriter

input_file = "example.pdf"
output_file = "example-updated.pdf"

# Define the reader and writer objects
reader_input = PdfReader(input_file)
writer_output = PdfWriter()

# Go through the pages one after the next;
# indices 0 and 1 (pages 1 and 2) are skipped
for current_page in range(len(reader_input.pages)):
    if current_page > 1:
        writer_output.addpage(reader_input.pages[current_page])
        print("adding page %i" % (current_page + 1))

# Write the modified content to disk
writer_output.write(output_file)
```
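The `current_page > 1` test works because pdfrw's page list is 0-based while readers count pages from 1. A small helper can make that conversion explicit when you want to drop arbitrary pages rather than just the first two. This is a minimal sketch; `pages_to_keep` is a hypothetical helper, not part of pdfrw.

```python
def pages_to_keep(page_count, pages_to_drop):
    """Return the 0-based indices of the pages to keep.

    page_count    -- total number of pages in the document
    pages_to_drop -- 1-based page numbers to remove (as a reader counts them)
    """
    invalid = [n for n in pages_to_drop if not 1 <= n <= page_count]
    if invalid:
        raise ValueError(f"page numbers out of range: {invalid}")
    drop = {n - 1 for n in pages_to_drop}  # convert to 0-based indices
    return [index for index in range(page_count) if index not in drop]


# Dropping the cover sheet (pages 1 and 2) of a four-page file
print(pages_to_keep(4, [1, 2]))  # → [2, 3]
```

With the objects from the pdfrw example, the loop would become `for index in pages_to_keep(len(reader_input.pages), [1, 2]): writer_output.addpage(reader_input.pages[index])`.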

Deleting Pages with PyMuPDF

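The PyMuPDF library comes with several convenient methods that simplify deleting pages from a PDF file. You can remove a single page with delete_page() (deletePage() in older releases), a range of pages with delete_pages() (formerly deletePageRange()), or pass a list of the pages to keep to select().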
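To make the difference between the three styles concrete, here is a plain-Python model that applies each one to a list of page labels instead of a real document. It is only an illustration: the `model_*` helpers are hypothetical, and the sketch assumes, as PyMuPDF's documentation describes, that page numbers are 0-based and that the range form deletes both end pages.

```python
def model_delete_page(pages, index):
    """Model of deleting one page (0-based index)."""
    return pages[:index] + pages[index + 1:]


def model_delete_pages(pages, first, last):
    """Model of deleting an inclusive range of pages (0-based)."""
    return pages[:first] + pages[last + 1:]


def model_select(pages, indices):
    """Model of keeping only the listed pages, in the listed order."""
    return [pages[i] for i in indices]


document = ["p1", "p2", "p3", "p4"]

# Three ways to remove page 3 (index 2) and keep the rest
print(model_delete_page(document, 2))       # → ['p1', 'p2', 'p4']
print(model_delete_pages(document, 2, 2))   # → ['p1', 'p2', 'p4']
print(model_select(document, [0, 1, 3]))    # → ['p1', 'p2', 'p4']
```

In PyMuPDF itself the same three results would come from `doc.delete_page(2)`, `doc.delete_pages(2, 2)` and `doc.select([0, 1, 3])`; the first two name what to remove, while select() names what to keep.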

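The following example shows how to use a list to choose which pages of the original document to keep. Pages that are not listed are left out of the output document. Because the list holds 0-based indices, [0, 1, 3] means that the output contains only the first, second, and fourth pages.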

```python
#!/usr/bin/env python3
# Recall that PyMuPDF is imported as fitz
import fitz

input_file = "example.pdf"
output_file = "example-rearranged.pdf"

# Define the pages to keep: 1, 2 and 4 (0-based indices 0, 1 and 3)
file_handle = fitz.open(input_file)
pages_list = [0, 1, 3]

# Select the pages and save the output
file_handle.select(pages_list)
file_handle.save(output_file)
```
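According to PyMuPDF's documentation, select() keeps the pages in the order given (and accepts repeated numbers), so the same call can reorder a document, not just trim it. The sketch below builds such lists; the helper names are hypothetical and only produce the 0-based index list that select() expects.

```python
def human_to_zero_based(page_numbers):
    """Convert 1-based page numbers (as printed) to 0-based indices."""
    return [n - 1 for n in page_numbers]


def reversed_order(page_count):
    """0-based page list that puts the pages in reverse order."""
    return list(range(page_count - 1, -1, -1))


def move_to_front(page_count, index):
    """0-based page list with page `index` moved to the front."""
    rest = [i for i in range(page_count) if i != index]
    return [index] + rest


print(human_to_zero_based([1, 2, 4]))  # → [0, 1, 3]
print(reversed_order(4))               # → [3, 2, 1, 0]
print(move_to_front(4, 3))             # → [3, 0, 1, 2]
```

With the document from the example above, `file_handle.select(reversed_order(len(file_handle)))` would write the pages in reverse order.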

Inserting Pages with PyMuPDF

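The PyMuPDF library also lets you insert pages. new_page() (newPage() in older releases) adds a completely blank page, insert_page() adds a new page that can optionally carry some text, and insert_pdf() (formerly insertPDF()) copies pages from another PDF document. The next example shows how to append the pages of one PDF document to the end of another.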

```python
#!/usr/bin/env python3
# Recall that PyMuPDF is imported as fitz
import fitz

original_pdf_path = "example.pdf"
extra_page_path = "extra-page.pdf"
output_file_path = "example-extended.pdf"

original_pdf = fitz.open(original_pdf_path)
extra_page = fitz.open(extra_page_path)

# Append all pages of extra_page to the end of original_pdf
# (insert_pdf() is spelled insertPDF() in older PyMuPDF releases)
original_pdf.insert_pdf(extra_page)
original_pdf.save(output_file_path)
```
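By default insert_pdf() appends. PyMuPDF's documentation also lists a start_at parameter: the first copied page becomes that page number in the target, and -1 (the default) means append. The sketch below models the resulting page order with plain lists; `merged_order` is a hypothetical helper that only mirrors that documented behavior.

```python
def merged_order(target, source, start_at=-1):
    """Page order after copying all of `source` into `target`.

    start_at -- 0-based page number the first copied page will get;
                -1 (the default) appends the pages at the end.
    """
    if start_at == -1:
        start_at = len(target)
    if not 0 <= start_at <= len(target):
        raise ValueError("start_at out of range")
    return target[:start_at] + source + target[start_at:]


original = ["a1", "a2", "a3"]
extra = ["x1"]
print(merged_order(original, extra))     # → ['a1', 'a2', 'a3', 'x1']
print(merged_order(original, extra, 0))  # → ['x1', 'a1', 'a2', 'a3']
```

To put the extra page in front instead of at the end, the call in the example would become `original_pdf.insert_pdf(extra_page, start_at=0)`.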

Splitting Even and Odd Pages with PyPDF2

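The following example uses PyPDF2 to split a file into its even and odd pages, saving the even pages to even.pdf and the odd pages to odd.pdf.

The script starts by defining the two output files, even.pdf and odd.pdf, and their corresponding writer objects, pdf_writer_even and pdf_writer_odd. Next, a for loop walks through the whole PDF file, reading one page at a time. Pages with even page numbers are added to pdf_writer_even with add_page() (addPage() in older PyPDF2 releases), and odd-numbered pages to pdf_writer_odd. Because the loop counts from 0, index 0 is page 1, which is why even indices go to the odd writer. Finally, each writer is saved to its own file on disk.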

```python
#!/usr/bin/env python3
# PdfReader/PdfWriter replace PdfFileReader/PdfFileWriter, which
# newer PyPDF2 releases no longer support
from PyPDF2 import PdfReader, PdfWriter

pdf_document = "example.pdf"
pdf = PdfReader(pdf_document)

# Output files for new PDFs
output_filename_even = "even.pdf"
output_filename_odd = "odd.pdf"

pdf_writer_even = PdfWriter()
pdf_writer_odd = PdfWriter()

# Get each page and add it to the corresponding output file based on
# its page number. Index 0 is page 1, so even indices are odd pages.
for page in range(len(pdf.pages)):
    current_page = pdf.pages[page]
    if page % 2 == 0:
        pdf_writer_odd.add_page(current_page)
    else:
        pdf_writer_even.add_page(current_page)

# Write the even pages to disk
with open(output_filename_even, "wb") as out:
    pdf_writer_even.write(out)
    print("created", output_filename_even)

# Write the odd pages to disk
with open(output_filename_odd, "wb") as out:
    pdf_writer_odd.write(out)
    print("created", output_filename_odd)
```
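The only subtle part of the loop is the off-by-one between 0-based indices and 1-based page numbers. The sketch below isolates that rule so it can be checked without opening a PDF; `split_odd_even` is a hypothetical helper, not a PyPDF2 function.

```python
def split_odd_even(page_count):
    """Split 0-based page indices by their 1-based page number.

    Index 0 is page 1 (odd), index 1 is page 2 (even), and so on.
    Returns (odd_page_indices, even_page_indices).
    """
    odd_pages = [i for i in range(page_count) if i % 2 == 0]
    even_pages = [i for i in range(page_count) if i % 2 == 1]
    return odd_pages, even_pages


odd, even = split_odd_even(5)
print(odd)   # → [0, 2, 4]  (pages 1, 3 and 5)
print(even)  # → [1, 3]     (pages 2 and 4)
```

Each index list can then drive a writer, for example `for i in odd: pdf_writer_odd.add_page(pdf.pages[i])`.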

Conclusion

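The pdfrw, PyMuPDF, and PyPDF2 libraries make it easy to restructure and rearrange a PDF. With just a few lines of Python you can delete pages, split pages apart, and add new content.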
