Merge many PDFs with Python

While there are many tools for merging individual PDF files into one big PDF file, most of them fail if you try to merge a very large number of files, e.g. more than 10,000. I therefore wrote a little Python script that takes batches of files and merges each batch into one PDF. The resulting batch files can then be merged again.

Requirements:

  • Python 3.x
  • Libraries: os, sys (both part of the standard library), PyPDF2
  • Basic Python knowledge

The script as given below worked fine for me, but it can easily be extended to fit further needs, e.g.:

  • exclude specific PDFs from the merge
  • run recursively through directories, or merge the batches again
  • hide less important warnings (like “PdfReadWarning: Superfluous whitespace found…”)
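The extensions listed above could be sketched roughly as follows. This is only an illustration, not part of the original script: the function names `collect_pdfs` and `quiet_pdf_warnings` and the `exclude` parameter are made up for this example. How PyPDF2 reports warnings depends on the installed version, so the sketch assumes it uses either Python's `warnings` module or a logger named "PyPDF2" and silences both.

```python
import logging
import os
import warnings


def collect_pdfs(root, exclude=()):
    """Walk root recursively and return the sorted paths of all PDFs found.

    File names listed in exclude (hypothetical parameter) are skipped.
    """
    skipped = set(exclude)
    found = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if name.lower().endswith(".pdf") and name not in skipped:
                found.append(os.path.join(folder, name))
    return sorted(found)


def quiet_pdf_warnings():
    """Hide the 'Superfluous whitespace' messages.

    Assumption: depending on the PyPDF2 version, the message is emitted
    through the warnings module or through the logging module.
    """
    # The message argument is a regex matched against the start of the text.
    warnings.filterwarnings("ignore", message="Superfluous whitespace")
    # Only errors from the "PyPDF2" logger and its children are shown.
    logging.getLogger("PyPDF2").setLevel(logging.ERROR)
```

Calling `quiet_pdf_warnings()` once at the start of a run and then passing the result of `collect_pdfs(...)` to the merge step would cover all three points.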

The feedback of this script should look like this:

[Screenshot: feedback printed by the script]

Here is the script. Copy and paste it into your favorite editor, adapt the variables, and run it.

Variables to adapt:

  • Path of your PDF directory: “dir_in”
  • Size of your batches: “batchsize”
  • Name of the output directory: “dir_out”
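A minimal sketch of the batch approach described in this post, using the three variables above, might look like this. It is an illustration under assumptions rather than a definitive implementation: the helper names (`list_pdfs`, `make_batches`, `merge_batch`, `merge_in_batches`) and the output file name pattern are invented for this example, and it assumes PyPDF2 provides a merger class named `PdfMerger` (or `PdfFileMerger` in older releases).

```python
import os


def list_pdfs(dir_in):
    """Return the sorted paths of all PDF files directly inside dir_in."""
    names = sorted(n for n in os.listdir(dir_in) if n.lower().endswith(".pdf"))
    return [os.path.join(dir_in, n) for n in names]


def make_batches(paths, batchsize):
    """Split a list of paths into consecutive batches of at most batchsize."""
    return [paths[i:i + batchsize] for i in range(0, len(paths), batchsize)]


def merge_batch(paths, out_path):
    """Merge the given PDFs, in order, into a single file at out_path."""
    # Imported here so the helpers above also work without PyPDF2 installed.
    try:
        from PyPDF2 import PdfMerger  # assumed name in newer releases
    except ImportError:
        from PyPDF2 import PdfFileMerger as PdfMerger  # older releases
    merger = PdfMerger()
    for path in paths:
        merger.append(path)
    with open(out_path, "wb") as fh:
        merger.write(fh)
    merger.close()


def merge_in_batches(dir_in, dir_out, batchsize):
    """Write one merged PDF per batch into dir_out and return their paths."""
    os.makedirs(dir_out, exist_ok=True)
    outputs = []
    batches = make_batches(list_pdfs(dir_in), batchsize)
    for number, batch in enumerate(batches, start=1):
        # Hypothetical naming scheme: batch_00001.pdf, batch_00002.pdf, ...
        out_path = os.path.join(dir_out, "batch_{:05d}.pdf".format(number))
        print("Batch {}/{}: merging {} files".format(number, len(batches), len(batch)))
        merge_batch(batch, out_path)
        outputs.append(out_path)
    return outputs
```

For example, `merge_in_batches("pdfs", "merged", 500)` would turn 10,000 input files into 20 batch files; a second call such as `merge_in_batches("merged", "final", 500)` would then merge those into one PDF. The directory names here are placeholders.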

Reference: https://pypi.org/project/PyPDF2/

About ralf

I studied geology at the University of Erlangen and got my PhD (bio-nanotechnology) at TU Dresden. In my spare time I program simulations and tinker with data prediction methods. Frisbee is my favorite sport, and I play guitar when my friends and I meet to make some music.