Are you looking for ways to better optimize your PDF files to make them more streamlined for your workflow? In this two-part post, we’ll help you go from ‘bloated’ PDFs (and we’ll get into what we mean by ‘bloated’) to what we like to call ‘leaner, cleaner’ PDFs.
First, it’s important to understand some of the common problems that can occur with PDF documents. PDF is a vast, feature-rich format that nearly everyone is familiar with, but few people realize the complexity behind it. Today, hundreds if not thousands of different PDF processors exist. Some were written from scratch and some were spun off from others, and they are implemented in many different languages. On top of that, a PDF file can contain images, fonts, forms, embedded thumbnails, annotations, metadata, and more. It’s easy to see how working with PDFs across different products can be challenging.
Let’s look at an example of how a PDF document can be problematic:
Imagine you have a 100,000-page document with millions of images and forms throughout it. Let’s say you take your favorite PDF software and extract 10 pages to create a new document. Depending on the software you’re using, all of those millions of images and forms may have ‘come along for the ride’ into the extracted document. So even though those 10 pages might use only a small fraction of those images and forms, all of them still exist in the extracted document. This is what we mean by a ‘bloated PDF.’
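To make the idea concrete, here is a minimal sketch in Python. It does not use any real PDF library: the document is modeled as a hypothetical dictionary that maps each object name to the objects it references, and the names (`page1`, `img_a`, and so on) are invented for illustration. Under those assumptions, anything that cannot be reached from the pages you kept is a candidate for bloat.

```python
from collections import deque


def reachable_objects(objects, roots):
    """Return the set of object ids reachable from the given root ids."""
    seen = set()
    queue = deque(roots)
    while queue:
        obj_id = queue.popleft()
        # Skip ids we have already visited or that do not exist in the model.
        if obj_id in seen or obj_id not in objects:
            continue
        seen.add(obj_id)
        queue.extend(objects[obj_id])
    return seen


# Hypothetical toy document: object id -> ids of the objects it references.
objects = {
    "page1": ["img_a", "font_x"],
    "page2": ["img_b", "font_x"],
    "page3": ["img_c", "form_1"],
    "img_a": [],
    "img_b": [],
    "img_c": [],
    "font_x": [],
    "form_1": ["img_c"],
}

kept_pages = ["page1"]  # the pages we "extracted"
needed = reachable_objects(objects, kept_pages)
bloat = set(objects) - needed

print(sorted(needed))  # ['font_x', 'img_a', 'page1']
print(sorted(bloat))   # ['form_1', 'img_b', 'img_c', 'page2', 'page3']
```

In this toy model, an extraction that copies every object keeps five objects the single extracted page never uses. A real PDF has a far richer object structure, so treat this only as an illustration of the principle.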
The common problems with so-called ‘bloated’ PDFs are:
- They can slow down PDF viewers
- They can significantly impact workflows such as PDF conversion, printing, editing, and document processing
- They can cause problems like hanging or crashing
For example, your PDF software may appear to hang while processing a bloated PDF, even though it’s not a true hang. Perhaps after 7 hours the processing will be complete and things will return to normal. But in the real world, no user is going to wait 7 hours for your software to finish before they can continue processing documents. This leads to frustrated end users who file bug reports against your software, something like “Your software creates PDFs that cause all of my customers to experience crashes when trying to open them, your software stinks.” Harsh, but that’s the reality.
PDF Optimization Tools: What to Know
Be careful with the many so-called PDF optimization tools in the marketplace, because they don’t all behave well with PDFs. Some create PDF documents that are simply corrupt and will refuse to open in your PDF viewer. Some, despite the name of the product, produce output that is larger than the input PDF. Some miss opportunities: a bloated PDF may suffer from easily correctable issues, but for whatever reason (time, resources, understanding, etc.) the authors didn’t implement fixes for them, so the output doesn’t get reduced in size at all. You can also get inaccurate or incorrect output, such as mishandled complex ColorSpace representations in a PDF. And sometimes content that the software can’t understand is simply dropped from the page and lost forever. For example, JBIG2 is a compression type that is typically handled by commercial software and is not widely supported, so JBIG2-compressed images are one candidate for this kind of loss.
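Two of these failure modes, corrupt output and output that is larger than the input, can be caught with a very shallow guard before you replace the original file. The Python sketch below is only that: the function names are hypothetical, and the check looks for nothing more than the `%PDF-` header near the start of the data and the `%%EOF` marker near the end. It will not detect most kinds of corruption, and it says nothing about fidelity.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Very shallow check: header near the start, EOF marker near the end."""
    return b"%PDF-" in data[:1024] and b"%%EOF" in data[-1024:]


def accept_optimized(original: bytes, optimized: bytes) -> bool:
    """Accept the output only if it still looks like a PDF and is smaller."""
    return looks_like_pdf(optimized) and len(optimized) < len(original)


# Synthetic byte strings standing in for real files (not valid PDFs).
original = b"%PDF-1.7\n" + b"x" * 1000 + b"\n%%EOF\n"
smaller = b"%PDF-1.7\n" + b"x" * 100 + b"\n%%EOF\n"
larger = b"%PDF-1.7\n" + b"x" * 5000 + b"\n%%EOF\n"
truncated = smaller[:50]  # the EOF marker has been cut off

print(accept_optimized(original, smaller))    # True
print(accept_optimized(original, larger))     # False
print(accept_optimized(original, truncated))  # False
```

If the guard rejects the output, the safe choice is to keep the original file.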
As an example of inaccurate output, let’s say you are in the healthcare space and your patients have to sign a form authorizing the release of information. The patient signs, the form is saved as a PDF, and the signature is stored as an image on the page. Now let’s say you pass such medical release forms through some PDF optimization software that obliterates the image, so that the signature is no longer readable. In the name of saving space, the optimization software wrecked the image, and all a person can make out is little scratch marks where there used to be a signature. Now you’re in a pickle. If you’re faced with a lawsuit and need to show the court that you had authorization from the patient, but the judge can’t read the signature and it doesn’t look like a real person’s signature, you’re really stuck. So, it’s easy to see how important fidelity is when dealing with PDF optimization.
When it comes to all things PDF, and PDF optimization in particular, the do-it-yourself (DIY) route is not the best option. As we mentioned before, the format is quite complex and support among vendors varies dramatically. It’s estimated that writing just a subset of a good PDF optimizer would take several years of effort, and writing the full set of what a good PDF optimizer should do would take decades. No company that’s looking to be profitable can sacrifice that much time on what may be only one aspect of its business.
Now that we’ve talked about the problems of bloated PDFs, what can you do to achieve a leaner, cleaner PDF? Check out part two of this post to find out!