PDF files are too big.
PDF is too slow.
When I click on a link to a PDF, my screen goes blank.
These are the most common complaints about PDF that I’ve heard over the years… and all three are real problems with PDF. But they don’t have to be. PDF is one of the most robust file formats ever created for communicating final-form documents. It is used by nearly every government agency in the world, and by nearly every regulated industry, court, manufacturer, bank, and commercial printer; the list goes on. But each of these groups uses PDF for slightly different reasons and in slightly different ways. For example, a PDF file created by a user in a government agency might contain structure information that is essentially invisible to sighted readers but absolutely essential for people using a screen reader. That same structure information, however, is unnecessary for printing the document. Structure information generally doesn’t take up all that much room in a PDF, but the reverse case is a different story: the image data needed to print a document at high resolution and in full color can take up a lot of space, yet most of that data is thrown away when the pages are rendered to the screen.
The root of this problem is that most PDF files created by individuals are created in the same way: using the application defaults in their PDF tool of choice. While Adobe Acrobat does provide a very good set of defaults, even it can’t guess what you might use the PDF file for later in its life cycle, so it can’t streamline the PDF for a specific use case unless the user tells it to. And then there are the really bad PDF tools. These tools don’t necessarily create bad PDF, in the sense that their output does conform to the specification; they just do things in really bizarre ways, which is understandable given the complexity of the PDF Specification. Typically, these tools target only the visual representation, with no regard for the underlying structure that enables content reuse, or much of anything other than printing the file. And finally, there is that whole big mess in the middle: PDF tools that get most things right, or close to right, and rely on Adobe Reader to fix up the file automatically before displaying it.
So… out of this swirl of creation tools and use cases has emerged a sort of PDF aftermarket toolset designed to take the standard output of the various PDF creation tools and optimize it. This article is the first in a series that discusses various aspects of PDF Optimization. There’s a lot of PDF expertise here at Datalogics and the goal of this series is to share that expertise with you to help you better understand what is involved with PDF Optimization, set expectations for what can be optimized, and debunk some myths.
What You Can Look Forward To:
The next article will discuss one of the hard facts of PDF Optimization: it’s lossy. You’re going to remove data, and in doing so, you’re going to limit what the PDF file is useful for. The term PDF Optimization is generally used… well… generally… too generally. People who say it know what they mean, but the people who hear it don’t necessarily hear the same thing. You can’t have a reasonable discussion about Optimization without knowing what you’re optimizing for: what application or use case are you targeting? The target use case will largely determine how big is “too big” and how much loss of data can be tolerated. And you are going to lose data. But what might you gain? Faster download times? Faster rendering? And if you’re just removing unused named destinations, you may effectively be losing nothing.
Is PDF Optimization lose-lose or can something be gained by intelligently streamlining the file?
The remaining articles will discuss…
- Image downsampling, which is probably the easiest optimization to get right.
- Coalescing font subsets, which is probably the hardest to get right and, with certain files, may be impossible.
- Is refrying even an option? Sometimes it seems like you just have to print the PDF to PostScript and convert it back to PDF. Yes, it’s ugly, but does it work?
- How to manage the user’s expectations. We’ll discuss what to do when “as small as you can go” is still too big.
These are the topics on my list, but we’d like to hear from you as well. Send us your use cases for PDF Optimization. How big is too big? How helpful would auditing the space usage be? Leave a comment and share your thoughts.