PDF Optimization Horror Stories
Back in the early aughts, when I first started encountering problematic PDFs on a professional basis, I had a two-part mantra when it came to this file format: “It’s a horrible format; it just happens to be better than all of the alternatives.” Twenty-some years later, the rough edges have been worn down, and we PDF software creators and consumers understand the file format a lot better.
These days, problematic PDFs are most often manifestations of edge cases where an unexpected choice was made in creating the file. My current favorite example is a file whose text was filled with a tiling pattern that contained an image. Our PDF Optimizer removed that image because we hadn't expected to have to check the color spaces used by text when determining whether an image resource was in use. Oops. But edge-case oversights like that aren't the stuff nightmares are made of.
The real nightmare files tend to sneak up on you. A small change suddenly causes a file to balloon enormously in size, or a file's processing simply never ends, quietly chewing up CPU time while the user wonders impatiently what in the world is going on. Worst of all is when the usual bag of tools for finding the source of these problems fails you, and you find yourself playing hide-and-seek with a killer bug that's absolutely murdering your productivity while the clock is ticking.
Of course, none of that compares to the horror of realizing, after promising horror stories specifically about PDF Optimization and desperately searching through thousands of case histories and pull requests, that the source material might be a tiny bit thinner than originally anticipated.
PDF Days Europe Presentation
At PDF Days Europe last month, I shared four horror stories:
- A 4 MB document that grew to nearly 20 times its original size when merged with a 100 KB cover sheet. The clues pointed to the tagged structure tree as the culprit, but when we compared the original with the merged document, the structure trees seemed identical.
- A PDF/A document that more than doubled in size when it was reconverted to PDF/A. Again, the tagged structure tree was the suspect; it all looked perfectly normal, but this tree was hiding phantom limbs…
- Saving a particular document as a web-optimized document turned into Waiting for Godot as Adobe PDF Library (APDFL) descended into a labyrinth of Form XObject resource dictionaries. Using PDF Optimization instead cut through that Gordian knot; it's Great.
- Tragically, I ended with a morality tale reminiscent of Humpty Dumpty's tale of woe: if you take a document with, say, 25 thousand pages using, say, 9 fonts, and copy each page one by one, you will create lots and lots of copies of identical fonts. If you then subset each of those identical fonts, the copies will no longer be identical and cannot be safely reconsolidated. Adobe PDF Library and PDF Optimizer will try, of course, but if the document is large enough, the task turns into a Sisyphean punishment from the gods.
These tales are from our past, but beware: new horrors could ensnare us when we least expect them, as we lose our wariness and relax our vigilance and these memories fade into oblivion…
[Cue the Vincent Price cackle from Michael Jackson's "Thriller".]