Adobe started the PDF revolution in 1993, when it released Adobe Acrobat 1.0. At the time, DOS/Windows, Mac, and Unix each had their own ways of handling files and fonts, and there was minimal interoperability of files between operating systems. With the release of Adobe Acrobat, it became possible to view and print a file on one of these operating systems that had been created on another. There were several competing document formats at the time, including offerings from Interleaf, Microsoft, and Folio, but no particular format had a commanding market share. Adobe initially charged $50 per desktop for Acrobat, until the IRS purchased a license to distribute Acrobat Reader, making it seem free to those who obtained it this way.

It was only after Adobe decided to make the Adobe Reader a truly free offering that PDF went viral and saw widespread adoption. This may be the earliest and most successful example of the freemium software business model that is so popular today.

When PDF took off, it was generally true that PDF files were created with Adobe tools and viewed or printed with Adobe tools. PDF became an ISO standard in 2008, and there are now trillions of PDF files, tens of thousands of PDF creation tools, and thousands of unique PDF viewing applications. These may be built upon Adobe technology, with third-party commercial technology, or with any of a variety of open-source PDF projects.

The PDF specification is publicly available and has become a rather complex and ambiguous document. A diagram from the SafeDocs presentation given by Peter Wyatt, Principal Scientist of the PDF Association, shows the complexity of the current PDF specification and the number of external documents it references. Among the many external documents the ISO standard for PDF references, there are six different versions of the Unicode specification.

Based on the complexity of the specification and the number of external references, is it any wonder that different vendors’ products might create or interpret the same file differently?

At Datalogics, we see problematic PDF files all the time. Some are just poorly created and others are outright invalid. We receive files from customers that render as intended with one PDF viewer but not with another. We receive files that print as intended with one RIP but not with another. We receive files where the internal text cannot be searched or extracted consistently. Based on years of experience dealing with customer files and working with PDF technology, we have developed our own opinions about which tools create quality PDF files and which do not. One creation tool we came across was a prime example of how a tool can produce bad PDFs. It repeated commands that did not need to be repeated, created empty save/restore groups, defined crop boxes that were larger than the page itself, built empty dictionaries, and used a dictionary color space for device gray and device RGB, all of which contributed to the creation of bad PDFs. Unfortunately, this is not a unique example; there are lots of tools that you should stay far away from. We see good and bad PDF files all the time.
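Two of the defects listed above are easy to illustrate in a few lines. The following is a minimal Python sketch, not how any Datalogics tool works: it assumes the page boxes have already been extracted as `[llx, lly, urx, ury]` lists and that the page content stream has already been decompressed into bytes. The function names `crop_exceeds_media` and `count_empty_save_restore` are hypothetical, and the save/restore scan is deliberately naive (it does not tokenize strings or inline images, and it recognizes only ASCII whitespace).

```python
import re
from typing import Sequence

# A PDF rectangle: [llx, lly, urx, ury] in user-space units (assumed input shape).
Rect = Sequence[float]


def normalize(rect: Rect) -> tuple[float, float, float, float]:
    """Order the corners so the first pair is lower-left, the second upper-right."""
    x0, y0, x1, y1 = rect
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def crop_exceeds_media(crop_box: Rect, media_box: Rect) -> bool:
    """Return True if any edge of the crop box lies outside the media box."""
    cx0, cy0, cx1, cy1 = normalize(crop_box)
    mx0, my0, mx1, my1 = normalize(media_box)
    return cx0 < mx0 or cy0 < my0 or cx1 > mx1 or cy1 > my1


# A standalone 'q' (save graphics state) followed only by whitespace and then
# a standalone 'Q' (restore graphics state): a group that does nothing.
_EMPTY_GROUP = re.compile(rb"(?<!\S)q\s+Q(?!\S)")


def count_empty_save_restore(content: bytes) -> int:
    """Count empty q/Q pairs in an already-decompressed content stream."""
    return len(_EMPTY_GROUP.findall(content))


if __name__ == "__main__":
    # US Letter media box with a crop box that is wider and taller than the page.
    print(crop_exceeds_media([0, 0, 700, 800], [0, 0, 612, 792]))
    # Two empty groups and one group that actually does something.
    print(count_empty_save_restore(b"q Q\nq 1 0 0 1 0 0 cm Q\nq\nQ"))
```

Neither condition necessarily makes a file unreadable, since many viewers tolerate both, but they are the kind of sloppiness that tends to accompany more serious problems.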

Datalogics would define a quality PDF as one that has valid syntax, is designed to render responsively, and is configured optimally for its intended use case. A quality PDF for a high-resolution digital press would not be optimal for consumption on a mobile device, nor would it be optimal for meeting regulatory compliance with long-term archival requirements.

It is widely known that Adobe Acrobat DC (Document Cloud) and the Adobe PDF Library can open many PDF files which do not strictly comply with the PDF syntax requirements. When Acrobat saves these PDF files they may be subtly changed to make them more compliant. Datalogics has enhanced the Adobe PDF Library over the years to allow it to open additional improperly formed PDF files with a setting we refer to as “relaxed syntax”. When such files are written by the Adobe PDF Library or our PDF Optimizer tools, corrections are applied when possible to make them more acceptable to a wider array of PDF tools. Additionally, Datalogics’ PDF Checker was created as a free tool to identify problems in PDF files or content conditions which may not be optimal for specific use cases. One of the things that PDF Checker can tell you is whether “relaxed syntax” was required to check a particular file. We find that “relaxed syntax” is often required to check the content and consistency of the PDF files we receive.

Due to the complexity of the PDF file format, the ambiguities of the specification, and the wide array of available PDF creators and consuming applications, it is no longer safe to assume that a PDF file created by one PDF tool will render responsively, or view or print as expected, with another. As they say, caveat emptor: be sure that the PDF tools you use will create good PDFs.

So I have to ask, is PDF really still portable?
