Setting PDF Boundaries to Avoid Unwanted Outcomes

Here in Support, we often receive problem reports from customers whose applications or documents showed no obvious signs of distress in the past, and yet something bad has now happened for which they can find no cause. In one recent case, an otherwise unremarkable test document generated a mysterious pop-up error in Acrobat when advancing to the next page.

Fine! Lovely! What error? An effort to track it down with the Acrobat Preflight PDF syntax check was about as productive as using a match to investigate a gas leak.

A useful clue finally emerged when I viewed the problem document in the Google Chrome web browser’s built-in PDF viewer. Chrome cheerfully displayed all of the visible content while apparently ignoring everything else, including whatever had upset Acrobat, and on the suspect page the issue finally became obvious. A line of “text” had been set that was, in reality, someone’s image data: one long line of nonsense characters rolling off-screen to the right, obviously not going to end for several megabytes and perhaps winding up on the monitor of a computer in a neighboring town.
I double-clicked the visible portion of the offending gobbledygook, copied it, pasted the whole lump into a text editor, and checked its length. Aha! It was pushing 32,767 bytes, the maximum allowable length of a string in a content stream.
So, that was our culprit. I suspect the string was even longer inside the document itself, but what I had copied was enough to show that something unworkable had crept in.
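If you would rather not rely on copy-and-paste to find an oversized string, a short scanner can measure every string in a content stream directly. The sketch below is plain Python, not part of the Adobe PDF Library; it assumes you have already extracted and decompressed the page’s content stream into a `bytes` object, and the names `find_long_strings` and `LongString` are hypothetical. It follows the literal-string rules (balanced parentheses, backslash escapes, octal codes) and measures hex strings, skips comments and `<<`/`>>` dictionary delimiters, and does not attempt to handle inline image data between `ID` and `EI`, which can contain stray parentheses.

```python
from dataclasses import dataclass

STRING_LIMIT = 32_767  # Appendix C limit on a content-stream string, in bytes
OCTAL = b"01234567"


@dataclass
class LongString:
    offset: int  # byte offset of the opening "(" or "<"
    length: int  # decoded length of the string, in bytes


def _literal_length(data: bytes, i: int) -> tuple[int, int]:
    """Measure the literal string whose "(" is at data[i].

    Returns (decoded length, index just past the closing ")").
    """
    depth, n, i = 1, 0, i + 1
    while i < len(data):
        c = data[i]
        if c == 0x5C:  # backslash escape
            i += 1
            if i < len(data) and data[i] in OCTAL:
                start = i  # \d, \dd or \ddd encodes a single byte
                while i < len(data) and i - start < 3 and data[i] in OCTAL:
                    i += 1
                n += 1
            elif data[i:i + 2] == b"\r\n":
                i += 2  # backslash + CRLF: line continuation, no byte
            elif i < len(data) and data[i] in b"\r\n":
                i += 1  # backslash + CR or LF: line continuation, no byte
            else:
                n += 1  # \n, \(, \\ and so on, or an ignored backslash
                i += 1
            continue
        if c == 0x28:  # "(": balanced parentheses are part of the string
            depth += 1
        elif c == 0x29:  # ")"
            depth -= 1
            if depth == 0:
                return n, i + 1
        elif data[i:i + 2] == b"\r\n":
            i += 1  # an unescaped CRLF reads as a single newline byte
        n += 1
        i += 1
    return n, i  # unterminated string: report what was seen


def _hex_length(data: bytes, i: int) -> tuple[int, int]:
    """Measure the hex string whose "<" is at data[i]."""
    digits, i = 0, i + 1
    while i < len(data) and data[i] != 0x3E:  # ">"
        if chr(data[i]) in "0123456789abcdefABCDEF":
            digits += 1  # whitespace between digits is ignored
        i += 1
    return (digits + 1) // 2, i + 1  # an odd final digit is padded with 0


def find_long_strings(content: bytes, limit: int = STRING_LIMIT) -> list[LongString]:
    """Report every string in a decoded content stream longer than `limit` bytes."""
    found: list[LongString] = []
    i = 0
    while i < len(content):
        c = content[i]
        if c == 0x25:  # "%" starts a comment that runs to end of line
            while i < len(content) and content[i] not in b"\r\n":
                i += 1
        elif c == 0x28:  # "(" literal string
            n, end = _literal_length(content, i)
            if n > limit:
                found.append(LongString(i, n))
            i = end
        elif content[i:i + 2] in (b"<<", b">>"):  # dictionary delimiters
            i += 2
        elif c == 0x3C:  # "<" hex string
            n, end = _hex_length(content, i)
            if n > limit:
                found.append(LongString(i, n))
            i = end
        else:
            i += 1
    return found
```

Each `LongString` gives the byte offset and decoded length of an offending string, which is enough to locate it in the decompressed stream without having to scroll a viewer into the next town.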
The next question was how it got there in the first place, and the answer turned out to be, essentially, that no one had told the application that it couldn’t do that. Because it is a toolkit, the Adobe PDF Library gives you the ability to carry a reasonable operation to an unwanted conclusion, like building a battleship in your basement that looks magnificent but won’t fit through the door. That was the case here: a well-intentioned person had apparently tried to list some diagnostic messages about the content on the page, as text on the page itself. That worked fine for a text object, but not for an image whose data ran well beyond 32,767 bytes, and the resulting oversized string tripped up applications, such as Acrobat, that later tried to parse the document.
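A guard at the point where diagnostic text is written would have kept the battleship in the basement. The sketch below is plain Python that builds raw content-stream bytes; it is not an Adobe PDF Library API, and the function names and the `/F1` font resource are hypothetical. It assumes the font uses a single-byte encoding (Latin-1 stands in for it here) and counts the limit against the string’s own bytes, before escaping.

```python
STRING_LIMIT = 32_767  # Appendix C limit on a content-stream string, in bytes


def clamp_for_pdf_string(text: str, limit: int = STRING_LIMIT) -> bytes:
    """Encode text for a single-byte font, truncating it to at most `limit` bytes."""
    # Assumption: a single-byte font encoding, approximated by Latin-1;
    # characters outside Latin-1 become "?".
    raw = text.encode("latin-1", errors="replace")
    if len(raw) <= limit:
        return raw
    note = f" ...[truncated from {len(raw)} bytes]".encode("ascii")
    return raw[: limit - len(note)] + note  # result is exactly `limit` bytes


def escape_literal(raw: bytes) -> bytes:
    """Escape bytes for use inside a PDF literal string ( ... )."""
    out = bytearray()
    for b in raw:
        if b in b"()\\":
            out += b"\\" + bytes([b])
        elif b == 0x0D:
            out += b"\\r"  # a bare CR inside a string would be read as LF
        else:
            out.append(b)
    return bytes(out)


def show_text_op(text: str, font: str = "F1", size: float = 9,
                 x: float = 72, y: float = 720) -> bytes:
    """Build a BT ... ET block that shows one line of diagnostic text.

    `font` is a hypothetical resource name; it must exist in the page's
    /Font resources for the text to render.
    """
    body = escape_literal(clamp_for_pdf_string(text))
    return b"BT /%s %g Tf %g %g Td (%s) Tj ET\n" % (font.encode("ascii"), size, x, y, body)
```

For an image, a one-line summary (dimensions, filter, byte count) is usually a more useful diagnostic than even a truncated dump of its data, but a clamp like this still keeps one unexpectedly large value from producing a page that other applications can’t parse.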
The key to avoiding this kind of headache, whether it is an oversized string in a content stream or some other massive pileup of data, is to know your boundaries. Those can be architectural, tied to how the underlying hardware represents data (in this case, the 32,767-byte string limit, which is the largest value a signed 16-bit integer can hold), or memory-related, that is, whether your application will have enough memory to float your battleship. Both are outlined in a handy and frequently overlooked piece of documentation: “Implementation Limits,” Appendix C of the PDF Reference Manual.
In three concise pages, Appendix C outlines the boundaries within which you can operate and lists the actual numbers to be aware of, such as the largest and smallest integer and real values; limits on name length and nesting levels; the maximum numbers of indirect objects, colorants, and CID values; and more. The limits aren’t very restrictive, but they’re there, and keeping those three pages printed out for quick reference while coding can help you build a robust application that knows its limits and generates content that won’t hit the wall on its way out the door.
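One way to keep those numbers close at hand is to encode them as constants and check values before they are written. The sketch below is plain Python with hypothetical helper names (`check_int`, `check_real`, `check_name`, `check_q_nesting`, `LimitError`), not an Adobe PDF Library API. The values are transcribed from the architectural-limits table in Appendix C of the PDF 1.7 edition of the PDF Reference and should be checked against the copy you actually have printed out; memory limits depend on your application and platform and are not covered here.

```python
# Architectural limits from Appendix C of the PDF Reference (PDF 1.7 edition).
# Transcribed for illustration; verify against your own copy.
MIN_INT = -2_147_483_648
MAX_INT = 2_147_483_647
MAX_REAL = 3.403e38            # largest magnitude of a real value
MAX_STRING_BYTES = 32_767      # strings in content streams
MAX_NAME_BYTES = 127
MAX_INDIRECT_OBJECTS = 8_388_607
MAX_Q_NESTING = 28             # depth of q/Q graphics-state nesting
MAX_DEVICEN_COLORANTS = 32
MAX_CID = 65_535


class LimitError(ValueError):
    """Raised when a value would exceed an Appendix C limit."""


def check_int(value: int) -> int:
    if not MIN_INT <= value <= MAX_INT:
        raise LimitError(f"integer {value} is outside the 32-bit range")
    return value


def check_real(value: float) -> float:
    if abs(value) > MAX_REAL:
        raise LimitError(f"real {value} exceeds +/-{MAX_REAL}")
    return value


def check_name(name: str) -> str:
    # Assumption: the name is measured as UTF-8 bytes, excluding the leading "/".
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        raise LimitError(f"name longer than {MAX_NAME_BYTES} bytes")
    return name


def check_q_nesting(operators: list[str]) -> int:
    """Return the deepest q/Q nesting in a sequence of operator tokens."""
    depth = peak = 0
    for op in operators:
        if op == "q":
            depth += 1
            if depth > MAX_Q_NESTING:
                raise LimitError(f"q nesting depth {depth} exceeds {MAX_Q_NESTING}")
            peak = max(peak, depth)
        elif op == "Q":
            depth -= 1
            if depth < 0:
                raise LimitError("Q operator without a matching q")
    return peak
```

Raising early, at the point where a value is generated, makes the failure show up in your own code and logs rather than as an unexplained pop-up in someone else’s viewer.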
What problems have you run into when setting boundaries within your PDFs? Comment below, or contact us so we can help!