Sample of the Week:
July is “Back to Basics” month here at the “Sample of the Week.” For the next few posts, I’ll be discussing some of the basic PDF manipulations that are easy to do in Adobe Acrobat but don’t have an exact counterpart in the Datalogics PDF Java Toolkit. The first of these articles reproduces the Replace Pages feature in Adobe Acrobat.
In my other life, I create a lot of PDF forms and other types of interactive PDF files. Often I need to change the content of the form, but I don’t want to lose any of the fields that I’ve already added to the page. Fortunately, I can use Acrobat’s “Replace Pages” feature to slip new page content… the artwork… under the existing form fields, buttons, and links. This works because form fields, comments, links, movies, 3D models… all of the dynamic elements of a PDF file actually sit on top of the PDF page content in a separate “plane” and are held in an “Annotation” dictionary for that page. Acrobat allows you to swap out the page content while leaving the Annotation dictionary unchanged.
“Replace Pages” is one of the most basic functions of Adobe Acrobat and it’s been in the product since version one. It’s really simple. Right?… Wrong.
If your page has subset and embedded fonts, layers (OCGs), or other resources that might conflict with resources in the rest of the document, replacing one page with another requires some delicate PDF surgery. It’s not as trivial as it might seem.
Currently, the Datalogics PDF Java Toolkit has no direct “Replace Pages” counterpart; you can’t replace a page in one document with pages from another document. Replacing one page’s content with another’s requires that both pages have the same parent document; they must be in the same document for the resources to be migrated properly.
Fortunately, once the pages are merged into the same document, it’s very easy to grab what you need from the one, add it to the other, and then delete the original.
To demonstrate this in the Gist referenced below, I update a 2014 version of a form to the 2015 version, leaving the form fields and their values in place.
The PMMService allows developers to insert pages from a source document into a target and ensure that all the bits and pieces carry over correctly. Then, to get the form fields (really the entire annotation dictionary) from the original page onto the replacement page, it’s just a simple matter of overwriting one with the other.
Now here’s the trick… and why I wrote this Gist… at this point you have two pages pointing at the same set of annotations. From a PDF perspective, that’s allowed… but that’s not what we want. Before deleting the original page, we need to remove the annotations from that page to break the connection. Then we can safely delete the original page without deleting the annotations on both.
Once the original page is deleted, we are left with a document that contains the original set of form fields but new page content, effectively duplicating the “Replace Pages” feature in Acrobat.
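To make the ordering of those steps concrete, here is a minimal sketch in plain Java. It deliberately does not use the PDF Java Toolkit API: Page, Document, replacePage, and deletePage are hypothetical stand-ins invented for illustration, and the model assumes that deleting a page also destroys whatever annotations that page still references. The real calls live in the Gist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical stand-ins, NOT the Datalogics PDF Java Toolkit API.
class ReplacePageSketch {

    /** A page: some content plus a reference to a list of annotations. */
    static final class Page {
        final String content;
        List<String> annots; // may be shared by two pages, like a shared annotation reference

        Page(String content, List<String> annots) {
            this.content = content;
            this.annots = annots;
        }
    }

    static final class Document {
        final List<Page> pages = new ArrayList<>();

        /** Assumption of this model: deleting a page destroys the annotations it still points to. */
        void deletePage(int index) {
            Page removed = pages.remove(index);
            if (removed.annots != null) {
                removed.annots.clear();
            }
        }
    }

    /** Replaces the page at index while keeping the original page's annotations. */
    static void replacePage(Document doc, int index, Page replacement) {
        Page original = doc.pages.get(index);
        doc.pages.add(index + 1, replacement); // 1. both pages now live in the same document
        replacement.annots = original.annots;  // 2. overwrite: both pages share one annotation list
        original.annots = null;                // 3. break the link on the original page
        doc.deletePage(index);                 // 4. the delete can no longer touch the annotations
    }

    public static void main(String[] args) {
        Document doc = new Document();
        List<String> fields = new ArrayList<>(Arrays.asList("name", "signature"));
        doc.pages.add(new Page("2014 artwork", fields));

        replacePage(doc, 0, new Page("2015 artwork", new ArrayList<>()));

        Page result = doc.pages.get(0);
        System.out.println(result.content + " " + result.annots);
    }
}
```

In this model, skipping step 3 leaves the replacement page pointing at a list that gets emptied when the original page is deleted, which is exactly the situation the detach step is there to prevent.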
You can see the Gist for Replace Pages here