The easiest way to create a PDF/A document from an existing PDF document is to feed it through a PDF/A converter like the PDFProcessor plugin that ships with Adobe PDF Library (APDFL). If the output looks essentially the same as the input and validates as a PDF/A document, you are done. But that isn't the case with all documents.
Some PDF documents are created in a way that makes PDF/A conversion require little more than adding the appropriate PDF/A headers to the document's metadata. Others need a bit of nip and tuck to become PDF/A compliant. Still others come out of the process looking like plastic surgery gone wrong (ED: link intentionally not provided).
For the lower levels of PDF/A compliance (e.g., PDF/A-1b and PDF/A-2b), creating a document that converts easily to PDF/A essentially means avoiding forbidden features and giving the document the resources it needs to be viewed and used as intended in the far distant future, when today's PDF viewers and operating systems are long forgotten. Hey, it could happen; they finally retired Windows XP, didn't they?
As an academic exercise, rather than relying on the PDF/A converter to do it all within its black box, let’s create our own PDF/A document from scratch, starting from the helowrld sample that has shipped with APDFL Since Forever (ED: Nope, not linking to the band).
The first change to helowrld we are going to make is to embed the fonts we use:
A PDF that doesn't embed its fonts depends on having the right mix of system fonts available in order to render correctly. That can be a problem even today, let alone in the distant future: a document that relies on fonts common on Windows systems may not render correctly on a Unix system that lacks those fonts.
The next change, if done differently, could have obviated the need for the third change.
Okay, technically, all I've done here compared to the original helowrld code is change the fill color from DeviceGray black to DeviceCMYK blue. But PDF/A only permits device-dependent color spaces such as DeviceCMYK when the document carries a matching Output Intent. Had I instead switched to an ICCBased color space and used ICCBased colors throughout the document, there would be no need to embed a color profile for the Output Intent:
Embedding the Output Intent is a two-step process, because the heart of the embedOutputIntent() routine is APDFL's PDDocColorConvertEmbedOutputIntent() call.
PDDocColorConvertEmbedOutputIntent(), however, does not create a PDF/A Output Intent but a PDF/X Output Intent. The difference is slight (essentially the /S subtype entry: /GTS_PDFX versus /GTS_PDFA1), and the switchOutpuIntentsToPDFA() routine also included in this code rectifies it. If you want to preserve the PDF/X Output Intent, the copyPDFXOutputIntentsToPDFA() routine lets you do so:
The final step is to update the XMP metadata headers with the PDF/A identifiers.
Note that I set the PDF/A identifiers to PDF/A-2B: "B" because there is no document structure in this hello-world PDF, and "2" because PDF/A-1 builds on PDF 1.4, so choosing PDF/A-2 spared me from changing the minor version number to "4" and making sure I wasn't using any PDF features introduced after PDF version 1.4.
The full code is here.