It was my privilege to represent Datalogics at the PDF Hackathon that our partner company Callas Software organized in Berlin on April 11 and 12. This was Callas’s second Hackathon; their first was focused on Callas’s new and impressive pdfChip technology, but this iteration was broadened in scope to any PDF-related
Since most of the attendees were Callas customers, the topics they wanted to hack tended towards pre-press issues with Callas tools. But a small group of topics caught my fancy: an issue where converting emails to PDFs caused images to be split up, a request for how to mask an image with a vector path, and an interesting request from a gentleman from a Finnish newspaper. He wanted to extract an article from one or more PDFs for a reprint service, given an XML file that describes where the components of the article are on the page. All other ancillary material had already been destroyed, as archiving the InDesign files generated on a daily basis would overwhelm their resources.
My initial thought was to wonder whether InDesign might be including Article and Bead information (a PDF v1.1 feature; section 8.3.2 of the v1.7 PDF Reference) in the document when it generated the PDF, and whether that could be used to extract the relevant article. Alas, a small bit of sleuthing revealed that this little-used PDF feature is seemingly not used by the product most likely to populate that information in a PDF.
Turning to the example XML file and examining its elements and attributes, we convinced ourselves (because it would be much easier for us if it were true) that its article coordinates were likely to be desktop publishing points (1/72 in.). Then I noticed that the xgeometry coordinate in our sample XML file was beyond the right edge of our sample PDF. Oops. Our newspaper man then turned to InDesign to determine the position of the article on the page in points, and I turned to the XML coordinates to determine how to convert them to match those point-based coordinates.
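The sanity check that tripped us up can be sketched in a few lines of Python. This is only an illustration: the list of candidate units, the page width, and the sample coordinate are all hypothetical, and the actual units of the newspaper's XML are not given here. The idea is simply to convert a raw coordinate under each candidate unit and see which ones land on the page at all.

```python
# Conversion factors from a few candidate units to PDF points (1/72 inch).
# The candidate list is an assumption for illustration, not the XML's real unit.
POINTS_PER_UNIT = {
    "point": 1.0,
    "twip": 1.0 / 20.0,          # 1/20 of a point
    "pica": 12.0,
    "millimeter": 72.0 / 25.4,
    "tenth_mm": 7.2 / 25.4,      # 0.1 mm
    "inch": 72.0,
}


def to_points(value, unit):
    """Convert a raw coordinate in the given unit to points."""
    return value * POINTS_PER_UNIT[unit]


def plausible_units(value, page_extent_pts):
    """Return the candidate units under which the value falls on the page."""
    return [
        unit
        for unit in POINTS_PER_UNIT
        if 0.0 <= to_points(value, unit) <= page_extent_pts
    ]


if __name__ == "__main__":
    page_width_pts = 792.0   # hypothetical page width in points
    x_geometry = 900.0       # hypothetical raw coordinate from the XML
    # "point" is ruled out because 900 pt lies beyond a 792 pt wide page.
    print(plausible_units(x_geometry, page_width_pts))
```

A check like this only narrows the field; comparing against a position measured in InDesign, as we did, is still needed to settle on the actual conversion.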
During a lull, I took a short break to put together a quick DLE program to demonstrate how one masks an image with a vector path, which I’ll discuss in a follow-up article.