Several months ago, I wrote up a sample app that recreated Acrobat’s Font list in the Document Properties dialog. That post is now part of a series in which we use APDFL to extract or recreate the information contained in Adobe Acrobat’s Document Properties dialog.
The Description tab seems relatively straightforward, but there are a couple of gotchas, and in some cases there is more than one way to extract the information with APDFL.
I’m going to skip going over how to extract File(name), Location and File Size as you don’t really need APDFL to get that information. But otherwise, let’s proceed from the top down in order, starting with the first four at the top:
Nothing too difficult here. I use PDDocGetInfoASText mainly because these fields could contain Unicode text, and putting that text into an ASText variable makes it easier to handle, even if all I’m doing is converting it to UTF-8 for extraction purposes.
If you don’t need Unicode text extraction (for example, when you are extracting date properties), the following also works:
Note that I could have parsed the date string and formatted per the current locale, but
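For anyone who does want to parse the date string, here is a minimal sketch that does not depend on APDFL. It assumes the value follows the PDF date format `D:YYYYMMDDHHmmSSOHH'mm'`, where everything after the year is optional. The `PdfDate` struct and the `parsePdfDate` and `readDigits` helpers are hypothetical names of my own, not library calls, and the code needs C++17 for `std::optional`.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Broken-down fields of a PDF date string. Defaults match the format's
// rules for omitted fields (month and day default to 1, the rest to 0).
struct PdfDate {
    int year = 0, month = 1, day = 1;
    int hour = 0, minute = 0, second = 0;
    int utcOffsetMinutes = 0;  // signed offset of local time from UTC
};

// Reads exactly `count` digits at `pos`. On failure, `pos` and `out`
// are left untouched so the caller's default value survives.
static bool readDigits(const std::string& s, std::size_t& pos, int count, int& out) {
    if (pos + static_cast<std::size_t>(count) > s.size()) return false;
    int value = 0;
    for (int i = 0; i < count; ++i) {
        unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if (!std::isdigit(c)) return false;
        value = value * 10 + (c - '0');
    }
    pos += static_cast<std::size_t>(count);
    out = value;
    return true;
}

// Hypothetical helper: returns std::nullopt when not even a year is present.
std::optional<PdfDate> parsePdfDate(const std::string& text) {
    std::size_t pos = 0;
    if (text.compare(0, 2, "D:") == 0) pos = 2;  // the prefix may be missing in older files

    PdfDate d;
    if (!readDigits(text, pos, 4, d.year)) return std::nullopt;

    // Optional fields appear in a fixed order; stop at the first one missing.
    int* fields[] = { &d.month, &d.day, &d.hour, &d.minute, &d.second };
    for (int* field : fields) {
        if (!readDigits(text, pos, 2, *field)) break;
    }

    // Optional time zone: '+', '-' or 'Z' (a 'Z' leaves the offset at 0).
    if (pos < text.size() && (text[pos] == '+' || text[pos] == '-')) {
        const int sign = (text[pos] == '-') ? -1 : 1;
        ++pos;
        int hh = 0, mm = 0;
        if (readDigits(text, pos, 2, hh)) {
            if (pos < text.size() && text[pos] == '\'') ++pos;
            readDigits(text, pos, 2, mm);  // minutes are optional
            d.utcOffsetMinutes = sign * (hh * 60 + mm);
        }
    }
    return d;
}
```

The resulting fields could then be handed to whatever locale-aware formatting your platform provides; that step is left out here.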
For Application and PDF Producer, I’m pulling these properties directly out of the XMP metadata embedded in the file using the PDDocGetXAPMetadataProperty call, provided the metadata stream is actually in the file. The reason you might want to use this call instead of PDDocGetInfoASText is to extract other metadata that PDDocGetInfo doesn’t know about, such as the PDF/A or PDF/UA flags.
Next up is checking the PDF version and the corresponding version of Acrobat that can open that file, and Adobe Extension levels (to handle the fact that the PDF format has been stuck at version 1.7 for the past decade waiting for
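Once you have the version numbers in hand, turning them into a label like the one Acrobat shows is a small lookup. The sketch below is independent of APDFL and takes the major version, minor version, and Adobe extension level as plain integers. The mapping is my assumption based on commonly published tables (PDF 1.0 through 1.7 correspond to Acrobat 1.x through 8.x, extension level 3 to Acrobat 9.x, extension level 8 to Acrobat 10.x), and `describePdfVersion` is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper: formats a version line in the style of the
// Description tab, for example "1.6 (Acrobat 7.x)".
// ASSUMPTION: the Acrobat mapping comes from commonly published tables,
// not from anything APDFL reports.
std::string describePdfVersion(int major, int minor, int extensionLevel = 0) {
    std::string label = std::to_string(major) + "." + std::to_string(minor);

    // Outside the 1.0 to 1.7 range there is no mapping in this sketch.
    if (major != 1 || minor < 0 || minor > 7) return label;

    int acrobat = minor + 1;  // PDF 1.N shipped with Acrobat N+1
    if (minor == 7 && extensionLevel > 0) {
        label += ", Adobe Extension Level " + std::to_string(extensionLevel);
        if (extensionLevel >= 8) {
            acrobat = 10;
        } else if (extensionLevel >= 3) {
            acrobat = 9;
        }
    }
    return label + " (Acrobat " + std::to_string(acrobat) + ".x)";
}
```

For example, `describePdfVersion(1, 7, 3)` yields `1.7, Adobe Extension Level 3 (Acrobat 9.x)`.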
A little-known feature of the Description tab is that it provides page size information about the current page. While you could calculate this from the page CropBox, there are a couple of other factors that could come into play. In the code below, since we don’t have a current page, we’ll just grab the information from the first page.
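To make the arithmetic concrete, here is a minimal sketch of the conversion from a crop box to a displayed size in inches. It does not call APDFL. I am assuming that the "other factors" are the page's Rotate value and its UserUnit scale, which is my reading rather than something stated above; `Rect` and `pageSizeInches` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// A rectangle in PDF user-space units (1 unit = 1/72 inch when UserUnit is 1).
struct Rect {
    double left, bottom, right, top;
};

// Hypothetical helper: returns {width, height} in inches as displayed.
// ASSUMPTION: rotation and UserUnit are the extra factors worth handling.
std::pair<double, double> pageSizeInches(const Rect& cropBox,
                                         int rotateDegrees,
                                         double userUnit = 1.0) {
    assert(userUnit > 0.0);

    // fabs guards against boxes whose corners are stored in reverse order.
    double width  = std::fabs(cropBox.right - cropBox.left) * userUnit / 72.0;
    double height = std::fabs(cropBox.top - cropBox.bottom) * userUnit / 72.0;

    // Normalize the rotation into 0..359; quarter turns swap the two sides.
    const int rotation = ((rotateDegrees % 360) + 360) % 360;
    if (rotation == 90 || rotation == 270) {
        std::swap(width, height);
    }
    return {width, height};
}
```

A US Letter crop box of 612 by 792 units comes out as 8.5 by 11 inches, and the same box with a rotation of 90 comes out as 11 by 8.5 inches.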
Grabbing the number of pages is one call:
Determining whether the document is a tagged PDF, however, is a bit more complicated. I didn’t find a good call or flag for it, so I had to drop to the Cos level to find the information. It’s also slightly more complicated than the PDF Reference makes it out to be: the document needs both a StructTreeRoot and a MarkInfo:Marked entry set to true for Acrobat to consider it a tagged PDF:
Lastly, Fast Web View means that the document is linearized: the first page that gets opened when viewing the document (which isn’t necessarily page 1) sits at the very beginning of the file along with all the resources it needs to display, so that page can be displayed while the rest of the document is slowly being downloaded over a 56k modem.
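If you want to sanity-check linearization without the library, the file itself gives a strong hint. A linearized file starts with a linearization parameter dictionary that has to fit within the first 1024 bytes, and its `/L` entry has to equal the actual file length; a mismatch (for example, after an incremental save) means the file should be treated as ordinary PDF. The sketch below is a text-scanning heuristic under those assumptions, not a real PDF parser, and `looksLinearized` is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical heuristic: `head` holds the first bytes of the file and
// `actualFileSize` is the file's length on disk.
// ASSUMPTION: plain text scanning is good enough for a quick check;
// a robust version would parse the first indirect object properly.
bool looksLinearized(const std::string& head, std::size_t actualFileSize) {
    const std::string window = head.substr(0, 1024);
    if (window.find("/Linearized") == std::string::npos) return false;

    // Find a standalone /L key, skipping longer names such as /Linearized.
    std::size_t pos = 0;
    while ((pos = window.find("/L", pos)) != std::string::npos) {
        std::size_t i = pos + 2;
        if (i < window.size() && std::isalpha(static_cast<unsigned char>(window[i]))) {
            pos = i;
            continue;
        }
        while (i < window.size() && std::isspace(static_cast<unsigned char>(window[i]))) ++i;

        std::size_t declaredLength = 0;
        bool sawDigit = false;
        while (i < window.size() && std::isdigit(static_cast<unsigned char>(window[i]))) {
            declaredLength = declaredLength * 10 + static_cast<std::size_t>(window[i] - '0');
            sawDigit = true;
            ++i;
        }
        // The declared length must match the real one exactly.
        return sawDigit && declaredLength == actualFileSize;
    }
    return false;
}
```

To use it, read the first kilobyte of the file into a `std::string` and pass it in together with the file size.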
And that’s that. Full code is available here.