Recreating Acrobat's Document Properties Description tab with APDFL

Several months ago, I wrote up a sample app that recreated Acrobat’s Font list in the Document Properties dialog. That post is now part of a series in which we use APDFL to extract or recreate the information shown in Adobe Acrobat’s Document Properties dialog.
The Description tab seems relatively straightforward, but there are a couple of gotchas, and in some cases there is more than one way to extract the information with APDFL.
I’m going to skip how to extract File (name), Location and File Size, as you don’t really need APDFL to get that information. Otherwise, let’s proceed from the top down, starting with the first four fields at the top (Title, Author, Subject and Keywords):

Nothing too difficult here. I use PDDocGetInfoASText mainly because these fields could contain Unicode text, and putting that text into an ASText variable makes it easier to handle, even if all I’m doing is converting it to UTF-8 for extraction purposes.
If you don’t need Unicode text extraction (for example, when you are extracting date properties), the following also works:

Note that I could have parsed the date string and formatted it per the current locale, but I’m lazy, or rather, that’s outside the scope of APDFL per se.
For Application and PDF Producer, I’m pulling these properties directly out of the XMP metadata embedded in the file using the PDDocGetXAPMetadataProperty call, provided the metadata stream is actually in the file. The reason you might want to use this call instead of PDDocGetInfoASText is to extract other metadata that PDDocGetInfo doesn’t know about, such as the PDF/A or PDF/UA flags.
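For anyone less lazy than me, here is a rough, library-free sketch of what parsing a PDF date string could look like. It assumes the usual `D:YYYYMMDDHHmmSSOHH'mm'` layout, where everything after the year is optional. The `PdfDate` struct and the `parse_pdf_date` and `format_pdf_date` helpers are names I made up for illustration; they are not APDFL calls, and the output is a fixed ISO-style string rather than anything locale-aware.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type for illustration; not part of APDFL. */
typedef struct {
    int year, month, day, hour, minute, second;
    char tz_sign;            /* '+', '-', 'Z', or 0 when no offset is given */
    int tz_hour, tz_minute;
} PdfDate;

/* Read exactly n decimal digits at *p; advance *p only on success. */
static int read_digits(const char **p, int n, int *out)
{
    int value = 0;
    for (int i = 0; i < n; i++) {
        if (!isdigit((unsigned char)(*p)[i]))
            return 0;
        value = value * 10 + ((*p)[i] - '0');
    }
    *p += n;
    *out = value;
    return 1;
}

/* Returns 1 on success, 0 if the string does not start with a 4-digit year. */
int parse_pdf_date(const char *s, PdfDate *d)
{
    memset(d, 0, sizeof *d);
    d->month = 1;            /* defaults used when a field is omitted */
    d->day = 1;

    if (strncmp(s, "D:", 2) == 0)
        s += 2;
    if (!read_digits(&s, 4, &d->year))
        return 0;

    /* Later fields are optional; stop at the first one that is missing. */
    int *fields[] = { &d->month, &d->day, &d->hour, &d->minute, &d->second };
    for (size_t i = 0; i < 5; i++)
        if (!read_digits(&s, 2, fields[i]))
            break;

    /* Optional UTC offset: Z, or +HH'mm' / -HH'mm'. */
    if (*s == 'Z' || *s == '+' || *s == '-') {
        d->tz_sign = *s++;
        if (read_digits(&s, 2, &d->tz_hour)) {
            if (*s == '\'')
                s++;
            read_digits(&s, 2, &d->tz_minute);
        }
    }
    return 1;
}

/* Writes "YYYY-MM-DD HH:MM:SS" into buf (no locale handling). */
void format_pdf_date(const PdfDate *d, char *buf, size_t size)
{
    snprintf(buf, size, "%04d-%02d-%02d %02d:%02d:%02d",
             d->year, d->month, d->day, d->hour, d->minute, d->second);
}
```

You would feed it the raw CreationDate or ModDate string pulled from the Info dictionary. Converting the result to the user’s locale is left to whatever date library you already use.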

Next up is checking the PDF version, the corresponding version of Acrobat that can open the file, and the Adobe Extension Level. Extension levels exist because the PDF format has been stuck at version 1.7 for the past decade, waiting for Godot (that is, the ISO 32000 committee) to finalize PDF version 2.0. In the meantime, Adobe snuck a few new features into PDF by declaring them to be Adobe extensions. The extension levels map to unofficial PDF versions. The code below matches Acrobat’s secret decoder ring:
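As a rough, library-free illustration of that lookup (this is not the APDFL sample itself), the sketch below maps a PDF version plus extension level to an Acrobat label. The function names are hypothetical, and the extension-level thresholds (3, 5 and 8 for the Acrobat 9.0, 9.1 and X generations) and the exact label wording are my understanding rather than something lifted from Acrobat, so treat them as assumptions to verify.

```c
#include <stdio.h>

/* Illustrative mapping only; the labels and thresholds are assumptions. */
const char *acrobat_label(int major, int minor, int ext_level)
{
    /* PDF 1.0 through 1.6 line up with Acrobat 1 through 7. */
    static const char *labels[] = {
        "Acrobat 1.x", "Acrobat 2.x", "Acrobat 3.x", "Acrobat 4.x",
        "Acrobat 5.x", "Acrobat 6.x", "Acrobat 7.x"
    };

    if (major == 1 && minor == 7) {
        /* PDF 1.7 is where Adobe extension levels come into play. */
        if (ext_level >= 8) return "Acrobat 10.x";
        if (ext_level >= 5) return "Acrobat 9.1";
        if (ext_level >= 3) return "Acrobat 9.x";
        return "Acrobat 8.x";
    }
    if (major == 1 && minor >= 0 && minor <= 6)
        return labels[minor];
    return "unknown";
}

/* Builds a string such as "1.7, Adobe Extension Level 3 (Acrobat 9.x)". */
void describe_version(int major, int minor, int ext_level,
                      char *buf, size_t size)
{
    const char *label = acrobat_label(major, minor, ext_level);
    if (ext_level > 0)
        snprintf(buf, size, "%d.%d, Adobe Extension Level %d (%s)",
                 major, minor, ext_level, label);
    else
        snprintf(buf, size, "%d.%d (%s)", major, minor, label);
}
```

The version numbers and the extension level themselves still have to come from the document, which is the part APDFL handles.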

A little-known feature of the Description tab is that it provides page size information about the current page. While you could calculate this from the page CropBox, there are a couple of other factors that could come into play. In the code below, since we don’t have a current page, we’ll just grab the information from the first page.
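To make the arithmetic concrete, here is a small standalone sketch of the calculation. I’m assuming the "other factors" are the page’s Rotate value and its UserUnit scaling; the original sample may well account for more. The `Box` and `PageSize` types and the `page_size_inches` function are hypothetical names, and the box values would come from APDFL in a real program.

```c
/* Hypothetical types for illustration; not APDFL structures. */
typedef struct { double left, bottom, right, top; } Box;
typedef struct { double width_in, height_in; } PageSize;

/*
 * Converts a crop box (in default user space units, 1/72 inch each)
 * into a displayed page size in inches, taking rotation and UserUnit
 * into account.
 */
PageSize page_size_inches(Box crop, int rotate, double user_unit)
{
    double w = crop.right - crop.left;
    double h = crop.top - crop.bottom;
    if (w < 0) w = -w;                 /* tolerate boxes written "backwards" */
    if (h < 0) h = -h;

    if (user_unit <= 0.0)
        user_unit = 1.0;               /* UserUnit defaults to 1 when absent */

    /* Normalize rotation to 0..359; 90 and 270 swap width and height. */
    int r = ((rotate % 360) + 360) % 360;
    if (r == 90 || r == 270) {
        double tmp = w;
        w = h;
        h = tmp;
    }

    PageSize size = { w * user_unit / 72.0, h * user_unit / 72.0 };
    return size;
}
```

A US Letter crop box of 0 0 612 792 comes out as 8.5 by 11 inches, and the same box with a rotation of 90 comes out as 11 by 8.5.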

Grabbing the number of pages is one call:

Determining whether the document is a tagged PDF, however, is a bit more complicated. I didn’t find a good call or flag for it, so I had to drop to the Cos level to find the information. It’s also slightly more involved than the PDF Reference makes it out to be: the document needs both a StructTreeRoot and a MarkInfo:Marked entry set to true for Acrobat to consider it a tagged PDF:

Lastly, Fast Web View means that the document is linearized: the first page that gets opened when viewing the document (which isn’t necessarily page 1) sits at the very beginning of the file, along with all the resources it needs to display, so that the page can be displayed while the rest of the document is slowly being downloaded over a 56k modem.
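If you are curious what linearization looks like at the byte level, here is a rough check that works on a raw file buffer without APDFL. It relies on two rules from the PDF specification as I understand them: the linearization parameter dictionary has to sit within the first 1024 bytes, and its /L entry has to equal the actual file length (an incremental save appends bytes and makes /L stale). The `looks_linearized` function is a hypothetical helper and only a heuristic; it does not validate the hint tables or anything else.

```c
#include <stddef.h>
#include <string.h>

/* PDF whitespace characters that may separate a name from its value. */
static int is_pdf_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f' || c == 0;
}

/* Finds needle inside a byte buffer that may contain NUL bytes. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || hay_len < n)
        return NULL;
    for (size_t i = 0; i + n <= hay_len; i++)
        if (memcmp(hay + i, needle, n) == 0)
            return hay + i;
    return NULL;
}

/*
 * Heuristic: returns 1 if the first 1024 bytes contain /Linearized and
 * an /L entry whose value equals the total length of the buffer.
 * data must hold the whole file; len is its size in bytes.
 */
int looks_linearized(const unsigned char *data, size_t len)
{
    size_t head = len < 1024 ? len : 1024;
    if (find_bytes(data, head, "/Linearized") == NULL)
        return 0;

    for (size_t i = 0; i + 2 < head; i++) {
        /* Match the name /L exactly: it must be followed by whitespace. */
        if (data[i] != '/' || data[i + 1] != 'L' || !is_pdf_space(data[i + 2]))
            continue;

        size_t j = i + 2;
        while (j < head && is_pdf_space(data[j]))
            j++;

        unsigned long long value = 0;
        int digits = 0;
        while (j < head && data[j] >= '0' && data[j] <= '9') {
            value = value * 10 + (unsigned long long)(data[j] - '0');
            digits++;
            j++;
        }
        return digits > 0 && value == (unsigned long long)len;
    }
    return 0;   /* no /L entry found */
}
```

The length comparison is the reason a file that was linearized once and later edited with an incremental save no longer counts.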
And that’s that. Full code is available here.