When diagnosing PDF text issues, I’ve always found the Fonts tab of Acrobat’s Document Properties dialog useful, so I’ve wanted to reproduce that functionality with APDFL’s APIs for a while now.
The task is a bit more complicated than it might seem: PDF essentially supports six different-but-related types of fonts, and the mapping between what Acrobat’s font list shows and what the APDFL APIs return could only be found, until now, by consulting the Celestial Emporium of Benevolent Knowledge.
Ignoring the substitution font (‘Actual Font’) information for when a font is not embedded, the Fonts tab listing provides four pieces of information: a font’s name, whether the font is embedded or not, the font’s type, and the font’s encoding.
There is a PDFontGetName() function, but since font names can be specified using Unicode, it makes more sense to use PDFontGetASTextName() and extract the ASText value as a UTF-8 string.
Determining whether a font is embedded or not can be done simply enough with the PDFontIsEmbedded() function.
Determining a font’s type starts with calling PDFontGetSubtype(). But if the font is a composite font (a Type0 font), then you need to call PDFontGetSubtype() on the descendant font as well to find out which (sub-)type of composite font it is.
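To make the two-step lookup concrete, here is a minimal sketch in plain C that does not call APDFL at all. The FontInfo struct and font_type_label() are hypothetical stand-ins: in real code the subtype strings would come from PDFontGetSubtype() on the font and on its descendant. The label strings are assumptions meant to resemble what a font list might display, not Acrobat’s confirmed wording.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a font: with APDFL, the subtype strings would
   come from PDFontGetSubtype() on the font and on its descendant font. */
typedef struct FontInfo {
    const char *subtype;               /* e.g. "Type1", "TrueType", "Type0" */
    const struct FontInfo *descendant; /* set only for composite (Type0) fonts */
} FontInfo;

/* Map a subtype (plus the descendant's subtype for Type0 fonts) to a
   display label. The label strings are assumptions for illustration. */
const char *font_type_label(const FontInfo *font)
{
    assert(font != NULL && font->subtype != NULL);

    if (strcmp(font->subtype, "Type0") == 0) {
        const FontInfo *d = font->descendant;
        if (d != NULL && d->subtype != NULL) {
            if (strcmp(d->subtype, "CIDFontType0") == 0)
                return "Type 1 (CID)";    /* CID font with Type 1 style outlines */
            if (strcmp(d->subtype, "CIDFontType2") == 0)
                return "TrueType (CID)";  /* CID font with TrueType outlines */
        }
        return "Type 0";  /* descendant missing or not recognized */
    }
    if (strcmp(font->subtype, "Type1") == 0)
        return "Type 1";
    if (strcmp(font->subtype, "Type3") == 0)
        return "Type 3";
    return font->subtype;  /* "TrueType", "MMType1", etc. pass through as-is */
}
```

A simple font resolves in one step, while a Type0 font is labeled by its descendant: a Type0 font whose descendant is CIDFontType2 comes back as "TrueType (CID)" in this sketch.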
Lastly, while you might think that the name of a font’s encoding could be had by calling a function like PDFontGetEncodingName(), you would be wrong, mostly. You can extract a font’s encoding by reading the Encoding value from a font’s CosDictionary, but that entry is optional for simple fonts, so a better place to start is with PDFontGetEncodingIndex().
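One way to picture the "index first, dictionary second" approach is the sketch below, again in plain C with no APDFL calls. Everything in it is an assumption for illustration: the ENC_* values are made up for this example (real code would compare the result of PDFontGetEncodingIndex() against the constants in the APDFL headers), encoding_label() is a hypothetical helper, and the fallback order is one plausible choice rather than the method used in the actual code. The four names returned for known indices are the predefined encoding names from the PDF specification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical index values for this sketch only; real code would use the
   encoding constants declared in the APDFL headers. */
enum {
    ENC_BUILT_IN   = -1,
    ENC_MAC_ROMAN  =  0,
    ENC_MAC_EXPERT =  1,
    ENC_WIN_ANSI   =  2,
    ENC_STANDARD   =  3
};

/* Pick an encoding label: a recognized index wins; otherwise fall back to
   the optional Encoding name read from the font dictionary (may be NULL);
   otherwise report the font's built-in encoding. */
const char *encoding_label(int index, const char *cos_encoding_name)
{
    switch (index) {
    case ENC_MAC_ROMAN:  return "MacRomanEncoding";
    case ENC_MAC_EXPERT: return "MacExpertEncoding";
    case ENC_WIN_ANSI:   return "WinAnsiEncoding";
    case ENC_STANDARD:   return "StandardEncoding";
    default:             break;  /* built-in or unrecognized index */
    }
    if (cos_encoding_name != NULL && strlen(cos_encoding_name) > 0)
        return cos_encoding_name;  /* e.g. "Identity-H" on a composite font */
    return "Built-in";
}
```

The point of the ordering is that the Encoding entry may simply be absent on a simple font, so the dictionary lookup works better as a fallback than as the starting point.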
Assembling all of this into a callback function called from PDDocEnumFonts(), we get a listing of fonts which would mostly match Acrobat’s font list if it were sorted in alphabetical order.
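For comparing the output against Acrobat’s list, it can help to collect the rows first and sort them by name afterwards. The sketch below shows that pattern in plain C. FontRow, FontList, collect_font() and sort_font_list() are hypothetical names, and collect_font() only imitates the general shape of an enumeration callback (a client-data pointer, a return value that says whether to continue); it does not use the actual callback signature that PDDocEnumFonts() expects.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical row holding the four pieces of information per font. */
typedef struct FontRow {
    const char *name;
    int embedded;          /* nonzero if the font is embedded */
    const char *type;
    const char *encoding;
} FontRow;

/* Fixed-capacity list that the callback fills through its client data. */
typedef struct FontList {
    FontRow rows[64];
    size_t count;
} FontList;

/* Shaped like an enumeration callback: append one row and return nonzero
   to keep enumerating, or zero once the list is full. */
int collect_font(const FontRow *row, void *client_data)
{
    FontList *list = (FontList *)client_data;
    if (list->count >= sizeof list->rows / sizeof list->rows[0])
        return 0;
    list->rows[list->count++] = *row;
    return 1;
}

/* qsort comparator: byte-wise, case-sensitive ordering by font name. */
static int compare_rows_by_name(const void *a, const void *b)
{
    const FontRow *ra = (const FontRow *)a;
    const FontRow *rb = (const FontRow *)b;
    return strcmp(ra->name, rb->name);
}

/* Sort the collected rows alphabetically by name. */
void sort_font_list(FontList *list)
{
    qsort(list->rows, list->count, sizeof list->rows[0], compare_rows_by_name);
}
```

Note that strcmp() sorts by raw bytes, so subset-prefixed names such as "ABCDEF+Arial" sort by their prefix and uppercase letters come before lowercase ones; a listing meant for side-by-side comparison might want a different comparator.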
The code for all of this can be found here.