At last month’s PDF Camp, I gave a mini-presentation on different tools that I use for diagnosing PDF problems. One of the tools I highlighted was our PDFObjectExplorer sample app, the version that ships with the Adobe PDF Library. While putting together the presentation, I was also tweaking PDFObjectExplorer to make it a bit more useful.
In my presentation, I pointed out that Acrobat has a very nice alternative to PDFObjectExplorer built into its preflight tool. However, it has shortcomings. For some types of PDF documents (namely PDF Collections), Acrobat will not give you access to preflight at all. And at times it describes the document as Acrobat has updated it rather than as it is actually stored in the file (e.g., when Acrobat regenerates annotation appearances).
For those cases and others, it can be useful to have a PDF object explorer that is not so closely tied to Acrobat. However, the PDFObjectExplorer sample has its own shortcomings. One is that it is too easy to lose your bearings while navigating the page tree in search of the page you want. A good way around that is to display a rendering of the page when you select a page dictionary. And while we’re at it, why not render individual images and other XObjects too?
So let’s start by adding a ‘rendered’ tab that shows the image of the selected page or XObject. The key difference from the existing tabs is that this one needs to display an image rather than text.
Beyond that, it’s mainly a matter of duplicating the existing tab code: mostly judicious copy and paste.
Now, how do we render a page or XObject into that tab? Let’s start with pages, since the DotNet interface has some nice facilities for rendering them. First, we need to identify when a page dictionary has been selected. After that, we need to figure out which page number that page object corresponds to.
My initial stab at this was to brute-force iterate through the pages, comparing each page’s PDFDict.ID against the indirect object ID of the selected object until a match was found. This works, but inefficiently for any document with more than a few pages, because every selection can cost a scan of the entire page list. Rather than doing this every time a page dictionary is selected, it makes more sense to iterate through all the pages once and cache each page number keyed by its page dictionary’s ID, so that any page dictionary selected afterward is an O(1) lookup.
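The caching idea can be sketched in a few lines of library-agnostic Python (this is not the PDF Library’s .NET API). It assumes you can collect the object ID of each page dictionary in page order, for instance from each page’s PDFDict.ID; the names `find_page_number_linear` and `PageNumberCache` are hypothetical, chosen only to contrast the brute-force scan with the one-time cache.

```python
from typing import Dict, Hashable, Iterable, List, Optional


def find_page_number_linear(page_ids: List[Hashable],
                            obj_id: Hashable) -> Optional[int]:
    """Brute-force version: scan every page until the IDs match (O(n))."""
    for page_number, page_id in enumerate(page_ids):
        if page_id == obj_id:
            return page_number
    return None


class PageNumberCache:
    """Maps a page dictionary's object ID to its 0-based page number."""

    def __init__(self, page_ids: Iterable[Hashable]) -> None:
        # One pass over the pages, done once (e.g. when the document opens).
        self._page_by_id: Dict[Hashable, int] = {
            page_id: page_number
            for page_number, page_id in enumerate(page_ids)
        }

    def page_number(self, obj_id: Hashable) -> Optional[int]:
        # Average O(1) hash lookup; None means "not a page in this document".
        return self._page_by_id.get(obj_id)
```

For example, `PageNumberCache([12, 7, 40]).page_number(40)` returns `2`. Since the explorer only inspects the file, building the cache once is enough; a tool that inserted or deleted pages would need to rebuild it.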
Once you have that page number lookup, rendering the page becomes straightforward.
Detecting XObjects works in much the same way, starting in the same function.
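A sketch of that detection step, assuming the selected object’s dictionary has already been reduced to a plain Python mapping from key names to name values (a hypothetical representation, not the PDF Library’s object model). The checks themselves follow the PDF specification: page dictionaries carry /Type /Page, while XObjects are streams whose /Subtype is /Image or /Form and whose /Type /XObject entry is optional. `RenderKind` and `classify` are illustrative names.

```python
from enum import Enum
from typing import Any, Mapping


class RenderKind(Enum):
    PAGE = "page"
    IMAGE_XOBJECT = "image"
    FORM_XOBJECT = "form"
    NONE = "none"


def classify(pdf_dict: Mapping[str, Any], is_stream: bool) -> RenderKind:
    """Decide what, if anything, the 'rendered' tab should show."""
    # Page objects are plain dictionaries with /Type /Page
    # (intermediate page-tree nodes use /Type /Pages and fall through).
    if pdf_dict.get("Type") == "Page":
        return RenderKind.PAGE
    # XObjects are streams; /Type is optional but must be /XObject if present.
    if is_stream and pdf_dict.get("Type", "XObject") == "XObject":
        subtype = pdf_dict.get("Subtype")
        if subtype == "Image":
            return RenderKind.IMAGE_XOBJECT
        if subtype == "Form":
            return RenderKind.FORM_XOBJECT
    return RenderKind.NONE
```

Treating the missing /Type as acceptable matters in practice, since many Form XObjects (annotation appearance streams, for example) omit it.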
XObjects are a little trickier to render. You essentially need to copy them out of the source document into a temporary document of their own. The steps are much the same for Form XObjects and Image XObjects, but an Image XObject needs to be scaled to a reasonable size when it is placed on the new page.
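Why the scaling is needed: the PDF specification paints every image XObject into a 1×1 unit square of user space, so without a `cm` transform the image would be one point wide. The sketch below sizes the temporary page and builds its content stream from the image’s /Width and /Height. The fit-the-longer-side policy, the 612-point default, the function name `image_page_layout`, and the resource name `/Im0` are all assumptions for illustration.

```python
from typing import Tuple


def image_page_layout(width_px: int, height_px: int,
                      target_side_pt: float = 612.0) -> Tuple[float, float, str]:
    """Size a temporary page for one Image XObject and build its content stream.

    Hypothetical policy: scale uniformly (up or down) so the image's longer
    side is target_side_pt points, preserving the aspect ratio.
    """
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image /Width and /Height must be positive")
    longest = max(width_px, height_px)
    page_w = width_px * target_side_pt / longest
    page_h = height_px * target_side_pt / longest
    # An image XObject is always painted into the unit square, so the cm
    # operator stretches it to page_w x page_h. "/Im0" assumes the image is
    # registered under that name in the new page's /XObject resources.
    content = f"q {page_w:g} 0 0 {page_h:g} 0 0 cm /Im0 Do Q"
    return page_w, page_h, content
```

The returned width and height become the new page’s MediaBox `[0 0 page_w page_h]`, so the image fills the page exactly.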
With a Form XObject, you don’t scale it per se, but you do read its bounding box to set the page’s size appropriately.
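Sizing the page from the bounding box can be sketched as follows. Per the PDF specification, a Form XObject’s /BBox is expressed in form space and its optional /Matrix (default identity) maps form space into user space, so the sketch transforms all four corners and takes the enclosing upright rectangle. The function name `form_page_box` is hypothetical.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]


def form_page_box(bbox: Sequence[float],
                  matrix: Sequence[float] = (1, 0, 0, 1, 0, 0)) -> Rect:
    """Return a MediaBox (llx, lly, urx, ury) that encloses a Form XObject."""
    x0, y0, x1, y1 = bbox
    a, b, c, d, e, f = matrix
    corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    # A PDF matrix [a b c d e f] maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    # min/max also normalizes a BBox written with any pair of opposite corners.
    return (min(xs), min(ys), max(xs), max(ys))
```

Because a MediaBox may have a non-zero origin, the result can be used as-is, and the page content can simply invoke the form (for instance `/Fm0 Do`, with `/Fm0` a hypothetical resource name) without any extra translation.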
There are other improvements that could be made to PDFObjectExplorer, including, per its original author, re-architecting it around a more standard MVC design so that new features are easier to add. But this is a start.
What other features would you like to see implemented in PDFObjectExplorer?