Creating a Table of Contents from Bookmarks with DLE

A couple of years ago, I put together a sample app for creating a Table of Contents from Bookmarks. I’ve revisited and revised it on occasion based on feedback from customers, so I thought I’d share the latest iteration of the code a bit more broadly. I also came across the perfect input document to demonstrate it this weekend, during my annual Choose-Your-Own-Adventure for Adults weekend, when I dive into the 1040 general instructions and come out at the other end with either a penalty or a refund. It’s a high-stakes game, or at least that’s what I tell myself to get through it.
Anyway, the Table of Contents for the 1040 general instructions is a little bit parsimonious:
Must be because of the Paperwork Reduction Act. Fortunately, the PDF has plenty of bookmarks so we can generate a rather expanded version of this Table of Contents:
And so on for 5 more pages. That brings up one of the first technical challenges of creating a Table of Contents that points to the correct pages: you need to know how many pages the Table of Contents itself will occupy, and you need to insert those pages first. Otherwise, your page numbers shift by one or more when you discover that the Table of Contents takes up more than one page.
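To make the shift concrete, here is a minimal Python sketch. It is not taken from the sample app: `shift_for_toc`, the `(title, page_index)` tuples, and the `insert_at` parameter are all hypothetical, and page indices are assumed to be zero-based.

```python
def shift_for_toc(entries, toc_page_count, insert_at=0):
    """Adjust zero-based page indices for TOC pages inserted at insert_at.

    Pages at or after insert_at move down by toc_page_count;
    pages before it keep their index.
    """
    shifted = []
    for title, page_index in entries:
        if page_index >= insert_at:
            page_index += toc_page_count
        shifted.append((title, page_index))
    return shifted


# Made-up entries: two bookmarks pointing at pages 0 and 4.
print(shift_for_toc([("First", 0), ("Second", 4)], toc_page_count=2))
```

If the page count is guessed too low, every entry after the insertion point ends up pointing at the wrong page, which is why the count has to be known before anything is laid out for real.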

Before that, a preliminary challenge is to flatten the bookmark tree into an in-order list of bookmarks:
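As a minimal sketch of that flattening step, assuming a hypothetical `Bookmark` class with a `title` and a list of `children` (a stand-in for illustration, not the DLE bookmark type), a depth-first walk that visits each parent before its children gives the reading order:

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    # Hypothetical stand-in for a PDF bookmark node.
    title: str
    children: list = field(default_factory=list)


def flatten(root):
    """Return all bookmarks below root in reading order, parent before children."""
    ordered = []
    # Push children reversed so the first child is popped first.
    stack = list(reversed(root.children))
    while stack:
        node = stack.pop()
        ordered.append(node)
        stack.extend(reversed(node.children))
    return ordered


root = Bookmark("root", [
    Bookmark("A", [Bookmark("A1"), Bookmark("A2")]),
    Bookmark("B"),
])
print([b.title for b in flatten(root)])
```

An explicit stack is used instead of recursion so that a very deep bookmark tree cannot hit the interpreter's recursion limit.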

Of course, we still need the bookmark tree structure to determine the indent level. We determine the indent level by counting the parent nodes from the bookmark node back up to the root node. This code has a built-in table of indent levels, but the bookmark depth can be arbitrary, so once we run out of predefined indent levels, we increase the indent level by the last entry in the indentlevels array ad infinitum (but not really; the algorithm would break if the depth were too great).
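Here is one way that rule could look, as a hedged Python sketch. The `Node` class, the `INDENT_LEVELS` values, and the choice to give top-level bookmarks the first table entry are all assumptions for illustration. The fallback follows a literal reading of the description: each level past the end of the table adds the last table entry again.

```python
INDENT_LEVELS = [0.0, 18.0, 36.0, 54.0]  # hypothetical values, in points


class Node:
    # Hypothetical bookmark node that only knows its parent.
    def __init__(self, title, parent=None):
        self.title = title
        self.parent = parent


def depth(node):
    """Count parent links from node up to the root (the root has depth 0)."""
    count = 0
    while node.parent is not None:
        node = node.parent
        count += 1
    return count


def indent_for(node):
    """Look up the indent for node, extending the table past its last entry."""
    level = max(depth(node) - 1, 0)  # top-level bookmarks use the first entry
    if level < len(INDENT_LEVELS):
        return INDENT_LEVELS[level]
    extra = level - (len(INDENT_LEVELS) - 1)
    return INDENT_LEVELS[-1] + extra * INDENT_LEVELS[-1]


root = Node("root")
chain = [root]
for i in range(6):
    chain.append(Node(f"level {i + 1}", parent=chain[-1]))
print([indent_for(n) for n in chain[1:]])
```

With these made-up values the indent eventually exceeds any realistic page width, which is the sense in which the approach breaks down for very deep trees.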

The practiceLayout routine is similar to the layoutTOC routine, but since we won’t be committing anything to the page just yet, it can be simpler.
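The idea of a dry run can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the sample app's routine: every entry is assumed to fit on one line, and the page height, margin, and line height are hypothetical defaults in points.

```python
def practice_layout(titles, page_height=792.0, margin=72.0, line_height=14.0):
    """Walk the entries as a real layout would, but only count pages."""
    pages = 0
    y = None  # current baseline; None until the first page is started
    for _ in titles:
        # Start a new page if there is no page yet or the next line would
        # fall below the bottom margin.
        if y is None or y - line_height < margin:
            pages += 1
            y = page_height - margin
        y -= line_height
    return pages


print(practice_layout(["entry"] * 47))
```

The returned count is what tells you how many blank pages to insert before the second, real pass.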

Where layoutTOC complicates itself is in adding dots between the end of the bookmark title and the right-justified page label:
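A minimal Python sketch of the dot-leader arithmetic follows. The `leader_line` function and its `text_width` callback are hypothetical; a real layout would measure strings with the actual font metrics, whereas the usage line below simply counts characters as if the font were monospaced.

```python
def leader_line(title, page_label, line_width, text_width):
    """Fill the gap between the title and the page label with dots.

    text_width is a callable that returns the rendered width of a string
    in the same units as line_width.
    """
    gap = line_width - text_width(title) - text_width(page_label)
    dot_width = text_width(".")
    # Never emit a negative number of dots when the title is too long.
    count = max(0, int(gap // dot_width))
    return title + "." * count + page_label


# One unit per character, so len() serves as the width function.
print(leader_line("Introduction", "7", 30, len))
```

With a proportional font the gap is rarely an exact multiple of the dot width, so the page label would still be positioned separately at the right margin rather than relying on the dots to push it there.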

Also, for the second pass, we add one Text element per bookmark node so that we can use its boundingBox to add a link to the bookmark destination and so close the loop.
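As a rough sketch of that last step, assuming hypothetical `Rect` and `TocLink` types rather than the DLE classes: because each entry is a single element, its bounding box can be used directly as the clickable area. The optional `padding` parameter is an addition for illustration and defaults to using the box unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    # Hypothetical rectangle in page coordinates (points).
    left: float
    bottom: float
    right: float
    top: float


@dataclass(frozen=True)
class TocLink:
    # Hypothetical record pairing a clickable area with its target page.
    rect: Rect
    destination_page: int


def link_for(bounding_box, destination_page, padding=0.0):
    """Build a link whose clickable area is the entry's bounding box."""
    rect = Rect(
        bounding_box.left - padding,
        bounding_box.bottom - padding,
        bounding_box.right + padding,
        bounding_box.top + padding,
    )
    return TocLink(rect, destination_page)


print(link_for(Rect(72.0, 700.0, 540.0, 712.0), destination_page=9))
```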

There are at least a few ways that this code could be further extended. One would be to add the Table of Contents after an arbitrary set of pages. Another would be to pass in an array of Rects so that you could do an IRS-style two-column Table of Contents. A third would be to decorate the Table of Contents pages in some way, perhaps by adding a page header that uses the Document Title, or by specifying a special page label range for the Table of Contents and creating a page footer that uses the page label string for the page number. I’m sure that there are other possibilities I haven’t thought of. Let me know if you think of a good one.
The full code is here.
