Sample of the Week:
As I mentioned last week, Adobe Acrobat and Reader are able to display PDF files that contain features that are in the PDF specification but that can’t be authored using Acrobat. In this week’s article for the “Beyond Acrobat” series, I show how the hit area for a link doesn’t necessarily need to be the same as the rectangle that defines the location of the link annotation.
Consider the image below… let’s say that you have some text on a 45 degree angle and you want a link to be on top of that text… and only on top of that text (the blue outlined area). Using Adobe Acrobat, you can only define the link as a rectangle whose sides are parallel or perpendicular to the page edges (the red outlined area). In fact, the PDF specification requires that the mandatory Rect key for an annotation be defined in this way. However, link annotations can also have a QuadPoints key. The QuadPoints are an array of eight numbers specifying the coordinates of the quadrilateral (quad) that defines the region within the boundary of the Rect in which the link should be activated. Adding the QuadPoints key lets you define the hit area of the link to be on an angle… just like the text.
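To make the "eight numbers" a little more concrete, here is a minimal sketch in plain Java (no toolkit classes) that computes the four corners of a box rotated about its lower-left corner. The class name RotatedQuad, the method quadPoints, and the sample coordinates are all hypothetical and only for illustration. The corner order used here (counterclockwise from the anchor corner) is also an assumption; check the order your target viewer expects before relying on it.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of any PDF library.
class RotatedQuad {

    /**
     * Returns {x1, y1, x2, y2, x3, y3, x4, y4} for a width-by-height box whose
     * lower-left corner is at (x, y), rotated counterclockwise by angleDegrees
     * about that corner. The corner order is an assumption of this sketch.
     */
    static double[] quadPoints(double x, double y, double width, double height,
                               double angleDegrees) {
        double a = Math.toRadians(angleDegrees);
        double cos = Math.cos(a);
        double sin = Math.sin(a);

        // Corners of the unrotated box, relative to the anchor corner.
        double[][] local = { {0, 0}, {width, 0}, {width, height}, {0, height} };

        double[] out = new double[8];
        for (int i = 0; i < 4; i++) {
            // Standard 2D rotation, then translation to the anchor point.
            out[2 * i]     = x + local[i][0] * cos - local[i][1] * sin;
            out[2 * i + 1] = y + local[i][0] * sin + local[i][1] * cos;
        }
        return out;
    }

    public static void main(String[] args) {
        // A 100 x 12 unit box anchored at (72, 400) and tilted 45 degrees.
        System.out.println(Arrays.toString(quadPoints(72, 400, 100, 12, 45)));
    }
}
```

The resulting array describes a quad that no axis-aligned Rect could match on its own, which is exactly the gap the QuadPoints key fills.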
All words found by the TextExtractor will have an array of bounding quads associated with them. Most words where all the characters have the same baseline will only have one quad. Words that are written on a curve will have multiple quads, typically, one for each character.
In the Gist referenced below, we’ll actually use the bounding quads of the word to create the QuadPoints for the link, which makes adding links to a set of words very simple when using the Datalogics PDF Java Toolkit.
After reading in the input file, we need to create a text extractor object which provides the functionality to extract text from the content streams of a PDF document so we can then iterate over the words. It needs the font set in order to understand what characters are in the stream.
There’s only one word in the input file (the text in the header was created by converting the font to outlines in Illustrator) but the code uses an iterator anyway just as an example of how to get at the text in reading order. Once we have a word and we know that it’s not a space character, we iterate over the bounding quads. Again, in this case, there is only one but there may be more so it’s best to iterate over those as well.
Now that we have the bounding quad for the word, we can easily use it to set the QuadPoints for the link annotation. However, the PDF specification requires that we also set the Rect key for the annotation… and the rectangle it defines must completely encompass the QuadPoints or they’ll be ignored by the viewer. The easiest way that I’ve found to discover the lower-left and upper-right corners of the rectangle that bounds the QuadPoints is to create separate arrays for the x and y coordinates, sort them, and then grab the first and last element of each array to create two x,y pairs.
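The sort-and-pick approach described above can be sketched in plain Java as follows. This is not the code from the Gist; the class name QuadBounds, the method boundingRect, and the sample coordinates are hypothetical, and the quad is assumed to arrive as a flat array of x,y pairs rather than as a toolkit object.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of any PDF library.
class QuadBounds {

    /**
     * Takes QuadPoints as a flat array of x,y pairs (eight numbers per quad)
     * and returns {llx, lly, urx, ury}: the lower-left and upper-right corners
     * of the smallest axis-aligned rectangle that encloses every point.
     */
    static double[] boundingRect(double[] quadPoints) {
        if (quadPoints.length == 0 || quadPoints.length % 8 != 0) {
            throw new IllegalArgumentException("Expected eight numbers per quad");
        }
        int n = quadPoints.length / 2;
        double[] xs = new double[n];
        double[] ys = new double[n];

        // Split the interleaved coordinates into separate x and y arrays.
        for (int i = 0; i < n; i++) {
            xs[i] = quadPoints[2 * i];
            ys[i] = quadPoints[2 * i + 1];
        }

        // After sorting, the first element is the minimum and the last the maximum.
        Arrays.sort(xs);
        Arrays.sort(ys);
        return new double[] { xs[0], ys[0], xs[n - 1], ys[n - 1] };
    }

    public static void main(String[] args) {
        double[] quad = { 100, 200, 150, 250, 130, 270, 80, 220 };
        // prints [80.0, 200.0, 150.0, 270.0]
        System.out.println(Arrays.toString(boundingRect(quad)));
    }
}
```

Sorting is more work than a simple min/max scan, but for four or eight points the difference is negligible and the code stays easy to read. Because the method accepts any multiple of eight numbers, it would also cover a word on a curve that has one quad per character.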
Finally, we save the file. Open the file in Adobe Acrobat and try rolling over and around the text. You’ll see that the link is only active when the pointer is over the text itself rather than anywhere inside the annotation’s rectangle. If you then edit the link, you can see that the rectangle for the annotation is drawn in blue and the hit area is limited to the black rectangle.
This is just another example of how the Datalogics PDF Java Toolkit gives developers access to even the more esoteric and hidden features of the PDF specification, features that can’t be authored in Acrobat but are supported for viewing in Acrobat and Reader.