Sample of the Week:
For many web developers, PDF Forms are a bit of a mystery and best avoided outside of their Defence Against the Dark Arts classes. While PDF Forms are incredibly powerful, the beauty and elegance of the format only reveals itself once you’ve peeled back the layers and gotten to the core… so let’s get started.
As I’ve mentioned in the past, PDF forms come in two basic varieties: the ones from before Adobe’s acquisition of Accelio (formerly JetForm)… and the ones that came after. In more common terms, that’s AcroForms and XFA, respectively. This article is limited to the classic AcroForm type of PDF form.
As I mentioned above, if you’re a developer and have experience with HTML forms, PDF Forms can be somewhat mysterious. At the time of this writing, HTML5 has 22 different types of input elements; PDF has 4. Now, this might lead one to believe that PDF is impoverished compared to HTML, but that’s not exactly the case; you can build a lot of truly elegant stuff using just 4 building blocks.
There are three key aspects of the PDF specification that developers need to understand to fully appreciate PDF Forms and work with them effectively. The viewer or API that the developer is using must also respect these three aspects when it processes the PDF file, or the user may not see what they should be seeing when they open the form. Luckily, the Datalogics PDF Java Toolkit makes accessing these aspects of the format simple. To a certain degree, it also cuts through some of the confusion that the Acrobat form authoring user interface can cause.
Fields vs Widgets:
Unlike HTML, a PDF Field isn’t actually the thing that you type your data into on the page. A field is an object in a dictionary that belongs to the entire document, not to an individual page. Each field has a name and generally occurs exactly once. Field names can also be hierarchical, using a period as the separator. The thing on the page that you type into is a “Widget Annotation” or just “widget.” Widgets that refer to the same field can appear on any number of pages throughout a document, and even multiple times on the same page, but they always share the same value. This allows a form developer to place a field called “name” on every page of the document while the user only needs to enter their name once to have it appear everywhere. Here comes the fun part… the value entered is stored in the field, but the various widgets can have different appearances. The value of a field is completely separate from how that value is presented… appears… on the page; this lets you do some pretty amazing things.
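To make the field/widget split concrete, here is a minimal sketch in plain Java. The FormField and Widget classes are made up for illustration and are not part of the Datalogics PDF Java Toolkit; they only model the idea that the value lives in one document-level field while any number of widgets display it, and that a hierarchical name is built by joining partial names with periods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT the Datalogics API: one document-level field,
// many page-level widgets that all read the same stored value.
final class FormField {
    private final FormField parent;    // null for a top-level field
    private final String partialName;  // e.g. "name" in "applicant.name"
    private final List<Widget> widgets = new ArrayList<>();
    private String value = "";

    FormField(FormField parent, String partialName) {
        this.parent = parent;
        this.partialName = partialName;
    }

    // Fully qualified name: the ancestors' partial names joined by periods.
    String qualifiedName() {
        return parent == null ? partialName : parent.qualifiedName() + "." + partialName;
    }

    // Place another widget for this field on the given page.
    Widget addWidget(int pageNumber) {
        Widget widget = new Widget(this, pageNumber);
        widgets.add(widget);
        return widget;
    }

    void setValue(String value) { this.value = value; }
    String getValue() { return value; }
    List<Widget> getWidgets() { return widgets; }
}

final class Widget {
    private final FormField field;
    private final int pageNumber;

    Widget(FormField field, int pageNumber) {
        this.field = field;
        this.pageNumber = pageNumber;
    }

    // A widget holds no value of its own; it shows whatever the field holds.
    String displayedValue() { return field.getValue(); }
    int getPageNumber() { return pageNumber; }
}

final class FieldWidgetDemo {
    public static void main(String[] args) {
        FormField applicant = new FormField(null, "applicant");
        FormField name = new FormField(applicant, "name");
        name.addWidget(1);
        name.addWidget(2);
        name.addWidget(2); // the same field twice on one page

        name.setValue("Ada Lovelace"); // entered once, visible everywhere
        for (Widget w : name.getWidgets()) {
            System.out.println(name.qualifiedName() + " on page "
                    + w.getPageNumber() + ": " + w.displayedValue());
        }
    }
}
```

Setting the value once on the field is enough for every widget to report it, which is the behavior described above for a “name” field repeated on every page.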
Values vs. Appearances:
While Adobe Acrobat allows a user to place 8 different types of form fields on a page, there are really only four: Text fields, Button fields, Choice fields, and Signature fields. Acrobat uses appearances to differentiate these four into the 8 that people are most familiar with. When a user creates a new checkbox, Acrobat automatically generates a button field, sets the button type for the widget, creates the appearances necessary for the on state and the off state, and then assigns the proper appearance based on the field value. The FormFieldManager interface in the Datalogics PDF Java Toolkit does the same thing for developers; a single line of code will create the field, create a default set of appearances, and set the value. The code below is from the FormFieldServiceSample, which demonstrates how to add new AcroForm fields to a PDF document via the FormFieldService API, making form creation easier.
formFieldManager.addCheckBox("chk1", true, PDFRectangle.newInstance(pdfDoc, 400, 400, 450, 450), page, null);
formFieldManager.addCheckBox("chk2", false, PDFRectangle.newInstance(pdfDoc, 500, 400, 550, 450), page, null);
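What happens behind a call like that can be sketched without the toolkit at all. The CheckBoxWidget class below is hypothetical and only models the idea: a check box is a button field whose widget carries one pre-drawn appearance per state, and the current value picks which one is shown. The on-state name "Yes" and the text "drawings" are assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, NOT the Datalogics API: a check box keeps one
// appearance per state and the current value selects which one is shown.
final class CheckBoxWidget {
    static final String OFF = "Off";   // name used here for the unchecked state

    private final String onState;      // name of the checked state, e.g. "Yes"
    private final Map<String, String> appearances = new LinkedHashMap<>();
    private String value;

    CheckBoxWidget(String onState, boolean checked) {
        this.onState = onState;
        // Stand-ins for the drawing instructions a real tool would generate.
        appearances.put(onState, "[X]");
        appearances.put(OFF, "[ ]");
        this.value = checked ? onState : OFF;
    }

    void setChecked(boolean checked) {
        value = checked ? onState : OFF;
    }

    String getValue() { return value; }

    // The value is only a state name; the appearance is looked up from it.
    String currentAppearance() { return appearances.get(value); }
}

final class CheckBoxDemo {
    public static void main(String[] args) {
        CheckBoxWidget chk1 = new CheckBoxWidget("Yes", true);
        CheckBoxWidget chk2 = new CheckBoxWidget("Yes", false);
        System.out.println("chk1 value=" + chk1.getValue() + " shows " + chk1.currentAppearance());
        System.out.println("chk2 value=" + chk2.getValue() + " shows " + chk2.currentAppearance());
    }
}
```

Swapping the two appearance strings for a circle and a dot would turn the same button field into something that looks like a radio button, which is the sense in which appearances stretch four field types into eight.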
The best example of the difference between a value and an appearance is the Barcode field. The value of a barcode on an AcroForm can be presented as one of three different barcode types, but the underlying value is the same regardless of how it’s presented on the page. This is because barcode fields are really just text fields with a special appearance that Acrobat creates automatically using its built-in barcode generator. The Datalogics PDF Java Toolkit also has a built-in barcode generator and can regenerate a barcode on a PDF form based on new data that may have been added programmatically.
At first this architecture can appear to be overly complicated. It isn’t. The “P” in PDF stands for “Portable” and portability is exactly what this architecture provides. By separating the field values and their appearances, a PDF viewer that doesn’t understand what interactive fields are can still display the form properly. This is particularly important for PDF viewers on mobile devices, which are often less capable than their desktop counterparts, and for high-speed printers that can accept PDF files directly. As long as a fully capable PDF tool like Acrobat or the Datalogics PDF Java Toolkit was the last tool to modify the form, virtually any PDF viewer… even the worst of them… can display the form as the author intended.
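The “last tool to modify the form” condition is easier to see in a small model. The TextWidget class below is hypothetical and not taken from any toolkit; it assumes a viewer with no form support simply paints whatever appearance was saved with the widget, so the result is only right if the last editor rebuilt that appearance when it changed the value.

```java
// Hypothetical model, NOT the Datalogics API: a widget saves a pre-rendered
// appearance next to the field value, and a simple viewer paints only that.
final class TextWidget {
    private String value;
    private String storedAppearance;

    TextWidget(String value) {
        this.value = value;
        regenerateAppearance();
    }

    // What a fully capable tool does: change the value AND rebuild the appearance.
    void setValueAndRegenerate(String newValue) {
        value = newValue;
        regenerateAppearance();
    }

    // What a less capable tool might do: change the value and nothing else.
    void setValueOnly(String newValue) {
        value = newValue;
    }

    private void regenerateAppearance() {
        // Stand-in for real drawing instructions.
        storedAppearance = "|" + value + "|";
    }

    String getValue() { return value; }

    // A viewer without form support ignores the value and paints this as-is.
    String paintWithoutFormSupport() { return storedAppearance; }
}

final class StaleAppearanceDemo {
    public static void main(String[] args) {
        TextWidget widget = new TextWidget("draft");

        widget.setValueOnly("final");
        System.out.println("value=" + widget.getValue()
                + " painted=" + widget.paintWithoutFormSupport());

        widget.setValueAndRegenerate("final");
        System.out.println("value=" + widget.getValue()
                + " painted=" + widget.paintWithoutFormSupport());
    }
}
```

After the value-only update the painted output still reflects the old value; once the appearance is regenerated, even the simplest viewer shows the right thing.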