Sample of the Week:
Most people are familiar with streaming audio and video files to their browsers and apps. Some people, like me for instance, stream music all day long and then watch Netflix all evening; we’re always streaming. Because you can start listening or watching immediately, streaming is the best way to experience web content that tends to be stored as large files on the server. What most people don’t know is that you can stream PDF files too… sort of. Because PDF is a random-access format, it can be byte-served. That’s not exactly the same thing as streaming, but it is a way to retrieve just the bytes you need to see a particular page in a PDF document without having to download every other page. Today, this is commonly referred to as “Fast Web View”, but it was originally called “Linearization”, and the original name sort of stuck in developer circles.
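To make byte-serving a little more concrete, here is a minimal sketch of how a client might ask a server for only part of a file using the standard HTTP Range header. The class name, the helper methods, and the example.com URL are made up for illustration, and a real viewer would take its offsets from the PDF's own tables rather than hard-coding them. It assumes Java 11 or later for java.net.http.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative sketch only: the class and method names are hypothetical.
class ByteRangeSketch {

    // Builds a Range header value covering `length` bytes starting at `offset`.
    // HTTP byte ranges are inclusive at both ends, hence the "- 1".
    static String rangeHeader(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    // Builds (but does not send) a GET request for one slice of a remote file.
    static HttpRequest rangeRequest(String url, long offset, long length) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Range", rangeHeader(offset, length))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URL; ask for the first 1024 bytes of the file.
        HttpRequest request = rangeRequest("https://example.com/report.pdf", 0, 1024);
        System.out.println(request.headers().firstValue("Range").orElse("(none)"));
        // prints "bytes=0-1023"
    }
}
```

A server that supports byte-serving would normally answer a request like this with status 206 (Partial Content) and only the requested bytes; a server that doesn't may simply return the whole file.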
A “Linearized” PDF is organized slightly differently than a regular PDF and allows an application to display the content of the first page as soon as those bytes are available rather than forcing the user to wait until the entire document has been downloaded. This can be particularly useful for short documents that have a lot of resources (meaning large files), documents with lots of pages, and any document with even a few embedded fonts. The idea is that for any given file, regardless of the total number of pages, the user shouldn’t notice any difference in how quickly a page displays when they move from page to page within that document.
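One visible sign of that different organization is that a linearized file declares itself near the top: the PDF specification places a linearization parameter dictionary, containing a /Linearized key, within the first 1024 bytes of the file. The sketch below is a rough heuristic built on that fact, not a validator. The class and method names are hypothetical, and a file that passes this check could still have broken hint tables or have been modified since it was linearized.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only: a quick heuristic, not a full linearization check.
class LinearizedCheck {

    // Looks for the PDF header and a /Linearized key in the first 1024 bytes.
    static boolean looksLinearized(byte[] head) {
        int limit = Math.min(head.length, 1024);
        // ISO-8859-1 maps every byte to one char, so binary data can't break decoding.
        String text = new String(head, 0, limit, StandardCharsets.ISO_8859_1);
        return text.startsWith("%PDF-") && text.contains("/Linearized");
    }

    // Reads only the start of a file on disk and applies the same heuristic.
    static boolean fileLooksLinearized(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return looksLinearized(in.readNBytes(1024));
        }
    }

    public static void main(String[] args) {
        // A hand-written fragment that imitates the start of a linearized file.
        byte[] sample = "%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 54321 /N 3 >>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksLinearized(sample)); // prints "true"
    }
}
```

Because the check only needs the first 1024 bytes, it pairs naturally with byte-serving: a client can tell whether a remote file claims to be linearized without downloading the rest of it.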
The “first page” doesn’t even need to be page one; it could be any page in the file if you use the open parameters in the URL or set the initial view to an interior page. But there are other advantages to creating linearized files. For example, when the user navigates to another page, it will display as quickly as possible because the application knows which bytes to get. With the right viewer, even very large PDF files will perform well over slow connections because the page can display incrementally, showing the most useful data first. This is why you sometimes see a PDF page “snap” and suddenly look a lot cleaner; the embedded font arrived a few milliseconds after the text and images. And finally, for those of us who are impatient, like me, the viewer will accept user interaction, like clicking on a link, before the entire page has been displayed… or even been completely loaded.
Most of the popular software that is marketed specifically for creating PDF, Adobe Acrobat for example, though there are others, will create Linearized PDF automatically and by default. However, most software that isn’t engineered specifically to create PDF but only exports it, like Microsoft Word, Google Docs, and OpenOffice, doesn’t create Linearized PDF. That’s understandable. But…
Unfortunately, most PDF developer libraries and toolkits can’t create Linearized PDF files either… which brings me to my point.
PDFSaveOptions options = PDFSaveLinearOptions.newInstance();
Because creating a properly formatted Linearized PDF is non-trivial, Adobe made it simple for PDF developers. As with most classes in the Datalogics PDF Java Toolkit, the defaults do exactly what you need them to do. PDFSaveOptions combined with PDFSaveLinearOptions can be used on files that were created by the Datalogics PDF Java Toolkit as well as on PDF files that were created in other applications and libraries. In both cases, the “linear save” operation will rewrite the PDF document in a way that provides more efficient incremental access over a network.