File compression

Corey Staff asked 1 year ago
What does “suboptimal compression” mean when working with PDF Checker?
Corey Staff answered 1 year ago

A data stream in a PDF document can hold text, an image, or another object, together with instructions on how the content is rendered on the page. Data streams can be compressed to make the PDF smaller and more portable. PDF Checker looks for data streams in the input document that are not compressed at all, or that use a simple method, such as ASCII encoding, Run Length, or LZW, that compresses less efficiently than modern methods like Flate (deflate).
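As a rough illustration of this kind of check, the Python sketch below classifies a stream by the filter names in its /Filter entry. It is hypothetical and not PDF Checker's actual logic: the function name classify_stream and the set of filters treated as efficient are assumptions made for this example, and the filter names are passed in as plain strings rather than read from a real PDF.

```python
from typing import Optional, Sequence, Union

# Assumption for this sketch: filters treated as efficient compression.
# FlateDecode is general-purpose; the others are image-specific codecs.
EFFICIENT_FILTERS = {
    "FlateDecode",
    "DCTDecode",       # JPEG
    "JPXDecode",       # JPEG 2000
    "JBIG2Decode",
    "CCITTFaxDecode",
}


def classify_stream(filters: Optional[Union[str, Sequence[str]]]) -> str:
    """Return "uncompressed", "suboptimal", or "ok" for a stream's /Filter value.

    A PDF /Filter entry may be absent, a single name, or an array of names,
    so all three shapes are accepted here.
    """
    if filters is None:
        names = []
    elif isinstance(filters, str):
        names = [filters]
    else:
        names = list(filters)
    # Accept names with or without the leading slash, e.g. "/FlateDecode".
    names = [name.lstrip("/") for name in names]

    if not names:
        return "uncompressed"
    if any(name in EFFICIENT_FILTERS for name in names):
        return "ok"
    # Only simple filters such as ASCIIHexDecode, ASCII85Decode,
    # RunLengthDecode, or LZWDecode are applied.
    return "suboptimal"


print(classify_stream(None))                                # uncompressed
print(classify_stream("/LZWDecode"))                        # suboptimal
print(classify_stream(["/ASCII85Decode", "/FlateDecode"]))  # ok
```

Treating a stream as fine whenever any efficient filter is present is a simplification; a real checker could also weigh the stream's size and content type.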

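ASCII is a 7-bit character code that is normally stored as one 8-bit byte per character, so strings of ASCII text can be compressed to require fewer bytes for transmission or storage. The ASCII filters in PDF (ASCIIHexDecode and ASCII85Decode) do not compress at all: they represent binary data as printable characters, which makes a stream larger (twice the size for ASCIIHex, and roughly 5 bytes for every 4 for ASCII85).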
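The size effect of ASCII encoding is easy to check with Python's standard library. binascii.hexlify produces the same kind of hex text that ASCIIHexDecode reverses, and base64.a85encode produces Ascii85 text (PDF streams also end ASCII85 data with a ~> marker, which is omitted here). zlib, which implements the deflate method behind PDF's FlateDecode, is included for comparison. The sample data is made up for this example.

```python
import base64
import binascii
import zlib

# Made-up sample: 100 non-zero binary bytes. Avoiding zero bytes keeps
# Ascii85 from using its "z" shortcut, so the 4-to-5 ratio holds exactly.
binary = bytes(range(1, 101))

hex_encoded = binascii.hexlify(binary)   # 2 output characters per input byte
a85_encoded = base64.a85encode(binary)   # 5 output characters per 4 input bytes

print(len(binary), len(hex_encoded), len(a85_encoded))  # 100 200 125

# Repetitive ASCII text, by contrast, shrinks a lot under deflate (zlib).
text = b"PDF Checker flags uncompressed streams. " * 50
compressed = zlib.compress(text)
print(len(text), len(compressed))
```

Both ASCII encodings grow the data, while deflate shrinks the repetitive text to a small fraction of its original size.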

Run Length Encoding (RLE) is a simple method for compressing data that contains runs: sequences in which the same value (a character, byte, or bit) repeats many times in a row, often over long stretches. Each run is replaced with a short description, such as a count followed by the repeated value, which saves storage space and makes the resulting PDF document smaller.
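PDF's RunLengthDecode filter uses a byte-oriented version of this idea, as described in the PDF specification: a length byte from 0 to 127 means "copy the next length + 1 bytes literally", a length byte from 129 to 255 means "repeat the next byte 257 - length times", and 128 marks the end of the data. The Python sketch below implements that format. The function names are hypothetical, the encoder uses a simple greedy strategy rather than any particular tool's implementation, and the example assumes one byte per pixel.

```python
def run_length_encode(data: bytes) -> bytes:
    """Encode bytes in the PDF RunLengthDecode format (greedy sketch)."""
    out = bytearray()
    literal = bytearray()

    def flush_literal() -> None:
        # Literal records hold at most 128 bytes: length byte 0..127, then the bytes.
        for start in range(0, len(literal), 128):
            chunk = literal[start:start + 128]
            out.append(len(chunk) - 1)
            out.extend(chunk)
        literal.clear()

    i = 0
    while i < len(data):
        # Measure the run starting at i, capped at 128 (the format's maximum).
        run = 1
        while i + run < len(data) and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            flush_literal()
            out.append(257 - run)  # 129..255: repeat the next byte 257 - n times
            out.append(data[i])
            i += run
        else:
            literal.append(data[i])
            i += 1
    flush_literal()
    out.append(128)  # end-of-data marker
    return bytes(out)


def run_length_decode(data: bytes) -> bytes:
    """Decode PDF RunLengthDecode data."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:  # end of data
            break
        if length < 128:
            # Literal record: copy the next length + 1 bytes unchanged.
            out.extend(data[i + 1:i + 2 + length])
            i += 2 + length
        else:
            # Run record: repeat the next byte 257 - length times.
            out.extend(data[i + 1:i + 2] * (257 - length))
            i += 2
    return bytes(out)


# One 1,400-pixel row: 600 white (0xFF), 200 black (0x00), 600 white.
row = b"\xff" * 600 + b"\x00" * 200 + b"\xff" * 600
encoded = run_length_encode(row)
print(len(row), len(encoded))  # 1400 25
assert run_length_decode(encoded) == row
```

Because a single run record covers at most 128 bytes, each long run is split into several records, which is why the row takes 25 bytes (12 two-byte run records plus the end marker).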

Think of an image of a black square in the middle of a white background. Instead of storing a row of the image as 600 white pixels, followed by 200 black pixels, followed by 600 more white pixels, this row of 1,400 pixels can be described as “600W200B600W”. As a result, 1,400 pixel values are replaced with 12 characters.
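The “600W200B600W” notation can be reproduced with a few lines of Python using itertools.groupby, which groups consecutive equal items. The function names are made up for this illustration, and this text notation is only for explanation; it is not the byte format a PDF actually stores. It also assumes the values themselves are never digits, since digits are reserved for the counts.

```python
import re
from itertools import groupby


def rle_to_text(row: str) -> str:
    """Describe each run as <count><value>, e.g. "WWWB" -> "3W1B"."""
    return "".join(f"{len(list(run))}{value}" for value, run in groupby(row))


def text_to_rle(encoded: str) -> str:
    """Expand "<count><value>" pairs back into the original row."""
    return "".join(value * int(count) for count, value in re.findall(r"(\d+)(\D)", encoded))


row = "W" * 600 + "B" * 200 + "W" * 600   # one row of the black-square image
encoded = rle_to_text(row)
print(encoded, len(row), len(encoded))    # 600W200B600W 1400 12
assert text_to_rle(encoded) == row
```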

LZW, or Lempel-Ziv-Welch, is a universal, dictionary-based compression algorithm that was once widely used on Unix platforms, for example by the compress utility. This method still appears in some older PDF documents but is rarely used now.
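For comparison, here is a textbook LZW sketch in Python. It builds a dictionary of byte sequences while reading the input and emits an integer code for the longest sequence it has already seen. It is a simplification: PDF's LZWDecode also packs codes into 9- to 12-bit fields, limits the table to 4,096 entries, and reserves codes 256 (clear table) and 257 (end of data), none of which are modeled here, so new codes start at 256. The function names are illustrative.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: return a list of integer codes (no bit packing)."""
    # Start with one entry per possible byte value.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the match while it is a known sequence.
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly and expand the codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Code not yet in the table: it must be previous + its first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.extend(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)


data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
print(len(data), len(codes))
assert lzw_decompress(codes) == data
```

Each output code can stand for a whole repeated sequence, so repetitive input needs far fewer codes than input bytes, although in practice deflate (Flate) usually compresses better.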