What Is Redaction?
Redaction is the removal of certain content from a document in order to distribute a copy with sensitive or classified information eliminated.
Traditionally, redaction has been associated with government agencies and services, and most people are familiar with the concept. In the past, redaction involved simply taking a black pen to the printed page and manually crossing out text, or in some cases even using a pair of scissors to cut out words, phrases, or whole sentences. Fortunately, we have much better tools for doing this today. Datalogics offers PDF Java Toolkit (PDFJT) and the Adobe PDF Library (APDFL), both of which provide excellent redaction capabilities.
But before we jump into these, let’s go back in time.
Redaction in History
During World War II, the United States Military Postal Service provided soldiers and sailors serving in the war two sheets of postage-free Victory Mail, or V-Mail, a day. The volume was enormous. If 150,000 soldiers in France wanted to write to Mom or send greetings to their sweethearts in the States, their letters would have filled 37 mail bags and would have weighed over a ton. By 1944, over 11 million men and women in uniform were sending 70 million pieces of mail a week to the United States.
In response, the U.S. military photographed each letter and recorded it as a thumbnail image on microfilm. That reduced those 37 mail bags of letters home to a single bag, saving precious cargo space for war materials. When the microfilm arrived in the States, the letters were printed on paper and delivered.
But an important part of this process was control. The letters were not merely collected, processed, and reduced in size to improve efficiency (similar to what our PDF OPTIMIZER product does today). Every single letter written by an American in uniform overseas was read by a censor. Any stray comments about troop positions or movements, battle plans, military objectives, or anything else that might be useful to the enemy had to be removed, lest these facts find their way back across the Atlantic or Pacific.
Today, redaction is often applied to digital files. If you are working with PDF documents, it is not enough to use an editor to draw a black line or black box over a few sentences and then save the file. The content is still there, underneath. Anyone with access to the document can copy the text you "redacted," paste it into another document, and read it there instead.
When a PDF document is redacted properly (as with PDFJT and APDFL), the sensitive information that you mark is removed from the page entirely, and a black box appears where the erased content used to be. Metadata can also be permanently removed from the document. This related process, called sanitization, removes objects that have been added to the PDF document but are not part of the document's content itself. For example, you could use sanitization to remove the name of the author and the date that the document was last updated. When the process is finished, the redacted characters and metadata cannot be recovered.
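To make the difference between covering text and removing it concrete, here is a minimal Java sketch built on a toy page model. It is not the PDFJT or APDFL API: `Page`, `TextRun`, `Rect`, `overlayOnly`, and `redact` are hypothetical names invented for illustration, and a real PDF page stores its text in content streams rather than in a simple list. The sketch assumes Java 16 or later (for records).

```java
import java.util.ArrayList;
import java.util.List;

public class RedactionModel {
    // A positioned piece of text on a page (hypothetical, simplified model).
    record TextRun(String text, double x, double y, double width, double height) {}

    // An axis-aligned rectangle, used for the marked area and the black box.
    record Rect(double x, double y, double width, double height) {
        // True if this rectangle overlaps the bounds of the given text run.
        boolean intersects(TextRun r) {
            return r.x() < x + width && r.x() + r.width() > x
                && r.y() < y + height && r.y() + r.height() > y;
        }
    }

    // A toy "page": text content plus filled rectangles drawn on top of it.
    static class Page {
        final List<TextRun> text = new ArrayList<>();
        final List<Rect> blackBoxes = new ArrayList<>();

        // Roughly what copy-and-paste or text extraction would return.
        String extractText() {
            StringBuilder sb = new StringBuilder();
            for (TextRun r : text) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(r.text());
            }
            return sb.toString();
        }
    }

    // Cosmetic "redaction": draws a box but leaves the text underneath.
    static void overlayOnly(Page page, Rect area) {
        page.blackBoxes.add(area);
    }

    // Proper redaction: deletes the text runs in the area, then draws the box.
    static void redact(Page page, Rect area) {
        page.text.removeIf(area::intersects);
        page.blackBoxes.add(area);
    }

    // A sample page with a label and a sensitive value next to it.
    static Page samplePage() {
        Page p = new Page();
        p.text.add(new TextRun("Account:", 10, 10, 40, 10));
        p.text.add(new TextRun("1234-5678", 55, 10, 45, 10));
        return p;
    }

    public static void main(String[] args) {
        Rect secret = new Rect(54, 9, 50, 12); // covers only the account number

        Page covered = samplePage();
        overlayOnly(covered, secret);
        System.out.println(covered.extractText()); // Account: 1234-5678

        Page redacted = samplePage();
        redact(redacted, secret);
        System.out.println(redacted.extractText()); // Account:
    }
}
```

Both pages end up with the same black box, but only the covered page still yields the account number when its text is extracted, which is the copy-and-paste leak described above.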
Why Redaction Matters to You
Properly redacting documents is extremely important in a business or government setting to protect yourself, your customers, and your assets. When a document must be shared but some of its information must stay hidden, redaction is an ideal solution.
Some common examples of sensitive information are:
- Account numbers
- Social Security numbers
- Legal information
- Financial records
Redaction that is not done properly can lead to big problems. Take a look at these examples of high-profile redaction mistakes:
- In 2011, the British Ministry of Defence offered the world classified information about its nuclear submarine program. The information was included in a document that the Ministry distributed and believed had been censored properly. It had not.
- In 2015, the Taxation and Revenue Department of the State of New Mexico released a scanned copy of an email message to several media outlets to show that one of its officials had not given preferential treatment to a former client. The taxpayer's name had been redacted in the email, but brightening the image file with a common graphics tool made the name clearly legible.
Redacting PDF Documents with Datalogics
Security is easy to take for granted. Though it is rarely a major concern in our day-to-day lives, keeping our information protected must be a priority. The advantage of using PDF Java Toolkit or Adobe PDF Library is that they are server-based tools that manage PDF documents programmatically. You are not limited to working with one PDF at a time. Instead, you can use the Toolkit or the Library to build a redaction process into your own application that automatically redacts defined content across dozens, hundreds, or even thousands of PDF documents.
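An automated process usually starts by defining which content to redact. The Java sketch below illustrates only that step: it applies regular-expression rules to plain strings that stand in for text extracted from a batch of documents. It does not call PDFJT or APDFL, and the rule list, the `ACCT-` account-number format, and the file names are hypothetical. Replacing characters in a string is also not the same as removing content from a PDF page; in a real application, the matched text would be passed to the library's own redaction feature, which removes it from the page itself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BatchRedactionSketch {
    // Hypothetical rules describing the "defined content" to remove.
    // Real rules would come from your own policy; these are illustrative only.
    static final List<Pattern> RULES = List.of(
        Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"), // matches the SSN format, not validity
        Pattern.compile("\\bACCT-\\d{6,}\\b")          // hypothetical account-number format
    );

    // Replaces every rule match with a same-length run of block characters.
    static String redactText(String text) {
        String result = text;
        for (Pattern rule : RULES) {
            Matcher m = rule.matcher(result);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                m.appendReplacement(sb, "\u2588".repeat(m.group().length()));
            }
            m.appendTail(sb);
            result = sb.toString();
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for text extracted from many documents (hypothetical names).
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("claim-001.pdf", "SSN 123-45-6789 on file.");
        documents.put("claim-002.pdf", "Payment from ACCT-0042917 received.");

        // The same rules are applied to every document in the batch.
        documents.forEach((name, text) ->
            System.out.println(name + ": " + redactText(text)));
    }
}
```

Keeping the rules in one list means the same policy is applied identically to every document, whether the batch holds two files or thousands.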
Want to Learn More?
Contact us to discuss proper redaction and how we can help ensure your content is protected.