Compliance Archiving: How to Convert PDFs to PDF/A at Scale


Published April 15, 2026

Compliance archiving requirements are rarely optional. Healthcare organizations must retain patient records in formats that can be reproduced decades from now. Financial services firms must archive client communications and transaction records to standards that withstand regulatory examination. Government agencies must preserve public records in formats that survive software platform changes. In all of these cases, the standard most commonly required is PDF/A.

The challenge for most organizations is not understanding what PDF/A requires. The challenge is getting there at scale. If you have a repository of 500,000 existing PDFs that must be converted to a compliant format, or an ongoing pipeline producing thousands of new PDFs per month that must be archived in a compliant format, manual conversion is not a realistic option.

This post covers what PDF/A requires, how PDF Optimizer handles bulk conversion, and how to configure the process for your specific compliance scenario.

What PDF/A Actually Requires

PDF/A (ISO 19005) is a constrained version of the PDF specification designed for long-term preservation. Its requirements exist to ensure that a document can be rendered identically on any compliant viewer at any point in the future, without depending on external resources, platform-specific features, or software that may not exist in 20 years.

The core requirements are: every font used must be embedded in the document (subsets are permitted, as long as every glyph that appears is included), color must be defined in a device-independent way, for example through an embedded ICC output intent, rather than relying on device color settings, encryption and password protection are not permitted, JavaScript and other executable content are prohibited, and XFA (XML Forms Architecture) dynamic forms are not allowed. Interactive elements that cannot be reliably reproduced without specific software must be removed or flattened.

What this means in practice is that many PDFs generated by standard enterprise software, whether document management systems, report generators, or form processing tools, will not be PDF/A compliant out of the box. They may reference external fonts, use device-dependent color spaces, contain JavaScript for form interactions, or include metadata and embedded objects that violate the standard. Producing compliant output requires an explicit conversion step.
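To get a rough sense of how many files in an existing repository show these problems, a quick byte-level scan can flag the most common red flags. The sketch below is a hypothetical triage helper, not part of PDF Optimizer or PDF Checker. It assumes uncompressed dictionaries: keys stored inside compressed object streams are invisible to a raw scan, so an empty result is not proof of compliance, only a hint about where to look first.

```python
import re
from pathlib import Path

# Hypothetical triage helper (not part of PDF Optimizer or PDF Checker).
# It scans raw PDF bytes for name tokens that usually indicate a PDF/A
# violation. Dictionaries inside compressed object streams are invisible
# to a byte scan, so an empty result does NOT prove compliance.
PATTERNS = {
    "encrypted": re.compile(rb"/Encrypt\b"),
    "javascript": re.compile(rb"/(?:JavaScript|JS)\b"),
    "xfa_form": re.compile(rb"/XFA\b"),
    "device_color": re.compile(rb"/Device(?:RGB|CMYK)\b"),
    "output_intent": re.compile(rb"/OutputIntents\b"),
}


def red_flags(data: bytes) -> list[str]:
    """Return the names of likely PDF/A problems found in raw PDF bytes."""
    hits = {name for name, pat in PATTERNS.items() if pat.search(data)}
    flags = sorted(hits & {"encrypted", "javascript", "xfa_form"})
    # Heuristic: device color spaces generally need an output intent
    # (or Default color spaces, which this sketch does not check).
    if "device_color" in hits and "output_intent" not in hits:
        flags.append("device_color_without_output_intent")
    return flags


def scan_directory(root: Path) -> dict[str, list[str]]:
    """Map each PDF under root to its list of red flags (empty if none found)."""
    return {str(p): red_flags(p.read_bytes()) for p in sorted(root.rglob("*.pdf"))}
```

A tally of the flags across a repository is useful for sizing the conversion effort; the authoritative compliance check still belongs to a real validator.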

PDF/A-1b vs. PDF/A-3u: Choosing the Right Conformance Level

PDF/A comes in multiple conformance levels, and choosing the right one for your workflow matters.

PDF/A-1b is the most widely supported conformance level and the right starting point for most organizations. It requires visual reproducibility: the document must look the same on any compliant viewer. It does not require Unicode text mapping, so reliable text extraction and search are not guaranteed for documents archived at this level. For workflows where the archived document is treated as a visual record, PDF/A-1b is sufficient.

PDF/A-3u requires both visual reproducibility and Unicode character mapping for all text in the document. This means text in the archived document can be searched, extracted, and processed by downstream systems. PDF/A-3u also allows any file type to be embedded as an attachment within the PDF, making it suitable for workflows that need to carry source data, XML exports, or supplementary files alongside the visual document.

If your compliance requirement involves downstream text extraction, full-text search across archived documents, or the need to embed related files within the PDF, use PDF/A-3u. For straightforward visual archiving where machine-readable text is not required, PDF/A-1b is simpler and more broadly compatible.
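The rule of thumb above is simple enough to encode directly, which helps when several teams feed the same archive and each needs a documented target. This is a minimal sketch; the `ArchiveNeeds` fields are hypothetical names for the two deciding questions in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveNeeds:
    """Hypothetical description of a workflow's archival requirements."""
    text_extraction_or_search: bool = False  # downstream extraction or full-text search
    embedded_attachments: bool = False       # source data, XML exports, other files


def choose_conformance(needs: ArchiveNeeds) -> str:
    """Apply this post's rule of thumb to pick a PDF/A conformance level."""
    if needs.text_extraction_or_search or needs.embedded_attachments:
        return "PDF/A-3u"
    # Purely visual archiving: the simpler, more broadly compatible level.
    return "PDF/A-1b"
```

For example, a workflow that must attach the XML export a report was generated from would call `choose_conformance(ArchiveNeeds(embedded_attachments=True))` and get `"PDF/A-3u"`.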

PDF Optimizer supports both conformance levels. You specify the target in your JSON profile and PDF Optimizer handles the conversion.

Configuring PDF Optimizer for PDF/A Conversion

PDF Optimizer converts documents to PDF/A as part of a single optimization pass. You configure the target compliance level in the JSON profile, alongside any other optimization operations you want to apply simultaneously, and PDF Optimizer produces compliant output in one step.

A typical PDF/A conversion profile combines compliance conversion with size reduction operations that are compatible with the standard: font subsetting (which reduces font payload while still embedding every glyph the document uses), removal of disallowed elements such as JavaScript and XFA, stripping of unnecessary metadata and embedded thumbnails, and color space normalization to ensure all color data is explicitly defined rather than device-dependent.
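If profiles are generated or reviewed programmatically, a small sanity check can catch settings that would defeat PDF/A before a batch starts. The key names below are placeholders invented for illustration; PDF Optimizer's actual JSON profile schema is defined in its documentation and will differ. The sketch only shows the idea of pairing each profile with a check that the disallowed-element removals are on and encryption is off.

```python
import json

# Placeholder key names for illustration only. PDF Optimizer's real JSON
# profile schema is documented by Datalogics and will differ.
PDFA_PROFILE = {
    "target_conformance": "PDF/A-1b",
    "operations": {
        "subset_fonts": True,            # embed only the glyphs actually used
        "remove_javascript": True,       # JavaScript is disallowed by PDF/A
        "remove_xfa_forms": True,        # XFA forms are disallowed by PDF/A
        "strip_unneeded_metadata": True,
        "remove_thumbnails": True,
        "normalize_color_spaces": True,  # explicitly defined color
        "encrypt": False,                # encryption is disallowed by PDF/A
    },
}

REQUIRED_ON = {"remove_javascript", "remove_xfa_forms"}
REQUIRED_OFF = {"encrypt"}
EXPECTED_TARGETS = {"PDF/A-1b", "PDF/A-3u"}  # the two levels covered in this post


def check_profile(profile: dict) -> list[str]:
    """Return problems that would keep this (hypothetical) profile from producing PDF/A."""
    ops = profile.get("operations", {})
    problems = [f"{op} must be enabled" for op in sorted(REQUIRED_ON) if not ops.get(op)]
    problems += [f"{op} must be disabled" for op in sorted(REQUIRED_OFF) if ops.get(op)]
    if profile.get("target_conformance") not in EXPECTED_TARGETS:
        problems.append("unexpected target_conformance")
    return problems


if __name__ == "__main__":
    assert not check_profile(PDFA_PROFILE)
    print(json.dumps(PDFA_PROFILE, indent=2))
```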

The profile is defined once and applied consistently to every document in the conversion batch. There is no per-document manual intervention. The same profile that converts document one will convert document one million with identical settings.

After each batch run, PDF Optimizer produces a results report that logs the outcome for every processed document: whether conversion succeeded, which operations were applied, input and output file sizes, and any warnings or errors encountered. For compliance workflows, this report serves as the audit log for the conversion process.
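For compliance sign-off it often helps to roll the per-document report up into a few numbers: how many documents converted, which ones failed, and the overall size change. The sketch below assumes, hypothetically, that the report has been exported as CSV with the columns `file`, `status`, `input_bytes` and `output_bytes`; adapt the parsing to the report format PDF Optimizer actually produces.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class BatchSummary:
    succeeded: int
    failed: list[str]
    input_bytes: int
    output_bytes: int

    @property
    def size_reduction(self) -> float:
        """Fraction of input size saved across successful conversions."""
        return 1 - self.output_bytes / self.input_bytes if self.input_bytes else 0.0


def summarize_report(report_csv: str) -> BatchSummary:
    """Summarize a results report assumed to be CSV with hypothetical columns:
    file, status, input_bytes, output_bytes."""
    summary = BatchSummary(succeeded=0, failed=[], input_bytes=0, output_bytes=0)
    for row in csv.DictReader(io.StringIO(report_csv)):
        if row["status"].strip().lower() == "success":
            summary.succeeded += 1
            summary.input_bytes += int(row["input_bytes"])
            summary.output_bytes += int(row["output_bytes"])
        else:
            # Failed documents are listed by name so they can be re-queued.
            summary.failed.append(row["file"])
    return summary
```

Keeping the summary alongside the full report gives reviewers a quick entry point without replacing the per-document audit trail.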

Pre-Conversion Validation with PDF Checker

Not all PDFs convert cleanly to PDF/A on the first pass. Documents with severely damaged structure, missing font data, or deeply embedded non-compliant elements may require remediation before conversion succeeds. Running PDF Checker before PDF Optimizer identifies these documents before they reach the conversion step, routing problematic files to a review queue rather than allowing them to fail silently in the pipeline.

PDF Checker is included free with every PDF Optimizer purchase. In a compliance archival pipeline, the recommended architecture is: PDF Checker validation, then PDF Optimizer conversion, then archive storage of the output and the conversion log. Documents that fail PDF Checker validation are flagged for manual review before re-entry into the pipeline.
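The validate, convert, archive flow described above can be sketched as a small routing loop. Here `check` and `convert` are hypothetical stand-ins for however your pipeline invokes PDF Checker and PDF Optimizer (for example through `subprocess`); each simply returns True on success. Routing conversion failures to the same review queue is an assumption of this sketch, not a documented product behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class PipelineResult:
    archived: list[str] = field(default_factory=list)
    review_queue: list[str] = field(default_factory=list)
    log: list[tuple[str, str]] = field(default_factory=list)


def run_pipeline(
    docs: Iterable[str],
    check: Callable[[str], bool],
    convert: Callable[[str], bool],
) -> PipelineResult:
    """Validate, then convert, then archive; route failures to manual review.

    `check` and `convert` are hypothetical callables standing in for
    PDF Checker and PDF Optimizer; each returns True on success.
    """
    result = PipelineResult()
    for doc in docs:
        if not check(doc):
            # Never reaches conversion; flagged for remediation instead.
            result.review_queue.append(doc)
            result.log.append((doc, "failed validation"))
        elif not convert(doc):
            result.review_queue.append(doc)
            result.log.append((doc, "failed conversion"))
        else:
            result.archived.append(doc)
            result.log.append((doc, "archived"))
    return result
```

Because every document ends up either archived or in the review queue with a logged reason, nothing can drop out of the pipeline silently.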

Industry Contexts

Healthcare organizations archiving patient records under HIPAA and related regulations commonly standardize on PDF/A-1b or PDF/A-3u for electronic health records retained over multi-decade timeframes. Using PDF/A helps ensure that records produced today can be rendered accurately on systems that do not yet exist.

Financial services firms subject to SEC Rule 17a-4, FINRA recordkeeping requirements, or equivalent regulations in other jurisdictions require archived documents to be in non-rewritable, reproducible formats. PDF/A satisfies the reproducibility requirement when combined with appropriate storage controls.

Government agencies at federal, state, and local levels commonly mandate PDF/A for public records. The U.S. National Archives and Records Administration has specifically recommended PDF/A for permanent records. Similar guidance exists from national archives and records authorities in other countries.

For all of these contexts, the key is that PDF/A conversion must happen systematically and verifiably, not on an ad hoc basis. An automated pipeline with full audit logging is the only approach that scales.

Getting Started

A free trial of PDF Optimizer is available for Windows and Linux. It runs as a command-line tool and is suitable for integration into any document processing pipeline.

If you are managing a compliance archiving requirement, the next step is straightforward: define your target conformance level, configure a JSON profile, and run a sample batch of your existing documents through the trial. The results report will tell you what the conversion process produces and flag any documents that require remediation before full-scale processing begins.
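A trial batch is most informative when it includes documents from every system that produces them, since each generator tends to fail PDF/A in its own way. The sketch below picks a reproducible sample per source. It assumes, hypothetically, that documents are organized into one folder per source system; the folder-as-source convention and the `per_source` parameter are illustrative choices.

```python
import random
from collections import defaultdict
from pathlib import PurePath


def pick_trial_sample(paths: list[str], per_source: int, seed: int = 0) -> list[str]:
    """Reproducibly sample up to `per_source` PDFs from each parent folder.

    Assumes (hypothetically) one folder per source system, so every
    generator's quirks are represented in the trial batch.
    """
    by_source: dict[str, list[str]] = defaultdict(list)
    for p in sorted(paths):  # sort so the result doesn't depend on input order
        by_source[str(PurePath(p).parent)].append(p)
    rng = random.Random(seed)
    sample: list[str] = []
    for source in sorted(by_source):
        docs = by_source[source]
        sample.extend(rng.sample(docs, min(per_source, len(docs))))
    return sorted(sample)
```

Fixing the seed means the same sample can be re-run after a profile change, so differences in the results report reflect the profile rather than a different set of documents.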

For compliance projects with specific deadlines or regulatory complexity, Datalogics technical staff are available to support the evaluation and implementation process. Use the Talk to a Developer option on the product page to get connected.