Cracking the Code: Splitting PDFs into Individual Pages with Adobe C++
Splitting a PDF into individual pages is one of the most common document processing operations. A batch of scanned records needs to become individual files for indexing. A multi-page report needs to be separated so each page can be routed to a different recipient. A print job needs individual page files for downstream imposition. The Adobe PDF Library C++ SDK handles all of these with a straightforward pattern: open the source document, then loop through its pages and copy each one into a new empty document.
This tutorial walks through SplitPDF.cpp from the apdfl-cplusplus-samples repository, which is effectively the inverse of MergeDocuments. The same PDDocInsertPages function powers both operations.
Who Is This For?
This tutorial is for C++ developers building document processing pipelines that need to disaggregate multi-page PDFs into individual page files. Common use cases include scanning and capture systems that produce batch PDFs and need single-page files for document management, financial services platforms processing statement batches where each page is a separate customer document, print workflows that require individual page files for imposition or variable data processing, and legal or healthcare systems that must route individual pages to different queues or recipients based on content.
How Splitting Works in the Adobe PDF Library
The Adobe PDF Library does not have a dedicated split function. Instead, splitting is achieved through PDDocInsertPages, which copies a specified range of pages from a source document into a target document. By creating a new empty document for each page and inserting exactly one page into it, you produce one output file per page. This is the same API used to merge documents: a merge inserts pages from many sources into one target, while a split inserts one page from a single source into each of many targets.
Understanding this model makes the code easy to extend. Inserting two pages instead of one gives you a two-page output. Inserting a calculated range gives you chapter-sized chunks. The sample shown here is the single-page-per-file baseline that most splitting workflows build on.
The SplitPDF Sample: Step by Step
Step 1: Initialize the Library and Open the Source Document
Every Adobe PDF Library program begins by constructing an APDFLib object, which loads the library and its resources. Always check isValid() before proceeding: if initialization failed, no subsequent PDF operation can be expected to work, so the program should report the error and exit:
APDFLib libInit;
ASErrorCode errCode = 0;
if (libInit.isValid() == false) {
    errCode = libInit.getInitError();
    std::cout << "Initialization failed with code " << errCode << std::endl;
    return libInit.getInitError();
}
With the library initialized, the source PDF is opened using APDFLDoc. The second argument (true) enables automatic repair of minor structural errors in the input file, which is recommended for any workflow that processes documents from external sources:
APDFLDoc document(csInputFileName.c_str(), true);
Step 2: Get the Page Count
PDDocGetNumPages returns the total number of pages in the source document. This drives the loop that produces one output file per page:
ASUns32 numInputPages = PDDocGetNumPages(document.getPDDoc());
The getPDDoc() call retrieves the underlying PDDoc handle from the APDFLDoc wrapper. Most of the lower-level Adobe PDF Library functions operate on PDDoc handles directly rather than on the APDFLDoc wrapper object.
Step 3: Loop Through Pages and Create Output Documents
For each page in the source document, the sample creates a new empty APDFLDoc, inserts the single page into it, generates an output filename using the page number, and saves:
for (ASUns32 page = 0; page < numInputPages; ++page) {
    APDFLDoc outDoc;
    PDDocInsertPages(outDoc.getPDDoc(), PDBeforeFirstPage,
                     document.getPDDoc(), page, 1,
                     PDInsertDoNotResolveInvalidStructureParentReferences,
                     NULL, NULL, NULL, NULL);
    std::ostringstream ossFile;
    ossFile << csOutputFilePrefix.c_str() << (page + 1) << ".pdf";
    outDoc.saveDoc(ossFile.str().c_str());
}
Understanding PDDocInsertPages
PDDocInsertPages is the workhorse of this sample. Its parameters are worth understanding clearly because they control exactly what gets copied and how:
The first argument is the destination document handle, the new empty outDoc in this case. The second argument identifies the page in the destination after which the copied pages are inserted; the constant PDBeforeFirstPage places them ahead of any existing pages, which for an empty document means the copied page becomes the only page. The third argument is the source document handle. The fourth argument is the zero-based index of the first page to copy, here the loop variable page. The fifth argument is the number of pages to copy, which is 1 here for single-page output.
The sixth argument, PDInsertDoNotResolveInvalidStructureParentReferences, controls how the library handles tagged PDF structure when copying pages. This flag tells the library not to attempt to repair structure parent references that cannot be resolved in the new document context, which is the safe default for splitting operations. Attempting to resolve invalid references can cause the insert to fail on documents with complex tagging.
The final four NULL arguments are, in order, a progress monitor callback, its client data pointer, a cancellation callback, and its client data pointer. They can be left NULL for batch processing workflows where no user interface feedback or cancellation is needed.
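To make the argument order concrete without requiring the library, here is a minimal stand-in model in plain C++. It is not APDFL code: ModelDoc, modelInsertPages, modelSplit, and kBeforeFirstPage are hypothetical names invented for this sketch, a document is reduced to a list of page labels, and the sentinel value -1 is an assumption of the model. The flags and callback arguments are left out entirely.

```cpp
#include <string>
#include <vector>

// Stand-in for a document: each string is one page's label.
// This is NOT an APDFL type; it only illustrates the argument order.
using ModelDoc = std::vector<std::string>;

// Assumed sentinel meaning "insert before the first page" in this model.
constexpr int kBeforeFirstPage = -1;

// Copies `numPages` pages of `source`, starting at zero-based `startPage`,
// into `target` immediately after page index `insertAfter`.
// Returns false and leaves `target` untouched if the request is out of range.
bool modelInsertPages(ModelDoc& target, int insertAfter,
                      const ModelDoc& source, int startPage, int numPages) {
    const int targetSize = static_cast<int>(target.size());
    const int sourceSize = static_cast<int>(source.size());
    if (insertAfter < kBeforeFirstPage || insertAfter >= targetSize) return false;
    if (startPage < 0 || numPages < 1 || startPage > sourceSize - numPages) return false;

    // insertAfter == -1 maps to position 0; otherwise insert after that index.
    target.insert(target.begin() + (insertAfter + 1),
                  source.begin() + startPage,
                  source.begin() + startPage + numPages);
    return true;
}

// One single-page document per source page, mirroring the loop in Step 3.
std::vector<ModelDoc> modelSplit(const ModelDoc& source) {
    std::vector<ModelDoc> outputs;
    for (int page = 0; page < static_cast<int>(source.size()); ++page) {
        ModelDoc outDoc;  // new empty document
        modelInsertPages(outDoc, kBeforeFirstPage, source, page, 1);
        outputs.push_back(outDoc);
    }
    return outputs;
}
```

Calling modelSplit on a three-page model yields three one-page documents, while calling modelInsertPages with a non-empty target is the merge direction of the same operation.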
Output File Naming
The sample uses an output file prefix combined with the page number to generate each output filename. The default prefix is _b_, producing files named _b_1.pdf, _b_2.pdf, and so on. You can pass a custom prefix on the command line:
SplitPDF myinput.pdf invoice_page_
This would produce invoice_page_1.pdf, invoice_page_2.pdf, and so on. In a production system you would typically replace the numeric suffix with a meaningful identifier derived from the document content, such as a policy number extracted from the first page before splitting.
Note that page numbering in the output filenames starts at 1 (page + 1) even though the loop index is zero-based. This matches the convention most users expect when working with page numbers.
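The naming rule is easy to isolate into a helper, which also makes it simple to swap in a different scheme later. The sketch below is one possible version, not part of the sample: makePageFileName and digitsFor are hypothetical names, and the optional zero-padding width is an added assumption (a width of 0 reproduces the sample's plain numbering). Padding keeps files in page order when a directory listing sorts names as text.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Builds the output name for a zero-based page index.
// `width` is a hypothetical extension: 0 reproduces the sample's plain numbering.
std::string makePageFileName(const std::string& prefix,
                             unsigned int pageIndex,
                             int width = 0) {
    std::ostringstream name;
    // setw applies only to the next value written, i.e. the page number.
    name << prefix << std::setfill('0') << std::setw(width)
         << (pageIndex + 1) << ".pdf";
    return name.str();
}

// Number of decimal digits needed to print `totalPages`.
int digitsFor(unsigned int totalPages) {
    int digits = 1;
    while (totalPages >= 10) {
        totalPages /= 10;
        ++digits;
    }
    return digits;
}
```

With a 120-page input, makePageFileName("invoice_page_", 6, digitsFor(120)) names the seventh page invoice_page_007.pdf; omitting the width gives invoice_page_7.pdf, matching the sample's behavior.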
Error Handling
The insertion loop is wrapped in the standard Adobe PDF Library DURING/HANDLER/END_HANDLER block. These macros are APDFL's error handling mechanism: they catch library-level errors that are raised through APDFL's internal error system rather than as C++ exceptions. If any page insertion fails, the error code is captured from ERRORCODE, the error is displayed using libInit.displayError(), and the program exits with that code:
DURING
    // ... splitting loop ...
HANDLER
    errCode = ERRORCODE;
    libInit.displayError(errCode);
END_HANDLER
In a production pipeline you would typically want finer-grained error handling that catches failures on individual pages rather than aborting the entire job. Moving the DURING/HANDLER/END_HANDLER block inside the page loop and logging failures per page allows the remaining pages to be processed even when one page causes an error.
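The control flow of that per-page approach can be sketched in plain C++ without the library. In this sketch, PageFailure, splitAllPages, and the splitOnePage callback are hypothetical names; the callback stands in for a function whose body would contain the per-page DURING/HANDLER/END_HANDLER block and return the captured error code, with 0 assumed to mean success.

```cpp
#include <functional>
#include <vector>

// One failed page: zero-based index plus the error code reported for it.
struct PageFailure {
    unsigned int pageIndex;
    int errorCode;
};

// Runs `splitOnePage` for every page and keeps going after a failure.
// `splitOnePage` is a hypothetical callback: in a real program its body would
// hold the per-page error handling block and return the captured error code,
// or 0 on success.
std::vector<PageFailure> splitAllPages(
        unsigned int numPages,
        const std::function<int(unsigned int)>& splitOnePage) {
    std::vector<PageFailure> failures;
    for (unsigned int page = 0; page < numPages; ++page) {
        const int errCode = splitOnePage(page);
        if (errCode != 0) {
            failures.push_back({page, errCode});  // record and continue
        }
    }
    return failures;
}
```

Returning the list of failures, rather than logging inside the loop, leaves the caller free to decide whether a partial result is acceptable, whether to retry the failed pages, or which exit code to report.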
Expected Output
For an N-page input PDF, the sample produces N individual PDF files, each containing exactly one page. The output files are written to the current working directory unless the output prefix includes a path component. Each output file is a fully valid, self-contained PDF that can be opened independently in any PDF viewer.
The visual content of each page is preserved exactly. Fonts, images, annotations, and form fields on the copied page are all included in the output document. Links and bookmarks that reference other pages in the original document are not automatically updated in the single-page outputs, which is expected behavior for a split operation.
Extending the Sample
The single-page split is the most common use case, but the same pattern supports other splitting strategies by changing two values: the starting page index and the page count passed to PDDocInsertPages. To split into two-page pairs, change the loop increment to 2 and pass 2 as the page count, reducing the count to 1 for the final chunk when the source has an odd number of pages so the request never runs past the last page. To split at specific page boundaries, build an array of split points and use each consecutive pair to derive the start index and, by subtraction, the page count of one output. To produce a specific page range as a single output document, make a single call with the appropriate start index and count.
The MergeDocuments sample in the same repository shows the reverse operation using the same PDDocInsertPages call, which is useful for understanding the full insert API.
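The range-based strategies described in this section come down to computing a list of (start index, page count) pairs before the insert loop runs. The sketch below shows one way to do that in plain C++; PageRange, fixedSizeChunks, and rangesFromSplitPoints are hypothetical names, not library or sample code. Both helpers make sure no range extends past the last page of the source.

```cpp
#include <algorithm>
#include <vector>

// A contiguous run of pages: zero-based start index and page count,
// the two values that vary per insert call.
struct PageRange {
    unsigned int start;
    unsigned int count;
};

// Fixed-size chunks; the final chunk is shortened if pages run out.
std::vector<PageRange> fixedSizeChunks(unsigned int numPages, unsigned int chunkSize) {
    std::vector<PageRange> ranges;
    if (chunkSize == 0) return ranges;  // nothing sensible to produce
    unsigned int start = 0;
    while (start < numPages) {
        const unsigned int count = std::min(chunkSize, numPages - start);
        ranges.push_back({start, count});
        start += count;
    }
    return ranges;
}

// Ranges from explicit split points. Each split point is the zero-based index
// of a page that starts a new output; page 0 always starts the first one.
// Points that are out of range or not strictly increasing are skipped.
std::vector<PageRange> rangesFromSplitPoints(unsigned int numPages,
                                             const std::vector<unsigned int>& splitPoints) {
    std::vector<PageRange> ranges;
    if (numPages == 0) return ranges;
    unsigned int start = 0;
    for (unsigned int point : splitPoints) {
        if (point <= start || point >= numPages) continue;
        ranges.push_back({start, point - start});
        start = point;
    }
    ranges.push_back({start, numPages - start});  // pages after the last split
    return ranges;
}
```

For a five-page source, fixedSizeChunks(5, 2) produces two full pairs and a final single page. Each resulting range would then supply the fourth and fifth arguments of one PDDocInsertPages call, with one output document per range.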
Next Steps
Review all of our Adobe C++ sample code in the Datalogics GitHub repository and request a free trial of Adobe PDF Library to run the samples in your own environment.