We have good news for developers who need PDF data extraction with JSON support: the latest version of our PDF Alchemist tool (3.0) now supports JSON output!
So, what does this mean for the end-user?
The major benefit of this new feature is that you can:
• Extract your data from PDF into a format that, compared with XML, is relatively lightweight and readable.
Here’s a look at how these new features work with PDF Alchemist:
When you choose JSON as the output format, PDF Alchemist identifies and parses the different kinds of data it detects in the PDF. Each piece of data is labeled by type, and the order in which it appears in the document is retained. As a result, tables, lists, and paragraphs are identified and ready for further processing.
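To give a feel for what typed, ordered output makes possible downstream, here is a minimal Python sketch. The JSON shape it uses (a flat array of objects with a "type" field plus "text", "rows", or "items") is purely an assumption for illustration and is not the actual PDF Alchemist schema; check a real output file before adapting it.

```python
import json

# Hypothetical sample output. The real PDF Alchemist JSON schema may differ;
# the field names "type", "text", "rows" and "items" are assumptions.
sample_output = """
[
  {"type": "paragraph", "text": "Quarterly results"},
  {"type": "table", "rows": [["Region", "Sales"], ["East", "120"]]},
  {"type": "list", "items": ["First point", "Second point"]},
  {"type": "paragraph", "text": "Closing remarks"}
]
"""


def group_by_type(elements):
    """Group elements by type, keeping each element's document position."""
    grouped = {}
    for index, element in enumerate(elements):
        grouped.setdefault(element["type"], []).append((index, element))
    return grouped


elements = json.loads(sample_output)
grouped = group_by_type(elements)

# Pull out just the table data for further processing.
table_rows = [element["rows"] for _, element in grouped.get("table", [])]

# Document positions of the paragraphs, in order of appearance.
paragraph_positions = [index for index, _ in grouped.get("paragraph", [])]
```

Because every element carries its type and keeps its place in the sequence, a script like this can pick out only the tables, or walk the document from top to bottom, without any extra parsing.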
The new JSON output option supports the existing data partition parameters in PDF Alchemist. A few examples: set the “tablesOnly” option to extract only table data, set the “reflowText” option to false to preserve line break information within paragraphs, or use the “ocrMode” option to extract and identify character data found in images.
In addition to JSON output, PDF Alchemist now accepts XSLT stylesheets via the xsltStylesheetPath parameter. These stylesheets are applied to the XML output, so you can control the way PDF Alchemist writes output by providing your own custom stylesheet as input.
Ready to get started with PDF Alchemist 3.0 and its JSON support? Request a free evaluation here. We also want to know how you will use PDF Alchemist for your JSON-specific projects! Leave us a comment to let us know what you’re working on. I also want to thank the Datalogics Engineering department for their help with this post!