PDF Alchemist

Recover editable text from PDFs

  • Intelligently display contents of PDFs on tablets and small-screen displays
  • Reconstruct source files
  • Improve searching and indexing of PDFs within document repositories

Easily convert PDFs to HTML

Datalogics PDF Alchemist is a new C/C++ SDK for intelligently extracting text and images from PDFs and exporting them to HTML5 or EPUB. It employs sophisticated techniques to identify and reconstruct the “text flows” within a PDF. These text flows are often lost when a PDF is created, yet they are vital for repurposing the information locked within it.


  • Converts columns and pages back into single continuous text flow
  • Discards “page artifacts” such as running headers and footers
  • Output in HTML5 or in EPUB format
  • Font size and style detection
  • Text justification and indentation detection
  • Text flow margin detection
  • List detection and conversion into real HTML lists
  • Table detection and conversion into real HTML tables
  • Converts PDF bookmarks into clickable navigation links
  • Detection of internal and external URL links
  • Includes a DLL/shared library for integration into products and server workflows
  • Free, fully functional evaluation versions available
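
To make the idea of discarding “page artifacts” concrete, here is a minimal Python sketch of one common heuristic: treat the first or last line of a page as a running header or footer if it recurs, ignoring page numbers, on most pages. This is an illustration only and not PDF Alchemist's actual algorithm or API. The function names, the list-of-lines input shape, and the 60% threshold are all assumptions made for the example.

```python
from collections import Counter
import re


def normalize(line: str) -> str:
    """Collapse digits and whitespace so 'Page 3' and 'Page 12' compare equal."""
    return re.sub(r"\d+", "#", " ".join(line.split())).lower()


def strip_running_lines(pages, min_share=0.6):
    """Drop first/last lines that recur on at least min_share of the pages.

    pages: list of pages, each a list of text lines (hypothetical input shape).
    Returns one continuous list of lines with the repeated edge lines removed.
    """
    # Count how many pages carry each normalized first/last line.
    candidates = Counter()
    for lines in pages:
        edge = {normalize(text) for text in (lines[:1] + lines[-1:])}
        candidates.update(edge)

    # Require at least two occurrences so a single page is never stripped.
    threshold = max(2, min_share * len(pages))
    artifacts = {text for text, count in candidates.items() if count >= threshold}

    flow = []
    for lines in pages:
        for i, line in enumerate(lines):
            at_edge = i == 0 or i == len(lines) - 1
            if at_edge and normalize(line) in artifacts:
                continue  # skip the running header or footer
            flow.append(line)
    return flow


pages = [
    ["Annual Report", "Revenue grew in the first quarter.", "Page 1"],
    ["Annual Report", "Costs were flat.", "Page 2"],
    ["Annual Report", "Outlook remains stable.", "Page 3"],
]
print(strip_running_lines(pages))
```

With the sample pages above, only the three body sentences remain. A production converter would likely also weigh position on the page, font, and layout, which this sketch ignores.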

Watch PDF Alchemist in Action

This short demo walks through using PDF Alchemist to convert a PDF with various formatting (images, tables, underlining, etc.) into HTML.



  • Windows 64-bit
  • Windows 32-bit
  • Linux 64-bit
  • MacOS 64-bit


For all other PDF Alchemist Developer Resources, visit our Developer Resources area.


Our toolkits can be licensed to software developers who embed the technology into their applications (OEM), as well as to enterprise customers building applications for internal use. Technical support and regular updates are provided through our Support and Maintenance program, which lets you keep your application current and compatible with the latest versions of Acrobat/Reader and the PDF Specification as they are released.


Our customers have integrated our toolkits into a vast array of applications with a wide variety of deployment configurations. Pricing for our products is highly situation-dependent but generally includes an initial license fee and per-platform annual maintenance and support fees. In addition, royalties are incurred for those applications intended for sale. Some companies may also qualify for special small business pricing.

To discuss your specific situation with a Datalogics Sales Representative, please contact us directly.




Companies Use PDF Alchemist to:


Display PDF Content as reflowable text on mobile devices

In certain instances, such as proofing page layout of a PDF flyer or brochure, page fidelity is important. In other situations, proofing the content of a PDF is more important. Consider:

  • An Account Manager on the road needs to review the updated terms and conditions paragraph in a contract before sending it to a prospective customer. She only needs to check the language of that one paragraph, and she’s viewing it on her phone. Flowable, resizable text would make it easier for her to find and approve that paragraph than paging through a PDF file, zooming and panning to reach the text in question.
  • An Executive on the road receives a financial report as a PDF on his phone, but he only wants to skip down to the “bottom line” numbers on the last page. Doing this in HTML, where text is reflowable and resizable, is much easier than in PDF.

Besides giving easier access to content, the HTML output is generally smaller than the PDF, which can improve performance on mobile devices.


Reconstruct source documents when the original was lost

In the business world, it’s not uncommon to “lose” source documents: an old product datasheet, an old report or white paper, etc. And sometimes there’s a need to update those: update terms and conditions on a contract, translate a white paper to another language, etc.

While touching up text in Acrobat is possible, it is not effective for larger edits. The output from PDF Alchemist can be easily loaded into a word processor or desktop publishing application for editing.


Address Accessibility requirements

Many organizations are required to provide documentation in an “accessible” manner; for example, to conform to US Section 508 Accessibility Guidelines. This often means delivering documentation in a format compatible with screen readers or other assistive technology. The key to this is being able to identify “text flow” information (i.e., to programmatically “read” the text of a document in a logical order, as a human would).

PDF Alchemist can recover these text flows as reflowable HTML. Once the text flow information is recovered, it can be used to help create PDF/UA documents or final-form HTML output.