Intelligently Extracting Content From PDF Files

Greetings from chilly Chicago! Datalogics was out in force at the 2016 CodeMash developer conference earlier in the month, and there's nothing better than an indoor waterpark in the middle of January in the Midwest! While we were there, we had a great series of conversations with a number of people about a whole variety of PDF topics. It looks like interest in intelligent processing of content in PDF files is blooming.

We’ve put up on SlideShare the slide presentation I gave at CodeMash on intelligent content extraction from PDF files. It’s an overview of the various challenges inherent in extracting content from PDFs in a form that’s usable for intelligent processes: for example, indexing PDFs for search or performing sentiment analysis on PDF content; loading the meaning of PDF pages into databases for machine learning; or turning pages into reflowable forms for presentation on small screens. Every scenario for content extraction has its own needs and tradeoffs.
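
To make the reflow scenario a little more concrete, here is a minimal Python sketch of one small piece of the problem: text pulled from a PDF page often arrives as one string per visual line, so it has to be stitched back into paragraphs before it can reflow on a small screen. The function name `reflow` and the sample lines are hypothetical, made up for illustration, and the sketch assumes the lines have already been extracted by some tool. It is not how PDF Alchemist or any Datalogics product works.

```python
def reflow(lines):
    """Join hard-wrapped lines into paragraphs.

    Assumptions (illustrative only): a blank line separates paragraphs,
    and a trailing hyphen means a word was split across two lines.
    """
    paragraphs = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the paragraph collected so far.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # Naive de-hyphenation: drop the hyphen and glue the halves.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs


# Hypothetical lines, standing in for text extracted from one page.
sample = [
    "Extracting text from a PDF page of-",
    "ten yields one string per visual",
    "line, not per paragraph.",
    "",
    "A second paragraph starts here.",
]

for paragraph in reflow(sample):
    print(paragraph)
```

Even this toy version shows a tradeoff: gluing every line that ends in a hyphen also mangles genuine compounds such as "well-known" when they happen to break across lines. A search index might tolerate that, while a reflowed reading view probably would not.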

Interested in intelligent conversion of PDF into reflowable HTML or EPUB? Check out Datalogics PDF Alchemist. Or, if you want to access the content of PDFs at a deeper level, the Adobe PDF Library or Datalogics PDF Java Toolkit may be just the toolkit you need!
