Content Extraction

The Adobe PDF Library SDK provides content extraction APIs that enable you to:

  • extract text for indexing
  • extract text style, position information, and encoding
  • extract the visible words on the page (i.e., words on the specified page that are visible in the given optional-content context)
  • extract content and information about annotations, including links and bookmarks
  • extract form (AcroForm) field data, both content and field descriptor metadata
  • extract metadata such as Title, Subject, and other document-level attributes
  • extract images and convert them to image formats such as PNG, JPEG, and TIFF
  • convert images to CMYK, RGB, or grayscale
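
To give a rough sense of what the color conversions in the last item involve, here is a minimal sketch in Python. It is not the Adobe PDF Library API: the function names rgb_to_gray and rgb_to_cmyk are hypothetical, the grayscale weights are the common Rec. 601 luma coefficients, and the CMYK formula is the naive device conversion with full black generation. A production library would typically rely on ICC color profiles instead.

```python
# Illustrative only: hypothetical helpers, not part of any SDK.

def rgb_to_gray(r, g, b):
    """Convert 8-bit RGB to an 8-bit gray level using Rec. 601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def rgb_to_cmyk(r, g, b):
    """Convert 8-bit RGB to CMYK fractions in [0, 1] (naive, no ICC profile)."""
    if (r, g, b) == (0, 0, 0):
        # Pure black: avoid dividing by zero below.
        return (0.0, 0.0, 0.0, 1.0)
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)          # black component
    c = (1 - rf - k) / (1 - k)       # cyan
    m = (1 - gf - k) / (1 - k)       # magenta
    y = (1 - bf - k) / (1 - k)       # yellow
    return (c, m, y, k)


if __name__ == "__main__":
    print(rgb_to_gray(255, 255, 255))
    print(rgb_to_cmyk(255, 0, 0))
```

The naive formula ignores ink behavior and device characteristics, so colors converted this way can shift visibly in print. That is the main reason to prefer profile-based conversion in real workflows.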

Learn more about the Adobe PDF Library SDK.