
Cracking the Code: Adding OCR to a PDF

  • Datalogics Inc
  • March 16, 2023

Optical Character Recognition, or OCR, is the process that converts an image of text into a machine-readable text format. For example, if you scan a form or a receipt, your computer saves the scan as an image file, so you can't use a text editor to edit, search, or count the words in it. OCR converts the image into a document whose contents are stored as text data, so they can be edited and searched.

One of the most common uses for OCR is preparing documents for search or for extracting their data into another process. With an OCR API for PDFs, the text inside these images becomes accessible without changing how the input document looks. Let's walk through some of the key components of the OCR API in the Adobe PDF Library using .NET. You can view the full code in our public sample GitHub repository.

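    using Datalogics.PDFL;

    OCRParams ocrParams = new OCRParams();
    // Let the OCR engine decide how to segment the page into text regions
    ocrParams.PageSegmentationMode = PageSegmentationMode.Automatic;
    // Favor recognition accuracy over speed
    ocrParams.Performance = Performance.BestAccuracy;
    // Create the engine from these settings (English is the default language)
    OCREngine ocrEngine = new OCREngine(ocrParams);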

Setting PageSegmentationMode to Automatic lets the OCR engine choose how to segment the page for text detection. The Performance parameter offers several levels for trading speed against accuracy. Here we select the mode that produces the best accuracy, a common choice when you are unsure of the quality of your input document. OCRParams defaults to English; use the Languages parameter to select other languages. You can select multiple languages at the same time.

Once the OCREngine is configured, we can loop through the content of the document, identify the images, and apply the OCR processing:

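    for (int index = 0; index < content.NumElements; index++)
    {
        Element e = content.GetElement(index);
        if (e is Datalogics.PDFL.Image)
        {
            // Build a Form holding the original image with the recognized text placed under it
            Form form = ocrEngine.PlaceTextUnder((Image)e, doc);
            // Swap the image for the form at the same position in the content
            content.RemoveElement(index);
            content.AddElement(form, index - 1);
        }
    }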

Each image is replaced by a form that contains the original image with the recognized text laid out behind it. Once this step is complete, you can save the document, and it will contain both the original content and the recognized text.
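Why `index - 1`? Removing the image and then calling `AddElement(form, index - 1)` only puts the form back in the image's slot if the second argument is the index to insert *after*, as with `PDEContentAddElem` in the underlying Adobe PDF Library C API, where -1 means "insert at the front". Treat that reading as an assumption here. The sketch below models the same remove-then-add-after pattern with a plain `List<T>` so it runs without the library; `ContentModel`, `AddAfter`, and `ReplaceImages` are hypothetical names, and the `wrap` delegate only stands in for `PlaceTextUnder` (it performs no OCR).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a page's content as a flat list of elements.
// This is NOT the Datalogics.PDFL API; it only mimics the index arithmetic.
public static class ContentModel
{
    // Mimics an "add after this index" insert (assumed AddElement semantics):
    // addAfterIndex = -1 inserts at the front of the list.
    public static void AddAfter<T>(List<T> content, T element, int addAfterIndex)
    {
        content.Insert(addAfterIndex + 1, element);
    }

    // Walks the content once and swaps every element matching isImage for the
    // value returned by wrap, using remove-then-AddAfter(index - 1) so each
    // replacement lands at the same position the image occupied.
    public static int ReplaceImages<T>(List<T> content, Predicate<T> isImage, Func<T, T> wrap)
    {
        int replaced = 0;
        for (int index = 0; index < content.Count; index++)
        {
            T e = content[index];
            if (isImage(e))
            {
                T form = wrap(e);                   // stands in for PlaceTextUnder(...)
                content.RemoveAt(index);            // stands in for RemoveElement(index)
                AddAfter(content, form, index - 1); // form returns to slot 'index'
                replaced++;
            }
        }
        return replaced;
    }
}
```

For example, calling `ReplaceImages(page, s => s.StartsWith("image:"), s => "form(" + s + ")")` on `["image:a", "text", "image:b"]` returns 2 and leaves `["form(image:a)", "text", "form(image:b)"]`, with the original order intact. Because each form lands at the current index and the loop then advances past it, a newly inserted form is never revisited.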

As an added benefit, the .NET and Java interfaces support Dutch, English, French, German, Italian, Portuguese, and Spanish, with Chinese, Japanese, and Korean to be added shortly. Try it out yourself by requesting a free trial, and take a look at our full sample code for Java and .NET (which includes how to start this process from an image rather than a PDF) in the OpticalCharacterRecognition section of Sample_Source.
