The latest release of the Adobe PDF Library SDK (APDFL) gives developers a boost when searching PDFs programmatically. DocTextFinder, a new addition to the APDFL SDK on Windows, Mac, and Linux, lets you search an entire document using regular expressions.
DocTextFinder builds on the existing WordFinder utility. As our TextSearch sample app demonstrates, WordFinder takes a single search term and returns a list of matching words. WordFinder is handy for exact searches, but it has two limitations. First, each search covers only a single page. Second, every result must exactly match the input string, so a single match list cannot capture variations of the same kind of data. We set out to overcome both limitations.
DocTextFinder extends search with regular expressions (regex). Results no longer need to match a single term exactly. Instead, you supply a search pattern and receive every result that matches it on one or more pages. If you already work with regular expressions, you know how versatile searches can become. Here are just a few examples of the kind of data you could locate in a PDF using regex:
- Email addresses
- Phone numbers
- ISBN identifiers
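To give a feel for what such patterns look like, here is a minimal sketch using Python's standard `re` module. It is not APDFL code, and it assumes that DocTextFinder accepts a broadly similar regex syntax, which you should confirm in the SDK documentation before reusing these patterns. The patterns are deliberately simplified: real-world email, phone, and ISBN formats vary more than this.

```python
import re

# Simplified, illustrative patterns (not taken from APDFL or its samples).
PATTERNS = {
    # Local part, "@", then a domain containing at least one dot.
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    # North American style numbers such as (555) 123-4567 or 555-123-4567.
    "phone": r"(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}",
    # ISBN-13 with optional hyphens, e.g. 978-0-306-40615-7.
    "isbn": r"97[89]-?\d{1,5}-?\d{1,7}-?\d{1,7}-?\d",
}


def find_all(text: str) -> dict[str, list[str]]:
    """Return every non-overlapping match of each pattern in the text."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}


sample = (
    "Contact jane.doe@example.com or call (555) 123-4567. "
    "Order ISBN 978-0-306-40615-7 today."
)
print(find_all(sample))
```

A single pattern such as the phone expression matches several formats at once, which is exactly what an exact-match search term cannot do.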
Working with DocTextFinder is similar to working with WordFinder. Want to highlight every ISBN in a document? Need to redact all email addresses after page 1? Iterate over the resulting DocTextFinderMatchList object, read the page number and position of each match, and perform whatever operation you need.
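The iterate-and-filter step can be sketched independently of the SDK. The code below does not use the real DocTextFinderMatchList interface; `Match`, its fields, the 1-based page numbering, and the `(left, bottom, right, top)` rectangle order are all hypothetical stand-ins chosen for illustration. It shows selecting matches after page 1 and grouping their rectangles by page, so that each page is opened and marked only once.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical rectangle: (left, bottom, right, top).
Rect = tuple[float, float, float, float]


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for one entry in a match list."""
    text: str
    page: int   # assumed 1-based page number
    bbox: Rect  # assumed position of the match on its page


def select_after_page(matches: list[Match], page: int) -> list[Match]:
    """Keep only matches that appear on pages after the given page."""
    return [m for m in matches if m.page > page]


def group_by_page(matches: list[Match]) -> dict[int, list[Rect]]:
    """Collect the rectangles to highlight or redact on each page."""
    regions: dict[int, list[Rect]] = defaultdict(list)
    for m in matches:
        regions[m.page].append(m.bbox)
    return dict(regions)


emails = [
    Match("a@example.com", 1, (72, 700, 180, 712)),
    Match("b@example.com", 2, (72, 650, 190, 662)),
    Match("c@example.com", 2, (72, 600, 185, 612)),
]
print(group_by_page(select_after_page(emails, 1)))
```

In a real application, each page's rectangles would then be handed to whatever highlighting or redaction calls the SDK provides.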
You can try DocTextFinder for yourself by running one of our new sample programs. RegexTextSearch includes several regex examples and highlights the search results in an output PDF. AddRegexRedaction and RegexExtractText go one step further: AddRegexRedaction uses DocTextFinder to redact sensitive information in a document, while RegexExtractText exports the matching text to JSON.
Now, go harness the power of regular expressions for your project!