I am testing pdfalchemist.exe to extract text from a PDF file that contains tables. The conversion to HTML just writes the table as an image; the conversion to XML shows the text from the table, but:
- it loses the line breaks;
- it loses the column alignment when there are empty columns.
Is there a way to address these two issues?
Please review the documentation for the -purpose parameter. Note that HTML and EPUB default to “balanced”, which may write tables as images to better match the original appearance. XML output defaults to “indexing”, which preserves text for searching/indexing workflows, but the output might differ significantly from the PDF's appearance.
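As a rough illustration, one thing to try is setting -purpose explicitly instead of relying on the per-format default, for example requesting “indexing” for an HTML conversion. The Python sketch below only builds and prints such a command line. The -purpose flag and the two values come from the answer above; the argument order (options first, then input and output paths), the file names, and the helper names `build_command` and `convert` are assumptions made for illustration, so check the documentation for the actual syntax.

```python
import subprocess

# Purpose values named in the answer above; the documentation may list others.
KNOWN_PURPOSES = ("balanced", "indexing")


def build_command(input_pdf, output_file, purpose, exe="pdfalchemist.exe"):
    """Build an argument list that sets -purpose explicitly.

    Assumption: options come before the input and output paths.
    Verify the real argument order in the tool's documentation.
    """
    if purpose not in KNOWN_PURPOSES:
        raise ValueError(f"unexpected purpose: {purpose!r}")
    return [exe, "-purpose", purpose, input_pdf, output_file]


def convert(input_pdf, output_file, purpose):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(input_pdf, output_file, purpose), check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command, it does not run it.
    print(" ".join(build_command("report.pdf", "report.html", "indexing")))
```

Whether “indexing” keeps the table as text in HTML output, and whether it restores line breaks or column positions, is something to confirm by running the conversion on the actual file.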