Issue:

When you configure batch classes for accurate OCR of your documents, you need to be aware of the quality, compression, and plugin settings that Ephesoft requires.
The right settings also help you avoid limitations in some of the OCR engines used in Ephesoft, particularly with color images. There are known issues with the Recostar_HOCR Plugin, where color images have caused the following in Ephesoft:

– Errors creating HOCR.xml files
– HOCR.xml files not being created
– Recostar BatchImage process crashes
– Recostar BatchImage process hangs

Solution:

The following recommended settings help prevent issues during the OCR process in Ephesoft and produce good OCR results with the RecoStar and Tesseract OCR engines.

For Color Images you need the following:

– Documents need a minimum of 200 DPI
– Enable LZW compression in both the Ghostscript and ImageMagick parameters of the IMPORT_MULTIPAGE_FILES plugin
– Add the Create_OCR_Input Plugin to the Page Processing Module
– Turn ON the Color Switch in the Recostar_HOCR Plugin in the Page Processing Module

Your Import settings should look like this:
[Screenshot: ColorImages]
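
As a rough illustration of the compression settings above, the following Python sketch rewrites an existing parameter string so that it carries exactly one compression option. The helper names and the sample parameter strings are hypothetical and are not Ephesoft defaults. The sketch also assumes that the ImageMagick field accepts the standard `-compress LZW` option and that the Ghostscript field accepts `-sCompression=lzw`.

```python
import shlex


def set_ghostscript_compression(params: str, method: str) -> str:
    """Return params with a single -sCompression=<method> switch (hypothetical helper)."""
    # Drop any compression switch that is already present, then append the new one.
    tokens = [t for t in shlex.split(params) if not t.startswith("-sCompression=")]
    tokens.append(f"-sCompression={method}")
    return shlex.join(tokens)


def set_imagemagick_compression(params: str, method: str) -> str:
    """Return params with a single '-compress <method>' pair (hypothetical helper)."""
    kept = []
    skip_next = False
    for token in shlex.split(params):
        if skip_next:
            # This token is the value of a previous -compress option; drop it.
            skip_next = False
            continue
        if token == "-compress":
            skip_next = True
            continue
        kept.append(token)
    kept += ["-compress", method]
    return shlex.join(kept)


# Sample strings are illustrative only; copy the real ones from your batch class.
gs_params = set_ghostscript_compression("-dQUIET -r300 -sDEVICE=tiff24nc", "lzw")
im_params = set_imagemagick_compression("-quality 90 -compress Group4", "LZW")
print(gs_params)
print(im_params)
```

Paste the resulting strings into the corresponding fields of the IMPORT_MULTIPAGE_FILES plugin; the sketch only prepares text and does not change any Ephesoft configuration.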
For Black and White Images:

– Documents need a minimum of 200 DPI
– Use -sCompression=lzw in the Ghostscript Image Parameters in the IMPORT_MULTIPAGE_FILES plugin
– Use -compress Group4 for the IM Convert Output Image Parameters in the IMPORT_MULTIPAGE_FILES plugin
– Remove the Create_OCR_Input Plugin from the Page Processing Module
– Turn OFF the Color Switch in the Recostar_HOCR Plugin in the Page Processing Module

Your Import settings should look like this:
[Screenshot: BWImages]
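
The two sets of recommendations can also be written down as a small checklist. The Python sketch below is one possible way to compare the settings of a batch class against this article. The dictionary keys and function names are hypothetical, the values have to be copied by hand from the Ephesoft UI, and `-compress LZW` is assumed as the ImageMagick form of LZW compression for color images.

```python
MIN_DPI = 200  # minimum resolution recommended in this article


def recommended_settings(color: bool) -> dict:
    """Return this article's recommendations for color or black-and-white images."""
    return {
        "ghostscript_compression": "-sCompression=lzw",
        # Assumption: LZW for color, Group4 for black and white.
        "imagemagick_compression": "-compress LZW" if color else "-compress Group4",
        # Create_OCR_Input is added for color and removed for black and white.
        "create_ocr_input_plugin": color,
        "recostar_color_switch": "ON" if color else "OFF",
    }


def find_mismatches(current: dict, color: bool) -> list:
    """List every setting in `current` that differs from the recommendation."""
    problems = []
    dpi = current.get("dpi", 0)
    if dpi < MIN_DPI:
        problems.append(f"dpi: expected at least {MIN_DPI}, found {dpi}")
    for key, expected in recommended_settings(color).items():
        actual = current.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems


# Hypothetical batch class that still carries color settings
# but is meant to process black-and-white documents.
batch_class = {"dpi": 300, **recommended_settings(color=True)}
for problem in find_mismatches(batch_class, color=False):
    print(problem)
```

An empty result means the recorded settings match the recommendations for the chosen image type.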

Walter Lee