Affected Components:

Versions earlier than 4.5

 

Issue Description:

Results produced by test extraction can sometimes differ from the results produced when the same document is uploaded and run as a Batch Instance. This is mostly observed when the workflow plugins in your batch class are configured with non-default tools for PDF conversion and TIFF enhancement.

 

Root Cause:

In versions earlier than 4.5, test extraction does not use the PDF conversion and TIFF enhancement tools that are set in the batch class plugins (workflow). As a result, the generated files may differ in size and dimensions from those produced by a batch run, which causes differences in the extraction results.

Test extraction ignores the plugin settings in the following cases:

  • In test extraction, Ghostscript is used for PDF to TIFF conversion.
  • In test extraction, ImageMagick is used for TIFF to TIFF enhancement / optimization.
  • The color switch is not considered in test extraction for the Recostar HOCR Plugin.

If your plugins use settings other than these, the batch results can differ from the test extraction results.

 

Solution:

This behavior is fixed in 4.5. If you do not want to upgrade to 4.5, you can apply the following workaround:

  • Use Ghostscript rather than Recostar in the Import multipage file module.
  • Use ImageMagick for TIFF conversion rather than GraphicsMagick or LibTiff.
  • Keep the color switch set to OFF in the Recostar HOCR Plugin.

With these settings, the test extraction results should be consistent with the results of an uploaded batch.
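
If you maintain several batch classes, the workaround above can be checked with a small script. The following is only a minimal sketch: the setting names (pdf_to_tiff_converter, tiff_enhancement_tool, recostar_color_switch) are hypothetical placeholders, not the actual plugin property keys, so you would need to map them to the values you read from your own batch class configuration.

```python
# Minimal sketch: compare batch class plugin settings with the workaround values.
# The keys below are hypothetical placeholders, not real plugin property names.
WORKAROUND = {
    "pdf_to_tiff_converter": "GHOSTSCRIPT",   # Import multipage file module
    "tiff_enhancement_tool": "IMAGEMAGICK",   # TIFF to TIFF conversion
    "recostar_color_switch": "OFF",           # Recostar HOCR Plugin
}


def find_mismatches(batch_class_settings):
    """Return {setting: (actual, expected)} for each setting that deviates."""
    mismatches = {}
    for key, expected in WORKAROUND.items():
        # Normalize case and whitespace so "ImageMagick" matches "IMAGEMAGICK".
        actual = str(batch_class_settings.get(key, "")).strip().upper()
        if actual != expected:
            mismatches[key] = (actual, expected)
    return mismatches


# Example settings, entered by hand for illustration only.
example = {
    "pdf_to_tiff_converter": "Recostar",
    "tiff_enhancement_tool": "ImageMagick",
    "recostar_color_switch": "ON",
}

for key, (actual, expected) in find_mismatches(example).items():
    print(f"{key}: found {actual}, workaround expects {expected}")
```

An empty result from find_mismatches means the settings you supplied already match the workaround, so any remaining difference between test extraction and batch results would have another cause.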


Abhishek Jain