The following are known issues when using the Tesseract plugin in a batch class:
- A batch class that uses the Tesseract plugin can work with at most 2 languages at a time.
- Arabic is not recognised properly by Tesseract and usually produces an error when run from the command line.
- OCR results may vary, and a character is sometimes recognised as a different character.
- Tesseract appears to have trouble classifying characters correctly when a large number of languages are in use.
- Mandatory steps for performing OCR on Chinese text:
  - Add chi_sim.traineddata and chi_tra.traineddata to the tessdata folder.
  - Set the Tesseract language to chi_tra or chi_sim.
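
The Chinese setup steps above can be checked before a batch is run. The following is a minimal sketch, not part of the plugin: the helper names, the `tessdata` path, and the `page.png` file name are hypothetical, and it assumes the standard `tesseract <image> <output base> -l <language>` command-line form, where several languages are joined with `+`.

```python
from pathlib import Path

# Limit taken from the known issues listed above; it is a plugin
# limitation, not a limit of the tesseract command itself.
MAX_PLUGIN_LANGUAGES = 2


def missing_traineddata(tessdata_dir, languages):
    """Return the language codes whose <code>.traineddata file is absent."""
    folder = Path(tessdata_dir)
    return [lang for lang in languages
            if not (folder / f"{lang}.traineddata").is_file()]


def build_tesseract_command(image_path, output_base, languages):
    """Build the argument list for the tesseract command-line tool."""
    if not languages:
        raise ValueError("at least one language is required")
    if len(languages) > MAX_PLUGIN_LANGUAGES:
        raise ValueError("at most 2 languages are supported per batch class")
    # Several languages are passed as one value joined with '+'.
    return ["tesseract", str(image_path), str(output_base),
            "-l", "+".join(languages)]


if __name__ == "__main__":
    # Hypothetical folder and file names, for illustration only.
    print(missing_traineddata("tessdata", ["chi_sim", "chi_tra"]))
    print(build_tesseract_command("page.png", "page", ["chi_sim"]))
```

The returned list can be passed to `subprocess.run` once `missing_traineddata` reports nothing missing. Building the command as a list rather than a single string avoids shell quoting problems with file paths that contain spaces.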