Machine Learning doesn’t work when multiple languages are used.
KB Article # 22161
Topic/Category: Machine Learning
Applies to: 4.1 onwards
If a Batch Class is configured for multiple languages (within either RecoStar or Nuance), the Machine Learning plugin won’t work correctly.
The Machine Learning based extraction plugin expects a dictionary file (containing stop words) for each language of the documents being processed. For example, a Batch Class that handles English and Italian documents needs two files, one per language. Because the Italian dictionary file is not shipped with the product, a “Dictionary not found exception” is thrown at learning time, and as a result no extraction happens for subsequent batches.
As a workaround, the customer can manually create an Italian dictionary file (it_stopWords.txt) under the opt/Ephesoft/SharedFolders/BCXX/machine-learning-dictionaries/language-packs/* folder. Learning and extraction then work. However, if the file contains no stop words, or the wrong ones, incorrect anchors may be learned, which lowers extraction accuracy.
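The workaround above can be sketched as a small Python script that generates the stop-word file. This is only an illustration under stated assumptions: the file layout (one lowercase word per line, UTF-8) is assumed rather than confirmed by this article, the function name write_stop_words is hypothetical, and the word list is a short sample, not a complete Italian stop-word list. The target folder is passed in by the caller, since the BCXX and * parts of the path depend on the installation.

```python
from pathlib import Path

# Short illustrative sample only; a production list should be far more complete.
ITALIAN_STOP_WORDS = [
    "il", "lo", "la", "i", "gli", "le", "un", "una",
    "di", "a", "da", "in", "con", "su", "per", "e", "o", "che", "non",
]


def write_stop_words(target_dir, language_code, words):
    """Write a <language_code>_stopWords.txt file into target_dir.

    Assumption: one lowercase stop word per line, UTF-8 encoded.
    Hypothetical helper, not part of the product.
    """
    # Normalise: trim whitespace, lowercase, drop blanks and duplicates.
    cleaned = sorted({w.strip().lower() for w in words if w.strip()})
    if not cleaned:
        # An empty file would let wrong anchors be learned, so refuse to write it.
        raise ValueError("refusing to write an empty stop-word file")
    path = Path(target_dir) / f"{language_code}_stopWords.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path
```

Calling write_stop_words(folder, "it", ITALIAN_STOP_WORDS) with the language-packs folder of the affected Batch Class would produce it_stopWords.txt there. The check for an empty list reflects the warning above: a file without meaningful stop words avoids the exception but can still degrade accuracy.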