Difference between normal classification, Predefined Document Type classification, and Unknown Document Type classification
Below is an explanation of how classification works:
The confidence scores for document types should be defined only after performing a number of tests. If test classification consistently produces similar confidence scores for documents of a given type, configure that document type's confidence score accordingly. Note, however, that this is not a one-time setting: testing should be repeated regularly whenever new kinds of documents start arriving in the input files, and the value adjusted as needed.
Normal Working: When a document goes through classification, it is matched against the learned files of all document types. The document type that produces the highest confidence score is selected, and the document is assigned to that document type.
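As a rough illustration, the normal selection step amounts to picking the document type with the maximum confidence. This is a minimal sketch, not the product's implementation: the function name classify and the scores dictionary (document type name mapped to its confidence) are hypothetical, and how the scores are produced from the learned files is not modeled.

```python
from typing import Dict


def classify(scores: Dict[str, float]) -> str:
    """Return the document type with the highest confidence score.

    `scores` is a hypothetical mapping from document type name to the
    confidence obtained by matching the document against that type's
    learned files. How those scores are computed is not modeled here.
    """
    if not scores:
        raise ValueError("no document types to compare against")
    # max() with key=scores.get returns the key whose value is largest;
    # on a tie, the first such key in insertion order wins.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sample = {"Invoice": 72.5, "PurchaseOrder": 64.0, "Receipt": 31.2}
    print(classify(sample))  # Invoice
```

The tie-breaking behavior here is simply that of Python's max(); the actual product may resolve ties differently.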
Predefined Document Type Working: If a predefined document type is configured in your document assembler plugin, the document is first matched against the learned files of all document types and is assigned the document type with the highest confidence score. This confidence score is then compared with the threshold configured for the predefined document type in the document assembler plugin. If the predefined threshold is greater than the generated confidence score, the document is assigned to the predefined document type; otherwise it stays assigned to the document type with the highest confidence score.
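The two-step rule above can be sketched as follows. This is a hypothetical illustration only: classify_with_predefined, predefined_type and predefined_threshold are names invented for the example, and it assumes the threshold and the generated confidence are on the same numeric scale so they can be compared directly.

```python
from typing import Dict


def classify_with_predefined(
    scores: Dict[str, float],
    predefined_type: str,
    predefined_threshold: float,
) -> str:
    """Sketch of the predefined document type rule (names are hypothetical)."""
    if not scores:
        raise ValueError("no document types to compare against")
    # Step 1: normal classification picks the highest-scoring document type.
    best_type = max(scores, key=scores.get)
    best_score = scores[best_type]
    # Step 2: if the predefined threshold is strictly greater than the best
    # generated score, the predefined document type wins.
    if predefined_threshold > best_score:
        return predefined_type
    # Otherwise keep the document type with the highest confidence score.
    return best_type


if __name__ == "__main__":
    scores = {"Invoice": 45.0, "Receipt": 30.0}
    print(classify_with_predefined(scores, "Contract", 50.0))  # Contract
    print(classify_with_predefined(scores, "Contract", 40.0))  # Invoice
```

Because the comparison is strictly "greater than", a threshold exactly equal to the best score leaves the document with its highest-scoring type in this sketch.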
Case of Unknown Document Type: An unknown document is one that does not match any of the learned files of any document type, so its confidence score is 0. In test classification, such a document is always classified as Unknown. When you run a batch, however, if the property "change Unknown DocumentType" is configured in the document assembler plugin and switched on, the document is assigned to the configured document type instead.
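The unknown-document behavior can be sketched like this. It is a hypothetical model, not the product's code: the function and parameter names are invented, confidence scores are assumed to be non-negative (so "no match" means the best score is 0), and the change_unknown_enabled flag stands in for the switch described above.

```python
from typing import Dict, Optional

UNKNOWN = "Unknown"


def classify_unknown_aware(
    scores: Dict[str, float],
    batch_run: bool,
    change_unknown_enabled: bool,
    configured_type: Optional[str],
) -> str:
    """Sketch of Unknown handling in test classification vs. a batch run."""
    # Assumes non-negative scores: a document matching no learned file
    # has a best score of 0 (or there are no scores at all).
    best_type = max(scores, key=scores.get) if scores else None
    if best_type is None or scores[best_type] == 0:
        # Only a batch run with the switch on re-routes the document;
        # test classification always reports Unknown.
        if batch_run and change_unknown_enabled and configured_type:
            return configured_type
        return UNKNOWN
    return best_type


if __name__ == "__main__":
    zero = {"Invoice": 0.0, "Receipt": 0.0}
    print(classify_unknown_aware(zero, False, True, "Misc"))  # Unknown
    print(classify_unknown_aware(zero, True, True, "Misc"))   # Misc
```

Keeping the batch/test distinction as an explicit parameter mirrors the text: the same zero-score document is reported differently depending on where it is classified.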