Below are the steps we performed on the cloud server to get the Fuzzy DB plugin working.

  • Without changing any Fuzzy DB configuration, we ran a fuzzy search on the validation screen with “*” as the search parameter.
  • No results were found.
  • We changed the search parameter to “06429”, the value originally extracted for the “Invoice Number” field.
  • No results were found.
  • However, the Tomcat console showed statements such as “5288439 Documents found” and “1 Documents found” for the two searches, respectively.
  • This suggested that the documents were being fetched and then discarded because of some other condition.
  • We checked the “Minimum Confidence Threshold” property of the Fuzzy DB plugin, which was set to “89”.
  • With the threshold this high, every result with a confidence score below “89” was ignored. In this case all the documents scored below “89”, so no results were shown.
  • To confirm this, we reduced “Minimum Confidence Threshold” to “10” and restarted the batch from the validation module so that the batch picked up the new value.

We then repeated the search on the same batch with the search parameter “06429” and got the desired results.


Note that the confidence score for this search is “88.82”, which is below the initial threshold of 89 but above the new threshold of 10. The issue was therefore the high “Minimum Confidence Threshold” value.
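The filtering behaviour observed here can be sketched as follows. This is only an illustration, not the plugin's actual code: the function name `filter_by_confidence`, the tuple layout of a hit, and the inclusive `>=` comparison are assumptions made for the example. Only the threshold values (89 and 10) and the score 88.82 come from the steps above.

```python
from typing import List, Tuple

# Hypothetical representation of a hit: (matched value, confidence score 0-100).
Hit = Tuple[str, float]


def filter_by_confidence(hits: List[Hit], min_confidence: float) -> List[Hit]:
    """Keep only hits whose confidence reaches the threshold.

    Assumption: the comparison is inclusive (>=); the plugin may differ.
    """
    return [hit for hit in hits if hit[1] >= min_confidence]


# The single hit found for "06429", with the score observed in the search.
hits = [("06429", 88.82)]

print(filter_by_confidence(hits, 89))  # threshold 89: the hit is dropped
print(filter_by_confidence(hits, 10))  # threshold 10: the hit is kept
```

Under these assumptions, a hit that is found by the index can still disappear from the validation screen, which would explain why the console reported documents while the search showed none.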

The confidence of a particular document depends on various factors. Please see below for details:

  • Fuzzy DB searching is similarity-based, unlike SQL searches, which are equality-based.
  • The confidence of a string search therefore depends on how frequently the string occurs, the minimum word length, how similar it is to other words in the documents, and other factors.
  • For details on how the confidence is calculated, see http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/scoring.html
  • The confidence score calculation depends entirely on how Lucene handles the indexes.
  • The score generation algorithm can, however, be customized and enhanced (http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/scoring.html#Changing your Scoring — Expert Level).
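The difference between equality-based and similarity-based matching can be illustrated with a small sketch. It uses Python's standard `difflib.SequenceMatcher` purely as a stand-in similarity measure; Lucene computes its scores differently, so the numbers are not comparable to the plugin's confidence values. The function names and the candidate values other than “06429” are hypothetical.

```python
from difflib import SequenceMatcher


def equality_match(query: str, value: str) -> bool:
    """SQL-style comparison: the value either equals the query or it does not."""
    return query == value


def similarity_score(query: str, value: str) -> float:
    """Similarity on a 0-100 scale.

    Assumption: difflib's ratio is only a stand-in for illustration;
    Lucene computes its scores differently.
    """
    return round(SequenceMatcher(None, query, value).ratio() * 100, 2)


# Compare the query against an exact match, a near match, and a poor match.
for candidate in ("06429", "06428", "99999"):
    print(candidate,
          equality_match("06429", candidate),
          similarity_score("06429", candidate))
```

An equality test rejects the near match outright, whereas a similarity measure gives it a graded score. Whether such a result is shown then depends on where the minimum confidence threshold is set.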





J.D. Abbey
