Overview

This plugin generates Advanced KV pairs so that data extraction improves based on data the user has previously extracted manually. It tracks the values the user extracts manually by populating DLFs (document level fields) directly from the image in the third panel. From these values, it generates advanced KV pairs using regular expressions defined in property files and saves them for the corresponding document types. The plugin is configured through an ON/OFF switch in the admin UI and through the property files ‘dcma-key-regex.properties’, ‘dcma-key-value-location.properties’ and ‘dcma-value-regex.properties’ defined in META-INF.

The plugin iterates over each document level field of each document. First, it matches the value of the document level field against the regex patterns defined in the properties file. The regular expression that matches best becomes the value pattern for that field. The document level field value is then searched for in the OCR data (hOCR file) for that page of the document.
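The value-pattern step can be sketched as follows. This is a minimal illustration, not the plugin's own code: the pattern list is a hypothetical stand-in for the entries of dcma-value-regex.properties, and the selection rule (the pattern whose match covers the most characters wins) is an assumption, because the text does not define how the best match is measured.

```python
import re

# Hypothetical stand-ins for entries in dcma-value-regex.properties.
VALUE_PATTERNS = [
    r"\d{2}/\d{2}/\d{4}",  # a date such as 05/02/2012
    r"\d+",                # a run of digits
    r"[A-Za-z]+",          # a run of letters
]

def pick_value_pattern(value, patterns=VALUE_PATTERNS):
    """Return the pattern whose match covers the most characters of value.

    Assumption: "best match" means the longest matched span; ties go to
    the pattern listed first. Returns None if no pattern matches.
    """
    best, best_len = None, 0
    for pattern in patterns:
        match = re.search(pattern, value)
        if match and len(match.group(0)) > best_len:
            best, best_len = pattern, len(match.group(0))
    return best
```

With these sample patterns, a value such as 05/02/2012 selects the date pattern, because that match spans the whole value while the digit pattern only covers the first two characters.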

If the value is found, the plugin searches for a key around it in all eight directions and tries to match each candidate against the regex patterns defined in the properties file. The regular expression that matches best becomes the key pattern. If the key is found to the left of the value (that is, the value lies to the right of the key), the location is set to RIGHT. If no key is present on the left, the plugin searches the top, right, bottom and remaining locations in turn, matches what it finds against the regex patterns in the properties file to get the key pattern, and sets the location accordingly.

Note: The location is set here for processing purposes only. It has no link with the ‘Location’ field displayed in Advanced KV pairs; that field is always empty for generated advanced KV pairs.

  • If a value does not match any of the regex patterns, the value itself is set as the key pattern of this field.
  • The application searches the key locations in the order below, which can be configured as a semicolon-separated list in the property file. As soon as it finds a match at a location, it uses that location:
      • LEFT
      • RIGHT
      • TOP
      • BOTTOM
      • TOP_RIGHT
      • TOP_LEFT
      • BOTTOM_RIGHT
      • BOTTOM_LEFT
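The ordered search described above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: words are represented as (text, box) pairs with hypothetical bounding boxes (x0, y0, x1, y1) in image coordinates, and picking the nearest word within a location is an assumed tie-break. The location returned is that of the key with respect to the value, as in the configured order.

```python
import math

def parse_location_order(raw):
    """Split a value such as 'LEFT;RIGHT;TOP;' into ['LEFT', 'RIGHT', 'TOP']."""
    return [part.strip() for part in raw.split(";") if part.strip()]

def direction_of(word_box, value_box):
    """Classify where word_box lies relative to value_box.

    Boxes are (x0, y0, x1, y1) with y growing downward. Returns one of the
    eight location names, or None if the boxes overlap on both axes.
    """
    wx0, wy0, wx1, wy1 = word_box
    vx0, vy0, vx1, vy1 = value_box
    horiz = "LEFT" if wx1 <= vx0 else "RIGHT" if wx0 >= vx1 else ""
    vert = "TOP" if wy1 <= vy0 else "BOTTOM" if wy0 >= vy1 else ""
    if horiz and vert:
        return f"{vert}_{horiz}"
    return horiz or vert or None

def centre(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def find_key(words, value_box, location_order):
    """Return (text, location) from the first configured location that has a word."""
    vx, vy = centre(value_box)
    for location in parse_location_order(location_order):
        candidates = [(text, box) for text, box in words
                      if direction_of(box, value_box) == location]
        if candidates:
            # Assumed tie-break: the word whose centre is nearest the value.
            text, _ = min(
                candidates,
                key=lambda tb: math.hypot(centre(tb[1])[0] - vx,
                                          centre(tb[1])[1] - vy),
            )
            return text, location
    return None
```

Because the loop stops at the first location that yields a candidate, moving a location earlier in the list gives it priority over every location after it.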

Multi word support for KV Learning

The Key Value Learning plugin in the Export module automatically creates a Key Value field corresponding to a document level field.

This enhancement allows multiple words to be used for key pattern generation in the Key Value Learning plugin. If a word is found close to the key, it is appended to the key and used for key pattern generation.

Note: Words are appended to the left of the key for locations LEFT, BOTTOM, TOP, BOTTOM_LEFT and TOP_LEFT, and to the right of the key for locations BOTTOM_RIGHT and TOP_RIGHT.
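The append rule in the note can be written as a small lookup. This is only a sketch: the function name is hypothetical, and because the note does not mention the RIGHT location, the key is left unchanged in that case here.

```python
# Locations for which a neighbouring word is attached on the left of the key.
APPEND_LEFT = {"LEFT", "BOTTOM", "TOP", "BOTTOM_LEFT", "TOP_LEFT"}
# Locations for which a neighbouring word is attached on the right of the key.
APPEND_RIGHT = {"BOTTOM_RIGHT", "TOP_RIGHT"}

def extend_key(key, neighbour, location):
    """Attach a neighbouring word to the key on the side given by the note."""
    if location in APPEND_LEFT:
        return f"{neighbour} {key}"
    if location in APPEND_RIGHT:
        return f"{key} {neighbour}"
    # RIGHT is not covered by the note, so this sketch leaves the key as is.
    return key
```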

Configuration

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-key-value-location.properties

Configurable properties:

key_value.location_order
  • Type of value: String
  • Value options: LEFT;RIGHT;TOP;TOP_LEFT;TOP_RIGHT;
  • Description: A semicolon-separated list of locations. It gives the order in which locations are searched for the key in the image. Each location is that of the key with respect to the value.

key_value.max_number_record
  • Type of value: Integer
  • Value options: NA
  • Description: The maximum number of key value pairs that can be present for any DLF. If a DLF already has this many key value fields defined, the plugin does not add any more key value pairs to it. Default value is 50.

key_value.tolerance_threshold
  • Type of value: Integer
  • Value options: A, B, C
  • Description: The length and width of the value rectangle created by the plugin are increased by this tolerance value (width + (width*tolerance)/100). For example, if the width calculated by the plugin is 100 pixels and the tolerance is 10, the resulting width is 110 pixels.

key_value.multiplier
  • Type of value: Integer
  • Value options: Integer value
  • Description: This property holds an integer value which decides on <some logic>. (Also mention range if applicable)

key_value.fetch_value
  • Type of value: String
  • Value options: FIRST, LAST, ALL
  • Description: Fetch value for the key value field that is created by the plugin. Default value is FIRST.

key_value.min_key_char_count
  • Type of value: Integer
  • Value options: NA
  • Description: Minimum number of characters that must be present in the extracted key. Default value is 4.

key_value.gap_between_keys
  • Type of value: Integer
  • Value options: NA
  • Description: Any word found to the left or right of the key (depending on the location of the key with respect to the value) is considered part of the key depending on its distance from the key. Default value is 50. See the example below.
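The tolerance formula given for key_value.tolerance_threshold can be checked with a short calculation. The function name is hypothetical, and the sketch only computes the enlarged dimensions; how the plugin positions the enlarged rectangle is not described in the text.

```python
def apply_tolerance(width, height, tolerance):
    """Enlarge both dimensions by tolerance percent: d + (d * tolerance) / 100."""
    new_width = width + (width * tolerance) / 100
    new_height = height + (height * tolerance) / 100
    return new_width, new_height
```

For a width of 100 pixels and a tolerance of 10, this gives 110 pixels, matching the figure quoted for the property.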

Example:

Consider an image that contains the following data:

Invoice Date: 05/02/2012 Invoice Number: 99888888

The following location order is specified in the property file:

LEFT; RIGHT; BOTTOM_LEFT; BOTTOM_RIGHT; TOP; BOTTOM; TOP_RIGHT;

If 99888888 is the value of the Invoice Number document level field, “Number” is extracted first as the key. The algorithm then looks to the left of “Number”. If the gap between “Invoice” and “Number” is less than the value specified for key_value.gap_between_keys, “Invoice Number” is used for key pattern generation; otherwise only “Number” is considered.
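The gap check in this example can be sketched as below. The horizontal coordinates are made up for illustration, the function name is hypothetical, and measuring the gap in pixels between the right edge of the neighbouring word and the left edge of the key is an assumption; the text only says the gap must be less than key_value.gap_between_keys.

```python
def build_key(key_word, left_word, gap_between_keys=50):
    """Merge left_word into the key when the horizontal gap is small enough.

    Each word is (text, x0, x1), where x0 and x1 are its left and right edges.
    """
    key_text, key_x0, _ = key_word
    left_text, _, left_x1 = left_word
    gap = key_x0 - left_x1
    if 0 <= gap < gap_between_keys:
        return f"{left_text} {key_text}"
    return key_text

# Hypothetical positions for the words on the sample line.
near = build_key(("Number:", 400, 470), ("Invoice", 330, 390))  # gap of 10
far = build_key(("Number:", 400, 470), ("Invoice", 230, 300))   # gap of 100
```

With the default of 50, the first call merges the two words into one key and the second keeps only the original key word.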

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-key-regex.properties

This property file contains regular expressions that can be used for key pattern generation.

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-value-regex.properties

This property file contains regular expressions that can be used for value pattern generation.

UI Configuration

Key value learning can be turned ON or OFF from the admin UI using the following properties:

Key Value Learning Switch
  • Type of value: List of Values
  • Value options: ON, OFF
  • Description: Set it to ON or OFF depending on whether the plugin needs to be executed.

Numeric Key Learning Switch
  • Type of value: List of Values
  • Value options: ON, OFF
  • Description: Set it to ON or OFF depending on whether the plugin needs to be executed.

Dependencies

The Key value learning plugin depends on the following two plugins:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

One of the above plugins must be ON for key value learning, because these plugins extract data from the image and create the hOCR file that key value learning requires.

Frequently Asked Questions

Question: Why is a key value field not added to the document level field after plugin execution?

Answer: There could be several reasons why a key value field is not created after plugin execution:

Reason 1: The maximum allowed number of key value fields has already been added to the document level field.

Solution: Check the value of key_value.max_number_record. The default value is 50.

Reason 2: The key found during extraction has fewer characters than the minimum number required for a key.

Solution: Check the key_value.min_key_char_count property. Its default value is 4.

Reason 3: The required location is not defined in the key_value.location_order property.

Solution: Check the value of the key_value.location_order property. It should include the required location.
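The three checks above can be combined into one diagnostic helper. This is a hypothetical troubleshooting sketch, not part of the plugin: the function name and arguments are invented for illustration, and the defaults are the values quoted in the property descriptions.

```python
def properties_to_check(existing_kv_count, key_text, key_location,
                        max_number_record=50,
                        min_key_char_count=4,
                        location_order="LEFT;RIGHT;TOP;TOP_LEFT;TOP_RIGHT;"):
    """Return the property names that could explain a missing key value field."""
    suspects = []
    # Reason 1: the DLF already holds the maximum number of key value fields.
    if existing_kv_count >= max_number_record:
        suspects.append("key_value.max_number_record")
    # Reason 2: the extracted key is shorter than the configured minimum.
    if len(key_text) < min_key_char_count:
        suspects.append("key_value.min_key_char_count")
    # Reason 3: the location of the key is not in the configured search order.
    allowed = [part.strip() for part in location_order.split(";") if part.strip()]
    if key_location not in allowed:
        suspects.append("key_value.location_order")
    return suspects
```

An empty result means none of the three listed reasons applies to the inputs supplied.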

Question: Why is a key value field added but not accurate?

Reason: One possible reason is that the location order specified does not match the requirement.

Solution: Check the key_value.location_order property. The most likely location of the key with respect to the value should be specified first in the list.

 

