Overview

This plug-in extracts the value of a document level field according to a configured regex pattern. The pattern is given as a list of values separated by semicolons. During extraction, the plug-in splits the pattern on the semicolons and treats the last part as the regex. It first matches the last part against the input; if a value is found, the remaining parts are searched for to the left of that value, working from right to left. The last part is compared as a regex pattern, while the other parts are compared as words. The value is extracted only when all parts are found; if any part is missing, nothing is extracted.

Example

Suppose the following value is specified in the pattern field of a document level field:

Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}

The plugin uses the last value in the semicolon-separated list, i.e., \d{1,2}[/]\d{1,2}[/]\d{2,4}, for value extraction.

Suppose the following data is supplied as input, i.e., is present in an image:

Case 1: Input Data:          Invoice Date 21/03/2012

Result: 21/03/2012 is extracted successfully, because both Date and Invoice are found to the left of the extracted value 21/03/2012.

Case 2: Input Data:          Date 21/03/2012

Result: The regex pattern matches in this case, but no data is extracted because Invoice is not found to the left of Date.
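
The matching rules and the two cases above can be sketched in Python. This is only an illustration, not the plug-in's actual code. The function name `extract_value` is hypothetical, and the sketch assumes that keywords are compared case-sensitively as whole words, that each keyword must appear to the left of the one found before it, and that keywords do not have to be directly adjacent to the value.

```python
import re
from typing import Optional


def extract_value(pattern_field: str, text: str) -> Optional[str]:
    """Return the first regex match whose keywords all appear to its left.

    Hypothetical helper: splits the pattern field on semicolons, treats the
    last part as the regex and every earlier part as a plain word.
    """
    *keywords, regex = pattern_field.split(";")
    for match in re.finditer(regex, text):
        cursor = match.start()
        # Walk the keywords from right to left, moving the cursor leftwards.
        for keyword in reversed(keywords):
            word = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
            hits = list(re.finditer(word, text[:cursor]))
            if not hits:
                break  # a keyword is missing, so this match is rejected
            cursor = hits[-1].start()  # right-most occurrence left of the cursor
        else:
            return match.group()  # every keyword was found
    return None


PATTERN = r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"
print(extract_value(PATTERN, "Invoice Date 21/03/2012"))  # Case 1: 21/03/2012
print(extract_value(PATTERN, "Date 21/03/2012"))          # Case 2: None
```

Because the field is split on every semicolon, a regex that itself contains a semicolon would be cut apart in this sketch; how the plug-in handles that situation is not described here.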

Configuration

Plugin Configurations

Regular Regex Extraction is configured on the Plugin Configuration screen.

Properties description:

Configurable property | Type of value | Value options | Description
Regular Regex Extraction Switch | List of Values | ON, OFF | Switch that determines whether the plug-in runs. Default: ON.
Regular Regex Confidence Score | Integer | 0 – 100 | Acts as a multiplier for the confidence score calculated by matching regex.

To add or edit the regular expression used by Regular Regex Extraction, add or edit the corresponding document level field.

On the screen shown when adding or editing the document level field, enter the regular expression in the Pattern column.

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Invalid input pattern sequence. | This occurs when the entered regex pattern is not a valid pattern or is not of proper format.
2 | No FieldType data found from data base for document type | This happens when there is no field type initialized in a document.
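
To catch the first error before saving a pattern, the field can be checked offline. The sketch below is a hypothetical pre-check, not the validation the plug-in itself performs (which is not documented here). The function name `validate_pattern_field` is an assumption, and the check uses Python's `re` module, whose syntax may differ in places from the regex engine the plug-in uses.

```python
import re
from typing import List


def validate_pattern_field(pattern_field: str) -> List[str]:
    """Return a list of problems found in a semicolon-separated pattern field.

    Hypothetical helper: an empty list means no problem was detected.
    """
    problems = []
    *keywords, regex = pattern_field.split(";")
    # Every part before the last one is expected to be a non-empty word.
    if any(not keyword.strip() for keyword in keywords):
        problems.append("empty keyword between semicolons")
    if not regex:
        problems.append("missing regex after the last semicolon")
    else:
        try:
            re.compile(regex)
        except re.error as exc:
            problems.append(f"regex does not compile: {exc}")
    return problems


print(validate_pattern_field(r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"))  # no problems
print(validate_pattern_field("Invoice;Date;[0-9"))  # unterminated character set
```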
