Overview

The table extraction plugin is responsible for extracting tabular data from the documents in a batch in the form of tables.

The user defines basic table information and a set of table columns for each table.

Table extraction is performed using a table extraction rule. Multiple table extraction rules may be defined for a table. The extraction rule that yields the best valid table column data is picked for showing the table extraction results. The validity of extracted columns is determined by the validation pattern given for each column, by the table validation rules that are applied on each row of extracted table data, or by a combination of both.
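The selection among several extraction rules can be pictured with a minimal sketch. It assumes that "best" means the rule whose result contains the most valid rows; the actual scoring used by the plugin is not stated here, and the function name, the rule names and the row layout are hypothetical.

```python
def pick_best_rule(results_by_rule):
    """Return the name of the rule whose result has the most valid rows.

    results_by_rule maps a rule name to a list of rows, where each row is a
    dict carrying a boolean "valid" flag (hypothetical layout).
    Assumption: the score is the count of valid rows; ties keep the first rule.
    """
    def valid_row_count(item):
        _, rows = item
        return sum(1 for row in rows if row["valid"])

    best_name, _ = max(results_by_rule.items(), key=valid_row_count)
    return best_name


results = {
    "by_header": [{"valid": True}, {"valid": False}],
    "by_regex": [{"valid": True}, {"valid": True}],
}
print(pick_best_rule(results))
```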

Characteristics

  • For each document, consisting of one or more pages, the table extraction algorithm extracts all tables defined for the document type.
  • The document is parsed from the first page to the last page to identify tables.
  • One table may span one or more pages.
  • A table defined for a document type consists of multiple table columns, table extraction rules and table validation rules. Table columns and at least one table extraction rule are the minimum requirements for table extraction to give any results.
  • A table extraction rule contains a start pattern and an end pattern that denote the boundaries of the table data for the extraction process. The table extraction API for an extraction rule is a combination (using AND or OR operators) of 3 kinds of validation:
    • Column Coordinates Validation
    • Column Header Validation
    • Regex Validation

This API combination denotes the behavior the algorithm uses for extracting data for every table column in a row.
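As a rough illustration of how such a combination could be evaluated, the sketch below folds the outcomes of the individual validations with AND/OR operators. Left-to-right evaluation without operator precedence is an assumption, and the function name is hypothetical; this is not the plugin's own code.

```python
def combine_validations(outcomes, operators):
    """Fold boolean validation outcomes left to right with AND/OR.

    outcomes:  list of booleans, one per selected validation technique.
    operators: list of "AND"/"OR" strings placed between the outcomes.
    Assumption: strict left-to-right evaluation, no precedence rules.
    """
    if len(operators) != len(outcomes) - 1:
        raise ValueError("need exactly one operator between two outcomes")
    result = outcomes[0]
    for op, value in zip(operators, outcomes[1:]):
        if op == "AND":
            result = result and value
        elif op == "OR":
            result = result or value
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# Header validation passed, coordinates failed, regex passed:
print(combine_validations([True, False, True], ["AND", "OR"]))
```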

  • Each table extraction rule has table column extraction rules, i.e. one column extraction rule for each of the table columns. A column extraction rule contains the information used in column extraction by the table extraction APIs, such as column pattern, column header pattern, start coordinate, end coordinate, multiline anchor, required, etc.

The following summary shows which column extraction rule fields are used by each table extraction API:

Table extraction rule’s API | Table column extraction rule fields used
Column Header Validation | Uses the column header pattern either to search for a matching header string with some fuzziness, or to search for the best matched value of the column header regex pattern in the page. The coordinates of the matched header string are learned, and the data beneath it is extracted as column data. Text in the left or right proximity of the text beneath the header is also appended to the extracted column value.
Column Coordinates Validation | Uses the start coordinate and end coordinate as the vertical boundaries for the location of column data on the page. Both can be set by clicking the Set Coordinates button, uploading a sample image and drawing overlays that give the coordinates for the columns. Clicking the Ok button sets the start and end coordinates on the column extraction rule.
Regex Validation | Uses the column pattern, between left pattern and between right pattern to find the best matched text in each row for the column data.

  • Column Pattern: Data matching this pattern is extracted as the column data value.
  • Between Right Pattern: Data extracted by the column pattern must have data to its immediate right that matches the between right pattern. This pattern must be a single word capturing pattern only.
  • Between Left Pattern: Data extracted by the column pattern must have data to its immediate left that matches the between left pattern. This pattern must be a single word capturing pattern only.
  • Note:
    • If a between right or between left pattern is specified but is not matched by the immediate right or left data, the data is extracted as invalid data.
    • Only single word capturing patterns are allowed for between left and between right patterns.
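The interplay of the three patterns can be sketched as follows. This is a simplified illustration, not the plugin's implementation: it assumes the row is already split into words, that the column pattern matches a single word, and that the first match is taken rather than a "best" match. The function name is hypothetical.

```python
import re


def extract_with_neighbours(words, column_pattern,
                            between_left=None, between_right=None):
    """Return (value, is_valid) for the first word matching column_pattern.

    The value is flagged invalid when a between pattern is given but the
    immediate left/right neighbour does not match it.
    """
    for i, word in enumerate(words):
        if not re.fullmatch(column_pattern, word):
            continue
        left = words[i - 1] if i > 0 else ""
        right = words[i + 1] if i + 1 < len(words) else ""
        valid = True
        if between_left is not None and not re.fullmatch(between_left, left):
            valid = False
        if between_right is not None and not re.fullmatch(between_right, right):
            valid = False
        return word, valid
    return None, False


row = ["Qty", "12", "pcs"]
print(extract_with_neighbours(row, r"\d+", between_left="Qty", between_right="pcs"))
print(extract_with_neighbours(row, r"\d+", between_right="kg"))
```

In the second call the value is still returned, but it is marked invalid because the word to its immediate right does not match the between right pattern.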

Configuration

The following configurable properties for the plugin are defined in dcma-tablefinder.properties, located at {EphesoftHome}\WEB-INF\classes\META-INF\dcma-table-finder\*:

Configurable property | Type of value | Value options | Description
tablefinder.gap_between_column_words | Integer | NA | Gap between words of the same column data, used during column header extraction. The value is defined in pixels. The default is 60.
tablefinder.rule_removal_invalid_characters | List of values separated by semicolon (;) | NA | Invalid characters in the extracted column value that need to be ignored before applying the table rule to the columns.
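The effect of tablefinder.rule_removal_invalid_characters can be pictured with a small sketch. The property value "$;," used below is a made-up example, and the function name is hypothetical; the plugin's own handling may differ.

```python
def strip_invalid_characters(value, property_value):
    """Remove every entry of a semicolon-separated list from value.

    property_value mimics tablefinder.rule_removal_invalid_characters,
    e.g. "$;," (hypothetical example value).
    """
    for token in property_value.split(";"):
        if token:  # ignore empty entries such as a trailing ";"
            value = value.replace(token, "")
    return value


print(strip_invalid_characters("$1,200", "$;,"))
```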

Table Configuration

Add/Delete Table Info

The user can add or delete table information by clicking the corresponding buttons on the following UI:

[Screenshot: 4.0.0.0_BCM_TableInfo_10001.jpg]

Upon clicking Add, the following UI is presented, where the user can enter values for the table properties:

[Screenshot: 4.0.0.0_BCM_TableInfo_10002.jpg]

Test Table

Using the Test Table feature, the user can check whether the table configuration extracts tabular data correctly without running a batch. The user can upload a valid image file or place the image file at the following path:

{base-folder}\batch-class-id\test-table

The Test Table output is shown on the following UI:

[Screenshot: 4.0.0.0_BCM_TableInfo_10003.jpg]

Configurable Properties

The following are the configurable properties for table info:

Configurable property | Type of value | Value options | Description
Name | String | NA | Name for the data table.
Validation Rule Operator | List of values | OR, AND | In case of AND, a table row is valid if and only if it satisfies all the table validation rules defined. In case of OR, a table row is valid if it satisfies at least one of the validation rules.
Remove Invalid Rows | Boolean | True if checked; False if unchecked | Whether to remove rows that are invalid according to the table validation rules from the table result data.
Currency | List of values | Ephesoft supported currencies | Name of the currency on the basis of which validation rules are applied for the table. All table columns whose currency field is checked in a column extraction rule undergo currency extraction on the basis of this value when validation rules are applied.
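The interaction of Validation Rule Operator and Remove Invalid Rows can be sketched as below. Representing each validation rule as a callable on a row is an assumption made for illustration, and all names are hypothetical.

```python
def row_is_valid(rule_outcomes, operator):
    """AND: every rule must hold. OR: at least one rule must hold."""
    if operator == "AND":
        return all(rule_outcomes)
    if operator == "OR":
        return any(rule_outcomes)
    raise ValueError(f"unknown operator: {operator}")


def apply_validation(rows, rules, operator, remove_invalid_rows):
    """Flag each row as valid/invalid and optionally drop the invalid ones.

    rules are callables taking a row dict and returning a bool
    (hypothetical representation of table validation rules).
    """
    result = []
    for row in rows:
        valid = row_is_valid([rule(row) for rule in rules], operator)
        if valid or not remove_invalid_rows:
            result.append({**row, "valid": valid})
    return result


rows = [{"qty": 2, "total": 6}, {"qty": 0, "total": 5}]
rules = [lambda r: r["qty"] > 0, lambda r: r["total"] > 0]
print(apply_validation(rows, rules, "AND", remove_invalid_rows=True))
```

With AND and Remove Invalid Rows checked, only the first row survives; with OR, both rows are kept because each satisfies at least one rule.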

Table Column Configuration

Add/Delete Table Column Info

Table column information can be added or deleted by clicking the corresponding button on the following UI:

[Screenshot: 4.0.0.0_BCM_TableColumnInfo_10001.jpg]

  • Upon clicking the Add button, the following UI is presented, where the user can add table column fields:

[Screenshot: 4.0.0.0_BCM_TableColumnInfo_10002.jpg]

Configurable Properties

The following are the configurable properties for a table column:

Configurable property | Type of value | Value options | Description
Column Name | String | NA | Name of the column.
Description | String | NA | Description of the column.
Validation Pattern | String | NA | Validation pattern of the column. This pattern validates the extracted column data for each table row.
Alternate Values | String | NA | A semicolon-separated list of values entered by the user. These values appear as suggestions for the column in the table view on the validation screen.

Table Extraction Rule Configuration

Add/Delete Table Extraction Rule

A table extraction rule can be added or deleted by clicking the corresponding button on the following UI:

[Screenshot: 4.0.0.0_BCM_TableExtractionRule_10001.jpg]

  • Upon clicking the Add button, the following UI is presented, where the user can add table extraction rule fields:

[Screenshot: 4.0.0.0_BCM_TableExtractionRule_10002.jpg]

Test Table Extraction Rule

Using the Test Table Extraction Rule feature, the user can check whether a table extraction rule configuration extracts tabular data correctly without running a batch. The user can upload or drag and drop a valid image file, or place the image file at the following path:

{base-folder}\batch-class-id\test-table

The Test Table Extraction Rule output is shown on the following UI:

[Screenshot: 4.0.0.0_BCM_TableExtractionRule_10003.jpg]

Configurable Properties

The following are the configurable properties for a table extraction rule:

Configurable property | Type of value | Value options | Description
Rule Name | String | NA | Unique name of the table extraction rule.
Start Pattern | String | A keyword or a valid regex expression | A keyword to be matched as a string with some fuzziness (configurable from the property file), or a regex pattern that matches a string marking the beginning of the table in a page. A correct start pattern must be specified for table data to be extracted. It can be validated using the check button.
End Pattern | String | A keyword or a valid regex expression | A keyword to be matched as a string with some fuzziness (configurable from the property file), or a regex pattern that matches a string marking the end of the table. It can be validated using the check button.
Table Extraction API | Combination of Boolean values using AND and OR operators | | A combination of the selected table extraction APIs (column header validation, column coordinate validation and regex validation) with AND/OR operators that decides the algorithm used to extract table columns.
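How the start and end patterns bound the table data can be illustrated with a minimal sketch. It covers only the regex case (not the fuzzy keyword match), works on plain text lines, and assumes the matching boundary lines themselves are excluded from the table body; the function name is hypothetical.

```python
import re


def table_body(lines, start_pattern, end_pattern):
    """Return the lines strictly between the first start match and the
    next end match. Assumption: boundary lines are not part of the table."""
    body = []
    inside = False
    for line in lines:
        if not inside:
            if re.search(start_pattern, line):
                inside = True
            continue
        if re.search(end_pattern, line):
            break
        body.append(line)
    return body


page = ["Invoice 42", "Item Qty Price", "Pen 2 3.00", "Ink 1 5.50", "Total 11.50"]
print(table_body(page, r"^Item\b", r"^Total\b"))
```

If the start pattern never matches, the result is empty, which mirrors the requirement that a correct start pattern is needed for any table data to be extracted.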

Column Extraction Rule Configuration

Edit Column Extraction Rule

A column extraction rule can be updated on the following UI:

[Screenshot: 4.0.0.0_BCM_TableColumnExtractionRule_10001.jpg]

  • Upon clicking the Edit button, the following UI is presented, where the user can edit column extraction rule fields:

[Screenshot: 4.0.0.0_BCM_TableColumnExtractionRule_10002.jpg]

Configurable Properties

Configurable property | Type of value | Value options | Description
Column Name | String | NA | Name of the column. This field is not editable; it is shown only for reference to the table column of the table.
Column Pattern | Regular Expression | Valid regular expression | The regex pattern for column data.
Between Left | Regular Expression | Valid regular expression | The regex pattern for data to the left of the actual searched column.
Between Right | Regular Expression | Valid regular expression | The regex pattern for data to the right of the actual searched column.
Column Header Pattern | Regular Expression | A keyword or a valid regex expression | A keyword to be searched as a string with some fuzziness in the page, or a regex pattern whose best matched value in the page is taken as the column header.
Start Coordinate | Integer | NA | Start coordinate for the column.
End Coordinate | Integer | NA | End coordinate for the column.
Multiline anchor | Boolean | True if checked; False if unchecked | Marks the column as a required column and as the anchor that denotes the start of a new row in the table on the page. This is useful when one table row spans multiple rows in the document.
Required | Boolean | True if checked; False if unchecked | If checked, each extracted table row must contain valid data for this column. If invalid data is extracted for the column, the corresponding row is not added to the table data.
Extract data from column | String value with support of a list of values | List of values containing the names of the other columns of the table; one can be selected to fill the textbox with the name of the column for extraction | The table column from which the current column’s data is extracted when using regular expression based extraction. If left empty, it is not applicable.
Currency | Boolean | For example: $ 12,000.00 is treated as 12000.00 for validations; EURO 12.000,00 is treated as 12000.00 for validations | Specifies whether the column is a currency field. If it is, validation rules are applied according to the currency representation. The conversion is based on the currency chosen at Table Info level. If this field is unchecked, no currency extraction is done for the column, irrespective of the value chosen at Table Info level.
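The currency examples in the table can be reproduced with a short sketch. The separator table and the currency codes "USD" and "EUR" are assumptions for illustration; the set of currencies Ephesoft supports and its own conversion logic are not shown here.

```python
import re
from decimal import Decimal

# Assumed (grouping separator, decimal separator) per currency.
SEPARATORS = {"USD": (",", "."), "EUR": (".", ",")}


def normalize_amount(text, currency):
    """Turn a printed amount into a plain decimal number for validation."""
    group_sep, decimal_sep = SEPARATORS[currency]
    # Keep digits, separators and sign; drop "$", "EURO", spaces, etc.
    digits = re.sub(r"[^\d.,-]", "", text)
    digits = digits.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(digits)


print(normalize_amount("$ 12,000.00", "USD"))
print(normalize_amount("EURO 12.000,00", "EUR"))
```

Both calls yield 12000.00, which is the form the validation rules would then compare.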

Table Validation Rule Configuration

Add/Delete Table Validation Rule

A table validation rule applies to operands (table columns) whose extracted column data must be numerical values. Table validation rules are applied to the rows of extracted table data. Multiple rules are applied to each row in OR or AND fashion, as defined by the Validation Rule Operator at table info level. An invalid row is shown shaded orange in the extraction results if Remove Invalid Rows is not selected at table info level, and is removed from the extraction results if Remove Invalid Rows is selected.

A table validation rule can be added or deleted by clicking the corresponding button on the following UI:

[Screenshot: 4.0.0.0_BCM_TableValidaationRule_10001.jpg]

  • Upon clicking the Add button, the following UI is presented, where the user can add table validation rule fields:

[Screenshot: 4.0.0.0_BCM_TableValidaationRule_10002.jpg]

The first drop-down list contains the list of operands (table column names).

[Screenshot: 4.0.0.0_BCM_TableValidaationRule_10003.jpg]

The second drop-down list contains the valid mathematical operators for a rule.

[Screenshot: 4.0.0.0_BCM_TableValidaationRule_10004.jpg]

  • Clear: This button clears the rule.

Configurable property | Type of value | Value options | Description
Rule | String | NA | A mathematical rule that applies to a combination of column values and governs the validity of a table row.
Description | String | NA | The rule description. This description becomes visible in the table view on selecting a row that does not satisfy the rule defined for it.
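A single validation rule can be pictured as a comparison between numeric column values. The sketch below is only an illustration: the rule syntax accepted by the UI is not documented here, so the rule is passed as separate arguments, the set of comparators is assumed, and the column names are made up.

```python
import operator
from decimal import Decimal, InvalidOperation

# Assumed set of comparison operators.
COMPARATORS = {
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def check_rule(row, left_column, comparator, right_column):
    """Compare two column values of a row numerically.

    A missing or non-numeric operand makes the row invalid, in line with the
    requirement that operands hold numerical values.
    """
    try:
        left = Decimal(row[left_column])
        right = Decimal(row[right_column])
    except (InvalidOperation, KeyError):
        return False
    return COMPARATORS[comparator](left, right)


print(check_rule({"Net": "90.00", "Gross": "100.00"}, "Net", "<=", "Gross"))
print(check_rule({"Net": "n/a", "Gross": "100.00"}, "Net", "<=", "Gross"))
```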

Column Header Based Extraction

Enter the column header regex pattern on the following UI:

[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Extraction Rule]>>[Table Column Extraction Rule]

The user can set the Column Header Pattern field for each table column extraction rule.

Table extraction using column headers has one configurable property, located in

{ephesoft-home}\WEB-INF\classes\META-INF\dcma-table-finder\*

tablefinder.gap_between_column_words=60

This value is specified in pixels. In addition to the words below the column header, words to their left or right are also extracted for the column if the gap between them and the extracted data is less than the value of tablefinder.gap_between_column_words.
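The gap rule can be sketched as follows. The sketch assumes each word of a text line is available as (text, x_start, x_end) in pixels, sorted left to right, and that neighbours are added one after another as long as each gap stays below the threshold; whether the plugin chains neighbours in this way is an assumption, and the function name is hypothetical.

```python
def extend_with_neighbours(line_words, seed_index, max_gap=60):
    """Join the seed word with adjacent words closer than max_gap pixels.

    line_words: list of (text, x_start, x_end) tuples sorted by x_start.
    seed_index: index of the word found beneath the column header.
    """
    start = end = seed_index
    # Grow to the left while the horizontal gap stays below the threshold.
    while start > 0 and line_words[start][1] - line_words[start - 1][2] < max_gap:
        start -= 1
    # Grow to the right in the same way.
    while (end < len(line_words) - 1
           and line_words[end + 1][1] - line_words[end][2] < max_gap):
        end += 1
    return " ".join(word[0] for word in line_words[start:end + 1])


line = [("Blue", 100, 140), ("ink", 150, 180), ("pen", 190, 230), ("12", 400, 420)]
print(extend_with_neighbours(line, seed_index=1))
```

Here "Blue" and "pen" are pulled into the column value because they sit 10 pixels from their neighbour, while "12" is 170 pixels away and stays out.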

Column Coordinates Based Extraction

The admin can set the column coordinates by clicking the Set Coordinates button on the following UI:

[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Extraction Rule]>>[Table Column Extraction Rule]

[Screenshot: 4.0.0.0_BCM_SetCoordinates_10002.jpg]

On clicking the Set Coordinates button, a new UI opens where the user can select an image and select column coordinates by drawing an overlay on the image.

[[File:3.1_Table_Extraction_Plugin_Documentation_10015.jpg|400px]]

  • The user needs to draw a rectangle to select the start and end column coordinates for the selected column.
  • To select coordinates for other columns, select the column from the drop-down list on the left-hand side. This drop-down list contains the names of all the table columns of the selected table.
  • Clear Button: On clicking the Clear button, the coordinates for the selected table column are cleared.
  • Clear All Button: On clicking the Clear All button, the coordinates for all the table columns of the selected table are cleared.
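Once start and end coordinates are set, assigning words to a column can be pictured with the sketch below. It assumes words carry horizontal pixel positions as (text, x_start, x_end) and that a word belongs to the column only if it lies entirely between the two coordinates; the plugin may use a different overlap criterion, and the function name is hypothetical.

```python
def words_in_column(line_words, start_coordinate, end_coordinate):
    """Collect the words of one text line that fall inside the column.

    line_words: list of (text, x_start, x_end) tuples.
    Assumption: full containment between the start and end coordinates.
    """
    return " ".join(
        text
        for text, x_start, x_end in line_words
        if x_start >= start_coordinate and x_end <= end_coordinate
    )


line = [("Pen", 50, 90), ("2", 210, 220), ("3.00", 300, 340)]
print(words_in_column(line, 200, 250))
```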

Regex Based Extraction

A table extraction rule must be defined with valid start and end patterns, and Regex Validation must be selected in the table extraction API combination.

The user needs to enter valid column patterns (and, optionally, between left and between right patterns) for regex based extraction.

Select table extraction technique to be used

Select a table extraction API as a combination of the three techniques, using AND or OR operators, on the following UI:

[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Extraction Rule]

Dependencies

The table extraction plugin has the following dependencies:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

One of the above plugins must be ON, as these plugins extract data from the image and create the hOCR file that is required for table extraction.
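To show the kind of information an hOCR file provides to table extraction, the sketch below reads words and their bounding boxes from a small hOCR fragment. It assumes standard hOCR markup (words in elements with class ocrx_word and a "bbox x0 y0 x1 y1" entry in the title attribute); the exact files produced by these plugins may differ, and the class name HocrWords is hypothetical.

```python
import re
from html.parser import HTMLParser


class HocrWords(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bounding box of the word currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or ""):
            match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())

    def handle_data(self, data):
        if self._bbox and data.strip():
            self.words.append((data.strip(), *self._bbox))
            self._bbox = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._bbox = None


sample = (
    "<span class='ocr_line'>"
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Qty</span> "
    "<span class='ocrx_word' title='bbox 110 92 150 116; x_wconf 90'>12</span>"
    "</span>"
)
parser = HocrWords()
parser.feed(sample)
print(parser.words)
```

The word positions recovered this way are what coordinate and header based extraction rely on.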

Troubleshooting

The following are a few common troubleshooting areas for the table extraction plugin:

S no. | Error message | Possible root cause
1 | Table info list is null or empty. | No table is configured for the document type.
2 | Table Columns Info list is null or empty. | No table column is defined for table.
3 | Table Extraction Rule List is null or empty. | No table extraction rule is defined for table.
4 | Exception occurred while validating rule for a table row. | Table validation rules could not be applied properly on extraction results.
5 | Skipping Table extraction. Switch set as off. | Table extraction switch is set to OFF.

 

Copy Table

Overview

This feature makes a copy of an existing table.

A table has the following configuration fields: name, validation rule operator, remove invalid rows and currency. Because each table in a document must have a different name, the copied table is renamed automatically.

Steps to copy a table

  • Open the document type, from the document type list under the batch class, in which the table is to be copied. Select Tables in the batch class tree view on the left of the screen, select the table to copy, and click the Copy button at the top of the screen.
  • A new row is added to the existing table list.
  • After completing the table configuration, click Apply.

Table Import/Export

Overview

This feature allows a user to export and import existing tables between documents, batch classes or even different Ephesoft Transact instances. The exact table information can be transferred to another Ephesoft application running on a remote system, which saves the time otherwise needed to reconfigure the tables to get the same processing ability on the remote system.

Export Tables

By exporting tables, one can transfer the exact environment/configuration of the tables present on one system to another. This also helps in testing and debugging issues faced in a configuration-dependent environment.

Steps for Exporting Tables:

  • On the Table Listing screen, select the table to be exported from the grid via its checkbox and then click the “Export” button. The exported zipped table file can now be transferred to any other system and imported there.

Please note: All changes should be saved before exporting; otherwise an error pop-up asks you to save your pending changes. This happens, for example, when the user has added a new table but has not saved it.

The user can export multiple tables at a time.

Data Exported with Table:

When a table is exported, the complete table hierarchy defined in the database is exported in a zip file.

Import Table

By importing a table, one can recreate the exact environment/configuration of a table present on the remote system from which the table was exported.

Steps for Importing Table

Prerequisites:

  • Exported zipped table

Steps:

  • On the ‘Table Listing’ UI, click the “Import Table” link in the Import Table(s) panel, or drag and drop the zip file of the exported tables onto the bottom panel.
  • After the upload of the table completes, a success message is shown to the user.

Please note: The user can upload only one zip file at a time, but a zip file may contain multiple tables.
