What’s New in Transact 4.5?


Extraction | Table Extraction for 2-Column Layout

 

In previous versions of Transact, table extraction supported only tables in a single-column layout. If the data was wrapped into multiple columns with repeating headers, either on the same page or continuing from one page to the next, table extraction failed to extract all of the data.

Ephesoft Transact v4.5.0.0 supports table extraction from 2-column layouts. To enable it, a “2-Column Layout” checkbox has been added to the Table Extraction Rule screen.

You can use 2-column layout extraction in the following cases:

  • to extract data from the table only in the right column of the page;
  • to extract data from the table extending from the left column to the right column on the same page;
  • to extract data from the table starting in the right column of the first page and extending to the left column of the second page.
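Conceptually, the 2-Column Layout option changes the order in which a page is read: the left column from top to bottom, then the right column, and then the next page. The Python sketch below illustrates only that reading order; it is not Transact's implementation. The Line type, its coordinate fields, and the split at the middle of the page are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """A hypothetical OCR text line with its page number and position."""
    page: int
    x: float  # left edge of the line (assumed page units)
    y: float  # top edge of the line, increasing downward (assumed)
    text: str


def two_column_order(lines, page_width):
    """Sort lines page by page: left column top to bottom, then right column.

    Assumes a line belongs to the left column when it starts in the left
    half of the page; real layouts may need a detected gutter instead.
    """
    def reading_key(line):
        column = 0 if line.x < page_width / 2 else 1
        return (line.page, column, line.y)

    return sorted(lines, key=reading_key)


if __name__ == "__main__":
    sample = [
        Line(page=1, x=320, y=100, text="right column, row 1"),
        Line(page=1, x=10, y=200, text="left column, row 2"),
        Line(page=1, x=10, y=100, text="left column, row 1"),
        Line(page=2, x=10, y=50, text="page 2, left column, row 1"),
    ]
    for line in two_column_order(sample, page_width=600):
        print(line.text)
```

With this ordering, a table that starts in the right column of page 1 and continues in the left column of page 2 becomes one contiguous run of lines, which corresponds to the third case in the list above.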

The following configurations are required prior to performing table extraction:

  1. Make sure the Table Extraction plugin is added to the Extraction module.
  2. Turn on the Table Extraction switch on the Table Extraction Plugin Configuration screen.

Before using the 2-Column Layout option for table extraction, also make sure that the page is actually divided into two columns. In the example below, the table data is split across two columns with repeating headers.

[Screenshot: tablesample.png]

To extract data from tables with a 2-column layout:

1. Open or create a Batch Class.

2. Create a new Document Type.

3. Navigate to the Document Type and click on the Tables section in the left panel.

Click Add to add a new table.

[Screenshot: addtable1.png]

4. Click Apply to save the table details.

[Screenshot: addtable2.png]

5. Navigate to Table Columns and click on the Add button to add the table columns.

[Screenshot: addcolumns.png]

6. Click Apply to save your changes.

7. Navigate to Table Extraction Rules in the left panel and add a new Table Extraction Rule.

[Screenshot: tableextr.png]

8. On the Table Extraction Rule screen, click the Select Files link, or drag and drop the file containing the table.

[Screenshot: extr-rule.png]

9. Configure a Table Extraction Rule:

  • Enter a name for the Extraction Rule and select Table Extraction API in the Extraction Rule tab.
  • Select the 2-Column Layout checkbox.
  • Collapse the Extraction Rule tab to get a better view of the Column Configuration tab.
  • From the Table Column dropdown in the Column Configuration tab, select a pre-defined column.
  • On the left-hand side, drag and drop the Start Pattern and End Pattern overlays to define the beginning and end of the table. Both patterns must be unique, that is, they must not appear anywhere else in the document. If required, you can also use the Pattern Left and Pattern Right overlays to indicate areas to the left and right of the Column Header.
  • Specify the Column Header and Column Data by using the corresponding overlays. If the table does not have column headers, you can specify only Column Data and use table extraction based on regex or column coordinates (Table Extraction API –> Regex Extraction/Column Coordinates).
  • Select an existing Regex or create a new one for each value by clicking the overlay and using the Suggest Regex dialog box.
  • Click the Validate Regex button to validate the defined Regex patterns.

[Screenshot: 7799-table extraction1.png]

10. Click the Test Table button.

The extraction results are populated in the Test Table Results section on the same screen.

[Screenshot: 7799-table extraction results.png]

11. Click Apply to save the configuration.
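Before clicking Validate Regex in step 9, it can help to try the column regexes against a few sample cell values. The Python sketch below is a hypothetical offline check, not a Transact feature: the column names and patterns in COLUMN_REGEXES are invented for illustration, and each value is expected to match its pattern in full.

```python
import re

# Hypothetical column regexes; replace them with the patterns from your rule.
COLUMN_REGEXES = {
    "Item": r"[A-Z]{2}-\d{4}",                  # e.g. AB-1234
    "Quantity": r"\d+",                         # whole numbers
    "Amount": r"\$?\d{1,3}(?:,\d{3})*\.\d{2}",  # e.g. $1,250.00
}


def invalid_cells(row):
    """Return the names of columns whose value does not fully match its regex."""
    return [
        name
        for name, value in row.items()
        if re.fullmatch(COLUMN_REGEXES[name], value.strip()) is None
    ]


if __name__ == "__main__":
    print(invalid_cells({"Item": "AB-1234", "Quantity": "12", "Amount": "$1,250.00"}))
    print(invalid_cells({"Item": "AB-12", "Quantity": "twelve", "Amount": "1,250.00"}))
```

The first call prints an empty list; the second prints ['Item', 'Quantity']. Using re.fullmatch rather than re.search flags patterns that match only part of a cell value.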

 

To extract data from the table only in the right column, follow the same steps as described above, but place the Start and End Pattern overlays in the right-hand column. Click the Test Table button to see the values extracted only from the specified table area.

[Screenshot: right column extr.png]

To extract data from the table extending from the right column of the first page to the left column of the second page, place the Start Pattern overlay in the right column of the first page and use the corresponding overlays to specify Column Header/Data:

[Screenshot: 2page extr1.png]

Use the Page Number drop-down in the top-right corner to move to the second page, and use the End Pattern overlay to specify the end of the table. Make sure that both the Start and End Pattern overlays are unique.

[Screenshot: 2page extr2.png]

Click Apply to return to the Table Extraction Rules list. Select the rule and click the Test Table Extraction Rule button.

[Screenshot: 2page extr3.png]

The results extracted from both pages are populated on a new screen:

[Screenshot: 2page results1.png]

[Screenshot: 2page results2.png]

Note: To check the results extracted from a table that extends from one page to another, use the Test Table Extraction Rule button on the Table Extraction Rules list screen. The Test Table button on the Table Extraction Rule screen only fetches results from the single page displayed in the left-hand section of the screen.
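The cross-page case works because the table is bounded by the Start Pattern on the first page and the End Pattern on the second page. The Python sketch below is a simplified, hypothetical model of that idea, not Transact's implementation: it assumes the text lines of all pages are already in 2-column reading order and returns the lines between the first Start Pattern match and the next End Pattern match.

```python
import re


def table_span(ordered_lines, start_pattern, end_pattern):
    """Return the lines between the first start match and the next end match.

    ordered_lines: text lines from every page, already in 2-column reading
    order (left column, right column, then the next page).
    Returns None if either pattern is not found.
    """
    start = next(
        (i for i, text in enumerate(ordered_lines) if re.search(start_pattern, text)),
        None,
    )
    if start is None:
        return None
    end = next(
        (i for i in range(start + 1, len(ordered_lines))
         if re.search(end_pattern, ordered_lines[i])),
        None,
    )
    if end is None:
        return None
    return ordered_lines[start + 1:end]


if __name__ == "__main__":
    lines = [
        "Welcome letter text",  # page 1, left column
        "Order Details",        # page 1, right column: Start Pattern
        "Widget 2 10.00",       # page 1, right column
        "Gadget 5 25.00",       # page 2, left column
        "Total Due 35.00",      # page 2, left column: End Pattern
    ]
    print(table_span(lines, r"^Order Details$", r"^Total Due"))
```

In this model only the first Start Pattern match is used, so a pattern that also appeared earlier in the document would shift the table boundary, which is one way to see why the patterns must be unique.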

 

Even with the configuration described above, table extraction might not always produce the expected results. Consider the following example, in which data must be extracted only from the right column.

[Screenshot: error1.png]

However, when you click the Test Table button, values are extracted from both the left and right columns.

[Screenshot: error2.png]

This happens because the selected Start and End Patterns are not unique. Table extraction is applied to the entire document, so if the Start and End Patterns also occur elsewhere, the application fetches results beyond the required scope.

To fix this, change the Start Pattern and extend the End Pattern overlay so that both include unique data.

[Screenshot: error3.png]

Click the Test Table button again to check the results. The data is now extracted correctly.

[Screenshot: error4.png]
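Because extraction runs over the entire document, it can be worth checking how often a candidate Start or End Pattern occurs before placing the overlays. The Python sketch below is a hypothetical helper, not part of Transact: it assumes you have the document's text as a single string (for example, copied from the OCR output) and counts regex matches with re.MULTILINE so that ^ and $ anchor at each line.

```python
import re


def pattern_warnings(document_text, start_pattern, end_pattern):
    """Warn about Start/End Patterns that do not match exactly once."""
    warnings = []
    for label, pattern in (("Start", start_pattern), ("End", end_pattern)):
        # re.MULTILINE lets ^ and $ anchor at each line of the document.
        count = sum(1 for _ in re.finditer(pattern, document_text, re.MULTILINE))
        if count != 1:
            warnings.append(f"{label} Pattern matches {count} times")
    return warnings


if __name__ == "__main__":
    # Two columns with repeating headers: "Description Qty" and "Total" occur twice.
    text = "\n".join([
        "Description Qty", "Widget 2", "Total",
        "Description Qty", "Gadget 5", "Total",
        "Invoice Summary",
    ])
    print(pattern_warnings(text, r"^Description Qty$", r"^Total$"))
    print(pattern_warnings(text, r"^Gadget 5$", r"^Invoice Summary$"))
```

The first call reports that both patterns match twice, which mirrors the repeating-header situation in the example above; the second call returns an empty list because each pattern occurs exactly once.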

In all cases, make sure to check that:

  • The Table Extraction plugin is added to the Extraction module and turned on
  • The document has a 2-column layout
  • The 2-Column Layout checkbox is selected
  • The Table Extraction API is selected correctly
  • The Start and End Patterns are unique and do not appear anywhere else in the document