What’s New in Transact 4.5?


Extraction | Support for PDF Portfolio Files

Previously, batches containing PDF Portfolio files went into error at the Folder Import stage. The system split the PDF Portfolio into TIFF files successfully, but the PDF page count was calculated only for the first embedded file. Because a Portfolio usually includes two or more files, the number of generated TIFF pages did not match the input PDF page count, and the application raised an error (“Exception in breaking the input file. Converted Tiff files count not equal to the TIFF pages count”).

In Ephesoft Transact v.4.5.0.0, PDF Portfolios are supported. You can now process batches that contain this type of file and test your classification and extraction results. PDF Portfolio upload is also supported when creating KV Extraction Rules, Cross Section Extraction Rules, Paragraph Extraction Rules, Barcode Extraction Rules, and Table Extraction Rules.

Note: A PDF Portfolio can contain several individual PDF files. Each file within the PDF Portfolio can have one or more pages.

For example, the PDF Portfolio shown below includes three PDF files: the first file contains two pages, the second file contains four pages, and the third file contains one page.

[Screenshot: PDF Portfolio containing three PDF files]

In total, this PDF Portfolio contains seven pages.
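The page-count mismatch described earlier can be illustrated with a small sketch based on this example. This is not Ephesoft's code: the function names are hypothetical, and the sketch assumes that one TIFF is generated per page across all embedded files.

```python
from typing import List


def first_file_page_count(embedded_page_counts: List[int]) -> int:
    """Hypothetical model of the earlier behavior: count only the first embedded file."""
    return embedded_page_counts[0] if embedded_page_counts else 0


def portfolio_page_count(embedded_page_counts: List[int]) -> int:
    """Count the pages of every embedded file in the portfolio."""
    return sum(embedded_page_counts)


def counts_match(expected_pdf_pages: int, generated_tiff_pages: int) -> bool:
    """Consistency check between the input page count and the generated TIFF pages."""
    return expected_pdf_pages == generated_tiff_pages


# The example portfolio: three embedded PDFs with 2, 4, and 1 pages.
example_counts = [2, 4, 1]
generated_tiffs = 7  # assumption: one TIFF per page across all embedded files

old_ok = counts_match(first_file_page_count(example_counts), generated_tiffs)
new_ok = counts_match(portfolio_page_count(example_counts), generated_tiffs)
print(old_ok, new_ok)  # False True
```

With the first-file-only count, the check compares 2 against 7 and fails, which corresponds to the error quoted above. Summing the counts of all embedded files gives 7, so the check passes.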

 

Let’s use this file to process a batch:

1. Log in to Ephesoft Transact.

2. Open or create a Batch Class.

3. Create a Document Type and upload a Learn File (PDF Portfolio).

4. Upload a Test Classification file and click Test Classification.

[Screenshot: uploading the Test Classification file]

5. On the Test Classification screen, click Classify. All seven pages are successfully classified.

[Screenshot: Test Classification results]

6. Create Index Fields/Tables and configure extraction rules by clicking Add on the corresponding Extraction Rule screen (KV Extraction / Cross Section Extraction / Paragraph Extraction / Barcode Extraction / Table Extraction).

[Screenshot: adding an extraction rule]

7. On the selected Extraction Rule Configuration screen, click Select File(s) or drag and drop the file to upload the PDF Portfolio. All seven pages of the PDF Portfolio are uploaded, as shown below:

  • KV Extraction Rule Configuration screen

[Screenshot: KV Extraction Rule Configuration screen]

  • Cross Section Extraction Rule Configuration screen

[Screenshot: Cross Section Extraction Rule Configuration screen]

  • Paragraph Extraction Rule Configuration screen

[Screenshot: Paragraph Extraction Rule Configuration screen]

  • Barcode Extraction Rule Configuration screen

[Screenshot: Barcode Extraction Rule Configuration screen]

  • Table Extraction Rule Configuration screen

[Screenshot: Table Extraction Rule Configuration screen]

8. Once the extraction rules are configured, you can check the extraction results on the Test Extraction screen. To do so, navigate to the Document Type screen, upload the Test Extraction file, and click Test Extraction.

[Screenshot: uploading the Test Extraction file]

9. On the Test Extraction screen, click Extract. The data is extracted from the entire document (seven pages).

[Screenshot: Test Extraction results]

10. Go to the Upload Batch screen, upload the file, select the Batch Class, and click Start Batch.

[Screenshot: Upload Batch screen]

11. If your input is required to confirm the Document Type, the batch will stop at the Review stage. On the Review screen, all pages of the PDF Portfolio will be picked up and displayed. Then, you can merge/split them as required and learn the file(s). Once done, click Review to move on to the next stage.

[Screenshot: Review screen]

12. If your input is required to confirm the extracted value(s), the batch will stop at the Validation stage. On the Validation screen, all pages of the PDF Portfolio will be picked up and displayed. Then, you can update and learn the values as required. Once done, click Validate to move on to the next stage.

[Screenshot: Validation screen]

The batch containing the PDF Portfolio is processed successfully.

[Screenshot: successfully processed batch]

As these steps show, PDF Portfolios are supported at every stage of batch configuration and processing.