Overview

Ephesoft helps you turn unstructured content into actionable data. Our supervised machine learning-powered document and data capture solutions reduce errors, improve workflow and save you money.

What you will learn

This quick, hands-on tutorial teaches the basic concepts of putting together a document and data capture project.

What you need

  • Ephesoft Transact 4.5.0.0 or greater
  • Tutorial Helper files ready for download here
    • In this download you will find the following folder structure:
      • Tutorial Images
        • W-2
        • Insurance Form
        • Bank Statement
  • 30 minutes to run through the tutorial

Create Batch Class

  • Open your instance of Ephesoft Transact
    • Whether you have a cloud or on-premises instance of Ephesoft Transact, you will need to connect to the user interface by navigating to the following URL: http://<Server-Name>:8080/dcma
    • You should see the Ephesoft Transact Home Screen

    • Click the Administrator user icon
    • Click the Batch Class Management icon
    • If you are not logged in already, you can log in with the following credentials:
      • User Name: ephesoft
      • Password: demo

NOTE: Your login details could be different if you have a cloud instance.

    • Click the “Add” button to create a new Batch Class
      • A Batch Class is at the heart of the OCR/Classification/Extraction project. A Batch Class is a collection of document types, index fields, extraction rules and workflow processes that are used to help automate a business document and data capture problem.

      • Batch Class Name: FinancialDocumentTutorial
      • Batch Class Description: Financial Document Tutorial
      • Priority: 1
      • Drop Folder: C:\Ephesoft\SharedFolders\WATCH\Financial Document Tutorial

NOTE: The Drop Folder is any valid folder (local or network) path. Your Drop Folder will differ if you are on the Linux platform.

      • Click “OK” to create the Batch Class

Create Document Types

Now that we have a newly created Batch Class, the next step is to create the Document Types. A Document Type at its most basic level is a category that is used to organize different types of business-related content.  Ephesoft Transact has a powerful content classification algorithm that helps organizations separate documents into pre-configured categories.

    • Open the Batch Class that you created in the previous step by double-clicking it, or by selecting it and clicking the “Open” button.
    • Click the “Doc Type” button to create a Document Type.
      • Click “Create New”
      • A new Doc Type row will be added. Enter its name and description:
        • Name:  W-2
        • Description: W-2

NOTE: As a best practice, Document Type names should not include any spaces.

      • Click the “Apply” button to save the newly created Document Type.
    • Train the Doc Type for Classification
    • Now that the Document Type is created and saved, you will need to train it with a representative training set.
      • Select the newly created “W-2” Document Type.

    • Drag the provided W-2 sample file (downloaded at the beginning of this tutorial) into the “Upload Learn File(s)” area to train the system on what a W-2 looks like. This will help Ephesoft Transact to recognize a W-2 from other types of documents.
    • Repeat the steps above (create the Document Type, then train it) for the Insurance Form and Bank Statement documents. Use the respective sample files to train each Document Type.
    • Test Document Classification
      • Now that the training is complete for all three forms, drag the three documents into the “Upload Test Classification Files(s)” area. This is where we can test documents to make sure Ephesoft Transact is separating and classifying them correctly.
      • Click the “Test Classification” button. This will open an administrative screen where you can see the results of the document classification.
      • Click the “Classify” button in the upper left-hand corner.
  • Inspect the results and make sure you have three distinct documents separated. Click the “Close” button to exit the classification testing interface.

Create Index Fields

The next step is to create the Index Fields that will be used to capture data from the documents. An Index Field can be populated manually by a validation operator or automatically by an extraction technique such as Key-Value Extraction or Fuzzy DB.

    • Create an Index Field for the W-2 Document Type.
      • Expand the Document Types tree to show all three document types. Expand the W-2 Document Type tree and click on “Index Fields”.

      • Click the “Add” button to insert a new row.
        • Name: EmployeesName
        • Description: Employees Name
      • Click the “Apply” button to save the newly created field.

NOTE: As a best practice, Index Field names should not contain any spaces.

      • We will force review of this field so that its data is present and checked every time in the validation screen.

      • Click the “Additional Configuration” dropdown list and select the “Force Review” checkbox.
      • Click the “Apply” button to save changes.
      • Click the “Add” button to insert the second row:
        • Name: SSN
        • Description: Social Security Number
      • Click the “Apply” button to save the newly created field.
  • Create a Key-Value Extraction Rule
    • Now that we have a couple of Index Fields, we can create an extraction rule to populate the SSN Index Field.
      • Expand “Index Fields” on the left-hand tree navigation menu and select “SSN”. Underneath the SSN field is the “KV Extraction Rule”. This is the most common way to build out extraction logic.
      • Click “Add” on the top menu bar to create a KV Extraction Rule.
      • The KV Extraction Rule builder interface will appear.
      • Drag and drop the W-2.pdf document, or click the “Select Files” link to upload it, so that you can build an extraction rule on it.
      • Move and resize the Green (Key) and Red (Value) boxes or zones as shown below to build a relationship between the Key and the Value.

      • Now that the Green (Key) and Red (Value) zones are in their correct places, the Key and Value input fields need to be defined. The easiest way to do this is to single-click in the Green (Key) and Red (Value) zones.

 

  • Single-click in the Green (Key) zone.

      • A context menu will pop up for the Key. Click “OK” to accept the default Regex pattern. This will populate the Key field in the left-hand side panel.
        • Now that the Key is populated with a keyword anchor, we need to supply a pattern for what the Value will hold. In this case, it’s an SSN with a pattern of XXX-XX-XXXX. Regular expressions are used for this type of pattern matching.
        • Click on the Red (Value) zone to populate a regular expression pattern.
      • Ephesoft Transact will automatically recognize the SSN pattern that can be used for matching a Social Security Number.

      • Another option to populate this pattern would be to use the “Regex Builder”. This tool will allow for custom patterns to be built.

      • You can access the “Regex Builder” tool by using the drop-down menu in the Value pattern field in the left-hand side panel.

Once you have the Green (Key) and Red (Value) zones in the correct area and the Key and Value fields populated, you can test the extraction rule to make sure that the data is extracted correctly. To do so, select the “Test KV” button. A separate window will appear below the document preview with the extracted values.

      • Click the “Apply KV” button to save the extraction rule.
      • Click the “Apply” button to commit it for the SSN Index Field.
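
The idea behind a Key-Value rule (find a keyword anchor, then match a pattern for the value) can be illustrated outside of Transact. The following is a minimal Python sketch, not Transact's implementation: the key text, the exact regular expression, and the sample OCR line are all assumptions, and the sketch searches in text order rather than using the spatial Green and Red zones.

```python
import re
from typing import Optional

# Assumed pattern for the XXX-XX-XXXX layout described above; the pattern
# that Transact suggests in the Value field may be written differently.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical key text; use whatever label appears on your W-2 sample.
KEY_PATTERN = re.compile(r"social security number", re.IGNORECASE)


def extract_ssn(ocr_text: str) -> Optional[str]:
    """Return the first SSN-shaped value that follows the key, if any."""
    key = KEY_PATTERN.search(ocr_text)
    if key is None:
        return None  # no anchor found, so nothing is extracted
    value = SSN_PATTERN.search(ocr_text, key.end())
    return value.group(0) if value else None


# Made-up OCR line for illustration only.
sample = "a Employee's social security number 123-45-6789 OMB No. 1545-0008"
print(extract_ssn(sample))
```

Requiring the key before the value is what keeps other number-like strings on the page from being picked up by the pattern alone.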

Export Setup

    • The last setup step is to configure the export location for documents after Ephesoft Transact has completed processing them.
    • Navigate to the Module folder on the left-hand side navigation tree.
    • Expand the Export module and select the COPY_BATCH_XML plugin. This plugin is responsible for exporting a structured XML file and the PDF images to a network or local folder path.

  • Take note of the “Export Document Folder Location”. This is where the PDF files will be exported. The default location is “C:\Ephesoft\SharedFolders\final-drop-folder\…” with subfolders named by the Batch Class, Batch Identifier and Date.
  • Let’s add the Document Type to the “Export Document File Name”.
  • Export Document File Name:  $DOCUMENT_TYPE & _ & $BATCH_IDENTIFIER & _ & $DOCUMENT_ID
  • This will add the Document Type to the PDF file export.
    • Select the “Apply” and “Deploy” buttons to commit and activate the changes.
    • Select the “Close” button.
    • The next step will be to run a batch of documents end-to-end through the process.
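
To see what the “Export Document File Name” setting is likely to produce, the pattern can be expanded by hand. The sketch below is a rough Python illustration that assumes “&” joins tokens and that each “$” token is replaced by a runtime value; this reading of the pattern, the function name, and the identifier values are assumptions rather than documented Transact behavior.

```python
# Pattern exactly as entered in "Export Document File Name".
PATTERN = "$DOCUMENT_TYPE & _ & $BATCH_IDENTIFIER & _ & $DOCUMENT_ID"


def expand_file_name(pattern, values):
    """Join the '&'-separated tokens, substituting $NAME placeholders."""
    pieces = []
    for token in pattern.split("&"):
        token = token.strip()
        if token.startswith("$"):
            pieces.append(values[token[1:]])  # placeholder -> runtime value
        else:
            pieces.append(token)  # literal text such as "_"
    return "".join(pieces)


# Hypothetical runtime values; real identifiers are assigned by Transact.
example_values = {
    "DOCUMENT_TYPE": "W-2",
    "BATCH_IDENTIFIER": "BI1",
    "DOCUMENT_ID": "DOC1",
}

print(expand_file_name(PATTERN, example_values) + ".pdf")  # W-2_BI1_DOC1.pdf
```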

Run a Batch of Documents

There are many ways to run a batch of documents through a Batch Class. Let’s look at one of the simplest: the Upload Batch interface.

  • Use the Upload Batch interface
  • From the left-hand pop-out navigation panel, select “Upload Batch”. To access this panel, hover your mouse cursor over the triangular arrow pointing right near the middle of the left side of the screen.

  • From the Batch Class drop-down menu at the top middle of the screen, select the Batch Class that was created earlier (Financial Document Tutorial) to process the documents against it.
  • Use the “Drag and Drop Files Here” area or the “Select Files” option to add the three documents for processing. They will be listed in the window above as shown below.

  • Click the “Start Batch” button. This will initiate the process.
    • Use the pop-out menu on the left side of the screen again and select “Batch Instance Management” listed under Administrator.
    • In the Batch Instance Management screen, you will see a Batch Instance appear with a Batch Class Name of “FinancialDocumentTutorial”. Once the batch is done processing, its Status will be listed as “Ready For Validation”.
    • With this batch selected, click the “Open” and “OK” buttons to see the results in the Review/Validation screen.

    • Click the “Validate” button twice from the top menu bar and the “OK” button in the Validation Done pop-up dialog box. The batch will move to the export phase of the process and create three PDF files in the final drop folder location defined in the Export Setup section.
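
Once the batch has exported, a short script can confirm that the three PDF files arrived. This Python sketch assumes the default export root mentioned in the Export Setup section and the “type_batch_document” file name pattern configured there; the function names are illustrative, and the path should be adjusted to match your “Export Document Folder Location”.

```python
from pathlib import Path


def exported_pdfs(export_root):
    """Return the names of all PDF files found under export_root."""
    root = Path(export_root)
    if not root.is_dir():
        return []  # folder missing or not reachable from this machine
    return sorted(p.name for p in root.rglob("*.pdf"))


def document_type_of(file_name):
    """Recover the Document Type from '<type>_<batch id>_<document id>.pdf'.

    Assumes the batch and document identifiers contain no underscores.
    """
    return Path(file_name).stem.rsplit("_", 2)[0]


# Assumed export root; adjust to your own installation.
EXPORT_ROOT = r"C:\Ephesoft\SharedFolders\final-drop-folder"

for name in exported_pdfs(EXPORT_ROOT):
    print(document_type_of(name), "<-", name)
```

If the tutorial ran as described, the loop should print one line for each of the three Document Types.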

Congratulations!

You have completed your first document and data capture project. You now have a basic understanding of how to create a Batch Class, define Document Types and Index Fields, set up Export options and run a batch of documents through Ephesoft Transact.