Batch Class Import/Export

Overview

This feature allows a user to export an existing batch class and import it into another Ephesoft instance. The exact configuration of a batch class can thus be transferred to an Ephesoft application running on a remote system, which saves the time otherwise needed to reconfigure the batch class there to get the same processing ability.

Export Batch Class

By exporting a batch class, one can transfer the exact environment/configuration of a batch class from one system to another. This helps when a system fault occurs, since there is otherwise no immediate way to migrate the environment needed to run batches. It also helps in testing and debugging issues faced in a configuration-dependent environment.

Steps for exporting a Batch Class

  • On the ‘Batch Class Management’ UI, select the batch class to be exported (BC1 in the screenshot) and click the ‘Export’ button.

ExportBatchClass.jpg

 

  • An application pop-up is displayed, as shown below:

ExportBatchClassPopup.jpg

Checkboxes that are already selected and disabled indicate data that is always exported.
The ‘image-classification-sample’ and ‘lucene-search-classification-sample’ checkboxes are optional. They determine whether the already learnt files and the sample html, tiff and xml files are exported.

Note: If a batch class is exported with ‘image-classification-sample’ and ‘lucene-search-classification-sample’ unchecked, the user first needs to put sample images in the ‘image-classification-sample’ and ‘lucene-search-classification-sample’ folders under

‘//Shared Folders/BC1 (Batch Class ID)/’

and learn them by clicking the ‘Learn Files’ button before running a batch in the exported batch class. Otherwise the batch will go into error.

 

  • After that, click the ‘Save’ button. The batch class is exported to the chosen location in ‘zip’ format.

Clicking ‘Cancel’ closes the pop-up without performing any operation.
The zipped batch class file can now be transferred to any other system and imported there.

Data Exported with Batch Class

When a batch class is exported, the following data is exported with it:

  • The batch class specific folder, including script files, properties files, sample images etc.
  • Document types, page types and regular expressions (the complete batch class hierarchy) defined in the database.
  • Any optional data selected in the Export Batch Class pop-up.

Import Batch Class

By importing a batch class, one can recreate the exact environment/configuration of a batch class that was exported from another system. This includes the batch class configuration, the document types and their related html, xml and tiff files, the learnt indexes etc.

Steps for importing a batch class

Prerequisites: an exported, zipped batch class file.

  • On the ‘Batch Class Management’ UI, click the ‘Import’ button. The following “Import Batch Class” pop-up is displayed:

ImportBatchClass.jpg

 

  • Browse to the zipped batch class file and click the ‘Attach’ button. The pop-up expands as shown below:

ImportBatchClassPopup.jpg

The following options are provided:

 

  • Browse and Attach buttons.

Use these to replace the attached zipped batch class file with a new one. No action is needed if the attached file is already the correct one.

 

  • UNC folder textbox/dropdown and ‘Use Existing’ checkbox.

Here the user can either override an existing batch class or create a completely new batch class from the attached zipped batch class file.

Create a new Batch Class:

To create a new batch class from the exported zip file, uncheck the ‘Use Existing’ checkbox. As soon as the checkbox is unchecked, the UNC Folder dropdown turns into an empty textbox, and the user has to enter the exact path (like D:\\Shared Folders\\new-public-unc-folder) where the new UNC folder for the newly created batch class should be created.

Please note that the UNC folder must be unique and must not already exist; otherwise the application prompts the user to enter the folder path again.

Override an existing Batch Class:

To override an existing batch class from the exported zip file, check the ‘Use Existing’ checkbox. As soon as the checkbox is checked, the UNC Folder textbox turns into a dropdown listing all existing UNC folders. The user just needs to select the UNC folder belonging to the batch class that should be overridden.

 

  • Name, Description, Priority textboxes:

The Name textbox contains the batch class type or workflow name (RecostarMailRoom, TesseractMailRoom etc.) of the exported batch class. The name must always be unique and must not contain spaces or hyphens.

In the Description textbox, the user can enter a description of the batch class.

Note: The Priority textbox accepts only integers and should contain a value from 1 to 100, where:

  • 1-25 = Urgent priority
  • 26-50 = High priority
  • 51-75 = Medium priority
  • 76-100 = Low priority
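The priority bands in the note above can be expressed as a small lookup. This is a minimal sketch, not Ephesoft code: the class and method names are hypothetical, and it assumes that a value of 75 belongs to the Medium band so that the bands do not overlap.

```java
// Hypothetical helper, for illustration only: maps a batch class priority
// (1-100) to the band named in the note above.
final class PriorityBands {

    static String label(int priority) {
        if (priority < 1 || priority > 100) {
            throw new IllegalArgumentException("Priority must be between 1 and 100: " + priority);
        }
        if (priority <= 25) return "Urgent";
        if (priority <= 50) return "High";
        if (priority <= 75) return "Medium"; // assumption: 75 counts as Medium
        return "Low";
    }

    public static void main(String[] args) {
        System.out.println(label(10)); // Urgent
        System.out.println(label(76)); // Low
    }
}
```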

 

  • Roles, Email Accounts and Batch Class Definition checkboxes.

Roles: If checked, roles are taken from the zipped batch class file. Otherwise, roles are left blank or, in case of an override, remain the same as those of the existing batch class.
Email Accounts: If checked, email accounts are taken from the zipped batch class file. Otherwise, email accounts are left blank or, in case of an override, remain the same as those of the existing batch class.
Batch Class Definition: If checked, the batch class definition is taken from the zipped batch class file. Otherwise, it remains the same as that of the existing batch class (the one being overridden). It contains three checkboxes: Scripts, Folder List and Batch Class Modules.

 

  • Scripts: By default, the Scripts checkbox is selected and disabled. It lists all the scripts of the imported batch class.

It is enabled if the ‘Use Existing’ checkbox is checked.

ImportBatchClass scripts.jpg

 

  • Folder List: This lists all the folders included in the batch class being imported. All mandatory folders are selected and disabled. For non-mandatory folders, the user can check or uncheck the option to include or discard the corresponding folder.

ImportBatchClass FolderList.jpg

 

  • Batch Class Modules: By default, the Batch Class Modules checkbox is selected and disabled. It lists all the modules of the imported batch class.

ImportBatchClass Modules.jpg

 

  • After that, clicking the ‘Save’ button creates a new batch class with a new batch class identifier.

FAQ

  • Batches running in an imported batch class are not visible on the Batch Instance Management and RV screens.

Answer: This may happen when roles are not assigned or the ‘Roles’ checkbox was left unchecked while importing the batch class. The issue is resolved by assigning roles to the corresponding batch class.

Steps: Go to the Batch Class Management screen, edit the corresponding batch class, and assign roles to it.

 

  • Batches running in an imported batch class go into error in the Page Processing module.

Answer: There are two possible reasons for this:

  • The exported zipped batch class file doesn’t contain the Lucene and image classification samples.
  • After the import, sample images were put into these folders but were not learnt.

The issue is resolved either by exporting the batch class again with the ‘image-classification-sample’ and ‘lucene-search-classification-sample’ checkboxes checked, or by putting sample images in the ‘image-classification-sample’ and ‘lucene-search-classification-sample’ folders present in ‘Ephesoft-data’ and then clicking ‘Learn Files’ for the corresponding batch class.

 

Batch Instance Search

Overview

This feature allows a user to search for batch instances by batch instance id or name. A batch instance is listed in the search results if the search text is contained in either its id or its name.
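The matching rule described above amounts to a "contains" check on two fields. The sketch below illustrates that rule only; the `BatchInstance` class is a hypothetical stand-in with just the two searched fields, and case-insensitive matching is an assumption, since the source does not say how case is handled.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchInstanceSearch {

    // Hypothetical minimal view of a batch instance: only the searched fields.
    static final class BatchInstance {
        final String id;
        final String name;

        BatchInstance(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Returns every instance whose id or name contains the search text.
    // Assumption: matching ignores case.
    static List<BatchInstance> search(List<BatchInstance> all, String text) {
        String needle = text.toLowerCase();
        List<BatchInstance> hits = new ArrayList<>();
        for (BatchInstance bi : all) {
            if (bi.id.toLowerCase().contains(needle) || bi.name.toLowerCase().contains(needle)) {
                hits.add(bi);
            }
        }
        return hits;
    }
}
```

Searching for “14” would therefore match both an instance with id `BI14` and an instance named `invoices-14`.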

Usability

Search feature is available on the following screens:

Batch Instance Management screen

BatchInstanceManagementScreen.jpg

The above example searches for the string “14” in the batch instance id and batch instance name and shows the corresponding results.

 

Batch List screen

BatchListScreen.jpg

The above example searches for the string “14” in the batch instance id and batch instance name and shows the corresponding results.

 

Batch Instance Status

Overview

This document defines the batch instance statuses used to describe the current state of a batch instance.

Batch Instance Status list

The following statuses are used during the processing of a batch instance:

  • NEW

This status is for a newly created batch. This status shows that a batch has been created but has not yet been picked up for processing. This status can only occur once in the lifetime of a batch instance before its processing is finished.

  • LOCKED

This status is for a batch instance when a lock is taken over it by an executing server so that no other server can start its processing. This is the first thing done on the batch instance after it has been picked up. Once the batch has been locked, further processing can be done over it by the server having its lock held.

  • READY

This status is for a batch instance which has been either restarted or moved out of Review/validation state and is waiting for the pick-up service to start its processing.

  • ERROR

This status is for a batch whose processing has failed, i.e. some business rule was violated during processing and the batch was therefore sent to error.

  • FINISHED

This status is for a batch whose processing has been finished and the desired output has been acquired.

  • RUNNING

This status is for a batch which is currently in one of the automatic processing stages. All the stages except for Review and Validation are counted as the automatic processing stages.

  • READY_FOR_REVIEW

This status is for a batch which has reached the “Review_Document” plugin during the course of its processing and now needs to be reviewed manually by the user. During this stage, user can perform various operations on the batch like classifying, splitting, copying, deleting the document etc.

  • READY_FOR_VALIDATION

This status is for a batch which has reached the “Validate_Document” plugin during the course of its processing and now needs to be validated manually by the user. During this stage, the user can perform all the operations available during the review stage and can also change the values of the extracted document level fields.

  • RESTARTED

This status is for a batch which was restarted in earlier versions of the Ephesoft application (before version 2.4). It is no longer used in the latest version and is retained only for backward compatibility.

  • DELETED

This status is for a batch which has been deleted during the course of its processing.

  • TRANSFERRED

This status applies only to a batch belonging to a batch class which implements “grid computing”. A batch is displayed in ‘Transferred’ status on the source machine when it has been received completely at the remote destination machine but its processing has not yet started there.

  • RESTART_IN_PROGRESS

This status is for a batch which has been asked to restart, while the system is working out how and from what point the batch will be restarted.

  • REMOTE

This status applies only to a batch belonging to a batch class which implements “grid computing”. A batch is displayed in ‘Remote’ status on the destination machine when it has been received completely from the source machine but its processing has not yet started there, i.e. the batch is still being prepared for processing on the destination machine.
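For custom code that needs to reason about these statuses, the list above can be modelled as an enum. This is an illustrative sketch and not the Ephesoft class: the enum name and both helper methods are hypothetical, and the helpers encode only what the descriptions above state.

```java
// Illustrative model of the status list above; not the actual Ephesoft type.
enum BatchInstanceStatus {
    NEW, LOCKED, READY, ERROR, FINISHED, RUNNING,
    READY_FOR_REVIEW, READY_FOR_VALIDATION, RESTARTED,
    DELETED, TRANSFERRED, RESTART_IN_PROGRESS, REMOTE;

    // True for the two statuses in which the batch waits for a manual step.
    boolean needsUserAction() {
        return this == READY_FOR_REVIEW || this == READY_FOR_VALIDATION;
    }

    // True for the statuses described as applying only under grid computing.
    boolean isGridComputingOnly() {
        return this == TRANSFERRED || this == REMOTE;
    }
}
```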

CMIS Import

Overview

The CMIS Import feature downloads files from a CMIS server and processes them as batches in the Ephesoft application. A cron job monitors the CMIS server by checking the specified folder for new files at the specified interval. Along with each document, its properties are downloaded in XML format. Users can write their own custom scripts to access these properties in the batch being executed.
A batch is created for every file downloaded from the CMIS server and is executed on the Ephesoft application.

FORMAT FOR DOWNLOADED XML (containing document properties)

<CmisImport>

<Properties>

<Property>

<Name>Description</Name>

<Value/>

</Property>

<Property>

<Name>Title</Name>

<Value>BI1E_documentDOC2.pdf</Value>

</Property>

</Properties>

</CmisImport>
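A custom script could read this XML with the standard JDK DOM parser. The sketch below is one possible approach under stated assumptions: the class name is hypothetical, the XML is passed in as a string, and how a script actually obtains the file inside a batch is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class CmisPropertiesReader {

    // Parses the <CmisImport> XML into a name -> value map.
    // An empty <Value/> element yields an empty string.
    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("Property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("Name").item(0).getTextContent();
                String value = property.getElementsByTagName("Value").item(0).getTextContent();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse CMIS properties XML", e);
        }
    }
}
```

For the sample above, `read(...)` would return a map with `Description` mapped to an empty string and `Title` mapped to `BI1E_documentDOC2.pdf`.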

The CMIS Import feature downloads only files that have a valid file extension and whose CMIS property, configured in the Property field, has the value given in the Value field. After downloading a file from the CMIS server, the application updates that property to the value configured in the New Value field.

An example helps to illustrate this. Suppose the CMIS server contains 15 documents, of which 10 are valid as per the configured file extension. If the property is configured as “cm:author” and the value as “Ephesoft”, then only those of the 10 documents whose “cm:author” property has the value “Ephesoft” are downloaded by the application, and the “cm:author” property of each downloaded document is updated to the configured New Value.
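The selection rule can be sketched in a few lines. This is an in-memory illustration only: `RemoteDoc` is a hypothetical stand-in for a document on the CMIS server, no CMIS client library is involved, and the exact extension and value comparison rules used by the application are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CmisSelection {

    // Hypothetical stand-in for a document stored on the CMIS server.
    static final class RemoteDoc {
        final String fileName;
        final Map<String, String> properties = new HashMap<>();

        RemoteDoc(String fileName, String author) {
            this.fileName = fileName;
            properties.put("cm:author", author);
        }
    }

    static boolean hasValidExtension(String fileName, List<String> extensions) {
        String lower = fileName.toLowerCase();
        for (String ext : extensions) {
            if (lower.endsWith("." + ext.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    // Picks documents with a valid extension whose property equals the
    // configured value, then rewrites the property to the new value so the
    // same document is not picked again on the next run.
    static List<RemoteDoc> selectAndMark(List<RemoteDoc> docs, List<String> extensions,
                                         String property, String value, String newValue) {
        List<RemoteDoc> selected = new ArrayList<>();
        for (RemoteDoc doc : docs) {
            if (hasValidExtension(doc.fileName, extensions)
                    && value.equals(doc.properties.get(property))) {
                doc.properties.put(property, newValue);
                selected.add(doc);
            }
        }
        return selected;
    }
}
```

Running `selectAndMark` twice with the same arguments returns the matching documents the first time and nothing the second time, which is the purpose of the New Value setting.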

 

Configuration

User can specify the CMIS server configuration in the batch class.

CMIS Import.jpg

 

Configurable property | Type of value | Value options | Description
Server URL | String | NA | URL for connecting to the CMIS server, e.g. http://localhost:8090/alfresco/service/cmis
Username | String | NA | User name for authentication to the specified CMIS server.
Password | String | NA | Password for authentication to the specified CMIS server.
Repository ID | String | NA | CMIS server repository ID.
File Extension | String | Read only | Supported file extensions which will get downloaded. In version 3.0, the application supports only PDF and tiff files.
Folder | String | NA | Folder name on the CMIS server from which files need to be downloaded.
Property | String | NA | The CMIS property used to select files for download from the CMIS server URL. Valid documents that have this property set to the value specified in the Value field are marked for selection, e.g. cmis:name, cm:description, cm:title, cm:author
Value | String | NA | The value for the property mentioned above. This key/value pair decides which documents get downloaded.
New Value | String | NA | The new value written to the specified CMIS property after the file has been downloaded from the CMIS server. This ensures that the same document doesn’t get downloaded again.

 

Cron job expression

For the cron job scheduler, update the following property in the file {Application}\WEB-INF\classes\META-INF\dcma-cmis-import\cmis-import.properties:

cmisImport.cronxpression=0 0/15 * ? * *

By default, this property is set to run every 15 minutes.
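To see why the expression above means "every 15 minutes", it helps to expand its minute field. The sketch below assumes the six fields follow the Quartz-style order of seconds, minutes, hours, day of month, month and day of week; that layout is an assumption based on the expression's shape, and the class is a hypothetical illustration rather than a cron parser.

```java
import java.util.ArrayList;
import java.util.List;

final class CronFieldDemo {

    // Expands a "start/step" field into the values it matches, up to max inclusive.
    static List<Integer> expand(String field, int max) {
        String[] parts = field.split("/");
        int start = Integer.parseInt(parts[0]);
        int step = Integer.parseInt(parts[1]);
        List<Integer> values = new ArrayList<>();
        for (int v = start; v <= max; v += step) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        String expression = "0 0/15 * ? * *";
        // Assumed field order: seconds, minutes, hours, day of month, month, day of week.
        String minuteField = expression.split(" ")[1];
        System.out.println(expand(minuteField, 59)); // [0, 15, 30, 45]
    }
}
```

Under that reading, the job fires at second 0 of minutes 0, 15, 30 and 45 of every hour.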

 

Disabling/Enabling CMIS import functionality

To enable or disable the CMIS import functionality, uncomment or comment the following line in {Application}\applicationContext.xml:

<import resource="classpath:/META-INF/applicationContext-dcma-cmis-import.xml" />

Default: CMIS import is disabled.

 

Screenshots for Configuration

Screenshot for CMIS folder:

CMIS Folder.jpg
Screenshot for CMIS document:

CMIS Document.jpg
Screenshot for CMIS properties:

CMIS Properties.jpg

Screenshot for CMIS repository information:

CMIS Repository.jpg

URL for fetching repository information in alfresco:

http://{server}:{port}/alfresco/service/cmis/index.html

Troubleshooting

S no. | Error message | Possible root cause
1 | Unable to connect to the server | Invalid configuration used for connecting to the CMIS server.
2 | Error while generating cmis properties xml | Either {Ephesoft Application} does not have access to write the properties to the disk, or the network path could not be reached while writing the file.

Document Info Display

Overview

The document tree info on the Review-Validation screen is now configurable through a new tag introduced in the batch xml. This tag is named “Document Display Info”.

Whatever value is set in this tag is displayed in the document tree. The value can be the confidence score, confidence threshold, document name, document description or any customer specific data.

DocumnetDisplayInfoTag.jpg

DocumentDisplayInfoScreen.jpg

If no value is provided for ‘<DocumentDisplayInfo>’, then the default value is shown, which is currently the document type name, i.e. “Unknown” in the above example.
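The fallback rule can be summarised as follows. This is a minimal sketch of the described behaviour, not the screen's actual code; the class and method names are hypothetical, and treating a blank value like a missing one is an assumption.

```java
final class DocumentTreeLabel {

    // Returns the text to show in the document tree: the display info when
    // present, otherwise the document type name.
    static String labelFor(String documentDisplayInfo, String documentTypeName) {
        if (documentDisplayInfo == null || documentDisplayInfo.trim().isEmpty()) {
            return documentTypeName; // assumption: blank counts as "no value"
        }
        return documentDisplayInfo;
    }
}
```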

 

Advantages

  • Users can customize the display information by manipulating the batch xml using custom scripts.
  • Customer specific information can also be displayed.

Dynamic Workflow

Overview

This feature allows the user to create a customized workflow dynamically, i.e. the user can add, remove and reorder any module or plugin in the workflow. After altering the workflow, the user can deploy the changes only after validating the workflow, which requires that the dependencies of each individual plugin are fulfilled.

Configuration

Configure Modules

To configure the modules of a particular batch class, follow these steps:

  • On the “Batch Class Management” screen, choose the batch class whose workflow is to be changed and open its edit options.

BatchClassManagement ModuleTab.jpg

  • Under the “Modules” tab in the Edit view of that batch class, there is a button “Configure” on the top right corner. Clicking on that button takes the user to a screen where they can add/delete/re-order any module.

ConfigureModules.jpg

  • In this view, the user can see the following:
    • “Selected Modules”: the list of selected modules for the workflow.
    • “Available Modules”: the list of available modules.
    • “Add New Module” button: to add a new module to the available modules list.
    • “Remove” button: to remove any of selected modules from the selected modules list.
    • “Add” button: to add any selected module from available modules list to selected modules list. By default the added module will be placed at the bottom of selected modules list.
    • “Up” button: to move up in order any selected module in selected modules list. The user can select multiple modules at once and each module will be moved one place up each time the button is clicked.
    • “Down” button: to move down in order any selected module in selected modules list. The user can select multiple modules at once and each module will be moved one place down each time the button is clicked.
    • “Ok” button: to apply changes locally.
    • “Reset” button: to reset the state of selected modules list.
    • “Cancel” button: to cancel the module configuring action and move to previous screen.
  • Any newly added module would initially be empty.
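The multi-select behaviour of the “Up” button described in the list above can be illustrated with a short sketch. It is one plausible reading and not the UI's actual implementation: module names are placeholders, and the handling of a selected module that is already at the top (it stays in place) is an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class ModuleOrdering {

    // Moves every selected module one place up. A selected module is not
    // moved past another selected module, so relative order is preserved.
    static void moveUp(List<String> modules, Set<String> selected) {
        for (int i = 1; i < modules.size(); i++) {
            boolean currentSelected = selected.contains(modules.get(i));
            boolean previousSelected = selected.contains(modules.get(i - 1));
            if (currentSelected && !previousSelected) {
                Collections.swap(modules, i, i - 1);
            }
        }
    }
}
```

With the list A, B, C, D and B and D selected, one click would give B, A, D, C. The “Down” button would be the mirror image, iterating from the bottom of the list.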

Configure plugins

Like the module configuration functionality, there is a plugin configuration functionality. It allows the user to add, delete and reorder the plugins in a particular module of a batch class.

To configure plugins for a particular module of a batch class, follow these steps:

  • Select any particular module of a batch class.

PluginListingTab.jpg

  • Under the “Plugins Listing” tab in the Edit view of that Module, there is a button “Configure” on the top right corner. Clicking on that button takes the user to a screen where they can add/delete/re-order any plugin.

ConfigurePlugins.jpg

 

  • The functionality of the above view is similar to the “Configure Modules” view: the “Add”, “Remove”, “Up”, “Down”, “Ok”, “Reset” and “Cancel” buttons behave as they do there.
  • Apart from the common functionality, the “Configure Plugins” view provides the following additional functionality:
  • Dependency highlight
    • Whenever a plug-in is selected in the available list (here CMIS EXPORT), all its dependencies are highlighted (here CREATE MULTIPAGE FILES) in the same list.

ConfigurePluginsDependency.jpg

  • Warning on plugin addition
    • When a plug-in is added to the selected plugins list using the “Add” button and not all of its dependencies are already present in the selected plugins list, the following pop-up is displayed.

WarningPluginAddition.jpg

    • In the above pop-up:
      • Yes: pressing this button will add all the dependencies of the plugin along with it to the selected plugins list.
      • No: pressing this button will just add the selected plugin to the selected plugins list ignoring the dependencies.
      • Cancel: pressing this button will cancel the operation.

Validate and Deploy workflow

Validate

This button is at the bottom center of the batch class edit view. Pressing it checks all the rules that apply to the selected plug-ins. If no violations are found, a pop-up says “Dependencies Validated Successfully” and the “Deploy Workflow” button is enabled.

ValidateDependency.jpg

If there is any violation of dependencies among plugins, the first violation is reported in the pop-up and the “Deploy Workflow” button remains disabled.

DependencyViolated.jpg
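The validation step can be pictured as a check that reports the first unmet dependency. The sketch below is a simplified illustration: it only checks that each dependency is present in the selected list, whereas the real rules may also consider ordering or other constraints. The class name and message format are hypothetical; the plugin names come from the dependency example earlier in this section.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class WorkflowValidator {

    // Returns a description of the first unmet dependency, or an empty
    // Optional when every selected plugin has all its dependencies selected.
    static Optional<String> firstViolation(List<String> selectedPlugins,
                                           Map<String, List<String>> dependencies) {
        for (String plugin : selectedPlugins) {
            List<String> required = dependencies.get(plugin);
            if (required == null) {
                continue; // plugin has no declared dependencies
            }
            for (String dependency : required) {
                if (!selectedPlugins.contains(dependency)) {
                    return Optional.of(plugin + " requires " + dependency);
                }
            }
        }
        return Optional.empty();
    }
}
```

In this model, “Deploy Workflow” would be enabled only when `firstViolation(...)` returns an empty result.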

Deploy Workflow

This button is at the bottom center of the batch class edit view. It is initially disabled and is enabled once the “Validate” button has confirmed that the complete workflow is valid. On successful deployment of the batch class, the pop-up below is shown.

DeployWorkflow.jpg

Notes

  • The user needs to deploy the batch class each time they change the workflow by configuring modules or plugins.
  • The “Validate” and “Deploy Workflow” buttons are disabled while the user is on either the configure modules or the configure plugins view.
  • The user can only deploy a validated batch class.
  • Saving a batch class using the “Save” or “Apply” button does not deploy the batch class. Deploying a batch class using the “Deploy Workflow” button, however, first saves the batch class and then deploys it.

E-mail Import

Overview

This plug-in is responsible for importing the documents present in a defined form from the user’s mail account. User is allowed to configure any mail account as well as the type of documents which the plug-in will support. This configuration is done per batch class. Multiple email accounts can be setup for each batch class.

Configuration

Mail configuration

EmailConfiguration.jpg

The following mail account properties are configurable:

Configurable property | Type of value | Value options | Description
Username | String | A valid email account username. | The user account name to be configured with Ephesoft, on which the Email Import service will keep a watch.
Password | String | Corresponding password for the configured username | Password for the configured user account.
Server name | String | A valid mail server name | The name of the mail server to which the configured user account belongs.
Server type | String | A valid mail server type | The type of the mail server to which the configured user account belongs.
Folder Name | String | A valid and existing mail folder name | The name of the mail folder which the Ephesoft Email import will check.
Is SSL | Check Box | Checked, Unchecked | Defines whether the application connects to the mail server using SSL or non-SSL settings.
Port number | Integer | A valid port number | The port number on which the configured mail server type will work.

Configurable Properties file

  • <Ephesoft installation directory>\Application\WEB-INF\classes\META-INF\dcma-mail-import\mail-import.properties:

 

Configurable property | Type of value | Value options | Description
dcma.importMail.cronExpression | String | A valid cron expression | The cron expression defining the look up time for the plug-in, i.e. at what time the plug-in looks for any updates in the configured mail account.
dcma.supported.attachment.extension | String | List of valid file extensions | Defines the documents supported by the plug-in. Multiple entries are separated by a “;”.

  • <Ephesoft installation directory>\Application\WEB-INF\classes\META-INF\dcma-mail-import\open-office.properties:

Configurable property | Type of value | Value options | Description
openoffice.serverUrl | List of values | ON, OFF | Server used for connecting to the remote open office server instance. Used in case of connecting to an external/remote service.
openoffice.serverPort | Integer | A valid and available port number. | Port number used for connecting to the open office server instance. Default port is 8100.
openoffice.autoStart | Boolean | True, False | Whether the open office server should be started / connected when XE starts. Default value is false.
openoffice.homePath | String | N-A | Path to the open office installation. If no path is provided, a default value is calculated based on the operating environment.
openoffice.maxTasksPerProcess | Integer | Any valid integer value. | Maximum number of simultaneous conversion tasks to be handled by a single open office process. Default value for optimized performance is 50.
openoffice.taskExecutionTimeout | Integer | | Timeout for conversion tasks (in milliseconds). Default value for optimized performance is 30 seconds.

Characteristics

  • The functionality/service allows the user to set up any number of mail accounts for gathering data.
  • The user is allowed to configure the account via UI.
  • The functionality/service can support multiple document formats.
  • The functionality/service makes use of the open-office to convert the received data files into application usable formats.
  • The functionality/service is capable of downloading and saving the attachments of a mail.

Steps of execution/working

  • Once the plug-in properties have been set up properly, Ephesoft proceeds with downloading mail by accessing the mail account.
  • The email import service reads the user’s mail configuration from the database and tries to access the user’s mail account using the configured settings.
  • If the service is able to connect to the user account, it reads all the mails contained in the configured folder.
  • After the service has read the mails, it starts processing multiple mails at a time.
  • Each mail that has been read goes through a three-step procedure: downloading, converting and creating a batch for the mail.
  • If any error occurs while processing a mail, the service sends a notification mail to the mail accounts configured for notification.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Unable to convert Email into PDF file. | The open office service is either not running or has not been configured correctly.
2 | Error in Server Type Configuration, only imap/pop3 is allowed. | The plug-in only supports the imap or pop3 server type. Check the user’s account configuration.
3 | Not able to establish connection. | A connection could not be established with the current user’s account configuration.
4 | Could not find port number. Trying with default value of 995. | The port number specified in the user’s configuration is invalid, hence the plug-in tries to connect on the default pop3 port.
5 | Could not find port number. Trying with default value of 993. | The port number specified in the user’s configuration is invalid, hence the plug-in tries to connect on the default imap port.
6 | Error while reading mail contents | Either the email body or other attachments could not be read and converted.
7 | Not able to process the mail reading. | Some error occurred in reading the contents of the mail. Open office could not convert the source file into the desired format.
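Error messages 2, 4 and 5 together imply a simple fallback rule for the port number. The sketch below restates that rule in code; it is not the plug-in's implementation. The class and method names are hypothetical, and the definition of an "invalid" port used here (missing or outside 1 to 65535) is an assumption.

```java
final class MailPortDefaults {

    // Resolves the port to connect on: the configured port when it looks
    // valid, otherwise 995 for pop3 and 993 for imap.
    static int resolvePort(String serverType, Integer configuredPort) {
        String type = serverType.toLowerCase();
        if (!type.equals("imap") && !type.equals("pop3")) {
            throw new IllegalArgumentException(
                    "Error in Server Type Configuration, only imap/pop3 is allowed.");
        }
        // Assumption: a missing or out-of-range port counts as invalid.
        if (configuredPort != null && configuredPort >= 1 && configuredPort <= 65535) {
            return configuredPort;
        }
        return type.equals("pop3") ? 995 : 993;
    }
}
```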

Multi-Server Deployment

Overview

This feature allows the user to set up a multi-server environment for Ephesoft, in which two or more servers run at the same time and share one database and one set of shared folders.

Processing batches on multiple servers helps the user to increase throughput.

Steps to setup Multi-Server environment for Ephesoft

Install Ephesoft on all the machines using the installer, following the steps below.

At the time of installing Ephesoft on the first machine

  • Enter the information as described in the screenshots below.
  • On the database configuration screen, enter the information in the format shown below. The same information should be used while installing Ephesoft on the other machines.

MSSQL Configuration.jpg

 

  • Select “No” option on shared folder configuration screen.

SharedFolderConfiguration.jpg

 

  • The shared folders path must be shared over the network so that the other machines can also access it.

For example: [\\server_name\path_to_shared_folders]

DestinationFolder.jpg

 

  • Now complete the installation by following the standard steps.

Note: The installation path should not contain white spaces.

At the time of installing Ephesoft on the other machines

  • Enter the information as described in the screenshots below.
  • The database information needs to be the same as for the first install.

MSSQL Configuration Multiserver.jpg

 

  • Select “Yes” option on shared folder configuration screen.

SharedFolderConfiguration Multiserver.jpg

 

  • On the destination folder configuration screen, enter the same shared folder path that was entered while installing on the first machine.

DestinationFolder Multiserver.jpg

 

  • Now complete the installation by following the standard steps.

Other Configurations

  • The Folder Monitor Service should be enabled on one of the Ephesoft servers. To do this, go to <Ephesoft Installation Directory>\Application and, on the servers where the folder monitor service should not run, comment out the following line in the applicationContext.xml file:

<import resource="classpath:/META-INF/applicationContext-folder-monitor.xml" />

  • The following cron jobs should have different cron expression values on different machines. They are configured in

{Application}\WEB-INF\classes\META-INF\dcma-workflows\dcma-workflows.properties

dcma.pickup.cronjob.expression=15 0/1 * ? * *

dcma.resume.cronjob.expression=15 0/1 * ? * *

For example:

dcma.pickup.cronjob.expression1=15 0/1 * ? * *

dcma.resume.cronjob.expression1=15 0/1 * ? * *

dcma.pickup.cronjob.expression2=45 0/1 * ? * *

dcma.resume.cronjob.expression2=45 0/1 * ? * *

 

  • All the machines running in a multi-server environment should be verified by a single Ephesoft license installed on one Ephesoft server. To do so, follow these steps:
  • Comment out the license server import on the machines where the license server is not running. This is done in applicationContext.xml:

<import resource="classpath:/META-INF/applicationContext-license-server.xml" />

  • Change the license server host configuration on the machines where the license server is not running, so that they all refer to the machine that has the Ephesoft license installed and the license server running. This is done by setting the ephesoft.license.server.host property in the license-client.properties file to the IP address of the machine on which the license server is running.

Sample properties file:

Location: META-INFephesoft-license-client license-client.properties

How to Change Ephesoft’s Port Number

Sometimes it is necessary to change the port that Ephesoft operates on, for example to prevent conflicts with other programs that use port 8080.

  • Shut down Ephesoft if it is currently running.

  • Navigate to the web.xml file found at <Ephesoft Installation Directory>\Application\WEB-INF\web.xml

User needs to change this value from:

<context-param>

<param-name>port</param-name>

<param-value>8080</param-value>

</context-param>

To this (or the desired port number):

<context-param>

<param-name>port</param-name>

<param-value>8090</param-value>

</context-param>

  • Navigate to dcma-batch.properties

Found at <Ephesoft Installation Directory >\Application\WEB-INF\classes\META-INF\dcma-batch\dcma-batch.properties
Then proceed to change this from:

batch.base_http_url=http://localhost:8080/dcma-batches

to this:

batch.base_http_url=http://localhost:8090/dcma-batches

  • Navigate to server.xml file

Found at <Ephesoft Installation Directory >\Ephesoft\JavaAppServer\conf\server.xml

Change the port attribute of the HTTP connector below (8080) to match the port number used in the previous files (8090 or the desired port number):

<Connector port="8080" protocol="HTTP/1.1"

connectionTimeout="20000"

redirectPort="8443" />
Note: The easiest way to do this is to find and replace every occurrence of "8080" with 8090 (or the desired port number).

Change this:

<Server port="8005" shutdown="SHUTDOWN">

To this:

<Server port="8006" shutdown="SHUTDOWN">

Also change this:

<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
To this (or the desired port):

<Connector port="8019" protocol="AJP/1.3" redirectPort="8443" />

Finally, restart Ephesoft.
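The batch.base_http_url change from the steps above can also be expressed in code. The following is a minimal sketch under the assumption that the property line has the key=URL form shown earlier. The class and method names are hypothetical and not part of Ephesoft.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper that rewrites the port inside a key=URL property line. */
final class BaseUrlPort {

    /** Returns the same property line with the URL's port replaced by newPort. */
    static String withPort(String propertyLine, int newPort) {
        int eq = propertyLine.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("Expected key=value: " + propertyLine);
        }
        String key = propertyLine.substring(0, eq);
        URI old = URI.create(propertyLine.substring(eq + 1).trim());
        try {
            // Rebuild the URI with every component unchanged except the port.
            URI updated = new URI(old.getScheme(), old.getUserInfo(), old.getHost(),
                    newPort, old.getPath(), old.getQuery(), old.getFragment());
            return key + "=" + updated;
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withPort("batch.base_http_url=http://localhost:8080/dcma-batches", 8090));
    }
}
```

Parsing the URL instead of doing a plain text replacement avoids touching other occurrences of "8080" by accident.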

Ephesoft Web Service

Overview

This document gives a detailed explanation of the web services exposed by the Ephesoft application.

Authenticated client calls code sample

The following code makes authenticated client calls to Ephesoft Web Services:

HttpClient client = new HttpClient();

Credentials defaultcreds = new UsernamePasswordCredentials("username", "password");

client.getState().setCredentials(new AuthScope("serverName", 8080), defaultcreds);

client.getParams().setAuthenticationPreemptive(true);
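With preemptive authentication enabled, the client sends the credentials on the first request instead of waiting for a challenge. Assuming the server is configured for HTTP Basic authentication (this document does not state the scheme), the header that is sent can be sketched with the JDK alone. The class name below is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds an HTTP Basic Authorization header value. */
final class BasicAuthHeader {

    /** Returns "Basic " followed by base64("username:password"). */
    static String value(String username, String password) {
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials, as in the sample above.
        System.out.println("Authorization: " + value("username", "password"));
    }
}
```

Basic authentication only encodes the credentials and does not encrypt them, so such calls are best made over HTTPS.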

List of APIs exposed in Ephesoft Product

Image Processing Web Service

createSearchablePDF

This API generates a searchable PDF. It takes tif/tiff files and an RSP file as input. The input parameters specify whether the output PDF is searchable and whether it is in color.

Web Service URL : http://{serverName}:{port}/dcma/rest/createSearchablePDF

Input Parameter | Values | Descriptions
isColorImage | Either "true"/"false" | Generates a color PDF if the input image is color and the value is "true".
isSearchableImage | Either "true"/"false" | Generates a searchable PDF if the value is "true".
outputPDFFileName | String value that ends with the .pdf extension | Name of the output PDF file generated by the API.
projectFile | String value that ends with the .rsp extension | RSP file used for RecoStar processing.

Checklist:

  1. Input only tiff, tif files for generating searchable pdf.
  2. RSP file is mandatory for generating the searchable pdf.
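The parameter rules and checklist above can be checked on the client before the request is sent. The following is a minimal sketch. The class and method names are hypothetical, and treating extensions as case-insensitive is an assumption that this document does not confirm.

```java
import java.util.Locale;

/** Hypothetical pre-flight check for createSearchablePDF inputs. */
final class SearchablePdfInputCheck {

    static boolean hasExtension(String fileName, String extension) {
        return fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(extension);
    }

    /** Returns a description of the first problem found, or null if the inputs look valid. */
    static String firstProblem(String[] imageNames, String projectFile, String outputPdfFileName) {
        for (String name : imageNames) {
            if (!hasExtension(name, ".tif") && !hasExtension(name, ".tiff")) {
                return "Not a tif/tiff file: " + name;
            }
        }
        if (!hasExtension(projectFile, ".rsp")) {
            return "projectFile must end with .rsp";
        }
        if (!hasExtension(outputPdfFileName, ".pdf")) {
            return "outputPDFFileName must end with .pdf";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = firstProblem(
                new String[] {"sample1.tif", "sample2.tif"}, "Fpr.rsp", "OutputPDF.pdf");
        System.out.println(problem == null ? "Inputs look valid." : problem);
    }
}
```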

Sample Input Used:

ephesoft-web-services\create-searchable-pdf.zip
Sample client code using apache commons http client:-

// Required imports (Apache Commons HttpClient 3.x and java.io); the other samples use the same set.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;
import org.apache.commons.httpclient.methods.multipart.StringPart;

private static void createSearchablePDF() {
    HttpClient client = new HttpClient();
    // URL of the createSearchablePDF web service
    String url = "http://localhost:8080/dcma/rest/createSearchablePDF";
    PostMethod mPost = new PostMethod(url);

    // tif images to process
    File file1 = new File("C:\\sample\\sample1.tif");
    File file2 = new File("C:\\sample\\sample2.tif");
    File file3 = new File("C:\\sample\\sample3.tif");
    File file4 = new File("C:\\sample\\sample4.tif");
    // RSP project file used by RecoStar for processing
    File file5 = new File("C:\\sample\\Fpr.rsp");

    Part[] parts = new Part[9];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new FilePart(file3.getName(), file3);
        parts[3] = new FilePart(file4.getName(), file4);
        parts[4] = new FilePart(file5.getName(), file5);
        // color switch
        parts[5] = new StringPart("isColorImage", "false");
        // searchable switch
        parts[6] = new StringPart("isSearchableImage", "true");
        // name of the output PDF file
        parts[7] = new StringPart("outputPDFFileName", "OutputPDF.pdf");
        // name of the RSP project file
        parts[8] = new StringPart("projectFile", "Fpr.rsp");

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        if (statusCode == 200) {
            InputStream inputStream = mPost.getResponseBodyAsStream();
            // output file path for saving the result
            String outputFilePath = "C:\\sample\\serverOutput.zip";
            // retrieving the searchable pdf file
            File file = new File(outputFilePath);
            FileOutputStream fileOutputStream = new FileOutputStream(file);
            try {
                byte[] buf = new byte[1024];
                int len = inputStream.read(buf);
                while (len > 0) {
                    fileOutputStream.write(buf, 0, len);
                    len = inputStream.read(buf);
                }
            } finally {
                if (fileOutputStream != null) {
                    fileOutputStream.close();
                }
            }
            System.out.println("Web service executed successfully.");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

convertTiffToPdf

This API generates a PDF for each input tiff. If 5 input tiffs are provided, 5 PDFs are returned. The API has the following configuration parameters.
Web Service URL : http://{serverName}:{port}/dcma/rest/convertTiffToPdf

Input Parameter | Values | Descriptions
inputParams | This value can be empty. Reference for ImageMagick parameters: http://www.imagemagick.org/script/command-line-options.php | ImageMagick input parameters used for processing the input and output file.
outputParams | This value can be empty. Reference for ImageMagick parameters: http://www.imagemagick.org/script/command-line-options.php | ImageMagick output parameters used for optimizing the output file.
pdfGeneratorEngine | Either "IMAGE_MAGICK"/"ITEXT" | Selects the PDF generator engine.

Checklist:

  1. Input only tiff, tif files for generating pdf.
  2. The input params and output params take effect only if pdfGeneratorEngine is "IMAGE_MAGICK".
  3. If the input tiff is a multipage tiff, a single multipage pdf is generated as output.

Sample Input Used:

ephesoft-web-services\convert-tiff-to-pdf.zip

Sample client code using apache commons http client:-

private static void convertTiffToPdf() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/convertTiffToPdf";
    PostMethod mPost = new PostMethod(url);

    // image files to process
    File file1 = new File("C:\\sample\\sample1.tif");
    File file2 = new File("C:\\sample\\sample2.tif");

    Part[] parts = new Part[5];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        // input params
        parts[2] = new StringPart("inputParams", "");
        // output params
        parts[3] = new StringPart("outputParams", "");
        // PDF generator engine
        parts[4] = new StringPart("pdfGeneratorEngine", "IMAGE_MAGICK");

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            InputStream in = mPost.getResponseBodyAsStream();
            // output file path for saving results
            String outputFilePath = "C:\\sample\\serverOutput.zip";
            // retrieving the generated pdf files
            File f = new File(outputFilePath);
            FileOutputStream fos = new FileOutputStream(f);
            try {
                byte[] buf = new byte[1024];
                int len = in.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = in.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}
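The image processing samples save the response body as serverOutput.zip. Assuming the response really is a zip archive, as that file name suggests, its contents can be inspected with java.util.zip. The following is a minimal sketch. The class name and the in-memory sample archive are hypothetical and only there so the code can be tried without a server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for inspecting a zip returned by a web service call. */
final class ResponseZip {

    /** Returns the entry names found in the given zip bytes, in archive order. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny in-memory zip with placeholder content for demonstration. */
    static byte[] sampleZip(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("demo".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(entryNames(sampleZip("sample1.pdf", "sample2.pdf")));
    }
}
```

For a real response, the bytes would come from the saved serverOutput.zip file rather than from sampleZip.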

splitMultipageFile

This API splits a pdf or multipage tiff into single-page tiffs. It uses ImageMagick and Ghostscript to split the input file. The API has the following configuration parameters.
Web Service URL : http://{serverName}:{port}/dcma/rest/splitMultipageFile

Input Parameter | Values | Descriptions
inputParams | For ImageMagick: this value can be empty. Reference for ImageMagick parameters: http://www.imagemagick.org/script/command-line-options.php For Ghostscript: this value should not be empty. Reference for Ghostscript input parameters: http://ghostscript.com/doc/8.54/Use.htm#Output_device | This parameter is used for both ImageMagick and Ghostscript.
outputParams | For ImageMagick: this value can be empty. Reference for ImageMagick parameters: http://www.imagemagick.org/script/command-line-options.php | ImageMagick output parameters used for optimizing the output file.
isGhostscript | Either "true"/"false" | Specifies whether Ghostscript is used for splitting the pdf/multipage tiff into single-page tiffs.

Checklist:

  1. Input only tiff and pdf files.
  2. If "isGhostscript" is "true", only the input params take effect and only PDF files are split.
  3. If "isGhostscript" is "false", both the input params and the output params take effect.

Sample Input Used:

ephesoft-web-services\split-multipage-file.zip

Sample client code using apache commons http client:-

private static void splitMultiPageFile() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/splitMultipageFile";
    PostMethod mPost = new PostMethod(url);

    // pdf and multipage tif files to split
    File file1 = new File("C:\\sample\\sample.pdf");
    File file2 = new File("C:\\sample\\sample.tif");

    Part[] parts = new Part[5];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new StringPart("inputParams", "gswin32c.exe -dNOPAUSE -r300 -sDEVICE=tiff12nc -dBATCH");
        parts[3] = new StringPart("isGhostscript", "true");
        parts[4] = new StringPart("outputParams", "");

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        if (statusCode == 200) {
            InputStream in = mPost.getResponseBodyAsStream();
            // output file for saving the result
            File file = new File("C:\\sample\\serverOutput.zip");
            FileOutputStream fos = new FileOutputStream(file);
            try {
                byte[] buf = new byte[1024];
                int len = in.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = in.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
            System.out.println("Web service executed successfully..");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

createMultipageFile

This API creates a multipage tif/pdf using "Image MagicK", "IText" or "GhostScript". It accepts only tif/tiff files as images, together with an XML file that provides the input parameters. The API has the following configuration parameters.
Web Service URL :

http://{serverName}:{port}/dcma/rest/createMultiPageFile

 

Input Parameter | Values | Descriptions
imageProcessingAPI | Either "IMAGE_MAGICK"/"GHOSTSCRIPT"/"ITEXT" | Selects whether the pdf is generated using ImageMagick, iText or Ghostscript.
pdfOptimizationParams | This value should not be empty. Reference for Ghostscript input parameters: http://ghostscript.com/doc/8.54/Use.htm#Output_device | Ghostscript output parameters used for optimizing the output file.
multipageTifSwitch | Either "ON"/"OFF" | Generates a multipage tif along with the multipage pdf.
pdfOptimizationSwitch | Either "ON"/"OFF" | Generates an optimized pdf.
ghostscriptPdfParameters | This value should not be empty. Reference for Ghostscript input parameters: http://ghostscript.com/doc/8.54/Use.htm#Output_device | Ghostscript parameters used for creating the multipage pdf.

Checklist:

  1. Input only tiff files for processing and an xml file for the input parameters.
  2. ghostscriptPdfParameters takes effect only if "imageProcessingAPI" is "GHOSTSCRIPT".
  3. pdfOptimizationParams takes effect if "pdfOptimizationSwitch" is "ON".

Sample Input Used:

ephesoft-web-services\create-multipage-file.zip

Format for XML:
<WebServiceParams>

<Params>

<Param>

<Name>imageProcessingAPI</Name>

<Value>GHOSTSCRIPT</Value>

</Param>
<Param>

<Name>pdfOptimizationSwitch</Name>

<Value>on</Value>

</Param>

<Param>

<Name>pdfOptimizationParams</Name>

<Value>-q -dNODISPLAY -P- -dSAFER -dDELAYSAFER -- pdfopt.ps</Value>

</Param>
<Param>

<Name>multipageTifSwitch</Name>

<Value>on</Value>

</Param>

<Param>

<Name>ghostscriptPdfParameters</Name>

<Value>-dQUIET -dNOPAUSE -r300 -sDEVICE=pdfwrite -dBATCH</Value>

</Param>

</Params>

</WebServiceParams>
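The parameter file in the format above can be generated instead of written by hand. The following is a minimal sketch. The element names come from the format shown above, while the class name, the indentation and the escaping helper are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the WebServiceParams XML format shown above. */
final class WebServiceParamsXml {

    /** Renders one Param element per map entry, in iteration order. */
    static String build(Map<String, String> params) {
        StringBuilder xml = new StringBuilder("<WebServiceParams>\n  <Params>\n");
        for (Map.Entry<String, String> param : params.entrySet()) {
            xml.append("    <Param>\n")
               .append("      <Name>").append(escape(param.getKey())).append("</Name>\n")
               .append("      <Value>").append(escape(param.getValue())).append("</Value>\n")
               .append("    </Param>\n");
        }
        return xml.append("  </Params>\n</WebServiceParams>\n").toString();
    }

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("imageProcessingAPI", "GHOSTSCRIPT");
        params.put("pdfOptimizationSwitch", "on");
        params.put("multipageTifSwitch", "on");
        params.put("ghostscriptPdfParameters", "-dQUIET -dNOPAUSE -r300 -sDEVICE=pdfwrite -dBATCH");
        System.out.print(build(params));
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the generated file easier to compare with the sample.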
Sample client code using apache commons http client:-

private static void createMultiPage() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/createMultiPageFile";
    PostMethod mPost = new PostMethod(url);

    // XML file with the parameters
    File file1 = new File("C:\\sample\\WebServiceParams.xml");
    // tif files to process
    File file2 = new File("C:\\sample\\sample1.tif");
    File file3 = new File("C:\\sample\\sample2.tif");

    Part[] parts = new Part[3];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new FilePart(file3.getName(), file3);

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        if (statusCode == 200) {
            InputStream inputStream = mPost.getResponseBodyAsStream();
            // retrieving the result file
            File file = new File("C:\\sample\\serverOutput.zip");
            FileOutputStream fos = new FileOutputStream(file);
            try {
                byte[] buf = new byte[1024];
                int len = inputStream.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = inputStream.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
            System.out.println("Web service executed successfully..");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(statusCode + " *** " + mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Classification Web Service

classifyImage

This API classifies the input image according to the batch class identifier provided. It depends on three plugins: "CREATE_THUMBNAILS_PLUGIN", "CLASSIFY_IMAGES_PLUGIN" and "DOCUMENT_ASSEMBLER_PLUGIN". If a batch class does not have these plugins, the classifyImage API will not work.

Web Service URL : http://{serverName}:{port}/dcma/rest/classifyImage

 

Input Parameter | Values | Descriptions
batchClassId | This value should not be empty and should be a batch class identifier such as BC1. | Batch class identifier on which image classification is performed.

Sample Input Used:

ephesoft-web-services\classify-image.zip

Checklist:

  1. Input file should be single page tif/tiff file only.
  2. batchClassId should be valid batch class identifier and must have the “CREATE_THUMBNAILS_PLUGIN”, “CLASSIFY_IMAGES_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN”.

Sample client code using apache commons http client:-

private static void classifyImage() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/classifyImage";
    PostMethod mPost = new PostMethod(url);

    // tif file to process
    File file1 = new File("C:\\sample\\US-Invoice.tif");

    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // batch class identifier
        parts[1] = new StringPart("batchClassId", "BC1");

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

classifyHOCR

This API classifies the input HOCR according to the batch class identifier provided. It depends on the "SEARCH_CLASSIFICATION_PLUGIN" and "DOCUMENT_ASSEMBLER_PLUGIN" plugins and on the learning done on the batch class. If a batch class does not have these plugins, classify HOCR will not work.

Web Service URL : http://{serverName}:{port}/dcma/rest/classifyHOCR

 

Input Parameter | Values | Descriptions
batchClassId | This value should not be empty and should be a batch class identifier such as BC1. | Batch class identifier on which HOCR classification is performed.

Checklist:

  1. Input file should be html file only.
  2. batchClassId should be valid batch class identifier and must have the “SEARCH_CLASSIFICATION_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN”.

Sample Input Used:

ephesoft-web-services\classify-hocr.zip

Sample client code using apache commons http client:-

private static void classifyHocr() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/classifyHocr";
    PostMethod mPost = new PostMethod(url);

    // HTML (HOCR) file to process
    File file1 = new File("C:\\sample\\US-Invoice.html");

    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // batch class identifier
        parts[1] = new StringPart("batchClassId", "BC1");

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

classifyMultiPageHOCR

This API classifies the input HOCR files according to the batch class identifier provided. It depends on the "SEARCH_CLASSIFICATION_PLUGIN" and "DOCUMENT_ASSEMBLER_PLUGIN" plugins and on the learning done on the batch class. If a batch class does not have these plugins, classify HOCR will not work.

Web Service URL : http://{serverName}:{port}/dcma/rest/classifyMultiPageHocr

 

Input Parameter | Values | Descriptions
batchClassId | This value should not be empty and should be a batch class identifier such as BC1. | Batch class identifier on which HOCR classification is performed.

Checklist:

  1. Input file should be a zip file containing the HTML files.
  2. batchClassId should be valid batch class identifier and must have the “SEARCH_CLASSIFICATION_PLUGIN” and “DOCUMENT_ASSEMBLER_PLUGIN”.

Sample client code using apache commons http client:-

private static void classifyMultiPageHocr() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/classifyMultiPageHocr";
    PostMethod mPost = new PostMethod(url);

    // ZIP file to process
    File file1 = new File("D:\\sample\\New folder.zip");

    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // batch class identifier
        parts[1] = new StringPart("batchClassId", "BC1");

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        String responseBody = mPost.getResponseBodyAsString();
        System.out.println(statusCode + "***" + responseBody);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // release the connection even when the call fails
        mPost.releaseConnection();
    }
}

classifyBarcodeImage

This API classifies the input image according to the specified batch class. The image file should contain a barcode, and the barcode value should be a document type that is present in the batch class.

Web Service URL : http://{serverName}:{port}/dcma/rest/classifyBarcodeImage

 

Input Parameter | Values | Descriptions
batchClassId | This value should not be empty and should be a batch class identifier such as BC1. | Batch class identifier on which barcode classification is performed.

Checklist:

  1. Input file should be tif/tiff file only.
  2. batchClassId should be valid batch class identifier and must have the “BARCODE_READER_PLUGIN” .

Sample client code using apache commons http client:-

private static void classifyBarcodeImage() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/classifyBarcodeImage";
    PostMethod mPost = new PostMethod(url);

    // image file for barcode classification
    File file1 = new File("C:\\sample\\US-Invoice.tif");

    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // batch class identifier for which barcode classification is performed
        parts[1] = new StringPart("batchClassId", "BC1");

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Extraction Web Service

extractKV

This API extracts the document level fields for the Key Value pattern provided in the input XML. It takes the HOCR file as input. If the Key Value pattern is not found in the HOCR file, it creates empty document level fields.

Web Service URL : http://{serverName}:{port}/dcma/rest/extractKV

Batch Class List >>Recostar Mail Room [BC1] >>Application-Checklist >>Invoice Date >>New KV Extraction
KV Extraction.jpg

 

Input Parameter | Values | Descriptions
AdvancedKV | Either "true"/"false" | Specifies whether Key Value extraction is performed using advanced key value extraction.
LocationType | One of: TOP, RIGHT, LEFT, BOTTOM, TOP_RIGHT, TOP_LEFT, BOTTOM_LEFT, BOTTOM_RIGHT | Location, relative to the key pattern, at which the value pattern is fetched.
NoOfWords | Integer | Used only when AdvancedKV is false. Number of words at the RIGHT location to add to the result of the value pattern found in the HOCR.
KeyPattern | This value should not be empty and should be a valid regular expression. | Used to verify that the key pattern is present in the given HOCR.
ValuePattern | This value should not be empty and should be a valid regular expression. | Used to verify that the value pattern is present in the given HOCR for that particular key pattern.
KVFetchValue | One of: ALL, FIRST, LAST | Specifies whether the application fetches all, the first or the last value pattern found.
Multiplier | Float between 0 and 1 | Multiplied with the confidence to update the confidence of the fields extracted using advanced KV.
Length | Integer | Obtain the length value from the Ephesoft Admin screen shown in the screenshot above.
Width | Integer | Obtain the width value from the Ephesoft Admin screen shown in the screenshot above.
Xoffset | Integer | Obtain the xoffset value from the Ephesoft Admin screen shown in the screenshot above.
Yoffset | Integer | Obtain the yoffset value from the Ephesoft Admin screen shown in the screenshot above.
hocrFileName | String | Name of the HOCR file, in XML format, passed for processing.

Checklist:

  1. To use AdvancedKV, the user should have admin access in order to fetch accurate values of Length, Width, Xoffset and Yoffset. Before using AdvancedKV, test the image with the Ephesoft Admin screen and note the values of Length, Width, Xoffset, Yoffset and LocationType for the particular Key Value pattern.
  2. If AdvancedKV is true, NoOfWords is not used and all other parameters are used.
  3. If AdvancedKV is false, only NoOfWords, KeyPattern, ValuePattern and LocationType are used.
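Since KeyPattern and ValuePattern must be valid regular expressions, it can help to test them locally before calling the service. The following is a minimal sketch using java.util.regex. The class name is hypothetical, and matching a whole word against the pattern is an assumption: this document does not state how the server applies the patterns.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical local check for extractKV key and value patterns. */
final class KvPatternCheck {

    /** Returns true if the pattern is non-empty and compiles as a Java regular expression. */
    static boolean isValidRegex(String pattern) {
        if (pattern == null || pattern.isEmpty()) {
            return false;
        }
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Returns true if the whole word matches the pattern. */
    static boolean matchesWord(String pattern, String word) {
        return Pattern.matches(pattern, word);
    }

    public static void main(String[] args) {
        String valuePattern = "[a-zA-Z]{10,15}"; // ValuePattern from the XML sample in this section
        System.out.println(isValidRegex(valuePattern));
        System.out.println(matchesWord(valuePattern, "Application"));
    }
}
```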

Sample Input Used:

ephesoft-web-services\extractkv.zip
Format for XML:

<ExtractKVParams>

<Params>

<AdvancedKV>true</AdvancedKV>

<LocationType>BOTTOM_LEFT</LocationType>

<NoOfWords>0</NoOfWords>

<KeyPattern>APPLICATION</KeyPattern>

<ValuePattern>[a-zA-Z]{10,15}</ValuePattern>

<KVFetchValue>ALL</KVFetchValue>

<Multiplier>1</Multiplier>

<Length>384</Length>

<Width>251</Width>

<Xoffset>284</Xoffset>

<Yoffset>105</Yoffset>

</Params>

</ExtractKVParams>
Sample client code using apache commons http client:-

private static void extractKV() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractKV";
    PostMethod mPost = new PostMethod(url);

    // XML file with the input parameters
    File f1 = new File("C:\\sample\\extractKV.xml");
    // HOCR file to process
    File f2 = new File("C:\\sample\\Application-Checklist.xml");

    Part[] parts = new Part[3];
    try {
        parts[0] = new FilePart(f1.getName(), f1);
        parts[1] = new FilePart(f2.getName(), f2);
        parts[2] = new StringPart("hocrFileName", f2.getName());

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);

        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mPost.getResponseBodyAsString();
            // the result is returned in the response body
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

 

extractFixedForm

This API extracts the document level fields from the given RSP file and the image provided. This image should be tif/png.

Web Service URL :

http://{serverName}:{port}/dcma/rest/extractFixedForm

 

Input Parameter | Values | Descriptions
colorSwitch | Either "ON"/"OFF" | Specifies whether data is extracted from a color image or from a black and white image.
projectFile | This value should not be empty and should be a valid RecoStar project file name. | Project file used for HOCRing the image file.

Format for project file :

<_Project MajorRevision="6" MinorRevision="0" Timeout="180000">
  <_Collection Name="Libraries">
    <_Library Type="Dll" BaseName="ImageProcess"/>
    <_Library Type="Dll" BaseName="ImageProcess2"/>
    <_Library Type="Dll" BaseName="FormIdent"/>
    <_Library Type="Dll" BaseName="Recognition"/>
  </_Collection>
  <FormOperator Name="Operator" SetupImageFileName="" ProjectID="0" DefaultFormType="Voting_Pharmacy" ExternalFormType="" Country="USA" FormRegistration="Off" FormReading="true" FormGeometry="0 0 0 0 0 0" ResultCoordinates="OriginalImage" ResultImage="Off" ResultGraphicalObjects="false" PassThroughID="Ignore" DiagnosticsMode="OnError" DiagnosticsFileName="">
    <ImageSequence2Operator Name="ImageProcessing" SetupImageFileName="" RegisterImage="false" DiagnosticsMode="OnError" DiagnosticsFileName="" ConfigurationFileName="" Geometry="0 0 1488 0 0 1019">
      <LoadImageOperator Name="ImageSourceOperator" FileName="" FileFormat="Unknown" Resolution="ReadFromFile" UnifyResolution="false" RepairResolution="false" AutoRotate="false" IgnorePalette="true" ScaleToGray="0"/>
      <ExtractGrayFromRgbOperator Name="ColorFilterOperator" LumaRed="0.299" LumaGreen="0.587" LumaBlue="0.114"/>
      <BinarizeEdgeAdaptiveOperator Name="BinarizeOperator" EdgeThreshold="80" DoubleResolution="false"/>
      <_Collection Name="BinaryImageSequence">
        <DetectPaperAreaOperator Name="DetectPaperArea" KeepBlackFrame="false" SafetyClass="Medium" DetectTextSkew="false"/>
      </_Collection>
    </ImageSequence2Operator>
    <_Collection Name="Forms">
      <FormRecoOperator Name="Voting_Pharmacy" SetupImageFileName="" SetupImageWidth="125.98" SetupImageHeight="86.27">
        <_Collection Name="RecoOperators">
          <IcrField Name="Field1" Zone="440 1151 11269 796 0 1" ReaderSelection="Voter" Orientation="Normal" SyntaxMode="Alphanumerical" Font="Unknown" NumberOfLines="1" HandprintHeight="5.50" HandprintPitch="5.00" HandprintMinConfidence="100" MachinetypeHeight="Unknown" MachinetypePitch="Unknown" MachinetypeMinConfidence="100" LogicalContext="On" TrigramMode="On" DictionaryFileName="" DictionaryMode="Incomplete" DictionaryCandidates="Words" CharacterSet="" Pattern="^[$%*+,\-.0-9:;&lt;=&gt;?A-Z\\a-z]*$" LeftBoundaryHandling="On" TopBoundaryHandling="On" RightBoundaryHandling="On" BottomBoundaryHandling="On" Classifiers="" PassThroughID="None">
            <_Collection Name="IgnoreAreas"/>
          </IcrField>
          <IcrField Name="Field2" Zone="593 1930 11803 796 0 1" ReaderSelection="Voter" Orientation="Normal" SyntaxMode="Numerical" Font="Unknown" NumberOfLines="1" HandprintHeight="5.50" HandprintPitch="5.00" HandprintMinConfidence="100" MachinetypeHeight="Unknown" MachinetypePitch="Unknown" MachinetypeMinConfidence="100" LogicalContext="On" TrigramMode="On" DictionaryFileName="" DictionaryMode="Incomplete" DictionaryCandidates="Words" CharacterSet="" Pattern="^[$*+,\-.0-9\\]*$" LeftBoundaryHandling="On" TopBoundaryHandling="On" RightBoundaryHandling="On" BottomBoundaryHandling="On" Classifiers="" PassThroughID="None">
            <_Collection Name="IgnoreAreas"/>
          </IcrField>
        </_Collection>
      </FormRecoOperator>
    </_Collection>
    <FormGenerator Name="Generator"/>
  </FormOperator>
</_Project>
Sample for XML:

<WebServiceParams>

<Params>

<Param>

<Name>colorSwitch</Name>

<Value>off</Value>

</Param>

<Param>

<Name>projectFile</Name>

<Value>Fpr.rsp</Value>

</Param>

</Params>

</WebServiceParams>
Check List:

  1. projectFile should contain fields like those highlighted in yellow in the RSP sample above.
  2. If colorSwitch is ON, then the image should be a png.
  3. If colorSwitch is OFF, then the image should be a tif/tiff.
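
The colorSwitch rules in the checklist can be verified on the client before the request is sent. The following is only a minimal sketch: the class and method names are hypothetical (they are not part of the Ephesoft API), and it assumes that any colorSwitch value other than ON is treated as OFF.

```java
import java.util.Locale;

/** Hypothetical pre-flight check; not part of the Ephesoft API. */
class FixedFormInputCheck {

    /**
     * Returns true when the image extension matches the checklist:
     * colorSwitch ON expects png, colorSwitch OFF expects tif/tiff.
     * Assumption: any value other than "ON" is treated as OFF.
     */
    static boolean isImageAllowed(String colorSwitch, String imageName) {
        String name = imageName.toLowerCase(Locale.ROOT);
        boolean colorOn = "on".equalsIgnoreCase(colorSwitch.trim());
        if (colorOn) {
            return name.endsWith(".png");
        }
        return name.endsWith(".tif") || name.endsWith(".tiff");
    }

    public static void main(String[] args) {
        System.out.println(isImageAllowed("OFF", "sample.tif"));
        System.out.println(isImageAllowed("ON", "sample.tif"));
    }
}
```

The comparison ignores case, so the lower-case "off" used in the sample XML is accepted as well.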

Sample Input Used:

ephesoft-web-services\extract-fixed-form.zip

Sample client code using Apache Commons HttpClient:

private static void extractFixedForm() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFixedForm";
    PostMethod mPost = new PostMethod(url);
    // Adding the files to send: the parameter XML and the RSP project file.
    File file1 = new File("C:\\sample\\WebServiceParams.xml");
    File file2 = new File("C:\\sample\\Voting_Pharmacy.rsp");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            // The result is returned as the response body.
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}
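
Every client sample in this document handles the response status in the same way (200, 403, anything else). As an illustration only, that branching could be collected in one helper; the class and method names below are hypothetical and not part of Ephesoft or Commons HttpClient.

```java
/** Hypothetical helper mirroring the status handling used in the samples. */
class StatusMessages {

    /** Maps an HTTP status code and response body to the message the samples print. */
    static String describe(int statusCode, String responseBody) {
        if (statusCode == 200) {
            return "Web service executed successfully. " + statusCode + " *** " + responseBody;
        } else if (statusCode == 403) {
            return "Invalid username/password.";
        }
        // Any other status: the samples print the response body as returned by the server.
        return responseBody;
    }

    public static void main(String[] args) {
        System.out.println(describe(403, ""));
    }
}
```

A client method would then call StatusMessages.describe(statusCode, mPost.getResponseBodyAsString()) after executeMethod returns.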

extractFieldFromHocr

This API extracts the Key Value (KV) pattern for the given word from the given HOCR file.

Web Service URL:

http://{serverName}:{port}/dcma/rest/extractFieldFromHocr

| Input Parameter | Values | Descriptions |
|---|---|---|
| fieldValue | This should not be empty. | The word for which the Key Value pattern is extracted. |

Check List:

  1. fieldValue must contain the word for which the Key Value pattern is to be found.

Sample Input Used:

ephesoft-web-services\extract-field-from-hocr.zip

Sample client code using Apache Commons HttpClient:

private static void extractFieldFromHocr() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFieldFromHocr";
    PostMethod mPost = new PostMethod(url);
    // Adding the HOCR (HTML) file from which the field is extracted.
    File file1 = new File("C:\\sample\\Application-Checklist.html");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding the field value for extracting the Key Value pattern.
        parts[1] = new StringPart("fieldValue", "APPLICATION");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

extractFuzzyDB

This API creates the document-level fields for the given document type of the specified batch class, using the HOCR file passed to it.

Web Service URL:

http://{serverName}:{port}/dcma/rest/extractFuzzyDB

 

| Input Parameter | Values | Descriptions |
|---|---|---|
| documentType | Must not be empty and must be a valid document type for the batch class. | Used to generate document-level fields for the given document type. |
| batchClassIdentifier | Must not be empty and must be a valid batch class identifier. | Used to fetch the document information for the given document type. |
| hocrFile | Must not be empty and must have the same name as the HOCR file attached for processing. | Used to verify the HOCR file name. |

Check List:

  1. hocrFile should have the same name as the HOCR file passed for processing.
  2. The batch class identified by batchClassIdentifier should have the fuzzyDB plugin configured.
  3. The specified document type should have document-level fields defined.

Sample Input Used:

ephesoft-web-services\extract-fuzzy-db.zip

Sample client code using Apache Commons HttpClient:

private static void extractFuzzyDB() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFuzzyDB";
    PostMethod mPost = new PostMethod(url);
    // Adding the HOCR file for processing.
    File file = new File("C:\\sample\\Application-Checklist_000.html");
    Part[] parts = new Part[4];
    try {
        parts[0] = new FilePart(file.getName(), file);
        // Adding the parameter for the document type.
        parts[1] = new StringPart("documentType", "Application-Checklist");
        // Adding the parameter for the batch class.
        parts[2] = new StringPart("batchClassIdentifier", "BC1");
        // hocrFile must match the name of the attached HOCR file (see the checklist).
        parts[3] = new StringPart("hocrFile", file.getName());
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            // The result is returned as the response body.
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

barcodeExtraction

This API creates the document-level fields for the given document type of the specified batch class from the barcodes in the TIFF file passed to it.

Web Service URL :

http://{serverName}:{port}/dcma/rest/barcodeExtraction

 

| Input Parameter | Values | Descriptions |
|---|---|---|
| documentType | Must not be empty and must be a valid document type for the batch class. | Used to generate document-level fields for the given document type. |
| batchClassIdentifier | Must not be empty and must be a valid batch class identifier. | Used to fetch the document information for the given document type. |
| imageName | Must not be empty. | The image file on which the extraction operation is performed. |

Check List:

  1. The batch class identified by batchClassIdentifier should have the Barcode Extraction plugin configured.
  2. The specified document type should have document-level fields defined.
  3. The image name should have a valid extension, i.e. TIF/TIFF.

Sample Input Used:

ephesoft-web-services\barcodeExtraction.zip

Sample client code using Apache Commons HttpClient:

private static void barcodeExtraction() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/barcodeExtraction";
    PostMethod mPost = new PostMethod(url);
    // Adding the image file for processing.
    File file1 = new File("C:\\sample\\sample.tif");
    // Adding the XML file that carries the input parameters.
    File file2 = new File("C:\\sample\\WebServiceParams-barcodeExtraction.xml");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            System.out.println(mPost.getResponseBodyAsString());
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

regularRegexExtraction

This API extracts the document-level fields for the given document type of the specified batch class.

Web Service URL : http://{serverName}:{port}/dcma/rest/extractFieldsUsingRegex

 

| Input Parameter | Values | Descriptions |
|---|---|---|
| documentType | Must not be empty and must be a valid document type for the batch class. | Used to generate document-level fields for the given document type. |
| batchClassIdentifier | Must not be empty and must be a valid batch class identifier. | Used to fetch the document information for the given document type. |
| hocrFileName | Must not be empty. | Name of the XML file for which document-level fields will be extracted. |

Check List:

  1. The specified batch class should have the Regular Regex plugin configured.
  2. The specified document type should have document-level fields defined.
  3. The HOCR file name should have a valid extension, i.e. XML.

Sample Input Used:

ephesoft-web-services/regularRegexExtraction.zip

Sample client code using Apache Commons HttpClient:

private static void extractFieldsUsingRegex() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFieldsUsingRegex";
    PostMethod mPost = new PostMethod(url);
    // Adding the HOCR XML file for processing.
    File file1 = new File("C:\\sample\\sample1.xml");
    // Adding the XML file that carries the input parameters.
    File file2 = new File("C:\\sample\\WebServiceParams.xml");
    Part[] parts = new Part[3];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new StringPart("hocrFileName", file1.getName());
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            System.out.println(mPost.getResponseBodyAsString());
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

commonAPIForExtraction

This API requires a header in the client request that names the extraction to be performed. The remaining details for each individual API are given in the sections above.
Input for Extraction Type:

Pass the name of the extraction API to use in the client header. The supported values are:

  • BARCODE_EXTARCTION
  • RECOSTAR_EXTARCTION
  • REGULAR_REGEX_EXTRACTION
  • KV_EXTRACTION
  • FUZZY_DB
Here’s the sample client code using Regular Regex Extraction:
private static void extractFields() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/extractFields";
    PostMethod mPost = new PostMethod(url);
    // Adding the input file for processing.
    File file1 = new File("C:\\sample\\input\\sample1.html");
    // Adding the XML file that carries the input parameters.
    File file2 = new File("C:\\sample\\input\\WebServiceParams.xml");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        /* Pass the name of the extraction API to use:
           BARCODE_EXTARCTION
           RECOSTAR_EXTARCTION
           REGULAR_REGEX_EXTRACTION
           KV_EXTRACTION
           FUZZY_DB */
        Header header = new Header("extractionAPI", "REGULAR_REGEX_EXTRACTION");
        mPost.addRequestHeader(header);
        mPost.setRequestEntity(entity);
        // The request must be executed before the status code can be checked.
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            System.out.println(mPost.getResponseBodyAsString());
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}
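
Because the extraction type is passed as a plain string header, a misspelled value is easy to send. One possible safeguard, shown here only as a sketch, is to keep the documented values in an enum and check the header value before building the request. The enum and class names are hypothetical; the constant names are copied exactly as they appear in this document, including their spelling.

```java
/** Hypothetical holder for the extractionAPI header values listed in this document. */
class ExtractionApiHeader {

    /** Header name used by the sample client code. */
    static final String HEADER_NAME = "extractionAPI";

    /** Values copied verbatim from the documentation (spelling preserved). */
    enum ExtractionApi {
        BARCODE_EXTARCTION,
        RECOSTAR_EXTARCTION,
        REGULAR_REGEX_EXTRACTION,
        KV_EXTRACTION,
        FUZZY_DB
    }

    /** Returns true when the value is one of the documented extraction API names. */
    static boolean isSupported(String value) {
        for (ExtractionApi api : ExtractionApi.values()) {
            if (api.name().equals(value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(HEADER_NAME + ": " + ExtractionApi.REGULAR_REGEX_EXTRACTION.name());
    }
}
```

In the client, the header could then be created as new Header(ExtractionApiHeader.HEADER_NAME, ExtractionApi.REGULAR_REGEX_EXTRACTION.name()).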

OCR Web Service

createOCR

This API generates the OCR result for the specified sample image file. It works with tif and png files: for color images the application accepts a png/tif file as input, and for black and white images it processes a tif file as input.

Web Service URL :

http://{serverName}:{port}/dcma/rest/createOCR

 

| Input Parameter | Values | Descriptions |
|---|---|---|
| ocrEngine | Either "recostar" or "tesseract" | Configures the OCR engine to be used. |
| colorSwitch | Either "ON" or "OFF" | Switches color image processing on or off. |
| tesseractVersion | Currently the application supports "tesseract_version_3" | Specifies the Tesseract version to be used. |
| cmdLanguage | Either "tha" or "eng" | Configures the language that has been learnt by Tesseract. |
| projectFile | Can be empty when ocrEngine is tesseract | Validates the RSP file used for OCR when ocrEngine is recostar. |

Check List:

  1. If ocrEngine is recostar, then colorSwitch and projectFile are mandatory parameters.
  2. If ocrEngine is tesseract, then colorSwitch, tesseractVersion and cmdLanguage are mandatory parameters.
  3. If colorSwitch is ON, the input image can be tif/png.
  4. If colorSwitch is OFF, then the input image should be TIFF.
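
The first two checklist items can be expressed as a small client-side check that reports which mandatory parameters are missing for the chosen engine. This is a sketch under stated assumptions: the class and method names are hypothetical, and an absent or unrecognised ocrEngine is simply reported as a missing ocrEngine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side check for the createOCR checklist. */
class CreateOcrParamCheck {

    /** Returns the names of mandatory parameters that are absent or blank. */
    static List<String> missingParams(Map<String, String> params) {
        List<String> missing = new ArrayList<>();
        String engine = params.get("ocrEngine");
        String[] required;
        if ("recostar".equalsIgnoreCase(engine)) {
            required = new String[] {"colorSwitch", "projectFile"};
        } else if ("tesseract".equalsIgnoreCase(engine)) {
            required = new String[] {"colorSwitch", "tesseractVersion", "cmdLanguage"};
        } else {
            // Assumption: an absent or unknown engine is reported as a missing ocrEngine.
            missing.add("ocrEngine");
            return missing;
        }
        for (String name : required) {
            String value = params.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> params = new java.util.HashMap<>();
        params.put("ocrEngine", "recostar");
        params.put("colorSwitch", "off");
        System.out.println(missingParams(params));
    }
}
```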

Sample Input Used:

ephesoft-web-services\create-ocr.zip

File format for XML file:

<WebServiceParams>

<Params>

<Param>

<Name>ocrEngine</Name>

<Value>recostar</Value>

</Param>

<Param>

<Name>colorSwitch</Name>

<Value>off</Value>

</Param>

<Param>

<Name>tesseractVersion</Name>

<Value>tesseract_version_3</Value>

</Param>

<Param>

<Name>cmdLanguage</Name>

<Value>eng</Value>

</Param>

<Param>

<Name>projectFile</Name>

<Value>Fpr.rsp</Value>

</Param>

</Params>

</WebServiceParams>
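
The parameter file has a simple repeating Name/Value structure, so it can also be generated rather than written by hand. The following sketch builds the same layout from a map; the class and method names are hypothetical, and only the element names shown in the sample above are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the WebServiceParams XML layout shown above. */
class WebServiceParamsXml {

    /** Builds the XML text, one Param element per map entry, in insertion order. */
    static String build(Map<String, String> params) {
        StringBuilder xml = new StringBuilder("<WebServiceParams>\n  <Params>\n");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            xml.append("    <Param>\n")
               .append("      <Name>").append(escape(entry.getKey())).append("</Name>\n")
               .append("      <Value>").append(escape(entry.getValue())).append("</Value>\n")
               .append("    </Param>\n");
        }
        return xml.append("  </Params>\n</WebServiceParams>\n").toString();
    }

    /** Escapes the characters that are not allowed in XML element text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("ocrEngine", "recostar");
        params.put("colorSwitch", "off");
        params.put("projectFile", "Fpr.rsp");
        System.out.print(build(params));
    }
}
```

A LinkedHashMap keeps the parameters in the order they were added, which makes the generated file easier to compare with the sample.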
Format for RSP file:

<_Project MajorRevision="1" MinorRevision="0" Timeout="180000">
  <_Collection Name="Libraries">
    <_Library Type="Dll" BaseName="ImageProcess"/>
    <_Library Type="Dll" BaseName="ImageProcess2"/>
    <_Library Type="Dll" BaseName="Recognition"/>
  </_Collection>
  <FullPageOperator Name="Operator" SetupImageFileName="" Country="USA" TextReading="true" ResultCoordinates="OriginalImage" ResultImage="RecoImage" ResultGraphicalObjects="false" DiagnosticsMode="OnError" DiagnosticsFileName="">
    <ImageSequence2Operator Name="ImageProcessing" SetupImageFileName="" RegisterImage="true" DiagnosticsMode="OnError" DiagnosticsFileName="" ConfigurationFileName="" Geometry="-1 0 1683 0 1 2190">
      <LoadImageOperator Name="ImageSourceOperator" FileName="" FileFormat="Unknown" Resolution="200" UnifyResolution="true" RepairResolution="true" AutoRotate="false" IgnorePalette="false" ScaleToGray="0"/>
      <ExtractGrayFromRgbOperator Name="ColorFilterOperator" LumaRed="0.299" LumaGreen="0.587" LumaBlue="0.114"/>
      <BinarizeEdgeAdaptiveOperator Name="BinarizeOperator" EdgeThreshold="80" DoubleResolution="false"/>
      <_Collection Name="BinaryImageSequence">
        <RemoveShadingOperator Name="RemoveShading" MinRegionWidth="10.00" MinRegionHeight="3.00"/>
        <DetectPaperAreaOperator Name="DetectPaperArea" KeepBlackFrame="false" SafetyClass="Medium" DetectTextSkew="true"/>
        <BinaryAutoRotateOperator Name="AutoRotate" DocumentOrientation="Unknown" InputOrientation="MostlyCorrect"/>
        <ProtectBarCodesOperator Name="ProtectBarCodes" SafetyClass="Medium" SearchRegion=""/>
        <RemoveLineSystemOperator Name="RemoveLineSystem" HorizontalLineLength="10.00" VerticalLineLength="12.00" DashedLineLength="30.00" MaxLineWidth="1.50" MaxGapWidth="1.00" BoxSeparatorHeight="4.00" InvertedRegionWidth="12.00" InvertedRegionHeight="4.00" LineQuality="Medium"/>
      </_Collection>
    </ImageSequence2Operator>
    <LayoutOperator Name="LayoutOperator" FindTextBlocks="true"/>
    <FullPageField Name="TextField" Zone="0 0 0 0 0 1" ReaderSelection="Voter" Orientation="Normal" SyntaxMode="Alphanumerical" NumberOfLines="TextLineSegments" MachinetypeHeight="Unknown" MachinetypePitch="Unknown" MachinetypeMinConfidence="100" LogicalContext="On" TrigramMode="On" DictionaryFileName="" DictionaryMode="Incomplete" DictionaryCandidates="Words" CharacterSet="" Pattern="" PassThroughID="None"/>
    <FormGenerator Name="Generator"/>
  </FullPageOperator>
</_Project>
Sample client code using Apache Commons HttpClient:

private static void createOCR() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/createOCR";
    PostMethod mPost = new PostMethod(url);
    // Adding the image file for processing.
    File file1 = new File("C:\\sample\\sample1.tif");
    // Adding the XML file that carries the input parameters.
    File file2 = new File("C:\\sample\\WebServiceParams.xml");
    // Adding the RSP file used for creating the OCR result in case of recostar.
    File file3 = new File("C:\\sample\\Fpr.rsp");
    Part[] parts = new Part[3];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        parts[2] = new FilePart(file3.getName(), file3);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            InputStream in = mPost.getResponseBodyAsStream();
            // Saving the generated result.
            File outputFile = new File("C:\\sample\\serverOutput.zip");
            FileOutputStream fos = new FileOutputStream(outputFile);
            try {
                byte[] buf = new byte[1024];
                int len = in.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = in.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}
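
The createOCR sample saves the response stream to a zip file with a manual read/write loop. On Java 7 or later the same step could be written with try-with-resources so that both streams are always closed. The helper below is a sketch with hypothetical names; it only uses the standard java.io classes and is independent of the HTTP client.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper that writes a response stream to a file. */
class ResponseSaver {

    /** Copies the whole stream into the target file and returns the number of bytes written. */
    static long save(InputStream in, File target) throws IOException {
        long total = 0;
        // Both streams are closed automatically, even if the copy fails.
        try (InputStream source = in; OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = source.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("serverOutput", ".zip");
        tmp.deleteOnExit();
        long written = save(new ByteArrayInputStream(new byte[3000]), tmp);
        System.out.println(written + " bytes written to " + tmp.getName());
    }
}
```

In the sample above, the call would be ResponseSaver.save(mPost.getResponseBodyAsStream(), outputFile).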

Users and Groups Web Service

getBatchInstanceForRole

This API fetches the list of all batch instances accessible by the specified role. It is a GET API and works both from a web URL and from client code.
Web Service URL:

http://{serverName}:{port}/dcma/rest/getBatchInstanceForRole/{role}

| Input Parameter | Values | Descriptions |
|---|---|---|
| role | This value should not be empty. | The role name for which the batch instance list is fetched. |

Sample client code using Apache Commons HttpClient:

private static void getBatchInstanceForRole() {
    HttpClient client = new HttpClient();
    // URL for fetching the list of batch instances accessible by the specified role.
    String url = "http://localhost:8080/dcma/rest/getBatchInstanceForRole/admin";
    GetMethod getMethod = new GetMethod(url);
    try {
        int statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

getBatchClassForRole

This API fetches the list of all batch classes accessible by the specified role. It is a GET API and works both from a web URL and from client code.

Web Service URL:

http://{serverName}:{port}/dcma/rest/getBatchClassForRole/{role}

| Input Parameter | Values | Descriptions |
|---|---|---|
| role | This value should not be empty. | The role name for which the batch class list is fetched. |

Sample client code using Apache Commons HttpClient:

private static void getBatchClassForRole() {
    HttpClient client = new HttpClient();
    // URL for fetching the list of batch classes accessible by the specified role.
    String url = "http://localhost:8080/dcma/rest/getBatchClassForRole/admin";
    GetMethod getMethod = new GetMethod(url);
    try {
        int statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

Reporting Web Service

runReporting

This API runs reporting through web services. It takes the server-side installer path as input and synchronizes the report database.
Web Service URL:
http://{serverName}:{port}/dcma/rest/runReporting

| Input Parameter | Values | Descriptions |
|---|---|---|
| installerPath | This value should be a valid path. | The path of the build.xml file for reporting on the server side. |

Check List:
  1. The path should be a valid file path on the server that points to the build.xml file.
Sample Input Used:
ephesoft-web-services\run-reporting.zip
Format for inputXML file :
<ReportingOptions>
<installerPath>C:\\testing</installerPath>
</ReportingOptions>
Sample client code using Apache Commons HttpClient:

private static void runReporting() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/runReporting";
    PostMethod mPost = new PostMethod(url);
    // Adding the input XML that carries the installer path.
    File file1 = new File("C:\\sample\\reporting.xml");
    Part[] parts = new Part[1];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.out.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Batch Instance Management Web Service

restartBatchInstance

This API restarts a batch instance from the specified module. Users can restart only those batch instances that are accessible by their role. It is a GET API and works both from client code and from a web URL.

Web Service URL:

http://{serverName}:{port}/dcma/rest/restartBatchInstance/{batchInstanceIdentifier}/{restartAtModuleName}

| Input Parameter | Values | Descriptions |
|---|---|---|
| batchInstanceIdentifier | This value should be a valid batch instance identifier. | The identifier of the batch instance to be restarted. |
| restartAtModuleName | This value should not be empty. | The name of the module from which the batch is restarted. |

Check List:

  1. The batch instance identifier should be valid and accessible by the user authenticated for the web service.
  2. restartAtModuleName should be a valid module name; the available module names can differ between batch classes.
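
Several GET services in this document carry their inputs as path segments. As an illustration, the URL could be assembled by a small helper that rejects empty parameters and percent-encodes characters that are not legal in a path (for example a space in a role name). The class and method names are hypothetical; java.net.URI is the standard JDK class. Note that this sketch does not escape a "/" inside a parameter.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for building /dcma/rest/... URLs with path parameters. */
class RestUrlBuilder {

    /** Builds http://host:port/dcma/rest/{operation}/{param1}/{param2}... */
    static String build(String host, int port, String operation, String... pathParams)
            throws URISyntaxException {
        StringBuilder path = new StringBuilder("/dcma/rest/").append(operation);
        for (String param : pathParams) {
            if (param == null || param.trim().isEmpty()) {
                throw new IllegalArgumentException("Path parameter must not be empty");
            }
            path.append('/').append(param);
        }
        // The multi-argument URI constructor quotes characters that are illegal in a path.
        return new URI("http", null, host, port, path.toString(), null, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(build("localhost", 8080, "restartBatchInstance", "BI1", "Folder_Import_Module"));
    }
}
```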

Sample client code using Apache Commons HttpClient:

private static void restartBatchInstance() {
    HttpClient client = new HttpClient();
    // URL for restarting the batch instance from the specified module.
    // Only batch instances with status "ERROR", "READY_FOR_REVIEW", "READY_FOR_VALIDATION" or "RUNNING" can be restarted.
    String url = "http://localhost:8080/dcma/rest/restartBatchInstance/BI1/Folder_Import_Module";
    GetMethod getMethod = new GetMethod(url);
    try {
        int statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

deleteBatchInstance

This API deletes the batch instance with the specified batch instance identifier. Only a batch instance that is accessible by the authenticated user will be deleted.

Web Service URL:

http://{serverName}:{port}/dcma/rest/deleteBatchInstance/{identifier}

| Input Parameter | Values | Descriptions |
|---|---|---|
| batchInstanceIdentifier | This value should be a valid batch instance identifier. | The identifier of the batch instance to be deleted. |

Sample client code using Apache Commons HttpClient:

private static void deleteBatchInstance() {
    HttpClient client = new HttpClient();
    // URL for deleting a batch instance accessible by the authenticated user.
    // Only batch instances with status "ERROR", "READY_FOR_REVIEW", "READY_FOR_VALIDATION" or "RUNNING" can be deleted.
    String url = "http://localhost:8080/dcma/rest/deleteBatchInstance/BI1";
    GetMethod getMethod = new GetMethod(url);
    try {
        int statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

restartAllBatchInstance

This API restarts all batch instances that have the status READY_FOR_REVIEW or READY_FOR_VALIDATION and are accessible by the authenticated user. It is a GET API and works both from client code and from a web URL.

Web Service URL: http://{serverName}:{port}/dcma/rest/restartAllBatchInstance

Check List:

  1. Only batches that have the status READY_FOR_REVIEW or READY_FOR_VALIDATION and are accessible by the authenticated user will be restarted.

Sample client code using Apache Commons HttpClient:

private static void restartAllBatchInstance() {
    HttpClient client = new HttpClient();
    // URL for restarting the batch instances accessible by the authenticated user.
    // Only batch instances with status "READY_FOR_REVIEW" or "READY_FOR_VALIDATION" are restarted.
    String url = "http://localhost:8080/dcma/rest/restartAllBatchInstance";
    GetMethod getMethod = new GetMethod(url);
    try {
        int statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

addUserRolesToBatchInstance

This API adds a role to a batch instance. It takes the batch instance identifier and the role name as input and stores the assignment in the database. It is a GET API and works both from client code and from a web browser.
Web Service URL:

http://{serverName}:{port}/dcma/rest/addUserRolesToBatchInstance/{batchInstanceIdentifier}/{userRole}

| Input Parameter | Values | Descriptions |
|---|---|---|
| batchInstanceIdentifier | This value should be a valid batch instance identifier. | The identifier of the batch instance to which the role is added. |
| userRole | This value should not be empty. | The role to be added to the specified batch instance. |

Sample client code using Apache Commons HttpClient:

private static void addUserRolesToBatchInstance() {
    HttpClient client = new HttpClient();
    // URL for adding a user role to the batch instance.
    String url = "http://localhost:8080/dcma/rest/addUserRolesToBatchInstance/BI45/admin";
    GetMethod getMethod = new GetMethod(url);
    try {
        int statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

Batch Class Management Web Service

importBatchClass

This API is used for importing batch class to the ephesoft. This API takes XML for input parameters and exported batch class data as an input. Exported batch class is in zip format as exported by Ephesoft.
Web Service URL :

http://{serverName}:{port}/dcma/rest/importBatchClass

 

Input Parameter Values Descriptions
RolesImported Either “true”/”false” This value is used for importing roles with batch class or not.
EmailAccounts Either “true”/”false” This value is used for importing email accounts with batch class or not.
UseSource Either “true”/”false” This value is used for saving the information of source batch class to be imported
Name This value should not be empty This value is used to configure the batch class name of the imported batch class.
Description This value should not be empty This value is used to configure the description of the imported batch class.
Priority This value should lie in between 1 to 100. This value indicates the priority of batch class.
UseExisting Either “true”/”false” This value is used for overwrite the existing batch class with new batch class.
UncFolder This value should not be empty and have any string value that specified directory path These values specify the UNC folder path for batch class to be imported along with batch class.
Script This tag is configured for ScriptFile to be imported This tag is configured for which Script file to be imported
Folder This tag is configured for Folder to be imported This tag is configured for which folder to be imported along with batch class

Checklist:

  1. If UseExisting is “true”, the existing batch class is overwritten with the Folders and Scripts as well as the other parameters.
  2. If UseExisting is “false”, a new batch class is created and the Folders and Scripts selections are treated as false.
  3. If UseSource is “true”, the new batch class has the same Name, Description and Priority as the source batch class.
  4. If UseSource is “false”, the new batch class uses the Name, Description and Priority configured in the input XML.
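
The value rules from the parameter table and checklist above can be checked on the client before the request is sent. The following is a minimal sketch, not part of the Ephesoft API: the class name ImportOptionsCheck and its method are hypothetical, the Priority range is assumed to be inclusive, and Name, Description and Priority are assumed to matter only when UseSource is “false”.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper; it is not part of Ephesoft.
final class ImportOptionsCheck {

    /** Returns a list of problems; an empty list means the options look consistent. */
    static List<String> validate(boolean useSource, String name, String description,
                                 int priority, String uncFolder) {
        List<String> problems = new ArrayList<>();
        if (isBlank(uncFolder)) {
            problems.add("UncFolder must not be empty");
        }
        // Assumption: when UseSource is true the source batch class supplies
        // Name, Description and Priority, so they are only checked otherwise.
        if (!useSource) {
            if (isBlank(name)) {
                problems.add("Name must not be empty");
            }
            if (isBlank(description)) {
                problems.add("Description must not be empty");
            }
            // Assumption: the documented range 1 to 100 is inclusive.
            if (priority < 1 || priority > 100) {
                problems.add("Priority must be between 1 and 100");
            }
        }
        return problems;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Calling ImportOptionsCheck.validate(false, "BatchClassName", "Description", 10, "C:\\ephesoft-data\\Test-UNC") returns an empty list for the sample values used in this section.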

Sample input XML:

<ImportBatchClassOptions>
    <RolesImported>false</RolesImported>
    <EmailAccounts>true</EmailAccounts>
    <UseSource>false</UseSource>
    <Name>BatchClassName</Name>
    <Description>Description</Description>
    <Priority>10</Priority>
    <UseExisting>true</UseExisting>
    <UncFolder>C:\ephesoft-data\Test-UNC</UncFolder>
    <BatchClassDefinition>
        <Scripts>
            <Script>
                <FileName>ScriptDocumentAssembler.java</FileName>
                <Selected>true</Selected>
            </Script>
            <Script>
                <FileName>ScriptPageProcessing.java</FileName>
                <Selected>true</Selected>
            </Script>
        </Scripts>
        <Folders>
            <Folder>
                <FileName>image-classification-sample</FileName>
                <Selected>false</Selected>
            </Folder>
        </Folders>
        <BatchClassModules>
            <BatchClassModule>
                <ModuleName></ModuleName>
                <PluginConfiguration>true</PluginConfiguration>
            </BatchClassModule>
        </BatchClassModules>
    </BatchClassDefinition>
</ImportBatchClassOptions>
Sample client code using Apache Commons HttpClient:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;

private static void importBatchClass() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/importBatchClass";
    PostMethod mPost = new PostMethod(url);
    mPost.setDoAuthentication(true);
    // Input XML containing the import parameters.
    File file1 = new File("C:\\sample\\importbatchclass.xml");
    // Exported batch class zip file to be imported.
    File file2 = new File("C:\\sample\\BC1_050712_1714.zip");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        parts[1] = new FilePart(file2.getName(), file2);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Batch class imported successfully");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.out.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Always release the connection, whether or not the call succeeded.
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}
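
The importBatchClass sample above calls setDoAuthentication(true) and reports status 403 as an invalid username or password, but it does not show how credentials are supplied. The following is a minimal sketch that builds an HTTP Basic Authorization header value with the standard library. Whether the server accepts Basic authentication is an assumption, and the class name BasicAuthHeader is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; assumes the server accepts HTTP Basic authentication.
final class BasicAuthHeader {

    /** Builds the value of an "Authorization" header for the given credentials. */
    static String value(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

With Commons HttpClient, the resulting value could be attached before executing the request, for example with mPost.setRequestHeader("Authorization", BasicAuthHeader.value(user, password)), where user and password are placeholders for real credentials.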

exportBatchClass

This API exports an existing batch class. It takes the batch class identifier and flags that specify which learnt samples are exported with the batch class.
Web Service URL:

http://{serverName}:{port}/dcma/rest/exportBatchClass

 

Input Parameter | Values | Description
identifier | Must not be empty; must be a valid batch class identifier | Identifies the batch class to be exported.
lucene-search-classification-sample | “true” or “false” | Whether the Lucene learnt samples are exported with the batch class.
image-classification-sample | “true” or “false” | Whether the image classification samples are exported with the batch class.

Checklist:

  1. The identifier must be a valid batch class identifier.

Sample client code using Apache Commons HttpClient:

private static void exportBatchClass() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/exportBatchClass";
    PostMethod mPost = new PostMethod(url);
    mPost.addParameter("identifier", "BC1");
    mPost.addParameter("lucene-search-classification-sample", "true");
    mPost.addParameter("image-classification-sample", "false");
    int statusCode;
    try {
        statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Batch class exported successfully");
            // The response body is the exported zip; copy it to a local file.
            InputStream in = mPost.getResponseBodyAsStream();
            File f = new File("C:\\sample\\serverOutput.zip");
            FileOutputStream fos = new FileOutputStream(f);
            try {
                byte[] buf = new byte[1024];
                int len = in.read(buf);
                while (len > 0) {
                    fos.write(buf, 0, len);
                    len = in.read(buf);
                }
            } finally {
                if (fos != null) {
                    fos.close();
                }
            }
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

getBatchClassList

This API returns all batch classes accessible to the authenticated user. It is a GET API and can be called from client code or directly from a web URL.
Web Service URL:

http://{serverName}:{port}/dcma/rest/getBatchClassList
Sample client code using Apache Commons HttpClient:

private static void getBatchClassList() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/getBatchClassList";
    GetMethod mGet = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(mGet);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mGet.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mGet.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mGet != null) {
            mGet.releaseConnection();
        }
    }
}

 

getRoles

This API returns the roles of the specified batch class. It is a GET API and can be called from client code or directly from a web URL.

Web Service URL:

http://{serverName}:{port}/dcma/rest/getRoles/{batchClassIdentifier}

 

Input Parameter | Values | Description
identifier | Must not be empty; must be a valid batch class identifier | Identifies the batch class whose roles are fetched.

Checklist:

  1. The identifier must be a valid batch class identifier.

Sample client code using Apache Commons HttpClient:

private static void getRoles() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/getRoles/BC1";
    GetMethod mGet = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(mGet);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = mGet.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mGet.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mGet != null) {
            mGet.releaseConnection();
        }
    }
}

getAllModulesWorkflowNameByBatchClass

This API returns the module workflow names and the module names for the specified batch class identifier. It is a GET API and can be called from client code or directly from a web URL.

Web Service URL:

http://{serverName}:{port}/dcma/rest/getAllModulesWorkflowNameByBatchClass/{batchClassIdentifier}

 

Input Parameter | Values | Description
batchClassIdentifier | Must not be empty | Identifier of the batch class whose module names are fetched.

Sample client code using Apache Commons HttpClient:

private static void getAllModulesWorkflowNameByBatchClass() {
    HttpClient client = new HttpClient();
    // URL for fetching the module workflow names of the specified batch class identifier.
    String url = "http://localhost:8080/dcma/rest/getAllModulesWorkflowNameByBatchClass/BC1";
    GetMethod getMethod = new GetMethod(url);
    int statusCode;
    try {
        statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (getMethod != null) {
            getMethod.releaseConnection();
        }
    }
}

Uploading a Batch through a Web Service

uploadBatch

This API uploads a new batch to the watch folder of a given batch class. It executes the new batch with the supplied tif, tiff or pdf files. The user must be authorized to execute batches for that batch class; otherwise an error message is returned. All files are copied to the UNC folder of the requested batch class, into a folder with the name supplied by the user.

Web Service URL:

http://{serverName}:{port}/dcma/rest/uploadBatch/{batchClassIdentifier}/{batchInstanceName}

 

Input Parameter | Description
batchClassIdentifier | Identifier of the batch class into which the user wants to upload the batch.
batchInstanceName | Name under which the user wants to upload the batch.

Checklist:

  1. batchClassIdentifier is mandatory. It must be valid, and the user must have permission to run batches on that batch class.
  2. batchInstanceName is mandatory. If it is left empty, an error is returned.

Sample client code using Apache Commons HttpClient:

private static void uploadBatch() {
    HttpClient client = new HttpClient();
    // Replace the {placeholders} with a real batch class identifier and batch name.
    String url = "http://localhost:8080/dcma/rest/uploadBatch/{BatchClassIdentifier}/{BatchInstanceName}";
    PostMethod mPost = new PostMethod(url);
    // Image file to be processed.
    File file1 = new File("C:\\sample\\sample1.tif");
    Part[] parts = new Part[1];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        String responseBody = mPost.getResponseBodyAsString();
        // Print the status code and the response body.
        System.out.println(statusCode + "***" + responseBody);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}

Ephesoft UI Theme Configuration

Overview

The Ephesoft UI can be configured to use a color theme of the user’s choice. The user chooses three base colors, and they are applied throughout the application once the page is refreshed. Images for a particular theme must be modified and placed in that theme’s folder.

Configuration:

Folder Structure

A ‘themes’ folder inside the application contains the following:

  • theme.less
  • common.css
  • folders of respective themes

UI ThemeConfiguration.jpg

theme.less is the file where all color and image path configuration is made. It contains the following properties, which need to be configured:

  • @base: hexadecimal color codes for the tabs and sub-tabs of the application.

TabsBaseColour.jpg

  • @baseDark: hexadecimal color codes for the buttons and links in the application.

ButtonsBaseColour.jpg

 

  • @baseLight: hexadecimal color codes for backgrounds, selected document or folder and grid tables.

BackgroundColour.jpg

 

  • @gridRowSelectionColor: hexadecimal color code or color name for grid row selection.

GridRowSelectionColour.jpg

 

  • @overlayColor, @overlaySecondColor, @overlayThirdColor, @overlayFourthColor, @overlayBorderColor: hexadecimal color code or color name for the overlays shown on the RV screen and the Advanced KV Extraction screen.

To be specific:

@overlayColor: background color of the overlay on the RV screen, the overlay on Table Extraction, and the overlay generated on selecting a key/value while defining an advanced KV pair.

@overlaySecondColor: background color of the overlay generated on capturing a key while defining an advanced KV pair.

@overlayThirdColor: background color of the overlay generated on capturing a value while defining an advanced KV pair.

@overlayFourthColor: background color of the overlay generated on a captured value in an extracted table on the RV screen.

@overlayBorderColor: color of all overlay boundaries.
OverlayBoundaryColour.jpg

 

  • @themeImagePath: path of the image folder for the respective theme. For example, for Default_theme: default_theme/images/

After making these changes, the user needs to clear the browser cache and refresh the page to see them.

 

Function Keys

Overview

This functionality lets application users (mainly review operators) customize the RV screen by adding shortcuts that perform specific operations. The user can run a script method, as needed, simply by pressing a key.

 

  • Parameters involved:
    • Method name: defines the name of the method in the script which should be executed upon usage.
    • Key: the function key associated with the method. Can be used as a shortcut.
    • Description: contains the user’s description for the method.

AddFunctionKey.jpg

  • Sample values:

FunctionKeySampleValues.jpg

Characteristics

  • The functionality allows one to customize the RV screen with shortcut keys that perform user-defined functions.
  • The user can associate a function key with a particular method in the script ‘ScriptFieldValueChange.java’ located at ‘{Ephesoft-Home}\SharedFolders\BCID\scripts’.
  • The user can run the script either by pressing the function key or by clicking the function key button on the Review Validate UI.
  • Only one method can be associated with a particular key, but the same method can be assigned to multiple keys.
  • Any key from F1 to F11 can be chosen, except F5.
  • The user can add a description of the script associated with a key.
  • Function keys are document type specific. They are displayed on the RV screen only if the selected document type has function keys defined for the batch class.
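
The key rules listed above (F1 to F11 except F5, one method per key, the same method allowed on several keys) can be sketched as follows. This only illustrates the rules; it is not Ephesoft code, and the class FunctionKeyBindings and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the function key rules; not part of Ephesoft.
final class FunctionKeyBindings {

    private final Map<String, String> methodByKey = new HashMap<>();

    /** True for "F1" to "F11", except "F5". */
    static boolean isAssignable(String key) {
        if (key == null || !key.matches("F([1-9]|1[01])")) {
            return false;
        }
        return !key.equals("F5");
    }

    /**
     * Binds a key to a script method name. Returns false if the key is not
     * assignable or already has a method. The same method name may be bound
     * to any number of different keys.
     */
    boolean bind(String key, String methodName) {
        if (!isAssignable(key) || methodByKey.containsKey(key)) {
            return false;
        }
        methodByKey.put(key, methodName);
        return true;
    }

    /** Returns the method bound to the key, or null if none is bound. */
    String methodFor(String key) {
        return methodByKey.get(key);
    }
}
```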

Working

  • When a batch reaches the Review/Validation stage, the user can either press the function key to run a particular method or click the function key button displayed in the 2nd panel.

FunctionKey RVScreen.jpg

 

  • A dialog box saying ‘Executing script’ will appear. By the time it goes off, user’s script has been executed.

FunctionKey ExecutingScript.jpg

Grid Computing

Overview

Ephesoft supports distributed computing through grid computing enabled workflows. Batch classes with this feature can transfer batches from one independent Ephesoft server to another over the network (internet or intranet), provided that the batch class is the same on both systems. The Ephesoft servers transfer the data using web services and FTP.

Configuration

Configurable Properties

The configurable properties for this feature are listed below:

  • FTP Configuration

Create the FTP connection used to transfer files to, and retrieve files from, the FTP server.

It uses the following properties for data transmission:

(META-INF\dcma-ftp\dcma-ftp.properties)

 

Configurable property | Type of value | Value options | Description
ftp.server.url | String | Any valid URL. Default value: ftp.yourFTPserver.com | The URL of the FTP server.
ftp.server.username | String | Any string. Default value: ephesoft | Username used for accessing the server.
ftp.server.password | String | Any string. Default value: ******** | Password used for authenticating the user.
ftp.number_of_retries | Integer | Any integer value. Default value: 3 | Number of retries a client makes if an exception occurs while transferring or retrieving a file over FTP.
ftp.upload_base_dir | String | Any valid directory path. Default value: test | Folder on the FTP server to which data is uploaded and from which data is downloaded.
ftp.data_timeout | Integer | Any integer value. Default value: 600000 | Maximum time allowed for a data transfer, in milliseconds. If the data is not uploaded or downloaded within this interval, the transfer is stopped.
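
The ftp.number_of_retries property describes a retry-on-exception policy. The following is a minimal sketch of such a policy, not Ephesoft’s actual implementation. The class RetrySketch is hypothetical, and it assumes that the configured number counts additional attempts after the first one.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration of a retry policy; not Ephesoft's implementation.
final class RetrySketch {

    /**
     * Runs the task once and, if it throws, retries it up to maxRetries more
     * times. Rethrows the last exception when every attempt has failed.
     */
    static <T> T withRetries(int maxRetries, Callable<T> task) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

With the default value of 3, a transfer wrapped this way would be attempted at most four times before the failure is reported.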

Assumptions:

  • The directory named by the upload_base_dir property must exist on the FTP server.
  • The username and password must be valid.
  • The same batch class must be available on all Ephesoft server instances.

  • dcma-workflow.properties Configuration

Web service related configuration must be provided in the dcma-workflow.properties file.

 

Configurable property | Type of value | Value options | Description
dcma.batch.status.cronjob.expression | String | Any valid cron expression. Default value: 0 0/1 * ? * * | Cron job for polling the status of remote batches. Defines the interval after which a source machine checks the remote location for the result or batch instance status.
wb.hostURL | String | Any valid HTTP URL. Default value: http://172.16.1.68:8080/dcma/rest | Host URL in the format http://LocalHostAddress:port/dcma/rest.
wb.folderPath | String | Any valid folder path. Default value: test | Server folder path.

Assumptions:

  • wb.hostURL should look like http://172.16.1.68:8080/dcma/rest.
  • The hostURL must contain a unique IP address or domain name.
  • wb.folderPath must be the same as ftp.upload_base_dir.

  • Web based Configuration

BatchClassManagementBatchClassModuleEdit

EditBatchClassModule.jpg

On clicking the Edit button, the following screen is presented, where the user can configure the grid computing properties:
ConfigureGridComputingProperties.jpg

For each module that should execute on a remote server, map the remote URL and the remote batch class identifier. This mapping applies to every module except the Folder Import module.

Constraints

  • A user can restart a batch from a module that is executing on the local system.
  • A user cannot restart batches that are executing remotely.
  • A batch instance cannot be deleted by any user once it has been transferred from one system to another.

Troubleshooting

The following are a few common error messages received when the feature malfunctions:

S no. | Issue | Possible root cause
1 | Source directory is null | wb.folderPath and ftp.upload_base_dir: both paths must be valid and identical.
2 | Destination directory is null | wb.folderPath and ftp.upload_base_dir: both paths must be valid and identical.
3 | Invalid Connection to FTP server | Invalid attempt to make an FTP connection.
4 | Error in generating output Stream for file | Invalid output file name.
5 | TargetServerURL is null | Check the remote URL entry for the batch.
6 | BatchInstanceId is null | Check the database connection or network.
7 | BatchClassId is null | Check the remote batch class ID entry for the batch.
8 | SourceServerURL is null | hostURL is not mapped in the property files.
9 | FolderPath is null | folderPath is not mapped in the property files.
10 | moduleName is null | Check the database connection or network.
11 | batchName is null | Batch name is not found in the batch xml.
12 | Exception in transferring batch to remote location | One of the errors 5 to 11 must have caused this.
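
Several rows in the table above trace back to property values that are missing or inconsistent. The following is a minimal sketch of a pre-flight check over the two property sets described earlier. The class GridConfigCheck is hypothetical, and the sketch assumes the caller has already loaded dcma-ftp.properties and dcma-workflow.properties into java.util.Properties objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; not part of Ephesoft.
final class GridConfigCheck {

    /** Returns a list of configuration problems; empty means none were found. */
    static List<String> check(Properties ftp, Properties workflow) {
        List<String> problems = new ArrayList<>();
        String ftpDir = ftp.getProperty("ftp.upload_base_dir", "").trim();
        String wbDir = workflow.getProperty("wb.folderPath", "").trim();
        String hostUrl = workflow.getProperty("wb.hostURL", "").trim();

        if (ftpDir.isEmpty()) {
            problems.add("ftp.upload_base_dir is not set");
        }
        if (wbDir.isEmpty()) {
            problems.add("wb.folderPath is not set");
        }
        // The two paths are documented as having to be the same.
        if (!ftpDir.isEmpty() && !wbDir.isEmpty() && !ftpDir.equals(wbDir)) {
            problems.add("wb.folderPath and ftp.upload_base_dir differ");
        }
        if (hostUrl.isEmpty()) {
            problems.add("wb.hostURL is not set");
        }
        return problems;
    }
}
```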


Installer Upgrade

Overview

This document describes the step-by-step procedure for upgrading Ephesoft on a machine. Refer to it when upgrading the Ephesoft application with the Ephesoft installer.

 

Steps of execution

Follow these steps to upgrade an existing installation with installer version 3.0.3.0 or later:

 

  • Points to check before upgrade:

a. If Ephesoft is running under JavaAppServer (Apache Tomcat), stop the server before upgrading.
b. Close every file and folder inside the Ephesoft install directory (the path of the previous installation), such as dcma-all.log, before upgrading.
c. Make sure that the Windows installation drive (the C drive in most cases) has enough free space for the installer setup file to be extracted. The same applies to the drive where the previous Ephesoft application is installed.

 

  • First Step –

Run the command prompt as administrator and execute the command shown in the screenshot below:
Installer upgrade cmd.jpg
In this command, “D:\Ephesoft_3.0.3.2.msi” is the path of the Ephesoft installer on the user’s system. The command starts the Ephesoft installer, and the following screen is displayed:
Installer upgrade welcome screen.jpg
Click the ‘Next’ button. The following screen is displayed:
Installer upgrade license agreement.jpg
Accept the Ephesoft end user agreement by selecting the check box, then click the ‘Next’ button.

 

  • Second step –

After the license agreement screen, the following screen is displayed:
Installer upgrade upgrade.jpg
If this checkbox is checked before clicking ‘Next’, the database patch is executed on the database when the user starts Ephesoft after a successful upgrade. If it is unchecked, the database patch is not executed.
Uncheck this checkbox when upgrading an Ephesoft installation in a multi-server environment where the database patch has already been executed on the common database by another Ephesoft installation.
For a standalone Ephesoft application, this checkbox must be checked before clicking the ‘Next’ button.

 

  • Third step –

After clicking the ‘Next’ button on the Ephesoft Upgrade Installation screen, the following screen is displayed if some files or folders are in use or the installer does not have sufficient privileges to perform Windows service operations:
Installer upgrade popup.jpg
Close all such files and folders and re-run the installer with administrator privileges.

 

  • Fourth step –

After completing all these steps, the following screen is displayed:
Installer upgrade ready install.jpg
Click the ‘Install’ button and the installer does the rest of the work.

 

  • Fifth step –

After the installation completes, install the latest Ephesoft license and restart the machine.

 

  • Sixth step –

Start the EphesoftEnterprise and EphesoftWebService services manually, because the installer cannot start them. The upgrade procedure is now complete.

Integrating External Application with Ephesoft Validation

Overview

Ephesoft allows its customers to develop external modules or applications and integrate them with Ephesoft. This document describes in detail how to integrate external modules with the Ephesoft Validation module.

External modules and applications are technology independent. They can be written in HTML/JavaScript, GWT, JSP/Servlet, or a combination of these.

Review Validation Screen

AppsOnRV.jpg

Shortcut keys and buttons are defined on the Review Validate UI to launch an external application for a batch (App1, App2, App3 and App4 in the screenshot above).

When the shortcut key or the App button is pressed, the integrated application or module is displayed on the Review Validation screen as a modal window.

In the image below, the right-hand side shows the image of the documents, and the left-hand side shows the integrated application or module.
External Application

ExternalApplication.jpg
IMPORTANT: batch.xml must be updated by the external application. Ephesoft simply loads the batch.xml.

Configuration

Follow these steps to integrate an external application with Ephesoft validation:

  • Assume the external application is available at http://localhost:8080/dcma/ExternalApp.html.
  • Log in to the Ephesoft Admin Module (Batch Class Management).
  • Navigate to Batch Class -> Modules -> Review/Validate Document module -> Validate Document plugin.

ReviewDocumentPluginConfiguration.jpg

Closing the external application modal window:

Earlier, as the screenshot of the external application on the review-validate screen shows, two buttons were provided for closing the external application.

The OK button refreshed the review-validate screen and displayed the updated content of the batch that had been modified through the external application.

The CLOSE button returned to the review-validate screen without refreshing the content of the batch, on the assumption that the external application had made no changes to batch.xml.

These extra clicks for refreshing the screen or closing the pop-up dialog window are no longer needed.

Both functions (refreshing the screen after batch.xml updates, or closing the pop-up window without refreshing the Review Validate screen) are now implemented by the third-party application. Ephesoft provides a handle through which the two applications can communicate.

The OK and CLOSE buttons have been removed. External applications need to copy the method shown below into their code and invoke it from their own OK or Close button handlers. The external application signals Ephesoft to perform the respective operation by passing the appropriate operation string as the method argument. Accepted operation strings are listed in the table below.

Method code for GWT based applications:
private native void fireEvent(String operation) /*-{
    window.top.postMessage(operation, "*");
}-*/;
Method code for JavaScript based applications:
function fireEvent(operation) {
    window.top.postMessage(operation, "*");
}

Ephesoft performs the following action depending on the argument passed to this method by the external application:

Argument passed by external application | Result in Ephesoft
“Save” | The dialog box containing the external application closes, and the changes made in batch.xml are reflected on the review-validate screen. (This was previously done by the OK button.)
“Cancel” | The dialog box containing the external application closes without refreshing the RV screen. (This was previously done by the CLOSE button.)
Any other string | No change. The dialog box does not close.

How to use the External Application:

The external application is expected to work with the data of the batch currently displayed on the review-validate screen, that is, the batch for which the external application was launched.

Ephesoft therefore appends the following two parameters to the external application’s URL. The external application can use them to fetch, modify or delete the contents of the batch:

  • Path of the batch.xml for the current batch: The batch.xml contains the information about the batch. Ephesoft provides the path of batch.xml, which the external application can parse and modify as needed.

The parameter is specified in the URL as “batch_xml_path”.
Encoding of the batch xml path parameter:

The batch xml path is encoded using java.net.URLEncoder with UTF-8 encoding.

 

  • Document Identifier: The identifier of the document in focus is also passed onto the external application. The parameter is specified in the URL by: “document_id”

Sample URL fired for an external app by Ephesoft:

{Ext. App URL}&document_id={Document Identifier}&batch_xml_path={Path of batch.xml}&ticket={Security Token}

Or

{Ext. App URL}?document_id={Document Identifier}&batch_xml_path={Path of batch.xml}&ticket={Security Token}
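
A Java based external application could read these parameters from the query string as sketched below. This is a minimal illustration, not Ephesoft code: the class ExternalAppParams is hypothetical, and it assumes the application has access to the raw query string (the part of the URL after “?”). The values are decoded with java.net.URLDecoder and UTF-8, mirroring the encoding described above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for an external application; not part of Ephesoft.
final class ExternalAppParams {

    /** Splits a raw query string into decoded name/value pairs. */
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip fragments without a name
            }
            params.put(decode(pair.substring(0, eq)), decode(pair.substring(eq + 1)));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The application would then look up “document_id”, “batch_xml_path” and “ticket” in the returned map.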

 

External Application and Security:

Ephesoft generates a dynamic token for every external application window opened from the Ephesoft application. The token is sent to the external application as an additional “ticket” parameter in its URL. Once the external application receives the token, it can call the URL below to check its authenticity. (Note: send the token exactly as received.)

http://{EphesoftServerIP}:{port}/dcma/authenticate?ticket={ticket}

The Ephesoft server responds with a status code indicating whether the ticket is valid:

200 – Authorized

401 – Unauthorized

The token is issued as soon as the user opens the external application window. A valid token becomes invalid when either of the following happens:

  • The token has already been sent to the Ephesoft server for authentication.
  • One hour has passed since the token was issued.
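
The ticket check described above can be sketched with the standard library as follows. The class TicketCheck and its method names are hypothetical, and the host and port are supplied by the caller. Because a ticket becomes invalid once it has been sent for authentication, verify should be called only once per ticket.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for an external application; not part of Ephesoft.
final class TicketCheck {

    /** Builds the authentication URL; the ticket is passed on exactly as received. */
    static String authenticateUrl(String host, int port, String ticket) {
        return "http://" + host + ":" + port + "/dcma/authenticate?ticket=" + ticket;
    }

    /** Status 200 means authorized; 401 (or anything else) is treated as not authorized. */
    static boolean isAuthorized(int statusCode) {
        return statusCode == 200;
    }

    /** Sends the ticket to the Ephesoft server once and reports whether it is valid. */
    static boolean verify(String host, int port, String ticket) throws IOException {
        URL url = new URL(authenticateUrl(host, port, ticket));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("GET");
            return isAuthorized(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```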

Configuring the Title of External Applications through the admin UI

The application URL is no longer shown in the title of the external application window. The title is now configurable through the Admin UI.

 

Learning

Overview

A well-formed set of HOCR XML files, placed in a hierarchical structure (Batch Class > Document type > Page type), is used to register a few standard HOCR XML documents with the Lucene search engine. This process is called learning because it feeds the XML files into Lucene so that, when a batch instance arrives, it can be compared with these learned documents to find the best match. Learning is a one-time process that applies to all batch instances, and it speeds up processing.

Steps of learning

  • First create the document type that Ephesoft has to recognize. Suppose the user has created the HUD-1 document type in batch class BC1.
  • Edit BC1 and click the ‘Generate Folder’ button.

LearningGenerateFolder.jpg

  • Browse to the “Ephesoft-install-dir\SharedFolders\BC1\lucene-search-classification-sample” folder. It contains the following three subfolders:
    • HUD-1_First_Page
    • HUD-1_Last_Page
    • HUD-1_Middle_Page

LearningSharedFolders.jpg
The first and last pages of the document go in HUD-1_First_Page and HUD-1_Last_Page respectively, and all other pages of the document go in HUD-1_Middle_Page.

In the provided sample, image 000001 is the first page and image 000002 is the last page of the HUD-1 document type. All other pages belong to different document types. The sample has no middle pages for the HUD-1 document type.

 

  • Click the Learn Files button.

Learning LearnFiles.jpg
Ephesoft has now learned the HUD-1 document type and is ready to use it.

Troubleshooting

The following is a common error message received when learning fails:

 

S no. | Error message | Possible root causes
1 | Problem occurred while learning/Problem learning files. | See the list below.

Possible root causes for error 1:

  • Network connection failure.
  • Multiple networks connected to the system, for example LAN and WLAN connected at the same time.
  • License is not installed or is invalid.
  • Tomcat is not up.

 

Multi word data population in a DLF

Overview

This functionality allows a user to select multiple words from the image on the Validation screen and populate them into the selected DLF. It works like common word processing applications, where Control is used to select multiple separate values and Shift is used to select a range of values.

Characteristics

  • Control functionality allows the user to select multiple words from distant places in the image.
  • Shift functionality allows the user to select all text occurring between two selected words.
  • The user can also define a rectangular area over the 3rd panel image using the right mouse button. All text inside or overlapping the defined area is populated into the selected DLF.
  • This functionality is helpful when the data for a particular field is split into several parts in the image.
  • The functionality works like common word processing applications.

Steps of execution

  • When a user opens a batch on Validate screen, he can use any of the above defined methods to populate multiple data in selected DLF.
  • For this, user first needs to select the DLF (Document Level Field) to which the data needs to be populated. After that, user can use any of the 3 options:
    • Control functionality:

User needs to press the Ctrl key first and then select multiple values from distant places in the image. Selected values will be displayed populated in the selected DLF (See screenshot)

MultiwordDataInDLF.jpg

In above example, user first presses Ctrl key and then using mouse, he clicks ‘Field1’ and then ‘456’. Resultant data gets populated in the selected DLF.

Note: Only data selected while the Ctrl key is held down is populated in the DLF. Once the user releases the Ctrl key, data selected afterwards (even after pressing Ctrl again) is not concatenated to the existing data.

    • Shift functionality:

User needs to press the Shift key first and then select 2 values from distant places in the image. All data occurring in between the selected values will be displayed populated in the selected DLF (See screenshot)

MultiwordInDLF ShiftFunctionality.jpg

In above example, user first presses Shift key and then using mouse, he clicks ‘Field1’ and then ‘456’. Resultant data gets populated in the selected DLF.

    • Area selection:

To populate all the data occurring inside a particular area, user can draw a rectangle over the 3rd panel image. All overlapping data will be populated in the selected DLF as displayed in screenshot.

MultiwordInDLF AreaSelection.jpg
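The three selection modes can be illustrated with a small model of words and their bounding boxes. This is only a sketch of the behaviour described in this section, not Ephesoft code: the `Word` class, the pixel coordinates, joining words with a single space, and treating the two Shift-clicked words as included in the result are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR word with its bounding box (hypothetical pixel coordinates)."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def ctrl_select(words, clicked_indexes):
    """Ctrl: concatenate only the clicked words, in click order."""
    return " ".join(words[i].text for i in clicked_indexes)


def shift_select(words, first, second):
    """Shift: take every word between the two clicked words (inclusive)."""
    lo, hi = sorted((first, second))
    return " ".join(w.text for w in words[lo:hi + 1])


def area_select(words, x0, y0, x1, y1):
    """Area: take every word lying inside or overlapping the rectangle."""
    def overlaps(w):
        return w.x0 <= x1 and w.x1 >= x0 and w.y0 <= y1 and w.y1 >= y0
    return " ".join(w.text for w in words if overlaps(w))


# Words in reading order, loosely modelled on the screenshots.
words = [
    Word("Field1", 10, 10, 60, 20),
    Word("123", 70, 10, 95, 20),
    Word("456", 105, 10, 130, 20),
]

ctrl_text = ctrl_select(words, [0, 2])        # "Field1 456"
shift_text = shift_select(words, 0, 2)        # "Field1 123 456"
area_text = area_select(words, 65, 5, 110, 25)  # "123 456"
print(ctrl_text, "|", shift_text, "|", area_text)
```

The rectangle in the last call fully contains "123" and only partly covers "456", which is still selected because overlapping text is included.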

Performance Reporting

Overview

This standalone module enables the user to generate execution reports on the basis of the batch class, user etc. Reports can be calculated on a per-module, per-plugin or per-user basis. There are two ways of generating reports:

  • Through UI
  • Manually through scripts

The module aggregates the report-data on the basis of user’s choice parameters.

Admin has the options of generating reports for module, plugin or all users.

Admin can:

  • Get reports per page for a Workflow Type for a specified time.
  • Get reports per page for a User for a specified time.
  • Get total records for a Workflow Type for a specified time
  • Get total records for a User for a specified time.

Configuration

Property File Configurations

Property file: ‘{Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-data-access/dcma-db.properties’:-

 

Configurable property Type of value Value options Description
hibernate.connection.username String Root Database’s username
hibernate.connection.password String **** Database’s password
hibernate.connection.url String jdbc:mysql://localhost:3306/report DBMS specific connection URL
hibernate.connection.driver_class String com.mysql.jdbc.Driver DBMS specific class driver
hibernate.show_sql Multi select True/False Option to write SQL commands to the log file
hibernate.dialect String Dialect class name DBMS specific query dialect

Property File: ‘{Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-performance-reporting/dcma-report-db.properties’

This property file needs to be configured for connecting to reports database. By default, it is configured to point to the reports database created by Ephesoft. If user wants to use different database, this property file needs to be configured accordingly:

 

Configurable property Type of value Value options Description
hibernate.connection.password String NA Password of reports database.
hibernate.connection.username String NA Username of reports database.
hibernate.connection.url String NA Connection string for reports database. Example: jdbc:mysql://localhost:3306/reports
hibernate.connection.driver_class String For MySQL: com.mysql.jdbc.Driver. For MS SQL: jdbc:jtds:sqlserver://localhost;databaseName=reports;user=ephesoft;password=Password## Driver required for connecting to the database. Example: for MySQL, it should be set to com.mysql.jdbc.Driver
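Before restarting the application, a quick check can confirm that the four connection properties are present and non-empty. The following is a minimal sketch under stated assumptions: the parser is deliberately simplified (it handles only `key=value` lines and comment lines, not the full Java properties syntax), and the sample values such as `report_user` are hypothetical.

```python
# Keys taken from the property table for the reports database.
REQUIRED_KEYS = (
    "hibernate.connection.username",
    "hibernate.connection.password",
    "hibernate.connection.url",
    "hibernate.connection.driver_class",
)


def parse_properties(text):
    """Parse simple key=value lines; comments and blank lines are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props):
    """Return required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Hypothetical file content with the password left empty.
SAMPLE = """\
# reports database
hibernate.connection.username=report_user
hibernate.connection.password=
hibernate.connection.url=jdbc:mysql://localhost:3306/reports
hibernate.connection.driver_class=com.mysql.jdbc.Driver
"""

props = parse_properties(SAMPLE)
problems = missing_keys(props)
print(problems)  # ['hibernate.connection.password']
```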

Property file: ‘{Ephesoft-Home}/WEB-INF/classes/META-INF/application.properties’:-

 

Configurable property Type of value Value options Description
report.ant.buildfile.path String {Report}\ephesoft-reporting\build.xml This property defines the absolute path of build.xml file that is bundled with the stand alone java program for reports.
enable.reporting String True/False Whether or not the reports UI is displayed to the user. If set to True, the reports UI is displayed; otherwise it is not.

Reports generation

Reports can be generated either through the reporting user interface or by executing the reporting scripts manually.

Reports generation from UI

In order to run Ephesoft reporting UI, user needs to click on syncDB button.

Select any option from modules, plugin and user, start and end date, then click on GO.

Reporting data will be displayed in tabular format.
ReportsGeneration.jpg

Manual reports generation through scripts

For running the stand alone java program to load the report, please perform the following steps for the first time installation:-

  • Application needs a new database to load the report data. Run the “init-data.sql” found at

“{Report}\ephesoft-reporting\META-INF\dcma-performance-reporting\init-data.sql”.
Change the database username and password in these files:

  • {Report}\ephesoft-reporting\META-INF\dcma-performance-reporting\hibernate.cfg.xml. This will point to the new database “reports” just created.
  • {Report}\ephesoft-reporting\META-INF\dcma-performance-reporting\hibernate-dcma.cfg.xml. This will point to the existing Ephesoft database.
  • Set the environment variable ANT_HOME and the corresponding CLASSPATH in environment variables. This is not required if the user wants to run ant from the {ANT_HOME}/bin/* directory.

Perform the following steps every time the scripts are required to run:-

  • Navigate to the installation directory. Run “ANT” with either of these targets:
    • ANT start-report-generator
      • This starts the scheduler based service. This is the default behavior of the script.
    • ANT stop-report-generator
      • This stops the scheduler based service if already started.
    • ANT manual-report-generator
      • This runs the specified service for just one time and exits.
  • The ANT command window must not be closed if the scheduled service needs to run in the background. If it is closed, the scheduler service stops. To start it again, delete the directory “C:\ephesoft-data\report-data\lock” and then invoke the ant start target.
  • The scheduler service is scheduled to run at 1 a.m. every day by default.

The user can change this configuration in the file “{Report}\ephesoft-reporting\META-INF\dcma-performance-reporting\dcma-report-scheduling.properties”.
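Restarting the scheduler after its command window was closed involves removing the stale lock directory and then invoking the start target again. The sketch below shows that sequence; the function name `prepare_scheduler_restart` is hypothetical, the demonstration uses a temporary directory instead of the real lock path, and the script only builds the ant command rather than running it.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def prepare_scheduler_restart(lock_dir: Path) -> list[str]:
    """Delete a stale lock directory, then return the ant command to run."""
    if lock_dir.is_dir():
        shutil.rmtree(lock_dir)
    # Target name taken from the list of ant targets in this section.
    return ["ant", "start-report-generator"]


# Demonstration with a temporary directory standing in for the lock folder.
with tempfile.TemporaryDirectory() as tmp:
    lock_dir = Path(tmp) / "lock"
    lock_dir.mkdir()
    command = prepare_scheduler_restart(lock_dir)
    lock_removed = not lock_dir.exists()

print(command, lock_removed)
```

The returned list could be passed to `subprocess.run` from the reporting installation directory; on Windows the ant launcher is usually a batch file, so the command may need adjusting there.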

Read Only Document Level Fields

Overview

When a batch class owner needs to extract a document level field but does not want its extracted value to be changed at any stage, and does not want to allow any user (review/validation user) to change its value, the owner can make that field ‘read-only’.

  • By making a field read-only, batch class admin can restrict its value to be non-modifiable at any stage.
  • When a field is set as read-only, no regular expression can be applied for that field. KV-Extraction and advanced KV-Extraction rules can be applied to the read-only fields as it can be done for any regular field.
  • This feature is useful for document level fields which do not require user intervention at validation stage, i.e. fields whose values are obvious and do not need to be changed.

Setting the read-only flag

User can make a field as read-only by selecting the checkbox ‘isReadonly’ which is displayed on editing a document level field as shown in screenshot below:

ReadOnlyDLF.jpg
After selecting the read-only attribute, the selected fields will be non-editable in the Review Validate screen as shown in the screenshot below (Invoice Date and State are non-editable):

ReadOnlyDLF RVScreen.jpg

 

Recostar Design Studio

Overview

Ephesoft allows extraction of fixed form documents that contain zonal Barcode, OCR, ICR and OMR fields. Ephesoft makes it easy to configure fixed forms by following below easy steps:

  • Create/Edit Batch class.
  • Create/Edit Document Type. Please note that fixed form processing naming convention does not allow spaces in the form (Document Type). Forms should not start with numeric values either.
  • Create/Edit Index fields. Please note that fixed form processing does not allow spaces in the field name. Fields should not start with numeric values.
  • Design a Recostar Project file, .RSP.
  • Copy the RSP file into the batch class folder, \\SharedFolders\{Batch Class}\recostar-extraction.
  • Assign RSP file to document type using Ephesoft Admin Module (Edit Document Type)
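The naming rules for fixed form document types and index fields (no spaces, no leading digit) can be checked up front. The following is a minimal sketch; the function name `is_valid_fixed_form_name` is hypothetical, and the rule is interpreted here as "no whitespace anywhere and no numeric first character".

```python
import re

# First character: neither whitespace nor a digit; remaining characters: no whitespace.
_NAME_PATTERN = re.compile(r"[^\s\d]\S*")


def is_valid_fixed_form_name(name: str) -> bool:
    """Check a document type or field name against the fixed form naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


checks = {
    name: is_valid_fixed_form_name(name)
    for name in ("BOE", "Year", "Tax Return", "2006Form", "")
}
print(checks)
```

In this run "BOE" and "Year" pass, while "Tax Return" (contains a space), "2006Form" (starts with a digit) and the empty string are rejected.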

Recostar Design Studio

RecoStarDesignStudio.exe can be found in “{Ephesoft-Home}\native\RecostarPlugin\RecoStarDesignStudio” in the default configuration. This tool allows the Ephesoft admin to define the zonal areas for OCR, ICR or OMR field extraction.

Steps of execution

Please follow the steps below. Here, Tax return form called BOE is used. The same techniques can be applied to any fixed forms application.

Sample files covered here can be downloaded.

  • Batch Class
  • Recostar Project File definition
  • Launch RecoStarDesignStudio.exe

RecostarDesignStudio Run.jpg

  • Select New Project

RecostarDesignStudio SelectNewProject.jpg

 

  • Select Single Form

RecostarDesignStudio SelectSingleForm.jpg

  • Give a name and select a project location. In this example BOE is used as a project name and project is saved at C:\Fixed Form Projects folder. User can save them to any location. Later, application will move these files to Batch class folder.

RecostarDesignStudio NewProjectFileName.jpg

  • Project name as BOE is seen below.

RecostarDesignStudio ProjectName.jpg

  • Select some sample images so that zones can be drawn.

RecostarDesignStudio DrawZones.jpg

 

  • Right click on the image area and select the desired images by using “Add Files”

RecostarDesignStudio AddFiles.jpg

  • In this example, one image is selected.

RecostarDesignStudio SelectFile.jpg

 

  • Once the image is selected, click “Next”.

RecostarDesignStudio WorkingImageFiles.jpg

  • On this step, select the country user is operating in. Multiple countries can be selected. Once user clicks on the USA which is default country, menu option (…) appears where user can select more countries.

RecostarDesignStudio MandatoryParameter.jpg

 

  • Project file can now be created with one form and one ICR field. Click Finish to proceed.

RecostarDesignStudio ReadyToInstall.jpg

  • Initial Project has been created.

RecostarDesignStudio InitialProject.jpg

 

  • Rename the form to the document type name, BOE. This has to match the document type in the Ephesoft Admin module.

RecostarDesignStudio RenameForm.jpg

  • Rename index field to Year.

RecostarDesignStudio RenameIndexField.jpg

 

  • Re-Arrange zone

RecostarDesignStudio RearrangeZone.jpg

  • Field is now renamed to Year

RecostarDesignStudio FieldRenamed.jpg

 

  • Zoom into the image

RecostarDesignStudio ZoomImage.jpg

  • After zooming in, arrange the zone so it covers only the value 2006

RecostarDesignStudio ArrangeZones.jpg

 

  • Add “Remove Lines” options

RecostarDesignStudio AddRemoveLines.jpg

  • Remove Lines option is added

RecostarDesignStudio RemoveLinesAdded.jpg

 

  • Add new field called Account Number

RecostarDesignStudio AddFieldAccountNumber.jpg

  • New field has been added

RecostarDesignStudio AccountNumberFieldAdded.jpg

 

  • Zoom into the new field

RecostarDesignStudio ZoomIntoAccountNumberField.jpg

  • Adjust field Location/Zone

RecostarDesignStudio AddLocationField.jpg

 

  • Set field properties. i.e. Font should be Machine Type.

RecostarDesignStudio SetFieldProperties.jpg

  • Run selected images to see the results

RecostarDesignStudio RunSelectedImages.jpg

 

  • View Results.

RecostarDesignStudio ViewResults.jpg

  • Copy the RSP file into the batch class folder, <Ephesoft-Shared-Folder>\{Batch Class}\recostar-extraction. Please note that the BOE.rsp file needs to be directly in the <Ephesoft-Shared-Folder>\{Batch Class}\recostar-extraction folder; there should not be another folder layer.
  • Map the RSP file to the Document Type. Log on to the Ephesoft Admin Module, Batch Class Management. Navigate to the Document Type BOE and select the BOE.rsp file from the Fixed Form Project File drop-down menu. If the BOE RSP file cannot be seen, check that an extra folder layer was not copied in the previous step.

RecostarDesignStudio MapRSPToDocumentType.jpg

  • Save the batch class and run documents through the system.
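If the RSP file does not appear in the Fixed Form Project File drop-down, a short script can show whether it sits directly in the recostar-extraction folder or one level too deep. This is a sketch only: `find_rsp_files` is a hypothetical helper, the extension match is case-sensitive on some systems, and the demonstration uses a temporary directory.

```python
import tempfile
from pathlib import Path


def find_rsp_files(extraction_dir: Path):
    """Split .rsp files into those directly in the folder and those nested deeper."""
    top_level, nested = [], []
    for path in sorted(extraction_dir.rglob("*.rsp")):
        relative = path.relative_to(extraction_dir)
        target = top_level if len(relative.parts) == 1 else nested
        target.append(relative.as_posix())
    return top_level, nested


# Demonstration: one correctly placed file and one inside an extra folder layer.
with tempfile.TemporaryDirectory() as tmp:
    extraction_dir = Path(tmp)
    (extraction_dir / "BOE.rsp").write_text("")
    (extraction_dir / "extra").mkdir()
    (extraction_dir / "extra" / "Other.rsp").write_text("")
    top_level, nested = find_rsp_files(extraction_dir)

print(top_level, nested)
```

Any entry reported in `nested` would need to be moved up into the recostar-extraction folder itself.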

Review Document

Overview

This document defines the operations that can be done on a batch in review state. During this stage, the user can perform various operations on the batch like classifying, splitting, copying, deleting the document etc. The document also explains the various plugin properties that can be set for batches that are in review state. Whenever batch comes to review state, its status is changed to “Ready for Review” and it needs to be reviewed by the user manually, if it is not reviewed automatically (i.e. its confidence score is less than the specified threshold). After the review, the batch processing continues until it reaches validation stage.

To access a batch in Review state, the user needs to open the URL {http://localhost:8080/dcma/BatchList.html}, click on the Review sub tab and then click on a batch displayed in the grid.

BatchList ReviewSubTab.jpg

This will open a batch on Review screen (see below screenshot)

BatchList BatchDetailTab.jpg

Configuration

Please follow the below steps to set Review plugin properties:

  • Login to the Ephesoft Admin Module (Batch Class Management).
  • Navigate to Batch Class -> Modules -> Review Document module -> Review Document plugin.

BatchClassManagement ReviewDocumentPlugin.jpg

Properties:

 

Configurable property Type of value Value options Description
External Application Switch List of values ON/OFF This field is used to develop external modules or applications and integrate them to work together with Ephesoft.
X Dimension Integer Integer value To specify the x-dimension of the external application in pixels.
Y Dimension Integer Integer value To specify the y-dimension of the external application in pixels.
URL1 Title, URL2 Title, URL3 Title and URL4 Title String N-A These properties hold titles for the external application.
URL1 (Ctrl+4), URL2 (Ctrl+7), URL3 (Ctrl+8) and URL4 (Ctrl+9) String N-A To fire the specified External Application for a batch on the Review Validate UI. URL of the external application is specified here which can be accessed via shortcut keys (Ctrl+4, etc.) as well as by pressing buttons defined. (App 1, App4, App2, App 3 as can be seen in the below UI).

Review screen with External Application switch ON will look something like this:

BatchDetail ExternalAppSwitchON.jpg

Features List

There are three panels in the review screen.

  • Left-most-panel or 1st panel – showing document tree having all classified and unclassified Ephesoft documents in a batch.
  • Middle-panel or 2nd panel – contains the review panel and the thumbnail images of next and previous document pages. Review panel contains the list of document types and the list of documents available for merging. Thumbnail images of the previous and the following document, w.r.t current selected document, are shown.
  • Right-most-panel or 3rd panel shows the enlarged image of the selected document.

BatchDetail ThreePanels.jpg
In the document tree, there are classified as well as unclassified documents. A classified document is marked by a green tick at its top right. An unclassified document is marked by a red question mark at its top right.

Clicking on shortcuts will open a table of shortcuts for operations like saving, splitting, merging, deleting the document etc.

BatchDetail KeyboardShortcuts.jpg
The top-most-panel contains the buttons/shortcuts for splitting, deleting, rotating the document, etc.

BatchList TopPanel.jpg

 

Fresh Installation Steps

Overview

This document describes the step-by-step procedure for installing Ephesoft on a machine. It should be referenced when the user installs Ephesoft through the Ephesoft installer setup for the very first time.

 

Steps of execution

Following are the steps for fresh installation with Installer version 3.0.3.4 or later:

 

  • Run command prompt as administrator and execute the command as shown in the screenshot below:
Fresh install cmd.png
In this command “D:\Ephesoft_3.0.3.4.msi” is the path of Ephesoft installer setup on user’s system.

 

  • The above command will initiate Ephesoft installer setup on machine and below screen will be displayed to user:

Fresh install welcome screen.png

 

  • On clicking ‘Next’ button, following screen will be displayed to user:

Fresh install license agreement.png

 

  • Accept Ephesoft end user agreement by clicking check box on UI and then click ‘Next’ button. Following screen will be displayed:

Fresh install dotnet.png
If .NET Framework 4.0 is not installed on the machine, the ‘Next’ button remains disabled and a button titled ‘Download’ appears on the UI. Clicking this button opens the web link from which the user can download .NET Framework 4.0. Download and install .NET Framework 4.0 and then re-run the Ephesoft installer setup.
If .NET Framework is installed on the machine, the above screen appears with the ‘Next’ button enabled. Simply click the ‘Next’ button in this case.

 

  • After clicking ‘Next’ button on .NET Framework installation screen, following screen will be displayed to user:
Fresh install prerequisites.png
If C++ redistributables are not installed on system, Installer setup will first install C++ redistributables and then enable ‘Next’ button. Now click on ‘Next’ button.

 

  • After clicking ‘Next’ button on Ephesoft Prerequisites Installation screen, following screen will be displayed to user:

Fresh install select database.png
o Select radio button 1 if the user wants either to install a new instance of MySQL server or to configure an existing MySQL installation.
The installer will only update the properties file with the MySQL server configuration information; it will not create the application and report databases on the MySQL server. Run {Ephesoft-install-directory}\Dependencies\MySQLSetup\ephesoft-mysql-config.sql manually on the remote or local MySQL server.
Fresh install select database mysql.png
Following are the screens through which user can install new MySQL server instance on his machine:
Fresh install select install mysql.png
Fresh install install mysql config.png
Following are the screens through which user can configure existing MySQL server:
Fresh install select configure mysql.png
Fresh install configure mysql config.png
Please enter all server configuration information correctly; the installer will use this information in the properties files. Make sure the DB names are unique.
o Select radio button 2 if the user wants either to install a new instance of MS SQL server or to configure an existing MS SQL installation (local or remote).
If MS SQL server is installed on the local machine, the installer can configure a local or remote MS SQL server. If MS SQL server is not installed on the local machine and the user wants to configure a remote MS SQL server instance, the user has to create the application and report databases manually: run {Ephesoft-Home}\Dependencies\MsSQLSetup\ephesoft-mssql-config.sql manually on the remote MS SQL server.
Fresh install select database mssql.png

 

  • Following are the screens through which user can install new MS SQL server instance  on his machine:

Fresh install install or configure mssql.png
Fresh install install mssql config.png
When the user clicks the ‘Next’ button, the Ephesoft installer setup will start the MS SQL installation in silent mode.
Following are the screens through which user can configure existing MS SQL server:
Fresh install configure mssql.png
Fresh install configure mssql config.png
Please enter all server configuration information correctly; the installer will use this information in the properties files. Make sure the DB names are unique.

 

  • After database configuration or installation following screen will be displayed to user:

Fresh install registration info.png
Fill all the information and then click on ‘Next’ button.

 

  • After Ephesoft Registration Information following screen will be displayed to user:

Fresh install shared folder config.png
o Select radio button 1 if the user is not creating a multi-server environment or does not have an existing shared folder. Selecting this radio button will also install the shared folder along with the application setup.
Fresh install shared folder no.png
Fresh install destination folder.png
o Select radio button 2 if the user is creating a multi-server environment or already has an existing shared folder. Selecting this radio button will not install the shared folder. In this case, the shared folder path is the path of the parent directory of the SharedFolders directory. For example, if the existing SharedFolders directory is inside a folder named share, and this folder is shared on a system named EPHESOFT, then the shared folder path will be \\EPHESOFT\share, not \\EPHESOFT\share\SharedFolders.
Fresh install shared folder yes.png
Fresh install shared destination folder.png
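When an existing shared folder is used, the value to enter is the parent of the SharedFolders directory, which is easy to get wrong. The following sketch derives that parent from a full path; the helper name `shared_folder_path` is hypothetical and the function only handles backslash-separated Windows paths.

```python
def shared_folder_path(shared_folders_dir: str) -> str:
    """Return the parent directory of an existing SharedFolders directory."""
    trimmed = shared_folders_dir.rstrip("\\")
    parent, _, leaf = trimmed.rpartition("\\")
    if leaf != "SharedFolders" or not parent:
        raise ValueError("expected a path ending in \\SharedFolders")
    return parent


# Example from the installer step: the share is \\EPHESOFT\share.
installer_value = shared_folder_path(r"\\EPHESOFT\share\SharedFolders")
print(installer_value)  # \\EPHESOFT\share
```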

 

  • After completing all these steps below screenshot will be displayed to user:

Fresh install ready to install.png
Click on install button and installer will do rest of the work.
Fresh install read me.png
Fresh install finish.png

 

  • After complete installation, please restart the machine.

User Management

Overview

This module is responsible for handling the user’s connectivity to the application. It handles authentication as well as authorization process for the user.

Configuration

Login configuration

For a user to log in to Ephesoft, the “server.xml” file located in the {Ephesoft-Home}\JavaAppServer\conf folder needs to be configured.

The admin will configure a tag named “Realm” located in server.xml. The tag can be located at following structure:

<Server>

<Service>

<Engine>

<Host>

<Context >

<Realm />

</Context>

</Host>

</Engine>

</Service>

</Server>
The realm tag has many configurable parameters. The use and need of these parameters depends upon the type of authentication server used by the user.
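To confirm that a Realm is defined at the location shown above, server.xml can be parsed and queried along the same element path. This is a minimal sketch using Python's standard library; the embedded XML is a reduced, hypothetical stand-in for a real server.xml, which contains many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for server.xml (assumption: real files carry more content).
SAMPLE_SERVER_XML = """\
<Server>
  <Service>
    <Engine>
      <Host>
        <Context>
          <Realm className="org.apache.catalina.realm.JNDIRealm" />
        </Context>
      </Host>
    </Engine>
  </Service>
</Server>
"""


def find_context_realms(server_xml_text):
    """Return the Realm elements found at Server/Service/Engine/Host/Context."""
    root = ET.fromstring(server_xml_text)  # root element is <Server>
    return root.findall("./Service/Engine/Host/Context/Realm")


realm_classes = [realm.get("className")
                 for realm in find_context_realms(SAMPLE_SERVER_XML)]
print(realm_classes)  # ['org.apache.catalina.realm.JNDIRealm']
```

An empty result would indicate that no Realm is configured at that level of the file.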

Various implementations can be configured at once. Please refer to the Tomcat Realms documentation (section “Standard Realm Implementations”) for configuring the Realms according to your requirements.

The commonly used realm configurations are:

When a user tries to log in to the application, the username and password are verified against the configured authentication server using the specified configuration properties.

Ephesoft user roles handling

Ephesoft, on the basis of the roles of the user logged in to the application, decides the following:

  • Batch classes the user will be allowed to view on the batch class management view.
  • Batch instances the user will be allowed to view on the batch instance management view.
  • Folders the user is allowed to view on the folder management view.
  • Scanner profiles and other configurations on the web scanner view.

The user roles for the logged-in user are verified from the authentication server configured in the property file {Ephesoft-Home}\Application\WEB-INF\classes\META-INF\dcma-user-connectivity\user-connectivity.properties.

Following is the list of the configurable properties for this properties file:

 

  • LDAP configurable properties

 

Configurable property Type of value Value options Description
user.ldap_url String A valid URL to connect to LDAP server. The connection URL for LDAP type configuration in the “ldap://<server_address>:<port_number>” format.
user.ldap_config String N-A Class name for the LDAP context factory.
user.ldap_domain_component_name String N-A The domain component name for the LDAP configuration.
user.ldap_domain_component_organization String N-A The domain component organization name for the LDAP configuration.
user.ldap_username String A valid username to connect and access LDAP server. The username of the user responsible for interacting with the server. Only required if LDAP is configured.
user.ldap_password String A valid password to connect and access LDAP server. The password of the user responsible for interacting with the server. Only required if LDAP is configured.
user.ldap_user_base String N-A The relative path under which all the users information will be located. This path will be relative to the domain components specified by the user.
user.ldap_group_base String N-A The relative path under which all the groups/roles information will be located. This path will be relative to the domain components specified by the user.
  • MS-Active Directory configuration

 

Configurable property Type of value Value options Description
user.msactivedirectory_url String A valid URL to connect to Active directory server. The connection URL for msactivedirectory type configuration in the “ldap://<server_address>:<port_number>” format.
user.msactivedirectory_config String N-A Class name for the user-connectivity configuration.
user.msactivedirectory_context_path String N-A The directory path where the intended user resides.
user.msactivedirectory_domain_component_name String N-A The domain component name for the msactivedirectory type configuration.
user.msactivedirectory_domain_component_organization String N-A The domain component organization name for the msactivedirectory type configuration.
user.msactivedirectory_user_name String A valid username to connect and access Active directory server. The username of the user responsible for interacting with the server. Only required if Active Directory is configured.
user.msactivedirectory_password String The password used to connect and access Active directory server. The password of the user responsible for interacting with the server. Only required if Active Directory is configured.
user.msactivedirectory_group_search_filter String N-A The filter used when searching for groups. It can contain | (OR), & (AND) and ! (NOT), e.g. ((!(cn=a*))(|(cn=ephesoft*)(&(cn=b*)))
  • Tomcat specific configuration

 

Configurable property Type of value Value options Description
user.tomcatUserXmlPath String N-A The directory path where the tomcat configuration xml file resides.
  • Connection choosing configuration

 

Configurable property Type of value Value options Description
user.connection List of values 0/1/2 The type of connection the user wants for the application:

  • 0 for LDAP
  • 1 for MS Active Directory
  • 2 for Tomcat
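The meaning of the user.connection value can be expressed as a simple lookup. The sketch below assumes the mapping 0 = LDAP, 1 = MS Active Directory and 2 = Tomcat, which follows the examples in this section (user.connection=0 for LDAP, user.connection=1 for Active Directory) and the order of the listed values; the function name `connection_type` is hypothetical.

```python
# Assumed mapping of user.connection values to connection types.
CONNECTION_TYPES = {
    "0": "LDAP",
    "1": "MS Active Directory",
    "2": "Tomcat",
}


def connection_type(props: dict) -> str:
    """Return the connection type selected by the user.connection property."""
    value = props.get("user.connection", "").strip()
    if value not in CONNECTION_TYPES:
        raise ValueError(f"unsupported user.connection value: {value!r}")
    return CONNECTION_TYPES[value]


ldap_choice = connection_type({"user.connection": "0"})
ad_choice = connection_type({"user.connection": "1"})
print(ldap_choice, "|", ad_choice)  # LDAP | MS Active Directory
```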

 

Examples

LDAP

Realm

<Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
       connectionURL="ldap://localhost:389"
       connectionName="cn=Manager,dc=ephesoft,dc=com"
       connectionPassword="********"
       userPattern="cn={0},ou=people,dc=ephesoft,dc=com"
       roleBase="ou=groups,dc=ephesoft,dc=com" roleName="cn"
       roleSearch="uniqueMember={0}"/>

user-connectivity.properties

  • user.ldap_url=ldap://localhost:389
  • user.ldap_config=com.sun.jndi.ldap.LdapCtxFactory
  • user.ldap_domain_component_name=ephesoft
  • user.ldap_domain_component_organization=com
  • user.ldap_username=cn=Manager,dc=ephesoft,dc=com
  • user.ldap_password=*******
  • user.ldap_user_base=ou=people
  • user.ldap_group_base=ou=groups
  • user.connection=0
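The LDAP values in user-connectivity.properties and the Realm attributes in server.xml describe the same directory tree, so they should agree. The sketch below composes a distinguished name from the properties to make that relationship visible. How Ephesoft itself combines these properties is not stated in this document, so the composition order used here is an assumption, and the function names are hypothetical.

```python
def base_dn(component_name: str, component_organization: str) -> str:
    """Build the domain part of a DN from the two domain component properties."""
    return f"dc={component_name.strip()},dc={component_organization.strip()}"


def user_dn_pattern(props: dict) -> str:
    """Compose a user DN pattern comparable to the Realm's userPattern attribute."""
    domain = base_dn(props["user.ldap_domain_component_name"],
                     props["user.ldap_domain_component_organization"])
    return f"cn={{0}},{props['user.ldap_user_base']},{domain}"


# Values copied from the LDAP example properties.
ldap_props = {
    "user.ldap_domain_component_name": "ephesoft",
    "user.ldap_domain_component_organization": "com",
    "user.ldap_user_base": "ou=people",
    "user.ldap_group_base": "ou=groups",
}

pattern = user_dn_pattern(ldap_props)
print(pattern)  # cn={0},ou=people,dc=ephesoft,dc=com
```

The printed value matches the userPattern attribute of the LDAP Realm example, which is a quick way to spot a mismatch between the two files.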

MS-Active Directory

Realm

<Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
       connectionURL="ldap://localhost:389"
       connectionName="administrator@ephesoft.com"
       connectionPassword="********"
       userBase="cn=Users,DC=ephesoft,DC=com"
       userSearch="(&amp;(objectClass=person)(sAMAccountName={0}))"
       userSubtree="true"
       roleBase="cn=Users,DC=ephesoft,DC=com"
       roleName="cn"
       roleSubtree="true"
       roleSearch="member={0}" referrals="follow" />

user-connectivity.properties

  • user.msactivedirectory_url=ldap://172.16.0.191:389
  • user.msactivedirectory_config=com.sun.jndi.ldap.LdapCtxFactory
  • user.msactivedirectory_context_path=CN=Users
  • user.msactivedirectory_domain_component_name=ephesoft
  • user.msactivedirectory_domain_component_organization=com
  • user.msactivedirectory_user_name=CN=Administrator,CN=Users,DC=ephesoft,DC=com
  • user.msactivedirectory_password=*******
  • user.connection=1 (for fetching group and user from active directory)

Multiple realm example

<Realm className="org.apache.catalina.realm.CombinedRealm">
  <Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
         connectionURL="ldap://172.16.1.68:389"
         connectionName="administrator@ephesoft.com"
         connectionPassword="********"
         userBase="cn=Users,DC=ephesoft,DC=com"
         userSearch="(&amp;(objectClass=person)(sAMAccountName={0}))"
         userSubtree="true"
         roleBase="cn=Users,DC=ephesoft,DC=com"
         roleName="cn" roleSubtree="true"
         roleSearch="member={0}" referrals="follow" />
  <Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
         connectionURL="ldap://172.16.1.68:389"
         connectionName="administrator@ephesoft.com"
         connectionPassword="********"
         userBase="ou=test1,DC=ephesoft,DC=com"
         userSearch="(&amp;(objectClass=person)(sAMAccountName={0}))"
         userSubtree="true"
         roleBase="ou=test1,DC=ephesoft,DC=com" roleName="cn"
         roleSubtree="true" roleSearch="member={0}" referrals="follow" />
  <Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
         connectionURL="ldap://172.16.1.68:389"
         connectionName="administrator@ephesoft.com"
         connectionPassword="********"
         userBase="ou=test,DC=ephesoft,DC=com"
         userSearch="(&amp;(objectClass=person)(sAMAccountName={0}))"
         userSubtree="true" roleBase="ou=test,DC=ephesoft,DC=com"
         roleName="cn" roleSubtree="true"
         roleSearch="member={0}" referrals="follow" />
</Realm>

 

Global realm example

<Realm className="org.apache.catalina.realm.JNDIRealm" debug="99"
       connectionURL="ldap://172.16.1.68:3268"
       connectionName="administrator@ephesoft.com"
       connectionPassword="********"
       userBase="DC=ephesoft,DC=com"
       userSearch="(sAMAccountName={0})"
       userSubtree="true" roleBase="ou=test,DC=ephesoft,DC=com"
       roleName="cn" roleSubtree="true"
       roleSearch="member={0}" referrals="follow" />

 

Validate Document

Overview

This document defines the operations that can be done on a batch in validation state. During this stage, the user can perform various operations on the batch like classifying, splitting, copying, deleting the document etc., along with the ability to change the value of the document level fields which have been extracted. The document also explains the various plug-in properties that should be set for batches that are in validation state. With the help of these properties, Ephesoft provides a fuzzy search option, a suggestion box facility, and integration of external modules or applications so that they work together with Ephesoft. Whenever a batch comes to validation state, its status is changed to “Ready for Validation” and it needs to be validated by the user manually, if it is not validated automatically.

Below is the screenshot of the BatchList page, which contains a tab listing all the batches present in “READY_FOR_VALIDATION” state.

BatchList ValidationSubTab.jpg

Configuration

Please follow the below steps to set the validation plug-in properties:

  • Login to the Ephesoft Admin Module (Batch Class Management).
  • Navigate to Batch Class -> Modules -> Validate Document module -> Validate Document plugin.

BatchClassManagement ValidateDocumentPlugin scrollup.jpg

BatchClassManagement ValidateDocumentPlugin scrolldown.jpg
Properties:

 

Configurable property | Type of value | Value options | Description
Field Value Change Script Switch | List of values | ON, OFF | If the switch is enabled, the field value change script runs every time a field value is changed. Default: OFF.
Fuzzy Search Switch | List of values | ON, OFF | If the switch is enabled, the fuzzy search facility is available. Default: ON.
Suggestion box Switch | List of values | ON, OFF | If the switch is enabled, suggestions for alternate values of document level fields are available. Default: OFF.
External Application Switch | List of values | ON, OFF | If the switch is enabled, external applications can be integrated to work together with Ephesoft. Default: OFF.
Fuzzy Pop Up X Dimension (in px) | Integer | Integer value | Specifies the x-dimension of the fuzzy search result pop-up in pixels.
Fuzzy Pop Up Y Dimension (in px) | Integer | Integer value | Specifies the y-dimension of the fuzzy search result pop-up in pixels.
Validation Script Switch | List of values | ON, OFF | If the switch is enabled, the specified script runs whenever a batch in the validation state is saved. Default: OFF.
External Application X Dimension (in px) | Integer | Integer value | Specifies the x-dimension of the external application in pixels.
External Application Y Dimension (in px) | Integer | Integer value | Specifies the y-dimension of the external application in pixels.
URL1 Title, URL2 Title, URL3 Title and URL4 Title | String | N-A | These properties hold the titles for the external applications.
URL1 (Ctrl+4), URL2 (Ctrl+7), URL3 (Ctrl+8) and URL4 (Ctrl+9) | String | N-A | The URL of the external application to launch for a batch on the Review Validate UI. The application can be opened with the shortcut keys (Ctrl+4, etc.) as well as with the corresponding buttons (App1, App4, App2, App 3, as can be seen in the UI below).

External application on Review Validate Screen

BatchDetail ExternalAppOnRVScreen.jpg

Features List

There are three panels in this screen.

  • Left-most-panel or 1st panel: contains a document tree for the classified and unclassified Ephesoft documents.
  • Middle-panel or 2nd panel: contains the review panel and provides the fuzzy search option. The review panel contains the list of document types and the list of documents for merging. Below the review panel are the fuzzy search textbox and the document level fields (with their extracted values) for the selected document.
  • Right-most-panel or 3rd panel: shows the enlarged image of the selected document.

BatchDetailTab ThreePanels.jpg

Left-most-panel

The document tree contains classified as well as unclassified documents. Classified documents are marked by a green tick at the top right. Unclassified documents are marked by a red question mark at the top right.

Middle-panel

Document level fields with their extracted values are displayed in the middle panel. In the UI below, the document level field is Invoice and the extracted value is 5432000. The value of a document level field can also be populated by selecting an overlay from the image in the right-most-panel.

BatchDetail MiddlePanel.jpg

Clicking on the table view button opens another panel that contains a table corresponding to the selected document. This option appears only when a table is configured for the batch class. The table should contain valid data. If any cell in the table contains invalid data, the batch is not validated.

Values in the table can also be populated by selecting an overlay from the right-most-panel.
Table on Review Validate Screen

BatchList TableOnRVScreen.jpg

The fuzzy search option returns table data that approximately matches a pattern. Every document is mapped to a table in the database. Rows from that database table are returned according to the pattern specified in the fuzzy search textbox. A particular row can then be selected to populate the document level fields.
BatchDetail FuzzyDBSearchResult.jpg
BatchList DLFFilledOnRVScreen.jpg

Right-most-panel

The right-most-panel contains the buttons for splitting, deleting and rotating the document, etc. These buttons perform the same operations that are listed in the shortcuts tab. The user can select any page from any document and use these buttons to perform the operations shown in the screenshot below:

BatchDetail RightmostPanel.jpg

Clicking on shortcuts opens a table of shortcuts for operations such as saving, splitting, merging and deleting documents. The shortcuts are explained at: http://www.ephesoft.com/wiki/index.php?title=User_Manual#Keyboard_Shortcuts

BatchDetailTab KeyboardShortcuts.jpg

 

Web Based Folder Management

Overview

This feature lets web users maintain the Ephesoft shared folder, including batch class folders, JavaScript files and other configuration files. It has the following features:

  • A Super Admin user can execute a batch using the UNC folder of the required batch class.
  • New files can be uploaded as samples.
  • Old samples can be deleted.
  • New folders can be created.
  • Admins no longer need to access the server folder structure every time they want to make a change.
  • Super Admin users can also view batch backup XML files and analyze the output of the executed plugins.
  • Super Admin users can view the final output PDF/TIFF generated by the application.
  • Users can configure some batch level properties.

To access it, the user can open the URL http://localhost:8080/dcma/FolderManager.html or click on the newly added tab displayed in the following image:

FolderManagementTab.jpg

Features list

The folder management feature is structured to make the job of Admins and Super Admins simpler.

  • FOLDER SELECTION WIDGET: This is a list box containing a list of batch class folders available for selection.

FolderManagement FolderSelectionWidget.jpg

The options appearing in the list of available folders depend upon the role assigned to the logged-in user:

  • Super Admin Users: The shared folder appears in the dropdown list. Also all the batch class folders present in the shared folders location will appear in the dropdown.
  • Admin Users: Only those Batch class folders (from the shared folder) appear in the list for which the user has permissions to access.

The first option in the list is selected by default. Sub folders are displayed in the Tree hierarchy on the left hand side of the screen.

  • Folder Structure Tree: All the subfolders are listed in the tree format as shown in the image below. User can click on the node to expand and select a sub folder as well.
  • Folder Content Table: The sub folders as well as the files contained in the selected folder are listed here.

FolderManagement FolderStructure&ContentTable.jpg

 

  • OPTIONS PANEL: There are a number of options available in the Options Panel above the table displaying the folder content:

Folder Options Panel:

  • New Folder: This creates a new folder under the currently selected folder. Each time a new folder is created, it is automatically assigned a name “New Folder” with an index appended at the end.
  • Up: This allows a user to go one level up in the folder structure.
  • Refresh: This refreshes the content of the folder currently selected in the folder tree.

Upload Options:

  • Browse: This allows user to browse through folder structure and select multiple files to upload.
  • Upload: Attached multiple files are uploaded to the selected folder on the click of this button.
  • View attached files: Just next to the Upload button is a link to view (and even remove) files attached using the browse button.

FolderManagement ViewOrRemoveAttachedFile.jpg

Multi-select Options: These options operate on the files selected using the checkboxes beside each file in the folder content table.

The options available are: Cut, Copy, Paste and Delete

Below the options panel is the table showing the folder content. The table has the following sortable headers: Name, Modified on and Type.

FolderManagement ColumnSorting.jpg

For each entry in the table, the user is provided with the following options:

  • Double click on File/Folder: Double clicking any file/folder opens it. In case of a folder, the folder opens in the folder tree-table structure shown here itself. In case of a file, the file opens up in either the browser itself (if the browser supports it) or it prompts the user to open/save the file.
  • Right click on file/folder: Right clicking on a file/folder in the folder content table presents the user with various options:
    • Open: Open selected file or folder.
    • Cut: Cut the selected files or folders.
    • Copy: Copy the selected files or folders.
    • Rename: Opens a dialog box so that the file/folder can be renamed.
    • Delete: Delete the selected files or folders.
    • Download (Option only for files): Opens the browser dialog box to open/save the file.

FolderManagement RightClickOnFilleOrFolder.jpg
Please Note: Each of the above right click options applies only to the individual file or folder on which the right click was performed, not to the files or folders selected using the checkboxes.

Web Scanner Configuration

Overview

The purpose of this document is to show how to configure and run Ephesoft Web Scanner for the first time on any browser. Supported browsers are Firefox, Chrome and IE.

Configuration

The configuration steps are as follows:

  • Enter the Ephesoft Web Scanner URL in the address bar:

For example: http://localhost:8080/dcma/WebScanner.html

 

  • If the login page appears, then enter valid credentials and log in to the application.

LoginScreen.jpg

 

  • Please check ‘Always trust content from this publisher’ on the security popup appearing on the screen (see below screenshot) when running the web scanner for the first time on any browser. After that, click ‘Run’ button.

WebScanner SecurityInformation.jpg

 

  • Refresh the browser now to use the Ephesoft Web Scanner.

Troubleshooting

  • If the ‘Start’ button is not visible even after refreshing, please restart the browser.
  • If the error message ‘Unable to perform action: INITIALZE on the Web Scanner Applet’ appears, please update the browser’s Java plugin.

Workflow Configuration

Overview

This property file is used to configure the workflow i.e. how different services will be executed. This file consists of following four types of service configurations:

  • Pickup Service Configuration.
  • Resume Service Configuration.
  • Workflow Configuration.
  • Web Service Configuration.

Configuration

dcma-workflows.properties

Following is the list of configurable properties:

  • PickUpService Configuration

Pick up service runs as a scheduler service which keeps a watch on BATCH_INSTANCE table. Every time a batch is ready to be picked and its status is NEW/READY/LOCKED, this service takes a lock on that batch and triggers the workflow for that batch. Priority of batches to be picked up by pick up service is READY > LOCKED > NEW.

 

Configurable property | Type of value | Value options | Description
dcma.pickup.cronjob.expression | String | Any valid cron expression. Default value: 0 0/1 * ? * * (with this value, the pick up service is invoked every minute). | Specifies the schedule on which the pick up service runs, as a cron job expression.
server.instance.max.process.capacity | Integer | Any integer value. Default value: 5 | Specifies the maximum number of RUNNING batch instances that a server instance can process at a given time.
server.instance.pick.capacity | Integer | Any integer value. Default value: 3 | Specifies the maximum number of batches that the pick up service picks in one round of execution, such that batches picked up <= (max process capacity - RUNNING state batches).
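The pick up rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Ephesoft's implementation: the function name `select_batches`, the `(batch_id, status)` tuple layout and the in-memory list are hypothetical; only the priority order READY > LOCKED > NEW, the default values 5 and 3, and the capacity inequality come from the text.

```python
# Hypothetical sketch of the pick up selection rule; not Ephesoft code.
PRIORITY = {"READY": 0, "LOCKED": 1, "NEW": 2}  # READY > LOCKED > NEW


def select_batches(batches, running_count, max_process_capacity=5, pick_capacity=3):
    """Return the batch ids to pick in one round of execution.

    batches: list of (batch_id, status) tuples (assumed shape).
    running_count: number of batches currently RUNNING on this server instance.
    """
    # Never pick more than the server can still process.
    free_slots = max(0, max_process_capacity - running_count)
    limit = min(pick_capacity, free_slots)
    candidates = [b for b in batches if b[1] in PRIORITY]
    # list.sort() is stable, so batches with the same status keep their input order.
    candidates.sort(key=lambda b: PRIORITY[b[1]])
    return [batch_id for batch_id, _ in candidates[:limit]]


picked = select_batches(
    [("BI1", "NEW"), ("BI2", "READY"), ("BI3", "LOCKED"), ("BI4", "READY")],
    running_count=3,
)
print(picked)
```

With three batches already running and the default capacities, only two slots are free, so the two READY batches are picked ahead of the LOCKED and NEW ones.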
  • Resume Service Configuration

Resume service, like pick up service, also runs as a scheduler service. This service also keeps an eye on BATCH_INSTANCE table. It picks the batches that are locked by another server instance which is not active now or has gone down (as detected by HeartBeat service). It takes a lock on those batches and resumes the workflow for them.

 

Configurable property | Type of value | Value options | Description
dcma.resume.cronjob.expression | String | Any valid cron expression. Default value: 0 0/1 * ? * * (with this value, the resume service is invoked every minute). | Specifies the schedule on which the resume service runs, as a cron job expression.
server.instance.resume.capacity | Integer | Any integer value. Default value: 4 | Specifies the maximum number of batches that the resume service can pick in one iteration.
  • Other Configuration

These properties configure workflow deployment and the mails sent when an error occurs in the workflow. When a batch instance goes into error, a mail is sent using the configuration below:

 

Configurable property | Type of value | Value options | Description
workflow.error.from_mail | String | Any valid email id. Default value: enterprise.support@ephesoft.com | Specifies the e-mail id from which the mail is sent when an error occurs during batch processing.
workflow.deploy | String | True, False. Default value: true | Re-deploys all the available workflows in the system. Should be set only once, during new installations or upgrades.
workflow.error.subject | String | Valid e-mail subject. Default value: Error in workflow execution!! | Specifies the subject of the mail sent when an error occurs during batch processing.
workflow.error.to_mail | String | Any valid email id. Default value: enterprise.support@ephesoft.com | Specifies the e-mail id to which the mail is sent when an error occurs during batch processing.
newWorkflows.basePath | String | Valid path of a directory. Default value: {Application}\\SharedFolders/workflows | Specifies the path where all JPDL files are placed when a new workflow or plugin is deployed.
  • WebService Configuration

These configurations should only be set in case of configuring Grid Computing workflow.

 

Configurable property | Type of value | Value options | Description
wb.folderPath | String | Any valid path of an FTP folder. Default value: test | Specifies the folder path to be picked from the FTP location for processing a batch in the Grid Computing Batch Class.
wb.hostURL | String | Valid URL. Default value: http://localhost:8080/dcma/rest | Specifies the host URL for sending the batch instance from one Ephesoft instance to another.
dcma.batch.status.cronjob.expression | String | Any valid cron expression. Default value: 0 0/1 * ? * * | Specifies the schedule for fetching the status of a remote batch instance executing on another Ephesoft instance server, as a cron job expression.

Dependency

The e-mail on error functionality depends on the mail configuration in the “mail.properties” property file.

 

Workflow Management

Overview

This document covers everything a user needs to configure a workflow. It focuses on preparing the content needed by any workflow, i.e. plugins and their respective dependencies, and the criteria on which these plugins work.

Features

This tab is visible only to the admin and provides features for adding a plugin and configuring its dependencies. On clicking the “Workflow Management” tab or accessing “http://<Server-name>:<port number>/dcma/CustomWorkflowManagement.html”, the user will see a screen containing the following, as shown in the screenshot:

  • “Plugins List”: List of plugins already present.
  • “Add New Plugin” button: For adding or updating a plugin.
  • “Dependencies” button: On being clicked, it takes the user to a screen where they can manage the dependencies among the plugins.
  • “Help” button: Shows a pop-up message with information on how to upload a new plugin.

WorkflowManagement.jpg

Plugins list

Landing screen for the workflow management tab will contain the list of already installed plugins. The list displayed will support the default Ephesoft UI functionalities of pagination and sorting. The list will display the following for each plugin:

  • Plugin name
  • Plugin description

Add New Plugin

View

On clicking on “Add New Plugin” button, a file upload widget will open up with the following options. See the screenshot below:

WorkflowManagement AddNewPlugin.jpg

  • “Browse”: a file selection window will open up.
  • “Save”: this will save the plugin to the DB after validating the files.
  • “Cancel”: this will cancel the operation.

Working

Working of this functionality depends on the following conditions:

  • This widget will accept a .zip file for uploading. Contents of the zip file:
    • .jar file: the JAR for the plugin to be added.
    • .xml file: contains the plugin information. Please see below for the structure of this XML file.
    • The .zip file must contain only these two files, i.e. the .jar and the .xml.
    • The .zip file and the .jar file must have the same name.
    • The content of the .jar file cannot be verified, so the user must make sure that it is as required.
  • In order for this plugin to take effect, the user needs to restart the Tomcat server.
  • Note:
    • After successful validation of its contents, the zip file is stored in the configurable location specified in the “<Ephesoft installation path>\Application\WEB-INF\classes\META-INF\application.properties” file under the property named “plugin_upload_folder_path”.
    • The JPDL file for the uploaded plugin is stored at <Ephesoft installation build>\Application\WEB-INF\classes\META-INF\dcma-workflows\plugins\<PLUGIN_NAME>
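Before uploading, the packaging rules listed above can be checked locally. The following is a minimal sketch, assuming the rules are exactly as stated (one .jar, one .xml, nothing else, and matching zip and jar names); the function name `validate_plugin_zip` is hypothetical, and the checks Ephesoft itself runs on upload may differ or go further.

```python
import os
import zipfile


def validate_plugin_zip(zip_path):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical pre-upload check based only on the packaging rules in the text.
    """
    problems = []
    zip_name = os.path.splitext(os.path.basename(zip_path))[0]
    with zipfile.ZipFile(zip_path) as archive:
        # Ignore directory entries, keep only files.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    jars = [n for n in names if n.lower().endswith(".jar")]
    xmls = [n for n in names if n.lower().endswith(".xml")]
    if len(names) != 2 or len(jars) != 1 or len(xmls) != 1:
        problems.append("zip must contain exactly one .jar and one .xml file")
    if jars:
        jar_name = os.path.splitext(os.path.basename(jars[0]))[0]
        if jar_name != zip_name:
            problems.append("zip file and jar file must have the same name")
    return problems
```

For example, `validate_plugin_zip("myplugin.zip")` returns an empty list when the archive holds only `myplugin.jar` and one XML file. The content of the JAR and the XML is not inspected here.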

Plugin XML structure

Validation on XML

  • All tags are compulsory and can have any string value, except for the “is-scripting”, “is-mandatory”, “is-multivalue” and “override-existing” tags, which have Boolean values (TRUE, FALSE).
  • “jar-name” tag value must match the name of the jar file present in the zip file.
  • If “is-scripting” tag has a value “TRUE”, only then the values of “back-up-file-name” and “script-name” tag will be taken into account.
  • “plugin-property” and “dependency” tag can have multiple instances and have the values for plugin configs and dependencies respectively.
  • “override-existing” tag decides whether to add the new plugin or update an existing one. If value is “true”, then the existing plugin will be updated else it will be added as new.
  • Three operations can be done on the plugin properties and will be defined by the “operations” tag inside “plugin-property” tag. Following are the supported operations:
    • Add: adds a plugin property. An error is shown if it exists already.
    • Update: updates a plugin property identified by its name and if it doesn’t exist, creates a new one.
    • Delete: deletes a plugin property identified by its name. An error is shown if no such property exists.
  • Default values for the plugin properties:

 

Property data type | Default value
String | Default
Integer | 0
Boolean | Yes

These default values are assigned to properties that are mandatory. If a property is multivalued, the first value from the list is the default value.

Assumptions

  • “plugin-service-instance” and “method-name” tags must be correct as they cannot be validated.
  • “application-context-path” refers to the application context file name for the plugin.
  • For the dependencies tag:
    • For a new plugin, with dependencies as
    • ORDER_BEFORE : P2,P3/P4,P6/P7/P8
    • UNIQUE : TRUE

Dependencies management

On clicking the “Dependencies” button, the dependency management screen will open up. It contains the following:

  • “Plugin” List Drop Down: Allows the user to select the plugin whose dependencies they want to see.
  • “Dependency List”: List of dependencies with the following attributes:
    • Plugin Name: Name of the plugin.
    • Dependency Type: Type of dependency it shares with the dependent plugins.
    • Dependency: List of dependent plugins.
  • “Add” Button: Allows the user to add a dependency for a plugin.
  • “Edit” Button: Allows the user to edit an already existing dependency for a plugin.
  • “Delete” Button: Allows the user to delete an already existing dependency for a plugin.
  • “Save” Button: Saves the current state of dependencies of all the dirty plugins and takes the user to the “Workflow Management” screen.
  • “Apply” Button: Saves the current state of dependencies of all the dirty plugins and stays on the current screen.
  • “Cancel” Button: Discards the current state of dependencies of all the dirty plugins and takes the user to the “Workflow Management” screen.

WorkflowManagement Dependencies.jpg

 

Add Dependencies

This screen shows the following:

  • Plugin Name: name of the plugin selected on the previous screen.
  • Dependency type: List of available dependency types. Only single select is allowed.
  • Dependencies List: this list contains the list of available plugins minus the plugin selected on the previous screen. This list will be enabled only when “ORDER_BEFORE” is chosen as dependency type. Only single select is allowed.
  • Selected Dependencies: List of dependencies selected by the user.
  • And Button: on being clicked, adds the dependency selected in the “Dependencies List” as an “and” dependency to the “Selected Dependencies” text box.
  • Or Button: on being clicked, adds the dependency selected in the “Dependencies List” as an “or” dependency to the “Selected Dependencies” text box.
  • Ok Button: Saves the dependency to the plugin.
  • Reset Button: Resets all the fields to their initial values.

WorkflowManagement EditDependency.jpg

Edit Dependencies

This allows the user to edit a particular dependency record.

Delete Dependencies

This allows the user to delete a particular dependency record.

Help Content

  • On being clicked it will display a pop-up message providing information on how to upload and use a new plugin.

WorkflowManagement Help.jpg

Dependencies database table structure

Id | plugin_id | dependency_type | dependency
1 | P1 | ORDER_BEFORE | P3,P5
2 | P2 | ORDER_BEFORE | P1/P8
3 | P3 | ORDER_BEFORE | P4
5 | P1 | UNIQUE |
  • Fields:
    • Id:
      • Data type: Long
      • The unique Id for the table
    • plugin_id:
      • Data type: Long
      • Plug-in id mapped directly to the Plug-in table.
    • dependency_type:
      • Data type: ENUM(ORDER_BEFORE,UNIQUE)
      • Defines the type of dependency between the plug-in and list of plug-ins in dependency column.
    • dependency:
      • Data type: String
      • List of plug-ins on which the plug-in depends.
      • Values are given as a delimiter-separated list, using the following delimiters:

Delimiter | Meaning
, | AND
/ | OR

  • Example: P1,P2,P3/P4,P5,P6/P7/P8
    • The above example means that the plug-in needs the following:
      • P1
      • P2
      • P3 OR P4
      • P5
      • P6 OR P7 OR P8
    • NOTE: if dependency_type = UNIQUE, then this field will be empty.
  • Type of dependencies:
    • Ordering of plugins:
      • This type of dependency signifies that a plugin requires another plugin to run before it.
      • Example:
        • For the 1st plugin in the workflow: no dependency.
        • For any other plug-in: all of its ancestor plug-ins.
    • Uniqueness:
      • This type of dependency signifies a plugin’s uniqueness in the workflow, i.e. it should run only once in the workflow, e.g. the clean up plug-in.
Automated Regex Validation Plugin

Overview

This plugin validates documents against the given regex patterns. The regex patterns described in the Regular Expression Listing are used for validation. The patterns are matched against the values of all the document level fields present in each document. If all values match, the document is marked as valid, i.e. its valid tag is set to true; if any document level field value does not match, the document is marked as invalid, i.e. its valid tag is set to false.

Configuration

Steps for configuring the plugin

  • User can select the batch class module and create the regex pattern by navigating to Regular Expression Configuration page as shown below:

BatchClassManagement RegularExpressionConfiguration.jpg

  • User can create multiple regex patterns for each document level field. This is shown below in the screenshot:

BatchClassManagement RegularExpressionListing.jpg

Steps of execution

  • The plug-in uses the regex patterns defined for the document level fields of each document type.
  • It matches all the defined regex patterns against the document level field values from batch.xml. If all the values of the document level fields match the defined regex patterns, the document’s “Valid” tag is set to true; otherwise it is set to false.
  • Documents that are valid do not need validation; those whose valid tag is set to false must be validated during Validation.
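The decision rule in the steps above can be sketched as follows. This is a simplified illustration, not the plugin's code: the function name `is_document_valid` and the dictionary shapes are hypothetical, and it assumes that a value must match a pattern in full and that every pattern configured for a field must match. The text does not state either detail.

```python
import re


def is_document_valid(field_values, field_patterns):
    """Return True only if every field value matches all patterns configured for it.

    field_values:   {field name: extracted value}   (assumed shape)
    field_patterns: {field name: [regex, ...]}      (assumed shape)
    """
    for name, patterns in field_patterns.items():
        value = field_values.get(name) or ""
        for pattern in patterns:
            # Assumption: the whole value must match, not just a substring.
            if re.fullmatch(pattern, value) is None:
                return False
    return True


patterns = {"InvoiceNumber": [r"\d{7}"], "InvoiceDate": [r"\d{2}/\d{2}/\d{4}"]}
print(is_document_valid({"InvoiceNumber": "5432000", "InvoiceDate": "01/02/2012"}, patterns))
print(is_document_valid({"InvoiceNumber": "5432000", "InvoiceDate": "unknown"}, patterns))
```

The first call reports a valid document and the second an invalid one, because a single non-matching field is enough to set the “Valid” tag to false. The field names and patterns are made up for the example.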

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Invalid initialization of field service. | No field type initialized in a document.
2 | Invalid input pattern sequence. | Regex pattern is not supplied for a required field.

Barcode Extraction Plugin

Overview

This plug-in populates the field type value when a barcode type is given. When the plug-in switch is ON, the extracted barcode value is saved as the value of the field type. If the switch is OFF, the plug-in does nothing.

Configuration

Steps for configuring the plugin

  • User can select the batch class module and navigate to barcode extraction plug-in configuration page as shown below:

BatchClassManagement BarcodeExtractionPlugin.jpg

The User can edit the above settings by clicking on “Edit” in order to change the settings for their requirements.

 

Configurable properties

Following are the configurable properties available for the Barcode Extraction plugin:

 

Configurable property | Type of value | Value options | Description
Barcode Extraction Switch | List of values | ON, OFF | Switch to decide whether or not to perform barcode extraction. Default: ON.
Barcode Extraction Maximum Confidence | Integer | 0-100 | The maximum confidence value that is used for extraction.
Barcode Extraction Minimum Confidence | Integer | 0-100 | The minimum confidence value that is used for extraction.
Barcode Extraction Reader Types | Multi select | CODE39, QR, DATAMATRIX, CODE128, CODE93, ITF, CODABAR, PDF417, EAN13 | The barcode types to extract, selected from the preset list of barcode extraction types.
Barcode Extraction Valid Extension | Multi select | Tiff, gif | Configures the file types that will be used for extraction.

Steps of execution

  • The plug-in uses the type of barcode given in the field type listings. While creating the field type, the user can enter the type of barcode that has to be used for classification. This is shown in the following screenshot:

BatchClassManagement BarcodeType.jpg

  • While executing, if a barcode is present on the document, the value extracted from that barcode is used to populate the value of the document level field.
  • If no barcode is present, the value of the document level field is not set.

Dependency

  • A document level field must be present, and a barcode type must be selected in the field type configuration.
  • Files must have one of the configured valid extensions.

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | No valid extensions are specified in resources. | There are no values for valid extensions.
2 | File has invalid extension. | A file has an extension other than the configured valid extensions, for example when the valid extension is “gif” and the document being processed is “tif”.

Barcode Reader Plugin

Overview

Barcode Reader Plugin is used to read barcode from the input images using zxing. Barcode Reader plugin is used to read the following barcode types:

  • CODE39
  • CODE93
  • CODE128
  • ITF
  • PDF417
  • QR
  • DATAMATRIX
  • CODABAR
  • EAN13

If a barcode is detected on an image by the Barcode Reader plugin, the decoded barcode value is considered as the document type name for barcode classification in the Document Assembler.

Barcode values should therefore be document type names used in the batch class that has this plugin.

Configuration

Configurable Properties

The following is the list of configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Barcode Valid Extensions | String | .tif; .gif | The valid extensions of the input image files for decoding the barcode.
Barcode Max Confidence | Integer | NA | The confidence to be set if a barcode is decoded on the input image.
Barcode Min Confidence | Integer | NA | The confidence to be set if no barcode is found on the input image.
Barcode Classification Switch | String | ON, OFF | Switches the barcode reader plugin ON or OFF. Default: ON.
Barcode Reader Type | String | CODE39, CODE93, CODE128, ITF, PDF417, QR, DATAMATRIX, CODABAR, EAN13 | The barcode types that the barcode reader plugin decodes.

This is shown in the screen shot given below:

BarcodeConfigurableProperties.jpg

Steps of execution

  • This plug-in works in the page process phase of the application, when all the import processing on the batch has been done and the batch is ready for page processing.
  • The plug-in decodes the barcodes on the input images.
  • After all the work is done, it writes the information into batch.xml file for the barcode being decoded.
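The per-page behaviour implied by the properties and steps above can be sketched as follows. This is an illustration only: the decoding itself (done by zxing in the plugin) is out of scope, so the decoded value is passed in; the function name `read_page_barcode`, the returned dictionary and the confidence defaults 100 and 0 are assumptions made for the example.

```python
import os


def read_page_barcode(file_name, decoded_value,
                      valid_extensions=(".tif", ".gif"),
                      max_confidence=100, min_confidence=0):
    """Combine the extension check and the confidence rule for one page image.

    decoded_value: text decoded from the image, or None when no barcode was found.
    """
    if not valid_extensions:
        raise ValueError("No valid extensions are specified in resources.")
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in valid_extensions:
        raise ValueError("File %s has invalid extension." % file_name)
    if decoded_value:
        # A decoded barcode gets the configured maximum confidence.
        return {"value": decoded_value, "confidence": max_confidence}
    # No barcode found: the configured minimum confidence is used.
    return {"value": None, "confidence": min_confidence}


print(read_page_barcode("page0.tif", "Invoice"))
print(read_page_barcode("page1.tif", None))
```

The returned value stands in for what the plugin writes to batch.xml for the page; the actual XML structure is not reproduced here.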

Dependency

The plugin assumes that the import processing of the batch has been done properly; it then decodes the barcodes from the input images.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | No pages found in batch XML. | Invalid Batch.xml present in the Batch Instance Folder.
2 | No valid extensions are specified in resources. | No “Barcode Valid Extensions” values found in the database.
3 | File {image name} has invalid extension. | The image file has a file extension that is not valid for processing by the Barcode Reader Plugin.

Clean Up Plugin

Overview

This plug-in is used to delete system files and UNC folder data once all the processing on the batch is complete. It also removes all the content associated with a particular batch instance.

Steps of execution

  • This plug-in ideally runs after the export phase of the application, when all processing on the batch has been done, i.e. the desired results have been exported and the batch instance’s content is no longer required by the Ephesoft application.
  • The plug-in takes the identifier of a batch instance and removes the contents, including all sub-files, of the following paths:
    • <SHARED_FOLDER_PATH>\<BATCH_CLASS_UNC_FOLDER>\<BATCH_INSTANCE_FOLDER_NAME>: This folder is always deleted.
    • <LOCAL_FOLDER_PATH>\<BATCH_INSTANCE_IDENTIFIER>: This folder is deleted only if the plugin configuration “Delete System Folder Information” is “TRUE”.
    • “<BATCH_INSTANCE_IDENTIFIER>.ser” in the <LOCAL_FOLDER_PATH>\properties folder: This file is deleted only if the plugin configuration “Delete System Folder Information” is “TRUE”.
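The deletion rules above can be sketched as a small function that only computes the paths to remove. This is an illustrative sketch, not the plugin's implementation: the function name, its parameters and the sample paths in the usage note are hypothetical.

```python
from pathlib import PureWindowsPath


def cleanup_targets(shared_folder, unc_folder, instance_folder_name,
                    local_folder, batch_instance_id, delete_system_folder):
    """Return the paths that would be removed for one batch instance."""
    # The batch instance folder under the UNC folder is always removed.
    targets = [PureWindowsPath(shared_folder) / unc_folder / instance_folder_name]
    if delete_system_folder:
        # Removed only when "Delete System Folder Information" is TRUE.
        local = PureWindowsPath(local_folder)
        targets.append(local / batch_instance_id)
        targets.append(local / "properties" / f"{batch_instance_id}.ser")
    return targets
```

For example, calling it with a hypothetical shared folder C:\Shared, local folder C:\Local and batch instance BI1 yields one path when the switch is FALSE and three paths when it is TRUE.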

Configuration

Configuration screenshot

BatchClassManagement CleanupPlugin.jpg

Configurable Properties

The following is the list of configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Delete System Folder Information | List of values | TRUE (default), FALSE | Defines whether the “<Local folder>\<Batch instance>” folder and its contents are to be deleted.

Dependency

  • The plugin depends on “IMPORT BATCH FOLDER”, since a batch must be imported before its associated files can be cleaned up.
  • This plugin should ideally occur only once in the workflow and should be its last plugin. Otherwise it removes resources needed by the plugins that run after it, which causes batches to go into error.
  • The plugin assumes the extraction for the incoming batch has been done properly and just changes the results of provided batch.xml in a desired format.

Troubleshooting

The following are a few common error messages caused by malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | Unable to delete Folder | The folder could not be deleted for one of the following reasons: it is locked by some other process, or it is open in Explorer.
2 | Not enough permission to delete folder | A security exception occurred. The user/process/JVM does not have sufficient rights to delete the folder to be cleaned.

CMIS Export Plugin

Overview

This plug-in uploads the PDF/TIFF files generated as the final output of batch execution to a CMIS-compliant repository as ‘Document’ objects. The application currently supports the Alfresco, Nuxeo, SharePoint, Documentum and IBM CM repositories.

 

Configuration

Ephesoft Configurable Properties

Edit the configurations in CMIS Export Plugin as follows:

Cmis Export Plugin.jpg

Following is the list of configurable properties for the plugin:

 

Configurable property | Type of value | Value options | Description
CMIS Root folder Name | String | NA | Name of the folder at the CMIS repository.
CMIS Upload File Extension | List of values | pdf, tiff | The extension of the file being uploaded.
CMIS Server URL | String | For example: http://{Server_ip}:{port_number}/alfresco/service/cmis | The URL of the CMIS repository server. This URL varies between repositories such as Alfresco, SharePoint, Nuxeo and Documentum.
CMIS Server User Name | String | For example: “admin” | The username for CMIS repository server login.
CMIS Server User Password | String | For example: “admin” | The password for CMIS repository server login.
CMIS Server Repository Id | String | For example: “83b9c8bb-415e-46fd-9feb-c9fb8e4e2122” | Id of the CMIS repository used for uploading files.
CMIS Server Switch ON/OFF | List of values | ON, OFF | This property enables/disables the CMIS Export Plugin.
Aspect Switch | List of values | ON, OFF | This property is specific to the Alfresco repository. It enables/disables aspects on the document.
CMIS Export File Name | String | For example: “$EphesoftBatchID && _ && $EphesoftDOCID” | The name of the file to be uploaded.

  • The export file name must contain one or more parameters out of EphesoftBatchID, EphesoftDOCID or a document level field name. A parameter name must begin with the ‘$’ symbol. Different fields must be separated by ‘&&’.
  • If none is specified, the name of the local folder to be exported is used as the name of the exported file.

 

Documentum Repository Configurable Properties

  • CMIS Server URL: http://<host address>:<port_of_emc-cmis>/emc-cmis/resources/repositories/Repository_id

 

  • The repository id is the repository name. It can be read from an XML file that is downloaded from the URL: http://<host address>:<port_of_emc-cmis>/?repositoryId=RepositoryName

 

  • To create/edit configuration types in Documentum, use the Eclipse plugin for the WebTop Development Kit (see http://marketplace.eclipse.org/content/documentum-webtop-development-kit). Create a type corresponding to the Ephesoft Document Type to be used in CMIS as a child of the dm_document type, and add attributes that correspond to the DLFs in the Ephesoft Document Type.

 

  • The steps for viewing Documentum Repository configurations are as follows:

o Use Documentum Administrator to explore uploaded files and to create types in the Documentum Repository.
o The URL to access Documentum Administrator is http://<host address>:<port_of_da>/da
o It asks for login credentials to the repository.
o After a successful login, select a type under Repository/Administration/Types to view its properties and attributes:

Cmis Access Properties.jpg
Figure: Showing access to properties of a type in Repository

 

  • DLF-Attribute-mapping.properties(located at [EphesoftInstallationDirectory]\SharedFolders\[Batch-class-Folder]\cmis-plugin-mapping):

DocumentTypeName=DocumentumTypeName
DocumentTypeName.FieldTypeName1=Documentum’sType’sAttributeName1
DocumentTypeName.FieldTypeName2=Documentum’sType’sAttributeName2
DocumentTypeName.FieldTypeName3=Documentum’sType’sAttributeName3
A sample file content is:
INV=invoice
INV.inv_num=inv_num
INV.inv_amount=inv_amount
INV.inv_date=inv_date

 

  • Uploaded documents can be viewed at Repository/Administrator/Cabinets/*location in Documentum Administrator:

Cmis Uploaded Doc.jpg
Figure: Showing an uploaded document via Ephesoft

 

  • The properties of an uploaded file can be viewed by right-clicking the uploaded file and selecting Properties.

Cmis Access Props Of Uploaded Doc.jpg

Figure: Showing access to properties of an uploaded document

Cmis Props Of Uploaded Doc.jpg

Figure: Properties of an uploaded document.

 

  • Properties in the dcma-cmis.properties file, located at [EphesoftInstallationDirectory]\Application\WEB-INF\classes\META-INF\dcma-cmis\*, are similar to those for the Alfresco Repository (refer to ‘dcma-cmis.properties for Alfresco’ specified below).
(NOTE: If wssecurity is used, the URL that returns a page containing a list of web services is: http://<host address>:<port_of_emc-cmis>/emc-cmis/services/RepositoryService)

Alfresco Configurable Properties

The configuration files should be placed at the following path in the Alfresco installation directory: <Alfresco installation path>\tomcat\shared\classes\alfresco\extension

 

  • There are three configuration files used in Ephesoft to map parameters:

web-client-config-custom.xml: Alfresco automatically looks for this file on the class path in the alfresco.extension package for configuration.
ephesoft-model-context.xml: Tells Alfresco the location of the custom configuration file (any file ending with “-context.xml” is used for this purpose).
ephesoftModel.xml: The custom configuration file stating the parameters (Document level index fields) that will be mapped to Alfresco repository parameters.
Sample entries in the ephesoftModel.xml file:
<type name="ephesoft:ephesoft">
    <title>Ephesoft Document Procedure</title>
    <parent>cm:content</parent>
    <properties>
        <property name="ephesoft:invoiceDate">
            <type>d:text</type>
        </property>
        <property name="ephesoft:partNumber">
            <type>d:text</type>
        </property>
        <property name="ephesoft:invoiceTotal">
            <type>d:text</type>
        </property>
        <property name="ephesoft:state">
            <type>d:text</type>
        </property>
        <property name="ephesoft:city">
            <type>d:text</type>
        </property>
    </properties>
</type>
o A default xml file is available with the Ephesoft release.

 

  • DLF-Attribute-mapping.properties:

o This properties file maps Ephesoft parameters onto Alfresco custom parameters, i.e. Ephesoft specific Document Types to Alfresco Document Types and Ephesoft specific Document Level Fields to Alfresco specific Document Level Fields.
o Sample entries in properties file:
Application-Checklist=D:ephesoft:ephesoft
Application-Checklist.InvoiceDate=ephesoft:invoiceDate
Application-Checklist.PartNumber=ephesoft:partNumber
Application-Checklist.InvoiceTotal=ephesoft:invoiceTotal
Application-Checklist.State=ephesoft:state
Application-Checklist.City=ephesoft:city
Application-Check\ list.State=ephesoft:city
o Note: If there is a space in the name of the Document Type or of a Document Level Field, escape it with the “\ ” sequence.
o A default properties file is available with Ephesoft releases from 2.5 onwards.
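As a rough illustration of the file format described above, the following sketch splits mapping lines into document type mappings and field mappings, and treats “\ ” as an escaped space. The function name is hypothetical, and the parser assumes that document type names contain no “.” character; it is not the loader Ephesoft itself uses.

```python
def parse_dlf_mapping(text):
    """Split mapping lines into document type and field level mappings."""
    doc_types, fields = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # "\ " in a key stands for a literal space.
        key = key.strip().replace("\\ ", " ")
        if "." in key:
            doc_type, field = key.split(".", 1)
            fields[(doc_type, field)] = value.strip()
        else:
            doc_types[key] = value.strip()
    return doc_types, fields
```

With the sample entries above, the key “Application-Check\ list.State” is read as document type “Application-Check list” and field “State”.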

 

  • Properties for dcma-cmis.properties:

A property is given as property_name=value

 

Configurable property Type of value Value options Description
cmis.document_versioning_state List of values · NONE:  The document will be created as a non-versionable document.
· CHECKEDOUT: The document MUST be created in the checked-out state.
· MAJOR: The document MUST be created as a major version.
· MINOR: The document MUST be created as a minor version.
This is the document versioning state for uploading.
Default or in case of invalid option: CHECKEDOUT
cmis.security.mode List of values · “basic” for HTTP Basic Authentication (default)
· “wssecurity” for WS-Security Username Token based security.
Specify the security mode employed by the CMIS endpoint.
cmis.repo.create_batch_subfolders List of values · true
· false
Specifies whether or not a subfolder should be created for the batch within the configured target repository folder. If the value is invalid or missing, it is treated as true.
cmis.aspect_mapping_file_name String For example:
aspects-mapping.properties
This is the name of the aspect properties file present in “\META-INF\dcma-cmis\dcma-cmis.properties”.
This is to add aspects to documents being uploaded on CMIS repository via Ephesoft for alfresco repository.

Specify the WSDL URLs for each of the CMIS services if “wssecurity” is specified as the value of the “cmis.security.mode” property. The text {serverURL} may be inserted into the path if the server URL configured in the batch class should be used as part of the URL.
For example:
o cmis.url.acl_service=http://hostname:8080/alfresco/soap/ACLService?wsdl
Or
cmis.url.acl_service={serverURL}/ACLService?wsdl, where {serverURL} is the CMIS server URL configured within the batch class.
Similarly, the following properties are set for wssecurity:
o cmis.url.discovery_service=http://localhost:8181/alfresco/cmisws/DiscoveryService?wsdl
o cmis.url.multifiling_service=http://localhost:8181/alfresco/cmisws/MultiFilingService?wsdl
o cmis.url.navigation_service=http://localhost:8181/alfresco/cmisws/NavigationService?wsdl
o cmis.url.object_service=http://localhost:8181/alfresco/cmisws/ObjectService?wsdl
o cmis.url.policy_service=http://localhost:8181/alfresco/cmisws/PolicyService?wsdl
o cmis.url.relationship_service=http://localhost:8181/alfresco/cmisws/RelationshipService?wsdl
o cmis.url.repository_service=http://localhost:8181/alfresco/cmisws/RepositoryService?wsdl
o cmis.url.versioning_service=http://localhost:8181/alfresco/cmisws/VersioningService?wsdl
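The {serverURL} substitution described above can be pictured with the sketch below. It is only an illustration: the function name is hypothetical, and stripping a trailing “/” from the configured server URL is an assumption, not documented behaviour.

```python
def resolve_service_urls(properties, server_url):
    """Expand the {serverURL} placeholder in every cmis.url.* property."""
    base = server_url.rstrip("/")  # assumption: avoid a doubled slash
    return {
        name: value.replace("{serverURL}", base)
        for name, value in properties.items()
        if name.startswith("cmis.url.")
    }
```

Properties that already hold a full URL pass through unchanged, so both styles shown above can be mixed in one file.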

 

  • Mappings of Data types defined in Ephesoft and at Alfresco Server


 

Ephesoft Data Type | Alfresco data type | Alfresco Property Type mapping (Internally Converted to) | Comments (If any)
STRING | d:text | String |
INTEGER | d:int | Integer |
FLOAT | d:float | Decimal |
DOUBLE | d:double | Decimal |
DATE | d:datetime | DateTime |
BOOLEAN | d:boolean | Boolean |
LONG | d:long | Integer | Max allowed values: 999-999-999

Checklist

  • The mapping of document types in the DLF-Attribute-mapping.properties file should be equivalent to the type defined in the ephesoftModel.xml file in the Alfresco repository.
DLF-Attribute-mapping.properties | ephesoftModel.xml
Application-Checklist=D:ephesoft:ephesoft | <type name="ephesoft:ephesoft">
  • The data type of document level fields defined in the DLF-Attribute-mapping.properties file should be equivalent to the types of document attributes defined in the ephesoftModel.xml file in the Alfresco repository.
DLF-Attribute-mapping.properties | ephesoftModel.xml
Application-Checklist.InvoiceDate (the data type in the Ephesoft application should be “String”) | <type>d:text</type>
  • Screenshot from Ephesoft application for Data Types.

Cmis Ephesoft Data Types.jpg

Aspect switch configuration

The requirements for adding aspects are as follows:

 

  • To add aspects to the file being uploaded:

Aspects are added to the document file being uploaded according to its document type, as defined in its batch.xml file. To determine which aspects are to be added to documents of which document type when uploading, there has to be a mapping of document types to aspects.

 

  • Add values to properties defined by an aspect:

These values will be the values of document level fields that have been mapped to that property.

 

Mapping Properties

Path of mapping properties file:

A mapping has to be defined for the above two requirements. This is done in Ephesoft with the help of a property file.
The absolute path of the file is determined as follows:
o The name of the folder in which this property file resides, inside the batch class folder of ephesoft-data, is specified through the “batch.cmis_plugin_mapping_folder_name” property in the file “\META-INF\dcma-batch\dcma-batch.properties”.
e.g.: batch.cmis_plugin_mapping_folder_name=cmis-plugin-mapping
P.S.: This is the same property that defines the folder path of the property file for CMIS content type mapping.
o The name of this property file is specified by the property “cmis.aspect_mapping_file_name” in the property file “\META-INF\dcma-cmis\dcma-cmis.properties”.
e.g.: cmis.aspect_mapping_file_name=aspects-mapping.properties
The property file defined above contains the entire mapping associated with aspects.

 

Content of mapping properties file

Mappings need to be added for the following:
Mapping document types to aspects:
The user can map a document type to multiple aspects (i.e. the aspects the user intends to add to documents of that document type).
This is done by adding the name of the document type as the key and the aspects as the value (each aspect separated by a semi-colon “;”).
e.g.: Application-Checklist=P:cm:titled;P:cm:taggable
In this example the user adds two aspects, “P:cm:titled” and “P:cm:taggable”, to all documents with document type “Application-Checklist”.
Mapping document level fields to aspect properties:
The user can map document level fields to aspect properties.
This is done by using “{DocumentType}.{DocumentLevelFieldName}” as the key and the property to be mapped to as the value.
e.g.: Application-Checklist.State=cm:description
In this example the user specifies that, for all documents with document type “Application-Checklist”, the value of the document level field “State” is populated into the aspect property “cm:description”.
If an error is encountered while adding aspects to an uploaded document, the user has to correct the cause of the error and restart the batch, after which the document is uploaded again.
For more information on aspects, please refer to the link: [[3]]
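To make the two kinds of mapping entries concrete, the sketch below takes the already loaded key/value pairs of the aspect mapping file and works out, for one document, which aspects to add and which aspect properties to fill. The function name and the shape of its arguments are hypothetical; it does not talk to a repository.

```python
def aspects_for_document(mapping, doc_type, field_values):
    """Return (aspects, aspect property values) for one document.

    mapping: key -> value pairs read from the aspect mapping file.
    field_values: document level field name -> extracted value.
    """
    # Aspects are listed under the document type key, separated by ";".
    aspects = [a for a in mapping.get(doc_type, "").split(";") if a]
    prefix = doc_type + "."
    props = {
        target: field_values[key[len(prefix):]]
        for key, target in mapping.items()
        if key.startswith(prefix) and key[len(prefix):] in field_values
    }
    return aspects, props
```

With the two example entries above and a “State” value of “CA”, the result is the aspects “P:cm:titled” and “P:cm:taggable” together with “cm:description” set to “CA”.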

Dependency

The plugin runs after Create Multi Page Files Plugin in Export Module. The plugin assumes that the multipage tiff/pdf has been successfully generated for the batch and uploads the multipage tiff/pdf to the CMIS repository.

 

Troubleshooting

The following are a few common error messages caused by malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | com.ephesoft.dcma.core.DCMAException: Not Found | Alfresco server URL is invalid.
2 | com.ephesoft.dcma.core.DCMAException: Repository not found! | Repository ID is invalid.
3 | Cannot initialize Web Services service object[org.apache.chemistry.opencmis.binding.webservices.RepositoryService]: Failed to access the WSDL at:http://localhost:8181/alfresco/cmisws/RepositoryService?wsdl. It failed with:Connection refused: connect. | Invalid URL for wssecurity. Either change the security mode to basic or correct the URL in the dcma-cmis.properties file.
4 | com.ephesoft.dcma.core.DCMAException: Unauthorized | Invalid user name or password.
5 | Server URL is null/empty from the data base. Invalid initializing of properties. | Server URL is empty or not mapped to database.
6 | Server User Name is null/empty from the data base. Invalid initializing of properties | Username is empty or not mapped to database.
7 | Server User Password is null/empty from the data base. Invalid initializing of properties. | Password is empty or not mapped to database.
8 | UploadFileTypeExt is null/empty from the data base. Invalid initializing of properties | Upload file type extension is empty or not mapped to database.
9 | RootFolder is null/empty from the data base. Invalid initializing of properties. | Root Folder is empty or not mapped to database.
10 | org.apache.chemistry.opencmis.commons.exceptions.CmisConstraintException: Conflict | Files already exist in the specified folder hierarchy. Please try deleting old files.
11 | java.lang.IllegalArgumentException:Object Id must be set! | Unable to create folder in the specified hierarchy.
12 | CMISExporter- Bad Request issue | The mapping defined in the DLF-Attribute-mapping.properties file is not the same as the mapping defined in the content model at the Alfresco repository. NOTE: Detailed description of error #12 is below.
13 | “CMISExporter- Property ‘ephesoft:partNumber’ is a String property” from Alfresco repository. | Mismatch between the types of Document Level fields defined in the Ephesoft application and those defined in the Alfresco content model. NOTE: Detailed description of error #13 is below.

Description of error #12

· The user may define a mapping in the properties file as follows:
o Application-Checklist=D:ephesoft:document
o Application-Checklist.InvoiceDate=ephesoft:invoiceDate
o Application-Checklist.PartNumber=ephesoft:partNumber
o Application-Checklist.InvoiceTotal=ephesoft:invoiceTotal
· At the Alfresco repository, however, the type may be defined as follows:
<type name="ephesoft:ephesoft">
    <title>ephesoft Document Procedure</title>
    <parent>cm:content</parent>
    <properties>
        <property name="ephesoft:invoiceDate">
            <type>d:text</type>
        </property>
        <property name="ephesoft:partNumber">
            <type>d:text</type>
        </property>
        <property name="ephesoft:invoiceTotal">
            <type>d:text</type>
        </property>
        <property name="ephesoft:state">
            <type>d:text</type>
        </property>
        <property name="ephesoft:city">
            <type>d:text</type>
        </property>
    </properties>
</type>
· This mismatch (the properties file maps to “ephesoft:document”, while the content model defines “ephesoft:ephesoft”) gives a “Bad Request” error from the CMIS plugin when it tries to upload the document.
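A mismatch of this kind can be spotted before a batch runs by comparing the mapping targets with the names declared in the content model. The sketch below is a hypothetical helper, not part of Ephesoft or Alfresco. It assumes the model fragment is passed as a single <type> element without XML namespaces, as in the sample above; a complete Alfresco model file declares a namespace and would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET


def check_mapping(model_xml, mapping):
    """Return the mapping keys whose target is missing from the model."""
    root = ET.fromstring(model_xml)
    type_names = {root.get("name")}
    prop_names = {p.get("name") for p in root.iter("property")}
    problems = []
    for key, target in mapping.items():
        if "." in key:
            # Field level entry: the target must be a declared property.
            if target not in prop_names:
                problems.append(key)
        else:
            # Document type entry: drop the "D:" prefix before comparing.
            name = target[2:] if target.startswith("D:") else target
            if name not in type_names:
                problems.append(key)
    return problems
```

For the mapping shown in this description, the helper reports the “Application-Checklist” entry, because “ephesoft:document” is not declared in the model.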

Description of error #13
· The user may have the following mappings defined in the Alfresco content model:
<property name="ephesoft:partNumber">
    <type>d:text</type>
</property>
<property name="ephesoft:invoiceTotal">
    <type>d:int</type>
</property>
o “partNumber” is mapped as “text” type (let us say it is of type LONG in the Ephesoft application).
o “invoiceTotal” is mapped as “int” type (let us say it is of type DOUBLE in the Ephesoft application).
The above mismatch gives the error “CMISExporter – Property ‘ephesoft:partNumber’ is a String property” from the Alfresco repository.
Correction: Update the content model in the Alfresco repository with the appropriate data types (for the data type mappings, refer to http://wiki.alfresco.com/wiki/Data_Dictionary_Guide#Data_Types):
<property name="ephesoft:partNumber">
    <type>d:int</type>
</property>
<property name="ephesoft:invoiceTotal">
    <type>d:double</type>
</property>

Copy Batch XML Plugin

Overview

The Copy Batch XML plugin is an export plugin available in the Ephesoft application. It exports the metadata generated by processing a batch to any location on the file system. Using this plugin, the generated batch.xml and the output document files (PDF and/or TIFF files) can be exported. The following configurable parameters are available:

  • Base export folder for all the batches.
  • Naming pattern of the export folder for a batch.
  • Naming pattern of the document files to be copied.
  • Type of output document files to be copied (PDF and/or TIFF files).

Plugin Properties

  • Final Export Folder: Folder path where all the output files (batch xml, multipage pdf, multipage tiff) are to be exported.
  • Export To Folder Switch: Switch to decide whether or not to copy the batch output files to the Final Export Folder. It switches the Copy Batch XML plugin ON/OFF. Default value is ON.
  • Export Folder Name: Folder with the specified name will be created in the Final Export Folder and all the document output files (multipage pdf(s) and tiff(s)) will be copied into this folder.
  • Export File Name: All the document output files (multipage pdf(s) and tiff(s)) will be renamed based on the parameters specified in this property.
  • Batch XML Export Folder: Folder where the batch xml file will be copied. Its possible values are:
    • Batch Instance Folder: Batch XML files will be copied to the batch instance folder (BIXX) in the Final Export Folder.
    • Final Export Folder: In this case, batch xml files will be copied directly into the Final Export Folder.

Export Folder Name and Export File Name configuration

  • Values specified in these fields can be either document field name, EphesoftBatchID or EphesoftDOCID.
  • Each parameter (document field name, EphesoftBatchID or EphesoftDOCID) should be preceded by ‘$’.
  • Example: $Invoice Date && _ && $Invoice Total && _ && $EphesoftBatchID. If the batch xml has the following values for these parameters:
    • Invoice Date : 13 Jan
    • Invoice Total : 22.22
    • EphesoftBatchID : BIA

then the folder or file will be named “13 Jan_22.22_BIA”.

  • && is used as a separator between parameters.
  • If the user enters an invalid character (for either Export Folder Name or Export File Name), or the value of any specified parameter contains an invalid character, it is replaced by replace_char, which is configurable from the properties file.

Note: An invalid character is one that cannot be used in a file or folder name (e.g., / \ : * < > ? ” | for Windows).

  • If a document doesn’t contain any of the specified parameters (e.g., $Invoice Date && _ && $Invoice Total), or doesn’t contain a value for any of them:
    • In the case of Export File Name, the names of the document files (multipage pdfs and tiffs) are retained as they are.
    • In the case of Export Folder Name, the document files are moved to a folder named Unknown.
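The naming rules above can be illustrated with a short sketch. The function name is hypothetical, and returning None when a referenced parameter is missing or empty is an assumption made here so that the caller can apply the fallback described above (keep the original file name, or use the Unknown folder). Replacement of invalid characters is not covered.

```python
def resolve_name(pattern, values):
    """Expand a pattern such as '$Invoice Date && _ && $EphesoftBatchID'.

    Returns None when a referenced parameter is missing or empty.
    """
    parts = []
    for token in pattern.split("&&"):
        token = token.strip()
        if token.startswith("$"):
            value = values.get(token[1:])
            if not value:
                return None
            parts.append(value)
        else:
            parts.append(token)  # literal text such as "_"
    return "".join(parts)
```

Applied to the example pattern and values above, the function returns “13 Jan_22.22_BIA”.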

Configuration

UI Configurations

User can configure Copy batch xml plugin from UI:

{Batch Class List} -> {Batch Class} -> Export -> COPY_Batch_XML
BatchClassManagement CopyBatchXMLPlugin.jpg
Properties Description:

 

Configurable property | Type of value | Value options | Description
Final Export Folder | String | NA | Folder path where all the files (batch xml, multipage pdf, multipage tiff) are to be exported.
Export To Folder Switch | String | ON, OFF | Switch to decide whether or not to copy the batch files to the Final Export Folder. Default ON.
Export Folder Name | String | NA | A folder with this name will be created in the Final Export Folder and all the document files (multipage pdfs and tiffs) will be copied into this folder. Refer to Guidelines for entering Export Folder Name and Export File Name.
Export File Name | String | NA | All the document files (multipage pdfs and tiffs) will be renamed based on the parameters specified in this property. Refer to Guidelines for entering Export Folder Name and Export File Name.
Batch XML Export Folder | String | Batch Instance Folder, Final Export Folder | Folder where the batch xml file will be copied. Batch Instance Folder: batch XML files will be copied to the batch instance folder (BI??) in the Final Export Folder. Final Export Folder: batch xml files will be copied directly into the Final Export Folder.

Property File Configurations

Configuration for Replacing Invalid Character:

Property File Name: dcma-export.properties

Property file location: {Ephesoft_Home}/WEB-INF/classes/META-INF/dcma-export/*
Properties Description:

 

Configurable property | Type of value | Value options | Description
export.invalid_file_name_characters | String | NA | Semi-colon separated list of characters that will be treated as invalid characters for file names. Default value is /;\\;\:;*;<;>;?;”;| for Windows environment.
export.replace_char | String | NA | Invalid characters will be replaced by export.replace_char.
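How these two properties work together can be sketched as follows. Both function names are hypothetical. The sketch assumes that the raw property text uses properties-file style backslash escapes (so “\\” stands for a backslash and “\:” for a colon) and that the quote character in the list is a plain double quote.

```python
def parse_invalid_chars(raw):
    """Split the semi-colon separated list and undo backslash escapes."""
    chars = []
    for token in raw.split(";"):
        if len(token) == 2 and token[0] == "\\":
            token = token[1]  # keep only the escaped character
        if token:
            chars.append(token)
    return chars


def sanitize(name, invalid_chars, replace_char):
    """Replace every invalid character in name with replace_char."""
    for ch in invalid_chars:
        name = name.replace(ch, replace_char)
    return name
```

For instance, with “_” as the replacement character, a value such as “13/Jan:2012” becomes “13_Jan_2012”.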

Dependencies

CREATEMULTIPAGE_FILES plugin: This plugin is responsible for creating multipage pdf and tiff files which are copied by Copy batch XML plugin to Batch XML Export folder.

Multipage tiff files will be created only if Create Multipage Tiff Switch is ON in this plugin.

Troubleshooting

S no. | Error message | Possible root cause
1 | Could not create folder. | The batch instance folder could not be created in the Final Export Folder. Check the permissions on this folder.
2 | Folder does not exist. | The folder specified for Final Export Folder doesn’t exist.

Create Multipage Files Plugin

Overview

The Create Multipage Files plugin by default is a part of export module. This plugin generates multipage PDF and TIF files for each document type of a batch inside final drop folder. This final drop folder path is a configurable property defined inside Copy Batch XML Plugin.

This plugin also generates colored, searchable and optimized PDF depending upon the configuration made.

Configuration

UI Configurations

The following is the list of configurable properties from the UI:

BatchClassManagement CreateMultipagefilesPlugin.jpg

Configurable property | Type of value | Value options | Description
PDF Optimization switch | List of values | ON, OFF | This switch is used to create optimized PDF by adding web view to the PDF. This feature currently only works with Ghostscript.
Create Multipage Tiff Switch | List of values | ON, OFF | When this switch is turned ON, multipage tiff files are created with the help of ImageMagick.
Multipage File Export Process | List of values | ITEXT, ITEXT-SEARCHABLE, HOCRtoPDF, IMAGE_MAGICK, GHOSTSCRIPT | Lets the user select the API used to create multipage files.
Colored Output PDF | List of values | TRUE, FALSE | Lets the user generate colored PDF as output.
Searchable Output PDF | List of values | TRUE, FALSE | A searchable PDF is created when this option is set to TRUE.
PDF Creation Parameters | String | NA | Lets the user define Ghostscript parameters for creating PDF.
PDF Optimization Parameters | String | NA | Lets the user define Ghostscript parameters for creating optimized PDF.

Property File Configurations

The following is the list of configurable properties from the property file located at ‘{Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-imagemagick/imagemagick.properties’:

Configurable property | Type of value | Sample Value | Description
imagemagick.tif_compression | String | LZW | The compression mode to be used while creating multipage tiff.
imagemagick.pdf_quality | int | 100 | The quality of the PDF, which can vary from 0 to 100.
imagemagick.colored | String | True | Defines whether the multipage tiff will have colored or monochrome images.
imagemagick.pdf_compression | String | LZW | The compression mode to be used while creating multipage PDF.
imagemagick.display_image_output_parameters | String | -colorspace gray -alpha off | ImageMagick output parameters to be used while generating multipage tiff.
imagemagick.max_files_processed_per_gs_cmd | Integer | 75 | Maximum number of files Ghostscript can process per command to generate multipage PDF.
imagemagick.height_for_pdf_page | Integer | 792 | Height of the PDF page while generating PDF using iText.
imagemagick.width_for_pdf_page | Integer | 612 | Width of the PDF page while generating PDF using iText.
imagemagick.max_files_processed_per_im_cmd | Integer | 100 | Maximum number of files ImageMagick can process per command to generate multipage tiff.
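The two “max files per command” properties imply that a long list of page files is handled in groups. The sketch below shows only that grouping step, with a hypothetical function name; how the plugin combines the per-group outputs into one file is not described here and is not modelled.

```python
def batches(files, max_per_command):
    """Split the page files into groups no larger than max_per_command."""
    if max_per_command < 1:
        raise ValueError("max_per_command must be at least 1")
    return [files[i:i + max_per_command]
            for i in range(0, len(files), max_per_command)]
```

With the sample value 75, a document of 160 pages would be handled in three groups of 75, 75 and 10 files.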

Steps of execution

  • This plug-in runs in the export phase of the application, when all processing on the batch has been done and the batch is ready to be exported.
  • The plug-in creates multipage tiff files or PDFs in the final drop folder for all document types in a batch.
  • When this is done, batch.xml is updated and the batch is passed to the other export plugins.

Dependency

This plugin requires hocr.xml file for creating searchable PDF. It has a dependency on one of the plugins from: ‘Recostar HOCR’/ ‘Tesseract HOCR’.

Troubleshooting

The following are a few common error messages caused by malfunctioning of the plugin:

S no. | Error message | Possible root cause
1 | IM4JAVA_TOOLPATH is not set for converting images using image magic | The environment variable for ImageMagick is not set.
2 | Environment Variable GHOSTSCRIPT_HOME not set. | The environment variable for Ghostscript is not set.

Create Thumbnails Plugin

Overview

This plug-in is used to create the thumbnail image of the batch images. Two types of thumbnails will be generated by this plugin:

Display Thumbnails: These thumbnails are displayed in Review and Validate screen, where pages in the documents are shown as thumbnails under the document name.

Compare thumbnails: These thumbnails are used by classify images plugin to classify the pages.

By default, this plugin is added in the page process module.

Configuration

Setting the plugin configuration

The configurable properties can be edited at the following UI:

Edit Batch Class -> Edit Page Process module -> Edit CREATE_THUMBNAILS Plugin
BatchClassManagement CreateThumbnailsPlugin.jpg

 

Configurable Properties

The following is the list of configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Create Thumbnails Switch | String | ON, OFF. Default value: ON | Determines whether compare thumbnails will be created or not. Display thumbnails will always be created.
Create Thumbnails Display Thumbnail Type | String | .png | Determines the format in which the thumbnail image is displayed. It is a non-editable property.
Create Thumbnails Compare Thumbnail Type | String | .tif | The format of the thumbnail image created for comparison. It is a non-editable property.
Create Thumbnails Display Image Height | Integer | Any integer value. Default value: 200 | Sets the height of the thumbnail image.
Create Thumbnails Display Image Width | Integer | Any integer value. Default value: 150 | Sets the width of the thumbnail image.
Create Thumbnails Compare Image Height | Integer | Any integer value. Default value: 200 | Sets the height of the compare thumbnail image.
Create Thumbnails Compare Image Width | Integer | Any integer value. Default value: 150 | Sets the width of the compare thumbnail image.
Create Thumbnails output Image Parameters | String | Valid parameters for ImageMagick. Default value: -colorspace gray | Used if the user wants to pass additional parameters to be processed by ImageMagick.

Dependency

The plugin is dependent on Import Batch Folder Plugin. Import Batch Folder plugin copies the batch files from UNC folder to the Ephesoft System Folder.

Troubleshooting

S. No. | Error Message | Description
1 | Problem generating thumbnails. | Check if the IM4JAVA_TOOLPATH environment variable is set correctly.

Db Export Plugin

Overview

This plug-in is responsible for saving the data of document level fields for a particular batch instance to the external or same database. It takes the mapping file provided for the plugin and creates a SQL query to insert the mapped document level field into the mapped table.

Configuration

Configurable properties screenshot

BatchClassManagement DBExportPlugin.jpg

 

Configurable properties

Following are the configurable properties available with the plugin:

 

Configurable property
Type of value
Value options
Description
Database Export Switch List of values
  • ON
  • OFF

The switch that defines whether this plugin will run or not. Default value is “OFF”.
Database Connection URL String A valid database connection URL. The database connection URL corresponding to the selected driver.
Database Driver List of values
  • net.sourceforge.jtds.jdbc.Driver
  • com.microsoft.jdbc.sqlserver.SQLServerDriver
  • com.mysql.jdbc.Driver

Type of driver to be used for database connection.
Database User Name String A valid username to connect to the database. SQL account username.
Database Password String A valid password to connect to the database. SQL account password.

Mapping File

  • Mapping file for this plugin is stored for each batch class at the following path:
    • <SHARED_FOLDER_PATH>\<BATCH_CLASS_IDENTIFIER>\db-export-plugin-mapping\db-export-mapping.properties
  • Its contents should be in the following syntax:
    • <Document Type>.<Document Level Field Name>=<Database Table Name>:<Database Table Column Name>
    • For example:
      • Invoice.type=testTable:invoiceType
      • Invoice.sender=testTable:invoiceSender
      • Invoice.receiver=testTable:invoiceReceiver
      • Invoice.total=testTable:invoiceTotal
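
The mapping syntax above can be read mechanically. The following is a minimal sketch, not the plugin's own parser: the function names are hypothetical, and it only shows how one mapping entry splits into document type, field, table and column, and how entries could be grouped per table.

```python
def parse_mapping_line(line):
    """Split '<DocType>.<Field>=<Table>:<Column>' into its four parts."""
    key, _, target = line.strip().partition("=")
    doc_type, _, field = key.partition(".")
    table, _, column = target.partition(":")
    if not all((doc_type, field, table, column)):
        raise ValueError(f"Invalid mapping entry: {line!r}")
    return doc_type, field, table, column


def group_by_table(lines):
    """Group parsed entries as {table: [(doc_type, field, column), ...]}."""
    grouped = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        doc_type, field, table, column = parse_mapping_line(line)
        grouped.setdefault(table, []).append((doc_type, field, column))
    return grouped


sample = ["Invoice.type=testTable:invoiceType",
          "Invoice.total=testTable:invoiceTotal"]
print(group_by_table(sample))
```

An entry that lacks any of the four parts raises an error in this sketch, which mirrors the requirement that an invalid mapping must not be silently accepted.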

Dependency

The plugin requires the following prerequisites:

  • The plugin does not depend on any other plugin. However, the desired output is produced only when the document level field has an extracted value.
  • A table with the name provided in the mapping file must be created with the following structure:

 

Field Name
Null allowed
BATCH INSTANCE ID
NO
BATCH CLASS ID
NO
DOCUMENT TYPE
NO
DOCUMENT LEVEL FIELD
NO
VALUE
YES
  • If the “Database Export Switch” is ON, then the mapping provided must be correct. An invalid mapping will result in the batch going into error.

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

 

S no. Error message Possible root cause
1 Error in parsing DB Export Plugin mapping file, FileNotFoundException The “db-export-mapping.properties” file is not located at “<SHARED_FOLDER_PATH>\<BATCH_CLASS_IDENTIFIER>”.
2 Error in parsing DB Export Plugin mapping file, NoSuchElementException One or more properties in “db-export-mapping.properties” have incorrect syntax.
3 Problem occurred in updating database table One of the following reasons caused this:

  • Database connection setting is incorrect
  • Error occurred while writing values to DB.

 

4 Error in initialising Hibernate Connection Database connection settings are invalid.

Document Assembler Plugin

Overview

This plugin is responsible for forming documents from single pages. It reads all the pages present under the document type “Unknown” and creates new documents on the basis of page level fields. The plugin reviews the page level field results and decides which page is the first page and what the document type is, based on the page_level_index fields.

Ephesoft supports five classification types: barcode, search, image, automatic and searchable PDF classification. Only one type of classification can be applied to a batch at a time. When ‘Automatic Classification’ is selected, it operates like Search classification but also includes the top results from Barcode and Image classification. The default configuration in the property file applies the order Barcode, then Image and then Lucene search classification.

  • Barcode classification: Ephesoft forms the document type on the basis of the barcode present in the document being processed and the sample documents provided at the time of learning.
  • Search classification: Ephesoft forms the document type on the basis of the text found on the images using Lucene. During learning, HOCR is generated for the image samples provided in the batch class data. The HOCR files of the batch are then compared with the sample HOCR files.
  • Image classification: Ephesoft forms the document type on the basis of the image samples provided at the time of learning. Image classification superimposes two images and fetches the best match.
  • Automatic classification: Ephesoft forms the document type on the basis of Search classification together with the top results from Barcode and Image classification. The default configuration in the property file applies the order Barcode, then Image and then Search classification.
  • Searchable PDF classification: This classification is only for searchable batch classes. Ephesoft assumes that a searchable batch class has a single document type. If it does not, the first document type is assigned to all the documents and they are merged into a single document.

Configuration

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-docassembler/dcma-document-assembler.properties

 

Configurable property
Type of value
Value options
Description
da.barcode_classification
String
Barcode (default)
This field is used to specify the barcode plugin name.
da.lucene_classification
String
Search_Engine_Classification (default)
This field is used to specify the Search Engine Classification plugin name.
da.image_classification
String
Image_Compare_Classification (default)
This field is used to specify the Image Compare Classification plugin name.
da.automatic_classification
String
Automatic_Classification (default)
This field is used to specify the Automatic Classification plugin name.
da.first_page
String
First_Page (default)
This field is used to specify the First Page name.
da.middle_page
String
Middle_Page (default)
This field is used to specify the Middle Page name.
da.last_page
String
Last_Page (default)
This field is used to specify the Last Page name.
da.automatic_include_list
String
Barcode;Image_Compare_Classification;Search_Engine_Classification
This field is used to specify the order of classification type using semicolon separator.

UI Configuration

The Document Assembler plugin can be configured from the following UI:

BatchClassManagement DocumentAssemblerPlugin.jpg

 

Configurable property
Type of value
Value options
Description
DA Barcode confidence
Integer
0-100
This field is used to specify the barcode confidence.
DA Rule First-middle-last Page
Integer
0-100
This field is used to specify the confidence for first, middle and last page.
DA Rule First Page
Integer
0-100
This field is used to specify the confidence for first page.
DA Rule Middle Page
Integer
0-100
This field is used to specify the confidence for middle page
DA Rule Last Page
Integer
0-100
This field is used to specify the confidence for last page.
DA Rule First-last Page
Integer
0-100
This field is used to specify the confidence for first and last page.
DA Rule First-middle Page
Integer
0-100
This field is used to specify the confidence for first and middle page.
DA Rule Middle-last Page
Integer
0-100
This field is used to specify the confidence for middle and last page.
DA Classification Type
List of values
  • Search Classification
  • Barcode Classification
  • Image Classification
  • Searchable Pdf Classification
  • Automatic Classification

 

This value decides the document classification type to be used for classification.
DA Merge Unknown Document Switch
List of values
  • ON
  • OFF

This value decides whether the unknown document is to be merged with the previously classified document or not.

Steps of execution

    • This plugin works in the document assembler phase of the application, when all the page processing on the batch has been done and it is ready to be exported.
    • The plugin uses the pages classified in the page processing module as input and generates the merged and classified documents as output.
    • After all the work is done, if DA Merge Unknown Document Switch is ON, it merges any unknown document left over due to low confidence into the previous classified document.
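
The steps above can be pictured with a heavily simplified sketch. It is not the plugin's actual algorithm: it ignores the confidence rules configured in the UI, the input tuples and the function name are hypothetical, and it only shows how First_Page / Middle_Page / Last_Page labels and the merge switch could shape the resulting documents.

```python
def assemble(pages, merge_unknown=True):
    """pages: list of (page_id, doc_type, page_type); doc_type None = unclassified."""
    documents = []   # each document: {"type": str, "pages": [page ids]}
    open_doc = None  # document that is still accepting pages
    for page_id, doc_type, page_type in pages:
        if doc_type is None:
            if merge_unknown and documents:
                # Switch ON: attach the page to the previous document.
                documents[-1]["pages"].append(page_id)
            elif documents and documents[-1]["type"] == "Unknown":
                documents[-1]["pages"].append(page_id)
            else:
                documents.append({"type": "Unknown", "pages": [page_id]})
                open_doc = None
            continue
        starts_new = (page_type == "First_Page" or open_doc is None
                      or open_doc["type"] != doc_type)
        if starts_new:
            open_doc = {"type": doc_type, "pages": []}
            documents.append(open_doc)
        open_doc["pages"].append(page_id)
        if page_type == "Last_Page":
            open_doc = None  # a last page closes the document
    return documents


pages = [("PG0", "Invoice", "First_Page"), ("PG1", "Invoice", "Middle_Page"),
         ("PG2", "Invoice", "Last_Page"), ("PG3", None, None),
         ("PG4", "Receipt", "First_Page")]
print(assemble(pages, merge_unknown=True))
```

With the switch ON, page PG3 ends up in the Invoice document; with it OFF, the same page forms a separate “Unknown” document.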

Dependency

The plugin assumes that page processing for the incoming batch has been completed properly. It then merges the pages classified by the page processing module and creates the documents.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

 

S no. Error message Possible root cause
1 Invalid format of page level fields. Doc Field Type found for {Document Assembler Classification Type} classification is null.
  • Page level fields weren’t present on the batch.
  • Page processing module didn’t work properly.

 

2 Document Type name is not found in the data base for the page type name Barcode decoded value is not found as document type in the Ephesoft Application database.
3 No Document type defined for batch instance Batch class doesn’t have document type for classification.
4 Invalid integer for barcode confidence score in properties file. Invalid value for “DA Barcode confidence” at Ephesoft Admin Screen Configuration.

Docushare Export Plugin

Overview

This plugin is used for exporting a zipped file for a batch. It transforms the batch xml into another xml format accepted by the Docushare CMS and zips it, along with the multipage pdf, into the Docushare export folder location.

Steps of execution

    • This plugin works in the export phase of the application, when all the processing on the batch has been done and it is ready to be exported.
    • The plugin makes use of a predefined xml to convert the batch xml file into a Docushare supported format, and names the new xml file according to the user specified value.
    • It then groups the pdf file associated with the batch.
    • After all the work is done, it makes a zip file of all the content and names the file according to the user specified value.
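
The final zipping step can be sketched as follows. This is an assumption-laden illustration using Python's standard `zipfile` module: the function name, the idea of prefixing the zip name with a batch identifier, and the folder check are hypothetical and are not taken from the plugin's code.

```python
import zipfile
from pathlib import Path


def zip_export(batch_id, xml_path, pdf_path, export_dir,
               zip_suffix="_docushare.zip"):
    """Bundle the transformed xml and the multipage pdf into one zip file."""
    export_dir = Path(export_dir)
    if not export_dir.is_dir():
        # Corresponds to an export folder that is missing or not a directory.
        raise NotADirectoryError(f"Export folder is invalid: {export_dir}")
    zip_path = export_dir / f"{batch_id}{zip_suffix}"  # hypothetical naming
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in (Path(xml_path), Path(pdf_path)):
            # Store each file under its bare name, without parent folders.
            archive.write(path, arcname=path.name)
    return zip_path
```

Calling `zip_export("BI1", "BI1_docushare.xml", "BI1.pdf", "docushare-export-folder")` would, under these assumptions, produce `BI1_docushare.zip` in the export folder.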

Configuration

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-docushare-export/dcma-docushare-export.properties

The following configurable properties are available for the plugin:

 

Configurable property
Type of value
Value options
Description
docushare.final_export_folder String docushare-export-folder (default folder is <Shared folder directory>\ SharedFolders \DOCUSHARE-export-folder) The folder to which the zipped file is exported after transformation into the desired format.
docushare.final_xml_name String _docushare.xml The name of the batch xml file that is finally created.
docushare.zip_file_name String _docushare.zip The name of the zip file that is finally created.
docushare.switch List of values
  • OFF
  • ON

This property determines whether the plugin will run or not.

Dependency

The plugin assumes that extraction for the incoming batch has been completed properly; it only transforms the results in the provided batch.xml into the desired format.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

 

S no. Error message Possible root cause
1. Problem in zipping directory Export folder name is invalid, i.e. either

  • not present
  • is not a directory

 

2. Could not find xsl file Xsl file is not present in classpath resource
3. Problem occurred in transforming
  • Xsl file is not present
  • Problem in transforming

 

Fuzzy Db Extraction Plugin

Overview

Fuzzy DB plugin is used to extract the document level fields of a document from records in the database on the basis of the matched value of the HOCR content or the previously extracted value of a document level field. This plug-in involves creation of search engine based indexing and extracting document level field value based on fuzzy match of HOCR content against index. User can configure any Vendor database in order to capture Vendor name, Vendor ID or any other field from the incoming invoices. This can be done simply by mapping the document to the Vendor database table and the index fields of the document to the columns in the database table. The plugin will find the matching vendor from the database and update the fields in the document.

Configuration

Configurable properties

Following are the configurable properties available for the Fuzzy Db plugin:

 

Configurable property
Type of value
Value options
Description
Minimum Word Length Integer N-A The minimum word length below which words will be ignored from the HOCR content.
Minimum Term Frequency Integer N-A The frequency below which terms will be ignored in the source document.
Minimum Doc Frequency Integer N-A The minimum number of documents in which a word must occur; words that occur in fewer documents are ignored.
Maximum Query Terms Integer N-A The maximum number of query terms that will be included in any generated query.
Database Password String A valid password value to connect to database The password for connecting to the user SQL account.
Database User Name String A valid username value to connect to database The username for connecting to the user SQL account.
Database Driver List of values
  • net.sourceforge.jtds.jdbc.Driver
  • com.microsoft.jdbc.sqlserver.SQLServerDriver
  • com.mysql.jdbc.Driver

The database driver to be used. This is DBMS specific.
Database Connection URL String A valid database connection URL. The database connection URL required for the connection. This is DBMS specific.
Minimum Confidence Threshold Integer N-A Minimum threshold value required for a Fuzzy Db row to be selected for Fuzzy Extraction.
Date Format String N-A Date format to be used for identifying the date field
No Of Pages Integer N-A Maximum Number of pages to be included while querying for the content
Option To Include Pages List of values
  • ALLPAGES
  • FIRSTPAGE

Determines whether all the pages or only the first page of the document will be used for fetching the HOCR content.
FuzzyDB Extraction switch List of values
  • ON
  • OFF

Determines whether fuzzy extraction runs or not.
Query Delimiters String N-A Delimiters to be used while using the fuzzy text search in the validation phase.
Ignore Words List Multi select
  • Name
  • Title

List of words to be ignored from the HOCR content.
Fuzzy Extraction Search Columns based on Fields String N-A This property defines the names of the Document Level Fields (DLFs) on which the user wants to search. For example, for “$City, $State” the values of the “City” and “State” DLFs are queried in the learnt indexes and the appropriate row of the database table is returned. The DLFs of the concerned document are populated accordingly.
Fuzzy Extraction HOCR Switch List of values
  • ON
  • OFF

This property defines whether to continue searching the complete HOCR content when no value is found for the fields specified in “Fuzzy Extraction Search Columns based on Fields”. ON signifies that the search continues with the HOCR content. OFF signifies that the values extracted by the previous extraction plugins are retained.

Steps for configuring the plugin

  • User can select the batch class module and navigate to fuzzy DB plugin configuration page as shown below:

BatchClassManagement FuzzyDBPlugin scrollup.jpg

BatchClassManagement FuzzyDBPlugin scrolldown.jpg

The User can edit the above settings by clicking on “Edit” in order to connect to the vendor database.

  • User can map the document type to a database table by clicking on “Mapping” as shown below:
    • The document type can be mapped to a database table (having data records to be indexed) from the list of tables provided.

BatchClassManagement FuzzyDB DatabaseMapping.jpg

    • The document level fields can be mapped to table columns for extraction.

BatchClassManagement FuzzyDB DatabaseMapping TableMapping.jpg

  • Once the mapping is defined, the user can click on “Learn DB” to create indexes of all the records present in the database.
    • Lucene indexing is generated against all database records belonging to all document types which have been mapped for current batch class. Only mapped columns are indexed.
    • Indexes are built on a string which is the combined text of all the fields mapped to various columns of the database table.
    • Separate index directories are created to store indexes per document type per batch class. The hierarchy used for storing index files against each document level field is: <Shared-Folder-Path>\<Batch-Class>\fuzzydb-index\<Database-Name>\<Table-Name>.

Steps of execution

  • The plugin uses the HOCR content of a document to generate a query comprising keywords selected on the basis of their occurrence in the document. It then compares the HOCR based query against the indexes on the DB table rows.
  • Lucene returns the matching records, among which the record with the highest confidence score is selected. If the score is greater than the threshold, the corresponding values are stored as document level field values in the batch xml file.
  • The following cases can occur during execution of the plugin:

 

“FuzzyDB Extraction switch” Value “Fuzzy Extraction Search Column” Value “Fuzzy Extraction HOCR Switch” Value Result
OFF N.A. N.A. No Fuzzy Extraction.
ON <Empty> N.A. Usual Fuzzy Extraction using HOCR content.
ON “$City,$State” OFF Takes the values of the “City” and “State” document level fields extracted by previous extraction plugins and searches for them in the learned Lucene content. If matching data is found, it is used; otherwise the data from the previous extraction remains.
ON “$City,$State” ON Takes the values of the “City” and “State” document level fields extracted by previous extraction plugins and searches for them in the learned Lucene content. If matching data is found, it is used; otherwise the usual Fuzzy Extraction using HOCR content is done.
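
The four cases in the table can be summarised as a small decision function. This is only a sketch of the table's logic; the function name and the "fields" / "hocr" labels are hypothetical and do not correspond to identifiers in the plugin.

```python
def choose_strategy(extraction_switch, search_columns, hocr_switch):
    """Return the ordered list of lookups implied by the three settings."""
    if extraction_switch != "ON":
        return []            # no fuzzy extraction at all
    if not search_columns.strip():
        return ["hocr"]      # usual fuzzy extraction using HOCR content
    steps = ["fields"]       # first query the index with the DLF values
    if hocr_switch == "ON":
        steps.append("hocr") # fall back to the HOCR content if nothing is found
    return steps


for row in [("OFF", "", "OFF"), ("ON", "", "OFF"),
            ("ON", "$City,$State", "OFF"), ("ON", "$City,$State", "ON")]:
    print(row, "->", choose_strategy(*row))
```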

Dependency

  • The Lucene engine is used instead of SQL queries for matching the words in the html file, as it provides an edge in terms of speed and efficiency. A SQL query would be too slow, and furthermore Lucene provides results even if the OCR is not perfect on every character of a word.
  • It is possible that the query might not give any results. In such cases, no document level field is updated.
  • It is possible that the query might give multiple results. In such cases, the entry with the highest confidence score is used to populate the document level fields.
  • The plugin does not involve manual intervention and is an automated step.

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

 

S no. Error message Possible root cause
1 CorruptIndexException while reading Index The lucene indexes are either locked or corrupted.
2 The base fuzzy db index folder does not exist. So cannot extract database fields. Fuzzy database has not been learned yet.

HTML TO XML

Overview

Prior to the 3030 release, the HTML TO XML generation plugin created the HOCR xml file from the HTML file created by the RECOSTAR_HOCR/TESSERACT plugin, and the HOCR files were generated by a thread pool executor. From the 3030 release onwards, the RECOSTAR_HOCR/TESSERACT plugins directly generate the HOCR xml file corresponding to each image file, so this plugin is now obsolete.

Other plugins use this HOCR xml file to read the image data.

Configuration

  • Property File:{Ephesoft-install-dir}/WEB-INF/classes/META-INF/dcma-core/dcma-core.properties/*
  • Property:thread.pool_size=5

 

Configurable property
Type of value
Value options
Description
thread.pool_size String Positive integer value The thread pool size, stored as a string. This property governs how many files are processed simultaneously.

Dependencies

One of the following two plugins must be ON to generate HOCR xml files:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

Classify Images Plugin

Overview

This plugin is responsible for classifying Ephesoft documents using an image comparison algorithm based on ImageMagick.

The plugin works in two stages to classify a document:

  • Learning: The learning process generates indexes for the documents. The generated indexes are used for classifying documents. For further information on learning, please refer to the document “Learning document”.
  • Classification: While classifying a document with the Classify Images plugin, the learnt data is used as reference data. The plugin superimposes the input image on the learnt images and generates a confidence value on that basis.

Configuration

Configurable Properties

The following configurable properties are available for the plugin:

 

Configurable property Type of value Value options Description
Classify Image Switch String
  • ON
  • OFF

 

This property is used to switch the plugin ON or OFF. Default value is ON.
Classify Image Max Result Integer NA The maximum number of classification results for an input image that are stored in the batch.xml.
Classify Image Comparison Metric String Ex: RMSE The metric used for comparing the learnt images with the input images provided for classification.
Classify Image Fuzz Percentage Integer NA The fuzz percentage (fuzz distance) used by ImageMagick while comparing images for classification.

This is shown in the screen shot given below:

BatchClassManagement ClassifyImagesPlugin.jpg

Steps of execution

  • This plugin works in the page process phase of the application, when all the import processing on the batch has been done and it is ready for page processing.
  • Learning should be done on the batch class before using this plugin.
  • The plugin classifies the input images using image based classification via ImageMagick.
  • After all the work is done, it writes the information for the document type being classified into the batch.xml file.
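
To make the role of the comparison metric, fuzz percentage and max result properties more concrete, here is a hedged sketch. The `compare` tool with its `-metric` and `-fuzz` options is standard ImageMagick, but the exact command the plugin runs, the function names and the sample scores below are assumptions made for illustration.

```python
def compare_command(input_image, learnt_image, metric="RMSE", fuzz_percent=10):
    """Build an ImageMagick 'compare' command line (illustrative only)."""
    # 'null:' discards the difference image; only the metric is of interest.
    return ["compare", "-metric", metric, "-fuzz", f"{fuzz_percent}%",
            input_image, learnt_image, "null:"]


def best_matches(distances, max_results):
    """Keep the learnt samples with the smallest distance to the input image."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return ranked[:max_results]


print(" ".join(compare_command("page1_thumb.tif", "Invoice_First_Page.tif")))
# Hypothetical distances: a lower value means a closer match.
print(best_matches({"Invoice_First_Page": 0.08, "Receipt_First_Page": 0.31,
                    "Invoice_Last_Page": 0.22}, max_results=2))
```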

Dependency

This plugin is part of the page processing module and runs after successful completion of the import module.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

 

S no. Error message Possible root cause
1 Exception while executing the compare command. Configurable parameter has invalid values.
2 Learning not done for batch class sample folder path. Learning is not done for the batch class.

Import batch folder plugin

Overview

The Import Batch Folder plugin copies the batch files from <Ephesoft shared folder>\<batch class UNC folder> to the <Ephesoft local folder>. It creates a batch instance folder named BI<Batch Instance identifier> in <Ephesoft local folder> and copies the batch instance files to that folder.

Only files with valid extensions will be moved to the UNC folder.

BatchClassManagement ImportBatchFolderPlugin.jpg

Configuration

Properties File

Properties file location: <Ephesoft installation path>\Application\WEB-INF\classes\META-INF\dcma-import-folder\dcma-import-folder.properties.

Properties Description:

 

Configurable property Type of value Value options Description
import.invalid_char_list String N-A List of characters ignored for file names, separated by semicolons.

Configurable properties

Following are the configurable properties available with the plugin:

 

Configurable property Type of value Value options Description
Folder importer valid extensions String Defines a list of supported file extensions. Multiple values will be “;” separated. Default value “tif”.
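
A minimal sketch of how a “;” separated extension list could be applied to incoming file names is shown below. The function names are hypothetical, and the case-insensitive comparison is an assumption rather than documented plugin behaviour.

```python
from pathlib import Path


def parse_extensions(value):
    """Turn 'tif;pdf' into {'tif', 'pdf'}."""
    return {ext.strip().lower().lstrip(".")
            for ext in value.split(";") if ext.strip()}


def filter_valid(file_names, valid_extensions="tif"):
    """Keep only the files whose extension is in the configured list."""
    allowed = parse_extensions(valid_extensions)
    # Assumption: extensions are compared without regard to case.
    return [name for name in file_names
            if Path(name).suffix.lower().lstrip(".") in allowed]


print(filter_valid(["scan1.tif", "scan2.PDF", "notes.txt"], "tif;pdf"))
```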

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

 

S no. Error message Possible root cause
1 Invalid characters present in folder name. If the folder or file name contains an invalid character, the batch will go into error.
2 Could not find valid Extensions properties in the property file. “Folder Importer Valid Extensions” property doesn’t exist for the plug-in.

Import multipage files plugin

Overview

The Import Multipage Files plugin is required when running a batch on multipage images. This plugin breaks multipage pdfs and tiffs into multiple single page tiffs. Multipage pdfs are converted to single page tiffs using Ghostscript, whereas multipage tiffs are converted to single page tiffs using ImageMagick.

BatchClassManagement ImportMultipageFilesPlugin.jpg

 

Configuration

UI Configuration

IMPORT_MULTIPAGE_FILES properties can be edited at the following admin UI:

 

Configurable property Type of value Value options Description
IM Convert Input Image Parameters String N-A Input parameters for imagemagick command that should be used for multipage tiff to multiple single page tiffs conversion.
Multi Page Import List of values
  • YES
  • NO

 

Switch for multipage files import plugin. If set to NO, multipage files (pdf and tiff) will not be converted to multiple single page tiffs.
IM Convert Output Image Parameters String N-A Output parameters for imagemagick command that should be used for multipage tiff to multiple single page tiffs conversion.
Ghostscript Image Parameters: String N-A Parameters for ghostscript command that should be used for multipage pdf to multiple single page tiffs conversion.

Property File Configuration

Property File location: <Ephesoft-Installation-Path>\ Application\WEB-INF\classes\META-INF\dcma-import-folder\dcma-import-folder.properties\*

 

Configurable property Type of value Value options Description
import.folder_ignore_char_list String N-A Semicolon separated list of characters that are to be replaced in the file names encountered by the plugin.
import.ignore_replace_char String N-A The character that replaces the characters listed in “import.folder_ignore_char_list” in the file names encountered by the plugin.
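
The interplay of the two properties above can be sketched as a simple replacement pass over a file name. The function name and the sample values are hypothetical; the plugin's own handling may differ in detail.

```python
def sanitize_name(name, ignore_char_list, replace_char):
    """Replace every character from the ';' separated list with replace_char."""
    for ch in ignore_char_list.split(";"):
        if ch:  # skip empty entries such as a trailing ';'
            name = name.replace(ch, replace_char)
    return name


# Hypothetical property values: '#', '(' and ')' are replaced with '_'.
print(sanitize_name("inv#01 (copy).tif", "#;(;)", "_"))
```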

Optimization parameters and results

“-sDEVICE” parameter

  • -sDEVICE=tiff12nc

Produces 12-bit RGB output

  •  -sDEVICE=tiff24nc

Produces 24-bit RGB output

  •  -sDEVICE=tiff48nc

Produces 48-bit RGB output

  •  -sDEVICE=tiff32nc

Produces 32-bit CMYK output

  • -sDEVICE=tiff64nc

Produces 64-bit CMYK output

  • -sDEVICE=tiffscaled24 -sCompression=lzw

Produces a 24-bit RGB image and allows a compression option to be specified along with it, which reduces the size of the image.

  • -sDEVICE=tifflzw

Produces black-and-white output and can be combined with various compression options.

  • The following are the results for images produced by splitting a PDF with the given specifications under different Ghostscript parameters:

Results

  • PDF Size: 514Kb
  • Number of pages in PDF: 26

Note: PDF contained mixture of colored and B/W images

 

-sDEVICE Type of output Size per image produced(in KB) Total images size(in MB)
tiff12nc Same type of images 12,241 325
tiff24nc Same type of images 25,446 626
tiff48nc Same type of images 51,148 1258
tiffscaled24 -sCompression=lzw Same type of images 250-400 6.75
tifflzw All images converted to B/W 50-90 1.4
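
For orientation, the sketch below assembles a Ghostscript command line that splits a PDF into single page tiffs with one of the devices compared above. The options used (`-dNOPAUSE`, `-dBATCH`, `-sDEVICE`, `-sCompression`, `-sOutputFile`) are standard Ghostscript switches, but the function name, the file names and the way the plugin combines them with “Ghostscript Image Parameters” are assumptions.

```python
def ghostscript_command(pdf_path, output_pattern,
                        device="tiffscaled24", compression="lzw"):
    """Build a Ghostscript command that writes one tiff per PDF page."""
    cmd = ["gs", "-dNOPAUSE", "-dBATCH", f"-sDEVICE={device}"]
    if compression:
        cmd.append(f"-sCompression={compression}")
    # A %d style pattern in the output file name yields one file per page.
    cmd.append(f"-sOutputFile={output_pattern}")
    cmd.append(pdf_path)
    return cmd


print(" ".join(ghostscript_command("batch.pdf", "batch-%04d.tif")))
```

Going by the results table, tiffscaled24 with lzw compression keeps colour while producing far smaller files than the uncompressed tiff12nc, tiff24nc and tiff48nc devices.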

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

 

S no. Error message Possible root cause
1 Invalid property file configuration The following properties in the “<Ephesoft-Installation-Path>\ Application\WEB-INF\classes\META-INF\dcma-import-folder\ dcma-import-folder.properties” file are empty:

  • import.folder_ignore_char_list
  • import.ignore_replace_char

 

2 Converted Tiff files count not equal to the TIFF pages count. The number of pages in the PDF/Multipage Tiff is not equal to the number of converted tiff files.

Key Value Learning Plugin

Overview

This plugin is used to generate Advanced KV pairs that make data extraction more accurate, based on data previously extracted manually by the user. It keeps track of the data that the user extracts manually by populating DLFs directly from the 3rd panel image. Based on this, it generates advanced KV pairs using regular expressions defined in property files and saves them for the corresponding document types. The plugin is configured using an ON/OFF switch in the admin UI and the property files ‘dcma-key-regex.properties’, ‘dcma-key-value-location.properties’ and ‘dcma-value-regex.properties’ defined in META-INF.

This plugin iterates over each document level field of each document. First, it matches the value of the document level field against the regex patterns defined in the properties file. The regular expression that matches best becomes the value pattern for that field. The document level field value is then searched for in the OCR data (HOCR file) for that page of the document.

If the value is found, the plugin searches for the key around it, in all eight directions, and tries to match it against the regex patterns defined in the properties file. The best matching regular expression becomes the key pattern. If the key is found to the left of the value (i.e., the value lies to the right of the key), the location is set as RIGHT. If nothing is present to the left, the plugin searches the top, right, bottom and other locations in turn, matches what it finds against the regex patterns in the properties file to get the key pattern, and sets the location accordingly.

Note: Location is set here for processing purpose only. This location has no link with the ‘Location’ field displayed in Advanced KV pairs. Location field value will always be empty for generated advanced KV pairs.

  • If the value does not match any of the regex patterns, the value itself is set as the key pattern of this field.
  • The application searches the key locations in the order below, which can be configured as a semicolon separated list in the property files. As soon as it finds the first value, it takes that location:
  • LEFT
  • RIGHT
  • TOP
  • BOTTOM
  • TOP_RIGHT
  • TOP_LEFT
  • BOTTOM_RIGHT
  • BOTTOM_LEFT
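
The ordered search over locations can be illustrated with a short sketch. It assumes, purely for illustration, that the text found at each location around the value is already available as a dictionary; the function name and the input shape are hypothetical.

```python
DEFAULT_ORDER = ("LEFT;RIGHT;TOP;BOTTOM;TOP_RIGHT;TOP_LEFT;"
                 "BOTTOM_RIGHT;BOTTOM_LEFT")


def find_key(candidates, location_order=DEFAULT_ORDER):
    """Return (location, text) for the first location that holds any text."""
    for location in (loc.strip() for loc in location_order.split(";")):
        if location and candidates.get(location):
            return location, candidates[location]
    return None  # nothing found around the value


# Hypothetical words found around a value on the page.
print(find_key({"TOP": "Total", "RIGHT": "USD"}))
```

Because the search stops at the first hit, changing the order in the property file changes which neighbouring word is taken as the key.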

Multi word support for KV Learning

The Key Value Learning plugin in the Export module automatically creates a Key Value field corresponding to a document level field.

This enhancement allows multiple words to be used for key pattern generation in the Key Value Learning plugin. If any word is found close to the key, it is appended to the key and used for key pattern generation.

Note:

Keys will be appended left for location LEFT, BOTTOM, TOP, BOTTOM_LEFT and TOP_LEFT, and appended right for location BOTTOM_RIGHT and TOP_RIGHT.

Configuration

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-key-value-location.properties

 

Configurable property Type of value Value options Description
key_value.location_order String LEFT;RIGHT;TOP;TOP_LEFT;TOP_RIGHT; A semicolon separated list of locations. It represents the order of the locations in which the key is searched for in the image. The locations specified are those of the key with respect to the value.
key_value.max_number_record Integer NA The maximum number of key value pairs that can be present for any DLF. If a DLF already has this maximum number of key value fields defined, the plugin will not add any more key value pairs to it. Default value is 50.
key_value.tolerance_threshold Integer
  • A
  • B
  • C

 

Length and width of the value rectangle created by the plugin will be increased by this tolerance value (width + (width*tolerance)/100). For example, if calculated width of plugin is 100 pixels and tolerance specified is 10, resultant width will be 110 pixels.
key_value.multiplier Integer Integer value This property holds an integer value which decides on <some logic>. (Also mention range if applicable)
key_value.fetch_value String
  • FIRST
  • LAST
  • ALL

 

Fetch value for key value field that is being created by the plugin. Default Value supplied is FIRST.
key_value.min_key_char_count Integer NA Minimum number of characters that must be present in the extracted key. Default value is 4.
key_value.gap_between_keys Integer NA Any word found left or right (depending on the location of Key found with respect to Value) will be considered for key depending on its distance with respect to the key. Default value is 50. See below example.

Example:

Consider image contains following data:

Invoice Date: 05/02/2012 Invoice Number: 99888888

Following is the location order specified in property file:

LEFT; RIGHT; BOTTOM_LEFT; BOTTOM_RIGHT; TOP; BOTTOM; TOP_RIGHT;

If 99888888 is the value for the Invoice Number document level field, “Number” will be extracted first as the key. The algorithm then searches to the left of “Number”. If the gap between “Invoice” and “Number” is less than the value specified for key_value.gap_between_keys, “Invoice Number” will be used for key pattern generation; otherwise only “Number” will be considered.
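The key-appending rule described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the function name build_key, the (text, left, right) word tuples and the pixel coordinates are all assumptions made for the example. The location groups follow the note earlier in this section; locations that the note does not mention (for example RIGHT) leave the key unchanged in this sketch.

```python
# Hypothetical sketch of the multi-word key rule; names and data layout are assumed.
APPEND_LEFT = {"LEFT", "BOTTOM", "TOP", "BOTTOM_LEFT", "TOP_LEFT"}
APPEND_RIGHT = {"BOTTOM_RIGHT", "TOP_RIGHT"}


def build_key(words, key_index, location, gap_between_keys=50):
    """Return the key text, extended by one neighbouring word if it is close enough.

    words is a list of (text, left_x, right_x) tuples for one text line,
    sorted from left to right; key_index points at the word found as key.
    """
    text, left, right = words[key_index]
    if location in APPEND_LEFT and key_index > 0:
        neighbour, _, neighbour_right = words[key_index - 1]
        # Gap is measured from the neighbour's right edge to the key's left edge.
        if left - neighbour_right < gap_between_keys:
            return f"{neighbour} {text}"
    elif location in APPEND_RIGHT and key_index + 1 < len(words):
        neighbour, neighbour_left, _ = words[key_index + 1]
        if neighbour_left - right < gap_between_keys:
            return f"{text} {neighbour}"
    return text


line = [("Invoice", 300, 360), ("Number", 380, 440), ("99888888", 470, 560)]
print(build_key(line, 1, "LEFT"))                       # gap of 20 is below 50
print(build_key(line, 1, "LEFT", gap_between_keys=10))  # gap of 20 is not below 10
```

With the default gap of 50 the sketch yields the two-word key, and with a gap of 10 it falls back to the single word, mirroring the Invoice Number example.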

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-key-regex.properties

This property file contains regular expressions that can be used for key pattern generation.

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-key-value-learning/dcma-value-regex.properties

This property file contains regular expressions that can be used for value pattern generation.

UI Configuration

Key value learning can be turned ON/OFF from the following UI:

BatchClassManagement KeyValueLearningPlugin.jpg

 

Configurable property Type of value Value options Description
Key Value Learning Switch List
  • ON
  • OFF

 

Set it to ON/OFF depending on whether plugin needs to be executed or not.

Dependencies

Key value learning plugin depends on following two plugins:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

One of the above plugins must be ON for key value learning, as these plugins extract data from the image and create the HOCR file that Key Value learning requires.

Frequently Asked Questions

Question: Key value field not added to the document level field after plugin execution.

Answer: There could be multiple reasons why a key value field is not created after plugin execution:

Reason 1: The maximum number of key value fields has already been added to the document level field.

Solution: Check the value of key_value.max_number_record. The default value is 50.

Reason 2: The key found during extraction has fewer characters than the minimum number of characters required for a key.

Solution: Check the key_value.min_key_char_count property. Its default value is 4.

Reason 3: The key value location order property is not defined.

Solution: Check the value of the key_value.location_order property. It should have the required locations specified.

Question: Key value field added but is not accurate.

Reason: One possible reason is that the location order specified does not match the requirement.

Solution: Check the key_value.location_order property. The most probable location of the key with respect to the value should be specified first in the list.
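The three reasons listed under the first question can be turned into a quick checklist. The sketch below is only a troubleshooting aid written for this guide; the function name and its arguments are hypothetical and the plugin exposes no such API. The defaults of 50 and 4 are the documented defaults of key_value.max_number_record and key_value.min_key_char_count.

```python
def reasons_key_value_not_added(existing_kv_count, key_text, location_order,
                                max_number_record=50, min_key_char_count=4):
    """Return the documented reasons that would stop a key value field being added."""
    reasons = []
    if existing_kv_count >= max_number_record:
        reasons.append("maximum number of key value fields already defined")
    if len(key_text) < min_key_char_count:
        reasons.append("key has fewer characters than key_value.min_key_char_count")
    # location_order is the raw semicolon-separated property value.
    locations = [loc.strip() for loc in location_order.split(";") if loc.strip()]
    if not locations:
        reasons.append("key_value.location_order is not defined")
    return reasons


print(reasons_key_value_not_added(50, "No", ""))
```

An empty list means none of the three documented conditions applies, so the cause lies elsewhere (for example, the HOCR plugins being switched off).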

 

Key Value Extraction plugin

Overview

The ‘key-value pair’ based extraction plug-in extracts document level index field values based on the location of a ‘value’ relative to a specified key. There are two modes for KV extraction: Simple and Advanced KV Extraction.

Plugin working

Input and output parameters

Input

  • Document Pages and corresponding HOCR
  • Document level fields
  • Plug-in Configuration
    • Key (Regular Expression)
    • Value (Regular Expression)
    • Location (left, right, top, bottom, top left, top right, bottom left, bottom right)

Output

Document level fields, values and alternative values updated in batch.xml.

Steps of execution

Plug-in execution for a batch instance will consist of following steps:

  • The extraction plug-in iterates over all documents belonging to a batch instance and, for every document, fetches the list of document level index fields based on its ‘document type’. Every document level field can be associated with multiple instances of extraction filters.
  • Pages (the HOCR corresponding to each page) belonging to that document are parsed to generate an in-memory matrix holding every word and its coordinates for each page (the exact structure of the matrix is left to detailed design). The matrix improves performance: it is generated once for all pages of a document and reused for key/value pattern matching for all document level fields belonging to that document.
  • For every document level field, the regular expression for ‘KEY’ is searched in the page level in-memory matrix created in the previous step.
  • If the regular expression search for ‘KEY’ returns one or more matched words, the regular expression for ‘VALUE’ is evaluated against the words located at the specified location relative to the key (as governed by the LOCATION attribute of the extraction filter or by the value zone created on the advanced KV screen). This is done for every occurrence of KEY in every page level matrix.
  • Zero or more matches found for the VALUE regular expression are used to update batch.xml as the document level field value (and alternate values).
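The steps above can be sketched for a single page and a single location (value to the right of the key). This is a simplified illustration under stated assumptions, not the plugin's implementation: the word list stands in for the in-memory matrix, the (text, x, y) layout, the line_tolerance parameter and the function name are all hypothetical, and Python's re module is used in place of the regex engine the product uses.

```python
import re


def kv_extract_right(words, key_pattern, value_pattern, line_tolerance=5):
    """Find values to the right of every key occurrence on one page.

    words is a list of (text, x, y) tuples, one per HOCR word.
    Returns (value, alternate_values).
    """
    key_re = re.compile(key_pattern)
    value_re = re.compile(value_pattern)
    matches = []
    for key_text, key_x, key_y in words:
        if not key_re.search(key_text):
            continue
        # Candidate words: roughly the same line as the key and to its right.
        candidates = [w for w in words
                      if abs(w[2] - key_y) <= line_tolerance and w[1] > key_x]
        for text, _, _ in sorted(candidates, key=lambda w: w[1]):
            if value_re.fullmatch(text):
                matches.append(text)
    value = matches[0] if matches else None
    return value, matches[1:]


page = [("Invoice", 10, 100), ("Date:", 80, 100), ("01/01/2012", 150, 100),
        ("Due", 10, 200), ("02/02/2012", 150, 200)]
print(kv_extract_right(page, r"Date", r"[0-9]{2}/[0-9]{2}/[0-9]{2,4}"))
```

The first match plays the role of the field value and any further matches play the role of alternate values; the date on the second line is ignored because no key precedes it on that line.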

Simple KV Extraction

Here both ‘key’ and ‘value’ are regular expressions.

Each key value field consists of following attributes in simple KV extraction:

  • Key Pattern: Regular expression pattern for the key.
  • Value Pattern: Regular expression pattern for the value.
  • Location: Specifies the location of value with respect to key. Possible values are left, right, top, bottom, top left, top right, bottom left, bottom right.
  • No of words: Specifies number of words that will be extracted to the right of value that is extracted by the value regular expression.

Example: Suppose there is document level field Date, and image contains following data:

Date: 01/01/2012

While defining the simple key value field for Date,

  • Date should be entered as key pattern.
  • [0-9]{2}/[0-9]{2}/[0-9]{2,4} should be entered as value pattern.
  • Location should be entered as right.
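Before entering patterns in the UI, it can help to check them against a sample line. The snippet below does this for the Date example using Python's re module, which accepts the same syntax for these two patterns; it is only a sanity check and does not reproduce the plugin's location handling.

```python
import re

KEY_PATTERN = r"Date"
VALUE_PATTERN = r"[0-9]{2}/[0-9]{2}/[0-9]{2,4}"

sample = "Date: 01/01/2012"

key = re.search(KEY_PATTERN, sample)
# Location "right": only look at the text after the end of the key.
value = re.search(VALUE_PATTERN, sample[key.end():]) if key else None
extracted = value.group(0) if value else None
print(extracted)
```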

Advanced KV Extraction

Admin user can also define KV pair patterns using rectangular coordinates from Admin UI. Admin is provided with ‘Advanced Add’ and ‘Advanced Edit’ buttons to define and modify the KV patterns.
As soon as the user clicks either of these buttons, another UI opens with text boxes and labels for the following options:
  • Key Pattern (regex or other pre-defined field)
  • Value Pattern (regex)
  • Multiplier (0 to 1; multiplied with confidence score value to calculate new confidence score)
  • Fetch Value (First, Last or All)
  • Page Value (First, Last or All)
  • Length of the rectangle (in pixels)
  • Width of the rectangle (in pixels)
  • x-offset (in pixels)
  • y-offset (in pixels)

Of the above properties, the Key Pattern, the Value Pattern, the Multiplier (0 to 1), the Fetch Value and the Page Value are defined by the user, whereas the length of the rectangle, width of the rectangle, x-offset and y-offset are auto-generated.
Capture Key and Capture Value buttons are also provided to define the relative key and value coordinates respectively.
Page Value: User can specify following page value while defining advanced key value pair:

  • ALL: KV Extraction will be performed on all pages of the document.
  • FIRST: KV Extraction will be performed on first page of the document.
  • LAST: KV Extraction will be performed on last page of the document.

Fetch Value: User can specify following fetch value while defining advanced key value pair:

  • First: to extract only first data from the value zone matching the value pattern specified.
  • Last: to extract only last data from the value zone matching the value pattern specified.
  • All: to extract all data from the value zone matching the value pattern specified.
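The effect of the three Fetch Value options can be sketched as below. The function name and the idea of passing the value zone as a plain string are assumptions made for illustration; the plugin works on HOCR words inside the value rectangle.

```python
import re


def fetch_values(zone_text, value_pattern, fetch_value="FIRST"):
    """Return the matches kept for the given Fetch Value option."""
    # Assumes value_pattern has no capturing groups, so findall returns full matches.
    found = re.findall(value_pattern, zone_text)
    mode = fetch_value.upper()
    if mode == "FIRST":
        return found[:1]
    if mode == "LAST":
        return found[-1:]
    if mode == "ALL":
        return found
    raise ValueError(f"Unknown fetch value: {fetch_value}")


zone = "12 apples 34 pears 56"
print(fetch_values(zone, r"\d+", "FIRST"))
print(fetch_values(zone, r"\d+", "LAST"))
print(fetch_values(zone, r"\d+", "ALL"))
```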

Capturing Key and Value

Using the Browse button, the user can upload an image on which the coordinates of the key and value are defined.

Table KVExtraction.jpg

The overlay for the key and value is captured using the “Capture Key” and “Capture Value” buttons.

On the basis of the relative key and value coordinates, the document level field is extracted by the KV extraction plugin.

Anchor Key Value

This functionality is added as an enhancement to existing advanced KV extraction. It aims to utilize the result of previously extracted document level fields for extraction of other document level fields. User can use previously defined field as a key while defining advanced key value field for some other document level field.

  • There is a “Use Existing Field For Key” checkbox present on advanced KV extraction UI.

UseExistingFieldForKey.jpg

  • On checking this, a list will be populated with the names of document level fields that can be used as a key.

DLFUsedAsKey.jpg

User can select any of those fields as key.

Note: Only those document level fields will be shown in drop down whose field order number is less than the field order number of the field for which key value pair is being defined.

  • While defining the advanced key value pair for the document level field, user needs to capture key and value rectangles.
  • If “Use Existing Field For Key” check box is selected, value of the field selected as key should be captured. This is required to calculate the X-Offset and Y-Offset for the KV field.

Example: Suppose there are two document level fields State and City, and image contains following data:

State: CALIFORNIA

City: LA

While defining the advanced key value field for City,

  • Use existing field for key should be checked.
  • State should be selected from the drop down for key pattern.
  • CALIFORNIA should be captured as key.
  • LA should be captured as a value.

Editing Overlays in Advanced KV Extraction:

The key and value overlays can also be edited on the Advanced KV Extraction screen.

Once the key has been captured using the Capture Key button, the Edit Key button gets enabled. Similarly, once the value has been captured using the Capture Value button, the Edit Value button gets enabled.

Once “Edit Key” or “Edit Value” has been clicked, all the other options become disabled on the screen.

While editing an overlay for key or value, only one side of the rectangle is free for editing at a time; any of the four sides can be edited. To edit a side, the user clicks closest to that side, within the area bounded by the parallel lines formed by extending its two adjacent sides.

The following snapshots explain a use case where a user intends to edit the right hand side of the overlay formed for the key:

The following snapshot shows a captured Key and Value pair:

KVExtraction CapturedKey.jpg

To edit the Key overlay the user will click on the “Edit Key” button and the screen will appear as shown in the following snapshot:

KVExtraction EditKey.jpg

User can now click on any side of the key to adjust its size. Similarly, value rectangle zone can be adjusted.

 

Configuration

The following are the configurable properties for KV extraction:

 

Configurable property Type of value Value options Description
Regex Confidence Score String 0 to 100 Regex confidence score for key value extraction
KV Extraction switch Multi select
  • ON
  • OFF

 

KV extraction switch

KVExtractionSwitch.jpg

Simple KV Extraction

Admin can configure the simple KV extraction rule by clicking Add or Edit from following UI:

KeyValueFieldsListingAddOrEdit.jpg

The following are the configurable properties for simple KV extraction:

 

Configurable property Type of value Value options Description
Key Pattern String NA Regular expression pattern for the key
Value Pattern String NA Regular expression pattern for the value
Location List left, right, top, bottom, top left, top right, bottom left, bottom right Specifies the location of value with respect to key.
No of words Integer Integer value Specifies the number of words that will be extracted to the right of the value extracted by the value regular expression.

As soon as the Add or Edit button is clicked, the following screen is shown, where the user enters values for the different fields.

KVExtractionConfiguration.jpg

  • Key Pattern: Enter regular expression in text box.
  • Value Pattern: Enter regular expression in text box.
  • Location: Select location from drop down.
  • No of words: Enter an integer value in the text box. (Default value is 0)

Advanced KV Extraction

The following are the configurable properties for advanced KV extraction:

 

Configurable property Type of value Value options Description
Use existing Field For Key checkbox NA Enable to use the value of another defined field as key
Key Pattern String NA Regular expression pattern for the key
Value Pattern String NA Regular expression pattern for the value
Multiplier Decimal 0 to 1 Non-mandatory field that can have values between 0 and 1. Its value is multiplied with the confidence score value to calculate the new confidence score during extraction.
Fetch Value String ALL, FIRST, LAST Drop down with following possible values: ALL, FIRST, LAST. Default value: FIRST
Page Value String ALL, FIRST, LAST Drop down with following possible values: ALL, FIRST, LAST. Default value: FIRST

Advanced KV extraction field is configurable from following UI:

KVExtractionConfigurationFromUI.jpg

To capture key and value, draw a rectangle on the image using the right mouse button. The overlay will be drawn on the UI. After drawing the rectangle, the user needs to click the Capture Key/Capture Value button.

Please note that the user needs to capture the key before capturing the value. If the user attempts to capture the value before capturing the key, the following message will be displayed:

Key not finalized. Finalize key first.

As soon as user captures both key and value, following fields will be populated automatically:

  • Length
  • Width
  • X-Offset
  • Y-Offset

Dependencies

Either one of the following must be on for KV extraction:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

The above plugins generate the HOCR content for an image, which KV extraction then uses to extract values.

FAQs

Question: Data not extracted or incorrect data extracted for the field for which existing field is being used as key.

Answer: Check for value extracted for the field which is used as a key for this field. If incorrect value is extracted, correct the key value pair defined for that field. This can be tested via Test Adv. KV button on the Advanced KV screen.

 

Recostar Extraction plugin

Overview

The Recostar extraction plugin by default is a part of the extraction module. This plugin extracts the data for the document level fields for the particular document classified in the document assembler plugin.

Using this plugin, document level fields are populated by reading the XML file that the Recostar tool generates from the RSP project file.

The RSP file has the following format:

RSPFileFormat.jpg

  • User should map the document level fields in the RSP file at the place marked with an oval in the above screenshot.
  • User can find further details for creating the RSP file for extraction in “Recostar Design Studio and Fixed Form documentation”.

Steps of execution

    • This plug-in works in the extraction processing phase of the application, after document classification on the batch has been completed.
    • This plugin extracts the document level field data from the image using the Recostar tool.
    • This plugin uses the RSP file present at <Ephesoft Shared Folder>\{Batch Class}\recostar-extraction\*.rsp; otherwise the *.rsp file present in the {Application}\native\RecostarPlugin\bin folder is used.
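The fallback between the two RSP locations can be sketched as follows. This is an illustration of the lookup order only: the function name and its arguments are hypothetical, and the sketch assumes the project file name (for example Fpr.rsp) is already known.

```python
from pathlib import Path


def resolve_rsp_file(shared_folder, batch_class, application_home, file_name):
    """Prefer the batch class copy of the RSP file, else fall back to the bundled one."""
    batch_class_copy = (Path(shared_folder) / batch_class
                        / "recostar-extraction" / file_name)
    if batch_class_copy.is_file():
        return batch_class_copy
    # Fallback: the copy shipped in the application's RecostarPlugin bin folder.
    return Path(application_home) / "native" / "RecostarPlugin" / "bin" / file_name
```

If extraction appears to ignore a customised project file, checking which of the two paths actually exists is a reasonable first step.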

Configuration

Configurable Properties

Following are the configurable properties available for the Recostar Extraction plugin:

 

Configurable property Type of value Value options Description
Recostar color switch List of values
  • ON
  • OFF

 

If color switch is ON then PNG file will be used for OCRing.
Recostar Auto Rotate switch List of values
  • ON
  • OFF

 

This property is used to auto-rotate the input images on the basis of the orientation provided by Recostar.
Recostar Extraction Switch List of values
  • ON
  • OFF

 

This switch is used to turn this plugin ON/OFF.

Dependency

Apart from the above mentioned properties, there is a major configuration associated with this plugin. Recostar extracts values depending on the project file being used. Hence the project file is the important file for this plugin.

Since the project file maps document level fields with appropriate values (or patterns or barcodes), for extraction, it is purely document type specific. Hence instead of specifying the project file name at the plugin level, one needs to specify the project file name for each document type.

This mapping of each document type to its project file is provided under BatchClassList>>BatchClass>>DocumentTypes on the Batch Class Management screen. Any “.rsp” file in the “recostar-extraction” folder of the batch class folder in the shared folders appears in the dropdown, and one can select the appropriate project file (.rsp file) in the ‘Form Processing Project File’ property (see below):

DocumentType.jpg

This plugin only requires an image as an input (which is a PNG if color switch is ON and a TIFF if color switch is OFF). Hence one would require one of the plugins from: ‘Create OCR Input Plugin’/ ‘Create Display Image Plugin’ to run before it.

However, it is important that the document tag has been created in the batch.xml and also the document type has been selected appropriately for the batch. Hence, one should ideally place this plugin after page processing and document classification plugins are done with their processing and the manual Review stage has been crossed.

Dependency on shared folders

The batch class folder inside the main shared folder contains a folder by the name: ‘recostar-extraction’

This folder contains the project files from which a user can map the document type (for Recostar extraction).

Troubleshooting

S no. Error message Possible root cause
1. Invalid License. So could not be verified.
  • Network connection failure.
  • Recostar command is not valid.
  • License is not installed or invalid.
  • Tomcat server is not started.

 

2. Problem in verifying License Unable to connect with Ephesoft license server or some error occurred at Ephesoft license server side.
3. Unable to load Fpr.rsp file RSP file used for processing is invalid.
4. Exception while reading from XML Unable to process batch xml file or batch xml is invalid.
5. Image Processing or XML updating failed Unable to update batch xml.
6. File has invalid extension File processed by recostar has invalid extension.
7. Document Type could not be found for Page Invalid document being used for processing.
8. Unable to parse Orientation tag in Recostar xml file. Recostar xml file has invalid value for Orientation tag.
9. Unable to rotate the file:according to the values specified in its xml Recostar xml file has invalid value for rotation.

Recostar HOCR Plugin

Overview

The Recostar HOCR plugin is by default a part of the page processing module of the Ephesoft application. This plugin uses Recostar for generating HOCR files. It reads the image files listed in the batch xml of a batch instance and generates an HOCR file for each one of them.

Barcode values can be decoded with this plugin using the barcode enabled project file.

Steps of execution

  • This plug-in works in the page processing phase of the application, after all the import processing on the batch has been done.
  • This plugin extracts the contents of the image using the Recostar tool.
  • This plugin uses the RSP file present at <Ephesoft Shared Folder>\{Batch Class}\recostar-extraction\*.rsp; otherwise the *.rsp file present in the {Application}\native\RecostarPlugin\bin folder is used.
  • If the barcode switch is ON, the RSP file should be barcode enabled.

Configuration

Configurable Properties

Following is the list of configurable properties for the plugin:

BatchClassManagement RecostarHOCRPlugin.jpg

 

Configurable property Type of value Value options Description
Recostar Project File Name List of values
  • Fpr.rsp
  • Fpr_MutliLanguage.rsp

 

This option specifies the name of the project file used for OCRing.
Recostar color switch List of values
  • ON
  • OFF

 

If color switch is ON then PNG file will be used for OCRing.
Recostar Auto Rotate switch List of values
  • ON
  • OFF

 

This property is used to auto-rotate the input images on the basis of the orientation provided by Recostar.
Recostar Switch List of values
  • ON
  • OFF

 

This switch is used to turn this plugin ON/OFF.
Barcode Switch List of values
  • ON
  • OFF

 

This property is used to read the barcode from the input images using the barcode enabled recostar project file e.g. “FPR_Barcode.rsp”
Recostar Valid Extensions List of values
  • tif
  • gif
  • png

 

Recostar allows the above three formats for OCRing. One can configure the allowable image formats for OCRing in this plugin.

Dependency

This plugin only requires an image as an input (which is a PNG if color switch is ON and a TIFF if color switch is OFF). Hence one would require one of the plugins from: ‘Create OCR Input Plugin’/ ‘Create Display Image Plugin’ to run before it.

Dependency on shared folders

The batch class folder inside the main shared folder contains a folder by the name: recostar-extraction. This folder contains the “Recostar Project file” specified by the first property. If the selected file does not exist there, the default file with the selected name present inside Recostar will be used for Recostar OCRing.

 

Troubleshooting

Following are a few common error messages received when OCRing malfunctions:

 

S no. Error message Possible root cause
1. Invalid License. So could not be verified.
  • Network connection failure.
  • Recostar command is not valid.
  • License is not installed or invalid.
  • Tomcat server is not started.

 

2. Problem in verifying License Unable to connect with Ephesoft license server or some error occurred at Ephesoft license server side.
3. Unable to load Fpr.rsp file RSP file used for processing is invalid.
4. Exception while reading from XML Unable to process batch xml file or batch xml is invalid.
5. No valid extensions are specified in resources No valid extension is selected.
6. Image Processing or XML updating failed Unable to update batch xml.
7. File has invalid extension File processed by recostar has invalid extension.
8. Unable to parse Orientation tag in Recostar xml file. Recostar xml file has invalid value for Orientation tag.
9. Unable to rotate the file:according to the values specified in its xml Recostar xml file has invalid value for rotation.

Regular Regex Extraction Plugin

Overview

This plug-in extracts a document level field’s value according to the given regex pattern. The user can give the pattern as a set of values separated by semicolons. While extracting data, the plugin splits the pattern on semicolons and treats the last part as the regex pattern. It first matches the last part; if a matching value is found, all the other parts are searched for to the left of that value, going from right to left. The last part is compared as a regex pattern, while the remaining parts are compared as words. The value is extracted only when all the parts are found; if even one part is not found, the value is not extracted.

Example

Consider following value is specified for the pattern field of a document level field:

Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}

The plugin will use the last value in the semicolon-separated list, i.e., \d{1,2}[/]\d{1,2}[/]\d{2,4}, for value extraction.

Consider following data is supplied as input data, i.e., present in an image:

Case 1: Input Data: Invoice Date 21/03/2012

Result: This will extract 21/03/2012 successfully, as both Date and Invoice are found to the left of the extracted value 21/03/2012.

Case 2: Input Data: Date 21/03/2012

Result: Regex pattern will be matched in this case but data won’t be extracted as Invoice is not found to the left of Date.
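The matching rule and the two cases above can be reproduced with a short sketch. It is an approximation written for illustration, not the plugin's code: the function name is hypothetical, the page is reduced to a flat list of words in reading order, the words to the left are not required to be adjacent, and a regex that itself contains a semicolon is not handled.

```python
import re


def regular_regex_extract(pattern_field, words):
    """Return the first value whose preceding words satisfy the pattern field."""
    *labels, value_regex = [part.strip() for part in pattern_field.split(";")]
    value_re = re.compile(value_regex)
    for index, word in enumerate(words):
        if not value_re.fullmatch(word):
            continue
        boundary = index
        # Walk the word parts from right to left, each time looking
        # only to the left of the previously found position.
        for label in reversed(labels):
            left = words[:boundary]
            if label not in left:
                break
            boundary = len(left) - 1 - left[::-1].index(label)
        else:
            return word
    return None


PATTERN = r"Invoice;Date;\d{1,2}[/]\d{1,2}[/]\d{2,4}"
print(regular_regex_extract(PATTERN, "Invoice Date 21/03/2012".split()))  # Case 1
print(regular_regex_extract(PATTERN, "Date 21/03/2012".split()))          # Case 2
```

Case 1 returns the date, while Case 2 returns None because “Invoice” is missing to the left of “Date”.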

Configuration

Plugin Configurations

Regular regex extraction can be configured at following UI:

BatchClassManagement RegularRegexExtractionPlugin.jpg

Properties description:

 

Configurable property Type of value Value options Description
Regular Regex Extraction Switch String
  • ON
  • OFF

 

Indicates whether the plug-in has to run or not. Default ON.
Regular Regex Confidence Score Integer 0 – 100 Acts as a multiplier for the confidence score calculated by matching regex.

To add/edit the regular expression required for Regular Regex Extraction, the user needs to Add/Edit the corresponding document level field at the following UI:

AddOrEditDLF.jpg

Upon Adding/Editing the document level field, following screen will be presented where regular expression can be entered in Pattern field:
AddOrEditDLF PatternField.jpg

Troubleshooting

Following are a few common error messages seen when the plugin malfunctions:

 

S no. Error message Possible root cause
1 Invalid input pattern sequence. This occurs if the entered regex pattern is not a valid pattern or is not of proper format.
2 No FieldType data found from data base for document type This happens when there are no field types initialized in a document.

Scripting Plugin

Overview

This plug-in reads the batch’s batch.xml file and works upon the given document as per the scripts given in the scripts folder. All the scripts are placed inside “<Ephesoft shared folder>\batch class folder (ex:-BC1)\scripts”.

A script can be written in one of two ways: “IScript” or “JDOM”. To run any type of script, the user needs to place it inside the “scripts” folder of the batch class on which the script is to be run.

Configuration

Configurable properties

Following are the configurable properties available for the Scripting plugin in the dcma-scripting-plugin.properties file in META-INF/dcma-scripting-plugin:

 

Configurable property Type of value Value options Description
Script Parser Type String
  • jdom
  • iscript

 

This value defines the type of scripts that will run. Two types of scripts can be run: JDOM and ISCRIPT. For a script to run in JDOM, the user has to give the parser type as “jdom”. For a script to run in ISCRIPT, the user has to give the value as “iscript”. Default jdom.
Script Switch String
  • ON
  • OFF

 

This switch turns the execution of scripts on or off. If the switch is off, no script will run; otherwise scripts will run. Default ON.

This is shown in the screen shot given below:

BatchClassManagement ScriptingPluginConfiguration.jpg

Steps for configuring the plugin

  • User can set the script switch to on to run the scripts, or to off to skip their execution.
  • If the script switch is on, the parser type given in the “Script Parser Type” property defines the type of the scripts.
  • If the parser type is jdom, the JDOM scripts will run, and any script written for ISCRIPT will give errors, and vice versa.

Steps of execution

  • Configure the plugin switch in the META-INF/dcma-scripting-plugin/dcma-scripting-plugin.properties file. Also specify the parser type for the scripts to run.
  • Place the desired script in the scripts folder of the batch class in which the script is to run. Predefined scripts are present in the scripts folder of each batch class; these are dummy scripts.
  • Scripts are picked up by name, and the names follow a set format. The name of a custom script therefore needs to be the same as that of a predefined script in the scripts folder. To run a custom script, the user either modifies the existing script or creates a custom script with the same name as a predefined script and replaces the existing one.

Dependency

There is only one dependency of this plug-in. The “import-batch-folder” plug-in needs to be executed before “scripting-plugin” to generate the files required for processing of “scripting-plugin”. If the batch goes into “Error” state then proper logs will be generated in log file kept at {Application}\dcma-all.log.

NOTE: There are some scripts placed in the “scripts” folder which are required for the system.

 

Troubleshooting

Following are a few common error messages seen when the plugin malfunctions:

 

S no. Error message Possible root cause
1 Script having invalid parser type or invalid arguments. Throwing workflow in error This occurs if the configured parser type and the type of the scripts present do not match. For example, this error occurs if the parser given is jdom and the user puts in an iscript script.
2 Script error out. Throwing workflow in error. This happens when the custom script that has been put in place errors out and needs to be corrected.

Search Classification Plugin

Overview

This plugin is responsible for classifying Ephesoft documents using lucene based indexing for a batch class.

This plugin works in two stages to classify documents:

  • Learning: Learning is done by generating indexes for documents. The generated indexes are used for classifying documents. For further information on learning, please refer to the document “Learning document”.
  • Classification: While classifying a document, the search classification plugin uses the learnt data as reference data. The plugin takes the HOCR content extracted from the image and compares it against the data learnt in the previous stage.

For this plugin, HOCR content must be generated by an HOCR generation plugin such as “Recostar HOCR” or “Tesseract HOCR”.

Configuration

Configurable Properties

Following is the list of configurable properties for the plugin:

 

Configurable property Type of value Value options Description
Lucene Valid Extensions: String Ex: html, xml The valid extensions of the input files used for classifying the document type. Default html, xml.
Lucene Min Term Frequency Integer NA The frequency below which terms will be ignored in the source document.
Lucene Min Document Frequency Integer NA Words that do not occur in at least this many documents will be ignored.
Lucene Min Word Length Integer NA The minimum word length below which words will be ignored from the HOCR content.
Lucene Min Query Terms Integer NA The minimum number of query terms that will be included in any generated query.
Lucene Top Level Field String NA This property is used to configure default field for query terms.
Lucene No Of Pages Integer NA This property is used to specify the number of documents to be returned in a query search.
Lucene Index Fields String Ex: summary This property is used as index field for searching document type using lucene.
Lucene Stop Words String Ex: name; title The words to be ignored while classifying a document.
Search Classification Switch String
  • ON
  • OFF

 

This property is used to turn the search classification plugin ON/OFF. Default ON.
Search Classification Max Results Integer NA The maximum number of results that will be generated from a query.
First Page Confidence Score Value Integer NA This property is used for updating confidence score on the basis of the first page type.
Middle Page Confidence Score Value Integer NA This property is used for updating confidence score on the basis of the middle page type.
Last Page Confidence Score Value Integer NA This property is used for updating confidence score on the basis of the last page type.

This is shown in the screen shot given below:

BatchClassManagement SearchClassificationPlugin.jpg
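
To make the filtering properties above more concrete, the following is a minimal sketch of how a minimum word length, a minimum term frequency and a stop word list could narrow down the terms taken from HOCR content. It is not the plugin's or Lucene's own code. The function name candidate_terms is hypothetical, and the “;” separator for stop words is an assumption based on the example value “name; title”.

```python
from collections import Counter


def candidate_terms(words, min_word_length, min_term_frequency, stop_words):
    """Return the terms that survive the three filters, sorted alphabetically.

    stop_words is a ';'-separated string (assumed format, e.g. "name; title").
    """
    ignored = {w.strip().lower() for w in stop_words.split(";") if w.strip()}
    counts = Counter(w.lower() for w in words)
    return sorted(
        term
        for term, count in counts.items()
        if len(term) >= min_word_length      # words shorter than this are ignored
        and count >= min_term_frequency      # terms rarer than this are ignored
        and term not in ignored              # stop words are ignored
    )


words = "invoice total invoice name of invoice total".split()
print(candidate_terms(words, 3, 2, "name; title"))
```

Raising the minimum term frequency or adding stop words shrinks the list of terms, which in turn changes what the classification query is built from.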

Steps of execution

  • This plug-in works in the page process phase of the application, when all the import processing on the batch has been done and it is ready for page processing.
  • Learning should be done on the batch class before using this plugin.
  • The plug-in classifies the input images via Lucene based indexing.
  • After all the work is done, it writes the information for the document type being classified into the batch.xml file.

Dependency

This plugin depends on an HOCR generation plugin such as Recostar or Tesseract. It takes the HOCR file generated by that plugin as its input.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | No index files exist inside folder | Learning has not been done for the batch class.
2 | Page Types not configured in Database. | Invalid indexes are present in the index data for the batch class.
3 | CorruptIndexException while reading Index. | Index data in the index folder for the batch class is corrupt.
4 | IOException while reading Index | Index data cannot be opened because the index file is corrupt or locked.
5 | No valid extensions are specified in resources | Page contains an invalid HOCR file for processing.
6 | No pages found in batch XML. | Pages tag not found in the input batch.xml.

Table Extraction Plugin

Overview

This plug-in extracts tabular data from the documents in a batch.

Table extraction can be performed using any one, or a combination (AND/OR), of the following extraction techniques:

  • Column Header Validation
  • Column Coordinates Validation
  • Regex Validation

The user can select which of the above extraction techniques are used for table extraction.

The user needs to specify a start pattern and an end pattern for the table. Data between the start and end patterns is treated as table data. If no data matches the specified start pattern, nothing is extracted.
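
The role of the start and end patterns can be sketched as follows. This is an illustration only, not the plugin's implementation. The function name table_lines is hypothetical, and two behaviours are assumptions that the text above does not settle: the lines matching the patterns are excluded from the table data, and a missing end pattern lets the table run to the end of the input.

```python
import re


def table_lines(lines, start_pattern, end_pattern):
    """Return the lines between the first start match and the next end match."""
    start = re.compile(start_pattern)
    end = re.compile(end_pattern)
    collected = []
    inside = False
    for line in lines:
        if not inside:
            # Nothing is collected until the start pattern has matched.
            if start.search(line):
                inside = True
            continue
        if end.search(line):
            break
        collected.append(line)
    return collected


page = ["Invoice 42", "Item Qty Price", "Pen 2 3.00", "Ink 1 7.50", "Total 13.50"]
print(table_lines(page, r"^Item\b", r"^Total\b"))
```

If the start pattern never matches, the function returns an empty list, which mirrors the statement that no data is extracted in that case.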

Characteristics

  • Every document has one or more pages, and the algorithm extracts all tables present in the document.
  • The document is parsed from the first page to the last page to identify tables.
  • One table may span one or more pages.
  • The user provides start and end patterns, which decide the data to be considered for table extraction.
  • Based on the table extraction API specified, extraction is done using one or more of the following extraction methods:
    • Column Coordinates Validation
    • Column Header Validation
    • Regex Validation

Column Header Based Extraction

To extract data using a column header, the admin needs to define the Column Header Pattern parameter for the table column.

The plugin first searches for data matching the column header regex pattern specified by the admin. If a match is found, all the data below that column header is extracted for that particular column.

 

Column Coordinates Based Extraction

This extraction method extracts data based on the column coordinates specified by the admin. Data below the column coordinates is extracted for that column.

For this type of extraction, the start and end coordinates of the column need to be specified. Data between the start and end coordinates is extracted for that column.

Regex Based Extraction

In the case of regex validation, data is extracted on the basis of the regex patterns defined for the column, i.e. Column Pattern, Between Right Pattern and Between Left Pattern. Data is extracted only between the table start and end patterns.

  • Column Pattern: Data matching this column pattern is extracted for that column.
  • Between Right Pattern: Data extracted by the column pattern must have data to its right that matches this between right pattern. The pattern specified must be a single word capturing pattern only.
  • Between Left Pattern: Data extracted by the column pattern must have data to its immediate left that matches this between left pattern. The pattern specified must be a single word capturing pattern only.

Note

  • If a between right or between left pattern is specified but does not match the data to the immediate right or left, the data is extracted as invalid data.
  • Only single word capturing patterns are allowed for the between left and between right patterns.
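
The interplay of the three patterns can be sketched as below. This is a minimal illustration under stated assumptions, not the plugin's code: a table row is modelled as a list of words, the function name extract_column is hypothetical, and "invalid data" is represented by a False flag next to the extracted value.

```python
import re


def extract_column(words, column_pattern, between_left=None, between_right=None):
    """Return (value, is_valid) for the first word matching column_pattern.

    The neighbouring words are checked against the optional between left and
    between right patterns. A failed neighbour check still returns the value,
    but flags it as invalid.
    """
    for i, word in enumerate(words):
        if not re.fullmatch(column_pattern, word):
            continue
        left_ok = between_left is None or (
            i > 0 and re.fullmatch(between_left, words[i - 1]) is not None
        )
        right_ok = between_right is None or (
            i + 1 < len(words) and re.fullmatch(between_right, words[i + 1]) is not None
        )
        return word, left_ok and right_ok
    return None, False


row = ["Qty:", "12", "pcs"]
print(extract_column(row, r"\d+", between_left=r"Qty:", between_right=r"pcs"))
print(extract_column(row, r"\d+", between_right=r"kg"))
```

Because each neighbour is a single word, the between patterns in this sketch can only ever match one word, which is consistent with the single word restriction noted above.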

Configuration

Table Configuration

Add/Edit/Delete Table Info

The user can add, edit or delete table information by clicking the corresponding buttons on the following UI:

TableInfoListing AddOrEditOrDelete.jpg

On clicking the Add/Edit button, the following UI is presented, where the user can enter values for any property:

TableInfoConfiguration.jpg

Configurable Properties

The following are the configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Name | String | | Name of the data table involved.
Start Pattern | String | A valid expression | A keyword or expression marking the beginning of the table. A correct start pattern must be specified for table data to be extracted. It can be validated using the check button.
End Pattern | String | A valid expression | A keyword or expression marking the end of the table. It can be validated using the check button.
Table Extraction API | | | Decides which automatic extraction API or APIs are to be used.

Table Column Configuration

Add/Edit/Delete Table Column Info

Table column information can be added, updated or deleted by clicking the corresponding button on the following UI:

TablecolumnInfo AddOrEditOrDelete.jpg

  • On clicking the Add/Edit button, the following UI is presented, where the user can add or edit table column fields:

TablecolumnInfoConfiguration.jpg

 

Configurable Properties

The following are the configurable properties for the plugin:

Configurable property | Type of value | Value options | Description
Column Name | String | NA | The name of the column.
Column Pattern | Regular Expression | Valid regular expression | The regex pattern for the column data.
Between Left | Regular Expression | Valid regular expression | The regex pattern used to validate the data to the left of the actual search column.
Between Right | Regular Expression | Valid regular expression | The regex pattern used to validate the data to the right of the actual search column.
Column Header Pattern | Regular Expression | Valid regular expression | Header pattern for the column.
Start Coordinate | Integer | NA | Start coordinate for the column.
End Coordinate | Integer | NA | End coordinate for the column.
Required | Radio button | True, False | If the radio button is checked, each extracted table row must contain valid data for that column. If invalid data is extracted for the column, the corresponding row is not added to the table data.
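
The effect of the Required property can be sketched as a simple row filter. This is an illustration only. The function name filter_required and the representation of a row as a mapping from column name to a (value, is_valid) pair are assumptions made for the example.

```python
def filter_required(rows, required_columns):
    """Keep only the rows whose required columns all hold valid data.

    Each row maps a column name to a (value, is_valid) pair. A missing
    column is treated the same way as invalid data.
    """
    kept = []
    for row in rows:
        if all(row.get(col, (None, False))[1] for col in required_columns):
            kept.append(row)
    return kept


rows = [
    {"Qty": ("2", True), "Price": ("3.00", True)},
    {"Qty": ("x", False), "Price": ("7.50", True)},
]
print(filter_required(rows, ["Qty"]))
```

With no required columns, every row is kept, including rows that carry invalid data in some column.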

Column Header Based Extraction

Enter the column header regex pattern on the following UI:

[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Column Info]>>Edit
ColumnHeaderBasedExtraction.jpg

A configurable property for table extraction using column headers is available in:

[Ephesoft-home]\WEB-INF\classes\META-INF\dcma-table-finder\*

tablefinder.gap_between_column_words=40

This value should be specified in pixels. In addition to the words below the column header, words to the left or right are also extracted for the column if the gap between them and the extracted data is less than the value specified for gap_between_column_words.
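
The gap rule can be sketched as follows. This is a hedged illustration rather than the plugin's algorithm: the Word type, the function name expand_by_gap and the idea of growing outwards from one "seed" word found under the header are assumptions. Only the 40 pixel threshold and the "less than" comparison come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: int  # left edge in pixels
    x1: int  # right edge in pixels


def expand_by_gap(row, seed_index, max_gap=40):
    """Grow outwards from the seed word while the gap to the neighbour is < max_gap.

    row must be sorted from left to right.
    """
    lo = hi = seed_index
    while lo > 0 and row[lo].x0 - row[lo - 1].x1 < max_gap:
        lo -= 1
    while hi + 1 < len(row) and row[hi + 1].x0 - row[hi].x1 < max_gap:
        hi += 1
    return " ".join(w.text for w in row[lo:hi + 1])


row = [
    Word("Total", 0, 50),
    Word("Blue", 200, 240),
    Word("ink", 250, 280),
    Word("pen", 290, 320),
    Word("7.50", 500, 540),
]
print(expand_by_gap(row, 2))
```

A larger value merges more neighbouring words into the column, so a value that is too high can pull in data from adjacent columns.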

Column Coordinates Based Extraction

The admin can set the column coordinates by clicking the Set Coordinates button on the following UI:

[Batch Class List]>>[Batch Class]>>[Document Type]>>[Table Info]>>[Table Column Info]

ColumnCoordinatesBasedExtraction.jpg
On clicking the Set Coordinates button, a new UI opens where the user can select an image and set the column coordinates by drawing a zone on the image.
SetCoordinatesUI.jpg

  • The user needs to draw a rectangle to select the start and end column coordinates for the selected column.
  • To select coordinates for another column, select that column from the drop down list on the left hand side. This drop down contains the names of all the table columns of the selected table.
  • Clear Button: On clicking the Clear button, the coordinates for the selected table column are cleared.
  • Clear All Button: On clicking the Clear All button, the coordinates for all the table columns of the selected table are cleared.

Regex Based Extraction

For regex based extraction, the user needs to enter valid regex patterns for the table and its columns. The table must have valid start and end patterns, whereas the column pattern, between left pattern and between right pattern need to be specified for each table column.

Select table extraction technique to be used

Select any of the three extraction techniques, combined with AND/OR, as shown below:

[Batch Class List]>>Edit Batch Class>>Edit Document Type>>Edit Table
TableExtractionTechnique.jpg

Dependencies

The table extraction plugin has the following dependencies:

  • RECOSTAR_HOCR
  • TESSERACT_HOCR

One of the above plugins must be ON, as these plugins extract data from the image and create the HOCR file that is required for table extraction.

Troubleshooting

The following are a few common areas for troubleshooting the table extraction plugin:

S no. | Error message | Possible root cause
1 | Table info list is null or empty. | No table is configured for the document type.
2 | TableColumnsInfo list is null or empty. | No table column is defined for the table.
3 | Invalid input pattern sequence. | Patterns defined for table extraction are not valid.
4 | Skipping Table extraction. Switch set as off. | Table extraction switch is set to OFF.

Tesseract HOCR Plugin

Overview

By default, the Tesseract HOCR plugin is a part of page processing.

This plugin reads the image files listed in the batch.xml of a batch, generates an HOCR file for each of them and updates the batch.xml.

Configuration

Configurable Properties

The following are the properties of the Tesseract HOCR plugin that can be configured from the UI:

BatchClassManagement TesseractHOCRPlugin.jpg

 

Configurable property | Type of value | Value options | Description
Tesseract Switch | List of values | ON, OFF | Turns this plugin ON or OFF. If this switch is OFF, the plugin does nothing.
Tesseract color switch | List of values | ON, OFF | Tesseract is unable to read colored TIFFs. When the color switch is ON, the plugin sends PNGs for OCRing instead. Switching the color switch ON is therefore helpful for batch classes where colored TIFF images are expected.
Tesseract Language | String | NA | The language to be used for OCRing. At present Tesseract supports only a single language per image file. E.g.: specify ‘eng’ for English, ‘tur’ for Turkish etc.
Tesseract Version | String | NA | The Tesseract version installed on the system. E.g.: specify ‘tesseract_version_3’ for Tesseract 3.0, ‘tesseract_version_2’ for Tesseract 2.0 etc.
Tesseract Valid Extensions | Multi-select | tif, gif, png | The valid extensions for the input image files.

Steps of execution

  • This plug-in works in the page process phase of the application, when all the import processing on the batch has been done and it is ready for page processing.
  • The plug-in performs OCR on all the input images.
  • After all the work is done, it writes the name of each HOCR file into the batch.xml and generates the HOCR output in the form of html and HOCR.xml.

Dependency

This plugin only requires an image as input: a PNG if the color switch is ON and a TIFF if the color switch is OFF. Hence one of the ‘Create OCR Input Plugin’ or ‘Create Display Image Plugin’ plugins must run before it.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Tesseract Base path not configured. | The environment variable for Tesseract is either not set or the path is configured incorrectly.
2 | Space found in the name of image: xyz.png. So it cannot be processed | The image name contains spaces. Remove the spaces from the image name and restart the batch from the page process module.
3 | No valid extensions are specified in resources | Valid extensions for the input image files are not specified.
4 | Image Processing or XML updating failed for image: xyz | The input image file has an extension other than those specified in the property ‘Tesseract Valid Extensions’.
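
Several of the root causes above can be checked before a batch is started. The following is a minimal pre-flight sketch that an administrator might run over image names. It is not the plugin's own validation code, and the function name check_ocr_input is hypothetical. The returned texts simply reuse the error messages from the table so that the mapping is easy to follow.

```python
import os


def check_ocr_input(file_name, valid_extensions):
    """Return the matching error message from the table, or None if the name looks fine."""
    if not valid_extensions:
        return "No valid extensions are specified in resources"
    if " " in file_name:
        return f"Space found in the name of image: {file_name}. So it cannot be processed"
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    if extension not in {e.lower() for e in valid_extensions}:
        return f"Image Processing or XML updating failed for image: {file_name}"
    return None


for name in ["page1.tif", "page 2.png", "page3.jpg"]:
    print(name, "->", check_ocr_input(name, ["tif", "gif", "png"]))
```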

Create Display Image Plugin

Overview

This plugin creates the display png files for the images being processed. It takes all the images and creates a png file for each, to be shown on the UI. It uses ImageMagick to convert the files to png. These png files are also used for OCRing when the color switch is “ON”.

 

Configuration

Steps for configuring the plugin

  • User can select the page process and navigate to create display image plugin configuration page as shown below:

PDCreateDisplayImagePlugin.jpg

These are the configurations required for creating display images. The properties are not editable, as the files created using them are required by subsequent plugins.

Configurable Properties

This plugin has no configurable properties either on UI or in META-INF.

 

Steps of execution

  • The plug-in uses the file extension type given in the plugin properties.
  • While executing, ImageMagick parameters are used to generate the display png thumbnail files and, if the color switch is ON, the tif files used for comparing and OCRing.
  • These files are then copied to the batch instance folder and their respective entries are made in the batch.xml file.

Troubleshooting

The following are a few common error messages seen when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | No valid extensions are specified in resources. | There are corrupt values in the database for the extension types given in the configuration.
2 | Problem generating thumbnails. Setting batch Status to error state. | ImageMagick encountered an issue while converting files to the desired file types.

Create OCR input Plugin

Overview

This plugin generates PNG files corresponding to the input files, which may be tiff files or multipage tiff files. It uses ImageMagick to convert the files to PNG. The PNG files are used for further processing and OCRing.

 

Configuration

Steps for configuring the plugin

  • User can select Page Process module and navigate to Create OCR input plugin configuration page as shown below:

PDCreateOCRinputPlugin.jpg

The user cannot edit the above settings by clicking on “Edit”.

Configurable Properties

This plugin has no configurable properties either on UI or in META-INF.

Steps of execution

  • The plug-in uses tiff files as input.
  • While executing, ImageMagick parameters are used to generate the OCR display PNG thumbnail files and the tiff files used for comparing and OCRing.
  • These files are then copied to the batch instance folder and their respective entries are made in the batch.xml file.

Dependency

The plugin assumes the incoming batch has been imported properly and batch.xml is created successfully.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Problem in generating PNG files. | Some error occurred in generating PNG files.
2 | Improper Folder Specified folder name-> | Batch instance folder name is incorrect or does not exist. Make sure that the shared folder path is specified correctly.
3 | Problem generating list of files | Batch instance folder name or path is incorrect.
4 | command cannot be run | ImageMagick is not working or the ImageMagick configuration is not correct.

CSV File Creation Plugin

Overview

This plugin enables users to export the extracted metadata of a batch in CSV format. It captures the extracted document level fields (such as “subpeona”) for each output document, along with some batch specific fields such as the date of processing and the document type name. If the “CSV Creation Switch” is ON, a csv file is created for the batch and exported to the location configured by the “CSV Creation Final Export Folder” property.

 

Configuration

Steps for configuring the plugin

  • User can select the Export module and navigate to CSV File Creation plugin configuration page as shown below:

PDCSVFileCreationPlugin.jpg

Users can edit the above settings by clicking on “Edit” in order to change the settings as per their requirements.

Configurable Properties

The following are the configurable properties available for the CSV File Creation plugin:

Configurable property | Type of value | Value options | Description
CSV Creation Final Export Folder | String | Path to export folder. Ex: C:\ephesoft-data\csv-export-folder | Folder to which the created csv file will be exported.
CSV Creation Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default: OFF.

 

Steps of execution

  • This plug-in works in the export phase of the application, when all the processing on the batch has been done and it is ready to be exported.
  • The plugin creates a csv file if the plugin switch is “ON”; otherwise no csv file is created.
  • The plugin uses the batch.xml file to create the csv file for the batch. It exports the document level fields present in the batch.xml to the csv file.
  • After creation, the file is copied to the configured “CSV Creation Final Export Folder”.
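
The conversion described in the steps above can be sketched as follows. This is a minimal illustration, not the plugin's implementation. The element names (Document, Identifier, Type, DocumentLevelField, Name, Value), the absence of an XML namespace and the column layout of the csv are all assumptions made for the example, and the function name batch_xml_to_csv is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def batch_xml_to_csv(batch_xml):
    """Return csv text with one row per document level field (assumed layout)."""
    root = ET.fromstring(batch_xml)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Document", "Type", "Field", "Value"])
    for doc in root.iter("Document"):
        doc_id = doc.findtext("Identifier", default="")
        doc_type = doc.findtext("Type", default="")
        for field in doc.iter("DocumentLevelField"):
            writer.writerow([
                doc_id,
                doc_type,
                field.findtext("Name", default=""),
                field.findtext("Value", default=""),
            ])
    return out.getvalue()


sample = (
    "<Batch><Documents><Document>"
    "<Identifier>DOC1</Identifier><Type>Invoice</Type>"
    "<DocumentLevelFields><DocumentLevelField>"
    "<Name>InvoiceNo</Name><Value>42</Value>"
    "</DocumentLevelField></DocumentLevelFields>"
    "</Document></Documents></Batch>"
)
print(batch_xml_to_csv(sample))
```

Using the csv module rather than joining strings by hand keeps values that contain commas or quotes correctly escaped.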

Dependency

The plugin assumes that extraction for the incoming batch has been done properly. It only converts the results in the provided batch.xml into the desired format.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | CSV File Creation Export Folder value is null/empty from the database. Invalid initializing of properties. | The configured path is not valid.
2 | Batch Document List is null or empty. | There are no documents in the batch.

 

Filebound Export Plugin

Overview

This plugin is used for uploading data to the Filebound content management solution. It transforms the batch xml into a document file and exports it to the configured repository. Multipage tiff and multipage pdf files can be uploaded to the Filebound content management solution.

 

Configuration

Steps for configuring the plugin

  • User can select the export module and navigate to Filebound export plugin configuration page as shown below:

PDFileboundExportPlugin.jpg

Users can edit the above settings by clicking on “Edit” in order to change the settings as per their requirements.

Configurable Properties

The following are the configurable properties available for the Filebound Export plugin:

Configurable property | Type of value | Value options | Description
File Bound Connection URL | String | Ex: C:\ephesoft-data\csv-export-folder | Location url for the Filebound repository.
File Bound User Name | String | Ex: admin | The username for repository authentication.
File Bound Password | String | Ex: password | The password for repository authentication.
Filebound Project Name | String | NA | The project name for which the Filebound repository is used.
Filebound index field | String | NA | The indexing field from batch.xml that will be used to create indexes.
Filebound division | String | NA | The division type that will be used for creating the document.
Filebound separator | String | NA | The separator that will be used for breaking the document.
Filebound Export Format | List of values | pdf, tif | Determines which file format is exported. Default: pdf.
File Bound Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default: OFF.

 

Steps of execution

  • This plug-in works in the export phase of the application, when all the processing on the batch has been done and it is ready to be exported.
  • If the plugin switch is “ON”, the plugin creates a document file and exports it to the given repository.
  • The document is created using the configured Filebound export format, i.e. tif or pdf.
  • The created document is indexed using the configured index field.
  • The document is then exported to the repository at the given url.

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | Document Level fields are null. So cannot upload documents of batch instance | There are no document level fields in the batch class.
2 | Project Name must not be null | No project name is configured.
3 | Connection url must not be null | No connection url is configured for the repository.
4 | Username must not be null | No username is configured for authentication.
5 | Password must not be null | No password is configured for authentication.
6 | Index Field must not be null | No index field is configured.
7 | Division must not be null | No division is configured.
8 | Separator must not be null | No separator is configured.
9 | Non-zero exit value for filebound command found. | The export of the document to the server was unsuccessful.

 

IBM CM Plugin

Overview

This plugin is used to export the batch XML in the IBM content management schema format. It transforms the batch xml into another XML that is accepted by IBM Content Management.

 

Configuration

Steps for configuring the plugin

  • User can select the Export module and navigate to IBM CM plugin configuration page as shown below:

PDIBMCMPlugin.jpg

Users can edit the above settings by clicking on “Edit” in order to change the settings as per their requirements.

Configurable Properties

The following are the configurable properties available for the IBM CM plugin:

Configurable property | Type of value | Value options | Description
IBM CM Final Export Folder | String | Path to export folder. Ex: C:\Ephesoft\SharedFolders\ibm-cm-export-folder | Folder to which the file will be exported after transformation into the desired format.
IBM CM Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default: OFF.

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-ibm-cm/dcma-ibm-cm.properties

Configurable property | Type of value | Value options | Description
ibm.cmod_app_group | String | NA | Value of the cmod app group parameter in the XML.
ibm.cmod_app | String | NA | Value of the cmod app parameter in the XML.
ibm.user_name | String | NA | Value of the user name parameter in the XML.
ibm.email | String | NA | Value of the email parameter in the XML.
ibm.supplying_system | String | NA | Value of the DAT file name parameter in the XML.

Steps of execution

  • The plug-in uses the batch xml file inside the batch instance folder.
  • The batch XML is transformed into the IBM content management schema format, which is accepted by the IBM Content Management system. The plugin creates three files as the result of processing: one ctl file, one dat file and one xml file. The files are named in the following format:
“name of batch folder” + “_” + “batch instance identifier” + .ctl/.dat/.xml
  • These files are then copied to the IBM CM final export folder in a fixed layout. Ex: suppose the user has 5 batch folders to import, named ABC1, ABC2, ABC3, ABC4 and ABC5. After batch processing, the IBM CM plugin creates an ABC folder inside the IBM CM export folder, with a subfolder for each batch instance based on the batch instance identifier. The ABC folder will therefore have 5 subfolders: BC1, BC2, BC3, BC4 and BC5. Each of these subfolders will have one ctl, one dat and one xml file.
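
The naming format given in the steps above can be sketched as a small helper. The function name ibm_cm_file_names is hypothetical and the sample values are taken from the example in the text.

```python
def ibm_cm_file_names(batch_folder_name, batch_instance_id):
    """Build the three output file names: <batch folder>_<batch instance id>.<ext>."""
    base = f"{batch_folder_name}_{batch_instance_id}"
    return [base + extension for extension in (".ctl", ".dat", ".xml")]


print(ibm_cm_file_names("ABC1", "BC1"))
```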

Dependency

The plugin assumes that extraction for the incoming batch has been done properly. It only converts the results in the provided batch.xml into the desired format.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | IBM Content Management Export Folder value is null/empty from the database. Invalid initializing of properties. | The IBM CM export folder path is incorrect in the database.
2 | Could not find xsl file in the classpath resource | The ibmCMTransform.xsl file is not present in the classpath resource. The jar file may not be valid.
3 | targetXMLPath is null. Unable to create directory | Unable to create the file in the specified folder. Check for a permission issue.
4 | Unable to create directory | The IBM CM export folder is not present and the folder could not be created.
5 | Error in creating output xml file | File not found at the specified place.
6 | Could not transform ibmCMTransform.xsl file | An error occurred while transforming the batch xml file.
7 | Failed exporting batch instance for IBM Content Management | Logged when any of the above errors has occurred.

NSI Export Plugin

Overview

This plugin exports a zipped file for a batch. It is used when both the batch instance folder and a specifically formatted batch xml are needed. The plugin transforms the batch xml into the xml format specified by NSI CMS. The batch instance folder, along with the multipage tiff and multipage pdf files and the formatted batch xml, is then zipped and exported to the location given in the “NSI Export Folder” parameter.

 

Configuration

Steps for configuring the plugin

  • User can select the export module and navigate to NSI export plugin configuration page as shown below:

PDNSIExportPlugin.jpg

Users can edit the above settings by clicking on “Edit” in order to change the settings as per their requirements.

Configurable Properties

The following are the configurable properties available for the NSI Export plugin:

Configurable property | Type of value | Value options | Description
NSI Export Folder | String | Ex: C:\ephesoft-data\NSI-export-folder | Folder to which the zipped file will be exported after transformation into the desired format, if NSI State Switch is “ON”.
Final NSI XML Name | String | Ex: _NSI.xml | Name of the batch xml finally created after transformation into the desired format.
NSI State Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default: OFF.

Steps of execution

  • This plug-in works in the export phase of the application, when all the processing on the batch has been done and it is ready to be exported.
  • The plugin works only if the “NSI State Switch” property is “ON”.
  • The plug-in uses a predefined xsl to convert the batch xml file into an NSI supported format, and names the new xml file according to the value specified in the “Final NSI XML Name” property.
  • The converted batch.xml and the batch instance folder are used for exporting. The export path is given by the “NSI Export Folder” property.
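
The zipping step can be sketched as below. This is an illustration under stated assumptions, not the plugin's code: the xsl transformation itself is left out, the function name export_nsi_zip is hypothetical, and naming the zip after the batch instance folder is an assumption. The transformed xml file is expected to live outside the batch instance folder.

```python
import os
import zipfile


def export_nsi_zip(batch_instance_dir, nsi_xml_path, export_dir):
    """Zip the batch instance folder together with the transformed xml file."""
    os.makedirs(export_dir, exist_ok=True)
    batch_id = os.path.basename(os.path.normpath(batch_instance_dir))
    zip_path = os.path.join(export_dir, batch_id + ".zip")  # assumed naming
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _dirs, files in os.walk(batch_instance_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                # Store paths relative to the batch instance folder.
                archive.write(full_path, os.path.relpath(full_path, batch_instance_dir))
        archive.write(nsi_xml_path, os.path.basename(nsi_xml_path))
    return zip_path
```

A call would look like export_nsi_zip(batch_dir, transformed_xml, nsi_export_folder), where all three paths are supplied by the caller.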

Dependency

The plugin assumes that extraction for the incoming batch has been done properly. It requires both the “Create Multipage Plugin” and the extraction module to have been processed.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | NSI Export Folder value is null/empty from the database. Invalid initializing of properties. | NSI Export Folder is either null or empty.
2 | Could not find xsl file in the classpath resource | NSITransform.xsl cannot be located within the classpath.

 

Tabbed PDF Plugin

Overview

This plugin merges all multipage PDFs into a single tabbed PDF based on the Placeholder setting. It creates a bookmarked PDF in the configured export folder by merging all the multipage PDFs.

 

Configuration

Steps for configuring the plugin

User can select the Export module and navigate to Tabbed PDF plugin configuration page as shown below:

PDTabbedPDFPlugin.jpg

Users can edit the above settings by clicking on “Edit” in order to change the settings as per their requirements.

Configurable Properties

The following are the configurable properties available for the Tabbed PDF plugin:

Configurable property | Type of value | Value options | Description
Tabbed PDF Switch | List of values | ON, OFF | Determines whether the plug-in will run or not. Default: OFF.
Tabbed PDF Export Folder | String (Folder path) | Ex: C:\Ephesoft3\SharedFolders\tabbed-pdf-export-folder | Folder in which the output tabbed PDF and the multipage tiff file (if created) will be stored.
Tabbed PDF Placeholder | List of values | YES, NO | If YES, the plug-in creates the document map based on the order or priority of the document types defined in the export-script.properties file. A tab is created in the document map for each document type, and the error PDF is loaded if the document is not present in the batch xml. If NO, the plug-in creates the document map based on the documents present in the batch xml, and tabs are created only for those documents. Default: NO.
Tabbed PDF Property file | String (File path) | Ex: C:\Ephesoft\SharedFolders\property\export-script.properties | This property file has the document types and their priority in a predefined format.
Tabbed PDF Creation Parameters | String | Ex: -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite | Parameters used by ghost script at the time of PDF creation.
Tabbed PDF Optimization Parameters | String | Ex: -q -dNODISPLAY -P- -dSAFER -dDELAYSAFER -- pdfopt.ps | Parameters used by ghost script at the time of PDF optimization.
PDF Optimization switch | List of values | ON, OFF | Determines whether PDF optimization will be performed or not. Default: ON.
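
The difference between the two Tabbed PDF Placeholder settings can be sketched as follows. This is an illustration only. The function name build_tabs is hypothetical, the priority order is passed in as a plain list rather than read from export-script.properties (whose format is not described here), and a False flag stands for "load the error PDF for this tab".

```python
def build_tabs(docs_in_batch, priority_order, placeholder):
    """Return a list of (document type, has_document) pairs describing the tabs.

    docs_in_batch: document types in the order they appear in the batch xml.
    priority_order: document types in the configured priority order.
    """
    if placeholder == "YES":
        # One tab per configured document type, even when no document exists.
        return [(doc_type, doc_type in docs_in_batch) for doc_type in priority_order]
    # Tabs only for documents that are actually present in the batch xml.
    return [(doc_type, True) for doc_type in docs_in_batch]


present = ["Invoice", "Receipt"]
priority = ["Receipt", "Contract", "Invoice"]
print(build_tabs(present, priority, "YES"))
print(build_tabs(present, priority, "NO"))
```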

 

Property File Configuration

Property file: {Ephesoft-Home}/WEB-INF/classes/META-INF/dcma-tabbed-pdf/dcma-tabbed-pdf.properties

Configurable property | Type of value | Value options | Description
tabbed_pdf.ghost_script_command | String | gswin64c, gswin32c | Ghost script command for Linux and Windows.
tabbed_pdf.unix_ghost_script_command | String | gs | Ghost script command for Unix.

 

Steps of execution

  • The plug-in uses the batch xml file, the multipage PDF files and the multipage tiff files inside the batch instance folder.
  • The batch XML is changed as per the processing done by the plug-in.
  • All multipage PDF files are combined to create a single PDF document, which is then copied to the tabbed-pdf-export-folder inside the shared folders. The PDF file is named in the following format:

“name of batch folder” + “_” + “batch instance identifier” + .pdf

  • Finally, the first multipage tiff is copied to the tabbed-pdf-export-folder inside the shared folder. The rest of the multipage tiffs are lost.

Dependency

The plugin assumes that the multipage PDFs have been created by batch preprocessing. The create multi page files plugin must therefore be in the workflow of the batch class, because it is responsible for creating the multipage PDFs. In addition, the export-script.properties file must be in the right place and contain correct information.

 

Troubleshooting

The following are a few common error messages received when the plugin malfunctions:

S no. | Error message | Possible root cause
1 | {folderPath} is not a Directory. | Folder path configured by the user is not valid.
2 | Property file for documents not valid. | The same priority is defined for more than one document type.
3 | File does not exist. File Name=”{file-name}” | An invalid PDF file name is mentioned in the batch xml.
4 | Error in writing pdfMarks file. | Some error occurred while processing.
5 | Enviornment Variable GHOSTSCRIPT_HOME not set | GHOSTSCRIPT_HOME is not set as an environment variable or in the startup.bat file inside the {Ephesoft-home}\JavaAppServer\bin folder.
6 | No ghostcript command specified in properties file. | Either the dcma-tabbed-pdf.properties file is not present or properties are missing from it.
7 | Sample PdfMarks file not provided. | pdfmarks.dat is not present at the specified location.
