Overview

The Ephesoft Mobile web service is the backbone of the Ephesoft SnapDOC mobile application. It handles all server-side processing for SnapDOC: converting the generated PDF files to single-page TIF files, running OCR on the TIF files, classifying the files into preconfigured document types, and finally extracting document-level field values from the HOCR content.

The SnapDOC mobile application calls the mobile web services internally and renders the results they return to the user.

If the user uploads the batch after reviewing the extraction results, the final batch.xml is generated and saved, together with the generated PDF file, to the Ephesoft batch class UNC folder.

 

Pre-requisites

A web service license must be purchased and switched ON. Contact the licensing/sales department for details.

Configuration

To configure the mobile web service on an Ephesoft application server, follow these steps:

  1. Enable CORS (Cross-Origin Resource Sharing) on the Tomcat server.
  2. If a web server is configured for access to Ephesoft, enable CORS on the web server as well.
  3. Deploy the mobile web service JAR on the Ephesoft server.
  4. Add the following line to the applicationContext.xml file, after the web service import.

<import resource="classpath:/META-INF/applicationContext-mobilewebservice.xml" />

  5. Configure a batch class for the mobile web service. Because the mobile web service uses KV extraction to extract document-level fields, the configured batch class must have KV rules defined for its document-level fields.

Steps for enabling CORS on Tomcat server

  1. Stop the Tomcat server.
  2. Copy the cors-filter-1.8.jar and java-property-utils-1.9.jar files to the {Ephesoft_home}/JavaAppServer/lib folder.
  3. Add the following lines to the web.xml file in the {Ephesoft_home}/JavaAppServer/conf folder to enable CORS on the server.

<filter>
    <filter-name>CORS</filter-name>
    <filter-class>com.thetransactioncompany.cors.CORSFilter</filter-class>
    <init-param>
        <param-name>cors.allowOrigin</param-name>
        <param-value>*</param-value>
    </init-param>
    <init-param>
        <param-name>cors.supportsCredentials</param-name>
        <param-value>true</param-value>
    </init-param>
    <init-param>
        <param-name>cors.supportedHeaders</param-name>
        <param-value>Content-Disposition,Content-Type,accept, authorization, origin</param-value>
    </init-param>
    <init-param>
        <param-name>cors.supportedMethods</param-name>
        <param-value>GET, POST, HEAD, OPTIONS</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>CORS</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

  4. Start the Tomcat server.
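
One way to confirm that the filter is active after the restart is to send a request that carries an Origin header and inspect the Access-Control-Allow-Origin header of the response. The following is a minimal sketch, not part of Ephesoft: the class name CorsCheck and its method names are hypothetical, and it uses only the JDK's HttpURLConnection. Note that HttpURLConnection silently drops the Origin header unless the JVM is started with -Dsun.net.http.allowRestrictedHeaders=true.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class CorsCheck {

    /**
     * Returns true when the Access-Control-Allow-Origin header value permits
     * the given origin: either the wildcard "*" or an exact match.
     */
    static boolean isOriginAllowed(String allowOriginHeader, String origin) {
        if (allowOriginHeader == null) {
            return false; // header absent: no CORS headers were added
        }
        String value = allowOriginHeader.trim();
        return value.equals("*") || value.equals(origin);
    }

    /**
     * Sends a GET request with an Origin header and evaluates the response.
     * Requires -Dsun.net.http.allowRestrictedHeaders=true, otherwise the
     * Origin header is not sent and the server has nothing to react to.
     */
    static boolean serverAllowsOrigin(String targetUrl, String origin) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(targetUrl).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Origin", origin);
            conn.getResponseCode(); // forces the request to be sent
            return isOriginAllowed(conn.getHeaderField("Access-Control-Allow-Origin"), origin);
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, CorsCheck.serverAllowsOrigin("http://localhost:8080/dcma/", "http://foo.example") should return true once the filter is in place; the target URL here is only an illustration.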

Steps for enabling CORS on web server

  1. Stop the Apache server.
  2. Update <Ephesoft installation>\Apache2.2\conf\httpd.conf
    1. Add the required modules
      1. Add the statement "LoadModule headers_module modules/mod_headers.so" to the load modules section of the file. Check first that the statement is not already present, to avoid duplicating it.

    2. Add the required headers
      1. Locate the "<Directory>" tag.
      2. Set its contents to

Header set Access-Control-Allow-Origin "*"
Header set Access-Control-Allow-Methods "GET, POST"
Header set Access-Control-Allow-Headers "Content-Disposition,Content-Type,accept,authorization,origin"
Header set Access-Control-Allow-Credentials true
Options FollowSymLinks
AllowOverride None
Order deny,allow
Allow from all

      3. "Header set Access-Control-Allow-Origin" can instead be set to allow one specific URL, e.g. Header set Access-Control-Allow-Origin http://foo.example
  3. Start the Apache server.
  4. A sample httpd.conf file is attached below.

Web services

Ephesoft SnapDOC uses the following mobile web services.

  1. Execute Mobile Upload
  2. Save Mobile Batch
  3. ocrClassifyExtract
  4. initiateOcrClassifyExtract
  5. checkWSStatus

Execute Mobile Upload web service

URL: http://<server_name>:<port_no.>/dcma/rest/executeMobileUpload

Input Parameters:

  1. batchClassIdentifier (Batch Class Id)
  2. Input files (single-page PDF files)

Output result: An XML file (Ephesoft batch.xml)

Working: When documents are captured with the SnapDOC mobile application, a PDF file is generated for each page that is scanned or captured with the mobile camera. All of the generated PDFs are submitted to the mobile web service for further processing. The mobile web service performs the following operations on the input files.

  1. The input PDFs are converted into single-page TIF files so that they can be processed further.
  2. OCR is performed on the TIF files using the OCR engine configured in the batch class (Tesseract, Recostar or Nuance).
  3. Once the OCR results are available, the input files are grouped and classified under the document types configured in the batch class.
  4. After the input files have been classified successfully, KV extraction is performed on the OCR result for each classified document type.
  5. The result of KV extraction is sent back to the mobile application client as an XML response.
  6. The extraction result obtained from the XML response is presented to the user for verification.

As noted above, the mobile web service uses only KV extraction to extract document-level fields. The batch class used for mobile upload processing must therefore have KV extraction rules for its document-level fields.
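
A client call to this service can be sketched as a multipart POST. The code below is an illustration under stated assumptions, not Ephesoft's own client: the class name MobileUploadClient and its methods are hypothetical, the request is assumed to be multipart/form-data like the other samples in this article, and the convention of naming each file part after its file is borrowed from those samples. Authentication is omitted. It uses only JDK classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class MobileUploadClient {

    /** Builds a multipart/form-data body: one text field plus one part per file. */
    static byte[] buildMultipartBody(String boundary, String batchClassIdentifier,
                                     List<String> fileNames, List<byte[]> fileContents) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Text part carrying the batch class identifier.
        append(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"batchClassIdentifier\"\r\n\r\n"
                + batchClassIdentifier + "\r\n");
        // One file part per single-page PDF. Using the file name as the part
        // name is an assumption taken from the other samples in this article.
        for (int i = 0; i < fileNames.size(); i++) {
            String name = fileNames.get(i);
            append(out, "--" + boundary + "\r\n"
                    + "Content-Disposition: form-data; name=\"" + name
                    + "\"; filename=\"" + name + "\"\r\n"
                    + "Content-Type: application/octet-stream\r\n\r\n");
            byte[] content = fileContents.get(i);
            out.write(content, 0, content.length);
            append(out, "\r\n");
        }
        append(out, "--" + boundary + "--\r\n"); // closing boundary
        return out.toByteArray();
    }

    private static void append(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    /** Posts the PDFs to the service URL and returns the response body as text. */
    static String executeMobileUpload(String serviceUrl, String batchClassIdentifier,
                                      List<Path> pdfFiles) throws IOException {
        List<String> names = new ArrayList<>();
        List<byte[]> contents = new ArrayList<>();
        for (Path pdf : pdfFiles) {
            names.add(pdf.getFileName().toString());
            contents.add(Files.readAllBytes(pdf));
        }
        String boundary = "snapdoc-" + System.nanoTime();
        byte[] body = buildMultipartBody(boundary, batchClassIdentifier, names, contents);

        HttpURLConnection conn = (HttpURLConnection) new URL(serviceUrl).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body);
            }
            // Error responses are delivered on the error stream.
            InputStream in = conn.getResponseCode() < 400
                    ? conn.getInputStream() : conn.getErrorStream();
            return in == null ? "" : readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

On success the returned string would be the batch.xml described above, which the client can then present to the user for verification.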

Save Mobile Batch web service

URL: http://<server_name>:<port_no.>/dcma/rest/saveMobileUpload

Input Parameters:

  1. batchClassIdentifier (Batch Class Id)
  2. Input files (all input PDF files and the XML file)

Working: This web service copies the input PDF files and the XML file to the batch class UNC folder. The XML file is the one returned in response to the previous web service call, after the user has modified it to correct the extracted document-level field values.

ocrClassifyExtract web service

Overview

The new API performs extraction on an input PDF document or a ZIP file (containing single-page or multipage TIF/TIFF or PDF files). The extraction plugins are fetched from the batch class that corresponds to the input batch class identifier. Extraction is performed according to the extraction plugin configuration and the rules configured for that batch class.

If the document type is given as an input parameter, document classification is skipped and extraction is performed for the specified document type. Otherwise both classification and extraction are performed on the input to generate the results.

Input Parameters

The input parameters to the web service API are:

1. PDF file (single or multipage) or ZIP file (the ZIP file may contain single-page or multipage TIF/TIFF or PDF files)

2. batchClassIdentifier: string parameter for the batch class identifier

3. docType (optional): if a docType is supplied, no document classification is performed; otherwise the document is classified.
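
The optional docType parameter can be handled on the client by adding the form field only when a document type is known. The snippet below is a small sketch of that decision; the class name OcrClassifyExtractParams and the method formFields are hypothetical, and the example document type "Invoice" is made up. Only the field names batchClassIdentifier and docType come from this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OcrClassifyExtractParams {

    /**
     * Returns the text form fields for an ocrClassifyExtract request.
     * docType is included only when supplied, so that the service falls
     * back to classification when it is missing.
     */
    static Map<String, String> formFields(String batchClassIdentifier, String docType) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("batchClassIdentifier", batchClassIdentifier);
        if (docType != null && !docType.trim().isEmpty()) {
            fields.put("docType", docType.trim());
        }
        return fields;
    }
}
```

With Apache Commons HttpClient, each entry of the returned map would become one StringPart alongside the FilePart for the input file.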

Output Parameters

The web service outputs the batch XML.

Web Service URL

http://<HOSTNAME>:8080/dcma/rest/ocrClassifyExtract

Example-

http://localhost:8080/dcma/rest/ocrClassifyExtract

Checklist:

  1. Extraction is performed only if the Extraction module is configured for the batch class.
  2. Extraction is performed only for the plugins whose extraction switch is ON in the batch class configuration.

Sample client code using Apache Commons HttpClient:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;
import org.apache.commons.httpclient.methods.multipart.StringPart;

private static void ocrClassifyExtract() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/ocrClassifyExtract";
    PostMethod mPost = new PostMethod(url);
    // Input file to be processed
    File file1 = new File("C:\\sample\\US-Invoice.tiff");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding parameter for batchClassIdentifier
        parts[1] = new StringPart("batchClassIdentifier", "BC1");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            // The response body is the batch XML
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Always return the connection to the connection manager
        mPost.releaseConnection();
    }
}

initiateOcrClassifyExtract

Overview

The API performs extraction on an input PDF document or a ZIP file (containing single-page or multipage TIF/TIFF or PDF files). The extraction plugins are fetched from the batch class that corresponds to the input batch class identifier. Extraction is performed according to the extraction plugin configuration and the rules configured for that batch class.

If the document type is given as an input parameter, document classification is skipped and extraction is performed for the specified document type. Otherwise both classification and extraction are performed on the input to generate the results. The web service returns a token, which the checkWSStatus web service uses to query the current running status of the request.

Input Parameters

The input parameters to the web service API are:

1. PDF file (single or multipage) or ZIP file (the ZIP file may contain single-page or multipage TIF/TIFF or PDF files)

2. batchClassIdentifier: string parameter for the batch class identifier

3. docType (optional): if a docType is supplied, no document classification is performed; otherwise the document is classified.

Output

A numeric token that can be used to query the status of the running web service.

Web Service URL

http://<HOSTNAME>:8080/dcma/rest/initiateOcrClassifyExtract

Example-

http://localhost:8080/dcma/rest/initiateOcrClassifyExtract

Checklist:

  1. Extraction is performed only if the Extraction module is configured for the batch class.
  2. Extraction is performed only for the plugins whose extraction switch is ON in the batch class configuration.

Sample client code using Apache Commons HttpClient:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;
import org.apache.commons.httpclient.methods.multipart.StringPart;

private static void initiateOcrClassifyExtract() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/initiateOcrClassifyExtract";
    PostMethod mPost = new PostMethod(url);
    // Input file to be processed
    File file1 = new File("C:\\sample\\US-Invoice.tiff");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding parameter for batchClassIdentifier
        parts[1] = new StringPart("batchClassIdentifier", "BC1");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            // The response body is the numeric token for checkWSStatus
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing.");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Always return the connection to the connection manager
        mPost.releaseConnection();
    }
}

checkWSStatus

Overview

This web service API queries the current status of an initiateOcrClassifyExtract request and is only useful in combination with that web service. Running initiateOcrClassifyExtract returns a numeric token. The checkWSStatus web service uses this token to determine which operational state the request is currently in, so the user can find out when it has completed. The possible states are:

  • SPLIT_COMPLETED
  • OCRing_COMPLETED
  • CLASSIFICATION_COMPLETED

Once extraction has been performed, the service returns batch.xml instead of a state.

Input Parameters

The input parameter to the web service API is ocrToken, a valid numeric token returned by initiateOcrClassifyExtract.

Output Parameters

The output of the web service varies as follows.

Depending on whether the initiateOcrClassifyExtract request is at the split, OCR or classification stage, the result is SPLIT_COMPLETED, OCRing_COMPLETED or CLASSIFICATION_COMPLETED respectively. When the web service has completed the extraction, the result is a valid XML file with the extracted fields and values.

Service Request : GET

Web Service URL

http://<HOSTNAME>:8080/dcma/rest/checkWSStatus

Example-

http://localhost:8080/dcma/rest/checkWSStatus?ocrToken=1233232323

Sample client code using Apache Commons HttpClient:

import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.GetMethod;

private static void checkWSStatus() {
    HttpClient client = new HttpClient();
    // The URL must be absolute, including the http:// scheme
    String url = "http://localhost:8080/dcma/rest/checkWSStatus?ocrToken=1233232323";
    GetMethod getMethod = new GetMethod(url);
    try {
        int statusCode = client.executeMethod(getMethod);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully.");
            // The response body is a state name or the final batch XML
            String responseBody = getMethod.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password.");
        } else {
            System.out.println(getMethod.getResponseBodyAsString());
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Always return the connection to the connection manager
        getMethod.releaseConnection();
    }
}
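
Because checkWSStatus returns either a state name or the final batch.xml, a client typically calls it repeatedly until the XML arrives. The following is a hedged sketch of such a polling loop using only JDK classes. The class name OcrStatusPoller, its methods, the retry count and the delay are all hypothetical, and treating any response that starts with "<" as the final batch.xml is an assumption rather than documented behaviour. Authentication is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class OcrStatusPoller {

    /** Builds the checkWSStatus URL for a token returned by initiateOcrClassifyExtract. */
    static String statusUrl(String restBaseUrl, String ocrToken) {
        return restBaseUrl + "/checkWSStatus?ocrToken=" + ocrToken.trim();
    }

    /** Heuristic (assumption): state names are plain text, batch.xml starts with '<'. */
    static boolean looksLikeXml(String responseBody) {
        return responseBody != null && responseBody.trim().startsWith("<");
    }

    /** Performs a GET request and returns the body; fails on any non-200 status. */
    static String httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            if (status != 200) {
                throw new IOException("checkWSStatus returned HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Polls checkWSStatus until the response looks like XML or attempts run out. */
    static String waitForBatchXml(String restBaseUrl, String ocrToken,
                                  int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String body = httpGet(statusUrl(restBaseUrl, ocrToken));
            if (looksLikeXml(body)) {
                return body; // extraction finished
            }
            // Intermediate state such as SPLIT_COMPLETED or OCRing_COMPLETED
            System.out.println("Attempt " + attempt + ": " + body.trim());
            Thread.sleep(delayMillis);
        }
        throw new IOException("No batch.xml after " + maxAttempts + " status checks");
    }
}
```

A call such as OcrStatusPoller.waitForBatchXml("http://localhost:8080/dcma/rest", token, 30, 2000) would check the status every two seconds for up to a minute; suitable values depend on document size and server load.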

Dependency

The mobile web service needs CORS to be enabled on the Tomcat server, and also on the web server if access to Ephesoft is configured through a web server. If CORS is not enabled, an error is displayed.

 

