Overview

The new API performs extraction on an input document supplied either as a PDF file or as a ZIP file containing single-page or multipage TIFF (tif/tiff) or PDF files. Extraction plugins are fetched from the batch class that matches the input batch class identifier. Extraction then runs according to the plugin configurations and rules defined for that batch class.

If a document type is supplied as an input parameter, document classification is skipped and extraction is performed for the specified document type. Otherwise, both classification and extraction are performed on the input to generate the results.

Classification Types Supported by the API

  1. SearchClassification
  2. MultidimensionClassification
  3. ImageClassification
  4. KeywordClassification
  5. AutomaticClassification
  6. BarcodeClassification

Input Parameters

The Web Service API accepts the following input parameters:

1. PDF file (single-page or multipage) or ZIP file (the ZIP file may contain single-page or multipage tif/tiff or pdf files).

2. batchClassIdentifier: String parameter that identifies the batch class.

3. docType (optional): if a docType is supplied, document classification is not performed; otherwise the document is classified before extraction.

4. downloadHocr: if set to true, the web service response is a ZIP file containing Batch.xml and the HOCR file.
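The parameters above can be sketched as a multipart form built with the Java standard library only. This is a minimal illustration, not the service's own client: the class name OcrClassifyExtractForm, the boundary string, and the choice to send downloadHocr explicitly as "true" or "false" are assumptions. Naming the file part after the file mirrors the sample client further down.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of the web service or any library.
class OcrClassifyExtractForm {

    /** Collects the string parameters. docType is omitted when null or blank,
     *  which leaves classification to the server. */
    static Map<String, String> stringParams(String batchClassIdentifier,
                                            String docType,
                                            boolean downloadHocr) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("batchClassIdentifier", batchClassIdentifier);
        if (docType != null && !docType.trim().isEmpty()) {
            params.put("docType", docType);
        }
        // Assumption: the flag is always sent, as "true" or "false".
        params.put("downloadHocr", String.valueOf(downloadHocr));
        return params;
    }

    /** Encodes one file part plus the string parameters as multipart/form-data. */
    static byte[] multipartBody(String boundary, String fileName, byte[] fileBytes,
                                Map<String, String> params) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // File part: the part name is the file name itself.
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"" + fileName
                + "\"; filename=\"" + fileName + "\"\r\n");
        write(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.write(fileBytes);
        write(out, "\r\n");
        // One plain part per string parameter.
        for (Map.Entry<String, String> e : params.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\"" + e.getKey() + "\"\r\n\r\n");
            write(out, e.getValue() + "\r\n");
        }
        // Closing boundary ends the body.
        write(out, "--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling stringParams("BC1", null, true) yields only batchClassIdentifier and downloadHocr, so the server would classify the document itself. Passing a non-blank docType adds that parameter, which skips classification.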

Output Parameters

The web service returns the Batch XML. If downloadHocr is set to true, the response is a ZIP file containing Batch.xml and the HOCR file.
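When downloadHocr is set to true, the response is a ZIP file rather than plain XML, so a client has to unpack it. The following is a minimal sketch using only java.util.zip, assuming the response body has already been read into a byte array. The class name ZipResponseReader is hypothetical, and no entry names are assumed: the method simply lists whatever the archive contains.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class; not part of the web service or any library.
class ZipResponseReader {

    /** Returns the entry names of a ZIP archive held in memory, in archive order. */
    static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted.
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
                zip.closeEntry();
            }
        }
        return names;
    }
}
```

With the Commons HttpClient sample in this article, the byte array could be obtained from mPost.getResponseBody() instead of getResponseBodyAsString().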

Web Service URL

http://<HOSTNAME>:8080/dcma/rest/ocrClassifyExtract

Example:

http://localhost:8080/dcma/rest/ocrClassifyExtract

Checklist:

  1. Extraction is performed only if the Extraction module is configured for the batch class.
  2. Extraction is performed only for plugins whose extraction switch is ON in the batch class configuration.

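Sample client code using Apache Commons HttpClient:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;
import org.apache.commons.httpclient.methods.multipart.StringPart;

private static void ocrClassifyExtract() {
    HttpClient client = new HttpClient();
    String url = "http://localhost:8080/dcma/rest/ocrClassifyExtract";
    PostMethod mPost = new PostMethod(url);

    // Input file to be processed
    File file1 = new File("C:\\sample\\US-Invoice.tiff");
    Part[] parts = new Part[2];
    try {
        // File part, named after the file itself
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding parameter for batchClassIdentifier
        parts[1] = new StringPart("batchClassIdentifier", "BC1");

        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);

        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            // Any other status: print the body returned by the server
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Always release the connection back to the client
        mPost.releaseConnection();
    }
}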
