The API performs extraction on an input document supplied either as a PDF file or as a ZIP file containing single-page or multipage TIFF/TIF or PDF files. Extraction plugins are fetched from the batch class that corresponds to the supplied batch class identifier, and extraction is performed according to the plugin configurations and rules set up for that batch class.
If a document type is supplied as an input parameter, document classification is skipped and extraction is performed for the specified document type; otherwise, both classification and extraction are performed on the input to generate the results. The web service returns a token that another web service, checkWSStatus, uses to query the current running status of this web service.
Input Parameters
The Web Service API takes the following input parameters:
1. PDF file (single page or multipage) or ZIP file (the ZIP file may contain single-page or multipage TIF/TIFF or PDF files)
2. batchClassIdentifier: String parameter for the batch class identifier
3. docType (optional): if a docType is supplied, no document classification is performed; otherwise the document is classified.
4. downloadHocr: if set to true, the API includes Batch.xml and the HOCR file in a ZIP file in the web service response.
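The parameters above could be assembled into a multipart/form-data request body along the following lines. This is only a sketch using the Java standard library: the class ExtractRequestSketch and its methods are hypothetical, sending downloadHocr explicitly as "true" or "false" is an assumption, and naming the file part after the file itself simply mirrors the sample client code in this article.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the product's client library.
class ExtractRequestSketch {

    // Collects the text parameters. docType is left out when it is null or empty,
    // in which case the service is expected to classify the document itself.
    static Map<String, String> textFields(String batchClassIdentifier,
                                          String docType,
                                          boolean downloadHocr) {
        if (batchClassIdentifier == null || batchClassIdentifier.isEmpty()) {
            throw new IllegalArgumentException("batchClassIdentifier is required");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("batchClassIdentifier", batchClassIdentifier);
        if (docType != null && !docType.isEmpty()) {
            fields.put("docType", docType);
        }
        // Assumption: the flag is sent as the literal text "true" or "false".
        fields.put("downloadHocr", Boolean.toString(downloadHocr));
        return fields;
    }

    // Encodes the text fields plus one file as a multipart/form-data body.
    // The caller must send the same boundary in the Content-Type header.
    // fileName is assumed to contain no double quotes or line breaks.
    static byte[] multipartBody(String boundary, Map<String, String> fields,
                                String fileName, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\""
                    + field.getKey() + "\"\r\n\r\n");
            write(out, field.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"" + fileName
                + "\"; filename=\"" + fileName + "\"\r\n");
        write(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.write(fileBytes, 0, fileBytes.length);
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }
}
```

Calling textFields("BC1", null, false) yields only batchClassIdentifier and downloadHocr, so the request carries no docType and classification is left to the service.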
Output Parameters
A numeric token that can be used to query the status of the running web service.
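A client will usually want to check that the response really is a token before passing it on to the status query. The sketch below assumes the response body consists of nothing but the decimal token, possibly surrounded by whitespace; the actual response format is not specified here, and the class TokenSketch and method parseToken are hypothetical names.

```java
// Hypothetical helper for illustration; the response format is an assumption.
class TokenSketch {

    // Returns the token as a long, or throws IllegalArgumentException when the
    // body is not a plain run of 1 to 18 decimal digits (which always fits a long).
    static long parseToken(String responseBody) {
        if (responseBody == null) {
            throw new IllegalArgumentException("Response body is missing");
        }
        String trimmed = responseBody.trim();
        if (!trimmed.matches("\\d{1,18}")) {
            throw new IllegalArgumentException(
                    "Expected a numeric token but received: " + trimmed);
        }
        return Long.parseLong(trimmed);
    }
}
```

Rejecting anything that is not purely numeric keeps an error page or error message from being mistaken for a token.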
Web Service URL
http://:8080/dcma/rest/initiateOcrClassifyExtract
Example:
http://localhost:8080/dcma/rest/initiateOcrClassifyExtract
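For clients that talk to more than one server, the URL can be built from the host and port rather than hard-coded. The following is a minimal sketch; the class EndpointSketch and its endpoint method are hypothetical, and only the path /dcma/rest/initiateOcrClassifyExtract comes from this article.

```java
import java.net.URI;

// Hypothetical helper for illustration.
class EndpointSketch {

    private static final String PATH = "/dcma/rest/initiateOcrClassifyExtract";

    // Builds the service URL for the given server, rejecting an empty host
    // so that a URL such as "http://:8080/..." is never produced.
    static URI endpoint(String host, int port) {
        if (host == null || host.trim().isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return URI.create("http://" + host.trim() + ":" + port + PATH);
    }
}
```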

Checklist:
1. Extraction is performed only if the Extraction module is configured for the batch class.
2. Extraction is performed only for plugins whose extraction switch is ON in the batch class configuration.

Sample client code using Apache Commons HttpClient:

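import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.commons.httpclient.Credentials;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.FilePart;
import org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity;
import org.apache.commons.httpclient.methods.multipart.Part;
import org.apache.commons.httpclient.methods.multipart.StringPart;

private static void initiateOcrClassifyExtract() {
    HttpClient client = new HttpClient();
    Credentials defaultcreds = new UsernamePasswordCredentials("username", "password");
    // "serverName" is a placeholder; the AuthScope host must match the host in the URL below.
    client.getState().setCredentials(new AuthScope("serverName", 8080), defaultcreds);
    client.getParams().setAuthenticationPreemptive(true);
    String url = "http://localhost:8080/dcma/rest/initiateOcrClassifyExtract";
    PostMethod mPost = new PostMethod(url);
    // Adding the input file for processing
    File file1 = new File("C:\\sample\\US-Invoice.tiff");
    Part[] parts = new Part[2];
    try {
        parts[0] = new FilePart(file1.getName(), file1);
        // Adding parameter for batchClassIdentifier
        parts[1] = new StringPart("batchClassIdentifier", "BC1");
        MultipartRequestEntity entity = new MultipartRequestEntity(parts, mPost.getParams());
        mPost.setRequestEntity(entity);
        int statusCode = client.executeMethod(mPost);
        if (statusCode == 200) {
            System.out.println("Web service executed successfully..");
            // The response body carries the token used to query the status
            String responseBody = mPost.getResponseBodyAsString();
            System.out.println(statusCode + " *** " + responseBody);
        } else if (statusCode == 403) {
            System.out.println("Invalid username/password..");
        } else {
            System.out.println(mPost.getResponseBodyAsString());
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found for processing..");
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Always release the connection, whether or not the call succeeded
        if (mPost != null) {
            mPost.releaseConnection();
        }
    }
}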
