These are the details on the batch.xml schema.

Batch Level Fields Description Created Module Assigned Module (Plugin)
<BatchInstanceIdentifier> Value of the [identifier] column in the batch_instance table. Each batch in Ephesoft has a unique batch Identifier. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<BatchClassIdentifier> This is the value of the [identifier] column in the batch_class table. Each batch in Ephesoft is run under a batch class that is a single unit for all configurations and workflow definitions. A foreign key relation is established between the [ID] column of the batch_class table and the column [batch_class_id] in the batch_instance table. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<BatchClassName> This is the value of the [batch_class_name] column in the batch_class table. Each batch in Ephesoft is run under a batch class that is a single unit for all configurations and workflow definitions. A foreign key relation is established between the [ID] column of the batch_class table and the column [batch_class_id] in the batch_instance table. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<BatchClassDescription> This is the value of the [batch_class_description] column in the batch_class table. Each batch in Ephesoft is run under a batch class that is a single unit for all configurations and workflow definitions. A foreign key relation is established between the [ID] column of the batch_class table and the column [batch_class_id] in the batch_instance table. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<BatchClassVersion> This is the version number of the batch class under which the batch was processed. This is the value of the [batch_class_version] column in the batch_class table. Each batch in Ephesoft is run under a batch class that is a single unit for all configurations and workflow definitions. A foreign key relation is established between the [ID] column of the batch_class table and the column [batch_class_id] in the batch_instance table. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<BatchName> Value of the [batch_name] column in the batch_instance table. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<BatchPriority> Value of the [batch_priority] column in the batch_instance table. Priority can be a value between 1 and 100, with lower numbers having higher priority. If not assigned using custom code, the batch priority defaults to the priority of the batch class, which is assigned when the batch class is created or imported. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<BatchCreationDate> This is the value of the [creation_date] column in the batch_class table. This is the date and time when the batch was created in Ephesoft. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<BatchLocalPath> This is the Ephesoft system folder path where the Batch Instance folder will be available. This value will be the same across all batches in the system. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<UNCFolderPath> This is the path where the source file for the batch is available. This is a unique path for each batch in the system. Folder Import Folder Import (Import_Batch_Folder_Plugin)
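As an illustration of how the batch level fields above might be read, here is a minimal Python sketch. The tag names come from the table; the <Batch> root element, the flat nesting, and every sample value are assumptions made for this example and are not taken from a real batch.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. Tag names follow the table above; the <Batch> root
# element and all values are assumptions for illustration only.
SAMPLE_BATCH = """
<Batch>
  <BatchInstanceIdentifier>BI1</BatchInstanceIdentifier>
  <BatchClassIdentifier>BC1</BatchClassIdentifier>
  <BatchClassName>ExampleClass</BatchClassName>
  <BatchName>example-batch</BatchName>
  <BatchPriority>50</BatchPriority>
</Batch>
"""


def read_batch_fields(xml_text):
    """Return the simple (leaf) child elements of the root as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        child.tag: (child.text or "").strip()
        for child in root
        if len(child) == 0  # skip containers that have nested elements
    }


def is_valid_priority(value):
    """Priority is documented as a value between 1 and 100."""
    return 1 <= int(value) <= 100


fields = read_batch_fields(SAMPLE_BATCH)
print(fields["BatchInstanceIdentifier"], fields["BatchPriority"])
print(is_valid_priority(fields["BatchPriority"]))
```

Because a lower number means a higher priority, sorting batches in ascending order of <BatchPriority> would list the most urgent ones first.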
Document Level Fields
<Identifier> This is the Document Identifier for a document. The sequence for document numbering is DOC0, DOC1, … DOCn. Note: Based on our current implementation, where each file becomes a separate batch, there will be only one document in the XML after Folder Import. All pages will belong to this one document. The page-to-document grouping will change after the Document_Assembler_Plugin within the Page Process module is executed; the pages may then be grouped into multiple documents. Folder Import 1. Folder Import (Import_Batch_Folder_Plugin)
2. Document Assembly (Document_Assembler_Plugin)
<Type> This is the Document Type assigned to the document. Note: Based on our current implementation, where each file becomes a separate batch, there will be only one document in the XML after Folder Import. All pages will belong to this one document, which is named Unknown. The page-to-document grouping will change after the Document_Assembler_Plugin within the Page Process module is executed. The pages may then be grouped into multiple documents, and the document type under which the pages were classified is written to this tag. The document types that belong to the batch class assigned to the batch are available in the database table document_type (field: document_type_name). This table has a foreign key reference to the [ID] column of the batch_class table, which associates document types with a batch class. Folder Import 1. Folder Import (Import_Batch_Folder_Plugin)
2. Document Assembly (Document_Assembler_Plugin)
<Description> This tag contains the corresponding document_type_description of the assigned document_type_name above. It is picked up from the database table document_type. Folder Import 1. Folder Import (Import_Batch_Folder_Plugin)
2. Document Assembly (Document_Assembler_Plugin)
<Confidence> This is the confidence with which the document was assembled. If the confidence is greater than the Minimum confidence threshold assigned to the document then the document is not marked for Operator REVIEW. Folder Import 1. Folder Import (Import_Batch_Folder_Plugin)
2. Document Assembly (Document_Assembler_Plugin)
<ConfidenceThreshold> Confidence threshold helps Ephesoft to decide if the document should skip document review automatically when the classification score is higher than the threshold. The document confidence threshold is available in the table document_type (field- min_confidence_threshold). The best practice is to set the threshold so that false positives are minimized. Document Assembly 1. Document Assembly (Document_Assembler_Plugin)
<Valid> This tag determines if the document would stop for Document Field Validation Review. Applicable only when data extraction is part of the batch class. False indicates that the document has fields that need to stop for Document Field Validation review. True indicates that all fields in the document were extracted with high confidence and need not stop for Document Field Validation review. The value is set to True after execution of Review_Document_Plugin if the extraction module is not configured in the batch class. Document Assembly 1. Document Assembly (Document_Assembler_Plugin)
2. Review Document (Review_Document_Plugin)
<Reviewed> This tag determines if the document would stop for Document Classification Review. False indicates that the document was assembled/classified with low confidence and needs to stop for Document Classification review. True indicates that the document was assembled/classified with high confidence and need not stop for Document Classification Review. The value is set to True after execution of Review_Document_Plugin. Document Assembly 1. Document Assembly (Document_Assembler_Plugin)
2. Review Document (Review_Document_Plugin)
<ErrorMessage/> A string containing the error message to be displayed on the RV screen for a document. The value of this tag can be set using a scripting plugin; Ephesoft does not set a value for this field. Review/ Validation Review/ Validation
<DocumentDisplayInfo/> Can be used to give documents customized names on the RV screen. The value of this tag can be set using a scripting plugin; Ephesoft does not set a value for this field. Review/ Validation Review/ Validation
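The relationship between <Confidence>, <ConfidenceThreshold>, <Reviewed> and <Valid> described above can be sketched in Python as follows. The tag names are from the table; the <Documents>/<Document> nesting, the lowercase true/false text, and the sample values are assumptions for illustration rather than a confirmed batch.xml layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The nesting and every value are assumptions.
SAMPLE_DOCS = """
<Batch>
  <Documents>
    <Document>
      <Identifier>DOC0</Identifier>
      <Type>Invoice</Type>
      <Confidence>82.5</Confidence>
      <ConfidenceThreshold>70</ConfidenceThreshold>
      <Valid>false</Valid>
      <Reviewed>true</Reviewed>
    </Document>
    <Document>
      <Identifier>DOC1</Identifier>
      <Type>Unknown</Type>
      <Confidence>12.0</Confidence>
      <ConfidenceThreshold>70</ConfidenceThreshold>
      <Valid>false</Valid>
      <Reviewed>false</Reviewed>
    </Document>
  </Documents>
</Batch>
"""


def as_bool(text):
    """Interpret the tag text as a boolean (assumes 'true'/'false')."""
    return (text or "").strip().lower() == "true"


def summarize_documents(xml_text):
    """Report, per document, whether it would stop for review or validation."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for doc in root.iter("Document"):
        confidence = float(doc.findtext("Confidence", default="0"))
        threshold = float(doc.findtext("ConfidenceThreshold", default="0"))
        rows.append({
            "id": doc.findtext("Identifier"),
            "type": doc.findtext("Type"),
            # Above the threshold means not marked for operator review.
            "above_threshold": confidence > threshold,
            # Reviewed = false: stops for Document Classification review.
            "stops_for_review": not as_bool(doc.findtext("Reviewed")),
            # Valid = false: stops for Document Field Validation review.
            "stops_for_validation": not as_bool(doc.findtext("Valid")),
        })
    return rows


for row in summarize_documents(SAMPLE_DOCS):
    print(row)
```

Keeping the threshold comparison separate from the <Reviewed> flag makes it easy to spot documents whose flag was changed later, for example by the Review_Document_Plugin.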
Page Fields
<Identifier> This is the Page Identifier for a page. The sequence for page numbering is PG0, PG1, … PGn. Note: The Folder Import module breaks up each page in the source PDF into individual TIFF files. Each TIFF file is a page in the XML. Pages can be grouped as documents. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<OldFileName> This tag contains the name of the mapped individual TIFF file within the Input folder for the batch. The input folder path is available in the tag <UNCFolderPath> under batch level fields. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<NewFileName> This tag contains the name of the mapped individual TIFF file within the Ephesoft system folder. The Ephesoft system folder path is available in the tag <BatchLocalPath>. The path to the Batch Instance folder is <BatchLocalPath>\<BatchInstanceIdentifier>. The name of the file associated with this page is a combination of the batch instance identifier and the page sequence. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<HocrFileName> The Recostar_HOCR_Generation_Plugin extracts the contents of each page (individual TIFF) using Recostar. The contents are stored in an XML file which is located in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). This tag stores the name of the HOCR xml file for the corresponding page. Page Process Page Process (Recostar_HOCR_Generation_Plugin)
<ThumbnailFileName> The Image_Process_Create_Thumbnails_Plugin plug-in is used to create thumbnail images of the batch images. These thumbnails are displayed in Review and Validate screen, where pages in the documents are shown as thumbnails under the document name. The thumbnails are stored in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). This tag stores the name of the corresponding thumbnail for the page. Page Process Page Process (Image_Process_Create_Thumbnails_Plugin)
<DisplayFileName> The Image_Process_Create_Display_Image_Plugin creates the display PNG files for the images being processed. This plugin takes all the images and creates PNG files for the corresponding pages, which are displayed on the Review and Validate UI screens. The display images are stored in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). This tag stores the name of the corresponding display image for the page. Page Process Page Process (Image_Process_Create_Display_Image_Plugin)
<OCRInputFileName> This tag stores the name of the file that was used by the Recostar_HOCR_Generation_Plugin to extract the contents of the page. The image will be the corresponding individual TIFF for the page available in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). Page Process Page Process (Recostar_HOCR_Generation_Plugin)
<Direction> Indicates the direction of rotation of a rotated document. Folder Import Folder Import (Import_Batch_Folder_Plugin)
<IsRotated> Indicates whether a document is rotated on the RV screen. Folder Import Folder Import (Import_Batch_Folder_Plugin)
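Several of the page fields above hold only a file name, with the file itself located in the batch instance folder (<BatchLocalPath>\<BatchInstanceIdentifier>). The Python sketch below shows one way to resolve those names to full paths. The drive, folder, and file names are hypothetical and chosen only to make the example runnable.

```python
from pathlib import PureWindowsPath

# Page tags whose files are described above as stored in the batch
# instance folder. <OldFileName> is excluded because it refers to the
# input folder (<UNCFolderPath>) instead.
INSTANCE_FOLDER_TAGS = (
    "NewFileName",
    "HocrFileName",
    "ThumbnailFileName",
    "DisplayFileName",
    "OCRInputFileName",
)


def page_file_paths(batch_local_path, batch_instance_id, page_fields):
    """Map each populated page tag to its full path in the instance folder."""
    folder = PureWindowsPath(batch_local_path) / batch_instance_id
    return {
        tag: folder / page_fields[tag]
        for tag in INSTANCE_FOLDER_TAGS
        if page_fields.get(tag)  # skip tags that are missing or empty
    }


# Hypothetical values, for illustration only.
paths = page_file_paths(
    r"D:\SharedFolders\ephesoft-system-folder",
    "BI1",
    {"NewFileName": "BI1_0.tif", "HocrFileName": "BI1_0_HOCR.xml"},
)
for tag, path in paths.items():
    print(tag, path)
```

PureWindowsPath is used so that the backslash-separated form shown in the table is produced regardless of the operating system the script runs on.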
Page Level Fields
<Name> This tag contains the name of the classification used to classify this page. Page Process Page Process (Classification Plugins)
<Value> Each document_type within the batch class is subdivided into pages (FIRST, MIDDLE & LAST). This tag holds the document page that this page was classified as. Page Process Page Process (Classification Plugins)
<Type/> Used only in barcode classification, where it holds information about the barcode type. Page Process Page Process (Classification Plugins)
<Confidence> This tag holds the confidence score with which the page was classified. This confidence is used when assembling documents in the Document Assembly module. Page Process Page Process (Classification Plugins)
<LearnedFileName> This tag holds the name of the lucene-search-classification-sample that this page/image matched. Page Process Page Process (Classification Plugins)
<AlternateValues> Holds the alternate values for a page level field. Generally, it stores alternative classification results with their confidence. During classification, a page can be classified into 10 different types. The type with the highest confidence value is set in the page level field value, and all other possible types for the page are present in the alternate values. This tag will also contain the <LearnedFileName> tag for all the alternate values. Page Process Page Process (Classification Plugins)
PS: A page level field holds information about pages processed by Ephesoft. It generally holds information generated by classification plugins in the Page Process module. Each classification plugin has a page level field tag in the batch XML. Document level field tags contain information about the document level fields configured in the batch class. Behind the scenes, page level fields and document level fields share the same schema, which is why both display the same fields, but not every field serves a purpose in both cases. For example, the OcrConfidenceThreshold, OcrConfidence, and FieldValueChangeScript fields are significant in document level fields only, while LearnedFileName serves its purpose in page level fields only.
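To illustrate how a page level field's <Value> relates to its <AlternateValues>, here is a small Python sketch that ranks all classification candidates by confidence. The <PageLevelField> and <AlternateValue> element names, the nesting, and the sample values are assumptions for this example; only <Name>, <Value>, <Confidence> and <AlternateValues> come from the table above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. Wrapper element names and all values are assumptions.
SAMPLE_FIELD = """
<PageLevelField>
  <Name>Example_Classification</Name>
  <Value>Invoice_First_Page</Value>
  <Confidence>91.2</Confidence>
  <AlternateValues>
    <AlternateValue>
      <Value>Invoice_Middle_Page</Value>
      <Confidence>40.5</Confidence>
    </AlternateValue>
    <AlternateValue>
      <Value>Receipt_First_Page</Value>
      <Confidence>63.0</Confidence>
    </AlternateValue>
  </AlternateValues>
</PageLevelField>
"""


def ranked_candidates(xml_text):
    """Return (value, confidence) pairs, highest confidence first."""
    field = ET.fromstring(xml_text.strip())
    # Start with the primary classification stored on the field itself.
    candidates = [
        (field.findtext("Value"), float(field.findtext("Confidence")))
    ]
    # Add every alternate classification listed under <AlternateValues>.
    for alt in field.findall("AlternateValues/AlternateValue"):
        candidates.append(
            (alt.findtext("Value"), float(alt.findtext("Confidence")))
        )
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


for value, confidence in ranked_candidates(SAMPLE_FIELD):
    print(value, confidence)
```

If the field follows the behaviour described above, the first entry of the ranked list should always match the field's own <Value>.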

J.D. Abbey
