Reporting Revamp

The Reporting Module has been revamped for performance and long-term reliability, with improvements to its overall backend workflow. The major improvement areas are:

  • Advanced Reporting Clean Up Logic
  • ETL Engine Version Upgrade
  • Transactional Behaviour for MariaDB/MySQL
  • New Purging Mechanism

Changes/Improvements

Advanced Reporting Clean Up Logic

Depending on the types of Batches being run, a large amount of data is processed and stored in the Database while Advanced Reports are calculated. Processed data is stored in its final form in a set of final tables, and the scripts use a separate set of intermediate tables while processing.

These intermediate tables tend to become extremely large and contain data that the scripts will not use again. Hence, these tables are now cleaned up periodically to keep the database as small as possible.

ETL Engine Version Upgrade

We have upgraded the ETL engine from version 5 to version 6. This has significantly improved overall performance and reduced memory/CPU consumption.

Transactional Behaviour for MariaDB/MySQL

Scripts running against MariaDB/MySQL have been improved so that data consistency is maintained even if the server is shut down in the middle of a job execution.

New Feature

Purging Mechanism

A Purging mechanism has been introduced to the Ephesoft Reporting Module. This feature archives data periodically. It was observed that a heavy-usage customer with a large cluster generates a huge amount of Reporting Data. Over a period of several months, this dataset can become large enough to slow down the reporting scripts.

Hence, a mechanism has been developed that will periodically archive the data from the working Database to an archival Database. The frequency of this archival can be controlled by the user.

Also, the user can decide the extent of archival. This means the user can control the amount of data to be archived based on how old it is.

Configurations

The user can control the Purging mechanism using the following parameters:

Location: <Ephesoft_Home>\Application\WEB-INF\classes\META-INF\dcma-reporting

Filename: dcma-reporting.properties

Property Name: “dcma.report.purging.cronExpression” (Default Value = every third month, on the 1st day, at 12am)

This property can be configured to schedule the Purging Job using regular cron expressions.
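To make the default schedule concrete, the following Python sketch lists the upcoming run times for "every third month, on the 1st day, at 12am". It is only an illustration: the function name `next_purge_runs` is hypothetical, and the choice of January, April, July and October as the purge months is an assumption, since this article does not state which months the default covers. If the scheduler accepts Quartz-style cron syntax (also an assumption here), an expression such as `0 0 0 1 1/3 ?` would describe the same schedule.

```python
from datetime import datetime

# Assumption: "every third month" means January, April, July and October.
PURGE_MONTHS = (1, 4, 7, 10)


def next_purge_runs(after, count):
    """Return the next `count` run times strictly after `after`.

    Each run falls on the 1st day of a purge month at 00:00.
    """
    runs = []
    year = after.year
    while len(runs) < count:
        for month in PURGE_MONTHS:
            candidate = datetime(year, month, 1, 0, 0)
            if candidate > after and len(runs) < count:
                runs.append(candidate)
        year += 1
    return runs


if __name__ == "__main__":
    for run in next_purge_runs(datetime(2016, 6, 6, 17, 0), 3):
        print(run.strftime("%d-%m-%Y %H:%M"))
```

Changing the cron expression changes only when the Purging Job fires; how much data each run archives is controlled separately by the purge duration.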

Filename: etl-variables.properties

Property Name: “reporting.purge_duration” (Default Value = 90 days)

Suppose the value of this property is 90. At every scheduled purge cycle, data older than 90 days is archived, and more recent data (90 days old or less) is retained.

If a user wants to archive the complete data every time, they can set the value of the property to 0.

The “reporting.purge_duration” property specifies an age threshold in days: all data older than this threshold is purged.

E.g. if purge_duration = 5 days, batches whose creation_date is more than 5 days before the current date will be purged.

Take the current date as 06-06-2016 5:00pm (dates are in DD-MM-YYYY format).

Batch Instance with creation date = 30-04-2016 1:00pm will be Purged

Batch Instance with creation date = 31-05-2016 1:00pm will be Purged (number of days = 6, greater than 5)

Batch Instance with creation date = 31-05-2016 8:00pm will NOT be Purged (number of days = 5, taking the difference of 06-06-2016 5:00pm and 31-05-2016 8:00pm)

Batch Instance with creation date = 01-06-2016 1:00pm will NOT be Purged (number of days = 5, not greater than 5)

Batch Instance with creation date = 01-06-2016 5:30pm will NOT be Purged

Batch Instance with creation date = 03-06-2016 1:00pm will NOT be Purged
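The comparison behind this example can be sketched in a few lines of Python. This is an interpretation inferred from the example, not Ephesoft's actual implementation: it assumes the age is counted in whole elapsed days (partial days are dropped) and that a batch is purged only when that count is strictly greater than purge_duration. The function name `is_purged` is hypothetical.

```python
from datetime import datetime


def is_purged(created, now, purge_duration_days):
    """Return True when the whole days elapsed exceed the purge duration.

    Assumption: partial days are dropped, so 5 days 21 hours counts as 5.
    """
    elapsed_days = (now - created).days
    return elapsed_days > purge_duration_days


if __name__ == "__main__":
    now = datetime(2016, 6, 6, 17, 0)      # 06-06-2016 5:00pm
    samples = [
        datetime(2016, 5, 31, 13, 0),      # 6 whole days elapsed
        datetime(2016, 5, 31, 20, 0),      # 5 whole days (5 days 21 hours)
        datetime(2016, 6, 1, 13, 0),       # 5 whole days (5 days 4 hours)
        datetime(2016, 6, 3, 13, 0),       # 3 whole days
    ]
    for created in samples:
        verdict = "Purged" if is_purged(created, now, 5) else "NOT Purged"
        print(created.strftime("%d-%m-%Y %H:%M"), verdict)
```

Under these assumptions, only the batch created at 1:00pm on 31-05-2016 crosses the five-day threshold; the others are retained.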

After purging, the Dashboard data is recalculated for the remaining data.

Note: Batches that are purged while in a non-FINISHED state will be added back into the original report DB. This is because the status of these batches may change later on, and the reporting jobs need to process these changes. E.g. for a batch in the READY_FOR_REVIEW state, purging will copy the data into the archive DB. But in the next run of the Dashboard job, this batch will be repopulated into the report DB. This ensures that once the state of the batch changes to RUNNING, correct data is displayed on the reporting UI.

Steps to Use

  • Configure reporting.purge_duration in etl-variables.properties according to requirement
  • Configure dcma.report.purging.cronExpression in dcma-reporting.properties according to requirement
  • After every Purge Cycle, data will be moved to the archive Database.
  • All remaining reports will be recalculated on the basis of the data remaining in the original Reports Database.
