This document describes Ephesoft’s failover mechanism, which provides high-availability support for services such as the Folder Monitor service across servers and helps recover from server crashes. If one of the servers fails, the failover mechanism initializes the affected service (e.g. folder monitor) on another live server in the multi-server environment and provides the service through that server. The user therefore experiences minimal or no disruption in services. At present the failover mechanism manages two services: folder monitor and application script.
Some services must keep running on at least one server in a multi-server environment. This requires keeping track of inactive servers and distributing the services among the active servers.
All servers within a multi-server setup share a single instance of the Ephesoft database, which is the means of synchronization between the servers.
The Ephesoft database is expected to be running in all circumstances and forms the basis of all communication between the servers.
The Ephesoft database contains two tables that keep track of the registered servers and of which server is responsible for providing which service.
The two tables are as follows:-
- server_registry: To keep record of active and inactive servers.
|Field Name||Data Type||Description|
|ID||BIGINT||ID is the primary key of server_registry|
|ip_address||VARCHAR(255)||ip_address is the IP address of the server|
|app_context||VARCHAR(255)||app_context is the application context, e.g. /dcma|
|port_number||VARCHAR(255)||port_number is the port number on which the application is running, e.g. 8080|
|is_active||BIT||is_active is 1 for an active server and 0 for an inactive server|
- service_status: To keep track of which server is providing which service.
|Field Name||Data Type||Description|
|ID||BIGINT||ID is the primary key of service_status|
|service_registry_id||BIGINT||service_registry_id is the foreign key to the server_registry table|
|service_type||VARCHAR(255)||service_type is the type of service provided by the server|
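The two tables above can be sketched as a small runnable example. This is only an illustration under stated assumptions: it uses an in-memory SQLite database so that it runs anywhere, whereas the actual database engine and DDL used by Ephesoft are not given in this document. The helper names `create_failover_tables` and `register_server`, the `NOT NULL` constraints and the sample IP address are hypothetical.

```python
import sqlite3

# Hypothetical DDL: column names follow the tables above; SQLite types stand
# in for BIGINT (INTEGER) and BIT (INTEGER 0/1). Constraints are assumptions.
SCHEMA = """
CREATE TABLE server_registry (
    id          INTEGER PRIMARY KEY,
    ip_address  VARCHAR(255),
    app_context VARCHAR(255),
    port_number VARCHAR(255),
    is_active   INTEGER NOT NULL DEFAULT 1  -- 1 = active, 0 = inactive
);
CREATE TABLE service_status (
    id                  INTEGER PRIMARY KEY,
    service_registry_id INTEGER NOT NULL REFERENCES server_registry(id),
    service_type        VARCHAR(255) NOT NULL
);
"""


def create_failover_tables(conn):
    """Create both bookkeeping tables in the shared database."""
    conn.executescript(SCHEMA)


def register_server(conn, ip_address, app_context, port_number):
    """Add a server to server_registry as active and return its id."""
    cur = conn.execute(
        "INSERT INTO server_registry "
        "(ip_address, app_context, port_number, is_active) VALUES (?, ?, ?, 1)",
        (ip_address, app_context, port_number),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
create_failover_tables(conn)
server_a = register_server(conn, "10.0.0.1", "/dcma", "8080")
```

Note that port_number is stored as text, matching the VARCHAR(255) type listed above.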
A heartbeat service runs on every server, at different time intervals, and serves the following purposes:-
- It pings the servers listed in the server_registry table on a regular basis and records their status in the is_active field of that table.
- It updates the service_status table to persist which service is being executed on which server.
The above figure shows the setup of a multi-server environment.
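One pass of the heartbeat service described above could look roughly like the following sketch. It is not Ephesoft's implementation: the document does not say how servers are pinged, so the TCP connection check in `tcp_probe` is an assumption, and the names `heartbeat_pass`, `tcp_probe` and `fake_probe` as well as the sample addresses are hypothetical. An in-memory SQLite table stands in for the shared database.

```python
import socket
import sqlite3


def tcp_probe(ip_address, port_number, timeout=2.0):
    """Assumed liveness check: can a TCP connection to the server be opened?"""
    try:
        with socket.create_connection((ip_address, int(port_number)), timeout=timeout):
            return True
    except (OSError, ValueError):
        return False


def heartbeat_pass(conn, self_id, probe=tcp_probe):
    """Probe every other registered server once and persist is_active.

    Returns the ids of the servers that this pass found to be down.
    """
    rows = conn.execute(
        "SELECT id, ip_address, port_number FROM server_registry "
        "WHERE id != ? ORDER BY id",
        (self_id,),
    ).fetchall()
    down = []
    for server_id, ip_address, port_number in rows:
        alive = probe(ip_address, port_number)
        conn.execute(
            "UPDATE server_registry SET is_active = ? WHERE id = ?",
            (1 if alive else 0, server_id),
        )
        if not alive:
            down.append(server_id)
    conn.commit()
    return down


# Demo: three registered servers; a fake probe pretends server 1 is unreachable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE server_registry (id INTEGER PRIMARY KEY, ip_address TEXT, "
    "app_context TEXT, port_number TEXT, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO server_registry VALUES (?, ?, '/dcma', '8080', 1)",
    [(1, "10.0.0.1"), (2, "10.0.0.2"), (3, "10.0.0.3")],
)


def fake_probe(ip_address, port_number):
    return ip_address != "10.0.0.1"


down = heartbeat_pass(conn, self_id=3, probe=fake_probe)
```

In a real deployment such a pass would be scheduled repeatedly on each server; the probe is passed in as a parameter so that the check can be swapped out or tested without a network.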
The following cases ensure high-availability support for the folder monitor service:-
When the servers start initially: The heartbeat service on each server checks the service_status table synchronously, and the folder monitor service is registered to a server on a first-come, first-served basis. For example, if there are three servers A, B and C and server A starts first, then server A provides the folder monitor service. It makes an entry in the service_status table, and all other servers wait until server A goes down.
Please refer below:-
The service_status table in the Ephesoft database will look like this:-
|ID||service_registry_id||service_type|
|1||1 (for A)||FOLDER_MONITOR|
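The first-come, first-served registration can be illustrated with a short sketch. It assumes a `UNIQUE` constraint on service_type so that the database itself rejects a second claim; this constraint is not part of the table description above and is only one possible way to make the check race-free. The function name `try_claim_service` is hypothetical, and SQLite stands in for the shared database.

```python
import sqlite3


def try_claim_service(conn, server_id, service_type):
    """Return True if this server won the claim, False if another server holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO service_status (service_registry_id, service_type) "
                "VALUES (?, ?)",
                (server_id, service_type),
            )
        return True
    except sqlite3.IntegrityError:
        # Another server registered this service first.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_status ("
    " id INTEGER PRIMARY KEY,"
    " service_registry_id INTEGER NOT NULL,"
    " service_type VARCHAR(255) NOT NULL UNIQUE)"  # UNIQUE is an assumption
)

claimed_by_a = try_claim_service(conn, 1, "FOLDER_MONITOR")  # server A starts first
claimed_by_b = try_claim_service(conn, 2, "FOLDER_MONITOR")  # server B starts later
rows = conn.execute("SELECT * FROM service_status").fetchall()
```

Here server A's insert succeeds and server B's is rejected, which leaves the single row shown in the table above.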
When all the servers are running: The heartbeat service keeps checking the status of the other servers in the multi-server environment. If it detects that one of the servers is down, it queries the service_status table synchronously to check whether that server was providing any service. If such a service is found, the server that detected the failure takes over responsibility for providing the service that the inactive server was providing. It also updates the service_status and server_registry tables to synchronize the state and to inform the other servers that it now handles this responsibility. For example, if server A goes down and server C is the first to detect it, then server C takes over responsibility for providing the (folder monitor) service. Please refer below:-
|ID||service_registry_id||service_type|
|1||3 (for C)||FOLDER_MONITOR|
When the other servers detect that one of the servers has become inactive, they also query the service_status table to check whether the inactive server was handling any service. If they find that it was not responsible for any service, they continue to monitor for further failures. If the server that went down starts again, it in turn waits for other servers to go down, and the process continues.
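The takeover step described above can be sketched as follows. This is a hedged illustration, not Ephesoft's code: the function name `take_over_services` is hypothetical, SQLite stands in for the shared database, and the guarded `UPDATE` (which only succeeds while the row still points at the failed server) is an assumed way of making sure that only the first detecting server takes over.

```python
import sqlite3


def take_over_services(conn, failed_id, self_id):
    """Reassign the failed server's services to this server.

    Returns the service types this server actually took over.
    """
    taken = []
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE server_registry SET is_active = 0 WHERE id = ?", (failed_id,)
        )
        orphaned = conn.execute(
            "SELECT id, service_type FROM service_status "
            "WHERE service_registry_id = ? ORDER BY id",
            (failed_id,),
        ).fetchall()
        for row_id, service_type in orphaned:
            # Guarded update: matches only if no other server took over first.
            cur = conn.execute(
                "UPDATE service_status SET service_registry_id = ? "
                "WHERE id = ? AND service_registry_id = ?",
                (self_id, row_id, failed_id),
            )
            if cur.rowcount == 1:
                taken.append(service_type)
    return taken


# Demo: servers A (1), B (2), C (3); A currently provides the folder monitor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE server_registry (id INTEGER PRIMARY KEY, is_active INTEGER)")
conn.execute(
    "CREATE TABLE service_status (id INTEGER PRIMARY KEY, "
    "service_registry_id INTEGER, service_type TEXT)"
)
conn.executemany("INSERT INTO server_registry VALUES (?, 1)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO service_status VALUES (1, 1, 'FOLDER_MONITOR')")

taken_by_c = take_over_services(conn, failed_id=1, self_id=3)  # C detects first
taken_by_b = take_over_services(conn, failed_id=1, self_id=2)  # B detects later
```

Server C takes over the folder monitor service, while server B finds nothing left to take over and simply keeps monitoring, as described above.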
The failover mechanism works the same way for all the other services it manages.
By following the approach described above, the Ephesoft application ensures high availability for its different services.