Overview

This document describes Ephesoft’s failover mechanism, which provides high-availability support for services such as the Folder Monitor service across servers and helps recover from server crashes. If one of the servers fails, the failover mechanism initializes another live server in the multi-server environment and starts providing the service (for example, folder monitor) through that server, so users experience minimal or no disruption. At present, the failover mechanism manages two services: folder monitor and application script.

Problem Statement

In a multi-server environment, some services must keep running on at least one server. This requires keeping track of inactive servers and distributing the services among the active servers.

Solution

Assumption

All the servers within a multi-server setup share a single instance of the Ephesoft database, which serves as the means of synchronization between the servers.

[Figure: 3.1_FailoverMechanism_10001]

 

The Ephesoft database is assumed to be running under all circumstances and forms the basis of all communication between the servers.

 

Approach

The Ephesoft database contains two tables that keep track of the servers and of which server is responsible for providing which service.

The two tables are as follows:

  • server_registry: Keeps a record of active and inactive servers.

    Field Name           Data Type      Description
    ID                   BIGINT         Primary key of server_registry
    ip_address           VARCHAR(255)   IP address of the server
    app_context          VARCHAR(255)   Application context, for example /dcma
    port_number          VARCHAR(255)   Port number on which the application is running, for example 8080
    is_active            BIT            1 for an active server, 0 for an inactive server

  • service_status: Keeps track of which server is providing which service.

    Field Name           Data Type      Description
    ID                   BIGINT         Primary key of service_status
    server_registry_id   BIGINT         Foreign key to the server_registry table
    service_type         VARCHAR(255)   Type of service provided by the server
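
As a rough illustration of the two tables, the following sketch creates them in an in-memory SQLite database using Python's standard sqlite3 module. The DDL is an assumption reconstructed from the field lists above, not Ephesoft's actual schema: SQLite has no BIT type, so is_active is modelled as an INTEGER limited to 0 or 1, and the helper names create_failover_db and services_by_server are hypothetical.

```python
import sqlite3

# Hypothetical DDL mirroring the two tables described above (not Ephesoft's
# real schema). is_active is an INTEGER restricted to 0/1 in place of BIT.
SCHEMA = """
CREATE TABLE server_registry (
    id          INTEGER PRIMARY KEY,
    ip_address  VARCHAR(255) NOT NULL,
    app_context VARCHAR(255) NOT NULL,
    port_number VARCHAR(255) NOT NULL,
    is_active   INTEGER NOT NULL CHECK (is_active IN (0, 1))
);
CREATE TABLE service_status (
    id                 INTEGER PRIMARY KEY,
    server_registry_id INTEGER NOT NULL REFERENCES server_registry(id),
    service_type       VARCHAR(255) NOT NULL
);
"""


def create_failover_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two failover tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign key
    conn.executescript(SCHEMA)
    return conn


def services_by_server(conn: sqlite3.Connection):
    """Return (ip_address, service_type) pairs for every assigned service."""
    return conn.execute(
        "SELECT r.ip_address, s.service_type "
        "FROM service_status s "
        "JOIN server_registry r ON r.id = s.server_registry_id "
        "ORDER BY s.id"
    ).fetchall()


conn = create_failover_db()
conn.execute("INSERT INTO server_registry VALUES (1, 'A', '/dcma', '8080', 1)")
conn.execute("INSERT INTO service_status VALUES (1, 1, 'FOLDER_MONITOR')")
print(services_by_server(conn))
```

Joining service_status to server_registry on server_registry_id is how any server can answer the question "which server currently provides this service?" from the shared database alone.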

 

A heartbeat service runs on every server, each at its own time interval, and serves the following purposes:

  1. It pings the servers listed in the server_registry table at regular intervals and records whether each one is reachable by updating the is_active field.
  2. It updates the service_status table to persist which service is being executed on which server.
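
The first purpose of the heartbeat can be sketched as a single pass over server_registry. This is a minimal illustration under stated assumptions, not Ephesoft's implementation: the function names heartbeat_pass and demo_registry are hypothetical, the database is an in-memory SQLite stand-in, and the actual ping is injected as a callable so the example runs without a network.

```python
import sqlite3
from typing import Callable, List


def heartbeat_pass(conn: sqlite3.Connection,
                   ping: Callable[[str, str, str], bool]) -> List[int]:
    """Ping every registered server once and persist the result in is_active.

    Returns the IDs of servers that were active before this pass and did
    not answer the ping, i.e. the failures detected by this pass.
    """
    rows = conn.execute(
        "SELECT id, ip_address, port_number, app_context, is_active "
        "FROM server_registry ORDER BY id"
    ).fetchall()
    newly_down = []
    for server_id, ip, port, context, was_active in rows:
        alive = ping(ip, port, context)  # hypothetical reachability check
        conn.execute(
            "UPDATE server_registry SET is_active = ? WHERE id = ?",
            (int(alive), server_id),
        )
        if was_active and not alive:
            newly_down.append(server_id)
    conn.commit()
    return newly_down


def demo_registry() -> sqlite3.Connection:
    """Build a stand-in registry with three active servers A, B and C."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE server_registry (id INTEGER PRIMARY KEY, "
        "ip_address TEXT, app_context TEXT, port_number TEXT, "
        "is_active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO server_registry VALUES (?, ?, '/dcma', '8080', 1)",
        [(1, "A"), (2, "B"), (3, "C")],
    )
    conn.commit()
    return conn


conn = demo_registry()
# Simulate server A being unreachable while B and C respond.
print(heartbeat_pass(conn, ping=lambda ip, port, context: ip != "A"))
```

Returning only the servers that changed from active to inactive keeps the follow-up step (checking service_status for orphaned services) separate from the status update itself.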

 

[Figure: 3.1_FailoverMechanism_10002]

 

The figure above shows the setup of a multi-server environment.

The following cases ensure high-availability support for the folder monitor service:

When the servers start initially: The heartbeat service on each server checks the service_status table synchronously, and the folder monitor service is registered to a server on a first-come, first-served basis. For example, if there are three servers A, B and C, and server A starts first, then server A provides the folder monitor service. It makes an entry in the service_status table, and all the other servers wait until server A goes down.

See the figure below:

 

[Figure: 3.1_FailoverMechanism_10003]

 

The Ephesoft database will then look like this:

server_registry

ID   IP_ADDRESS   PORT   IS_ACTIVE   APP_CONTEXT
1    A            8080   1           /dcma
2    B            8080   1           /dcma
3    C            8080   1           /dcma

service_status

ID   SERVER_REGISTRY_ID   SERVICE_TYPE
1    1 (for A)            FOLDER_MONITOR
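
The first-come, first-served registration at startup can be sketched as follows. This is an illustration only: the source does not say how the synchronous check is enforced, so the sketch assumes a UNIQUE constraint on service_type (an assumption, not part of the documented schema) and lets the database reject every claim after the first. The names open_demo_db and try_claim_service are hypothetical, and SQLite stands in for the shared Ephesoft database.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create stand-in tables with servers A, B and C and no service owner."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE server_registry (
            id INTEGER PRIMARY KEY, ip_address TEXT, app_context TEXT,
            port_number TEXT, is_active INTEGER);
        CREATE TABLE service_status (
            id INTEGER PRIMARY KEY,
            server_registry_id INTEGER NOT NULL REFERENCES server_registry(id),
            service_type TEXT NOT NULL UNIQUE  -- assumption: one owner per service
        );
        INSERT INTO server_registry VALUES
            (1, 'A', '/dcma', '8080', 1),
            (2, 'B', '/dcma', '8080', 1),
            (3, 'C', '/dcma', '8080', 1);
    """)
    return conn


def try_claim_service(conn: sqlite3.Connection, server_id: int,
                      service_type: str) -> bool:
    """Try to register server_id as the provider of service_type.

    Returns True if this server made the entry, False if another server
    had already claimed the service.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO service_status (server_registry_id, service_type) "
                "VALUES (?, ?)",
                (server_id, service_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_demo_db()
# Server A (ID 1) starts first; B and C start later and must wait.
for server_id in (1, 2, 3):
    print(server_id, try_claim_service(conn, server_id, "FOLDER_MONITOR"))
```

After the loop, service_status holds a single FOLDER_MONITOR row pointing at server A, matching the table above.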

When all the servers are running: The heartbeat service keeps checking the status of the other servers in the multi-server environment. If it detects that one of the servers is down, it queries the service_status table synchronously to check whether that server was providing any service. If such a service is found, the server that detected the failure takes over responsibility for providing it. It also updates the service_status and server_registry tables to synchronize the state and to inform the other servers that it now handles that responsibility. For example, if server A goes down and server C is the first to detect it, server C takes over responsibility for providing the folder monitor service. See the figure below:

[Figure: 3.1_FailoverMechanism_10004]

server_registry

ID   IP_ADDRESS   PORT   IS_ACTIVE   APP_CONTEXT
1    A            8080   0           /dcma
2    B            8080   1           /dcma
3    C            8080   1           /dcma

service_status

ID   SERVER_REGISTRY_ID   SERVICE_TYPE
1    3 (for C)            FOLDER_MONITOR

When the other servers detect that one of the servers has become inactive, they also query the service_status table to check whether the inactive server was handling any service. If they find that the inactive server is no longer responsible for any service, they simply continue monitoring for further failures. If the server that went down starts again, it also waits for the other servers to go down, and the process continues.
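
The takeover described in this case can be sketched as one transaction per detecting server. This is a minimal sketch under stated assumptions rather than Ephesoft's code: the function names open_running_db and take_over_services are hypothetical, SQLite stands in for the shared database, and a production database would need its own locking (for example row locks) to keep two detecting servers from interleaving.

```python
import sqlite3
from typing import List


def open_running_db() -> sqlite3.Connection:
    """Stand-in state: A, B and C active, A providing FOLDER_MONITOR."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE server_registry (
            id INTEGER PRIMARY KEY, ip_address TEXT, app_context TEXT,
            port_number TEXT, is_active INTEGER);
        CREATE TABLE service_status (
            id INTEGER PRIMARY KEY,
            server_registry_id INTEGER NOT NULL REFERENCES server_registry(id),
            service_type TEXT NOT NULL);
        INSERT INTO server_registry VALUES
            (1, 'A', '/dcma', '8080', 1),
            (2, 'B', '/dcma', '8080', 1),
            (3, 'C', '/dcma', '8080', 1);
        INSERT INTO service_status VALUES (1, 1, 'FOLDER_MONITOR');
    """)
    return conn


def take_over_services(conn: sqlite3.Connection, failed_id: int,
                       detector_id: int) -> List[str]:
    """Mark failed_id inactive and move its services to detector_id.

    Returns the service types taken over; an empty list means another
    server got there first or the failed server provided nothing.
    """
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE server_registry SET is_active = 0 WHERE id = ?",
            (failed_id,),
        )
        services = [row[0] for row in conn.execute(
            "SELECT service_type FROM service_status "
            "WHERE server_registry_id = ? ORDER BY id",
            (failed_id,),
        )]
        conn.execute(
            "UPDATE service_status SET server_registry_id = ? "
            "WHERE server_registry_id = ?",
            (detector_id, failed_id),
        )
    return services


conn = open_running_db()
# Server C (ID 3) detects the failure of A (ID 1) first, then B (ID 2) does.
print(take_over_services(conn, failed_id=1, detector_id=3))
print(take_over_services(conn, failed_id=1, detector_id=2))
```

The second call finds no rows still assigned to server A, which corresponds to the other servers continuing to monitor without taking any service.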

The failover mechanism works the same way for all the other services it manages.

Conclusion

By following the approach described above, the Ephesoft application ensures high availability for its different services.


wikiadmin