Page tree
Skip to end of metadata
Go to start of metadata


Cluster service allows establishing communication of several applications, which are placed on different boxes. Each such application, which uses cluster service, are cluster node. Node can be run in leader or backup mode. Leader mode means that the node is active, accepts network connection and performs some useful work. Backup nodes are used for preparing fast replacing if the leader node becomes unavailable. In such case, backup node could be switched to leader mode and it could provide the same service.

Cluster service is distributed service and it allows to automatically find its nodes within network and control presence no more than one leader node.

ClusterManager interface manages interaction between application and cluster. In addition, it allows subscription for current node state and for cluster state.

With ClusterManager it’s possible to subscribe for 2 groups of events: * LocalNodeLeaderListener – notification about leader events * LocalNodeBackupListener – notification about backup event

An application can use both listeners at the same time or the only one that most fit the current goal. For example, if an active node should have an active server, you have to implement only LocalNodeLeaderListener and put start and stop server’s calls into its methods.

To make it easier, we also provide LocalNodeLeaderListenerAdaptor and LocalNodeBackupListenerAdaptor. With their help you should only override necessary methods.

The default implementation of the cluster manager is based on Hazelcast. We use Hazelcast as a distributed cache with a cluster state. Hezelcast also helps to resolve nodes within a network.

Hazelcast supports several different transports including multicast and TCP. The default configuration uses multicast so you must have multicast enabled on your network for this to work or update cluster.xml configuration file.

Cluster implementation also can configure the quorum for automatic leader election.

Current implementation allows you to automatically select the leader at the cluster init only (when there was no leader at all). If, for some reason, the leader disappears from the cluster - new one will need to be select in the manual mode.

Quick setup

Cluster manager has already built-in default settings. This allows you to start the Cluster service with the minimum effort.

public class ClusterSample {
    public static void main(String[] args) {
        ClusterManager clusterManager = new HazelcastClusterManager();                                       //(1)
        clusterManager.addLocalNodeLeaderListener(new LocalNodeLeaderListenerAdaptor() {

            public void onGranted() {                                                                        //(2)
                System.out.println("This node was elected as a leader");

            public void onRevoked() {                                                                        //(3)
                System.out.println("This node isn't leader any more");

        clusterManager.join();                                                                               //(4)
        clusterManager.electLeader(clusterManager.localNode().id());                                         //(5)

        //pause 1 sec
        try {
        } catch (InterruptedException e) {

        //print nodes
        clusterManager.nodes().stream().forEach(System.out::println);                                        //(6)   
        clusterManager.leave();                                                                              //(7)
  1. Create an instance of ClusterManager with default configuration.
  2. This method will be called when node become a leader.
  3. This method will be called when leader services should be stopped.
  4. Join note to the cluster.
  5. Assign this node as a cluster leader
  6. Pring all node in this cluster
  7. leave the cluster

In this case, the cluster will use UDP multicast address for communication. The size of the quorum for the default cluster is equal to 2 (the leader will be selected automatically if there are 2 or more nodes in the cluster).


Hazelcast configuration

The current implementation uses Hazelcast for communication. You can find detailed Hazelcast configuration description on its site: Hazelcast Configuration.

If you want to override default configuration you can: * provide a file called cluster.xml with Hazelcast configuration on your classpath; * build com.hazelcast.config.Config manually and pass it into HazelcastClusterManager:

Config hazelcastConfig = new Config();
// Now set some stuff on the config (omitted)
ClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);

You can specify a name for the current node with instanceName option. Otherwise, a unique name will be assigned automatically.

If the quorum size is not specified in the configuration, then the first launched node will be selected as a leader node.

Cluster service options

In addition to Hazelcast settings in HazelcastClusterManager can be set:

Also, additional properties can be placed in Advanced Configuration Properties. You can find a detailed description in Hazelcast documentation.

  • com.epam.fej.cluster.reelectOnLeaderFailure - the property allows the running of the leader election process if the previous leader has gone. By default this property is not specified, that means it has the default true value and new leader will be elected.

Before the leader has been re-elected, the cluster must contain greater or equal members count, than specified at quorum size property

Cluster troubleshooting

If the default multicast configuration is not working here are some common causes:

Multicast not enabled on the machine

It is quite common in particular on OSX machines for multicast to be disabled by default. Please google for the information on how to enable this.

Using wrong network interface

If you have more than one network interface on your machine (and this can also be the case if you are running VPN software on your machine), then Hazelcast may be using the wrong one.

To tell Hazelcast to use a specific interface you can provide the IP address of the interface in the interfaces element of the configuration. Make sure you set the enabled attribute to true. For example:

<interfaces enabled="true">

When multicast is not available

In some cases, you may not be able to use multicast as it might not be available in your environment. In that case, you should configure another transport, e.g. TCP to use TCP sockets, or AWS when running on Amazon EC2.

For more information on available Hazelcast transports and how to configure them please consult the Hazelcast documentation.

Cluster service lifecycle

Service uses exchanging of events between nodes to manage the cluster and notify nodes about state changes.

Joining a new node to the cluster

After calling the join() method, node subscribes to cluster events. The rest of the cluster nodes also will receive notification about new node in the cluster. On the node, which is a leader at this moment, LocalNodeLeaderListener.backupAdded() method will be called.

If a leader is present in the cluster, then the current node will be started in backup mode. If the leader is absent, then it may be initiated by the leader election procedure.

Automatic election of new leader

The procedure of automatic leader selection is started at the cluster init.

Each node at start receives information about available nodes in the cluster and decides whether or not currently be elected a leader. This decision is based on the presence or absence of a quorum (see Hazelcast configuration). It is considered that the cluster has a quorum (and the leader can be chosen) if

 S/2+1 >= Q

where S – the number of nodes in the cluster at this moment and Q – the configured quorum size.

If the quorum size is not specified, it is considered that even one node is already compose a quorum.

Current implementation automatically chooses the leader in cluster, only in the case when the cluster did not have leader before.

The first time leader selection algorithm is:

  1. If the new node is decided that it is necessary to choose a leader, it sends LEADER_EVENT event to the cluster, and offers a new leader.

  2. Once the proposed new leader gets LEADER_EVENT event with its ID, it notifies the application about its new status by calling LocalNodeLeaderListener.onGranted() and sends LEADER_STARTED_EVENT event with their ID back to the cluster.

  3. When the rest of the cluster nodes receive the event LEADER_STARTED_EVENT, they set their state to BACKUP (mark itself as backup), and notify their applications about new status by calling LocalNodeBackupListener.onBackup().

You can check the presence of the leader in the cluster by using HazelcastClusterManager.hasLeader() method.

Appointment of the new leader

New leader can be assigned by calling HazelcastClusterManager.electLeader() method and passing the node ID. New leader can be assigned if cluster has a leader or if doesn’t have. In the first case, the previous leader will be deactivated.
The algorithm of assigning new leader is:

  1. Node, which calls HazelcastClusterManager.electLeader() method, sends to the cluster LEADER_EVENT event with new leader ID.

  2. When backup node receives LEADER EVENT event, it calls LocalNodeBackupListener.offBackup() method.

  3. A node, which was elected as a new leader, launches a timer and waits for finishing the old leader. This mechanism was introduced to minimize the possibility of simultaneous work of the two leaders in the cluster. Old leader needs some time to complete its processes. The timer is used to protect cluster from endless waiting in case if the old leader becomes inaccessible or crashes during switching the leader.

  4. Old leader receives the LEADER_EVENT event and notifies its application about the status change by calling LocalNodeLeaderListener.onRevoked(). After the successful completion of this call, it sends a LEADER_STOPPED_EVENT event to the cluster.

  5. When the new leader node receives an LEADER_STOPPED_EVENT event (or it doesn’t receive this event during timeoutLeaderShutdown period), it notifies the application about new status by calling LocalNodeLeaderListener.onGranted() and sends to the cluster LEADER_STARTED_EVENT event with its ID.

  6. When the rest of the cluster nodes receive LEADER_STARTED_EVENT event they change their state to BACKUP (mark theirself as backup) and notifies their applications about new status by calling LocalNodeBackupListener.onBackup().

Recall the leader

There is a way to recall the leader. After recalling all nodes will be in the backup state.
The algorithm of recalling the leader is:

  1. Node, which calls HazelcastClusterManager.recallLeader() method, sends to the cluster LEADER_RECALL_EVENT event with the leader ID.

  2. Current leader receives the LEADER_RECALL_EVENT event and notifies its application about the status change by calling LocalNodeLeaderListener.onRevoked(). After the successful completion of this call, it sends a LEADER_STOPPED_EVENT event to the cluster.

Automatic leader re-election

Since version 1.4.0 by default the new leader can be automatically elected when the existing leader is gone. Configuration options are described here.

  • No labels