Network heartbeat configuration

This page describes how to configure Aerospike's network heartbeat protocols.

Overview

Aerospike's heartbeat protocols are responsible for maintaining cluster integrity. There are two supported heartbeat modes:

  • Multicast (UDP)
  • Mesh (TCP)

Cloud Consideration

  1. Lack of Multicast Support: Cloud providers such as Amazon and Google Compute Engine do not support multicast networking. For these providers, use Mesh heartbeats, which use point-to-point TCP connections.

  2. Network Variability: Network latency on cloud platforms is often inconsistent over time, which can delay heartbeat packet delivery. For these providers, we recommend setting the heartbeat interval to 150 (milliseconds) and the heartbeat timeout to 20 (intervals).

  3. Instance Pauses: The cloud provider may occasionally pause your instance for a short time. For example, Google Compute Engine (GCE) uses live migration, which can briefly pause an instance during maintenance or software updates. These short pauses might cause the other instances in the cluster to consider the paused instance "dead".
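
The effect of the recommended cloud settings can be worked out with simple arithmetic. The sketch below assumes, as the timeout description on this page states, that a node is treated as missing once `timeout` intervals pass without a heartbeat. The function name `detection_window_ms` is hypothetical and not part of any Aerospike tool.

```python
def detection_window_ms(interval_ms: int, timeout_intervals: int) -> int:
    """Approximate time (ms) before a silent node is declared missing.

    Assumption: the window is simply interval * timeout, as described
    on this page; real detection time may vary slightly.
    """
    return interval_ms * timeout_intervals


# Values taken from this page: interval 150 with timeout 10 or 20.
standard_window = detection_window_ms(150, 10)  # 1500 ms
cloud_window = detection_window_ms(150, 20)     # 3000 ms
```

Under this assumption, raising the timeout from 10 to 20 doubles the pause a node can survive before its peers drop it, from roughly 1.5 seconds to roughly 3 seconds.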

Multicast Heartbeat

We recommend using the multicast heartbeat protocol when it is available. Not every network supports multicast, however. See our troubleshooting guide for information on how to validate multicast in your environment.

Configuration Steps

In the heartbeat sub-stanza:

  1. Set mode to multicast.
  2. Set multicast-group to a valid multicast address (239.0.0.0-239.255.255.255).
  3. (Optional) Set address to the IP of the interface intended for intra-cluster communication. This setting also controls which interface fabric uses. It is needed when isolating intra-cluster traffic to a particular network interface.
  4. Set interval and timeout:
    • interval (recommended: 150) controls how often, in milliseconds, a heartbeat packet is sent.
    • timeout (recommended: 10) controls the number of intervals after which the rest of the nodes in the cluster consider a node missing if they have not received a heartbeat from it.
    • With the recommended settings (150 ms × 10 intervals), a node becomes aware of another node leaving the cluster within 1.5 seconds.
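
Before committing a multicast-group value, it can help to confirm that it falls in the range given in step 2. The following is a minimal sketch using Python's standard `ipaddress` module. The helper name `is_valid_multicast_group` is hypothetical, and the check only covers the 239.0.0.0-239.255.255.255 range quoted on this page, not whether your network actually routes multicast.

```python
import ipaddress

# Range quoted in the configuration steps: 239.0.0.0-239.255.255.255
MULTICAST_GROUP_RANGE = ipaddress.ip_network("239.0.0.0/8")


def is_valid_multicast_group(addr: str) -> bool:
    """Return True if addr is an IP address inside 239.0.0.0/8."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a parseable IP address at all.
        return False
    return ip in MULTICAST_GROUP_RANGE


ok = is_valid_multicast_group("239.1.99.2")      # address used in the example
bad = is_valid_multicast_group("192.168.1.100")  # a unicast NIC address
```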

Example

...
heartbeat {
    mode multicast                # Send heartbeats using Multicast
    multicast-group 239.1.99.2    # multicast address
    port 9918                     # multicast port
    address 192.168.1.100         # (Optional) (Default: any) IP of the NIC to
                                  # use to send out heartbeat and bind
                                  # fabric ports
    interval 150                  # Number of milliseconds between heartbeats
    timeout 10                    # Number of heartbeat intervals to wait
                                  # before timing out a node
}
...

Mesh (Unicast) Heartbeat

Mesh uses point-to-point TCP connections for heartbeats. Each node in the cluster maintains a heartbeat connection to every other node, so the number of connections grows quickly with cluster size. For this reason, we recommend using the multicast heartbeat protocol when it is available.
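
To get a feel for how the connection count scales, the sketch below counts peers and node pairs for a cluster of a given size. It assumes one heartbeat connection per pair of nodes, which is a simplification; the exact number of sockets Aerospike opens is not stated on this page. Both function names are hypothetical.

```python
def peers_per_node(nodes: int) -> int:
    """Number of other nodes each node must keep a heartbeat link to."""
    return max(nodes - 1, 0)


def node_pairs(nodes: int) -> int:
    """Number of distinct node pairs in a full mesh: n * (n - 1) / 2."""
    return nodes * peers_per_node(nodes) // 2


# A 4-node cluster, as in the mesh example on this page.
small_cluster_pairs = node_pairs(4)
# A larger, hypothetical 32-node cluster.
large_cluster_pairs = node_pairs(32)
```

Under this assumption, a 4-node cluster has 6 node pairs while a 32-node cluster has 496, which illustrates why mesh becomes connection-heavy as clusters grow.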

Configuration steps

In the heartbeat sub-stanza:

  1. Set mode to mesh.
  2. (Optional) Set address to the IP of the local interface intended for intra-cluster communication. This setting also controls which interface fabric uses. It is needed when isolating intra-cluster traffic to a particular network interface.
  3. Set mesh-seed-address-port to be the IP address and heartbeat port of a node in the cluster.
  4. Set interval and timeout:
    • interval (recommended: 150) controls how often, in milliseconds, a heartbeat packet is sent.
    • timeout (recommended: 10) controls the number of intervals after which the rest of the nodes in the cluster consider a node missing if they have not received a heartbeat from it.
    • With the recommended settings (150 ms × 10 intervals), a node becomes aware of another node leaving the cluster within 1.5 seconds.

Example

...
heartbeat {
    mode mesh                                    # Send heartbeats using Mesh (Unicast) protocol
    address 192.168.1.100                        # (Optional) (Default: any) IP of the NIC on
                                                 # which this node is listening to heartbeat
    port 3002                                    # port on which this node is listening to
                                                 # heartbeat
    mesh-seed-address-port 192.168.1.100 3002    # IP address for seed node in the cluster
                                                 # This IP happens to be the local node
    mesh-seed-address-port 192.168.1.101 3002    # IP address for seed node in the cluster
    mesh-seed-address-port 192.168.1.102 3002    # IP address for seed node in the cluster
    mesh-seed-address-port 192.168.1.103 3002    # IP address for seed node in the cluster

    interval 150                                 # Number of milliseconds between heartbeats
    timeout 10                                   # Number of heartbeat intervals to wait before
                                                 # timing out a node
}
...
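
When the same mesh stanza has to be produced for many nodes, the seed lines can be generated instead of typed by hand. The following is a rough sketch, not an Aerospike utility: the function `render_mesh_heartbeat` and its default values are hypothetical, and it emits only the directives that appear in the example on this page (mode, port, mesh-seed-address-port, interval, timeout). It assumes every seed listens on the same heartbeat port.

```python
def render_mesh_heartbeat(seed_ips, port=3002, interval=150, timeout=10):
    """Build the text of a mesh heartbeat sub-stanza.

    seed_ips: iterable of seed node IP addresses (strings).
    Assumption: all seeds use the same heartbeat port.
    """
    lines = ["heartbeat {", "    mode mesh", f"    port {port}"]
    # One mesh-seed-address-port line per seed node.
    lines += [f"    mesh-seed-address-port {ip} {port}" for ip in seed_ips]
    lines += [f"    interval {interval}", f"    timeout {timeout}", "}"]
    return "\n".join(lines)


stanza = render_mesh_heartbeat(
    ["192.168.1.100", "192.168.1.101", "192.168.1.102", "192.168.1.103"]
)
print(stanza)
```

The output should still be reviewed against your own configuration before use, for example to add an address line when isolating intra-cluster traffic to a particular interface.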
