Aerospike on Amazon EC2
Overview
This page describes recommendations for deploying Aerospike on Amazon EC2. The AWS-related information here is written with the performance requirements of Aerospike in a production environment in mind. For development purposes, the Aerospike Community Database works on any instance that has at least 256MB of RAM dedicated to the Aerospike server. Enterprise versions need at least 1GB of RAM.
Aerospike provides CloudFormation templates that are already configured with recommended settings. For details on how to quickly get a cluster up and running, refer to the CloudFormation page.
Prerequisites
Operating systems
Amazon Linux 2023
Use the latest version of Amazon Linux 2023. We support other operating systems in AWS, but their performance may not be optimal.
Aerospike Database 6.4 and later uses the amzn2023 RPM. Amazon Linux 2023 is not RHEL 7 compatible.
Support for Amazon Linux 2 and RHEL 7 was removed in Database 7.0.
Virtualization type
We recommend using Hardware Virtual Machine (HVM) based AMIs. In our benchmarks, we have seen an approximate 3x performance gain without any other tuning when using HVM instances instead of PV instances.
Network setup
As a prerequisite for using HVM and enhanced networking, we recommend a VPC-based setup. You cannot use HVM AMIs and enhanced networking in Classic EC2 mode.
IP addressing on EC2-VPC platform
On the EC2-VPC platform, we recommend that you use a private IP address for Aerospike on AWS.
Aerospike clients can access a cluster using its AWS private IP addresses, which cannot be reached from the internet.
A public IP address can be reached from the internet and is assigned to default-VPC instances by default. Non-default-VPC instances, however, must have public IP address assignment enabled. A public IP address is disassociated from an instance when the instance is stopped or when an ENI or EIP is added to it.
An elastic IP address is a static public IP address that remains associated with an instance even when the instance is stopped and restarted.
Set up a mesh heartbeat
Use an AWS private IP, rather than the public IP, to set up a mesh heartbeat. Private IPs are allocated to each EC2 instance by default. You may also need to add a public IP to your instance if you need direct access to it. There are two ways to add a public IP to your instance:
- During instance launch: You can explicitly set the option to enable a public IP for your instance while launching the instance. You can also modify the behavior of your VPC subnet to enable it by default.
- Attach an Elastic IP after launching the instance: An Elastic IP address is a static, public IP address designed for dynamic cloud computing. You can associate an Elastic IP address with any instance or network interface in your VPC.
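As a rough illustration of seeding a mesh heartbeat with private IPs, the sketch below prints one `mesh-seed-address-port` line per node, ready to paste into the heartbeat block of the Aerospike configuration. The helper name `mesh_seed_lines`, the sample 10.0.1.x addresses, and the choice of port 3002 are assumptions made for this example; substitute the private IPs and heartbeat port of your own cluster.

```shell
# Hypothetical helper: print one mesh seed line per private IP.
# $1 is the heartbeat port; the remaining arguments are private IPs.
mesh_seed_lines() {
  port=$1
  shift
  for ip in "$@"; do
    printf 'mesh-seed-address-port %s %s\n' "$ip" "$port"
  done
}

# Sample private IPs (assumed); port 3002 is assumed as the heartbeat port.
mesh_seed_lines 3002 10.0.1.10 10.0.1.11 10.0.1.12
```

Generating the lines from a list keeps every node's seed list identical, which is easier to audit than hand-edited configuration files.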
Network interface
Each network interface on an Amazon Linux HVM instance can handle about 250K packets per second. If you need higher performance per instance, do one of the following:
Add more NICs/ENIs
You can add multiple (virtual) NICs to an instance with Elastic Network Interfaces (ENIs). A single NIC peaks at around 250K TPS, bottlenecking on the cores that process interrupts. Additional interfaces can process more packets per second on the same instance. Using ENIs with private IPs is free of cost in AWS.
Receive Packet Steering
Note: RPS is only available in kernel version 2.6.35 and above.
A simpler approach is to distribute IRQs over multiple cores using RPS:
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
With Aerospike, this eliminates the need for multiple NICs/ENIs, which simplifies management, and results in similar TPS. A single NIC with RPS enabled can achieve up to 800K TPS with interrupts spread over 4 cores.
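The value written to `rps_cpus` is a hexadecimal CPU bitmask, so `f` (binary 1111) selects the first four cores. A minimal sketch for deriving the mask for a different core count follows; the helper name `rps_mask` is hypothetical, and the arithmetic is only intended for small core counts (under 32).

```shell
# Hypothetical helper: print the hex bitmask selecting the first N cores.
rps_mask() {
  ncores=$1
  printf '%x\n' $(( (1 << ncores) - 1 ))
}

rps_mask 4    # prints f

# To apply it (requires root; eth0 and rx-0 as in the command above):
# rps_mask 4 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
```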
AWS has introduced Elastic Network Adapters (ENAs), which support a multi-queue device interface and receive-side steering on select instance types. On those instance types, ENAs make both Receive Packet Steering and additional NICs/ENIs redundant. Additional NICs/ENIs can still be beneficial for isolating XDR, heartbeat, and fabric traffic.
Security Group
Aerospike needs TCP ports 3000-3003 for intra-cluster communication. These ports need not be open to the rest of the internet.
If using XDR, port 3000 (or the configured info port of the remote datacenter's Aerospike cluster) on the destination datacenter must be reachable from the source datacenter.
Additionally, you will need a port for SSH access to your instances (default TCP port 22).
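The rules above could be expressed with the AWS CLI roughly as follows. This sketch only prints the commands so they can be reviewed before being run. The helper name `sg_commands`, the security group ID, and the admin CIDR are placeholders invented for this example; restricting ports 3000-3003 to members of the same security group is one possible way to keep them closed to the internet, not the only one.

```shell
# Hypothetical helper: print (do not run) AWS CLI commands for the rules above.
# $1 = security group ID, $2 = CIDR allowed to SSH in.
sg_commands() {
  sg_id=$1
  admin_cidr=$2
  # Intra-cluster ports, reachable only from members of the same group.
  echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port 3000-3003 --source-group $sg_id"
  # SSH access from the admin network only.
  echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port 22 --cidr $admin_cidr"
}

# Placeholder group ID and documentation-range CIDR (both assumed).
sg_commands sg-0123456789abcdef0 203.0.113.0/24
```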
Redundancy using Availability Zone
To add further redundancy to Aerospike in AWS, you can spread one cluster across multiple Availability Zones (AZs) and use the Aerospike rack awareness feature so that each AZ holds one full set of the data.
Initializing EBS and Ephemeral disks
Initializing (formerly known as pre-warming) EBS volumes is only required for volumes that were restored from a snapshot. Blank EBS volumes do not require initialization.
Some ephemeral volumes also need initializing. Consult this chart to see which instances' volumes require initialization.
The following command reads every block to initialize a volume.
sudo dd if=/dev/<deviceID> of=/dev/null bs=1M &
Swapping to device storage
When a raw device is used for storage, it must be either:
- Zeroed to instantiate as an empty device, or
- In a state left by Aerospike
This is because, on startup, Aerospike scans the entire device to discover the state of the data on it. If the device was previously used for another purpose, such as file storage, the leftover data is essentially corrupt to Aerospike and causes undefined behavior when scanned.
Newly provisioned blank EBS volumes and all Ephemeral disks are already zeroed.
The following command will zero every block on a device.
sudo dd if=/dev/zero of=/dev/<deviceID> bs=1M &
i3 and i3en NVMe SSD instances AMI version
Unlike the m5d, r5d, and c5d, the i3 and i3en NVMe devices are not over-provisioned. AWS recommends over-provisioning the devices by 10%. We recommend over-provisioning them by at least 20%, which increases performance stability on write operations.
See SSD-based instance store volume I/O performance for more information about AWS recommendations.
Aerospike does not recommend using the i3en.xlarge and i3en.2xlarge instances, as we have observed frequent disk issues with the devices on those instance types.
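To make the over-provisioning arithmetic concrete, the sketch below computes how much of a device to allocate so that a given percentage stays unused. The helper name `usable_mib` and the 1,800,000 MiB device size are made up for this example, and leaving the tail of the device unpartitioned is assumed here as the over-provisioning method.

```shell
# Hypothetical helper: given a device size in MiB and an over-provisioning
# percentage, print the size in MiB that should actually be partitioned.
usable_mib() {
  total_mib=$1
  op_pct=$2
  echo $(( total_mib * (100 - op_pct) / 100 ))
}

usable_mib 1800000 20    # prints 1440000

# One way to apply 20% over-provisioning (device name is a placeholder):
# sudo parted -s /dev/nvme0n1 mklabel gpt mkpart primary 0% 80%
```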
Shadow device configuration
Some EC2 instance types have direct-attached SSDs called Instance Store Volumes, also known as ephemeral drives or volumes. These can be significantly faster than EBS volumes, because EBS volumes are network attached. However, AWS recommends not relying on instance store volumes for valuable, long-term data, as these volumes are purged when the instance stops.
To take advantage of the fast direct-attached instance store SSDs, Aerospike provides the concept of shadow device configuration where all writes are also propagated to these shadow devices. To configure this, specify an additional device name in the storage engine block of the namespace configuration.
storage-engine device {
    ...
    device /dev/sdb /dev/sdf
    flush-size 256K # Flush size should be appropriate for the (NVMe-compatible) device
    ...
}
In this case, /dev/sdb is the instance store volume to which all reads and writes are directed. The shadow device, /dev/sdf, is the EBS volume to which only the writes are propagated. In this way, we can achieve the high speeds of direct-attached SSDs without compromising on the durability guarantees of EBS volumes. Write throughput is still limited by the EBS volume, so this strategy gives good results when the percentage of writes is low.
For data-in-memory use cases with persistence, both direct-attached SSD devices and EBS volumes are acceptable, but verify that you have enough IOPS. As of Database 7.0, only writes are propagated to persistence devices, the same as for shadow devices. Operations such as defragmentation and tomb-raiding happen in memory and do not read from the persistence device.
When partitioning EBS shadow devices, consider the following:
- Increasing the number of partitions increases the number of write queues and defragmentation threads. Typically, Aerospike recommends 4 partitions for a 900GB drive (r5d/c5d/m5d) in the 12xl and 24xl sizes. For smaller 300GB or 400GB drives, 3 partitions are recommended. For larger 1900GB drives on i3.2xl instances, 8 partitions are recommended.
- More partitions translate into a faster recovery time from shadow devices when the local ephemeral device is empty.
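As an illustration of splitting a device into the recommended number of equal partitions, the sketch below prints start and end boundaries as percentages, in the form accepted by tools such as `parted`. The helper name `partition_bounds` is hypothetical, and equal-sized partitions are an assumption of this example.

```shell
# Hypothetical helper: print "start% end%" for each of N equal partitions.
partition_bounds() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "$(( i * 100 / n ))% $(( (i + 1) * 100 / n ))%"
    i=$(( i + 1 ))
  done
}

partition_bounds 4
# 0% 25%
# 25% 50%
# 50% 75%
# 75% 100%
```

Each output line can be passed as the start and end arguments of a `mkpart` command. Percentages are rounded down to whole numbers, so with 3 partitions the boundaries fall at 33% and 66%.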
EBS snapshot backups
EBS snapshots are an excellent way to create and maintain backups. A snapshot captures the state of an EBS volume at a particular point in time. Deploying an EBS volume from a snapshot essentially restores the data, as it was when the snapshot was taken, into a new volume.
This is beneficial as a backup mechanism because:
- snapshots are taken extremely quickly
- snapshots are block-level consistent
- snapshots are portable
With Aerospike, snapshots guarantee data consistency on a per-volume basis.
Refer to the Backup and Recovery page for details.
Placement groups
A placement group is a logical grouping of instances within a single AWS Availability Zone. It provides the lowest latency and highest bandwidth for systems deployed within the same placement group. However, placement groups are not flexible, and you may run into insufficient capacity errors if you try to scale up your cluster later. More details about placement groups can be found in Amazon's documentation.
Aerospike does not recommend using Placement Groups in production due to these limitations.