Best practices for Aerospike and Linux
This page describes stability and performance best practices for Aerospike and the Linux operating system.
Overview
When Aerospike Database starts, it verifies certain best practices and logs a warning for each violation it finds.
Best practices checks
- For production environments, set enforce-best-practices to true so that the server shuts down if any best practices are violated during startup.
- When enforce-best-practices is set to false, you can still monitor violations with the failed_best_practices Boolean statistic, or the best-practices info command.
- The failed_best_practices statistic reports true if any best practices are violated during startup. The best-practices info command returns the list of best practices that failed.
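As a rough illustration of monitoring a node that runs with enforce-best-practices set to false, the sketch below extracts one statistic from a semicolon-separated list of name=value pairs, which is the shape the statistics output is assumed to have here. The helper name get_stat is hypothetical, and the asinfo invocations in the comments are assumptions to adapt to your installation.

```shell
# get_stat NAME: read a semicolon-separated list of name=value pairs on stdin
# and print the value of NAME (prints nothing if NAME is absent).
get_stat() {
    tr ';' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Assumed usage against a running node (not executed here):
#   asinfo -v 'statistics' | get_stat failed_best_practices
#   asinfo -v 'best-practices'
```

A value of true from the first command would be the cue to run the second one and review the list it returns.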
Best practices checked at startup
Aerospike Database best practices
The best practices listed in this section are specific to the Aerospike Database.
service-threads
The recommended value depends on the configuration of the namespaces in the aerospike.conf
file.
The service-threads
best practice is checked at server startup.
- If any namespace has storage-engine set to device and data-in-memory set to false, or data-in-memory set to false and commit-to-device set to true, the recommended value for service-threads is at least 3 per CPU/vCPU. For this configuration we suggest, and default to, 5 per CPU/vCPU.
- Otherwise, when storage-engine is set to pmem or memory, or storage-engine is device with data-in-memory set to true and commit-to-device set to false, the recommended value for service-threads is at least 1 per CPU/vCPU, which is also the default for such configurations.
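To make the per-CPU figures concrete, here is a minimal sketch that multiplies the CPU count reported by nproc by the multipliers given above (3 and 5 for the device-backed case, 1 otherwise). The helper name min_service_threads is hypothetical, and the sketch assumes that the CPUs visible to nproc are the ones the server will use.

```shell
# min_service_threads CPUS PER_CPU: print CPUS * PER_CPU.
min_service_threads() {
    echo $(( $1 * $2 ))
}

cpus=$(nproc)   # number of CPUs/vCPUs visible to this process
echo "device-backed namespaces: at least $(min_service_threads "$cpus" 3), default $(min_service_threads "$cpus" 5)"
echo "memory or pmem namespaces: at least $(min_service_threads "$cpus" 1)"
```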
indexes-memory-budget
We recommend that the cumulative sum of the memory-size
configuration not exceed the total memory on the machine.
The indexes-memory-budget
best practice is checked at server startup.
Namespace device size
All storage devices in a namespace should be the same size, within a tolerance of 8 MiB.
The namespace device size best practice is checked at server startup.
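One way to read the 8 MiB tolerance is that the largest and smallest device may differ by at most 8 MiB. The sketch below checks a list of sizes in bytes under that assumption. The helper name sizes_within_tolerance and the device names in the comment are hypothetical.

```shell
# sizes_within_tolerance SIZE...: succeed if the largest and smallest sizes
# (in bytes) differ by no more than 8 MiB.
sizes_within_tolerance() {
    tolerance=$(( 8 * 1024 * 1024 ))
    min=$1
    max=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then min=$size; fi
        if [ "$size" -gt "$max" ]; then max=$size; fi
    done
    [ $(( max - min )) -le "$tolerance" ]
}

# Assumed usage as root, with example device names (not executed here):
#   sizes_within_tolerance "$(blockdev --getsize64 /dev/nvme0n1p1)" \
#                          "$(blockdev --getsize64 /dev/nvme0n1p2)" \
#       || echo "device sizes differ by more than 8 MiB"
```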
Initialize new cluster nodes with SMD
When adding a new node to an existing cluster, it is a best practice to initialize it by copying the cluster’s shared metadata (SMD) files from an active cluster node.
See Directory structure - Run time directories for more information about the SMD directory.
Linux best practices
The best practices listed in this section are specific to the Linux operating system.
All-Flash deployment
In an All-Flash deployment, the following kernel parameters are required.
enforce-best-practices verifies that these kernel parameters are set to at least the expected values.
The All-Flash deployment best practice is checked at server startup.
/proc/sys/vm/dirty_bytes = 16777216
/proc/sys/vm/dirty_background_bytes = 1
/proc/sys/vm/dirty_expire_centisecs = 1
/proc/sys/vm/dirty_writeback_centisecs = 10
- When running as non-root, you must set these values before running the Aerospike server.
- When running as root, the server configures them automatically.
Either way, the node will not start if these parameters can’t be correctly set manually or automatically by the server.
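When running as non-root, it can help to compare the current values with the expected ones before starting the server. The following sketch prints both for each of the four parameters listed above. The function name show_all_flash_params is hypothetical, and its optional directory argument exists only so that the check can be pointed somewhere other than /proc/sys/vm.

```shell
# show_all_flash_params [DIR]: print the current and expected value of each
# All-Flash kernel parameter. DIR defaults to /proc/sys/vm.
show_all_flash_params() {
    vm_dir="${1:-/proc/sys/vm}"
    for pair in dirty_bytes=16777216 dirty_background_bytes=1 \
                dirty_expire_centisecs=1 dirty_writeback_centisecs=10; do
        name="${pair%%=*}"       # text before the first "="
        expected="${pair#*=}"    # text after the first "="
        current="$(cat "$vm_dir/$name")"
        echo "$name current=$current expected=$expected"
    done
}

if [ -r /proc/sys/vm/dirty_bytes ]; then
    show_all_flash_params
fi
```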
RAM reserved for Linux operating system resources
To help prevent out-of-memory issues with host hardware, keep 10-15% of total physical memory reserved for Linux system resources. The RAM reserved for Linux operating system resources best practice is checked at server startup.
The following may influence memory usage:
- Overhead from the Linux OS and services.
- Overhead caused by memory fragmentation.
- Overhead from primary and secondary Aerospike indexes.
- Namespace data for in-memory namespaces. For more information, see Capacity planning.
- Overhead from cache and queue-related configurations, including max-write-cache (per device) and post-write-cache (per device). See Block size and cache size for more information.
- Overhead from the Aerospike process.
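To turn the 10-15% guideline into numbers for a given host, a minimal sketch such as the following computes both ends of the range from MemTotal in /proc/meminfo, which is reported in kB. The helper name reserve_range_kb is hypothetical, and integer division rounds the results down.

```shell
# reserve_range_kb TOTAL_KB: print the 10% and 15% reserve figures in kB.
reserve_range_kb() {
    echo "$(( $1 * 10 / 100 )) $(( $1 * 15 / 100 ))"
}

if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "reserve between (kB): $(reserve_range_kb "$total_kb")"
fi
```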
min_free_kbytes
The min_free_kbytes kernel parameter controls how much memory to keep free from filesystem caches.
Normally, the kernel occupies almost all free RAM with
filesystem caches and frees up memory for allocation by processes as required. As
Aerospike performs large allocations in shared memory (1GB chunks), the default
kernel value may result in an unexpected OOM (out-of-memory kill).
The min_free_kbytes
best practice is checked at server startup.
We recommend that you configure the parameter to a minimum of 1.1GB, preferably 1.25GB if using cloud vendor drivers as these can make large allocations. This ensures that Linux always keeps enough memory available and free for large allocations.
- Check the parameter value.
cat /proc/sys/vm/min_free_kbytes
- If the value is lower, adjust it in the running kernel and persist the change across reboots.
echo 3 > /proc/sys/vm/drop_caches
echo 1310720 > /proc/sys/vm/min_free_kbytes
echo "vm.min_free_kbytes=1310720" >> /etc/sysctl.conf
Swappiness
For low-latency operations, using swap to any extent drastically slows down
performance. We recommend that you disable swap with swapoff -a
and remove the
swap partition from /etc/fstab
.
If that’s not possible for operational reasons, set the swappiness to 0:
echo 0 > /proc/sys/vm/swappiness
echo "vm.swappiness=0" >> /etc/sysctl.conf
The swappiness best practice is checked at server startup.
THP - Transparent Huge Pages
The best practices startup check permits thp-enabled
and thp-defrag
to be set to
either madvise
or never
.
Aerospike recommends disabling Transparent Huge Pages (THP) before the Aerospike service starts. While the Linux kernel uses THP to improve overall system responsiveness and allocation speed, it can be counterproductive for high-throughput and low-latency databases, which perform multiple small allocations. THP can cause the system to run out of RAM, with similar symptoms to a memory leak. Another issue is latency caused by THP defragmentation page locking.
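To see which THP mode is active, the sketch below reads the kernel's sysfs files and prints the bracketed value, for example never from a line such as "always madvise [never]". It assumes that thp-enabled and thp-defrag correspond to the enabled and defrag files under /sys/kernel/mm/transparent_hugepage. The helper name thp_active is hypothetical.

```shell
# thp_active: read a THP setting line such as "always madvise [never]" on
# stdin and print the bracketed (active) value.
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

for f in enabled defrag; do
    path="/sys/kernel/mm/transparent_hugepage/$f"
    if [ -r "$path" ]; then
        echo "$f: $(thp_active < "$path")"
    fi
done

# To disable THP until the next reboot (as root, not executed here):
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
#   echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

The echo commands do not survive a reboot, so a boot-time mechanism of your choice is still needed to apply them before the Aerospike service starts.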
Zone reclaim mode
For NUMA architectures, zone_reclaim_mode causes aggressive reclaims and memory scans when enabled.
We recommend that you disable zone_reclaim_mode by setting /proc/sys/vm/zone_reclaim_mode to 0.
The zone_reclaim_mode best practice is checked at server startup.
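Following the pattern used for the other kernel parameters on this page, the setting can be applied to the running kernel and recorded in /etc/sysctl.conf. The sketch below adds a small helper that appends a key=value line only if it is not already present, so that repeated runs do not duplicate entries. The helper name persist_sysctl is hypothetical.

```shell
# persist_sysctl KEY VALUE [FILE]: append KEY=VALUE to FILE unless that exact
# line is already present. FILE defaults to /etc/sysctl.conf.
persist_sysctl() {
    file="${3:-/etc/sysctl.conf}"
    line="$1=$2"
    if ! grep -qxF "$line" "$file" 2>/dev/null; then
        echo "$line" >> "$file"
    fi
}

# As root (not executed here):
#   echo 0 > /proc/sys/vm/zone_reclaim_mode
#   persist_sysctl vm.zone_reclaim_mode 0
```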
NVMe partitioning
NVMe devices are normally capable of 4 simultaneous I/O operations. Due to their connection design, these occupy 4 PCIe I/O lanes. On raw devices, Aerospike suggests that you partition each NVMe device in use into at least 4 partitions. This allows 4 write threads to operate in Aerospike and greatly improves the disk speed.
If you use a single partition with Aerospike as a raw device, iostat may show 100% disk utilization (%util) while the await operation queuing statistic shows no queueing (await < 1 means no queueing is happening). This indicates that the disk itself can do more, while the PCIe lanes in use are already saturated.
The NVMe partitioning best practice is checked at server startup.
See Partition your flash devices for details on device partitioning.
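As a rough sketch of what a four-way split could look like, the function below prints, without running them, parted commands that divide a device into equal partitions by percentage. The device name /dev/nvme0n1 and the function name print_partition_plan are hypothetical, and the use of parted with a GPT label is an assumption rather than a documented Aerospike procedure. Creating a new label destroys existing data on the device, so review the printed commands before running any of them.

```shell
# print_partition_plan DEVICE COUNT: print (do not run) parted commands that
# would split DEVICE into COUNT equal partitions using percentage boundaries.
print_partition_plan() {
    device=$1
    count=$2
    echo "parted -s $device mklabel gpt"
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(( i * 100 / count ))
        end=$(( (i + 1) * 100 / count ))
        echo "parted -s $device mkpart primary ${start}% ${end}%"
        i=$(( i + 1 ))
    done
}

print_partition_plan /dev/nvme0n1 4
```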
vm.max_map_count
If you use Kubernetes or Docker, we recommend that you raise the max_map_count parameter, which controls the maximum number of memory map areas that a process may have. If max_map_count is too low, it may result in memory allocation issues during normal operation.
To change this parameter:
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
echo 262144 > /proc/sys/vm/max_map_count
The vm.max_map_count
best practice is checked at server startup.
Containers - networks
When using Kubernetes or Docker, the default behavior is to use the EXPOSE and PUBLISH features to publish ports from a container through the host to the outside world. This causes the Docker process to monitor a given port on the host and forward all packets to the container itself. This is highly inefficient and may cause latencies, packet drops, and crashes within the containers under heavy loads.
If using containers, we recommend that you configure those containers to either:
- Use bridged networking, rather than Docker-only NAT, or
- Use iptables to forward packets to the NAT network Aerospike containers, rather than the Docker EXPOSE port feature.
- If using a Docker container, run it with the --net=host flag to inherit the /proc/sys/net/core/*mem_max files. Without that flag, the maximums cannot be modified from within that environment.
The containers - network best practice is checked at server startup.
See the Docker configuration manuals for details.
Maximum open file limits
Aerospike clients perform dynamic connections to the database nodes as required. This may result in many active connections. These connections, on a Linux system, hold a file descriptor and are treated as open files.
The Aerospike configuration parameter
proto-fd-max
specifies the maximum number of allowed client connections. The Aerospike server does
not start if proto-fd-max
is higher than the Linux system’s maximum open files
configuration for the process.
After installing Aerospike, verify that the maximum open files limit for the asd process is higher than proto-fd-max, to allow for fabric and heartbeat connections as well as any open files.
The maximum open file limits best practice is checked at server startup.
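One way to carry out this verification is to read the soft "Max open files" limit from the limits file that Linux exposes for each process and compare it with proto-fd-max. The sketch below assumes the usual /proc/<pid>/limits layout and a numeric limit (not "unlimited"). The helper names are hypothetical, and 15000 is only an example value to replace with the proto-fd-max from your aerospike.conf.

```shell
# soft_nofile: read the contents of /proc/<pid>/limits on stdin and print the
# soft "Max open files" limit (the fourth whitespace-separated field).
soft_nofile() {
    awk '/^Max open files/ { print $4 }'
}

# fd_limit_ok LIMIT PROTO_FD_MAX: succeed if LIMIT is higher than PROTO_FD_MAX.
fd_limit_ok() {
    [ "$1" -gt "$2" ]
}

# Assumed usage on a node where asd is running (not executed here):
#   limit=$(soft_nofile < "/proc/$(pgrep asd)/limits")
#   fd_limit_ok "$limit" 15000 || echo "raise the open file limit above proto-fd-max"
```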
Non-systemd
Edit /etc/init.d/aerospike.conf
and change the value of the following
line.
ulimit -n 100000
The non-systemd best practice is checked at server startup.
systemd
The systemd best practice is checked at server startup.
- Create an override.conf file to control this.
cat <<EOF > /etc/systemd/system/aerospike.service.d/override.conf
[Service]
LimitNOFILE=<MAX NUMBER OF FILE DESCRIPTORS>
EOF
- Reload the systemd daemon.
systemctl daemon-reload
- Restart the Aerospike server to apply the new value.
- (Optional) You can apply this change dynamically to the asd process if prlimit is available:
prlimit --pid $(pgrep asd) --nofile=200000
somaxconn
Limit of socket listen() backlog, known in userspace as SOMAXCONN. Defaults to 4096. (Prior to Linux kernel 5.4, the default was 128.) See tcp_max_syn_backlog for additional tuning for TCP sockets.
echo 4096 > /proc/sys/net/core/somaxconn
The somaxconn best practice is checked at startup.
rmem-max
The maximum receive socket buffer size in bytes.
echo 15728640 > /proc/sys/net/core/rmem_max
The rmem-max best practice is checked at startup only in EE.
wmem-max
The maximum send socket buffer size in bytes.
echo 5242880 > /proc/sys/net/core/wmem_max
The wmem-max best practice is checked at startup only in EE.
shmall
The system-wide limit on the total amount of shared memory, counted in pages.
The shmall best practice is checked at startup only in EE.
shmmax
The maximum size of a single shared memory segment.
The shmmax best practice is checked at startup only in EE.
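To review the network and shared memory parameters in one pass, a minimal sketch such as the following prints each current value in key=value form, the same form used for entries in /etc/sysctl.conf. It assumes the standard /proc/sys locations for shmall and shmmax (under kernel), and the helper name sysctl_key is hypothetical. Files that are not readable, as can happen inside a container, are skipped.

```shell
# sysctl_key PATH: convert a /proc/sys path to its sysctl key, for example
# /proc/sys/net/core/rmem_max -> net.core.rmem_max.
sysctl_key() {
    echo "${1#/proc/sys/}" | tr '/' '.'
}

# Print the current value of each parameter that is readable on this host.
for path in /proc/sys/net/core/somaxconn /proc/sys/net/core/rmem_max \
            /proc/sys/net/core/wmem_max /proc/sys/kernel/shmall \
            /proc/sys/kernel/shmmax; do
    if [ -r "$path" ]; then
        echo "$(sysctl_key "$path")=$(cat "$path")"
    fi
done
```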