
Best practices for Aerospike and Linux

This page describes stability and performance best practices for Aerospike and the Linux operating system.

Overview

When the Aerospike Database starts, it verifies certain best practices and logs a warning for each violation it finds.

  • For production environments, set enforce-best-practices to true so that the server shuts down if any best practices are violated during startup.

  • When enforce-best-practices is set to false, you can still monitor violations with the failed_best_practices Boolean statistic, or the best-practices info command.

  • The failed_best_practices statistic reports true if any best practices are violated during startup. The best-practices info command returns the list of best practices that failed.
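For monitoring, the statistic can be pulled out of an info response with a small helper. The following is a minimal sketch, assuming the statistics response is a single line of semicolon-separated name=value pairs; the function name stat_value and the sample string are hypothetical and are not real server output.

```shell
# stat_value NAME RESPONSE
# Print the value of NAME from a semicolon-separated list of name=value pairs.
stat_value() {
    printf '%s\n' "$2" | tr ';' '\n' |
        awk -F= -v name="$1" '$1 == name { print $2; exit }'
}

# Hand-written sample response (an assumption, not captured from a server).
sample='cluster_size=3;failed_best_practices=true;uptime=120'
stat_value failed_best_practices "$sample"
```

On a live node the response would typically come from an info tool such as asinfo (for example `asinfo -v statistics`); treat that invocation as an assumption and check it against your tools version.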

Best practices checked at startup

The following best practices are checked at startup:

Aerospike database best practices

service-threads

The service-threads best practice is checked at server startup. The recommended value depends on the configuration of the namespaces in the aerospike.conf file.

indexes-memory-budget

The indexes-memory-budget best practice is checked at server startup.

note

memory-size is deprecated in Database 7.0. For more information, see Aerospike Database 7.0 Release Notes.

We recommend that the cumulative sum of the memory-size configuration not exceed the total memory on the machine.

Namespace device size

All the namespace storage devices should be the same size, within an 8 MiB range of tolerance. This best practice is checked at server startup.
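The size rule can be checked before startup with a short script. This is a sketch that assumes "within an 8 MiB range" means the largest and smallest devices differ by no more than 8 MiB; the function name sizes_within_tolerance is hypothetical.

```shell
# sizes_within_tolerance SIZE_BYTES...
# Succeeds when the largest and smallest sizes differ by at most 8 MiB.
sizes_within_tolerance() {
    tolerance=$((8 * 1024 * 1024))
    min=$1
    max=$1
    for size in "$@"; do
        [ "$size" -lt "$min" ] && min=$size
        [ "$size" -gt "$max" ] && max=$size
    done
    [ $((max - min)) -le "$tolerance" ]
}

# Example with literal sizes in bytes: 4 MiB apart, so the check passes.
sizes_within_tolerance 400000000000 400004194304 && echo "sizes match"
```

Real sizes could be gathered with `blockdev --getsize64 DEVICE` for each storage device of the namespace and passed as arguments.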

Linux best practices

All-Flash deployment

In an All-Flash deployment, the following kernel parameters are required. enforce-best-practices verifies that these kernel parameters are set to at least the expected values.

/proc/sys/vm/dirty_bytes = 16777216
/proc/sys/vm/dirty_background_bytes = 1
/proc/sys/vm/dirty_expire_centisecs = 1
/proc/sys/vm/dirty_writeback_centisecs = 10

  • When running as non-root, you must set these values before running the Aerospike server.
  • When running as root, the server configures them automatically.

Either way, if these parameters can't be correctly set manually or automatically by the server, the node will not start.
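When running as non-root, it can help to verify the four values before starting the server. The sketch below compares each file against the value listed above and reports any that are lower; the function name check_dirty_params and its optional directory argument are hypothetical, added so the check can be exercised against a test directory.

```shell
# check_dirty_params [DIR]
# Compare each vm.dirty_* setting under DIR (default /proc/sys/vm) with the
# expected value; print one line per parameter that is below it.
check_dirty_params() {
    dir=${1:-/proc/sys/vm}
    status=0
    while read -r name expected; do
        actual=$(cat "$dir/$name")
        if [ "$actual" -lt "$expected" ]; then
            echo "$name: $actual is below $expected"
            status=1
        fi
    done <<EOF
dirty_bytes 16777216
dirty_background_bytes 1
dirty_expire_centisecs 1
dirty_writeback_centisecs 10
EOF
    return "$status"
}
```

Calling `check_dirty_params` with no argument reads the live kernel values; a non-zero exit status means at least one parameter still needs to be raised.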

RAM reserved for Linux operating system resources

To help prevent out-of-memory issues with host hardware, keep 10-15% of total physical memory reserved for Linux system resources.

The following may influence memory usage:

  • Overhead from the Linux OS and services.
  • Overhead caused by memory fragmentation.
  • Overhead from Aerospike indexes (primary & secondary).
  • Namespace data for in-memory namespaces. For more information, see Capacity planning.
  • Overhead from cache and queue-related configurations, including max-write-cache (per device) and post-write-cache (per device). See Block size and cache size for more information.
  • Overhead from the Aerospike process.
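As a rough illustration of the 10-15% guideline, the reservation can be computed from the MemTotal line of /proc/meminfo, which is reported in kB. This is only a sketch; the function name reserved_kb is hypothetical.

```shell
# reserved_kb TOTAL_KB PERCENT
# Print PERCENT of TOTAL_KB, rounded down, using integer arithmetic.
reserved_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Report the range for this host when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "reserve $(reserved_kb "$total_kb" 10) to $(reserved_kb "$total_kb" 15) kB"
fi
```

For a host with 65536000 kB of RAM, this gives a range of 6553600 to 9830400 kB to leave outside the Aerospike memory budget.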

min_free_kbytes

The min_free_kbytes best practice is checked at server startup.

The min_free_kbytes kernel parameter controls how much memory to keep free from filesystem caches. Normally, the kernel occupies almost all free RAM with filesystem caches and frees up memory for allocation by processes as required. As Aerospike performs large allocations in shared memory (1GB chunks), the default kernel value may result in an unexpected OOM (out-of-memory kill).

We recommend that you configure the parameter to a minimum of 1.1GB, preferably 1.25GB if using cloud vendor drivers as these can make large allocations. This ensures that Linux always keeps enough memory available and free for large allocations.

tip

If min_free_kbytes is set too high, it is likely to cause an out-of-memory error in Aerospike.

  1. Check the parameter value.

    cat /proc/sys/vm/min_free_kbytes
  2. If the value is lower than recommended, apply the new value to the running kernel and persist it across reboots.

    echo 3 > /proc/sys/vm/drop_caches
    echo 1310720 > /proc/sys/vm/min_free_kbytes
    echo "vm.min_free_kbytes=1310720" >> /etc/sysctl.conf
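The value 1310720 used above is 1.25 GiB expressed in KiB. The sketch below shows that conversion and the "is the current value lower" comparison with integer arithmetic; both function names are hypothetical.

```shell
# gib_hundredths_to_kb N
# Convert N hundredths of a GiB (125 means 1.25 GiB) to KiB, rounded down.
gib_hundredths_to_kb() {
    echo $(( $1 * 1024 * 1024 / 100 ))
}

# needs_raise CURRENT_KB TARGET_KB
# Succeeds when the current setting is below the target.
needs_raise() {
    [ "$1" -lt "$2" ]
}

target=$(gib_hundredths_to_kb 125)
echo "target min_free_kbytes: $target"
```

With the current value read from /proc/sys/vm/min_free_kbytes, `needs_raise "$current" "$target"` indicates whether step 2 applies.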

swappiness

The swappiness best practice is checked at server startup.

For low-latency operations, using swap to any extent drastically slows down performance. We recommend that you disable swap with swapoff -a and remove the swap partition from /etc/fstab.

If that's not possible for operational reasons, set the swappiness to 0:

echo 0 > /proc/sys/vm/swappiness
echo "vm.swappiness=0" >> /etc/sysctl.conf
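To confirm whether any swap area is still active, /proc/swaps can be inspected: its first line is a column header, and every further line is an active swap area. A minimal sketch follows; the function name swap_active and its optional file argument are hypothetical.

```shell
# swap_active [FILE]
# Succeeds when FILE (default /proc/swaps) lists at least one swap area.
# Line 1 is the column header, so any line after it is an active area.
swap_active() {
    awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}
```

For example, `swap_active && echo "swap is still enabled"` could be run after swapoff -a to confirm that nothing remains.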

THP - Transparent Huge Pages

The best practices startup check permits thp-enabled and thp-defrag to be set to either madvise or never.

Aerospike recommends disabling Transparent Huge Pages (THP) before the Aerospike service starts. While the Linux kernel uses THP to improve overall system responsiveness and allocation speed, it can be counterproductive for high-throughput, low-latency databases, which perform many small allocations. THP can cause the system to run out of RAM, with symptoms similar to a memory leak. Another issue is latency caused by page locking during THP defragmentation.
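The kernel reports each THP setting as a list of choices with the active one in square brackets, for example "always madvise [never]". The sketch below extracts the active choice and tests it against the two values the startup check permits; both function names are hypothetical.

```shell
# thp_selected LINE
# Print the bracketed (active) choice from a THP sysfs line.
thp_selected() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# thp_allowed VALUE
# Succeeds for the two settings the startup check permits.
thp_allowed() {
    case $1 in
        madvise|never) return 0 ;;
        *) return 1 ;;
    esac
}
```

On most distributions the two lines are found in /sys/kernel/mm/transparent_hugepage/enabled and /sys/kernel/mm/transparent_hugepage/defrag; mapping thp-enabled and thp-defrag to those files is an assumption here. Writing never to either file as root changes the setting for the running kernel only.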

Zone reclaim mode

The zone_reclaim_mode best practice is checked at server startup.

On NUMA architectures, zone_reclaim_mode causes aggressive reclaims and memory scans when enabled.

We recommend that you disable zone_reclaim_mode by setting /proc/sys/vm/zone_reclaim_mode to 0.
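As with the other kernel parameters on this page, the setting needs to be persisted to survive a reboot. Appending to /etc/sysctl.conf on every run leaves duplicate entries behind, so one option is a helper that replaces any existing line for the key. This is a sketch only; the function name persist_sysctl is hypothetical.

```shell
# persist_sysctl KEY VALUE [FILE]
# Replace any existing "KEY=..." line in FILE (default /etc/sysctl.conf)
# with KEY=VALUE, so repeated runs do not pile up duplicates.
persist_sysctl() {
    file=${3:-/etc/sysctl.conf}
    # Escape dots so the key is matched literally by grep.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    tmp=$(mktemp)
    grep -v "^[[:space:]]*${pattern}[[:space:]]*=" "$file" > "$tmp" 2>/dev/null || true
    printf '%s=%s\n' "$1" "$2" >> "$tmp"
    # Overwrite in place to keep the file's ownership and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Run as root, `persist_sysctl vm.zone_reclaim_mode 0` would record the setting, and `echo 0 > /proc/sys/vm/zone_reclaim_mode` applies it to the running kernel.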

NVMe partitioning

NVMe devices are normally capable of 4 simultaneous I/O operations. Due to their connection design, these occupy 4 PCIe I/O lanes. On raw devices, Aerospike suggests that you partition each NVMe device into at least 4 partitions. This allows 4 write threads to operate in Aerospike and greatly improves the disk speed.

If you use a single partition as a raw device with Aerospike, iostat may show 100% disk utilization (%util) while the await statistic shows no queueing (await <1 means no queueing is happening). This indicates that the disk itself can do more, but the PCIe lanes in use are already saturated.

See Partition your flash devices for details on device partitioning.
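To illustrate an even split, the sketch below prints percentage boundaries for a given number of equal partitions. It only prints text and does not touch any device; the function name partition_bounds is hypothetical.

```shell
# partition_bounds COUNT
# Print "START% END%" for COUNT equal slices of a device.
partition_bounds() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "$(( i * 100 / $1 ))% $(( (i + 1) * 100 / $1 ))%"
        i=$(( i + 1 ))
    done
}

partition_bounds 4
```

Each pair could then be handed to a partitioning tool, for example `parted -s /dev/nvme0n1 mkpart primary 0% 25%` on a device that already has a partition table. The device name is a placeholder, and repartitioning destroys existing data, so treat this as an outline rather than a ready-made procedure.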

vm.max_map_count

If you use Kubernetes or Docker, we recommend that you raise the max_map_count parameter, which controls the maximum number of memory map operations that can be performed by a process. If max_map_count is low, it may result in memory allocation issues during normal operation.

To change this parameter:

echo "vm.max_map_count=262144" >> /etc/sysctl.conf
echo 262144 > /proc/sys/vm/max_map_count

note

You may need to restart the Docker daemon and all its containers for the changes to take effect after modifying max_map_count.

Containers - networks

When using Kubernetes or Docker, the default behavior is to use the EXPOSE and PUBLISH features to publish ports from a container through the host to the outside world. This causes the Docker process to listen on a given port on the host and forward all packets to the container itself. This is highly inefficient and may cause latencies, packet drops, and crashes within the containers under heavy loads.

If using containers, it is advisable to configure them in one of the following ways:

  1. Use bridged networking, rather than Docker-only NAT.
  2. Use iptables to forward packets to the Aerospike containers on the NAT network, rather than the Docker EXPOSE port feature.
  3. If using a Docker container, run it with the --net=host flag to inherit the /proc/sys/net/core/*mem_max files. Without that flag, the maximums cannot be modified from within that environment.

See the Docker configuration manuals for details.

Maximum open file limits

Aerospike clients perform dynamic connections to the database nodes as required. This may result in many active connections. These connections, on a Linux system, hold a file descriptor and are treated as open files.

The Aerospike configuration parameter proto-fd-max specifies the maximum number of allowed client connections. The Aerospike server does not start if proto-fd-max is higher than the Linux system's maximum open files configuration for the process.

After installing Aerospike, verify that the maximum open files limit for the asd process is higher than proto-fd-max, to allow for fabric and heartbeat connections as well as any other open files.
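This verification can be sketched by reading the soft "Max open files" limit from the process's limits file and comparing it with proto-fd-max from the configuration file. The function names are hypothetical, and the parsing assumes proto-fd-max appears on a line of its own as "proto-fd-max VALUE".

```shell
# soft_nofile LIMITS_FILE
# Print the soft "Max open files" limit from a /proc/<pid>/limits file.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# proto_fd_max CONF_FILE
# Print the proto-fd-max value from an aerospike.conf-style file.
proto_fd_max() {
    awk '$1 == "proto-fd-max" { print $2; exit }' "$1"
}

# limit_ok LIMIT PROTO_FD_MAX
# Succeeds when the open file limit is higher than proto-fd-max.
limit_ok() {
    [ "$1" -gt "$2" ]
}
```

On a running node, the inputs would typically be /proc/$(pgrep -o asd)/limits and the server's aerospike.conf (commonly /etc/aerospike/aerospike.conf, which is an assumption about your installation). A limit reported as "unlimited" is not handled by this sketch.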

Non-systemd

Edit /etc/init.d/aerospike.conf and change the value of the following line.

ulimit -n 100000

systemd

  1. Create an override.conf file to control this.

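    # Create the drop-in directory first; the redirect fails if it is missing.
    mkdir -p /etc/systemd/system/aerospike.service.d
    cat <<EOF > /etc/systemd/system/aerospike.service.d/override.conf
    [Service]
    LimitNOFILE=<MAX NUMBER OF FILE DESCRIPTORS>
    EOF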
  2. Reload the systemd daemon.

    systemctl daemon-reload
  3. Restart the Aerospike server to apply the new value.

  4. (Optional) You can apply this change dynamically to the asd process if prlimit is available:

    prlimit --pid $(pgrep asd) --nofile=200000

somaxconn

The limit on the socket listen() backlog, known in userspace as SOMAXCONN. The kernel default is 4096 (it was 128 before Linux kernel 5.4). See also tcp_max_syn_backlog for additional tuning of TCP sockets.

echo 4096 > /proc/sys/net/core/somaxconn

rmem-max

The maximum receive socket buffer size in bytes. Checked at startup in Enterprise Edition (EE) only.

echo 15728640 > /proc/sys/net/core/rmem_max

wmem-max

The maximum send socket buffer size in bytes. Checked at startup in EE only.

echo 5242880 > /proc/sys/net/core/wmem_max

shmall

The sum of all shared memory segments on the whole system. Checked at startup in EE only.

shmmax

The maximum size of a single shared memory segment. Checked at startup in EE only.
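When comparing the two shared memory limits, note that the kernel expresses shmall in pages while shmmax is in bytes. The sketch below converts a page count to bytes so the two can be compared directly; the function name shmall_bytes is hypothetical.

```shell
# shmall_bytes PAGES PAGE_SIZE
# Convert a shmall value (counted in pages) to bytes.
shmall_bytes() {
    echo $(( $1 * $2 ))
}

# Example: 262144 pages of 4096 bytes each.
shmall_bytes 262144 4096
```

On a live system, the inputs would come from /proc/sys/kernel/shmall and `getconf PAGE_SIZE`, and the result can be set against /proc/sys/kernel/shmmax.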