Configure Aerospike Database for AVS

Overview

This page describes how to configure the Aerospike Database (ASDB) for use with Aerospike Vector Search (AVS).

note

AVS requires ASDB version 7.x. For more information, see the FAQ.

Configure the namespace for index metadata

AVS relies on a small namespace to store metadata about your indexes, such as each index's name, distance calculation settings, and dimensionality. By default this namespace is named avs-meta, and it must have NSUP (the namespace supervisor) enabled by setting a nonzero nsup-period. We recommend the following configuration in the aerospike.conf file on each node.

namespace avs-meta {
    replication-factor 2
    memory-size 5G
    storage-engine memory
    nsup-period 10
}
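You can check the NSUP requirement for avs-meta before restarting a node. The following Python sketch is a hypothetical pre-deployment check, not part of AVS or Aerospike tooling. It reads the top-level settings of a simple namespace stanza and treats a missing or zero nsup-period as NSUP being disabled, which matches Aerospike's default of 0 in recent server versions. It assumes the opening brace sits on the `namespace` line.

```python
from typing import Dict, Optional

# Hypothetical helpers for illustration; not part of AVS or Aerospike tooling.


def namespace_settings(conf_text: str, namespace: str) -> Optional[Dict[str, str]]:
    """Return the top-level settings of `namespace` in aerospike.conf text, or None.

    Settings inside sub-stanzas (for example `storage-engine device { ... }`)
    are skipped. Assumes the opening brace is on the `namespace` line.
    """
    found = False
    depth = 0
    settings: Dict[str, str] = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        tokens = line.split()
        if not found:
            if tokens[:2] == ["namespace", namespace] and "{" in line:
                found, depth = True, 1
            continue
        # Record plain "name value" lines that sit directly inside the namespace.
        if depth == 1 and "{" not in line and "}" not in line:
            settings[tokens[0]] = " ".join(tokens[1:])
        depth += line.count("{") - line.count("}")
        if depth == 0:  # closing brace of the namespace stanza
            break
    return settings if found else None


def nsup_enabled(conf_text: str, namespace: str = "avs-meta") -> bool:
    """True if the namespace sets a positive nsup-period (0 or missing disables NSUP)."""
    settings = namespace_settings(conf_text, namespace)
    if settings is None:
        return False
    try:
        return int(settings.get("nsup-period", "0")) > 0
    except ValueError:  # this sketch treats non-integer values as not enabled
        return False


AVS_META = """
namespace avs-meta {
    replication-factor 2
    memory-size 5G
    storage-engine memory
    nsup-period 10
}
"""

print(nsup_enabled(AVS_META))  # True
```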
note

See Configuring AVS for details about changing the metadata namespace name.

Specify a namespace for vector data

You need at least one namespace for storing vector records. For example, the default aerospike.conf file includes the test namespace. To use the test namespace for storing vector data, specify it when inserting vectors.

Create a namespace for index storage

To create a dedicated namespace for index records, add the namespace to the aerospike.conf file on every Aerospike Database node, then restart the Aerospike process on each node.

A dedicated namespace for index records lets you configure storage features independently for your vector data and your index. The following example enables replication for record data in avs-data (replication-factor 2) but not for index data in avs-index (replication-factor 1).

namespace avs-index {
    replication-factor 1
    nsup-period 60

    storage-engine device {
        file /opt/aerospike/data/index.dat
        filesize 8G
    }
}

namespace avs-data {
    replication-factor 2
    nsup-period 60

    storage-engine device {
        file /opt/aerospike/data/data.dat
        filesize 8G
    }
}
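The different replication-factor values mainly affect how much storage each namespace consumes: every record is stored replication-factor times across the cluster, so avs-data uses roughly twice its unique size while avs-index uses only its unique size. The Python sketch below illustrates that arithmetic. The 6 GiB unique sizes and the three-node cluster are hypothetical placeholders, and the sketch assumes data spreads evenly across nodes, reads "8G" as 8 GiB, and ignores storage overhead.

```python
# Sketch of how replication-factor multiplies stored bytes across a cluster.
# Assumptions: unique sizes and node count are placeholders, data spreads
# evenly across nodes, and the cluster has at least replication-factor nodes.

GIB = 2**30  # this sketch reads the config's "8G" as 8 GiB


def replicated_bytes(unique_bytes: int, replication_factor: int) -> int:
    """Bytes stored cluster-wide: every record is kept replication_factor times."""
    return unique_bytes * replication_factor


def per_node_bytes(unique_bytes: int, replication_factor: int, nodes: int) -> float:
    """Approximate bytes per node if the replicated data is spread evenly."""
    if nodes < replication_factor:
        raise ValueError("this sketch assumes at least replication_factor nodes")
    return replicated_bytes(unique_bytes, replication_factor) / nodes


# Replication factors come from the example config; unique sizes are hypothetical.
NAMESPACES = {
    "avs-index": {"unique_bytes": 6 * GIB, "replication_factor": 1},
    "avs-data": {"unique_bytes": 6 * GIB, "replication_factor": 2},
}
NODES = 3
FILESIZE = 8 * GIB  # per-node filesize from the example config

for name, ns in NAMESPACES.items():
    total = replicated_bytes(ns["unique_bytes"], ns["replication_factor"])
    share = per_node_bytes(ns["unique_bytes"], ns["replication_factor"], NODES)
    # Rough comparison only: ignores defragmentation headroom and other overhead.
    fits = "fits within" if share <= FILESIZE else "exceeds"
    print(f"{name}: {total / GIB:.1f} GiB cluster-wide, "
          f"~{share / GIB:.1f} GiB per node ({fits} filesize)")
```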

To store index records in a set, specify the set name when you create the index. The set is created dynamically, so no additional Aerospike Database configuration is required.

Monitor index storage using sets

A dedicated set for your index data lets you monitor the storage used by the index separately from other records in the namespace. Sets are created dynamically within the Aerospike Database and require no intervention by an administrator.

tip

By default, the Python client creates a set based on the index name.

Estimate total storage requirements

To estimate total storage requirements, you must know the following:

  • vectors - number of vectors
  • dim - number of dimensions of your vectors
  • metadata - size in bytes of additional data stored with your vector record
note

The following calculations assume the default index configuration settings.

  1. Estimate the number of HNSW index entries (graph nodes) from the number of vectors:

    graph-nodes = vectors * 1.06
  2. Calculate the total index size in bytes from the number of dimensions of your vectors:

    size-of-vector = dim * 4                         // each dimension is stored as a 4-byte float
    size-of-index-record = 500 + size-of-vector      // about 500 bytes of HNSW neighborhood info per record
    total-index-size = graph-nodes * size-of-index-record
  3. Calculate the data size from the vector and metadata stored on each vector record:

    size-of-vector = dim * 4                          // each dimension is stored as a 4-byte float
    size-of-vector-record = size-of-vector + metadata // metadata is typically 1-2 KB, for example 2048 bytes
    total-data-size = vectors * size-of-vector-record
  4. Add index and data sizes to determine your total storage needs:

    total-unique-data = total-index-size + total-data-size
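The steps above can be combined into a small Python function for trying your own inputs. This is a sketch, not the Vector Sizing spreadsheet: it hard-codes the constants from the steps (1.06 graph nodes per vector, about 500 bytes of neighborhood info per index record, 4 bytes per dimension) and computes the data size as vectors * (size-of-vector + metadata). For 1 billion 128-dimension vectors with 2,048 bytes of metadata, dividing the result by 2^40 gives about 1.0 TiB of index and 3.3 TiB in total, in line with the first row of the example table.

```python
from dataclasses import dataclass

# Constants taken from the estimation steps (default index configuration).
GRAPH_NODE_FACTOR = 1.06   # HNSW index entries per vector
NEIGHBOR_INFO_BYTES = 500  # approx. bytes of HNSW neighborhood info per index record
BYTES_PER_DIMENSION = 4    # each dimension is stored as a 4-byte float
TIB = 2**40


@dataclass
class StorageEstimate:
    index_bytes: float
    data_bytes: float

    @property
    def total_bytes(self) -> float:
        # total-unique-data = total-index-size + total-data-size
        return self.index_bytes + self.data_bytes


def estimate_storage(vectors: int, dim: int, metadata: int = 2048) -> StorageEstimate:
    """Estimate unique (pre-replication) storage in bytes; metadata defaults to 2 KB."""
    graph_nodes = vectors * GRAPH_NODE_FACTOR
    size_of_vector = dim * BYTES_PER_DIMENSION
    size_of_index_record = NEIGHBOR_INFO_BYTES + size_of_vector
    total_index_size = graph_nodes * size_of_index_record
    size_of_vector_record = size_of_vector + metadata
    total_data_size = vectors * size_of_vector_record
    return StorageEstimate(total_index_size, total_data_size)


est = estimate_storage(1_000_000_000, 128, 2048)
print(f"index ≈ {est.index_bytes / TIB:.1f} TiB, total ≈ {est.total_bytes / TIB:.1f} TiB")
# prints: index ≈ 1.0 TiB, total ≈ 3.3 TiB
```

The estimate covers unique data only; replicated namespaces consume a multiple of it.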

The following table lists some example values. To calculate the total storage requirements for your own indexes, use the Vector Sizing spreadsheet.

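Description                 | Vectors       | Dimensions | Metadata (bytes) | Index size (TB) | Total data (TB)
--------------------------- | ------------- | ---------- | ---------------- | --------------- | ---------------
1 billion, low-dimension    | 1,000,000,000 | 128        | 2,048            | 1.0             | 3.3
1 billion, medium-dimension | 1,000,000,000 | 768        | 2,048            | 3.5             | 8.1
1 billion, high-dimension   | 1,000,000,000 | 3,072      | 2,048            | 12.4            | 25.4
Total                       | 3,000,000,000 |            |                  | 16.9            | 36.8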