Use the Python client

This page describes how to create AI applications with the gRPC API and Python client provided with Aerospike Vector Search (AVS).

Overview

The AVS package includes a client class which is the entrypoint for all AVS operations. The package also provides a types module that contains classes necessary for interacting with the various client APIs.

  • The client performs database operations with vector data, RBAC admin functions, and record reading and writing.
  • The client supports Hierarchical Navigable Small World (HNSW) vector searches, so that users can find vectors similar to a given query vector within an index.

You can use all of the example code on this page with this interactive Jupyter notebook.

Prerequisites

  • Python 3.9 or later
  • pip 9.0.1 or later
  • A running AVS deployment (see Install AVS)

Set up AVS and Python

  1. Install the AVS package.

    pip install aerospike-vector-search
  2. Import the client.

    from aerospike_vector_search import Client, types
  3. Initialize a new client by providing one or more seed hosts for the client to connect to.

    # Admin client configuration
    # AVS_HOST and AVS_PORT are the address and port of an AVS seed node;
    # define them for your deployment before creating the client.
    # LISTENER_NAME corresponds to the AVS advertised_listener config.
    # https://aerospike.com/docs/vector/operate/configuration#advertised-listener
    # This is often needed when connecting to AVS clusters in the cloud.
    LISTENER_NAME = None
    # LOAD_BALANCED is True if the AVS cluster is load balanced.
    # Using a load balancer with AVS is best practice, and this setting also
    # works with a single-node AVS cluster that is not load balanced.
    LOAD_BALANCED = True

    client = Client(
        seeds=types.HostPort(host=AVS_HOST, port=AVS_PORT),
        listener_name=LISTENER_NAME,
        is_loadbalancer=LOAD_BALANCED,
    )
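
The client example above assumes that AVS_HOST and AVS_PORT are already defined. The following is a minimal sketch of one way to supply them, assuming you keep connection settings in environment variables. The helper name load_avs_address, the environment variable names, and the fallback values localhost and 5000 are illustrative assumptions, not settings defined by the AVS package.

```python
import os

def load_avs_address(env=os.environ):
    """Return (host, port) for an AVS seed node from environment variables."""
    # Fallback values are placeholders; replace them with your deployment's values.
    host = env.get("AVS_HOST", "localhost")
    port = int(env.get("AVS_PORT", "5000"))
    if not 0 < port < 65536:
        raise ValueError(f"AVS_PORT out of range: {port}")
    return host, port

AVS_HOST, AVS_PORT = load_avs_address()
print(f"connecting to {AVS_HOST}:{AVS_PORT}")
```

Validating the port before creating the client surfaces a misconfigured environment early instead of as a connection failure later.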

Index your data

To search across a set of vectors, create an index associated with those vectors. AVS uses an index to traverse the HNSW neighborhoods to perform queries. See Manage AVS indexes for details about creating an index.

  1. Use the following example to create an index.

    from aerospike_vector_search import AVSServerError

    # Index creation arguments
    # NAMESPACE is the namespace that the indexed data will be stored in
    NAMESPACE = "test"
    # INDEX_NAME is the name of the HNSW index to create
    INDEX_NAME = "basic_index"
    # VECTOR_FIELD is the Aerospike record bin that stores its vector data
    # The created index will use the data in this bin to perform nearest neighbor searches etc
    VECTOR_FIELD = "vector"
    # DIMENSIONS is the dimensionality of the vectors
    DIMENSIONS = 2

    try:
        print("creating index")
        client.index_create(
            namespace=NAMESPACE,
            name=INDEX_NAME,
            vector_field=VECTOR_FIELD,
            dimensions=DIMENSIONS,
        )
    except AVSServerError as e:
        print("failed creating index " + str(e) + ", it may already exist")
  2. Add vector entries.

    Vectors must exist in AVS before you can do a search. The following call writes 10 records to be indexed. To insert records, use the upsert method and specify the following values when writing a record:

    • namespace - Namespace in which the index exists.

    • key - Primary identifier for your record.

    • record_data - Map of any data you want to associate with your vector.

    • set_name (optional) - Set in which to place the record.

      # SET_NAME is the Aerospike set to write the records to
      SET_NAME = "basic-set"

      print("inserting vectors")
      for i in range(10):
          key = "r" + str(i)
          client.upsert(
              # namespace must match the namespace of the index
              namespace=NAMESPACE,
              set_name=SET_NAME,
              key=key,
              record_data={
                  "url": f"http://host.com/data{i}",
                  # record_data must include VECTOR_FIELD to be indexed
                  VECTOR_FIELD: [i * 1.0, i * 1.0],
                  "map": {"a": "A", "inlist": [1, 2, 3]},
                  "list": ["a", 1, "c", {"a": "A"}],
              },
          )
  3. Interact with an Index object.

    After creating an index, you can interact with it through an Index object.

    from aerospike_vector_search import Index

    # create an Index object to interact with the index
    index = client.index(namespace=NAMESPACE, name=INDEX_NAME)

    # get the status of the index
    print("index status: ", index.status())
  4. Wait for index construction.

    After inserting vectors into AVS, it takes some time to build the index. If the index is not complete, vector search results may be inaccurate. If you are running a batch job and want confirmation that index construction is complete, check the index status to ensure all pending vector records have been indexed with the following:

    import time

    # Wait for the index to finish indexing records
    def wait_for_indexing(index: Index):
        vertices = 0
        unmerged_recs = 0

        # Wait for the index to have vertices and no unmerged records
        while vertices == 0 or unmerged_recs > 0:
            status = index.status()

            vertices = status.index_healer_vertices_valid
            unmerged_recs = status.unmerged_record_count

            time.sleep(0.5)

    wait_for_indexing(index)
    print("indexing complete")
  5. Check what percentage of your index is merged with the get_percent_unmerged method.

    pct_unmerged = index.get_percent_unmerged()
    print("percent unmerged: " + str(pct_unmerged) + "%")

    Waiting for the index to complete may provide more accurate search results.

  6. Check if a vector is indexed.

    status = index.is_indexed(
        key=key,
        set_name=SET_NAME,
    )

    print("indexed: ", status)
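
The polling loop in step 4 never returns if the index never reports valid vertices, for example when no records were written. The following is a minimal sketch of the same loop with a deadline. It reads only the two status attributes used above. The helper name, the timeout_s and poll_s parameters, and the FakeIndex stand-in (included so that the sketch runs without an AVS server) are hypothetical.

```python
import time
from types import SimpleNamespace

def wait_for_indexing_with_timeout(index, timeout_s=30.0, poll_s=0.5):
    """Poll index.status() until indexing looks complete or the deadline passes.

    Returns True if the index reported valid vertices and no unmerged
    records before the deadline, and False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = index.status()
        if status.index_healer_vertices_valid > 0 and status.unmerged_record_count == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)

class FakeIndex:
    """Hypothetical stand-in: reports one fewer unmerged record per status() call."""
    def __init__(self, unmerged):
        self.unmerged = unmerged

    def status(self):
        self.unmerged = max(0, self.unmerged - 1)
        return SimpleNamespace(
            index_healer_vertices_valid=10 - self.unmerged,
            unmerged_record_count=self.unmerged,
        )

print(wait_for_indexing_with_timeout(FakeIndex(3), timeout_s=2.0, poll_s=0.01))
```

Returning a boolean instead of raising leaves the caller free to decide whether a partially built index is acceptable for the job at hand.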

Searching

After vectors have been indexed, you can begin searching them by providing a vector for search.

  1. Run your machine learning model on user input, and then perform a search using the generated embedding.

    print("querying")
    for i in range(10):
        print(" query " + str(i))
        results = index.vector_search(
            query=[i * 1.0, i * 1.0],
            limit=3,
        )
        for result in results:
            print(str(result.key.key) + " -> " + str(result.fields))
  2. Results are a list of nearest neighbors. Loop through the results from your entries to extract the relevant properties to use in your application.

    for result in results:
        print(str(result.key.key) + " -> " + str(result.fields))

    Note: To save on network traffic and CPU resources, the vector field is excluded by default.

  3. To retrieve the vector data, include it in the include_fields argument.

    print("querying")
    for i in range(10):
        print(" query " + str(i))
        results = index.vector_search(
            query=[i * 1.0, i * 1.0],
            include_fields=[VECTOR_FIELD, "url"],
            limit=3,
        )
        for result in results:
            print(str(result.key.key) + " -> " + str(result.fields))
  4. Read a record from AVS.

    key = "r0"

    result = client.get(
        namespace=NAMESPACE,
        key=key,
        set_name=SET_NAME,
    )

    print(str(result.key.key) + " -> " + str(result.fields))
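
HNSW search is approximate, so on a small data set it can help to compare search results against an exact scan. The sketch below rebuilds the 10 two-dimensional vectors written earlier and ranks them by squared Euclidean distance to a query. The distance metric is an assumption; substitute whichever metric your index is configured with. The helper names are hypothetical, and the code does not call AVS.

```python
import heapq

# Same keys and vectors as the upsert example: r0 -> [0.0, 0.0], ..., r9 -> [9.0, 9.0]
records = {"r" + str(i): [i * 1.0, i * 1.0] for i in range(10)}

def squared_euclidean(a, b):
    """Sum of squared per-dimension differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_neighbors(query, limit=3):
    """Return the `limit` closest (key, distance) pairs by exhaustive scan."""
    scored = ((key, squared_euclidean(query, vec)) for key, vec in records.items())
    return heapq.nsmallest(limit, scored, key=lambda pair: pair[1])

for key, distance in exact_neighbors([8.0, 8.0]):
    print(key + " -> " + str(distance))
```

An exhaustive scan costs time proportional to the number of records, so it is only practical as a spot check on small samples.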

AVS Python client using Asyncio

The aerospike-vector-search package provides an aio module with asynchronous clients whose methods are coroutines. The asynchronous clients are initialized in the same way as the synchronous clients. To convert the code examples, add await in front of each client method call. Note that await is valid only inside a coroutine or in an environment that supports top-level await, such as a Jupyter notebook. For example:

from aerospike_vector_search.aio import Client as asyncClient

async_client = asyncClient(
    seeds=types.HostPort(host=AVS_HOST, port=AVS_PORT),
    listener_name=LISTENER_NAME,
    is_loadbalancer=LOAD_BALANCED,
)

# Use await on client methods to await completion of the coroutine
results = await async_client.vector_search(
    namespace=NAMESPACE,
    index_name=INDEX_NAME,
    query=[8.0, 8.0],
    limit=3,
)

for result in results:
    print(str(result.key.key) + " -> " + str(result.fields))
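
In a standalone script, coroutine calls need an entry point such as asyncio.run. The sketch below shows that pattern and uses asyncio.gather to issue several searches concurrently. fake_vector_search is a hypothetical stand-in for an awaitable client call, included so that the example runs without an AVS server; it is not part of the AVS package.

```python
import asyncio

async def fake_vector_search(query, limit):
    """Hypothetical stand-in for an awaitable search call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [f"neighbor-{n}-of-{query}" for n in range(limit)]

async def main():
    queries = [[0.0, 0.0], [4.0, 4.0], [8.0, 8.0]]
    # gather runs the coroutines concurrently and returns results in input order
    all_results = await asyncio.gather(
        *(fake_vector_search(query, limit=3) for query in queries)
    )
    for query, results in zip(queries, all_results):
        print(str(query) + " -> " + str(len(results)) + " results")
    return all_results

all_results = asyncio.run(main())
```

With a real asynchronous client, the stand-in would be replaced by the awaited client call, and the client would be created and closed inside main.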

Close the clients

When you finish using the client and index objects, close the clients to release associated resources.

client.close()
# close() on the asynchronous client is a coroutine and must be awaited
await async_client.close()
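
If an exception is raised between creating a client and calling close, the close call is skipped. One way to guard against that is the standard library's contextlib.closing, which calls close() on exit from a with block for any object that has such a method. DemoClient below is a hypothetical stand-in so that the sketch runs without an AVS server; whether the AVS client can be used in a with statement directly is not covered here.

```python
from contextlib import closing

class DemoClient:
    """Hypothetical stand-in for an object that exposes close()."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

demo = DemoClient()
try:
    # closing() guarantees demo.close() runs even though the body raises
    with closing(demo) as c:
        raise RuntimeError("simulated failure while using the client")
except RuntimeError as e:
    print("caught: " + str(e))

print("closed:", demo.closed)
```

contextlib.closing applies only to a synchronous close(). For a client whose close() is a coroutine, a try/finally block that awaits close() gives the same guarantee.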

Read the documentation