Version: Graph 2.4.0

Indexing

Overview

This page describes how to create indexes. Indexes can make graph database queries faster and more efficient. You can create vertex property and label indexes either with the Gremlin call API or with configuration options. Configuration options may be specified either with a properties file or with command-line options.

Impact of indexes on traversals

A vertex property index affects only the first step of a traversal. Subsequent steps are not affected. However, if a traversal's initial steps involve both an indexed property and a non-indexed property, Aerospike Graph Service (AGS) automatically reorders the steps so that the indexed property step runs first and the traversal benefits from the index.

For maximum benefit, index the vertex properties that a query can use to narrow the dataset down to the fewest vertices possible and start the traversal there. Properties that tend to have distinct values and a low level of duplication throughout the dataset are best to index.
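As a rough way to compare candidate properties before indexing, the sketch below computes the share of distinct values per property over a small sample. The sample data and the distinct_ratio helper are hypothetical and exist only for illustration; they are not part of AGS.

```python
from collections import defaultdict

def distinct_ratio(vertices):
    """Return {property: distinct values / vertices that have the property}."""
    values = defaultdict(set)
    counts = defaultdict(int)
    for props in vertices:
        for key, value in props.items():
            values[key].add(value)
            counts[key] += 1
    return {key: len(values[key]) / counts[key] for key in values}

# Hypothetical sample of vertex properties, for illustration only.
sample = [
    {"name": "Lyndon", "country": "USA"},
    {"name": "Simon", "country": "USA"},
    {"name": "Ada", "country": "USA"},
    {"name": "Grace", "country": "UK"},
]

ratios = distinct_ratio(sample)
# Properties with a ratio close to 1.0 have little duplication.
best_first = sorted(ratios, key=ratios.get, reverse=True)
print(best_first)
```

In this sample, name has no duplicates while country repeats, so name is the stronger index candidate.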

Manage indexes

Aerospike Graph Service (AGS) supports secondary index management using the Gremlin call API. You can create and drop secondary indexes, as well as get index status information.

Create a secondary index

To create a secondary index on a vertex property, use the following command in the Gremlin console:

g.call("aerospike.graph.admin.index.create").
with("element_type", "vertex").
with("property_key", "<key>").next()
  • The property_key element must be the name of the property you want to index.
  • You can index any user-defined property or the ~label field. The ~id field is indexed automatically.

Create index examples

  1. In the following example, a graph contains a user-defined vertex property called name. The following command creates a secondary index on the name property:
g.call("aerospike.graph.admin.index.create").
with("element_type", "vertex").
with("property_key", "name").next()

Expected output:

Vertex index creation of property key 'name' in progress.
  2. The following example creates a secondary index on the vertex label:
g.call("aerospike.graph.admin.index.create").
with("element_type", "vertex").
with("property_key", "~label").next()

Index creation on a property key which already has an index returns an error.

Drop a secondary index

To drop an existing secondary index, use the following command in the Gremlin console:

g.call("aerospike.graph.admin.index.drop").
with("element_type", "vertex").
with("property_key", "<key>").next()
  • The property_key element must be the name of the property with the index you want to drop.
note

When you drop an index, any query which would have used that index is briefly unavailable while AGS rebuilds its index list.

Index drop examples

  1. In the following example, a graph contains a user-defined vertex property called name with a secondary index. The following command drops the secondary index on the name property:
g.call("aerospike.graph.admin.index.drop").
with("element_type", "vertex").
with("property_key", "name").next()

Expected output:

Vertex index of property key 'name' dropped.
  2. The following example drops a secondary index on the vertex label:
g.call("aerospike.graph.admin.index.drop").
with("element_type", "vertex").
with("property_key", "~label").next()

Index status

To get the status of a secondary index on a vertex property, use the following command in the Gremlin console:

g.call("aerospike.graph.admin.index.status").
with("element_type", "vertex").
with("property_key", "<key>").next()
  • The property_key element must be the name of the indexed property to get the status of.

Expected output fields:

  • percent_complete: Index build progress as a percentage from 0 to 100. Returns 100 when the index is complete and ready to use.

  • total_entries: Total number of entries in the index across all Aerospike nodes.

  • total_used_bytes: Total RAM used by the index, in bytes, across all Aerospike nodes.

  • load_time: Time taken to create the index, in milliseconds.
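Because percent_complete reaches 100 only when the index is ready, a script may want to wait for that value before running queries. The following is a minimal sketch, assuming the status result can be read as a mapping with a percent_complete key. The fetch_status callable is a hypothetical stand-in for whatever client code issues the status call; here it is faked so the example runs without a server.

```python
import time

def wait_for_index(fetch_status, poll_seconds=1.0, timeout_seconds=60.0):
    """Poll until percent_complete reaches 100 or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status["percent_complete"] >= 100:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("index did not complete in time")
        time.sleep(poll_seconds)

# Stand-in for a real status call: reports 40, 80, then 100 percent.
_fake = iter([{"percent_complete": p} for p in (40, 80, 100)])
final = wait_for_index(lambda: next(_fake), poll_seconds=0.0)
print(final["percent_complete"])  # prints 100
```

The timeout keeps a script from waiting indefinitely on a very large dataset; both default values are arbitrary choices for the sketch.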

List indexed property keys

To get a list of all property keys with existing secondary indexes, use the following command in the Gremlin console:

g.call("aerospike.graph.admin.index.list").next()

If successful, AGS returns a list of indexed property keys.
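The create, drop, status, and list calls above share one shape, which can help when scripting them. The sketch below only renders the console command as a string, following the patterns shown on this page; the admin_index_call helper is hypothetical and does not talk to AGS.

```python
def admin_index_call(action, property_key=None, element_type="vertex"):
    """Render a Gremlin admin index call as a console-ready string."""
    parts = [f'g.call("aerospike.graph.admin.index.{action}")']
    if property_key is not None:
        # create, drop, and status take an element type and a property key.
        parts.append(f'with("element_type", "{element_type}")')
        parts.append(f'with("property_key", "{property_key}")')
    return ".".join(parts) + ".next()"

print(admin_index_call("create", "name"))
print(admin_index_call("status", "~label"))
print(admin_index_call("list"))
```

The rendered strings can be pasted into the Gremlin console as-is.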

Cardinality

In general, indexes with higher cardinality are more effective. To see examples of how indexes affect graph queries, see the Impact of indexes on traversals section of this page.

To get the cardinality of existing secondary indexes, use the following command in the Gremlin console:

g.call("aerospike.graph.admin.index.cardinality").next()

If successful, AGS returns a list of indexed property keys and the cardinality of each one. The cardinality of an index is the number of unique entries in that index.

Index creation with configuration options

To create an index on a vertex property or label in AGS with configuration options, you can either:

  • Edit the properties file you use to start the AGS Docker image.
  • Use the -e flag to specify configuration options as command-line arguments in the Docker command you use to start the AGS Docker image.

Vertex property index creation

To create an index on a vertex property, add the configuration parameter aerospike.graph.index.vertex.properties to the properties file and assign it a comma-separated list of vertex property keys to index. In the following example, vertex properties property_key1 and property_key2 are specified for indexing:

aerospike.graph.index.vertex.properties=property_key1,property_key2

Vertex property indexes are taken as a union from all AGS instances. This means that if one AGS instance has an index on vertex property property_key1 and another has an index on vertex property property_key2, AGS creates indexes for both properties. If an index is created on any AGS instance in a cluster, the other instances detect it and leverage it as well.

When a vertex property index is first created on a dataset, the time it takes to create the index is proportional to the amount of data in the Aerospike database. Larger amounts of data take longer to index. You can create a property index either before or after populating the database with data, but before is faster.

note

Vertex property indexes have a value limit of 2k bytes. Any property values which are greater than 2k bytes cannot be indexed.

Vertex label index creation

To create indexes on all vertex labels, add the configuration parameter aerospike.graph.index.vertex.label.enabled to the properties file and set it to true.

aerospike.graph.index.vertex.label.enabled=true

If you create a label index on one AGS instance, all the other AGS instances in the cluster detect the change and use the same index.

Vertex label index creation example

Consider an Aerospike Graph database with the following schema:

VERTICES:
label: "Person"
{
"name": "John Doe",
"age": 30,
"address": "123 Main St",
"city": "San Francisco",
"state": "CA",
"country": "USA",
"zip": "94105"
}

EDGES:
label: "knows"
{
}

To create indexes on the name and age fields, as well as a vertex label index, add the following lines to the properties file:

aerospike.graph.index.vertex.properties=name,age
aerospike.graph.index.vertex.label.enabled=true
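The same two settings can be written either as properties-file lines or as -e arguments. The sketch below renders both forms from one mapping. The helper names are hypothetical, and the exact "-e name=value" argument shape is an assumption; check it against the Docker command you use to start AGS.

```python
def index_config(property_keys, label_index=False):
    """Return the index settings as a {parameter: value} mapping."""
    config = {}
    if property_keys:
        config["aerospike.graph.index.vertex.properties"] = ",".join(property_keys)
    if label_index:
        config["aerospike.graph.index.vertex.label.enabled"] = "true"
    return config

def as_properties_lines(config):
    """Render the settings as lines for a properties file."""
    return [f"{name}={value}" for name, value in config.items()]

def as_docker_flags(config):
    """Render the settings as -e arguments (assumed name=value form)."""
    flags = []
    for name, value in config.items():
        flags += ["-e", f"{name}={value}"]
    return flags

config = index_config(["name", "age"], label_index=True)
print("\n".join(as_properties_lines(config)))
print(" ".join(as_docker_flags(config)))
```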

Example traversals

The following traversals use the schema and indexes shown in the vertex label index creation example.

Single indexed vertex property

This traversal uses the index on the name field:

       ______ The first step uses the index, so it is fast and efficient.
       |
       |                     ____________ Subsequent steps do not use
       |                     |     |    | the index because they are not at the
       |                     |     |    | start of the traversal.
       v                     v     v    v
g.V().has("name", "Lyndon").out().in().has("name", "Simon").toList()

Single non-indexed vertex property

This traversal does not use an index and may perform poorly.

       ______ This step does not use an index and must scan the entire database
       |      for the `country` property.
       |
       |                     _______ These steps do not use the index because they
       |                     |     | are not at the start of the traversal.
       v                     v     v
g.V().has("country", "USA").out().has("name", "Lyndon").toList()

One indexed and one unindexed vertex property

This traversal performs two has steps, one on the unindexed country field and one on the indexed name field. AGS combines the two has steps and runs the indexed one first, improving the traversal's performance.

g.V().has("country", "USA").has("name", "Lyndon").out().has("name", "Simon").toList()

Two indexed vertex properties

This traversal performs two initial has steps, both on indexed properties. AGS uses cardinality metadata from the Aerospike database to determine which step to run first for maximum efficiency.

note

Cardinality metadata in Aerospike is updated once per hour, so index efficiency information may not always be current.

g.V().has("age", 29).has("name", "Lyndon").out().has("name", "Simon").toList()
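The ordering described in these examples can be modeled in a few lines. This is an illustrative sketch of the stated behavior (indexed steps first, and among indexed steps the one with higher cardinality first), not AGS's actual implementation. The cardinality figures are made up and the order_has_steps helper is hypothetical.

```python
def order_has_steps(steps, index_cardinality):
    """Order initial has-steps: indexed before unindexed, higher cardinality first."""
    def rank(step):
        key = step[0]
        indexed = key in index_cardinality
        # Indexed keys sort first; among them, larger cardinality sorts first.
        return (0 if indexed else 1, -index_cardinality.get(key, 0))
    return sorted(steps, key=rank)

# Made-up cardinalities for the two indexed properties.
cardinality = {"name": 9500, "age": 80}
steps = [("country", "USA"), ("age", 29), ("name", "Lyndon")]
ordered = order_has_steps(steps, cardinality)
print(ordered)
```

With these numbers the name step runs first, then age, and the unindexed country step runs last.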

Label index and indexed vertex property

This traversal's first two steps are a hasLabel step which uses the instance's label index, and a has step which uses the name property index. AGS performs the has step first, because property indexes usually have higher cardinality than label indexes.

g.V().hasLabel("Person").has("name", "Lyndon").out().has("name", "Simon").toList()

Label index and unindexed vertex property

This traversal begins with a hasLabel step which uses the instance's label index, and a has step which involves the unindexed country property. AGS performs the hasLabel step first and uses the index, but the country step may be slow and inefficient.

g.V().hasLabel("Person").has("country", "USA").out().has("name", "Simon").toList()