Policies

Aerospike client policies give you fine-grained control over reads and writes. With a client policy, you can implement optimistic-concurrency read-modify-write patterns, control a record's time to live, and choose whether to write a record depending on whether it already exists.

Commands of this type are very fast because metadata such as the generation and time to live is stored in the primary index, so the server does not need to read the record's data to evaluate them.

These policies affect both database commands and client behavior. Many policies determine the wire protocol commands sent to the server; others, such as maxRetries, affect only how the client operates.

Every Aerospike client supports these policies, though the APIs differ slightly between clients. After identifying which policies your application needs, see your client's documentation for the exact syntax.

Set Default Client Policies

You can create default client policies for each AerospikeClient instance. The following example demonstrates how to set policy defaults in the Java client. For language-specific examples, see the documentation for your client.

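import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Host;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.CommitLevel;
import com.aerospike.client.policy.ReadModeAP;
import com.aerospike.client.policy.Replica;

// Set client default policies.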
ClientPolicy clientPolicy = new ClientPolicy();
clientPolicy.readPolicyDefault.replica = Replica.MASTER;
clientPolicy.readPolicyDefault.readModeAP = ReadModeAP.ONE;
clientPolicy.readPolicyDefault.socketTimeout = 100;
clientPolicy.readPolicyDefault.totalTimeout = 100;
clientPolicy.writePolicyDefault.commitLevel = CommitLevel.COMMIT_ALL;
clientPolicy.writePolicyDefault.socketTimeout = 500;
clientPolicy.writePolicyDefault.totalTimeout = 500;

// Connect to the cluster.
AerospikeClient client = new AerospikeClient(clientPolicy, new Host("seed1", 3000));

Set Per-Transaction Client Policies

To set policies on a per-transaction basis, pass the desired policy settings to the individual API call. For example, to perform writes with the master commit level:

// Make a copy of the client's default write policy.
WritePolicy policy = new WritePolicy(client.writePolicyDefault);

// Change commit level.
policy.commitLevel = CommitLevel.COMMIT_MASTER;

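// Example key and bins ("test", "demo", and "user1" are placeholder namespace, set, and key values).
Key key = new Key("test", "demo", "user1");
Bin[] bins = { new Bin("name", "Alice") };

// Write record with modified write policy.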
client.put(policy, key, bins);

The client policies provided at the client connection level can be overridden at the individual transaction level.

Policy Definitions

The following section describes the Aerospike Java client policies. Other clients use similar constructs.

Replica

Policy.replica specifies which replica the client reads from during a read. Write commands are always sent to the node that owns the master partition of the record.

note

When the client reads from a strong consistency namespace, the replica policy is ignored unless the SC read mode explicitly relaxes the strong consistency guarantees by selecting ALLOW_REPLICA or ALLOW_UNAVAILABLE.

SEQUENCE (default) — Read from the node that owns this record's master partition first. If a timeout occurs and retries are enabled, try a node that owns the record's replica partition.
PREFER_RACK — Read first from a node on the same rack as the client that holds either the master or a replica partition for this record. If no node on the client's rack holds a partition for this record, the client switches to SEQUENCE.
MASTER — Read from the node that owns this record's master partition.
MASTER_PROLES — Distribute reads across the nodes that own the record's master and replica partitions (proles) in round-robin fashion.
RANDOM — Distribute reads across all nodes in the cluster in round-robin fashion. Only recommended when the namespace replication-factor equals the number of nodes in the cluster.

tip

The load caused by reading a hot key can be reduced by up to a factor equal to the replication factor. Consider using Replica.MASTER_PROLES to distribute reads across master and replicas (proles).
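The following sketch models, in plain Java, how the replica settings above could map each read attempt to a node. It is a simplified illustration, not the Aerospike client's routing code: ReplicaMode, ReplicaModel, and pickNode are hypothetical names, node names are plain strings, and PREFER_RACK is left out because it depends on rack configuration.

```java
// Hypothetical model of per-attempt node selection for reads; not the client's routing code.
enum ReplicaMode { SEQUENCE, MASTER, MASTER_PROLES, RANDOM }

final class ReplicaModel {
    // Counter shared by the round-robin modes (MASTER_PROLES and RANDOM).
    private int roundRobin = 0;

    /**
     * Picks the node for one read attempt.
     *
     * @param partitionNodes nodes holding the record's partition; index 0 is the master, the rest are proles
     * @param clusterNodes   every node in the cluster
     * @param attempt        0 for the initial attempt, 1 for the first retry, and so on
     */
    String pickNode(ReplicaMode mode, String[] partitionNodes, String[] clusterNodes, int attempt) {
        switch (mode) {
            case MASTER:
                // Always the master, including on retries.
                return partitionNodes[0];
            case SEQUENCE:
                // Master first; each retry moves on to the next replica.
                return partitionNodes[attempt % partitionNodes.length];
            case MASTER_PROLES:
                // Spread reads over the master and proles.
                return partitionNodes[Math.floorMod(roundRobin++, partitionNodes.length)];
            case RANDOM:
                // Spread reads over every node in the cluster.
                return clusterNodes[Math.floorMod(roundRobin++, clusterNodes.length)];
            default:
                throw new IllegalArgumentException("unsupported mode: " + mode);
        }
    }
}
```

With a replication factor of 2, successive MASTER_PROLES reads of the same hot key alternate between the master and the prole in this model, which is the load-spreading effect the tip describes.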

AP Read Mode

For read commands against namespaces configured to operate in AP mode (high availability), Policy.readModeAP specifies how many copies of the record (master and replicas) should be consulted while the cluster is rebalancing data, in order to determine the most recent version of the record. This policy is ignored when the cluster is stable.

ONE (default) — Read a single replica. Might return a stale version of the record when the cluster is rebalancing.
ALL — Read from all the nodes holding the master and replica partitions of this record.
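As a rough illustration of the difference between ONE and ALL during rebalancing, the sketch below represents each copy of a record by its generation. It is a hypothetical model (ApReadMode, ApReadModel, and resolveGeneration are illustrative names), and it simplifies "most recent copy" to "highest generation"; the server's actual resolution logic may use other criteria.

```java
// Hypothetical model of ReadModeAP during rebalancing; not the server's logic.
// Each element of copyGenerations is the generation of one copy of the record.
enum ApReadMode { ONE, ALL }

final class ApReadModel {
    static int resolveGeneration(ApReadMode mode, int[] copyGenerations) {
        if (copyGenerations.length == 0) {
            throw new IllegalArgumentException("at least one copy is required");
        }
        if (mode == ApReadMode.ONE) {
            // Consult a single copy, which may be stale during rebalancing.
            return copyGenerations[0];
        }
        // ALL: consult every copy and keep the most recent (simplified to the highest generation).
        int newest = copyGenerations[0];
        for (int generation : copyGenerations) {
            newest = Math.max(newest, generation);
        }
        return newest;
    }
}
```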

You can dynamically override this client policy using the namespace configuration parameter read-consistency-level-override.

SC Read Mode

Policy.readModeSC determines the consistency guarantees for read commands against namespaces configured to operate with strong consistency (CP mode).

SESSION (default) — Ensures session consistency. The client sees an increasing sequence of record versions. The replica policy is ignored when this read mode is selected.
LINEARIZE — Ensures linearizability. All clients see only an increasing sequence of record versions. The replica policy is ignored when this read mode is selected.
ALLOW_REPLICA — The client may read from the master or any full (non-migrating) replica. Strong consistency guarantees that each read returns either the latest copy or a valid ancestor of the latest copy. This read mode combines with the replica policy.
ALLOW_UNAVAILABLE — The client may read from the master or any full (non-migrating) replica or from unavailable partitions. Strong consistency guarantees are relaxed, and an increasing sequence of record versions is not guaranteed. This read mode combines with the replica policy.

Send Key

If enabled, send key (Policy.sendKey) sends the user-defined key in addition to the hash digest on both reads and writes. If the key is sent on a write, the server stores it with the record and returns it to the client in primary and secondary index queries.

Socket Timeout

Socket timeout (Policy.socketTimeout) specifies socket idle timeout in milliseconds when processing a database command.

If socketTimeout is not zero and the socket has been idle for at least socketTimeout, both maxRetries and totalTimeout are checked. If maxRetries and totalTimeout are not exceeded, the transaction is retried.

If both socketTimeout and totalTimeout are non-zero and socketTimeout > totalTimeout, then socketTimeout will be set to totalTimeout.

If socketTimeout is zero, there will be no socket idle limit.
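The socketTimeout rules above can be restated as a small helper. This is a hypothetical function (TimeoutRules.effectiveSocketTimeout is not a client API) that returns the idle limit the rules say is applied, in milliseconds, with 0 meaning no idle limit.

```java
// Hypothetical restatement of the socketTimeout rules; not a client API.
final class TimeoutRules {
    /** Returns the socket idle timeout in ms that the rules apply; 0 means no idle limit. */
    static int effectiveSocketTimeout(int socketTimeout, int totalTimeout) {
        if (socketTimeout != 0 && totalTimeout != 0 && socketTimeout > totalTimeout) {
            // socketTimeout is capped at totalTimeout.
            return totalTimeout;
        }
        return socketTimeout;
    }
}
```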

Total Timeout

Total timeout (Policy.totalTimeout) specifies total transaction timeout in milliseconds.

The totalTimeout is tracked on the client and also sent to the server with the transaction in the wire protocol. The client will most likely time out first, but the server can also time out the transaction.

If totalTimeout is not zero and totalTimeout is reached before the transaction completes, the transaction will abort with a timeout exception.

note

Setting Policy.totalTimeout to 0 is equivalent to setting no timeout at all on the client side, with the result that the server uses its default timeout setting instead.

Max Retries

Max retries (Policy.maxRetries) specifies the maximum number of retries before aborting the current transaction. The initial attempt is not counted as a retry.

If maxRetries is exceeded, the transaction will abort with a timeout exception.

note

Database writes that are not idempotent, such as add(), should not be retried, because the write may be applied multiple times if the client timed out on earlier attempts that the server actually completed. Use a separate WritePolicy with maxRetries set to zero for non-idempotent writes.

Default for read: 2 (initial attempt + 2 retries = 3 attempts)

Default for write/query/scan: 0 (no retries)
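To make the interaction between socketTimeout expiry, totalTimeout, and maxRetries concrete, the following sketch simulates a command on a fake clock. It assumes every failed attempt is a socket timeout that qualifies for a retry and that every attempt takes the same time. RetrySimulation and its members are hypothetical and not part of any Aerospike client.

```java
// Hypothetical simulation of the socketTimeout / totalTimeout / maxRetries rules.
// Every failed attempt is treated as a socket timeout eligible for retry, and every
// attempt takes attemptMillis on a simulated clock. Not the client's implementation.
final class RetrySimulation {
    static final class Result {
        final boolean success;
        final int attempts;

        Result(boolean success, int attempts) {
            this.success = success;
            this.attempts = attempts;
        }
    }

    /**
     * @param maxRetries      retries allowed after the initial attempt
     * @param totalTimeout    overall limit in ms; 0 means no limit
     * @param attemptSucceeds whether attempt i would succeed (missing entries count as failures)
     * @param attemptMillis   simulated duration of each attempt in ms
     */
    static Result run(int maxRetries, int totalTimeout, boolean[] attemptSucceeds, int attemptMillis) {
        long elapsed = 0;
        int attempts = 0;
        while (true) {
            boolean ok = attempts < attemptSucceeds.length && attemptSucceeds[attempts];
            attempts++;
            elapsed += attemptMillis;
            boolean pastDeadline = totalTimeout != 0 && elapsed > totalTimeout;
            if (ok && !pastDeadline) {
                return new Result(true, attempts);
            }
            // The initial attempt is not counted as a retry.
            boolean retriesExhausted = attempts - 1 >= maxRetries;
            boolean deadlineReached = totalTimeout != 0 && elapsed >= totalTimeout;
            if (retriesExhausted || deadlineReached) {
                // A real client would report this as a timeout exception.
                return new Result(false, attempts);
            }
        }
    }
}
```

With maxRetries set to 2 and no total timeout, three failing attempts are made before the simulated command gives up, matching the read default above (initial attempt + 2 retries).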

Sleep Between Retries

Sleep between retries (Policy.sleepBetweenRetries) is the number of milliseconds to sleep between retries. Set it to zero to skip sleeping. This field is ignored when maxRetries is zero, and it is also ignored in async mode.

The sleep only occurs on connection errors and server timeouts which suggest a node is down and the cluster is reforming. The sleep does not occur when the client's socketTimeout expires.

Reads do not have to sleep when a node goes down because the cluster does not shut out reads during cluster reformation. The default for reads is zero.

The default for writes is also zero because writes are not retried by default. Writes can wait for the cluster to reform when a node goes down. Immediate write retries on node failure have been shown to consistently result in errors. If maxRetries is greater than zero on a write, then sleepBetweenRetries should be set high enough to allow the cluster to reform (>= 500ms).
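The conditions under which sleepBetweenRetries applies can be summarized as a decision function. This is a hypothetical sketch (FailureKind, SleepRule, and sleepBeforeRetry are illustrative names) that encodes only the conditions stated in this section.

```java
// Hypothetical model of when sleepBetweenRetries applies; not client code.
enum FailureKind { CONNECTION_ERROR, SERVER_TIMEOUT, CLIENT_SOCKET_TIMEOUT }

final class SleepRule {
    /** Milliseconds to sleep before the next retry under the rules in this section. */
    static int sleepBeforeRetry(FailureKind failure, int sleepBetweenRetries, int maxRetries, boolean async) {
        if (maxRetries == 0 || async) {
            // Field is ignored: there are no retries, or the command is asynchronous.
            return 0;
        }
        if (failure == FailureKind.CLIENT_SOCKET_TIMEOUT) {
            // No sleep when the client's own socketTimeout expires.
            return 0;
        }
        // Connection errors and server timeouts suggest the cluster is reforming.
        return sleepBetweenRetries;
    }
}
```

In the Java client, an idempotent write that should survive a cluster reformation would set both fields on its own policy, for example writePolicy.maxRetries = 2 and writePolicy.sleepBetweenRetries = 500, keeping the sleep at or above the 500 ms guideline.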

Write Mode

WritePolicy.recordExistsAction specifies how to handle a write depending on whether the record already exists.

CREATE_ONLY — Insert the record, and fail if it already exists.
UPDATE_ONLY — Update the record, and fail if it does not exist. Merges new bin data into the existing record.
UPDATE (default) — Update or insert (upsert) the record. Merges new bin data if the record exists.
REPLACE — Create or replace record. Delete existing bins not mentioned in this write operation.
REPLACE_ONLY — Replace the record, and fail if it does not exist. Delete existing bins not mentioned in this write operation.
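The five values differ in two ways: whether the record must or must not already exist, and whether new bins are merged into or replace the existing bins. The in-memory sketch below models both. It is hypothetical (ExistsAction and WriteModeModel are illustrative names, and a record is modeled as a map of bin names to values), and it returns false where the server would return an error instead of modeling the actual error codes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of recordExistsAction semantics; not the server's code.
enum ExistsAction { UPDATE, UPDATE_ONLY, CREATE_ONLY, REPLACE, REPLACE_ONLY }

final class WriteModeModel {
    // key -> record, where a record maps bin name -> value
    final Map<String, Map<String, Object>> store = new HashMap<>();

    /** Applies a write; returns false where the server would return an error. */
    boolean write(ExistsAction action, String key, Map<String, Object> bins) {
        Map<String, Object> existing = store.get(key);
        switch (action) {
            case CREATE_ONLY:
                if (existing != null) return false;   // record already exists
                store.put(key, new HashMap<>(bins));
                return true;
            case UPDATE_ONLY:
                if (existing == null) return false;   // record not found
                existing.putAll(bins);                // merge new bins into the record
                return true;
            case UPDATE:
                // Upsert: create if missing, otherwise merge.
                store.computeIfAbsent(key, k -> new HashMap<>()).putAll(bins);
                return true;
            case REPLACE_ONLY:
                if (existing == null) return false;   // record not found
                store.put(key, new HashMap<>(bins));  // drop bins not in this write
                return true;
            case REPLACE:
                store.put(key, new HashMap<>(bins));  // create or replace
                return true;
            default:
                throw new IllegalArgumentException("unsupported action: " + action);
        }
    }
}
```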

Write Commit Level

WritePolicy.commitLevel specifies whether the node that owns the master partition of the record must wait until it successfully writes to all replicas before it returns success. Write commands include insert, update, upsert, delete, and UDF calls.

COMMIT_ALL (default) — Wait until the node that owns the master partition writes to all replicas.
COMMIT_MASTER — Return success after writing to the master replica, and replicate to the prole replica(s) asynchronously.

COMMIT_ALL is required when writing to a strong consistency namespace; otherwise, a write error occurs.

note

Starting with Database 5.7, if the client sends write transactions faster than the server's replication system can handle, backpressure causes the server to convert write transactions to COMMIT_ALL.

You can dynamically override this client policy using the namespace configuration parameter write-commit-level-override.

Write Generation Policy

The generation policy (WritePolicy.generationPolicy) specifies how to handle record writes based on record generation.

Record generation is an internal integer counter that Aerospike increments every time a record is updated. ("Generation" in this context means "version", not "the act of generating".) When a record is inserted, the counter starts at 1, so a record whose counter is currently 5 has been updated four times. Client applications cannot directly change the value of the counter, and reading a record does not increment it.

When Aerospike is in Available and Partition-tolerant (AP) mode, Aerospike resets a record's counter to 1 after it has been updated 64K times. When Aerospike is in strong-consistency mode, it resets a record's counter to 1 after the record has been updated 1K times.

Client applications can use this counter to coordinate a read-modify-write sequence of commands with other client applications.

For example, suppose a client application needs to read data from a record, modify the data, and then write the modified data back into the record. Reading the record requires a lock on it, as does writing to the record. However, during the time the client app modifies data, it holds no lock on the record. Another client app can update the same record before the first client app is able to obtain a write lock and write the modified data.

If the generation policy is set to EXPECT_GEN_EQUAL or EXPECT_GEN_GT, a read-modify-write sequence works as follows:

- During the read operation, the client app also reads the record's generation value.
- When the client app writes the modified data, it sends the generation value it read along with the write (in the Java client, WritePolicy.generation).
- While holding the write lock on the record, the server compares the generation value sent by the client with the record's current generation value and applies or rejects the write according to the generation policy.

Possible values:
- NONE (default):
  - Client apps do not use the record-generation counter to restrict writes. The generation value is not checked, and the write is applied regardless of the record's current generation.
- EXPECT_GEN_EQUAL:
  - The server updates or deletes the record only if the generation value sent by the client is equal to the record's current generation value on the server.
  - Otherwise, the write command fails, which means another client updated the record after it was read. The client app can retry the read-modify-write sequence.
- EXPECT_GEN_GT:
  - The server updates or deletes the record only if the generation value sent by the client is strictly greater than the record's current generation value on the server.
  - Otherwise, the write command fails, and the client app can retry it.
  - This value is useful when restoring records from a backup and you want to overwrite only records whose copy on the server is older than the backed-up version.

When the generation check fails under EXPECT_GEN_EQUAL or EXPECT_GEN_GT, the write command fails with error code 3, AS_ERR_GENERATION. In this case, the fail_generation and the generic client_write_error statistics are incremented on the server.
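The following sketch models the generation check on a single in-memory record and walks through an optimistic-concurrency conflict. It is a hypothetical illustration (GenPolicy, GenerationModel, and its members are not client APIs); it mirrors the rules in this section but does not model generation wrap-around, and it simply increments the counter on every successful write.

```java
// Hypothetical in-memory model of generation-checked writes; not the server's code.
enum GenPolicy { NONE, EXPECT_GEN_EQUAL, EXPECT_GEN_GT }

final class GenerationModel {
    static final int OK = 0;
    static final int GENERATION_ERROR = 3; // error code 3 from the text above

    String value;
    int generation;

    GenerationModel(String value) {
        this.value = value;
        this.generation = 1; // counter starts at 1 on insert
    }

    /** Returns OK, or GENERATION_ERROR when the generation check fails. */
    synchronized int write(GenPolicy policy, int expectedGeneration, String newValue) {
        boolean allowed;
        switch (policy) {
            case EXPECT_GEN_EQUAL:
                allowed = expectedGeneration == generation;
                break;
            case EXPECT_GEN_GT:
                allowed = expectedGeneration > generation;
                break;
            default:
                allowed = true; // NONE: no generation check
                break;
        }
        if (!allowed) {
            return GENERATION_ERROR;
        }
        value = newValue;
        generation++; // in this model, every successful write increments the counter
        return OK;
    }
}
```

In the Java client, the equivalent loop reads the record, copies record.generation into WritePolicy.generation, sets WritePolicy.generationPolicy to GenerationPolicy.EXPECT_GEN_EQUAL, and restarts from the read when the write fails with result code 3.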

Expiration (Time To Live)

Record expiration (WritePolicy.expiration), or time to live (TTL), is the number of seconds the record will live before being removed by the server. Expiration values:

-2 — Do not change ttl when record is updated.
-1 — Never expire.
0 — Default to namespace configuration variable "default-ttl" on the server.
> 0 — Actual ttl in seconds.
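The expiration values above amount to a lookup from the value sent with a write to the TTL the record ends up with. This hypothetical helper (ExpirationRules.resolveTtl is not a client API) assumes the record already exists when -2 is used, and it does not model a namespace default-ttl of 0.

```java
// Hypothetical interpretation of WritePolicy.expiration values; not a client API.
final class ExpirationRules {
    static final int NEVER_EXPIRE = -1;

    /**
     * @param expiration       value sent with the write
     * @param currentTtl       existing record's TTL in seconds (used for -2)
     * @param namespaceDefault namespace default-ttl in seconds (used for 0)
     * @return resulting TTL in seconds, or NEVER_EXPIRE
     */
    static int resolveTtl(int expiration, int currentTtl, int namespaceDefault) {
        if (expiration == -2) return currentTtl;      // do not change the TTL
        if (expiration == -1) return NEVER_EXPIRE;    // never expire
        if (expiration == 0) return namespaceDefault; // use the namespace default-ttl
        if (expiration > 0) return expiration;        // explicit TTL in seconds
        throw new IllegalArgumentException("unsupported expiration: " + expiration);
    }
}
```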

Aerospike Database 7.1 introduced LRU eviction behavior. With client versions that implement this functionality, Policy.readTouchTtlPercent controls when reads extend a record's void-time, regardless of namespace configuration:

A value of 0 instructs the server to use the default-read-touch-ttl-pct of the namespace or set.
A value of -1 states that this read operation will never modify the record's TTL.
A value of 1 to 100 means the read also touches the record, extending its TTL, if the time remaining until the record's void-time is within this percentage of the TTL set by the most recent write.
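Assuming the percentage is measured against the TTL set by the most recent write, the read-touch decision can be sketched as follows. ReadTouchModel and readTouches are hypothetical names. The sketch assumes a value of 0 has already been resolved to the namespace or set default, and it treats anything outside 1 to 100 as "never touch".

```java
// Hypothetical model of the read-touch decision; not the server's code. Times are in seconds.
final class ReadTouchModel {
    /**
     * @param pct           effective percentage (0 already resolved to the namespace/set default)
     * @param lastWriteTtl  TTL sent on the most recent write
     * @param secondsToVoid time remaining until the record's void-time when the read happens
     */
    static boolean readTouches(int pct, long lastWriteTtl, long secondsToVoid) {
        if (pct < 1 || pct > 100) {
            return false; // -1 (never modify TTL) or an unresolved value
        }
        long window = lastWriteTtl * pct / 100; // touch window before void-time
        return secondsToVoid <= window;
    }
}
```

For example, with a 10-hour TTL on the last write and a value of 80, a read 1 hour after the write (9 hours before void-time) does not touch the record in this model, while a read 3 hours after the write (7 hours before void-time) does.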

Durable Delete

When WritePolicy.durableDelete is true, a record delete leaves a tombstone. This prevents deleted records from reappearing after node failures.

note

When a namespace is configured with strong consistency, regular deletes (expunges) are blocked unless the configuration parameter strong-consistency-allow-expunge is used to relax strong consistency.
