Aerospike 8 adds distributed ACID transactions for developers
Dive into Aerospike 8.0's groundbreaking distributed ACID transactions, designed to enhance consistency and scalability for developers.
Aerospike has long been known as one of the world's fastest, most scalable, and most resilient databases. Its support for guaranteed consistency, even during network splits, makes it suitable for use cases where consistency is paramount, such as real-time payment systems. It is used across a wide range of mission-critical use cases, such as real-time bidding in AdTech, fraud detection, recommendation systems, customer 360, and so on.
Good as it is, Aerospike has been limited in some use cases because its unit of atomicity was a single record. It was not possible to coordinate updates to two or more records with atomic guarantees. Now, with version 8.0, Aerospike introduces distributed ACID transactions to solve this shortcoming.
The need for ACID transactions
ACID (atomicity, consistency, isolation, durability) transactions have long been a feature of relational databases. However, NoSQL databases are typically built for low latency and high throughput, and transactions inherently require locking, limiting both of these important factors.
In Aerospike, the record associated with a single key can be very complex, having multiple columns, with each column able to be a complex nested document. Multiple operations could be executed within a single command on a single record atomically, so this allowed some level of transactional guarantees – assuming the objects to be altered within a transaction could be grouped into a single record.
In some cases, this makes sense. For example, consider a Person object that has Addresses. These addresses may have no meaning to the business if the owning person goes away, so a common practice would be to embed (denormalize) the addresses inside a document in the record for the person. Deleting the person, therefore, removes the addresses, and addresses could be updated atomically with the person.
There are, however, many situations where this technique is not applicable.
Consider, for example, the classic banking case of transferring money between two different accounts. These accounts might belong to the same person or different people, making nesting them into a single record infeasible.

Here, we want to send $100 from Account 1 with an initial balance of $1,000 to Account 2 with an initial balance of $2,000. The two operations of the debit from Account 1 and the credit to Account 2 must either both succeed, or both fail. There can be no possibility of one operation failing and the other succeeding, nor at any time should the sum of the balances of Account 1 and Account 2 not be $3,000, no matter how transiently.
For this, ACID transactions are necessary.
Transactional properties in Aerospike
Given the high-throughput, low-latency nature of Aerospike, two design criteria were mandated for transactions:
Strict Serializability: The transactions should have the highest level of consistency guarantee, which is strict serializability. This means that in all situations, the records will be committed or rolled back atomically. Strict serializability means that transactions with disjoint keys cannot suffer from “causal reversal”.
Performant. Aerospike is known for predictable low latency for reads and writes, so the design called for as low additional latency as could be achieved while still maintaining strict serializability. Aerospike’s design attempts to minimize the number of locks needed in any transaction. For example, if a record is read in a transaction, no locks are taken so multiple transactions can read the same record concurrently, without the cost of creating a lock. However, to ensure the record has not changed during the transaction, the version of the record is verified when committing the transaction. This ensures it matches the version that was initially read.
Aerospike’s implementation of ACID transactions
Let’s look at how a transaction is implemented in Aerospike:
1: Txn txn = new Txn();
2:
3: WritePolicy wp = client.copyWritePolicyDefault();
4: wp.txn = txn;
5:
6: client.add(wp, key1, new Bin("balance", -100));
7: client.add(wp, key2, new Bin("balance", 100));
8: client.commit(txn);
This code is fairly simple if you’re used to using Aerospike. Some key points to note:
Line 1: Here we create a new Txn object. This object contains an id (a randomly generated long value), some configuration parameters like the timeout, and the internal state of the transaction.
Line 4: Most policies in Aerospike now have a txn field. By default, this is null, which means the command does not exist within a transaction. Setting it to be a transaction object, as shown in this case, includes that command within the transaction.
Lines 6 and 7: The policy is passed to the command, making these commands part of the same transaction.
Line 8: This line will commit the transaction, ensuring that changes to the two records are visible to all operations.
Note that with this syntax, it is possible, but highly unlikely, to have one thread start multiple different transactions, none of which have been committed or aborted (outside of testing scenarios).
Transaction timeout
Transactions are not designed to be long-running. They have a timeout that specifies the maximum anticipated life of the transaction. Any transaction that exceeds its timeout will correctly be handled by the system. This is almost always to abort the transaction. The only exception is If a transaction times out during a commit operation and the system is far enough along to guarantee success, then the system will finish committing the transaction.
This means that even in the face of system issues, such as cluster node failure, or a client application dying in the middle of a transaction, the data will still maintain transactional integrity.
The timeout for a transaction is specified in the Txn object. It is a value specified in seconds and must be in the range between 0 to 120 seconds inclusive. To set the value to 20 seconds, the following code could be used:
Txn txn = new Txn();
txn.setTimeout(20);
Note that if the timeout is set to zero (the default), the transaction timeout will get the value of the mrt-duration
configuration parameter on the server. This defaults to 10 seconds.
The transaction timeout is not started until the first write occurs within the transaction. Upon receiving this write, Aerospike will create a “monitor record” which is persisted in the database. This monitor record contains the state of the transaction, a list of all the records changed or deleted in the transaction, and the expiry epoch. The epoch is calculated using the server time rather than the client time to ensure consistency if the client clock is not synchronized with the server.
If the transaction exceeds this transaction epoch, then an exception with a result code of MRT_EXPIRED will be thrown the next time the transaction is used.
Visibility of items changed in a transaction
Let’s see what happens when the values changed within a transaction are read. There are three distinct cases to look at. Assume for the following examples that the initial value of the age
field was 1:
1. Reads within the same transaction: If a record has been changed and is read within the transaction that changed it, the new value will be returned. Consider the following code:
Txn txn = new Txn();
WritePolicy wp = client.copyWritePolicyDefault();
wp.txn = txn;
client.put(wp, key, new Bin("age", 2));
// Read in the same transaction
Policy rp = client.copyReadPolicyDefault();
rp.txn = txn;
int age = client.get(rp, key).getInt("age");
The put
and get
commands both use the same transaction. Hence, irrespective of the old value of the age
bin on the record, the get()
call will return 2 and set this value in the age
variable.
2. Commands not in a transaction: If an uncommitted transaction has modified a record and that record is read outside of any transaction, the old value will be returned. For example:
Txn txn = new Txn();
WritePolicy wp = client.copyWritePolicyDefault();
wp.txn = txn;
client.put(wp, key, new Bin("age", 2));
// Non-transactional read
int age = client.get(null, key).getInt("age");
Here the get has no policy attached to it, so it is outside any transaction. In this case, it will return the original value, that is 1.
3. Commands in a different transaction: If a second transaction tries to read a record that is modified by an uncommitted transaction, the second transaction will fail as it cannot know if the modifying transaction will commit or rollback, so the value is indeterminate.
Consider the following code:
Txn txn = new Txn();
WritePolicy wp = client.copyWritePolicyDefault();
wp.txn = txn;
client.put(wp, key, new Bin("age", 2));
// Read in a different transaction
Txn txn2 = new Txn();
Policy rp = client.copyReadPolicyDefault();
rp.txn = txn2;
int age = client.get(rp, key).getInt("age");
The put
and get
commands are in different transactions. The get
command executes after the put
, so the record is locked by the modifying transaction. The get
will fail with an exception which has a result code of MRT_BLOCKED.
Error handling
If an exception happens during the processing of a transaction and it cannot successfully complete, the transaction should be rolled back. This can be achieved with the abort() method. This will remove any provisional records created during the transaction (discussed later) and ensure the state of the records changed during the transaction are the same as they were prior to the changes made during the transaction.
Hence, a more apt pattern would be similar to:
Txn txn = new Txn();
try {
WritePolicy wp = client.copyWritePolicyDefault();
wp.txn = txn;
client.add(wp, key1, new Bin("balance", -100));
client.add(wp, key2, new Bin("balance", 100));
client.commit(txn);
}
catch (AerospikeException ae) {
client.abort(txn);
}
If any command fails within the transaction, the abort() method will be called, rolling back the transaction.
There are several scenarios where a single command can fail in a transaction with multiple commands due to conflicts with other Aerospike transactions. There are other reasons a single command might experience a transient failure too, such as writing to a node that has failed and the cluster has not had time to self-heal.
In both these cases, it may be possible to retry the single command that has failed and not re-do the whole transaction. There are only certain failure reason codes where this would be feasible. For example, if the record has not been read in the transaction and is being written and encounters an MRT_BLOCKED exception, retrying would be feasible; another transaction is modifying the record and this write may succeed once the other transaction has completed.
However, if the record has already been read in the transaction and a write conflict occurs, the write cannot just be retried; the version of the read will no longer match and, hence, will always fail.The whole transaction must be restarted.
Additionally, handling various error codes, especially when performing non-idempotent transactions like incrementing an account balance, can be very complex. For this reason, it is recommended that, in almost all cases, the whole transaction be aborted and restarted.
Thus, it is not unusual to have code like:
while (true) {
Txn txn = new Txn();
try {
WritePolicy wp = client.copyWritePolicyDefault();
wp.txn = txn;
client.add(wp, key1, new Bin("balance", -100));
client.add(wp, key2, new Bin("balance", 100));
client.commit(txn);
break;
}
catch (AerospikeException ae) {
client.abort(txn);
Thread.sleep(500);
}
}
This code will keep attempting to execute the transaction with any errors causing it to retry. The most likely scenarios why this transaction might fail and need to be retried include:
Another transaction has already locked one of the two records needed for this transaction. In this case, an exception with a ResultCode of MRT_BLOCKED would be thrown. Subsequent retries of the whole transaction after a delay would allow the other transaction to complete, permitting this transaction to continue.
The transaction has exceeded the maximum timeout. In this case, the exception thrown will have a ResultCode of MRT_EXPIRED. The transaction can be retried, potentially with a longer timeout.
A node in the cluster failed catastrophically without going through a clean shutdown. This could be caused by a hardware failure for example. In this case, requests to the node would experience a timeout. Network requests to this node would then fail until the cluster readjusts to cater for the failed node. This process typically takes a couple of seconds, but depends on your system configuration. During this time, an exception would be thrown with inDoubt set to true.
Commit failure handling
There is one situation where error handling becomes very important: if the commit itself fails. There are several reasons that a transaction can fail. These include:
Read verification fails with a result code of MRT_BLOCKED. The transaction needs to be retried.
The maximum time for the transaction has been exceeded, giving a result code on MRT_EXPIRED. The transaction needs to be retried.
The state of the transaction cannot be verified, most likely due to an issue updating the internal state of the monitor record. The commit can be retried, normally after a delay, to allow Aerospike to discover and rectify the issue.
This last point can lead to code like:
while (true) {
try {
client.commit(txn);
break;
}
catch (AerospikeException ae) {
if (ae.getInDoubt()){
// special handling for in doubt case -
// will need to attempt to commit again
try {
Thread.sleep(200);
}
catch (InterruptedException ignored) {}
}
else if (ae.getResultCode() == ResultCode.MRT_COMMITTED) {
// Great, all done!
break;
}
else if (ae.getResultCode() == ResultCode.MRT_ABORTED) {
// already aborted
throw ae;
}
else {
try {
client.abort(txn);
}
catch (AerospikeException ignored) {}
throw ae;
}
}
}
It should be noted that a transaction that has previously been committed may be committed again and will succeed, or a transaction that has previously been aborted may be aborted again successfully. However, performing a commit()
after a successful abort()
call or an abort()
call after a successful commit()
call will lead to an exception being thrown.
This is not the only way to handle these exceptions, but it would work in many use cases. It is likely that Aerospike will build some logic similar to the above into the client calls in the future.
Limitations
Aerospike’s transactions are designed to be fast and scalable. While there are a number of limitations to these transactions, they are unlikely to impact applications in most business use cases.
A maximum of 4096 different records can be written in a single transaction. Exceeding this limit will result in an exception with a result code of MRT_TOO_MANY_WRITES being thrown. There is no limit on the number of records that can be read in a single transaction.
ACID transactions can only run inside a namespace configured with
strong-consistency
set totrue
. The whole point of transactions is to ensure data integrity between records, and availability mode cannot guarantee correctness to ensure this data integrity is preserved.All deletes within a transaction must be durable deletes. Using a non-durable delete can introduce a corner case where correctness cannot be guaranteed. It is recommended that any namespace that uses transactions be configured with
strong-consistency-allow-expunge
to befalse
to prevent accidental non-durable deletes.Transaction timeouts cannot exceed 120 seconds. For this reason, it is important to avoid “conversational transactions,” where the transactions wait for external systems or users to respond before progressing with the transaction unless strict SLAs are enforced. Waiting for user input within a transaction is never recommended.
All records forming part of a transaction must be in the same namespace. The Aerospike client stores lists of records that have been read and written as part of this transaction, as well as the monitor record keeping a list of records that have been modified. These lists just use the Aerospike digest to uniquely identify a record and do not include the extra overhead of the containing namespace.
ACID transactions will not block if there is a conflict with another transaction; instead they will fail as soon as the conflict is detected. This is typically faster in systems with many short-lived transactions across a large number of records, such as the account transfer example presented here. In systems where there is a small pool of records across a very large number of transactions, care must be taken to retry on MRT_BLOCKED result code. An example of this would be a seat reservation system for a newly released concert with a very large fan base.
Queries are not supported in transactions. Batch reads and point reads, however, are, so if queries are really needed in a transaction, the query can be performed with
includeBinData
set tofalse
, and the resulting matched records could be read in a batch.Truncates should not be performed on a namespace with active transactions. This can cause havoc with transaction state, such as read verification if the record to be verified has been removed!
Internal workings
Let’s take a look at how transactions work in Aerospike. While this is not mandatory to understand, knowing this will help with making sure your programming models are efficient and cover all error cases. This is not a comprehensive discussion – other blogs published by Aerospike will cover this topic in more depth.
Reading a record
When a record is read for the first time in a transaction, the Aerospike client library will read the version of a record as well as its contents. This version changes any time a write of the record occurs, and is retained by the Aerospike client library in a list of reads for that transaction.
As mentioned earlier, no locks are held against the read, so multiple different transactions can read the same record at the same time without the introduction of read locks.
Writing a record
When a write occurs on a record for the first time in a transaction, Aerospike attempts to obtain a write lock on the record. If the record is not locked (i.e., it is not currently being modified in another transaction), then Aerospike will obtain the lock. When Aerospike obtains a lock on the record, it creates a transient second copy of the record holding the updated version on the record after the commands in the transaction have been applied. This copy is called the provisional record, and the presence of the provisional record implies the record is locked.
If the record was already locked by a different transaction, then an exception with reason code MRT_BLOCKED is thrown. Note that the same transaction can change a single record multiple times within the same transaction.
The normal Aerospike primary index points to this provisional (modified) record, but this primary index also now holds a reference to a second primary index entry. This entry points to the original record before the transactional changes were applied. This original primary index entry does not appear in the normal Aerospike primary index trees; the only reference to it is held in the provisional primary index.
The primary index structure refers to the provisional record’s entry, and the id of the transaction that created this copy is also stored in the primary index entries.
Diagrammatically, this looks like:

The Aerospike client library also holds a list of records that have been written in this transaction. This list is also held on the servers in the monitor record.
Committing a transaction
The commit of a transaction does a reasonable amount of work. This includes:
For every read in the list of reads, the record is read and the version checked. If the current version is not the same as the version that was read, an
AerospikeException.Commit
exception is thrown with a result code of TXN_FAILED. This class knows exactly which records failed read verification if this is important to the application.
These reads are done with linearizability correctness to ensure there is no chance of a stale read. This allows the transaction to be serializable.The transaction is marked as “roll-forward” on the server. Until this time, the transaction is marked as “roll-backwards,” which means that if the transaction is abandoned due to a timeout or a failure, Aerospike will automatically abort the transaction. However, once marked as “roll-forward,” the transaction will be committed even if it’s abandoned.
Every record that was changed in the transaction removes the link to the original record, thereby promoting the provision record to be the record in the database. The original records are removed from the database.
The transaction metadata is removed by deleting the monitor record.
Multiple reads and writes
Let’s look at some scenarios featuring multiple reads and writes. For the sake of this discussion we will use a circle to denote a read, with the record being read in it, and a write will be designated by a square. Different transactions will appear in different colors.

So, the example illustration above will represent a read of record A, followed by a write of record B in one transaction.
Read promoting
A read promotion happens when a single transaction reads a record (A, at time t0) and later in the same transaction writes the same record. As mentioned above, when the read occurs, the record is not locked, but rather the version of the record remembered.

When a write to A occurs at time t1, Aerospike performs the following atomically under a record lock:
Checks the current version of the record against that read at time t0. If the versions don’t match then the record has been changed since it was read. The read will fail with a result code of MRT_VERSION_MISMATCH. This allows an “early fail” mechanism and is simpler than the read verification at step 1 on the commit as there is only one record being promoted.
A lock is taken on the record and the provisional record created like a standard write.
The read is removed from the read verification list in the client as there’s now a lock on the record to prevent any further changes except from this thread, so read verification is no longer needed.
Reads in multiple transactions
In this scenario, two transactions overlap but only share a record (or set of records) that are read. Those records (“A” in this example) are never changed. In this case, both transactions would succeed.

Reads and writes in multiple transactions
Consider a similar case now, but in this case the left-most transaction actually alters the state of record A after reading it. In this case, the read on the right transaction will succeed (as the object has not been modified at that point), but the commit of the transaction at time T3 will fail as the read verification of record A will return a different version than was read at time T1.
Note that there are other timings in this scenario where the read of A could actually fail instead. If the read of A on the right-most transaction occurred at time T3 instead of T1, the left transaction would have record A locked, causing a read failure on the right transaction at that point.

Writes can also conflict, and this is probably the more likely scenario. This diagram shows only writes conflicting on the B record. Note that since Aerospike serializes write access to a record, it is impossible for two transactions to update the same record at exactly the same time — one will always occur before the other.
In this diagram, the right transaction modifies record B first. When the left transaction tries to modify the same record an exception with reason code MRT_BLOCKED would be thrown since the lock could not be obtained.

Explore distributed ACID transactions with Aerospike 8
Aerospike’s ACID transactions provide serializable transactions at scale with only minor changes to the API developers use. There are some caveats to be aware of, but generally, transactions are easy to use and scale very well. Transactions do involve additional reads and writes of the database (as transactions require locking); however, the impact of these in terms of latency is often smaller than anticipated. Many different use cases are now achievable with the use of transactions at high speed and high scale.