Skip to main content
Loading

XDR record shipment lifecycle

This page describes the stages of Aerospike's Cross Datacenter Replication (XDR) record shipment lifecycle, the configuration parameters that control that lifecycle, and the metrics that monitor the stages.

Overviewโ€‹

XDR delivers data asynchronously from an Aerospike source cluster to a variety of destinations.

  • You can start writing records as soon as you enable a namespace.
  • The system logs and collects statistics on the progress through the in-memory XDR transaction queue even if the datacenter is not yet connected.
  • The system starts shipping as soon as the connection to the datacenter is established.
note

For more details see the XDR architecture page.

XDR record shipment lifecycleโ€‹

The XDR shipment lifecycle begins when a record is successfully written in the source partition, then submitted to the per-partition, in-memory XDR transaction queue of each datacenter.

XDR transaction queueโ€‹

The size of the transaction queue is controlled by transaction-queue-limit. If the transaction queue fills faster than XDR can ship to the remote destination, Aerospike switches to recovery mode. After XDR catches up, it switches back to using the transaction queue.

There is a distinct XDR transaction queue for each datacenter, namespace and partition permutation. This means that each record modified in a namespace enabled for XDR shipping is placed according to its partition in the correct XDR transaction queues. Because a namespace may be configured to ship to multiple remote destinations, the record's metadata may be placed in multiple XDR transaction queues simultaneously.

Transaction threadsโ€‹

The various transaction threads manage a record throughout the lifecycle. The threads include:

  • dc thread - sequentially processes all the partitions at the source node for a specific remote datacenter. This thread processes all pending entries in the XDR transaction queues and retry threads.
  • service thread - receives record from dc thread, reads it locally and prepares it for shipment.
  • recovery thread - reads updates from the primary index when XDR is in recovery mode.

XDR recovery modeโ€‹

When the XDR transaction queue is full, the queue is dropped and XDR goes into recovery mode. In this mode, XDR scans the full primary index to compare a record's last ship time (LST) to its last update time (LUT)) in order to ship instead of using the transaction queue.

Diagram explanationโ€‹

The following diagram shows the lifecycle of a record from a single partition in a single namespace in a single datacenter. These processes occur for all partitions and namespaces in a live environment.

  • Dashed boxes indicate a process stage.
  • Bullets indicate how the metric or stage names are logged.

Each stage is described immediately following the diagram.

Fig. 1 - XDR record shipment lifecycle, with metrics

Stage 1 - Write to source partitionโ€‹

After a successful write in the source partition, the record is submitted to the per-partition, in-memory transaction queue of each datacenter.

  • If a duplicate key is found in the XDR transaction queue within the time specified in hot-key-ms, the duplicate is not inserted into the queue. This reduces the number of times the same version of a hot key ships.

Stage 2 - Read from queueโ€‹

The dc thread picks the record from the transaction queue and forwards it to the service thread.

  • The run period of the dc thread is controlled by period-ms. If there is a delay in shipping a write transaction, period-ms is the likely source.
  • If the recordโ€™s last update time (LUT) is within the time specified in delay-ms from the current time, the dc thread skips it, and does not yet hand it to a service thread.
  • (strong consistency mode) The sc-replication-wait-ms parameter provides a default delay in strong consistency mode (SC mode) to prevent XDR from attempting to ship records before their initial replication is complete.

Stage 3 - Send to service thread and remote destinationโ€‹

The service thread reads the record locally, prepares the record for shipment, and ships it to the remote destination. The max-throughput to the remote destination is applied on a per-datacenter, per-namespace basis.

  • In SC mode, if the record is unreplicated, XDR triggers a re-replication by inserting a transaction into the internal transaction queue. A service thread picks up the transaction from the internal transaction queue and checks if the transaction timed out.
    • If it timed out, it does not re-replicate and XDR does not ship the record to the destination. A future client read/write will trigger re-replication and may succeed in shipping to destination.
    • If not timed out, the service thread re-replicates the record. The re-replication makes an entry into the XDR in-memory transaction queue. This re-replication also may time out and leave the record unreplicated. However, the entry in the XDR in-memory transaction queue triggers another round of re-replication immediately.

Stage 4 - Send responseโ€‹

The remote destination attempts to write the record and returns the completion state of the transaction to the source datacenter. The completion state can be success, temporary failure (key busy or device overload), or permanent error (such as record too big).

Stage 5 - Retry if necessaryโ€‹

If there is a timeout or temporary error, the service thread sends the record to a retry queue.

Transaction delaysโ€‹

XDR configuration parameters control various stages of the record lifecycle where delays may be necessary because XDR is not able to keep up with writes.
The following table lists the configuration parameters that control the record lifecycyle.

ParameterDescriptionNotes
transaction-queue-limitMaximum number of elements allowed in XDR's in-memory transaction queue per partition, per namespace, per datacenter.Default: 16*1024 = 16384

Minimum: 1024

Maximum: 1048576
hot-key-msPeriod in milliseconds to wait in between shipping hotkeys. Controls the frequency at which hot-keys are processed.

Good in situations where the XDR transaction queue is potentially large. This avoids having to check across the whole queue for a potential previous entry corresponding to the current incoming transaction.
Minimum: 0, Maximum: 5000.
period-msPeriod in milliseconds at which the dc-thread processes partitions for XDR shipment.Default: 100ms.
delay-msPeriod in milliseconds as an artificial delay on shipment of all records, including hotkeys and records that are not hot. Forces records to wait the configured period before processing.Minimum: 0, Maximum: 5000. Must be less than or equal to the value of hot-key-ms.
sc-replication-wait-msNumber of milliseconds that XDR waits before dequeuing a record from the in-memory transaction queue of a namespace configured for SC. Prevents records from shipping before fully replicated.Minimum: 5, Maximum: 1000
max-throughputNumber of records per second to ship using XDR.Must be in increments of 100 (such as 100, 200, 1000)

XDR version shipping controlโ€‹

Database 7.2 introduces two configuration parameters to control dynamically how XDR ships versions of records.

  • ship-versions-policy controls how XDR ships versions of modified records between the source cluster and a destination.
  • ship-versions-interval specifies a time window in seconds within which XDR is allowed to skip versions.

XDR keeps track of the unique identifier (digest) of records that are modified (inserted, updated, deleted), expire, or are evicted. When it's time to ship a record, XDR reads it from storage and ships it to the destination. If there are multiple updates to the same record in a short period of time, XDR achieves maximum throughput by shipping only the latest version of the record. While this keeps two remote Aerospike Database clusters in sync, it also prevents XDR from shipping all versions of a record, which may be necessary when Aerospike is shipping data to an Aerospike or a non-Aerospike destination through a connector, such as Kafka, Pulsar, or JMS.

ship-versions-policy: latestโ€‹

This is the default policy. It allows the latest write to be shipped to the destination. This is the behavior in server versions prior to 7.2.

ship-versions-policy: allโ€‹

This policy ships all writes to the destination. Subsequent writes of the record are delayed or blocked until the previous write is shipped to the destination. For blocked writes, the server returns AS_ERR_XDR_KEY_BUSY 32 error to the client.

Similar to the interval policy, in low-throughput use cases this policy can ensure that the destination receives every update to every record. The operator does need to consider the backpressure implications when lag between source and destination increases.

If ship-versions-policy is all, delay-ms must be 0. delay-ms cannot exceed the time window specified by ship-versions-interval.

This configuration guarantees that there are no two writes on the record at the same time, so there won't be any write hotkeys in the system.

ship-versions-policy: intervalโ€‹

This policy guarantees shipping of at least one version in this interval. If there are multiple writes within an interval, the last write in the interval is definitely shipped.

The value of the interval is set in ship-versions-interval in the XDR's DC namespace subsection. ship-versions-interval takes value between 1 and 3600 seconds. The default value is 60 seconds.

ship-versions-policy is a relaxation over the all policy. It allows the write on the record to proceed even if the previous write is not shipped to the destination, if they belong to the same interval. The interval policy is a good balance between the latest and all policies. It allows the writes to proceed even if the previous write is not shipped to the destination if they belong to the same interval.

Limitationsโ€‹

Writes may be blocked or delayedโ€‹

Writes may be blocked or delayed until the previous version of a record is shipped. This is more noticeable with hot keys when the ship-versions-policy is set to all, as XDR needs to ship the prior version of the record before allowing a subsequent write to the same key.

As a workaround, you can change the ship-versions-policy to interval with an appropriate interval granularity.

Rewind may have blocking impactโ€‹

If ship-versions-policy is set during rewind of the namespace, it can block all the writes until the first round of recovery is complete. See Rewind a shipment for details.

Impact of ship-versions-policy on multiple DCsโ€‹

When configuring multiple datacenters for XDR, writes may be blocked if the ship-versions-policy is set to either all or interval in any of the datacenters. This occurs because the Aerospike server processes writes globally, rather than independently for each datacenter.

If the previous version of a record has not been shipped to one datacenters due to the configured ship-versions-policy, subsequent writes to that record are blocked until the pending version is successfully shipped. As a result, even if shipping to only one datacenter is delayed, it can impact the shipping process to all other datacenters.

Rescueโ€‹

In certain scenarios, XDR may fail to ship deletes even when this feature is enabled. For example, if a record is expired or evicted, and then the record is rewritten before the NSUP thread can convert the deleted record into an XDR tombstone. When that happens the primary index element of the expired or evicted record will be reused by the new write. If the tombstone is not created, the non-durable delete caused by expiry or eviction is not shipped.

Metrics of XDR processesโ€‹

Lifecycle stageMetric nameDescription
Waiting for processing and processing startedin_queue

in_progress
The record is in the XDR in-memory queue and is awaiting shipment.-XDR is actively shipping the record.
Retryingretry_conn_reset

retry_dest

retry_no_node
A transient stage. The shipment is being retried due to some error. Retries continue until one of the completed states is achieved.
Recoveringrecoveries

recoveries_pending
A transient stage that indicates internal-to-XDR "housekeeping". If XDR cannot find record keys in its in-memory transaction queue, it reads the primary index to recover those keys for processing.
Completedsuccess

abandoned

not_found

filtered_out
Shipment is complete. The state of shipment is either success or one that indicates a non-success result.