Aerospike Restore (asrestore)
Overview
This page describes how to restore data saved using the Aerospike backup utility, asbackup, and explains command options.
The asrestore utility restores backups created with asbackup.
If records exist in the namespace on the cluster, a configurable write policy determines which records take precedence, either records in the namespace or records from the backup.
If a transaction fails, it is retried according to configurable timeout options.
Required permissions
Access requirements for the asrestore utility depend on what kind of objects exist in the namespace.
- If the namespace contains no user-defined functions or secondary indexes, read-write is the minimum necessary access role.
- If the namespace contains user-defined functions, udf-admin is the minimum necessary access role to restore UDFs for Database 6.0 or newer. Otherwise, data-admin can be used.
- If the namespace contains secondary indexes, sindex-admin is the minimum necessary access role to restore secondary indexes for Database 6.0 or newer. Otherwise, data-admin can be used.
For more information about Aerospike's role-based access control system, see Privileges, permissions, and scopes.
Usage
The -Z or --help option of asrestore gives an overview of all supported command line options.
asrestore --help
The simplest option for running asrestore is to specify the cluster to restore (--host) and the local directory containing the backup files (--directory).
Suppose you have a cluster that contains a node with IP address 1.2.3.4.
To restore a backup from directory backup_2024_08_24, issue the following command:
asrestore --host 1.2.3.4 --directory backup_2024_08_24
By default, the backup is restored to the namespace that it was taken from. The --namespace option can be used to restore to a different namespace. Suppose that the previous backup example was taken from namespace test, and you want to restore it to namespace prod. Issue the following command:
asrestore --host 1.2.3.4 --directory backup_2024_08_24 --namespace test,prod
When specifying the --directory option, asrestore expects multiple .asb backup files in the given directory. Alternatively, --input-file makes asrestore read the complete backup from the given single file. If - is specified as the file, asrestore reads the backup from stdin. This allows for pipelines:
cat backup.asb.gz | gunzip | asrestore --input-file - [...]
Connection Options
Option | Default | Description |
---|---|---|
-h HOST or --host HOST | 127.0.0.1 | The host that acts as the entry point to the cluster. Any of the cluster nodes can be specified. The remaining cluster nodes will be automatically discovered. |
-p PORT or --port PORT | 3000 | Port to connect to. |
-U USER or --user USER | - | User name with write permission. Mandatory if the server has security enabled. |
-P PASSWORD or --password | - | Password to authenticate the given user. The first form passes the password on the command line. The second form prompts for the password. |
-A or --auth | INTERNAL | Set the authentication mode when user and password are defined. Modes are INTERNAL, EXTERNAL, EXTERNAL_INSECURE, and PKI. The default is INTERNAL. The mode must be set to EXTERNAL when using LDAP. |
-t THREADS or --parallel THREADS | 20 | The number of client threads to spawn for writing to the cluster. Higher numbers mean faster restores, which may, however, have a negative impact on server performance. |
--tls-enable | disabled | Indicates a TLS connection should be used. |
-S or --services-alternate | false | Connect using alternate-access-address. Use this when the cluster nodes publish IP addresses through access-address that are not accessible over the WAN, and publish WAN-accessible addresses through alternate-access-address. |
Timeout Options
Retries are governed by --max-retries and --retry-scale-factor.
By default, these are 5 and 150ms respectively.
An exponential backoff strategy is followed where the delay is retry-scale-factor * 2 ** (retry_attempts - 1), or 0 on the first try.
Option | Default | Description |
---|---|---|
-T TIMEOUT or --timeout TIMEOUT | 10000 | Timeout (ms) for Aerospike commands to write records, create indexes and create UDFs. |
--socket-timeout MS | 10000 | Socket timeout for write transactions in milliseconds. If this value is 0, it is set to total-timeout. If both are 0, there is no socket idle time limit. |
--total-timeout MS | 0 | Total socket timeout in milliseconds. If this value is 0 and --timeout is set, then the --timeout value is used as the write transaction timeout. Default is 0, that is, no timeout. |
--max-retries N | 5 | Maximum number of retries before aborting the current write transaction. |
--retry-scale-factor MS | 150000 (150ms) | The scale factor to use in the exponential backoff retry strategy, in microseconds. Note: --retry-delay and --sleep-between-retries are deprecated in favor of --retry-scale-factor . |
If --max-retries is exceeded, the transaction is logged as an error and asrestore exits. The "record exists" and "generation mismatch" errors are exceptions: they are not retried, and asrestore skips the affected record and moves on.
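The backoff formula above can be tabulated with a small shell sketch. It is an illustration of the documented formula only, not asrestore's own code. The function names are hypothetical, and it assumes that attempt 0 is the first try and that attempt N is the Nth retry.

```shell
# Delay before a given attempt, in microseconds, using the documented formula
# retry-scale-factor * 2 ** (retry_attempts - 1). Attempt 0 (the first try)
# has no delay. Function names are hypothetical.
backoff_delay_us() {
  scale_us=$1
  attempt=$2
  if [ "$attempt" -le 0 ]; then
    echo 0
  else
    echo $(( scale_us * (1 << (attempt - 1)) ))
  fi
}

# Print one line per attempt, from the first try up to max_retries.
print_backoff_schedule() {
  sched_scale_us=$1
  sched_max_retries=$2
  sched_attempt=0
  while [ "$sched_attempt" -le "$sched_max_retries" ]; do
    echo "attempt ${sched_attempt}: $(backoff_delay_us "$sched_scale_us" "$sched_attempt") us"
    sched_attempt=$(( sched_attempt + 1 ))
  done
}

# Defaults from the table: --retry-scale-factor 150000, --max-retries 5.
print_backoff_schedule 150000 5
```

Under these assumptions, the defaults give delays of 150 ms, 300 ms, 600 ms, 1.2 s, and 2.4 s for retries 1 through 5.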
TLS Options
Option | Description |
---|---|
--tls-cafile=TLS_CAFILE | Path to a trusted CA certificate file. |
--tls-capath=TLS_CAPATH | Path to a directory of trusted CA certificates. |
--tls-name=TLS_NAME | The default TLS name used to authenticate each TLS socket connection. Note: this must also match the cluster name. |
--tls-protocols=TLS_PROTOCOLS | Set the TLS protocol selection criteria. The format is the same as Apache's SSLProtocol directive. If not specified, asrestore uses TLSv1.2 if supported. Otherwise it uses -all +TLSv1. |
--tls-cipher-suite=TLS_CIPHER_SUITE | Set the TLS cipher selection criteria. The format is the same as OpenSSL's Cipher List Format. |
--tls-keyfile=TLS_KEYFILE | Path to the key for mutual authentication (if the Aerospike cluster supports it). |
--tls-keyfile-password=TLS_KEYFILE_PASSWORD | Password to load a protected tls-keyfile. It can be one of the following: 1) Environment variable: env:VAR 2) File: file:PATH 3) String: PASSWORD. The user is prompted on the command line if --tls-keyfile-password is specified and no password is given. |
--tls-certfile=TLS_CERTFILE | Path to the chain file for mutual authentication (if the Aerospike cluster supports it). |
--tls-cert-blacklist PATH | Path to a certificate blocklist file. The file should contain one line for each blocklisted certificate. Each line starts with the certificate serial number expressed in hex. Each entry may optionally specify the issuer name of the certificate (serial numbers are only required to be unique per issuer). Example: 867EC87482B2 /C=US/ST=CA/O=Acme/OU=Engineering/CN=TestChainCA |
--tls-crl-check | Enable CRL checking for the leaf certificate. An error occurs if a valid CRL file cannot be found in TLS_CAPATH. |
--tls-crl-checkall | Enable CRL checking for the entire certificate chain. An error occurs if a valid CRL file cannot be found in TLS_CAPATH. |
--tls-log-session-info | Enable logging session information for each TLS connection. |
TLSNAME is only used when connecting to a TLS-enabled server.
The following example restores a cluster backup to node 1.2.3.4 using the default Aerospike port of 3000 with TLS configured.
With TLS, HOST takes the form "HOST:TLSNAME:PORT,...".
asrestore --host 1.2.3.4:cert1:3000 --directory backup_2024_08_24 --namespace test --tls-enable --tls-cafile /cluster_name.pem --tls-protocols TLSv1.2 --tls-keyfile /cluster_name.key --tls-certfile /cluster_name.pem
Input Options
Option | Default | Description |
---|---|---|
-d PATH or --directory PATH | - | Directory from which to read the .asb backup files. Mandatory, unless --input-file is given. |
--directory-list PATH1[,PATH2[,...]] | - | A comma-separated list of paths to directories that hold backup files. Required, unless -i or -d is used. The paths may not contain commas. Example: asrestore --directory-list /PATH/TO/DIR1,/PATH/TO/DIR2 |
-i PATH or --input-file PATH | - | The single file from which to read the backup. - means stdin . Mandatory, unless --directory is given. |
-N BANDWIDTH,TPS or --nice BANDWIDTH,TPS | - | Throttles asrestore's read operations from the backup file(s) to not exceed the given I/O bandwidth in MiB/s and its database write operations to not exceed the given number of transactions per second. Useful to limit the impact of asrestore on server performance. |
--parent-directory DIRECTORY | - | A common root path for all paths used in --directory-list. This path is prepended to all entries in --directory-list. Example: asrestore --parent-directory /common/root/path --directory-list /path/to/dir1/,/path/to/dir2 |
-y COMPRESSION_ALG or --compress COMPRESSION_ALG | none | The decompression algorithm to use on backup files as they are read. This option must match the one used when taking the backup. The only available option is zstd. Refer to compression and encryption. |
-z ENCRYPTION_ALG or --encrypt ENCRYPTION_ALG | none | The decryption algorithm to use on backup files as they are read. This option must match the one used when taking the backup. The available options are aes128 and aes256. This option must be accompanied by either --encryption-key-file or --encryption-key-env. Refer to compression and encryption. |
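As a rough illustration of how several input options combine, the sketch below assembles one asrestore command line from options listed in the table above. The helper name, host, paths, key file location, and throttle values are all hypothetical, and the compression and encryption settings are assumed to match those used when the backup was taken. The helper only prints the command unless DRY_RUN is set to 0.

```shell
# Hypothetical helper: prints the asrestore command line when DRY_RUN=1
# (the default here) and executes it otherwise.
restore_from_dirs() {
  host=$1
  parent=$2
  dirs=$3
  key_file=$4
  # Only options documented in the tables on this page are used.
  set -- asrestore --host "$host" \
    --parent-directory "$parent" --directory-list "$dirs" \
    --compress zstd --encrypt aes256 --encryption-key-file "$key_file" \
    --nice 10,1000
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# All argument values below are placeholders.
restore_from_dirs 1.2.3.4 /backups /dir1,/dir2 /etc/aerospike/backup.key
```

Here --nice 10,1000 would cap reads at 10 MiB/s and writes at 1000 transactions per second. Choose values that suit your own cluster.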
Output definitions
When asrestore restores data to a namespace, it prints a summary line to indicate the status of the restore. The status line contains a number of fields, each followed by an integer count, as shown below:
2023-02-15 11:06:45 GMT [INF] [29112] Expired 0 : skipped 0 : err_ignored 0 : inserted 795922: failed 3094502 (existed 0 , fresher 3094502)
The status fields are described in the following table.
Field | Description |
---|---|
expired | The record in the backup set has expired (the current time is greater than the void time) and the record is not loaded into the target. |
skipped | Specific data was selected from the restore. Records not selected are shown as skipped. |
err_ignored | Certain errors such as errors in the bin name or AEROSPIKE_ERR_RECORD_TOO_BIG cause asrestore to ignore the record and log it as err_ignored. |
inserted | Record from backup set successfully inserted into target. |
failed(existed) | The --unique option was chosen for the restore, so only records that do not exist are inserted. Records not inserted due to them existing in the target are logged as failed(existed). |
failed(fresher) | The default mode of asrestore is to check generation on insertion. This avoids inserts if the active record is newer than the backup. This also results in a high client write error count in the target cluster, corresponding to these records being fresher than their corresponding backup. To ignore the generation check, use the --no-generation write policy. |
Record insertions that failed due to err_ignored, failed(existed), and failed(fresher) are counted by the Aerospike server as failed client writes and will increment the client_write_error metric.
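When a restore runs from a script, it can help to pull individual counts out of the summary line. The following is a minimal sketch that assumes the summary line keeps the layout shown in the example above. The function name summary_field is hypothetical, and field names are matched exactly as printed, so the first field is spelled Expired with a capital E.

```shell
# Extract the integer that follows a field name in an asrestore summary line.
# $1 = field name as printed (for example "inserted"), $2 = the summary line.
summary_field() {
  printf '%s\n' "$2" | sed -n "s/.*[( ]$1 \([0-9][0-9]*\).*/\1/p"
}

# Summary line copied from the example in this section.
line='2023-02-15 11:06:45 GMT [INF] [29112] Expired 0 : skipped 0 : err_ignored 0 : inserted 795922: failed 3094502 (existed 0 , fresher 3094502)'

echo "inserted=$(summary_field inserted "$line") fresher=$(summary_field fresher "$line")"
```

For the sample line this prints inserted=795922 fresher=3094502.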
Data Selection Options
Option | Default | Description |
---|---|---|
-n ORIGINAL-NS,NEW-NS or --namespace ORIGINAL-NS,NEW-NS | Original namespace | Namespace to be restored. By default, asrestore restores a backup to the namespace from which it was taken. If this option is specified and the namespace from which the backup was taken does not match ORIGINAL-NS , asrestore aborts with an error. This ensures that we restore the data that we intend to restore. If NEW-NS is specified, the backup will be restored to NEW-NS instead of the namespace from which it was taken. |
-s SET1,SET2,... or --set-list SET1,SET2,... | All sets | The sets to restore. |
-B BIN1,BIN2,... or --bin-list BIN1,BIN2,... | All bins | The bins to restore. |
-R or --no-records | - | Do not restore any record data (metadata or bin data). By default, asrestore restores record data, secondary index definitions, and UDF modules. |
-I or --no-indexes | - | Do not restore any secondary index definitions. |
-F or --no-udfs | - | Do not restore any UDF modules. |
-K or --ignore-record-error | false | Ignore permanent record-specific errors, for example AEROSPIKE_ERR_RECORD_TOO_BIG. By default, such errors are not ignored and asrestore terminates. Optional: Use verbose mode to see errors in detail. |
Write Policy Options
Option | Default | Description |
---|---|---|
-u or --unique | - | Existing records take precedence. With this option, only records that do not exist in the namespace are restored, regardless of generation numbers. If a record exists in the namespace, the record from the backup is ignored. Note: this option is mutually exclusive to --replace and --no-generation . |
-g or --no-generation | - | Records from backups take precedence. This option disables the generation check. With this option, records from the backup always overwrite records that already exist in the namespace, regardless of generation numbers. Note: this option is mutually exclusive to --unique . Warning: by using this option you may lose a more recent version of your data by overwriting it with an older version. |
-r or --replace | - | Replace records. This controls how records from the backup overwrite existing records in the namespace. By default, restoring a record from a backup only replaces the bins contained in the backup; all other bins of an existing record remain untouched. With this option, the existing record is completely replaced; that is, any bins that are not contained in the backup are discarded. This option still does a generation check by default and would need to be combined with the -g option if no generation check is desired. Note: this option is mutually exclusive to --unique . |
-l or --extra-ttl N | - | For records with expirable void-times, add N seconds of extra-ttl to the recorded void-time . |
Transaction Processing Options
Option | Default | Description |
---|---|---|
--batch-size N | 128 or 16 | The max allowed number of records per C client async batch write call. Default is 128 with batch writes enabled, or 16 without batch writes. Without batch writes, records are grouped and written in a logical "batch". |
--max-async-batches N | 32 | The max number of outstanding async record batch write calls at a time. For pre-6.0 servers, batches are only a logical grouping of records, and each record is uploaded individually. The true max number of async Aerospike calls would then be max-async-batches * batch-size. |
--disable-batch-writes | - | Disables the use of batch writes when restoring records to the Aerospike cluster. By default, the cluster is checked for batch write support, so only set this flag if you explicitly don't want batch writes to be used, or asrestore is failing to recognize that batch writes are disabled and is failing to work because of it. |
--event-loops N | 1 | The number of C client event loops to initialize for processing of asynchronous Aerospike transactions. |
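The relationship between --max-async-batches and --batch-size for pre-6.0 servers is simple arithmetic. It is sketched below with the defaults from the table for the case without batch writes. The function name is hypothetical.

```shell
# Upper bound on outstanding async calls when each record is uploaded
# individually: max-async-batches * batch-size.
max_async_calls() {
  echo $(( $1 * $2 ))
}

# Defaults without batch writes: --max-async-batches 32, --batch-size 16.
max_async_calls 32 16
```

With these defaults the bound is 512 outstanding calls.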
Restore from S3
To restore files from Amazon S3, prefix file/directory names with s3://BUCKET/KEY, where BUCKET is the name of the S3 bucket to download from, and KEY is the key of the object to download or the prefix of the files in the S3 "directory". If using the default S3 endpoint, --s3-region REGION must be set to the region of the bucket you are downloading from. If using another endpoint, specify that endpoint with --s3-endpoint-override URL.
Files are downloaded in chunks of 5MB, and the maximum number of simultaneous downloads across all threads is controlled with --s3-max-async-downloads.
Required permissions
asrestore requires certain permissions for successful use with Amazon S3. The IAM JSON policy should include the following elements. Replace backup-bucket with the name of the S3 bucket you are using for the restore.
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::backup-bucket"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::backup-bucket/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
S3 restore options
Option | Default | Description |
---|---|---|
--s3-region REGION | - | Sets the S3 region of the bucket being uploaded to/downloaded from. Must be set if using the default S3 endpoint. |
--s3-endpoint-override URL | - | Sets the S3 endpoint to use. Must point to an S3-compatible storage system. |
--s3-profile PROFILE_NAME | default | Sets the S3 profile to use for credentials. |
--s3-max-async-downloads N | 32 | The maximum number of simultaneous download requests from S3. |
--s3-connect-timeout MILLISECONDS | 1000 | The AWS S3 client's connection timeout in milliseconds. Equivalent to cli-connect-timeout in the AWS CLI, or connectTimeoutMS in the aws-sdk-cpp client configuration. |
--s3-log-level LEVEL | Fatal | The log level of the AWS S3 C++ SDK. The possible levels are, from least to most granular: Off, Fatal, Error, Warn, Info, Debug, Trace. |
Example
To restore all records from an S3 bucket test-bucket in region us-west-1 under directory test-dir:
asrestore -d s3://test-bucket/test-dir --s3-region us-west-1
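The S3 options above can be combined in the same way. The sketch below builds the s3://BUCKET/KEY path from its parts and caps the number of simultaneous downloads. The helper name, host address, and the download limit of 8 are hypothetical choices for illustration, and the helper only prints the command unless DRY_RUN is set to 0.

```shell
# Hypothetical helper: prints the asrestore command line when DRY_RUN=1
# (the default here) and executes it otherwise.
s3_restore() {
  bucket=$1
  prefix=$2
  region=$3
  max_downloads=${4:-32}   # 32 is the documented default
  set -- asrestore --host 1.2.3.4 \
    --directory "s3://${bucket}/${prefix}" \
    --s3-region "$region" \
    --s3-max-async-downloads "$max_downloads"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

s3_restore test-bucket test-dir us-west-1 8
```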
Configuration File Options
asrestore can be configured by using tools configuration files. Refer to Aerospike Tools Configuration for more details. The following options affect configuration file behavior.
Option | Default | Description |
---|---|---|
--no-config-file | disabled | Do not read any configuration file. Mutually exclusive to --only-config-file . |
--instance NAME | - | Sections with this instance name are read. For example, if instance a is specified, sections cluster_a and asrestore_a are read. |
--config-file PATH | - | Read this file after default configuration file. |
--only-config-file PATH | - | Read only this configuration file. Mutually exclusive to --no-config-file . |
Secret Agent options
asrestore supports using secrets from the Aerospike Secret Agent as arguments. See Secrets for more information.
Other Options
Option | Default | Description |
---|---|---|
-v or --verbose | disabled | Output considerably more information about the running restore. |
-L or --indexes-last | - | Create indexes after restoring everything else. By default, indexes are restored before everything else, which can prevent costly SSD reads required to build the indexes. |
-w or --wait | - | Wait for secondary indexes to finish building before proceeding. Wait for restored UDFs to be distributed across the cluster. |
-m or --machine PATH | - | Output machine-readable status updates to the given path, typically a FIFO. |
--validate | - | Validate the integrity of the backup files but do not restore any data. |
Validate Backup Files
When used with the --validate option, asrestore identifies invalid backup files and displays the bad data.
Validate mode works with any method used to supply backups to asrestore, for example --directory, --input-file, or --directory-list. Validate mode ignores data selection and Aerospike TLS/connection-related arguments because no connection is made to the database. If validating backups in S3, a connection is still made to S3.
Note: Validate mode cannot detect all corrupted data, only issues that make the backup files impossible to parse. For example, if a backup file contains a record with corrupted data that can still be considered a valid record, that issue will not be identified. See the details of the backup file format for more information.
Example
This example shows backing up a single record to a directory, corrupting the backup file, then identifying the invalid data.
% asbackup -h 127.0.0.1:3000 -n test -d test-backup-dir
...
2023-07-19 20:33:55 UTC [INF] [81932] Backed up 1 record(s), 0 secondary index(es), 0 UDF file(s), 168 byte(s) in total (~168 B/rec)
% ls test-backup-dir
test_00000.asb
% cat test-backup-dir/test_00000.asb
Version 3.1
# namespace test
# first-file
+ k S 4 key1
+ n test
+ d 7JEZLUt/jONdXXjTS8ply6qqyWA=
+ s demo
+ g 1
+ t 430086781
+ b 3
- I foo 123
- S bar 3 abc
- Z baz T
Then corrupt the backup file.
% sed -i '' -e 's/S/2/g' ./test-backup-dir/test_00000.asb
% cat test-backup-dir/test_00000.asb
Version 3.1
# namespace test
# first-file
+ k 2 4 key1
+ n test
+ d 7JEZLUt/jONdXXjT28ply6qqyWA=
+ s demo
+ g 1
+ t 430086781
+ b 3
- I foo 123
- 2 bar 3 abc
- Z baz T
And run asrestore in validate mode.
% ./bin/asrestore --validate -d test-backup-dir
2023-07-19 20:48:42 UTC [INF] [94673] Starting validation of test-backup-dir
2023-07-19 20:48:42 UTC [INF] [94673] Found 1 backup file(s) in test-backup-dir
2023-07-19 20:48:42 UTC [INF] [94673] Validating backup files
2023-07-19 20:48:42 UTC [INF] [94673] Finished validating backup file(s)
2023-07-19 20:48:42 UTC [INF] [94675] validating test-backup-dir/test_00000.asb
2023-07-19 20:48:42 UTC [INF] [94675] Opened backup file test-backup-dir/test_00000.asb
2023-07-19 20:48:42 UTC [ERR] [94675] Invalid key type character "2" in block (line 4, col 5)
2023-07-19 20:48:42 UTC [ERR] [94675] Error while parsing record
2023-07-19 20:48:42 UTC [ERR] [94675] Error while parsing backup file test-backup-dir/test_00000.asb (line 4)
2023-07-19 20:48:42 UTC [INF] [94674] 0 UDF file(s), 0 secondary index(es), 0 record(s) (0 rec/s, 11 KiB/s, 0 B/rec, retries: 0)
2023-07-19 20:48:42 UTC [INF] [94674] Expired 0 : skipped 0 : err_ignored 0 : inserted 0: failed 0 (existed 0 , fresher 0)
2023-07-19 20:48:42 UTC [INF] [94674] 27% complete, ~0s remaining
If you would like to change the target set to which to restore data, contact Aerospike Support for assistance.
Resource usage
See asbackup and asrestore resource usage for more information about resources required for asrestore.
Restoring from backup
This section describes the most essential restore commands and some common variations.
Prerequisites and notes for restoring from backup
asrestore can restore only backups from Aerospike server and tools version 3.0 or later. To restore a backup from earlier releases, contact Aerospike Support.
The TTL of restored keys is preserved, but the last-update-time and generation count are reset to the current time.
asrestore command basics and useful variations
The following example shows the basic syntax of asrestore:
asrestore --host HOST --directory DIRECTORY
- --host HOST specifies the IP address or hostname of a node in the cluster to be restored.
- --directory DIRECTORY is the name of the directory containing the backup files.
Restoring from a single backup file
If you backed up to a single file, use the following syntax to restore from it:
asrestore --host HOST --input-file FILENAME
Restoring to a different namespace
By default, data is restored to its original namespace. Use the --namespace option to restore to a different namespace. You must specify the comma-separated old and new namespace names:
asrestore --host HOST --directory DIRECTORY --namespace OLD-NAMESPACE,NEW-NAMESPACE
Write policy for duplicate key IDs
The target namespace might already contain keys with the same IDs as the backup you are restoring. The logic of the write policy for managing existing keys is as follows:
- If the record from the backup is expired, based on its TTL value, the backup record is ignored.
- If the record does not exist in the namespace, the backup record is added to the namespace.
- If an older version of the record (that is, with a lower generation count) already exists in the namespace, the backup record is restored. If you want asrestore to ignore this condition, specify this option:
  --unique: asrestore does not touch any existing records, regardless of generation counts.
- If a newer version of the record (that is, with a higher or same generation count) already exists in the namespace, the backup record is ignored. If you want asrestore to ignore this condition, specify this option:
  --no-generation: asrestore overwrites any existing records, regardless of generation count.
- If the record in the namespace contains bins that are not present in the backup, those bins in the namespace are preserved. If you want asrestore to ignore this condition, specify this option:
  --replace: When restoring a record from the backup, asrestore does not preserve namespace bins that are not present in the backup.
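The rules above can be summarized as a small decision function. This is only a model of the documented behavior, written to make the precedence of the checks explicit. It is not how asrestore is implemented, and the function name and argument conventions are hypothetical. The --replace option is left out because it affects which bins survive, not whether the record is written.

```shell
# restore_action EXPIRED EXISTS NS_GEN BACKUP_GEN [POLICY]
#   EXPIRED, EXISTS: "yes" or "no"
#   NS_GEN, BACKUP_GEN: generation counts of the existing and backup records
#   POLICY: "unique", "no-generation", or "default"
# Prints "restore" if the backup record would be written, "skip" otherwise.
restore_action() {
  expired=$1
  exists=$2
  ns_gen=$3
  backup_gen=$4
  policy=${5:-default}
  if [ "$expired" = "yes" ]; then
    echo skip       # expired backup records are always ignored
    return
  fi
  if [ "$exists" = "no" ]; then
    echo restore    # nothing to conflict with
    return
  fi
  case $policy in
    unique)        echo skip ;;     # never touch existing records
    no-generation) echo restore ;;  # always overwrite existing records
    *)
      # Default: restore only over an older (lower-generation) record.
      if [ "$ns_gen" -lt "$backup_gen" ]; then
        echo restore
      else
        echo skip
      fi
      ;;
  esac
}

restore_action no yes 3 5
restore_action no yes 5 5
restore_action no yes 5 3 no-generation
```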
Reading from stdin, piping, and uncompressing
Instead of naming a file with --input-file or a directory with --directory, pass - as the input file and use standard Unix pipes to read the backup data from stdin.
The following three usage examples uncompress a gzip file and then pipe the data to asrestore with the - option to read from stdin:
gunzip -c BACKUP-FILE.GZ | asrestore --host HOST -i -
zcat BACKUP-FILE.GZ | asrestore --host HOST -i -
cat BACKUP-FILE.GZ | gzip -d | asrestore --host HOST -i -
This example reads a single uncompressed backup file with cat and pipes the data to asrestore with the - option:
cat BACKUP-FILE | asrestore --host HOST -i -
Other asrestore options and command-line help
asrestore includes options that you may find useful for the following tasks:
- Restoring to specific nodes or connecting to a port other than the default 3000.
- Securing connections using username/password or TLS certificates or both.
- Restoring specific bins or sets.
- Using configuration files to help automate restores.
For more information, run asrestore --usage, or see these asrestore command-line options.
Transaction retries
- Failed Record Uploads: If a transaction fails, it is retried according to --max-retries and --retry-scale-factor. By default these are 5 and 150ms respectively. An exponential backoff strategy is followed where the delay is retry-scale-factor * 2 ** (retry_attempts - 1), or 0 on the first try. If --max-retries is exceeded, the transaction is counted as a failure in the info level log output. Note: --retry-delay and --sleep-between-retries are deprecated in favor of --retry-scale-factor.
Possible error or informational messages from asrestore
- Record exists: When the --unique option is used, this informational message is displayed.
- Generation mismatch: The backup copy and existing copy of a key do not match, and so the key is not restored. You can override this behavior with the --no-generation option.
- Invalid username or password: The wrong username or password was specified on the command line.
Restore to any cluster
Backup and restore are cluster-configuration-agnostic. A backup can be restored to a cluster of any size and configuration. Restored data is evenly distributed among cluster nodes, regardless of cluster configuration.