Validation tool (asvalidation)
This page describes Aerospike’s validation tool (asvalidation), which scans all records in a namespace and validates bins with Collection Data Type (CDT) values (List and Map bins). Optionally, it attempts to repair any damage detected. Records with unrecoverable CDT errors are written to output if an output file is specified. Records without CDTs or detected errors are ignored.
Download asvalidation
Download asvalidation, or visit its repo aerospike/aerospike-tools-validation.
Modes of operation
You can run asvalidation in validation mode or fix mode.
Aerospike recommends first running in validation mode, limited by --max-records, to see the kinds of errors it discovers. Afterwards, run it in fix mode.
- Validation mode discovers problems and produces a report. This is the default mode.
- Fix mode attempts to correct discovered problems where possible. It is triggered by the --cdt-fix-ordered-list-unique option. Fix counts should report zero in the summary printout.
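The recommended two-pass workflow can be sketched as a small helper that assembles the command lines. This is only an illustration: the helper name `build_asvalidation_args` and the record limit of 1000 are hypothetical, and the options used are the ones listed on this page.

```python
import shlex

def build_asvalidation_args(namespace, output_file, max_records=0,
                            fix=False, remove_existing=False):
    """Build an argv list for asvalidation (hypothetical helper, documented options only)."""
    args = ["asvalidation", "-n", namespace, "-o", output_file]
    if max_records > 0:
        # Approximate limit on the number of records processed (0 means all records).
        args += ["--max-records", str(max_records)]
    if remove_existing:
        # -r removes an existing output file instead of refusing to overwrite it.
        args.append("-r")
    if fix:
        # The presence of this option switches the tool from validation mode to fix mode.
        args.append("--cdt-fix-ordered-list-unique")
    return args

# Pass 1: a limited, validation-only run to see which kinds of errors exist.
validate_run = build_asvalidation_args("myNamespace", "asvalidationOutput.txt",
                                       max_records=1000)
# Pass 2: a fix run over the whole namespace, replacing the earlier output file.
fix_run = build_asvalidation_args("myNamespace", "asvalidationOutput.txt",
                                  fix=True, remove_existing=True)

print(shlex.join(validate_run))
print(shlex.join(fix_run))
```

Each list can be passed to `subprocess.run` as-is if you want to drive the tool from a script.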
Run in validation mode
In the following example, myNamespace is checked and output is written to asvalidationOutput.txt.
- asvalidation was run in validation mode, because the --cdt-fix-ordered-list-unique option was not specified.
- It found 100 lists: 100 Lists.
- It found no Maps: 0 Maps.
- 10 of the lists need fixing: 10 Need Fix under Lists.
- The reason is shown as out of order: 10 Order.
- Because it was run in validation mode, no fixes were applied: 0 Fixed.
> asvalidation -n myNamespace -o asvalidationOutput.txt
...
2024-05-01 22:12:28 GMT [INF] [24662] Found 10 invalid record(s) from 1 node(s), 2620 byte(s) in total (~262 B/rec)
2024-05-01 22:12:28 GMT [INF] [24662] CDT Mode: validate
2024-05-01 22:12:28 GMT [INF] [24662] 100 Lists
2024-05-01 22:12:28 GMT [INF] [24662] 0 Unfixable
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has non-storage
2024-05-01 22:12:28 GMT [INF] [24662] 0 Corrupted
2024-05-01 22:12:28 GMT [INF] [24662] 0 Invalid Keys
2024-05-01 22:12:28 GMT [INF] [24662] 10 Need Fix
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fix failed
2024-05-01 22:12:28 GMT [INF] [24662] 10 Order
2024-05-01 22:12:28 GMT [INF] [24662] 0 Padding
2024-05-01 22:12:28 GMT [INF] [24662] 0 Maps
2024-05-01 22:12:28 GMT [INF] [24662] 0 Unfixable
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has duplicate keys
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has non-storage
2024-05-01 22:12:28 GMT [INF] [24662] 0 Corrupted
2024-05-01 22:12:28 GMT [INF] [24662] 0 Invalid Keys
2024-05-01 22:12:28 GMT [INF] [24662] 0 Need Fix
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fix failed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Order
2024-05-01 22:12:28 GMT [INF] [24662] 0 Padding
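If you want to check the summary from a script, the counts can be pulled out of the log text. The following is a minimal sketch, assuming each log entry sits on its own line and follows the `[INF] [pid] COUNT LABEL` layout seen in the sample output above. The function name `parse_summary` and the `Total` key are hypothetical, and the real output may differ between tool versions.

```python
import re

# Matches summary entries such as "... [INF] [24662] 10 Need Fix".
SUMMARY_LINE = re.compile(r"\[INF\] \[\d+\]\s+(\d+) ([A-Za-z][A-Za-z -]*)$")

def parse_summary(log_text):
    """Return a dict like {"Lists": {...}, "Maps": {...}} of counts per section."""
    sections = {}
    current = None
    for line in log_text.splitlines():
        match = SUMMARY_LINE.search(line.rstrip())
        if not match:
            continue  # progress lines, "CDT Mode: ...", and so on
        count, label = int(match.group(1)), match.group(2)
        if label in ("Lists", "Maps"):
            # A new section starts; labels such as "Need Fix" repeat under each one.
            current = sections.setdefault(label, {"Total": count})
        elif current is not None:
            current[label] = count
    return sections

sample = """\
2024-05-01 22:12:28 GMT [INF] [24662] CDT Mode: validate
2024-05-01 22:12:28 GMT [INF] [24662] 100 Lists
2024-05-01 22:12:28 GMT [INF] [24662] 10 Need Fix
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed
2024-05-01 22:12:28 GMT [INF] [24662] 10 Order
2024-05-01 22:12:28 GMT [INF] [24662] 0 Maps
2024-05-01 22:12:28 GMT [INF] [24662] 0 Need Fix
"""

summary = parse_summary(sample)
print(summary["Lists"])
print(summary["Maps"])
```

A script could then compare `Need Fix` against `Fixed` in each section to decide whether a fix run is still needed.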
Run in fix mode
In the following example, myNamespace is checked and output is written to asvalidationOutput.txt.
- Fixing is triggered by the --cdt-fix-ordered-list-unique option.
- Fixes are applied to the server. Failed fixes can be due to (but not limited to) an unsupported server version.
asvalidation -n myNamespace -o asvalidationOutput.txt -r --cdt-fix-ordered-list-unique
validation of 127.0.0.1 (namespace: myNamespace, set: [all], bins: [all], after: [none], before: [none]) to asvalidationOutput.txt
2024-05-01 22:08:25 GMT [INF] [10999] [src/main/aerospike/as_cluster.c:132][as_cluster_add_nodes_copy] Add node BB909000027000A 127.0.0.1:3000
2024-05-01 22:08:25 GMT [INF] [10999] Processing 1 node(s)
2024-05-01 22:08:25 GMT [INF] [10999] Node ID Objects Replication
2024-05-01 22:08:25 GMT [INF] [10999] BB909000027000A 130 1
2024-05-01 22:08:25 GMT [INF] [10999] Namespace contains 130 record(s)
2024-05-01 22:08:25 GMT [INF] [10999] Created new output file temp.txt
2024-05-01 22:08:25 GMT [INF] [11018] Starting validation for node BB909000027000A
2024-05-01 22:08:25 GMT [INF] [11018] Completed validation for node BB909000027000A, records: 30, size: 9200 (~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] 23% complete (~8 KiB/s, ~30 rec/s, ~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] ~3s remaining
2024-05-01 22:08:26 GMT [INF] [11017] Found 30 invalid record(s) from 1 node(s), 9200 byte(s) in total (~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] CDT Mode: fix
2024-05-01 22:08:26 GMT [INF] [11017] 110 Lists
2024-05-01 22:08:26 GMT [INF] [11017] 0 Unfixable
2024-05-01 22:08:26 GMT [INF] [11017] 0 Has non-storage
2024-05-01 22:08:26 GMT [INF] [11017] 0 Corrupted
2024-05-01 22:08:26 GMT [INF] [11017] 0 Invalid Keys
2024-05-01 22:08:26 GMT [INF] [11017] 10 Need Fix
2024-05-01 22:08:26 GMT [INF] [11017] 10 Fixed
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fix failed
2024-05-01 22:08:26 GMT [INF] [11017] 10 Order
2024-05-01 22:08:26 GMT [INF] [11017] 0 Padding
2024-05-01 22:08:26 GMT [INF] [11017] 20 Maps
2024-05-01 22:08:26 GMT [INF] [11017] 10 Unfixable
2024-05-01 22:08:26 GMT [INF] [11017] 10 Has duplicate keys
2024-05-01 22:08:26 GMT [INF] [11017] 0 Has non-storage
2024-05-01 22:08:26 GMT [INF] [11017] 0 Corrupted
2024-05-01 22:08:26 GMT [INF] [11017] 0 Invalid Keys
2024-05-01 22:08:26 GMT [INF] [11017] 10 Need Fix
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fixed
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fix failed
2024-05-01 22:08:26 GMT [INF] [11017] 10 Order
2024-05-01 22:08:26 GMT [INF] [11017] 0 Padding
Minimal options for asvalidation
The following table lists the minimal set of asvalidation options.
Option | Description |
---|---|
--cdt-fix-ordered-list-unique | Fix lists where elements were not stored in order and remove duplicate elements. Without this option, the tool only validates, not fixes. |
--no-cdt-check-map-keys | Do not check CDT Map keys. |
-o | Output file name for validation report. |
-d | Output directory. |
--help | Get a comprehensive list of options for the tool. |
Namespace data selection options
Option | Default | Description |
---|---|---|
-n NAMESPACE or --namespace NAMESPACE | - | Mandatory. Namespace to validate. |
-s SETS or --set SETS | All sets | Set(s) to validate. May pass in a comma-separated list of sets to validate. |
-B BIN1,BIN2,... or --bin-list BIN1,BIN2,... | All bins | Bins to validate. |
-M N or --max-records N | 0 (all records) | Approximate limit for the number of records to process. |
Partition scanning validation options
Scan a list of partition filters. Partition filters can be ranges, individual partitions, or records after a specific digest within a single partition.
This option is mutually exclusive with --node-list.
Option | Default |
---|---|
-X, --partition-list LIST | not used |
Default partitions to scan: 0 to 4095 (all partitions)
LIST format: FILTER1,FILTER2,…
FILTER format: BEGIN-PARTITION[-PARTITION-COUNT]|DIGEST
- BEGIN-PARTITION: 0 to 4095.
- Either the optional PARTITION-COUNT: 1 to 4096. Default: 1
- Or the optional DIGEST: Base64-encoded string of the desired digest to start at in the specified partition.
Example | Description |
---|---|
-X 361 | Validate only partition 361. |
-X 361-10 | Validate 10 partitions, starting with 361 and including 370. |
-X VSmeSvxNRqr46NbOqiy9gy5LTIc= | Validate all records after the digest VSmeSvxNRqr46NbOqiy9gy5LTIc= in its partition. |
-X 0-1000,2222,EjRWeJq83vEjRRI0VniavN7xI0U= | Validate partitions 0 to 999 (1000 partitions starting from 0), partition 2222, and all records after the digest EjRWeJq83vEjRRI0VniavN7xI0U= in its partition. |
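To make the filter grammar concrete, here is a minimal sketch of a parser for the LIST format described above. It is not the tool's own parser: the names `PartitionFilter`, `parse_filter`, and `parse_partition_list` are hypothetical, and the checks that a range stays within partitions 0 to 4095 and that a digest decodes to 20 bytes are assumptions of this sketch.

```python
import base64
from dataclasses import dataclass
from typing import Optional

MAX_PARTITIONS = 4096  # partitions are numbered 0 to 4095
DIGEST_BYTES = 20      # assumed length of a decoded record digest

@dataclass
class PartitionFilter:
    begin: Optional[int] = None   # first partition of a range
    count: Optional[int] = None   # number of partitions in the range
    digest: Optional[str] = None  # Base64 digest to start after

def parse_filter(text):
    """Parse one FILTER: BEGIN-PARTITION, BEGIN-PARTITION-COUNT, or a Base64 digest."""
    parts = text.split("-", 1)
    if all(part.isdigit() for part in parts):
        begin = int(parts[0])
        count = int(parts[1]) if len(parts) == 2 else 1  # PARTITION-COUNT defaults to 1
        if not 0 <= begin < MAX_PARTITIONS:
            raise ValueError(f"begin partition out of range: {begin}")
        if not 1 <= count <= MAX_PARTITIONS - begin:
            raise ValueError(f"partition count out of range: {count}")
        return PartitionFilter(begin=begin, count=count)
    # Anything else is treated as a digest; invalid Base64 raises a ValueError subclass.
    raw = base64.b64decode(text, validate=True)
    if len(raw) != DIGEST_BYTES:
        raise ValueError(f"digest must decode to {DIGEST_BYTES} bytes")
    return PartitionFilter(digest=text)

def parse_partition_list(text):
    """Parse a comma-separated LIST of filters."""
    return [parse_filter(item) for item in text.split(",")]

filters = parse_partition_list("0-1000,2222,EjRWeJq83vEjRRI0VniavN7xI0U=")
for f in filters:
    print(f)
```

For a range filter, the last partition covered is `begin + count - 1`, which is why `361-10` ends at partition 370 and `0-1000` ends at partition 999.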
Connection options
Option | Default | Description |
---|---|---|
-h HOST1:TLSNAME1:PORT1,... or --host HOST1:TLSNAME1:PORT1,... | 127.0.0.1 | The host that acts as the entry point to the cluster. Any nodes in the cluster can be specified. The remaining nodes are discovered automatically. |
-p PORT or --port PORT | 3000 | Port to connect to. |
-U USER or --user USER | - | User name with read permission. Mandatory if the server has security enabled. |
-P PASSWORD or --password | - | Password to authenticate the given user. The first form passes the password on the command line. The second form prompts for the password. |
-A or --auth | INTERNAL | Set authentication mode when user and password are defined. Modes are INTERNAL, EXTERNAL, EXTERNAL_INSECURE, and PKI. This mode must be set to EXTERNAL when using LDAP. |
--parallel N | 1 | Maximum number of scans to run in parallel. If only one partition range is given, or the entire namespace is being validated, the range of partitions is evenly divided by this number to be processed in parallel. Otherwise, each filter cannot be parallelized individually, so you may only achieve as much parallelism as there are partition filters. |
-l HOST1:[TLSNAME1:]PORT1,... or --node-list HOST1:[TLSNAME1:]PORT1,... | - | Validate the given cluster nodes only. Mutually exclusive with --partition-list. |
--tls-enable | disabled | Indicates a TLS connection should be used. |
-S or --services-alternate | false | Set to true to connect to the Aerospike node’s alternate-access-address. |
--prefer-racks RACKID1,... | disabled | A comma-separated list of rack IDs to prefer when reading records. This is useful for limiting cross-datacenter network traffic. |
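The --parallel description says that a single partition range is divided evenly among the parallel scans. As a rough illustration of what an even division could look like, the sketch below splits a range into contiguous chunks. The function name `split_partition_range` is hypothetical, and how asvalidation itself distributes a remainder is not documented here, so treat the remainder handling as an assumption.

```python
def split_partition_range(begin, count, parallel):
    """Split `count` partitions starting at `begin` into at most `parallel` chunks.

    Returns (begin, count) pairs. Chunk sizes differ by at most one partition;
    any remainder goes to the earliest chunks (an assumption of this sketch).
    """
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    base, extra = divmod(count, parallel)
    chunks = []
    start = begin
    for i in range(parallel):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more workers than partitions
        chunks.append((start, size))
        start += size
    return chunks

# The whole namespace (partitions 0 to 4095) across 4 parallel scans.
whole = split_partition_range(0, 4096, 4)
# An uneven case: 10 partitions starting at 361 across 3 scans.
uneven = split_partition_range(361, 10, 3)
print(whole)
print(uneven)
```

Each pair corresponds to a BEGIN-PARTITION and PARTITION-COUNT in the partition filter notation.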
Output options
Option | Default | Description |
---|---|---|
-d PATH or --directory PATH | - | Directory to store the .asb validation files. If the directory does not exist, it is created before use. Mandatory, unless --output-file or --estimate is given. |
-o PATH or --output-file PATH | - | The single file to write to. - means stdout. Mandatory, unless --directory or --estimate is given. |
-q DESIRED-PREFIX or --output-file-prefix DESIRED-PREFIX | - | Must be used with the --directory option. A desired prefix for all output files. |
-F LIMIT or --file-limit LIMIT | 250 MiB | File size limit (in MiB) for --directory. If a .asb validation file crosses this size threshold, asvalidation switches to a new file. |
-r or --remove-files | - | Clear directory or remove output file. By default, asvalidation refuses to write to a non-empty directory or to overwrite an existing validation file. This option clears the given --directory or removes an existing --output-file. Mutually exclusive with --continue. |
--remove-artifacts | - | Clear directory or remove output file, like --remove-files, without running a validation. This option is mutually exclusive with --continue and --estimate. |
-N BANDWIDTH or --nice BANDWIDTH | - | Throttles asvalidation’s write operations to the validation file(s) so that they do not exceed the given bandwidth in MiB/s. This effectively also throttles the scan on the server side, because asvalidation refuses to accept more data than it can write. |
Other options
Option | Default | Description |
---|---|---|
-v or --verbose | disabled | Output considerably more information about the running validation. |
-m or --machine PATH | - | Output machine-readable status updates to the given path, typically a FIFO. |
-L or --records-per-second RPS | 0 | Limit total returned records per second (RPS). If RPS is zero, a records-per-second limit is not applied. |
-V or --version | - | Print ASVALIDATION version information. |
-C or --compact | disabled | Do not apply base-64 encoding to BLOBs; results in smaller output files. |
Configuration file options
Configurations are read from the following files in the given order:
- /etc/aerospike/astools.conf
- ~/.aerospike/astools.conf
Option | Default | Description |
---|---|---|
--no-config-file | disabled | Do not read any config file. |
--instance NAME | - | Sections with this instance name are read. For example, if instance a is specified, the sections cluster_a and asvalidation_a are read. |
--config-file PATH | - | Read this file after the default configuration files. |
--only-config-file PATH | - | Read only this configuration file. |
Error descriptions
Reason | Description | Disposition |
---|---|---|
Has non-storage | The bin contains an infinite or wildcard element which is not allowed as storage. | Requires manual intervention to fix. |
Has duplicate keys | A map bin has duplicate key entries. | Unfixable without manual intervention. |
Corrupted | A problem not attributable to any other error categories. | Requires manual intervention to fix. |
Invalid Keys | The bin has a Map with at least one invalid key. | Requires manual intervention to fix. |
Order | The bin has elements out of order. | Can be fixed by reordering the list with the --cdt-fix-ordered-list-unique option. |
Padding | The bin has garbage bytes after the valid List or Map. | Can be fixed by truncating the extra bytes. |