Data validation (asvalidation)

The asvalidation tool checks the validity of collection data type (CDT) bins in a namespace. It also has an option to fix some of the problems it discovers.

Recommendations and server versions

The Aerospike CDT Validation Tool addresses the following types of CDT issues, which require certain Aerospike Database versions to detect or correct.

Download asvalidation

Download asvalidation, or visit its repository, aerospike/aerospike-tools-validation.

Descriptions of possible corruption reasons

| Reason | Description | Disposition |
|---|---|---|
| Has non-storage | The bin contains an infinite or wildcard element, which is not allowed as storage. | This type of error is unfixable without your manual intervention. |
| Has duplicate keys | A map bin has duplicate key entries. | This type of error is unfixable without your manual intervention. |
| Corrupted | A problem not attributable to any of the other categories of errors. | This type of error is unfixable without your manual intervention. |
| Invalid Keys | The bin has a map with at least one invalid key. | This type of error is unfixable without your manual intervention. |
| Order | The bin has elements out of order. | Can be fixed by reordering the list with the --cdt-fix-ordered-list-unique option. |
| Padding | The bin has garbage bytes after the valid list or map. | Can be fixed by truncating the extra bytes. |
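
To make the Order disposition concrete, the following is a minimal Python sketch of what "reordering the list and removing duplicate elements" means for a simple list of integers. It is an illustration only, not the tool's or the server's implementation: the function names are hypothetical, and real CDT ordering across mixed value types is more involved than a plain integer sort.

```python
def is_ordered_unique(values):
    """Return True if every element is strictly greater than its predecessor."""
    return all(a < b for a, b in zip(values, values[1:]))


def fix_ordered_unique(values):
    """Return a sorted copy of values with duplicate elements removed.

    Assumption: elements are mutually comparable and hashable (plain ints here).
    """
    return sorted(set(values))


stored = [3, 1, 2, 3]                # out of order, with a duplicate
print(is_ordered_unique(stored))     # False
fixed = fix_ordered_unique(stored)
print(fixed)                         # [1, 2, 3]
print(is_ordered_unique(fixed))      # True
```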

asvalidation modes

asvalidation can be run in the following modes. Records that contain no CDTs, or in which no errors are detected, are ignored. Records with detected errors are backed up unless otherwise specified. By default, no fixes are applied.

  • "Validation" mode discovers problems and produces a report.
  • "Fix" mode, triggered by the --cdt-fix-ordered-list-unique option, attempts to correct discovered problems where possible.

You should first run asvalidation in validation mode, limited by partition or --max-records, to see the kinds of errors it discovers before running it in fix mode to fix them.

Basic options for asvalidation

This is a minimal set of asvalidation options.

| Option | Description |
|---|---|
| --cdt-fix-ordered-list-unique | Fix lists whose elements were not stored in order and remove duplicate elements. Without this option, the tool only validates and does not fix. |
| --no-cdt-check-map-keys | Do not check CDT map keys. |
| -o | Output file name for the validation report. |
| -d | Output directory. |
| --help | Get a comprehensive list of options for the tool. |

Namespace data selection options

| Option | Default | Description |
|---|---|---|
| -n NAMESPACE or --namespace NAMESPACE | - | Namespace to validate. Mandatory. |
| -s SETS or --set SETS | All sets | The set(s) to validate. You can pass a comma-separated list of sets. |
| -B BIN1,BIN2,... or --bin-list BIN1,BIN2,... | All bins | The bins to validate. |
| -M or --max-records N | 0 = all records. | An approximate limit for the number of records to process. Available in Database 4.9 and later. Note: this option is mutually exclusive with --partition-list and --after-digest. |
| -X or --partition-list PARTITIONID | 0 = all records. | Scan a specific partition number 1-4096, or list of partition IDs. |
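
As a rough illustration of how these options combine, the following Python sketch assembles an asvalidation command line from the flags documented above and rejects the combination of --max-records and --partition-list, which the table describes as mutually exclusive. The helper name `build_asvalidation_args` and its parameters are hypothetical; the sketch only builds the argument list and does not run the tool.

```python
def build_asvalidation_args(namespace, output_file, max_records=None,
                            partition_list=None, fix=False):
    """Build an asvalidation argument list using only documented flags."""
    if not namespace:
        raise ValueError("a namespace is mandatory")
    # The options table states these two limits cannot be combined.
    if max_records is not None and partition_list is not None:
        raise ValueError(
            "--max-records and --partition-list are mutually exclusive")

    args = ["asvalidation", "-n", namespace, "-o", output_file]
    if max_records is not None:
        args += ["--max-records", str(max_records)]
    if partition_list is not None:
        args += ["--partition-list", str(partition_list)]
    if fix:
        # Without this flag the tool only validates.
        args.append("--cdt-fix-ordered-list-unique")
    return args


# A limited validation-only run, as recommended before using fix mode.
print(" ".join(build_asvalidation_args(
    "myNamespace", "asvalidationOutput.txt", max_records=1000)))
# asvalidation -n myNamespace -o asvalidationOutput.txt --max-records 1000
```

The resulting list could be passed to `subprocess.run` on a host where asvalidation is installed.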

Running asvalidation in validation mode

Example of asvalidation in validation mode

In the following example, myNamespace is checked and its output stored in the file asvalidationOutput.txt.

Notice CDT Mode: validate.

> asvalidation -n myNamespace -o asvalidationOutput.txt
...
2024-05-01 22:12:28 GMT [INF] [24662] Found 10 invalid record(s) from 1 node(s), 2620 byte(s) in total (~262 B/rec)
2024-05-01 22:12:28 GMT [INF] [24662] CDT Mode: validate
2024-05-01 22:12:28 GMT [INF] [24662] 100 Lists
2024-05-01 22:12:28 GMT [INF] [24662] 0 Unfixable
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has non-storage
2024-05-01 22:12:28 GMT [INF] [24662] 0 Corrupted
2024-05-01 22:12:28 GMT [INF] [24662] 0 Invalid Keys
2024-05-01 22:12:28 GMT [INF] [24662] 10 Need Fix
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fix failed
2024-05-01 22:12:28 GMT [INF] [24662] 10 Order
2024-05-01 22:12:28 GMT [INF] [24662] 0 Padding
2024-05-01 22:12:28 GMT [INF] [24662] 0 Maps
2024-05-01 22:12:28 GMT [INF] [24662] 0 Unfixable
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has duplicate keys
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has non-storage
2024-05-01 22:12:28 GMT [INF] [24662] 0 Corrupted
2024-05-01 22:12:28 GMT [INF] [24662] 0 Invalid Keys
2024-05-01 22:12:28 GMT [INF] [24662] 0 Need Fix
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fix failed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Order
2024-05-01 22:12:28 GMT [INF] [24662] 0 Padding

Interpreting the validation mode report

In the example above:

  • asvalidation was run in validation mode, because the --cdt-fix-ordered-list-unique option was not specified.
  • It found 100 lists: 100 Lists.
  • It found no maps: 0 Maps.
  • 10 of the lists were corrupted: 10 Need Fix under Lists.
  • The reason for corruption is shown as out of order: 10 Order.
  • Because it was run in validation mode, no fixes were applied: 0 Fixed.

The counts listed under a heading do not necessarily add up to the heading's own count. For example, a single Need Fix record could have both an Order and a Padding error.
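
If you want to read these counts programmatically, a minimal Python sketch such as the following can group the summary lines by their Lists and Maps headings. It assumes the log lines look like the example output above (timestamp, level, process ID in brackets, then a count and a label); that layout is inferred from the example rather than from a format specification, and the function name `parse_summary` is hypothetical.

```python
import re

# Matches the tail of a summary line, e.g. "[24662] 10 Need Fix".
SUMMARY_LINE = re.compile(r"\[\d+\]\s+(\d+)\s+([A-Za-z][A-Za-z -]*?)\s*$")


def parse_summary(lines):
    """Group summary counts under their 'Lists' or 'Maps' heading."""
    report = {}
    section = None
    for line in lines:
        match = SUMMARY_LINE.search(line)
        if not match:
            continue  # not a "<count> <label>" summary line
        count, label = int(match.group(1)), match.group(2)
        if label in ("Lists", "Maps"):
            section = label
            report[section] = {"Total": count}
        elif section is not None:
            report[section][label] = count
    return report


sample = [
    "2024-05-01 22:12:28 GMT [INF] [24662] CDT Mode: validate",
    "2024-05-01 22:12:28 GMT [INF] [24662] 100 Lists",
    "2024-05-01 22:12:28 GMT [INF] [24662] 10 Need Fix",
    "2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed",
    "2024-05-01 22:12:28 GMT [INF] [24662] 10 Order",
    "2024-05-01 22:12:28 GMT [INF] [24662] 0 Maps",
]
report = parse_summary(sample)
print(report["Lists"]["Need Fix"])  # 10
print(report["Maps"]["Total"])      # 0
```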

Running asvalidation in fix mode

Fixing is triggered by the --cdt-fix-ordered-list-unique option.

Fixes are applied to the server. A fix can fail for several reasons, including an unsupported server version.

Example of asvalidation fix mode

In the following example, the namespace test is checked and its output stored in the file temp.txt.

Notice CDT Mode: fix.

asvalidation -n test -o temp.txt -r --cdt-fix-ordered-list-unique
validation of 127.0.0.1 (namespace: test, set: [all], bins: [all], after: [none], before: [none]) to temp.txt
2024-05-01 22:08:25 GMT [INF] [10999] [src/main/aerospike/as_cluster.c:132][as_cluster_add_nodes_copy] Add node BB909000027000A 127.0.0.1:3000
2024-05-01 22:08:25 GMT [INF] [10999] Processing 1 node(s)
2024-05-01 22:08:25 GMT [INF] [10999] Node ID Objects Replication
2024-05-01 22:08:25 GMT [INF] [10999] BB909000027000A 130 1
2024-05-01 22:08:25 GMT [INF] [10999] Namespace contains 130 record(s)
2024-05-01 22:08:25 GMT [INF] [10999] Created new output file temp.txt
2024-05-01 22:08:25 GMT [INF] [11018] Starting validation for node BB909000027000A
2024-05-01 22:08:25 GMT [INF] [11018] Completed validation for node BB909000027000A, records: 30, size: 9200 (~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] 23% complete (~8 KiB/s, ~30 rec/s, ~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] ~3s remaining
2024-05-01 22:08:26 GMT [INF] [11017] Found 30 invalid record(s) from 1 node(s), 9200 byte(s) in total (~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] CDT Mode: fix
2024-05-01 22:08:26 GMT [INF] [11017] 110 Lists
2024-05-01 22:08:26 GMT [INF] [11017] 0 Unfixable
2024-05-01 22:08:26 GMT [INF] [11017] 0 Has non-storage
2024-05-01 22:08:26 GMT [INF] [11017] 0 Corrupted
2024-05-01 22:08:26 GMT [INF] [11017] 0 Invalid Keys
2024-05-01 22:08:26 GMT [INF] [11017] 10 Need Fix
2024-05-01 22:08:26 GMT [INF] [11017] 10 Fixed
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fix failed
2024-05-01 22:08:26 GMT [INF] [11017] 10 Order
2024-05-01 22:08:26 GMT [INF] [11017] 0 Padding
2024-05-01 22:08:26 GMT [INF] [11017] 20 Maps
2024-05-01 22:08:26 GMT [INF] [11017] 10 Unfixable
2024-05-01 22:08:26 GMT [INF] [11017] 10 Has duplicate keys
2024-05-01 22:08:26 GMT [INF] [11017] 0 Has non-storage
2024-05-01 22:08:26 GMT [INF] [11017] 0 Corrupted
2024-05-01 22:08:26 GMT [INF] [11017] 0 Invalid Keys
2024-05-01 22:08:26 GMT [INF] [11017] 10 Need Fix
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fixed
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fix failed
2024-05-01 22:08:26 GMT [INF] [11017] 10 Order
2024-05-01 22:08:26 GMT [INF] [11017] 0 Padding
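
After a fix-mode run, it can be useful to check which problems remain. The following Python sketch takes the per-section counts from the output above, typed in by hand, and reports what still needs attention. Treating "Need Fix minus Fixed" as the number of records still pending is an assumption made for illustration, not something the tool documents, and the function name `remaining_problems` is hypothetical.

```python
def remaining_problems(section_counts):
    """Summarize what is left to handle in one report section."""
    need_fix = section_counts.get("Need Fix", 0)
    fixed = section_counts.get("Fixed", 0)
    return {
        "unfixable": section_counts.get("Unfixable", 0),
        "fix_failed": section_counts.get("Fix failed", 0),
        # Assumption: records counted as Need Fix but not Fixed are pending.
        "not_fixed": max(need_fix - fixed, 0),
    }


# Counts copied from the fix-mode example output above.
lists = {"Unfixable": 0, "Need Fix": 10, "Fixed": 10, "Fix failed": 0}
maps = {"Unfixable": 10, "Need Fix": 10, "Fixed": 0, "Fix failed": 0}

print(remaining_problems(lists))
# {'unfixable': 0, 'fix_failed': 0, 'not_fixed': 0}
print(remaining_problems(maps))
# {'unfixable': 10, 'fix_failed': 0, 'not_fixed': 10}
```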