
Aerospike Loader (asloader)

Aerospike Loader (asloader) migrates data from another database to Aerospike. You provide .DSV (delimiter-separated values) data files and a schema file, also called a config file, in JSON format. asloader parses the .DSV files and loads the data into the Aerospike cluster according to the schema.

Prerequisites

  • Java 1.8 or later
  • Maven 3.0 or later

Installation

asloader is available as a prebuilt jar on the GitHub releases page or as source code. To build it from source:

git clone https://github.com/aerospike/aerospike-loader.git
cd aerospike-loader
./build

In Aerospike Tools releases earlier than 6.2, asloader is bundled in the Aerospike Tools package.

Dependencies

The following dependencies are downloaded automatically during the Maven build:

  • Aerospike Java client 6.1.6 or later
  • Apache Commons CLI 1.2
  • Log4j 2.17.1
  • JUnit 4.4
  • json-simple 1.1.1

Loader thread architecture

The loader uses reader threads and writer threads.

  • Reader threads read the data files. The number of reader threads equals the number of CPUs or the number of data files, whichever is smaller.
  • Writer threads write to the cluster. The number of writer threads equals the number of CPUs multiplied by a scale factor of 5.
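The sizing rule above can be written as a small Python function. This is only an illustration of the documented formula, not asloader's Java implementation. The function name loader_thread_counts is hypothetical.

```python
import os

WRITER_SCALE_FACTOR = 5  # writer threads = CPUs x 5, per the rule above


def loader_thread_counts(num_cpus: int, num_files: int) -> tuple:
    """Return (reader_threads, writer_threads) for the given CPU and file counts."""
    readers = min(num_cpus, num_files)  # one reader per file, capped at the CPU count
    writers = num_cpus * WRITER_SCALE_FACTOR
    return readers, writers


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    print(loader_thread_counts(cpus, num_files=3))
```

For example, loading 3 files on an 8-CPU machine gives 3 reader threads and 40 writer threads.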

Usage

If you downloaded the jar file from the releases page, run:

java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad <options> <data file name(s)/directory>

If you built from source, use the run_loader script in the root of the source directory, passing the options and data files as arguments. See Command line options for details.

run_loader <options> <data file name(s)/directory>

The final argument is either a space-separated list of data files or the name of a directory that contains data files.
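To launch the jar from a script, you can assemble the command above as an argument list. The sketch below is a hypothetical Python helper; build_loader_command is not part of asloader. The jar file name is a placeholder for whichever version you downloaded. The * wildcard in the shell command is expanded by the shell, not by Python's subprocess module, so pass the actual jar path.

```python
import shlex


def build_loader_command(jar_path, options, data_paths):
    """Return the argv list for running AerospikeLoad from a jar (hypothetical helper)."""
    return [
        "java", "-cp", jar_path, "com.aerospike.load.AerospikeLoad",
        *options,     # loader options such as -h, -n, -c
        *data_paths,  # data files or a single directory, passed last
    ]


cmd = build_loader_command(
    "aerospike-load-VERSION-jar-with-dependencies.jar",  # placeholder jar name
    ["-h", "localhost", "-n", "test", "-c", "example/alldatatype.json"],
    ["example/alldatatype.dsv"],
)
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```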

Data files

Sample data file

user_location##user_id##last_visited##community
India##1##08/16/2011##facebook
India##2##08/17/2011##Twitter
USA##3##08/16/2011##Twitter

This example has a header row and uses ## as the delimiter. The config file describes this structure so that asloader can interpret the data file. See Examples for sample data and config files with various attributes.
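The following Python sketch shows how the header row names the fields of each data row in the sample above. It is illustrative only. parse_dsv_with_header is a hypothetical helper and does not handle quoted fields. In asloader, the mapping from columns to bins is defined by the config file rather than by code like this.

```python
SAMPLE = """user_location##user_id##last_visited##community
India##1##08/16/2011##facebook
India##2##08/17/2011##Twitter
USA##3##08/16/2011##Twitter"""


def parse_dsv_with_header(text, delimiter="##"):
    """Split each line on the delimiter and key the fields by the header row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split(delimiter)
    return [dict(zip(header, line.split(delimiter))) for line in lines[1:]]


rows = parse_dsv_with_header(SAMPLE)
print(rows[0])  # every value is still a string at this point
```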

Supported data types

| Data type | Description | Example |
|---|---|---|
| Integer | Integer value. | 123456 |
| Float | Floating-point value. | 0.345 |
| String | String value. | "Aerospike" |
| Blob | Hex-encoded binary fields are stored as blobs. | "abc" hex-encoded as 616263 |
| Timestamp | Timestamp data stored as a string or an integer. | "1-1-1970" or -19800 (seconds since the Unix epoch, UTC) |
| JSON | Any standard JSON value. Lists and maps are parsed as JSON. | List: ["a", "b", ["c", "d"]]; Map: {"a": "b", "c": {"d": "e"}} |
| GeoJSON | Aerospike supports the GeoJSON data type natively, so values can be stored in standard GeoJSON format. | {"type": "Point", "coordinates": [123.4, -456.7]} |
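As a quick check of the Blob and JSON examples in the table, the snippet below decodes them with Python's standard library. It only illustrates what the example values represent. It is not how asloader parses them.

```python
import json

# Blob: "abc" hex-encodes to 616263, and 616263 decodes back to b"abc".
assert "abc".encode().hex() == "616263"
blob = bytes.fromhex("616263")

# JSON: lists and maps parse into nested lists and dicts.
json_list = json.loads('["a", "b", ["c", "d"]]')
json_map = json.loads('{"a": "b", "c": {"d": "e"}}')

print(blob, json_list, json_map)
```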
Note

Data files that contain JSON data should not use JSON-specific characters such as '}', ']', ',', and ':' as delimiters. Delimiters that appear inside double quotes (" ") are not treated as field separators. Because asloader supports DSV, you can choose any delimiter that does not conflict with your data.
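The quoting rule can be sketched as a small splitter that accepts multi-character delimiters such as ## and ignores delimiters between double quotes. This is a simplified, hypothetical helper (split_dsv), not asloader's parser. It keeps the quote characters in the field and does not handle escaped quotes. The second call shows why ',' is a poor delimiter for JSON data: the comma inside the map splits the field.

```python
def split_dsv(line, delimiter):
    """Split line on delimiter, ignoring delimiters that appear inside double quotes."""
    fields, current, in_quotes, i = [], [], False, 0
    while i < len(line):
        if line[i] == '"':
            in_quotes = not in_quotes  # toggle on every double quote
            current.append(line[i])
            i += 1
        elif not in_quotes and line.startswith(delimiter, i):
            fields.append("".join(current))  # delimiter outside quotes ends a field
            current = []
            i += len(delimiter)
        else:
            current.append(line[i])
            i += 1
    fields.append("".join(current))  # the last field has no trailing delimiter
    return fields


print(split_dsv('1##"a##b"##{"k": "v"}', "##"))  # quoted ## is kept inside the field
print(split_dsv('1,{"a": 1, "b": 2}', ","))      # the JSON map is split apart
```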

Note

Format timestamp data consistently and always enclose it in double quotes. For timestamp format patterns, see the Oracle Java documentation for SimpleDateFormat.
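The following sketch shows the kind of conversion that a consistent format and the -tz option make possible: parsing a timestamp string in the source time zone and converting it to seconds since the Unix epoch. It uses Python's datetime with the pattern %d-%m-%Y, roughly equivalent to SimpleDateFormat's d-M-yyyy. The helper name to_epoch_seconds is hypothetical, and fixed UTC offsets stand in for named zones such as PST. Interpreting "1-1-1970" at UTC+05:30 yields -19800, the same value shown in the Timestamp row of the table.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_seconds(text, fmt, source_tz):
    """Parse text in the source time zone; return seconds since 1970-01-01T00:00:00 UTC."""
    naive = datetime.strptime(text, fmt)  # no time zone attached yet
    return int(naive.replace(tzinfo=source_tz).timestamp())


IST = timezone(timedelta(hours=5, minutes=30))  # UTC+05:30

print(to_epoch_seconds("1-1-1970", "%d-%m-%Y", IST))           # -19800
print(to_epoch_seconds("1-1-1970", "%d-%m-%Y", timezone.utc))  # 0
```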

Sample config and data file

The example directory in the GitHub repository contains a sample config file, alldatatype.json, and a data file, alldatatype.dsv. Run the following command to load the data using that config file:

run_loader -h localhost -c example/alldatatype.json example/alldatatype.dsv

For information about additional data file structures, see Examples.

Command line options

| Option | Description | Default |
|---|---|---|
| -h <hosts> | List of seed hosts where Aerospike servers are running. | 127.0.0.1 |
| -p <port> | Port to use with the hosts specified in the -h option. | 3000 |
| -U <user> | Username. | |
| -P <password> | Password. | |
| -n <namespace> | Namespace to load data into. | test |
| -c <config> | JSON-formatted configuration file specifying parsing attributes and schema mapping. | |
| -g <max-throughput> | Maximum target transactions per second for the loader. | 0 (no throttling) |
| -T <transaction-timeout> | Timeout, in milliseconds, for each write transaction. | 0 (no timeout) |
| -e <expiration-time> | Record expiration time, in seconds. Special values: -1 means records never expire; 0 uses the server default. | -1 |
| -tz <timezone> | Time zone of the source data, used when loading timestamp data. For example, if the data comes from a source in time zone X and is loaded into a server in time zone Y, specify X. Valid values are standard three-letter codes such as PST or EST. | Local time zone |
| -ec <abort-error-count> | Error count threshold at which the loader stops loading data. 0 disables the threshold. | 0 |
| -wa <write-action> | Write action. One of: UPDATE (create or update records, merging incoming bin values with existing ones); UPDATE_ONLY (update existing records, merging bin values; fail if the record does not exist); REPLACE (create or replace records); REPLACE_ONLY (replace existing records; fail if the record does not exist); CREATE_ONLY (create new records; fail if the record already exists). | UPDATE |
| -tls <tls-enable> | Use TLS/SSL sockets. | False |
| -tlsLoginOnly | Use TLS/SSL sockets for node login only. | False |
| -tp <tls-protocols> | Allowed TLS protocols, as a comma-separated list of TLSv1, TLSv1.1, and TLSv1.2. | TLSv1.2 |
| -auth | Authentication mode: INTERNAL, EXTERNAL, EXTERNAL_INSECURE, or PKI. These correspond to the Aerospike Java client authentication modes. | |
| -tlsCiphers <tls-cipher-suite> | Allowed TLS cipher suites, as a comma-separated list of cipher names defined by the JVM. | null (JVM default cipher list) |
| -tr <tls-revoke> | Revoke certificates identified by serial number, as a comma-separated list of serial numbers. | null (no certificates revoked) |
| -um <unorderedMaps> | If present, write all maps as unordered maps. | |
| -uk <send-user-key> | Send the user-defined key, in addition to the hash digest, for storage on the server. | Not sent (reduces metadata overhead) |
| -v | Verbose mode: display additional information on the console. | Disabled |
| -u | Display command usage. | |
| -V | Print the asloader version. | |
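You can model the -wa semantics against an in-memory dictionary to make the differences between the five write actions concrete. This is a hypothetical Python model (WriteAction and apply_write are illustrative names), not Aerospike client code. The action names appear to mirror the Aerospike Java client's RecordExistsAction values. A False return stands in for a failed write.

```python
from enum import Enum


class WriteAction(Enum):
    UPDATE = "UPDATE"
    UPDATE_ONLY = "UPDATE_ONLY"
    REPLACE = "REPLACE"
    REPLACE_ONLY = "REPLACE_ONLY"
    CREATE_ONLY = "CREATE_ONLY"


def apply_write(store, key, bins, action):
    """Write bins for key into store; return False if the existence check fails."""
    exists = key in store
    if action in (WriteAction.UPDATE_ONLY, WriteAction.REPLACE_ONLY) and not exists:
        return False  # *_ONLY actions require an existing record
    if action is WriteAction.CREATE_ONLY and exists:
        return False  # CREATE_ONLY refuses to touch an existing record
    if action in (WriteAction.UPDATE, WriteAction.UPDATE_ONLY) and exists:
        store[key] = {**store[key], **bins}  # merge incoming bins into existing ones
    else:
        store[key] = dict(bins)  # create, or replace the whole record
    return True


store = {}
apply_write(store, "k1", {"a": 1, "b": 2}, WriteAction.UPDATE)  # creates k1
apply_write(store, "k1", {"b": 3}, WriteAction.UPDATE)          # merges: {"a": 1, "b": 3}
apply_write(store, "k1", {"c": 4}, WriteAction.REPLACE)         # replaces: {"c": 4}
print(store)
print(apply_write(store, "k2", {"x": 1}, WriteAction.UPDATE_ONLY))  # False: k2 does not exist
```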

Example

run_loader -h nodex -p 3000 -n test -T 3000 -e 2592000 -ec 100 -tz PST -wa update -c ~/pathto/config.json datafiles/
Server IP: nodex (-h)
Port: 3000 (-p)
Namespace: test (-n)
Write Operation Timeout (in milliseconds): 3000 (-T)
Write Error Threshold: 100 (-ec)
Record Expiration: 2592000 (-e)
Timezone: PST (-tz)
Write Action: update (-wa)
Data Mapping: ~/pathto/config.json (-c)
Data Files: datafiles/
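The -e value in this example is in seconds. The hypothetical helper below (describe_expiration, not part of asloader) converts it to days and handles the two special values listed for -e in the options table. 2592000 seconds is 30 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def describe_expiration(seconds):
    """Describe an -e value using the meanings listed in the options table."""
    if seconds == -1:
        return "never expire"
    if seconds == 0:
        return "server default"
    if seconds < 0:
        raise ValueError("only -1 is a documented negative value")
    return f"{seconds / SECONDS_PER_DAY:g} days"


print(describe_expiration(2592000))  # 30 days
```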