Version: Graph 2.3.0

Bulk data loading

There are several methods for loading graph data sets into Aerospike Graph.

  • For relatively small amounts of data (fewer than 1,000 vertices and edges), you can use Gremlin addV and addE steps.

  • For small datasets (smaller than 25 GB), you can use the Standalone Bulk Loader. This method is optimized for ease of use and is well suited to testing and experimentation.

  • For larger datasets (25 GB or more), use the Distributed Bulk Loader. This method requires an Apache Spark cluster (such as AWS EMR or Google Dataproc) for distributed processing.
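As a rough illustration, the size guidance above can be captured in a small helper. This is a hypothetical Python sketch, not part of any Aerospike tooling: the enum and function names are invented here, the 1,000-element and 25 GB cutoffs come from the list above, and treating 25 GB as 25 × 10^9 bytes is an assumption.

```python
from enum import Enum


class LoadMethod(Enum):
    GREMLIN_STEPS = "Gremlin addV/addE steps"
    STANDALONE = "Standalone Bulk Loader"
    DISTRIBUTED = "Distributed Bulk Loader"


# Cutoffs taken from the guidance above.
SMALL_ELEMENT_LIMIT = 1_000              # vertices + edges combined
STANDALONE_SIZE_LIMIT = 25 * 10**9       # 25 GB, assumed to mean decimal gigabytes


def choose_load_method(num_vertices: int, num_edges: int, dataset_bytes: int) -> LoadMethod:
    """Suggest a loading method from element count and on-disk dataset size."""
    if num_vertices + num_edges < SMALL_ELEMENT_LIMIT:
        # Few enough elements to add one by one with Gremlin steps.
        return LoadMethod.GREMLIN_STEPS
    if dataset_bytes < STANDALONE_SIZE_LIMIT:
        # Fits the Standalone Bulk Loader's intended size range.
        return LoadMethod.STANDALONE
    # Anything larger calls for Spark-based distributed loading.
    return LoadMethod.DISTRIBUTED


if __name__ == "__main__":
    print(choose_load_method(5_000, 20_000, 2 * 10**9).value)
```

The thresholds are guidelines rather than hard limits, so a helper like this is only a starting point; hardware, data shape, and operational needs may justify a different choice.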

Feature comparison: Standalone mode and Distributed mode

  • Local and remote data loading (Standalone ✅, Distributed ✅): The bulk loader handles both local and cloud-based data files. For local data, files must be accessible within the AGS Docker image. For data in cloud storage, AWS S3 and Google Cloud Storage are supported.

  • Gremlin CSV format (Standalone ✅, Distributed ✅): Graph data is structured using a Gremlin CSV format.

  • Gremlin integration (Standalone ✅): Bulk loading is integrated with Gremlin through a call step, so you can enter bulk loading commands directly in the Gremlin console. This includes setting the paths to data files and configuring settings such as timeout durations, which prevent operation timeouts during long-running jobs.

  • Error handling (Standalone ✅, Distributed ✅): Robust error handling helps you manage common data issues that arise during data preparation.

  • Recoverability (Distributed ✅): If a job fails partway through, it can be restarted from the point of failure.

  • Incremental loading (Standalone ✅, Distributed ✅): You can incrementally load data into an existing graph to modify it in bulk, adding new vertices and edges or modifying existing ones without reloading the entire data set from CSV files.
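Both bulk loaders read graph data from CSV files, so a common first step is exporting your records into that shape. The Python sketch below writes vertex and edge files and checks for edges that point at missing vertices before writing anything. It is illustrative only: the header conventions (`~id` and `~label` for vertices, `~from`, `~to`, and `~label` for edges, `name:Type` for typed properties), the separate `vertices` and `edges` directories, the file names, and the sample records are all assumptions, so check them against the Gremlin CSV format reference for your Aerospike Graph version. The dangling-edge check is a hypothetical client-side sanity check and does not replace the loader's own error handling.

```python
import csv
import os
import tempfile

# Hypothetical sample records. Column names follow an ASSUMED Gremlin CSV
# convention: "~id"/"~label" for vertices, "~from"/"~to"/"~label" for edges,
# and "name:Type" for typed property columns.
VERTICES = [
    {"~id": "p1", "~label": "person", "name:String": "Alice", "age:Int": 34},
    {"~id": "p2", "~label": "person", "name:String": "Bob", "age:Int": 29},
]
EDGES = [
    {"~from": "p1", "~to": "p2", "~label": "knows", "since:Int": 2019},
]


def write_csv(path, rows):
    """Write rows to one CSV file, using the first row's keys as the header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def dangling_edges(vertices, edges):
    """Return edges whose ~from or ~to id does not match any vertex ~id."""
    known_ids = {v["~id"] for v in vertices}
    return [e for e in edges if e["~from"] not in known_ids or e["~to"] not in known_ids]


def write_dataset(root, vertices, edges):
    """Validate edges, then write vertices and edges into separate directories (assumed layout)."""
    bad = dangling_edges(vertices, edges)
    if bad:
        raise ValueError(f"{len(bad)} edge(s) reference unknown vertices: {bad}")
    vertex_dir = os.path.join(root, "vertices")
    edge_dir = os.path.join(root, "edges")
    os.makedirs(vertex_dir, exist_ok=True)
    os.makedirs(edge_dir, exist_ok=True)
    write_csv(os.path.join(vertex_dir, "people.csv"), vertices)
    write_csv(os.path.join(edge_dir, "knows.csv"), edges)
    return vertex_dir, edge_dir


if __name__ == "__main__":
    out_root = tempfile.mkdtemp()
    vdir, edir = write_dataset(out_root, VERTICES, EDGES)
    print("vertices written to", vdir)
    print("edges written to", edir)
```

For local loading, the resulting directories would then need to be made accessible within the AGS Docker image (for example through a volume mount), or uploaded to AWS S3 or Google Cloud Storage for remote loading.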