This tutorial describes the secondary index and query functionality pertaining to CDTs available in Aerospike Database 6.1+.
This notebook requires the Aerospike Database running locally with Java kernel and Aerospike Java Client. To create a Docker container that satisfies the requirements and holds a copy of Aerospike notebooks, visit the Aerospike Notebooks Repo.
Introduction
In this notebook, we will illustrate the CDT indexing and query capabilities in Aerospike Database 6.1+. The detailed description is available in the accompanying blog post Query JSON Documents Faster (and More) with CDT Indexing.
The specific topics covered in this notebook include:
New CDT indexing functionality
Code examples
Non-collection elements
Collection elements
JSON documents
Prerequisites
This tutorial assumes familiarity with the following topics:
Define constants for the namespaces test, set cdt-indexing, and helper functions createIndex, dropIndex, executeQueryAndPrintResults, and truncateTestData.
StringNamespace="test";
StringSet="cdt-indexing";
// convenience function to create an index - essentially a pass-through to the client API
You can execute shell commands including Aerospike tools like aql to examine data in the terminal tab. Open a terminal tab by selecting File->Open from the notebook menu, and then New->Terminal.
We will illustrate how to create indexes on CDTs and issue queries using two CDTs - a nested List and a nested Map that are populated in 10 records each.
A nested List in records with user_key (id) 1-10. The list is held in bin list_bin and has the following numeric, string, and List elements.
Numeric elements: user_key to user_key+4.
For example, for user_key=1: 1, 2, 3, 4, 5.
String elements: “s”+user_key to “s”+(user_key+4).
For example, for user_key=1: “s1”, “s2”, “s3”, “s4”, “s5”
Nested List element: holds numeric elements 10*user_key+1 to 10*user_key+5.
For example, for user_key=1: [11, 12, 13, 44, 15].
So, the value in list_bin for the record with user_key=1 looks like:
A nested Map in records with user_key (id) 101-110. The map is held in bin map_bin and has the following numeric, string, and List elements. Below, we have used i with a shorthand for i=user_key-100.
“oid”: i.
For example, for i=1: “oid”: 1.
“obj”: a nested map with the following key-value pairs: {"attr1": 10\*i+1, "attr2": "s"+10\*i+2, "subobj": {"attr3": 100\*i+1, "attr4": 100\*i+2}}.
For example, for i=1: “obj”: {"attr1": 11, "attr2": "s12", "subobj": {"attr3": 101, "attr4": 102}}.
i: [“s”+i, “s”+i+1].
For example, for i=1: 1: [“s1”, “s2”].
So, the value in map_bin for the record with user_key=101 (or, equivalently, i=1) looks like:
View the test data by executing the following commands in a Jupyter terminal tab.
aql -c "select list_bin from test.cdt-indexing"
aql -c "select map_bin from test.cdt-indexing"
We will illustrate how indexes on non-collection elements are created and used. We will start with results in mind, and then create the appropriate index, issue a query using the index, and show the results.
Equality queries
Get records with a specific integer or string value at a specific index, rank, or key position of a List or a Map.
Records with a specific value at index X of a List or Map
The top level List in list_bin has 3 at position 2 (0-indexed)
// Create an index for element at position 2 of the top List in list_bin.
Get records having an integer value within a range at a specific index, rank, or key position of a List or a Map. Range queries are supported on integer values only.
We will use the indexes defined above for the range queries too.
Records with value in a specific range at index X of a List or Map
The top level List in list_bin is in range 3-5 at position 2 (0-indexed)
// issue the query and print results
System.out.println("Records with value in range 3-5 at position 2 in the top level List in list_bin:");
Collection indexes are defined on List and Map elements with the collection type LIST (list values in the List), MAPKEYS (key values in the Map), or MAPVALUES (values in the Map). The indexes can be used for equality queries on integr and string elements, as well as range queries on integer elements. We will illustrate these variations below.
Equality Queries
Get records with a specific integer or string value in a List or Map.
Records with a List or Map containing a specific value.
The nested List in map_bin contains “s91”.
// Create an index for the nested List in map_bin. Note, in key-ordered sequence, the numeric key is the first one.
We will load JSON documents from the file “nobel_prizes.json”, and store each prize entry in the database in a separate record with user_key starting at 1000 in bin “json_bin”. You can view the file’s contents by opening it from Jupyter’s file browser.
Created index idx_json_bin_year_str on ns=test set=cdt-indexing bin=json_bin.
Nobel prizes in year 1969:
key=1314
category=economics
year=1969
laureates=[{firstname=Ragnar, share=2, id=677, surname=Frisch, motivation="for having developed and applied dynamic models for the analysis of economic processes"}, {firstname=Jan, share=2, id=678, surname=Tinb
key=1313
category=chemistry
year=1969
laureates=[{firstname=Derek, share=2, id=237, surname=Barton, motivation="for their contributions to the development of the concept of conformation and its application in chemistry"}, {firstname=Odd, share=2, i
key=1315
category=literature
year=1969
laureates=[{firstname=Samuel, share=1, id=643, surname=Beckett, motivation="for his writing, which - in new forms for the novel and drama - in the destitution of modern man acquires its elevation"}]
key=1318
category=medicine
year=1969
laureates=[{firstname=Max, share=3, id=391, surname=Delbrück, motivation="for their discoveries concerning the replication mechanism and the genetic structure of viruses"}, {firstname=Alfred D., share=3, id=392
key=1317
category=physics
year=1969
laureates=[{firstname=Murray, share=1, id=90, surname=Gell-Mann, motivation="for his contributions and discoveries concerning the classification of elementary particles and their interactions"}]
key=1316
category=peace
year=1969
laureates=[{share=1, firstname=International Labour Organization, id=527, motivation="for creating international legislation insuring certain norms for working conditions in every country"}]
Created index idx_json_bin_category_str on ns=test set=cdt-indexing bin=json_bin.
Nobel peace prizes:
key=1190
category=peace
year=1990
laureates=[{firstname=Mikhail, share=1, id=552, surname=Gorbachev, motivation="for the leading role he played in the radical changes in East-West relations"}]
key=1561
category=peace
year=1920
laureates=[{firstname=Léon, share=1, id=484, surname=Bourgeois, motivation="for his longstanding contribution to the cause of peace and justice and his prominent role in the establishment of the League of Natio
key=1401
category=peace
year=1952
laureates=[{firstname=Albert, share=1, id=513, surname=Schweitzer, motivation="for his altruism, reverence for life, and tireless humanitarian work which has helped making the idea of brotherhood between men an
key=1386
category=peace
year=1955
overallMotivation="No Nobel Prize was awarded this year. The prize money was allocated to the Special Fund of this prize section."
key=1591
category=peace
year=1914
overallMotivation="No Nobel Prize was awarded this year. The prize money was allocated to the Special Fund of this prize section."
key=1646
category=peace
year=1903
laureates=[{firstname=Randal, share=1, id=466, surname=Cremer, motivation="for his longstanding and devoted effort in favour of the ideas of peace and arbitration"}]
... (truncated)
Example 2: Space Companies JSON Dataset
Read JSON File and Populate Database
We will load JSON documents from the file “space_companies.json”, and store each prize entry in the database in a separate record with user_key starting at 2000 in bin “json_bin”. You can view the file contents by opening it from Jupyter’s file browser.
"about" : "Telespazio VEGA UK is a dynamic and experienced consulting, technology and engineering services business based in Luton, UK. Following our merger into the Telespazio Group in 2011 we now have access to a vast array of world leading facilities and services, which we provide both within the UK and to our export markets. With a strong history in the European space arena starting in 1978, we have rich relationships across the space industry, with staff working closely with the European Space Agency (ESA) on programs and projects for over 35 years. ",
"culture" : "As an organisation that aspires to excellence, Telespazio VEGA recognises the need to appreciate our people for their achievements. Without the skills, dedication and enthusiasm of our people, we would not be where we are today. Telespazio VEGA continually strives to create a rewarding environment, offer ample opportunity for personal and professional development and support our people in balancing work and life.\nTo ensure that we remain relevant and of value to our clients, Telespazio VEGA has created an environment that encourages and rewards innovation and entrepreneurial spirit. This, offered with the financial stability afforded us as part of the Finmeccanica group of companies, means that valid business ideas are given the appropriate backing and support to enable them to succeed.\nWe also recognise that each individual has their own aspirations for personal development. Therefore, we offer a range of career options that enable consultants to enhance individual niche capabilities, embrace the challenge of business development, and take on the responsibility of a management role. These opportunities are made available through a structured career development programme which enables our people to look closely at themselves, where they are today, and where they see themselves in the future. We recognise and support all these ambitions and look to retain a rich mix of skills and capabilities that will continue to support our clients’ evolving business requirements.\n\nContinuous Learning\nAs a business, Telespazio VEGA places great emphasis on allowing our people to learn and develop, both personally and professionally. We share ideas with colleagues and clients, challenge each other and learn through our involvement with a variety of projects.\nTelespazio VEGA provides ample opportunity for more formal training. We encourage our people to undertake professional qualifications, and support those who undertake study which relates to their work\n",
"specialities" : "We now serve a wide range of public and private commercial markets globally by providing efficient, skilled support, the most up to date space data and services, and fundamentally, understanding and integrating the needs of our customers.\n\nWe have developed and harnessed our knowledge and expertise to deliver unique complex solutions for:\nGEO Information, Satellite Systems and Applications & Satellite Communications.\nWe offer robust and reliable ground segment systems for satellite missions and utilise space assets to develop downstream applications across a wide variety of sectors. \nWe provide world leading scientific services on earth observation and space science missions.\nOur valued and experienced knowledgeable experts deliver consulting and engineering support to space missions worldwide.",
"testimonials" : null,
"updatedAt" : "2021-01-20T09:45:41.731Z",
"weOfferChallenges" : null,
"weOfferKindOfChallenges" : "Telespazio VEGA’s culture is defined by our people, and drives the value we deliver through the relationships we have with our stakeholders.\n\nOur culture is distinct and discernible, mirroring the multi-national, multi-disciplinary environment in which we work. It is based on a practical engineering ethos that thrives on challenge, and engenders a rigorous, analytical, results-oriented approach to problem solving. We are open and honest, say what we believe, and challenge that which we do not.\n\nAt Telespazio VEGA, our culture is also based on a pursuit of excellence and thought leadership – not only in our understanding of the technologies with which we work and the market domains in which they are applied, but also in the quality of the ideas, methodologies, services and solutions we deliver.\n\nWe share with our clients an unparalleled enthusiasm for Space and its application in society. Therefore, we are able to empathise strongly, and invest this understanding and commitment into long-term relationships with our clients and stakeholders.\n",
Created index idx_json_bin_number_of_jobs_num on ns=test set=cdt-indexing bin=json_bin.
Companies with numberOfJobs greater than 10:
key=2013
createdAt=2018-03-04T14:00:04.000Z
jobs=null
numberOfJobs=11
company={backgroundImage=https://spaceindividuals.com/var/www/images/971f13def35f0b21e379fd1555b3e10d.jpg, isPublished=true, companyName=AKKA Technologies, about=As a consulting and engineering Group and Euro
_id=593fdfae9839ef0a13b6b92d
email=kontakt@akka.eu
updatedAt=2021-01-27T14:04:41.487Z
key=2070
createdAt=2018-03-29T14:00:04.000Z
jobs=null
numberOfJobs=62
company={backgroundImage=https://spaceindividuals.com/var/www/images/02873814215a7b781e0f5f10e726926b.jpg, companyName=CERN, about=Imagine taking part in the largest scientific experiment in the world. CERN n
_id=59df28ff947f49c77101e477
email=recruitment.service@cern.ch
updatedAt=2021-01-27T09:28:02.836Z
key=2084
createdAt=2018-04-01T14:00:04.000Z
jobs=null
numberOfJobs=30
company={backgroundImage=https://spaceindividuals.com/var/www/images/81e0f8e35dbab0242b8729f795986047.jpg, isPublished=true, companyName=RHEA Group, about=25 years of experience in the professional engineerin
_id=58d965bfde9c102b036040f7
email=temp@rheagroup.com
updatedAt=2021-01-27T10:16:40.854Z
key=2107
createdAt=2018-11-21T15:48:41.941Z
jobs=null
numberOfJobs=95
company={backgroundImage=/profile-pic/1574021440639.jpg, isPublished=true, companyName=European Space Agency - ESA, about=The European Space Agency (ESA) is Europe’s gateway to space. Its mission is to shape
_id=5bf57e594739c05820a92eee
email=ZINEB.ELOMRI@ESA.INT
updatedAt=2021-01-25T09:34:07.372Z
key=2071
createdAt=2018-03-07T14:02:08.000Z
jobs=null
numberOfJobs=16
company={backgroundImage=https://spaceindividuals.com/var/www/images/093fb01f43d429801105c6fc0b224d40.jpg, companyName=Terma Group, about=At Terma our people make the difference. We employ talented, dedicated
_id=590a258c7cd117274a0ae681
email=recruitment.de@terma.com
updatedAt=2021-01-27T13:24:59.183Z
key=2083
createdAt=2018-03-04T15:01:08.000Z
jobs=null
numberOfJobs=52
company={backgroundImage=https://spaceindividuals.com/var/www/images/98181cdb3d430d1e90c695e44caeea90.jpg, isPublished=true, companyName=Serco Europe, about=The Serco Group is a FTSE 250 corporation with over
Find companies by a nested field such as the payment type.
The paymentType field may be truncated in the result output, but can be verified with the AQL query in the terminal tab: aql -c “select json_bin from test.cdt-indexing where PK=key”.
// Create an index on the nested paymentType field.
Created index idx_json_bin_payment_type_str on ns=test set=cdt-indexing bin=json_bin.
Companies with enterprise payment type:
key=2013
createdAt=2018-03-04T14:00:04.000Z
jobs=null
numberOfJobs=11
company={backgroundImage=https://spaceindividuals.com/var/www/images/971f13def35f0b21e379fd1555b3e10d.jpg, isPublished=true, companyName=AKKA Technologies, about=As a consulting and engineering Group and Euro
_id=593fdfae9839ef0a13b6b92d
email=kontakt@akka.eu
updatedAt=2021-01-27T14:04:41.487Z
key=2002
createdAt=2017-06-05T00:00:00.000Z
jobs=null
numberOfJobs=0
company={backgroundImage=https://www.spaceindividuals.com/var/www/images/8907e7e06a094fe31098d167567fa1dc.jpg, isPublished=true, companyName=Telespazio VEGA UK, about=Telespazio VEGA UK is a dynamic and exper
_id=58d163cd6ed7bb3c6ee62e63
email=careers@telespazio.com
updatedAt=2021-01-20T09:47:36.694Z
key=2045
createdAt=2018-03-24T14:00:04.000Z
jobs=null
numberOfJobs=8
company={backgroundImage=https://spaceindividuals.com/var/www/images/0786a572d77f4a0b8a48c5b054766c54.jpg, isPublished=true, companyName=Telespazio Germany GmbH, about=Telespazio Germany is the first choice a
_id=58e13089de9c102b03604295
email=recruitment@telespazio.de
updatedAt=2021-01-27T11:04:38.881Z
key=2031
createdAt=2018-04-25T12:27:23.286Z
jobs=null
numberOfJobs=0
company={paymentTypeOrdered=enterprise, backgroundImage=https://www.spaceindividuals.com/var/www/images/5207d3c4216068b9423ade06215f1529.PNG, isPublished=true, companyName=European GNSS Agency, about=Space is
_id=5ae0742b4392412b350ca064
email=jobs@gsa.europa.eu
updatedAt=2020-11-13T17:25:53.261Z
key=2107
createdAt=2018-11-21T15:48:41.941Z
jobs=null
numberOfJobs=95
company={backgroundImage=/profile-pic/1574021440639.jpg, isPublished=true, companyName=European Space Agency - ESA, about=The European Space Agency (ESA) is Europe’s gateway to space. Its mission is to shape
_id=5bf57e594739c05820a92eee
email=ZINEB.ELOMRI@ESA.INT
updatedAt=2021-01-25T09:34:07.372Z
key=2084
createdAt=2018-04-01T14:00:04.000Z
jobs=null
numberOfJobs=30
company={backgroundImage=https://spaceindividuals.com/var/www/images/81e0f8e35dbab0242b8729f795986047.jpg, isPublished=true, companyName=RHEA Group, about=25 years of experience in the professional engineerin
_id=58d965bfde9c102b036040f7
email=temp@rheagroup.com
updatedAt=2021-01-27T10:16:40.854Z
... (truncated)
When the elements are distributed in multiple places in the CDT, a List or Map is not readily available for indexing. A simple example of this is to find the nobel prize(s) by the winner’s name.
{
"year" : "2021",
"category" : "chemistry",
"laureates" : [ {
"id" : "1002",
"firstname" : "Benjamin",
"surname" : "List",
"motivation" : "\"for the development of asymmetric organocatalysis\"",
"share" : "2"
}, {
"id" : "1003",
"firstname" : "David",
"surname" : "MacMillan",
"motivation" : "\"for the development of asymmetric organocatalysis\"",
"share" : "2"
} ]
}
Create List of Record Values to Query
To be able to find the nobel prize records by the winner’s name, we will create a separate List of the names of the laureates, and index that List. The List is stored in its own bin laureate_names.
Created index idx_laureate_names_list_str on ns=test set=cdt-indexing bin=laureate_names.
Prizes with laureate name "marie curie":
key=1647
category=physics
year=1903
laureates=[{firstname=Henri, share=2, id=4, surname=Becquerel, motivation="in recognition of the extraordinary services he has rendered by his discovery of spontaneous radioactivity"}, {firstname=Pierre, share=
key=1604
category=chemistry
year=1911
laureates=[{firstname=Marie, share=1, id=6, surname=Curie, motivation="in recognition of her services to the advancement of chemistry by the discovery of the elements radium and polonium, by the isolation of ra
Remove tutorial data and close connection.
client.truncate(null, Namespace, null, null);
client.close();
System.out.println("Removed tutorial data and closed server connection.");
Output:
Removed tutorial data and closed server connection.
The tutorial illustrated with many examples the secondary index functionality with CDTs that is available in Aerospike Database 6.1.0.0+. The CDT indexing enables faster queries on JSON documents and other complex objects stored as CDTs.
Next steps
Visit Aerospike notebooks repo to run additional Aerospike notebooks. To run a different notebook, download the notebook from the repo to your local machine, and then click on File->Open, and select Upload.