
From Redis to Aerospike: Managing 45 billion records efficiently

Adjust GmbH efficiently manages 45 billion records after moving from Redis to Aerospike. Discover the challenges the team faced, the strategies it employed, and the lessons it learned.

August 5, 2024 | 7 min read
Steve Tuohy
Director of Product Marketing

In today's world, where milliseconds matter, storing and managing information efficiently is critical. Adjust GmbH, a Berlin-based global mobile measurement and fraud prevention company, processes approximately 52 million requests every minute and stores around 45 billion records.

Adjust recently shared its success story at The Real-Time Data Summit, a virtual event held in June 2024. The event was full of market insights and information for developers on the latest AI and real-time data advancements. Below, we explore the strategies Adjust’s team employed, the lessons it learned from managing such a large dataset, and how the company ultimately migrated from Redis to Aerospike to deliver results for its customers.

Understanding the scale

Adjust helps clients track the effectiveness of their marketing channels by monitoring user interactions across social media platforms. 

"Clients create campaigns and put them on Facebook, Twitter, LinkedIn, TikTok—wherever they want to market to people,” explains Adjust Senior Software Engineer Bubunyo Nyavor. “We help them track how people are clicking these campaign links and how these campaign links are converting depending on the platform they are using."

Each of the roughly 52 million requests per minute generates data that must be stored and processed in real time, a volume that demands a robust and scalable database.

Redis challenges solved: Exploring the power of Aerospike

Scaling Adjust’s data storage to handle 45 billion records has had its challenges, particularly with upgrades and understanding operational modes.

When the company first started with Redis, it could easily handle Adjust’s load of 40,000 requests per second on a single server with 32 GB of RAM and 2 TB of HDD storage. In fact, at that point, the server was over-provisioned, said Robert Abraham, Adjust’s VP of Engineering, at a previous Summit.
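To put that growth in perspective, the short calculation below converts today's roughly 52 million requests per minute into a per-second rate and compares it with the 40,000 requests per second the original Redis server handled. It is a back-of-the-envelope sketch that uses only the two figures quoted in this article.

```python
# Back-of-the-envelope comparison of Adjust's request rates, using only the
# figures quoted in this article.
REQUESTS_PER_MINUTE_TODAY = 52_000_000
EARLY_REDIS_RPS = 40_000  # load the single early Redis server handled


def per_second(requests_per_minute: int) -> float:
    """Convert a per-minute request count into a per-second rate."""
    return requests_per_minute / 60


today_rps = per_second(REQUESTS_PER_MINUTE_TODAY)
growth = today_rps / EARLY_REDIS_RPS

print(f"today: ~{today_rps:,.0f} requests/s")
print(f"growth since the early Redis setup: ~{growth:.1f}x")
```

Under these figures, today's load works out to roughly 867,000 requests per second, a little under 22 times what the original single server handled.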

However, the company’s growth quickly outpaced this setup. At first, Adjust kept up by adding more memory to its server, up to 384 GB of RAM, but it eventually ran up against hardware limits. Because Redis executes commands on a single thread, it couldn’t take full advantage of multi-core CPUs, and there were physical limits on how much memory a single server could hold, Abraham explained.

At that point, Adjust added Redis Sentinel, “a distributed system consisting of multiple Redis instances started in sentinel mode,” to move beyond a single server to a multi-server Redis deployment, which worked for a time. But that setup also began to suffer from latency problems and cache misses, which increased operational overhead. “Redis is still an in-memory database, and the costs for keeping our ever-growing data set in RAM were exploding,” Abraham said. “We needed more and more RAM, and it was quite expensive. We also had increased latency and instability during peak hours because our writes increased. This also meant that the replication stream could not keep up, and all in all, this created a not-so-ideal situation for us.”


Moving to Aerospike helped in two ways. 

First, Aerospike’s architecture natively supports clustering, whereas Adjust’s Redis deployment relied on the additional Sentinel component and its separate configuration. Second, Aerospike can deliver real-time performance with far less memory, for example by keeping indexes in memory and the actual data on disk. “This means that the majority of data can be kept in storage that costs one-tenth [of the cost of Redis],” Abraham said. The move reduced the number of servers Adjust needed from 40 to 6. “This means we were able to cut our infrastructure costs by almost 85%, and we also needed to maintain it a lot less. We had more resources available to build new features and products, and even with our growing data, the latency of the operations was stable.”
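To see why keeping only the index in RAM changes the memory bill, the sketch below estimates primary-index memory per node using the per-cluster hardware figures Adjust shares later in this article (about 64 nodes per cluster, 400 GB of RAM per node). It rests on several assumptions that are not from Adjust: each primary-index entry costs 64 bytes per replica, every one of the 45 billion records is present in each cluster, and records spread evenly across nodes. Secondary indexes and other overhead are ignored.

```python
# Rough estimate of primary-index RAM per node in a hybrid-memory setup
# (primary index in RAM, record data on disk).
# Assumptions (not from Adjust): 64 bytes per index entry per replica,
# all 45 billion records present in each cluster, even spread over nodes.
# Secondary indexes and other overhead are ignored.
INDEX_ENTRY_BYTES = 64
RECORDS = 45_000_000_000
NODES_PER_CLUSTER = 64
RAM_PER_NODE_GB = 400


def index_gb_per_node(records: int, replication_factor: int, nodes: int) -> float:
    """Primary-index RAM needed on each node, in gigabytes (10**9 bytes)."""
    total_bytes = records * INDEX_ENTRY_BYTES * replication_factor
    return total_bytes / nodes / 1e9


# Adjust uses replication factor 2 in some namespaces and 3 in others.
for rf in (2, 3):
    gb = index_gb_per_node(RECORDS, rf, NODES_PER_CLUSTER)
    print(f"RF={rf}: ~{gb:.0f} GB of primary index per node "
          f"(out of {RAM_PER_NODE_GB} GB RAM)")
```

Under these assumptions the index would need about 90 GB per node at replication factor 2 and about 135 GB at replication factor 3, leaving the bulk of the record data on disk.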

A new infrastructure and setup

Today, the company’s data storage infrastructure is built on Aerospike. To manage its 45 billion records, Adjust deployed Aerospike clusters across three logical data centers worldwide.

"Each data center has an Aerospike cluster, which has about 64 nodes running on a bare-metal setup running a Gentoo operating system,” Bubunyo explains. On average, each node has about 70 CPUs, 400 GB of RAM, and about 16 TB of disk storage.

This configuration ensures high availability and redundancy. Adjust uses a replication factor of 2 in some namespaces and 3 in others, depending on the use case. Aerospike’s flexibility lets it set the replication factor per namespace, and the replicas allow it to handle failures without significant data loss.
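A replication factor of n means each record is stored on n distinct nodes, so in principle a namespace can lose up to n - 1 nodes at the same time without losing data. The toy model below checks that property by brute force. Its replica placement (partitions dealt to consecutive nodes) is an illustrative assumption, not Aerospike's actual partition-distribution algorithm, and the 6-node, 64-partition cluster is hypothetical.

```python
# Toy model of replication-factor fault tolerance. Each partition lives on
# `rf` distinct nodes; it is lost only if every one of its replicas fails.
# Placement is a simple rotation, NOT Aerospike's real distribution scheme.
from itertools import combinations


def place_replicas(num_partitions: int, nodes: list[str], rf: int) -> dict[int, set[str]]:
    """Assign each partition to `rf` consecutive nodes, wrapping around."""
    n = len(nodes)
    return {p: {nodes[(p + i) % n] for i in range(rf)} for p in range(num_partitions)}


def lost_partitions(placement: dict[int, set[str]], failed: set[str]) -> list[int]:
    """Partitions whose replicas all sit on failed nodes."""
    return [p for p, replicas in placement.items() if replicas <= failed]


def survives_any(placement: dict[int, set[str]], nodes: list[str], k: int) -> bool:
    """True if no combination of k simultaneous node failures loses a partition."""
    return all(not lost_partitions(placement, set(failed))
               for failed in combinations(nodes, k))


nodes = [f"node{i}" for i in range(6)]  # hypothetical 6-node cluster
for rf in (2, 3):
    placement = place_replicas(64, nodes, rf)
    print(f"RF={rf}: survives {rf - 1} failures: {survives_any(placement, nodes, rf - 1)}; "
          f"survives {rf} failures: {survives_any(placement, nodes, rf)}")
```

In both cases the cluster survives any n - 1 failures but not every set of n failures, which is why a namespace holding more critical data might warrant a higher replication factor at the cost of extra storage.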

"All three data centers are set up using Cross Datacenter Replication (XDR),” Bubunyo says. “This is a mesh active-active topology where a request that ends up in one cluster is sent to another cluster where it is backed up."

Designing for low latency and consistency

XDR also helps Adjust manage latency. "We have to pay a lot of attention to latencies,” Bubunyo says. If one request arrives at one data center and the next related request goes to another, high replication latency means the second data center may not yet have the first request’s data, so each request is seen as a separate entity. “It becomes difficult to build on top of this.” High latency also complicates failover, because data that has not yet been replicated may be missing from the other data center.
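The interaction between the mesh topology and replication latency can be illustrated with a toy model: each cluster applies a client write locally and ships it to every peer, and a shipped write only becomes visible at a peer after a fixed lag. Data-center names, keys, and timings are hypothetical, and the model ignores conflicting writes and XDR's actual shipping protocol.

```python
# Toy active-active mesh with asynchronous shipping. A write accepted by one
# cluster is shipped to every peer and becomes visible there after `lag_ms`.
# Writes received from a peer are not forwarded again (avoids mesh loops).
# Names and timings are hypothetical; this is not the XDR protocol itself.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    name: str
    lag_ms: float
    # key -> list of (time the version becomes visible here, value)
    data: dict[str, list[tuple[float, str]]] = field(default_factory=dict)
    peers: list["Cluster"] = field(default_factory=list)

    def write(self, key: str, value: str, now_ms: float) -> None:
        """Accept a client write locally, then ship it to every peer."""
        self.data.setdefault(key, []).append((now_ms, value))
        for peer in self.peers:
            peer.data.setdefault(key, []).append((now_ms + self.lag_ms, value))

    def read(self, key: str, now_ms: float) -> Optional[str]:
        """Return the newest version that has arrived by `now_ms`, if any."""
        visible = [(t, v) for t, v in self.data.get(key, []) if t <= now_ms]
        return max(visible)[1] if visible else None


def build_mesh(names: list[str], lag_ms: float) -> dict[str, Cluster]:
    """Connect every cluster to every other cluster."""
    clusters = {n: Cluster(n, lag_ms) for n in names}
    for c in clusters.values():
        c.peers = [p for p in clusters.values() if p is not c]
    return clusters


for lag in (10, 200):
    mesh = build_mesh(["dc1", "dc2", "dc3"], lag_ms=lag)
    mesh["dc1"].write("click:42", "campaign-A", now_ms=0)  # request 1 lands in dc1
    seen = mesh["dc2"].read("click:42", now_ms=50)         # request 2 hits dc2 50 ms later
    print(f"lag={lag} ms: dc2 sees click:42 -> {seen}")
```

With a 10 ms lag, the second request finds the first request's data and the two can be linked; with a 200 ms lag, dc2 returns nothing and the requests look unrelated. The same lag window is what is at risk during failover: if dc1 failed at 50 ms in the slow case, the write would exist only in dc1.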

With Aerospike, 50% of the company’s requests now take less than 500 microseconds, Bubunyo says. “For the vastness of the data that we query and the size of the cluster, this is actually pretty impressive.”
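"50% of requests take less than 500 microseconds" is a statement about the median, or 50th-percentile (p50), latency; it says nothing by itself about the slowest requests. The sketch below computes percentiles with the nearest-rank method over a list of latency samples. The sample values are made up for illustration and are not Adjust's measurements.

```python
# Nearest-rank percentile over latency samples (microseconds).
# The sample data below is hypothetical, not Adjust's measurements.
import math


def percentile(samples: list[float], pct: float) -> float:
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


latencies_us = [220, 310, 180, 450, 900, 260, 1200, 390, 300, 640]
p50 = percentile(latencies_us, 50)
p99 = percentile(latencies_us, 99)
print(f"p50 = {p50} us (under 500 us: {p50 < 500}), p99 = {p99} us")
```

In this made-up sample the median is well under 500 µs while the slowest request takes 1,200 µs, which is why latency targets are usually stated for several percentiles.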

Strong consistency and availability modes

Distributed database architectures have to balance consistency and availability (per the CAP theorem), and certain tasks require more of one than the other. With Aerospike, Adjust can make that choice namespace by namespace. “There are some sets of data where we place a higher premium on consistency than availability, and there are some namespaces where we place a higher premium on availability rather than consistency,” Bubunyo explains. "Aerospike allows us to have both of these operational modes, and we use them for different namespaces for different reasons… Understanding all the configurations is key to taking advantage of this beast of technology. Aerospike gives us the fine-grained controls we need to manage our cluster to meet our requirements.”
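The difference between the two modes shows up most clearly during a network partition. The toy model below uses a simple majority rule as a stand-in: a node in a "strong consistency" namespace refuses to serve when it is cut off from most of the cluster, while a node in an "availability" namespace keeps answering from its local, possibly stale, copy. The majority rule, the `Mode` enum, and the `read` function are illustrative assumptions; Aerospike's real strong-consistency rules (based on rosters and partition ownership) are more involved.

```python
# Toy illustration of the CAP trade-off on one side of a network partition.
# The majority rule is a simplification, not Aerospike's actual SC logic.
from enum import Enum


class Mode(Enum):
    AP = "availability"
    SC = "strong-consistency"


class Unavailable(Exception):
    """Raised when a consistency-first namespace declines to serve."""


def read(mode: Mode, local_value: str, reachable_nodes: int, cluster_size: int) -> str:
    """Serve a read on a node that can reach `reachable_nodes` nodes (itself included)."""
    has_majority = reachable_nodes > cluster_size // 2
    if mode is Mode.SC and not has_majority:
        raise Unavailable("minority side: refusing rather than risk stale data")
    return local_value  # AP: answer even though the value may be stale


# A hypothetical 5-node cluster split 2/3; this node is on the 2-node side.
print("AP:", read(Mode.AP, "v1", reachable_nodes=2, cluster_size=5))
try:
    read(Mode.SC, "v1", reachable_nodes=2, cluster_size=5)
except Unavailable as exc:
    print("SC:", exc)
```

Per-namespace configuration is what lets one cluster apply the SC behavior to data where correctness matters most and the AP behavior to data where staying up matters most.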

Efficient data operations across mixed workloads

Adjust performs several types of database operations on the dataset. "Each request will invoke an operation in Aerospike,” Bubunyo says. “Depending on what operation it is, we fetch some data, and we write some data. Sometimes we write in batches, sometimes we delete, and then we return a response for these requests."

Read and write operations 

Adjust runs more than 1 million read and write operations per second, writing for each click and each impression to power its attribution operations and serving those reads to clients. It values Aerospike’s performance on mixed workloads, especially because its traffic is write-heavy: it actually performs more writes than reads.

Adjust leverages the Smart Client to route requests to the appropriate nodes, resulting in efficient distribution and high throughput. Smart Clients are “cluster-aware and will ship each request to the node that is the leader of a partition,” Bubunyo explains. “This way, you can achieve distributed nodes and high throughput by sending requests only to the nodes you need to send requests to.”
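The routing idea behind a cluster-aware client can be sketched in a few lines: hash the key to a digest, map the digest to one of Aerospike's 4,096 partitions, and look up that partition's master node in a locally cached partition map. Aerospike derives its digest with RIPEMD-160 from the set name and key; this sketch substitutes SHA-1 from Python's `hashlib` (RIPEMD-160 is not always available there), so its partition IDs will not match a real cluster's. The round-robin partition map, the node names, and the `events` set name are hypothetical.

```python
# Sketch of cluster-aware ("smart client") routing. SHA-1 stands in for
# Aerospike's RIPEMD-160 digest, so partition IDs won't match a real cluster.
import hashlib

N_PARTITIONS = 4096


def partition_id(set_name: str, key: str) -> int:
    """Map a key to a partition using 12 bits of its digest."""
    digest = hashlib.sha1(set_name.encode() + key.encode()).digest()
    # First two digest bytes read little-endian, masked to 12 bits.
    return (digest[0] | (digest[1] << 8)) & (N_PARTITIONS - 1)


def build_partition_map(nodes: list[str]) -> dict[int, str]:
    """Hypothetical master assignment: partitions dealt round-robin to nodes."""
    return {p: nodes[p % len(nodes)] for p in range(N_PARTITIONS)}


def route(partition_map: dict[int, str], set_name: str, key: str) -> str:
    """Node the client sends this request to: the partition's master."""
    return partition_map[partition_id(set_name, key)]


nodes = [f"node{i}" for i in range(64)]  # about 64 nodes per cluster
pmap = build_partition_map(nodes)
for key in ("click:1", "click:2", "impression:9"):
    print(key, "->", route(pmap, "events", key))
```

Because every client computes the same partition for a given key, each request goes straight to the one node that owns it, with no proxy hop.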

Index management 

Unlike databases that keep everything in memory and therefore impose high infrastructure costs, Aerospike lets Adjust store secondary indexes on NVMe and SSD drives, balancing cost and performance. “We take advantage of this to store our secondary indexes on NVMe and SSD. Otherwise, we'd have to pay a high price to store the number of indexes we have in memory,” Bubunyo says.

Deprioritizing certain operations 

Adjust regularly scans its 352-terabyte (TB) dataset for data maintenance and compliance, removing outdated or irrelevant records. These resource-intensive scans must be managed to minimize their impact on real-time operations.
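One way to picture "deprioritizing" background scans is a priority queue in which foreground `get` and `write` operations are always dequeued ahead of scan work. The toy scheduler below illustrates only that idea; it is not how Aerospike schedules work internally, and the operation names and priorities are assumptions.

```python
# Toy scheduler: foreground operations always run before background scans.
# Illustrates deprioritization only; not Aerospike's internal scheduler.
import heapq
from itertools import count

PRIORITY = {"get": 0, "write": 0, "scan": 1}  # lower number runs first


class Scheduler:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, op: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[op], next(self._seq), op))

    def run_all(self) -> list[str]:
        """Drain the queue and return operations in execution order."""
        order = []
        while self._heap:
            _, _, op = heapq.heappop(self._heap)
            order.append(op)
        return order


s = Scheduler()
for op in ["scan", "get", "scan", "write", "get"]:
    s.submit(op)
print(s.run_all())  # ['get', 'write', 'get', 'scan', 'scan']
```

A strict priority queue like this can starve scans entirely under sustained foreground load, so in practice background work is usually also rate-limited (Aerospike, for example, lets a scan cap its records per second) rather than only ranked lower.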

"Scanning the entire cluster takes about three days, and it is a very slow and intensive process because it takes a lot of resources to scan,” Bubunyo says. “Luckily, Aerospike helps us prioritize scans lower than ‘get’ and ‘write’ operations." 

Making scalability a non-issue

Storing and managing 45 billion records for real-time use cases requires a combination of robust infrastructure, intelligent data operations, and strategic tradeoffs between consistency and availability. With Aerospike, Adjust has built a scalable, high-performance system to fight fraud and optimize the marketing efforts of a rapidly growing customer base.

White paper: Five signs you have outgrown Redis

If you deploy Redis for mission-critical applications, you are likely experiencing scalability and performance issues. Not with Aerospike. Check out our white paper to learn how Aerospike can help you.