In-memory databases support application performance with fast access. Traditional disk-based database engines offer data persistence and peace of mind. In August, Amazon launched a single database service that combines the two.
Amazon MemoryDB for Redis is an in-memory database that also persists data automatically and invisibly in a storage layer. Barry Morris, general manager at Amazon Web Services (AWS), walked us through the company's decision to build the service, and how it sits alongside the existing AWS in-memory offerings.
Adapting to changing architectures
Yesterday's applications were mainly monolithic, with a single code base making calls to a back-end database. That's all changing, says Morris. As developers move to microservices-based models, usually based on Kubernetes-managed containers, they're building software applications as sets of small components that communicate with each other.
These changes to application architecture have implications for the database. Now, when a user clicks on a web page or another application makes an API call, hundreds of independent microservices in the background all make their own database queries. The microservices fabric then pulls it all into a response. This sparks many database accesses all at once.
"The problem is that your entire end-to-end latency to respond to the user is less than half a second, and maybe much less than that if they're using certain applications over a 5G connection," Morris says. That is potentially problematic if hundreds of microservices are each making their own database calls. "So what you need is very low latency database access."
This is where in-memory databases come in. Memory is the fastest random access mechanism we have, and it's getting cheaper all the time. As it drops in cost, it becomes more feasible for developers to put entire databases in memory, eliminating the traditional I/O bottleneck you get when pulling database records from a disk.
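The arithmetic behind that latency squeeze can be sketched roughly as follows. The half-second budget is Morris's figure; the number of dependent calls and the per-call latencies are assumptions chosen for illustration, not measurements of any disk-based or in-memory system.

```python
# Illustrative latency-budget arithmetic. Everything except the half-second
# budget is an assumption made up for this example.

def per_call_budget_ms(total_budget_ms: float, sequential_calls: int) -> float:
    """Time available to each database call when calls run one after another."""
    if sequential_calls <= 0:
        raise ValueError("sequential_calls must be positive")
    return total_budget_ms / sequential_calls


def chain_latency_ms(per_call_latency_ms: float, sequential_calls: int) -> float:
    """Total time spent on database access along a chain of dependent calls."""
    return per_call_latency_ms * sequential_calls


BUDGET_MS = 500.0    # "less than half a second" end to end
CHAIN_DEPTH = 100    # assumed number of dependent calls on the critical path

if __name__ == "__main__":
    print(per_call_budget_ms(BUDGET_MS, CHAIN_DEPTH))  # budget left for each call
    print(chain_latency_ms(10.0, CHAIN_DEPTH))         # assumed 10 ms per call
    print(chain_latency_ms(1.0, CHAIN_DEPTH))          # assumed 1 ms per call
```

Under these assumed numbers, each call gets 5 ms; a store that answers in 10 ms would overrun the budget on database access alone, while one that answers in 1 ms would leave most of it for everything else.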
Traditionally, AWS users have had two options when dealing with in-memory data sets: Redis, and Amazon's own ElastiCache service. Now, the company is offering MemoryDB as a third option. Why?
The quest for in-memory durability
Redis is popular, having earned the most-loved database title in Stack Overflow's developer survey five years in a row. That means it also has a well-established developer base. It's perfect for many in-memory database applications, says Morris, but AWS still felt it had gaps that made room for an alternative.
"Redis is not durable. It is purely in memory, that's the point. So when we call it a database, it's a database, but it doesn't, by default, put anything on the disk or in any other durable storage," he says. "And so you get the speed. But the trade off is that if all your systems go down, you've lost your data."
The same is true of ElastiCache, which acts as a cache that temporarily holds data in front of another database to speed up performance.
Morris acknowledges that Redis does feature peer-to-peer replication, enabling it to store data across multiple geographically distributed servers and fail over between them. Many customers are prepared to accept the remaining risk of data loss with Redis, just as they are with ElastiCache, he points out.
However, he argues that this won't be enough for some users who need firm guarantees that data will be available on persistent storage. "That's not just some kind of lightweight persistence where you make a best effort to try and remember the data that you gave us, but real, solid, multi-Availability Zone durability," he says.
Redis does include some persistence options. Users can set up point-in-time snapshots, or use append-only files that log each write operation and replay that log to rebuild the data when the server is restarted. Each has its downside, though: snapshots won't persist very recent data, while the append-only option creates relatively large files and can affect latency. You can combine the two to get the strongest persistence that Redis offers. Even then, data stored on the local disks of each node in self-managed Redis can be lost if that node fails.
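The trade-off between the two approaches can be seen in a toy model. This is a sketch only: it is not Redis code, and the class and method names are hypothetical. It keeps a point-in-time copy and a write log side by side and compares what each one can rebuild.

```python
# Toy model of snapshot-style and append-only-style persistence.
# Not Redis code; all names are hypothetical.

class ToyStore:
    def __init__(self):
        self.data = {}         # live in-memory state
        self.snapshot = {}     # last point-in-time copy "on disk"
        self.append_log = []   # every write since start-up, "on disk"

    def set(self, key, value):
        self.data[key] = value
        self.append_log.append((key, value))   # log grows with every write

    def take_snapshot(self):
        self.snapshot = dict(self.data)         # copy of the state right now

    def recover_from_snapshot(self):
        return dict(self.snapshot)

    def recover_from_log(self):
        rebuilt = {}
        for key, value in self.append_log:      # replay writes in order
            rebuilt[key] = value
        return rebuilt


store = ToyStore()
store.set("score:alice", 10)
store.take_snapshot()
store.set("score:alice", 25)   # written after the snapshot
store.set("score:bob", 7)      # written after the snapshot

from_snapshot = store.recover_from_snapshot()
from_log = store.recover_from_log()
```

In this model `from_snapshot` holds only the value written before the snapshot, while `from_log` holds all three writes at the price of a log that gains an entry on every `set`.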
The problem for many customers is that they don't necessarily have the resources or expertise to build out complex multi-regional replication architectures or set up these optional persistence features, argues Morris. This is especially true when you start to scale.
"Now you're getting into questions of how to manage all these things if you're running it yourself," he argues. "How do you do backups? How do you do patching without your system going down?"
He also points to optimization issues including ensuring that each node is running on the most appropriate instance. Then, there's the challenge of managing business continuity against application SLAs.
"When you put all those tasks onto your own team, as opposed to giving it to people that just do it all day, that's where a huge amount of the cost comes in," he says, adding that in some cases management can chew up 80 percent of the database lifecycle budget.
Combining performance and persistence
This is where MemoryDB comes in. It's a managed in-memory database that's fully Redis compatible, like ElastiCache. The difference is that it provides persistence on disk. "This is full-on, three-availability-zone durability," he explains. "It stores data across independent data centre infrastructures."
AWS doesn't talk too much about what's happening under the hood. The idea is that it's invisible to the user. If the database goes down, then the data will still be available when it comes back up, without the user having to do anything. One thing Morris will say is that it uses distributed transaction logs to achieve persistence.
The transaction log is stored in a separate, durable and highly available back-end service that is used as a building block of MemoryDB. Network APIs carry the data from the in-memory system to that back-end storage infrastructure. MemoryDB also takes snapshots, as Redis does. Together, the snapshots and the log enable the database to reconstruct and restore the most recent database state from persistent storage.
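Since AWS has not published the internals, the following is only a generic sketch of how a snapshot and a log can be combined, not MemoryDB's implementation. Every class, method and key name is invented for illustration. The assumption is that each snapshot remembers how much of the log it already reflects, so recovery only replays the entries written after it.

```python
# Hypothetical sketch of snapshot-plus-log recovery. Not AWS code: every
# name here is invented for illustration.

from dataclasses import dataclass, field


@dataclass
class DurableLog:
    """Stands in for a separate, durable log service."""
    entries: list = field(default_factory=list)

    def append(self, key, value) -> int:
        self.entries.append((key, value))
        return len(self.entries)           # offset just past the new entry

    def read_from(self, offset: int):
        return self.entries[offset:]


@dataclass
class Snapshot:
    state: dict
    log_offset: int                        # log entries already reflected in state


class Node:
    def __init__(self, log: DurableLog):
        self.log = log
        self.memory = {}
        self.applied = 0

    def write(self, key, value):
        # Record the write in the log first, then apply it in memory.
        self.applied = self.log.append(key, value)
        self.memory[key] = value

    def snapshot(self) -> Snapshot:
        return Snapshot(dict(self.memory), self.applied)

    @classmethod
    def recover(cls, log: DurableLog, snap: Snapshot) -> "Node":
        node = cls(log)
        node.memory = dict(snap.state)                      # start from the snapshot
        for key, value in log.read_from(snap.log_offset):   # replay newer writes
            node.memory[key] = value
        node.applied = len(log.entries)
        return node


log = DurableLog()
primary = Node(log)
primary.write("session:1", "active")
snap = primary.snapshot()
primary.write("session:2", "active")
primary.write("session:1", "expired")

replacement = Node.recover(log, snap)      # primary's memory is assumed lost
```

Here `replacement.memory` ends up with both sessions in their latest state, even though two of the three writes happened after the snapshot was taken.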
"That process is completely invisible to the user," Morris says, adding that this reconstruction is only ever invoked when a database goes down. "We just reinstantiate the data when necessary."
Merging caches with persistent databases
Morris sees the most appropriate candidate users for MemoryDB as those who use Redis or ElastiCache as a cache in front of another database, which is a popular use case for AWS Redis users. Those users are typically trying to boost performance on disk-based databases to near-real-time levels. Examples might include gaming, media and entertainment and video feeds.
He warns that users going it alone must manage all of the integration between the cache and the back-end system, including not just logical and architectural relationships but also versioning issues.
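A minimal sketch of that cache-in-front-of-a-database arrangement, often called cache-aside, gives a sense of the bookkeeping involved. Plain dictionaries stand in for the cache and the disk-based database, and all names are hypothetical; a real deployment would also have to handle expiry, concurrent writers and failures.

```python
# Minimal cache-aside sketch. Dictionaries stand in for the cache and the
# disk-based database; all names are hypothetical.

class CacheAside:
    def __init__(self, database: dict):
        self.database = database   # durable but slower system of record
        self.cache = {}            # fast but volatile copy
        self.db_reads = 0          # counts trips to the database

    def get(self, key):
        if key in self.cache:              # cache hit
            return self.cache[key]
        self.db_reads += 1                 # cache miss: go to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value        # keep a copy for next time
        return value

    def put(self, key, value):
        self.database[key] = value         # write to the system of record
        self.cache.pop(key, None)          # invalidate so reads are not stale


app = CacheAside({"player:42": "level 3"})
first = app.get("player:42")     # miss, loaded from the database
second = app.get("player:42")    # hit, served from the cache
app.put("player:42", "level 4")
third = app.get("player:42")     # miss again after invalidation
```

Even in this toy version, the application owns the logic that keeps two copies of the data consistent, which is the integration work Morris is describing.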
"For a lot of those folks, that's complex, expensive, and painful because it's hard to manage and hard to scale," he explains. "They're saying, 'I am storing this data in one place to get durability and in another to get performance. Can I have both in one place?'"
Although MemoryDB eliminates the need for separate persistent and in-memory storage, Morris acknowledges that people will often need to integrate the database with other cloud services and databases to support their use cases. AWS Lambda, the company's serverless function framework, supports integration with databases, enabling storage events to trigger external functions. The company has also previewed AWS Glue Elastic Views, which creates materialised views of one database in the context of another.
AWS has made the migration from existing AWS Redis-based instances to MemoryDB seamless, because the schemas are exactly the same. It's up to each customer to decide whether the savings on management costs and the MemoryDB pricing structure fit their financial requirements.
AWS hasn't announced plans for a serverless version of MemoryDB yet, meaning that the service is instance-based. Hourly instance cost is one of three factors determining MemoryDB pricing. The other two are a charge for the amount of data written, and a per-GB cost for snapshot storage. There are no costs for reads.
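As a rough sketch, the three components add up as shown below. The rates are placeholders invented for the example, not AWS prices, and the sketch assumes that data written is billed by volume in gigabytes.

```python
# Cost model with the three pricing components. The rates used below are
# placeholders, not AWS prices.

def monthly_cost(instance_hours, hourly_rate,
                 gb_written, rate_per_gb_written,
                 snapshot_gb, rate_per_snapshot_gb):
    instance = instance_hours * hourly_rate
    writes = gb_written * rate_per_gb_written
    snapshots = snapshot_gb * rate_per_snapshot_gb
    return instance + writes + snapshots    # reads contribute nothing


example = monthly_cost(
    instance_hours=720, hourly_rate=0.5,            # placeholder rate
    gb_written=100, rate_per_gb_written=0.25,       # placeholder rate
    snapshot_gb=40, rate_per_snapshot_gb=0.125,     # placeholder rate
)
```

With these placeholder rates the instance hours dominate the total, so a write-heavy workload would shift the balance towards the second term.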
Adding MemoryDB to the current list of AWS managed database offerings doesn't constrain existing Redis or ElastiCache users on AWS, Morris argues. "It's a choice, and while it's not for everyone, there are certain customers that really want this functionality," he says.
The other services are still there for those who want to manage their own persistence and performance in the AWS environment. For those that want combined in-memory performance and persistence without the burdensome management costs, the cloud giant now offers another option.