AWS has improved the performance and price-performance of its Graviton processors and Amazon Aurora with each generation. Users can now run Amazon Aurora PostgreSQL-Compatible Edition on Graviton4-based R8gd instances, which are equipped with local NVMe-based SSDs. These instances support the Optimized Reads-enabled tiered cache and temporary objects, and they provide a substantial upgrade path from Graviton2-based db.r6g instances: up to 165% higher throughput, up to 120% better price-performance, and up to 80% better application response time.
Scaling database cache using Aurora PostgreSQL with Optimized Reads
Organizations of all sizes, from startups to large enterprises, are grappling with rapidly expanding data volumes across various applications, including generative AI and real-time analytics. As the working dataset surpasses the memory capacity of the database, it begins to rely on storage reads, leading to increased query latencies and unpredictable performance. This shift can adversely affect throughput, inflate I/O costs, and ultimately hinder business processes, resulting in a suboptimal user experience.
The Aurora I/O-Optimized configuration presents a solution for enhancing write throughput in I/O-intensive applications while ensuring predictable pricing. However, maintaining low read latencies when the working dataset exceeds memory can be challenging. Scaling up database instances to increase memory is one approach, but it may not be cost-effective when other hardware resources remain underutilized.
To tackle this issue, Aurora introduces the I/O-Optimized configuration with an Optimized Reads-enabled tiered cache. This feature expands database caching capacity by up to five times the instance memory using local NVMe SSDs, allowing more data to be cached locally and minimizing network storage access. The result is faster query response times with predictable latency, enhanced throughput, and an improved price-performance ratio.
Customers have already begun to reap the rewards of the Amazon Aurora Optimized Reads-enabled tiered cache. Mindbody, for instance, reported impressive metrics: a 50% reduction in CPU usage, a 90% decrease in read IOPS, query execution speeds up to 59 times faster, and overall cost savings of 23%. More details can be found in the Mindbody case study.
Similarly, Claroty experienced a remarkable enhancement in system performance, reducing API request times from around 30 seconds to under one second on average, while also halving operational costs. Additional insights are available in the Claroty case study.
The Optimized Reads-enabled tiered cache also proves beneficial for generative AI and vector workloads by accelerating vector operations and minimizing latency for vector similarity searches. For more information, refer to the article on improving generative AI workloads on Amazon Aurora.
Benefits of Aurora PostgreSQL 17 with Optimized Reads and Graviton4 R8gd
The Aurora PostgreSQL 17 major release brings notable performance enhancements for I/O-Optimized cluster configurations, particularly in the database write path. Key improvements include:
- Smarter storage batching algorithms that dynamically adjust flush sizes and frequencies based on real-time storage performance.
- Optimized writes that minimize interference from background maintenance tasks, yielding more consistent performance with improved commit latency and throughput.
- Enhanced allocation of Write-Ahead Log (WAL) stream numbers, leading to increased throughput for write-heavy workloads on Graviton4-based instances.
Aurora PostgreSQL 17.4 allows for dynamic adjustment of temporary objects storage size up to six times the instance memory. Furthermore, version 17.5 enables storage of up to 256 TiB in a single database cluster, simplifying application scaling and data management.
With Graviton4-based R8gd instances, users can scale Aurora PostgreSQL with Optimized Reads up to 48xlarge (up from the previous maximum of 32xlarge), which provides 192 vCPUs at a 1:1 vCPU-to-core ratio, 50 Gbps of network bandwidth, 1.5 TiB of instance memory, and 10.4 TiB of local NVMe capacity. This local NVMe SSD capacity extends the local database cache to 7.34 TiB (using the default shared buffer size).
The optimizations in Aurora PostgreSQL 17, combined with the enhanced hardware specifications of Graviton4 R8gd, deliver superior performance for various use cases, including:
- Internet-scale applications such as payment processing, billing, and e-commerce, which demand strict performance SLAs.
- Real-time reporting dashboards that execute hundreds of point queries for metrics and data collection.
- Generative AI applications utilizing the pgvector extension for searching exact or nearest neighbors across millions of vector embeddings.
For further details on Aurora Optimized Reads capabilities and usage recommendations, please see Improving query performance for Aurora PostgreSQL with Aurora Optimized Reads. Information on Aurora PostgreSQL on AWS Graviton4-based instances can be found in the article Achieve up to 1.7 times higher write throughput and 1.38 times better price performance with Amazon Aurora PostgreSQL on AWS Graviton4-based R8g instances.
Benchmark workload and methodology
To validate the hardware and software enhancements, we employed HammerDB, an open-source database load testing and benchmarking tool that facilitates consistent and repeatable test scenarios. HammerDB simulates online transaction processing (OLTP) workloads across various database engines, including PostgreSQL. The TPROC-C workload in HammerDB is modeled after the TPC-C benchmark, executing multiple transaction types, including read and write operations with concurrent sessions.
Test environment configuration
In our testing, we used HammerDB to load the database and run workloads on an Amazon Aurora PostgreSQL 17.5 cluster. The warehouse configuration generated database sizes of approximately 128 GB for small workloads (2xlarge) and 1,024 GB for medium workloads (16xlarge), both exceeding the instance memory. Concurrency was set to twice the vCPU count, with a 10-minute ramp-up and a 20-minute run for both the 2xlarge (small workload) and 16xlarge (medium workload) instances.
HammerDB ran on an Amazon Elastic Compute Cloud (Amazon EC2) instance of the same size and in the same Availability Zone as the database instance.
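The sizing rules in this section reduce to simple arithmetic. The following is a minimal sketch that derives the virtual user count and checks that each dataset exceeds instance memory. It assumes database sizes are in decimal GB and instance memory is in GiB (as in the instance table); the function and constant names are illustrative and are not part of HammerDB.

```python
GIB_IN_GB = 2**30 / 10**9  # one GiB expressed in decimal GB

# Values from the test description: vCPUs, instance memory, database size.
WORKLOADS = {
    "2xlarge": {"vcpus": 8, "memory_gib": 64, "db_size_gb": 128},
    "16xlarge": {"vcpus": 64, "memory_gib": 512, "db_size_gb": 1024},
}


def virtual_users(vcpus: int) -> int:
    """Concurrency used in the benchmark: twice the vCPU count."""
    return 2 * vcpus


def dataset_to_memory_ratio(db_size_gb: float, memory_gib: float) -> float:
    """How many times larger the dataset is than instance memory, in the same unit."""
    return db_size_gb / (memory_gib * GIB_IN_GB)


for size, w in WORKLOADS.items():
    users = virtual_users(w["vcpus"])
    ratio = dataset_to_memory_ratio(w["db_size_gb"], w["memory_gib"])
    print(f"{size}: {users} virtual users, dataset is {ratio:.2f}x instance memory")
```

Both workloads come out at the same dataset-to-memory ratio, which keeps the small and medium tests comparable.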
Metrics
We measured and compared HammerDB new orders per minute (NOPM) throughput and application response time.
Instance capacity
The following table summarizes the instance types and hardware specifications used in this benchmark.
| Instance Size | Processor | vCPU | Cores | GHz | Memory (GiB) | DRAM | Instance Storage (GB) – NVMe SSD | Optimized Reads-enabled Tiered Cache size (GB) | Optimized Reads-enabled Temporary object size (GB) | Network Baseline/Burst (Gbps) |
|---|---|---|---|---|---|---|---|---|---|---|
| db.r6g.2xlarge* | Graviton2 | 8 | 8 | 2.5 | 64 | DDR4 | N/A | N/A | N/A | 2.5 / 10 |
| db.r6gd.2xlarge* | Graviton2 | 8 | 8 | 2.5 | 64 | DDR4 | 1 x 474 (474) | 262 | 137 | 2.5 / 10 |
| db.r8gd.2xlarge* | Graviton4 | 8 | 8 | 2.8 | 64 | DDR5 | 1 x 474 (474) | 262 | 137 | 3.75 / 15 |
| db.r6g.16xlarge | Graviton2 | 64 | 64 | 2.5 | 512 | DDR4 | N/A | N/A | N/A | 25 |
| db.r6gd.16xlarge | Graviton2 | 64 | 64 | 2.5 | 512 | DDR4 | 2 x 1,900 (3,800) | 2,105 | 1,099 | 25 |
| db.r8gd.16xlarge | Graviton4 | 64 | 64 | 2.8 | 512 | DDR5 | 2 x 1,900 (3,800) | 2,105 | 1,099 | 30 |
* These instances have a baseline bandwidth and can utilize a network I/O credit mechanism to burst beyond their baseline bandwidth on a best-effort basis. Other instance types can sustain their maximum performance indefinitely. For more details, refer to Amazon EC2 instance network bandwidth and Specifications for Amazon EC2 memory optimized instances.
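To see how the local NVMe capacity in the table is divided, the sketch below computes the share assigned to the tiered cache and to temporary objects, plus the cache-to-memory ratio. The numbers are copied from the db.r8gd rows (the db.r6gd rows are identical); the helper names are illustrative, and the GiB-to-GB conversion is an assumption about how the table's units relate.

```python
GIB_IN_GB = 2**30 / 10**9  # one GiB expressed in decimal GB

# (memory GiB, NVMe GB, tiered cache GB, temporary objects GB) from the table.
INSTANCES = {
    "db.r8gd.2xlarge": (64, 474, 262, 137),
    "db.r8gd.16xlarge": (512, 3800, 2105, 1099),
}


def nvme_split(nvme_gb: float, cache_gb: float, temp_gb: float) -> tuple[float, float]:
    """Fractions of local NVMe assigned to the tiered cache and to temporary objects."""
    return cache_gb / nvme_gb, temp_gb / nvme_gb


def cache_to_memory(cache_gb: float, memory_gib: float) -> float:
    """Tiered cache size as a multiple of instance memory, in the same unit."""
    return cache_gb / (memory_gib * GIB_IN_GB)


for name, (memory_gib, nvme_gb, cache_gb, temp_gb) in INSTANCES.items():
    cache_share, temp_share = nvme_split(nvme_gb, cache_gb, temp_gb)
    multiple = cache_to_memory(cache_gb, memory_gib)
    print(f"{name}: cache {cache_share:.0%} of NVMe, temp objects {temp_share:.0%}, "
          f"cache is {multiple:.1f}x memory")
```

For these two sizes the split is nearly identical, which suggests the allocation scales proportionally with instance size rather than being tuned per class.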
Benchmark environment architecture
The following diagram shows the benchmark environment: an EC2 instance running HammerDB (the driver) benchmarks an Aurora cluster (the target) with one writer and one reader instance. The reader was not under load.
Note that Graviton4-based db.r8gd and Graviton2-based db.r6gd instances include local NVMe SSD storage, while db.r6g instances do not.
Benchmark comparison
When comparing Aurora DB instance classes, assess throughput, application response time, and price-performance ratio. Together, these metrics give a fuller picture of the performance gains and of how applications benefit from lower, more consistent response times.
This post discusses two potential upgrade paths and the benefits associated with each:
- Enabling Aurora Optimized Reads by using the db.r8gd instances – Highlights the advantages of utilizing the Optimized Reads feature and upgrading from db.r6g to db.r8gd instances.
- Upgrading from db.r6gd to db.r8gd instances – Demonstrates the benefits of transitioning an existing Aurora Optimized Reads cluster from db.r6gd to db.r8gd instances.
Throughput comparison
Database throughput is measured in NOPM: the number of new orders the database processes per minute. This metric reflects the database’s overall processing capacity, which depends on CPU processing speed, query execution time, I/O, network, and memory speed.
The following charts illustrate that upgrading Aurora PostgreSQL I/O-Optimized from db.r6g instances to db.r8gd instances with Optimized Reads results in a 165% improvement in throughput for the 2xlarge instance and a 136% improvement for the 16xlarge instance.
Furthermore, upgrading Aurora PostgreSQL I/O-Optimized with Optimized Reads on db.r6gd instances to db.r8gd instances yields a 68% improvement in throughput for the 2xlarge instance and a 56% improvement for the 16xlarge instance.
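A percentage improvement is easier to reason about as a multiplier: a 165% improvement means 2.65 times the baseline throughput. The sketch below converts the reported gains and, because both comparisons share the same db.r8gd result, derives the gain implied for moving from db.r6g to db.r6gd. That last figure is arithmetic on the reported numbers, not a measured result, and the function names are illustrative.

```python
# Reported throughput gains (percent) from the two upgrade paths.
REPORTED_GAINS = {
    "2xlarge": {"r6g_to_r8gd": 165, "r6gd_to_r8gd": 68},
    "16xlarge": {"r6g_to_r8gd": 136, "r6gd_to_r8gd": 56},
}


def gain_to_multiplier(gain_pct: float) -> float:
    """Convert a percentage improvement into a throughput multiplier."""
    return 1 + gain_pct / 100


def implied_r6g_to_r6gd_gain(r6g_to_r8gd_pct: float, r6gd_to_r8gd_pct: float) -> float:
    """Derived, not measured: (r8gd / r6g) / (r8gd / r6gd) = r6gd / r6g."""
    multiplier = gain_to_multiplier(r6g_to_r8gd_pct) / gain_to_multiplier(r6gd_to_r8gd_pct)
    return (multiplier - 1) * 100


for size, gains in REPORTED_GAINS.items():
    total = gain_to_multiplier(gains["r6g_to_r8gd"])
    implied = implied_r6g_to_r6gd_gain(gains["r6g_to_r8gd"], gains["r6gd_to_r8gd"])
    print(f"{size}: r8gd delivers {total:.2f}x r6g throughput; "
          f"implied r6g -> r6gd gain is about {implied:.0f}%")
```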
Price-performance ratio comparison
This metric combines an instance’s performance capabilities with its monthly cost to determine the value received for each dollar invested. We calculated this ratio using HammerDB NOPM divided by the instance’s on-demand monthly price in the US West (Oregon) AWS Region. A higher ratio indicates better value for investment, as it reflects more database transactions per dollar spent.
This measurement is particularly useful for infrastructure cost planning, as it aids in:
- Optimizing operational expenses while maintaining necessary performance levels.
- Making data-driven decisions regarding instance type and size.
- Balancing performance requirements with budget considerations.
Charts reveal that upgrading Aurora PostgreSQL I/O-Optimized from db.r6g instances to db.r8gd instances with Optimized Reads can lead to a 120% improvement in price-performance ratio for the 2xlarge instance and a 96% improvement for the 16xlarge instance.
Additionally, transitioning from db.r6gd instances to db.r8gd instances results in a 68% improvement in price-performance ratio for the 2xlarge instance and a 56% improvement for the 16xlarge instance.
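The price-performance calculation can be sketched as follows. The NOPM and price passed to `price_performance` in the usage lines are hypothetical placeholders, since this post reports only percentage changes. The second helper shows what the reported percentages imply about relative instance price; it is a consequence of the definition, not a published price.

```python
def price_performance(nopm: float, monthly_price_usd: float) -> float:
    """NOPM delivered per dollar of on-demand monthly price."""
    if monthly_price_usd <= 0:
        raise ValueError("monthly price must be positive")
    return nopm / monthly_price_usd


def implied_price_ratio(throughput_gain_pct: float, price_perf_gain_pct: float) -> float:
    """New price divided by old price, implied by the two reported gains.

    Since price-performance = throughput / price, the price ratio equals
    the throughput multiplier divided by the price-performance multiplier.
    """
    return (1 + throughput_gain_pct / 100) / (1 + price_perf_gain_pct / 100)


# Hypothetical inputs, for illustration only.
print(price_performance(nopm=120_000, monthly_price_usd=1_500))

# Reported gains for db.r6g -> db.r8gd (2xlarge and 16xlarge).
print(round(implied_price_ratio(165, 120), 2))
print(round(implied_price_ratio(136, 96), 2))
```

Under these figures, the db.r8gd instance works out to roughly 20% more expensive than db.r6g at both sizes, while still delivering about twice the value per dollar.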
Response time comparison
Maintaining low, consistent 99th percentile (p99) response times is crucial in high-performance environments. The p99 value is the response time within which 99% of transactions complete. HammerDB response time measures the duration required for the database and application to process and respond to a new order transaction in the TPROC-C workload.
The p99 metric focuses on the slowest transactions rather than the typical ones. For instance, if 99 e-commerce transactions complete in 200 ms but one takes 20 seconds, the average (about 400 ms) may seem reasonable, while tail-latency metrics such as p99 are designed to surface exactly this kind of outlier. During peak sales seasons, this problematic 1% can result in numerous users abandoning their purchases, adversely affecting revenue and customer experience. Consistency in p99 response times is essential, as fluctuations can lead to poor user experiences.
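The difference between the average and p99 can be sketched with a nearest-rank percentile. The sample below is hypothetical: 1,000 transactions, of which 1.5% are slow. Under the nearest-rank definition assumed here, slow requests must make up more than 1% of the sample before they move p99, which is why the sample is larger than 100 requests. HammerDB may compute percentiles differently.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Smallest sample value such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 985 fast requests, 15 slow ones.
latencies_ms = [200] * 985 + [20_000] * 15

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile_nearest_rank(latencies_ms, 50)
p99_ms = percentile_nearest_rank(latencies_ms, 99)

print(f"mean={mean_ms:.0f} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

The mean and median both look acceptable here, while p99 lands on the 20-second requests.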
The Aurora Optimized Reads-enabled tiered cache addresses these challenges by reducing I/O, storage network, and CPU utilization through larger local caching, delivering sub-millisecond read latency and improving p99 consistency.
Charts demonstrate that upgrading Aurora PostgreSQL I/O-Optimized from db.r6g instances to db.r8gd instances with Optimized Reads can enhance the p99 response time by 80% for the 2xlarge instance and by 77% for the 16xlarge instance.
The db.r8gd instance maintains lower and more consistent response times, resulting in more predictable query execution times and reduced performance variability. The following chart shows an excerpt of HammerDB p99 response time over time, with smaller fluctuations on db.r8gd.
Additionally, upgrading Aurora PostgreSQL I/O-Optimized with Optimized Reads from db.r6gd instances to db.r8gd instances can improve the p99 response time by 42% for the 2xlarge instance and by 25% for the 16xlarge instance.
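The response time figures can be read as latency reductions, assuming that an "improvement" of X% means p99 latency drops by X% (the post does not define this explicitly). Under that assumption, this short sketch converts each figure into the remaining latency fraction and an equivalent speedup factor; the function names are illustrative.

```python
def remaining_latency_fraction(reduction_pct: float) -> float:
    """Fraction of the original p99 latency left after the reduction."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1 - reduction_pct / 100


def speedup_factor(reduction_pct: float) -> float:
    """How many times faster the p99 response is after the reduction."""
    return 1 / remaining_latency_fraction(reduction_pct)


# Reported p99 improvements: r6g -> r8gd, then r6gd -> r8gd, for each size.
for label, pct in [("r6g->r8gd 2xlarge", 80), ("r6g->r8gd 16xlarge", 77),
                   ("r6gd->r8gd 2xlarge", 42), ("r6gd->r8gd 16xlarge", 25)]:
    print(f"{label}: {remaining_latency_fraction(pct):.2f} of original latency, "
          f"{speedup_factor(pct):.2f}x faster")
```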
Enable Aurora Optimized Reads with tiered cache
Having explored the performance benefits of Aurora PostgreSQL I/O-Optimized with Optimized Reads on db.r8gd instances, let’s discuss how to begin utilizing Aurora Optimized Reads-enabled tiered cache.
For new deployments
Create an Aurora PostgreSQL cluster with I/O-Optimized storage using the Amazon RDS Console, AWS CLI, or RDS API, and select an Optimized Reads instance class such as db.r8gd, which includes local NVMe SSD storage.
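As one way to do this through the RDS API, the following is a minimal sketch using the AWS SDK for Python (boto3). The identifiers, Region, user name, and engine version are placeholders to adapt. It assumes boto3 is installed and credentials are configured, and it omits networking, encryption, and parameter group settings that a production cluster would need.

```python
def cluster_params(cluster_id: str, master_username: str, engine_version: str = "17.5") -> dict:
    """Arguments for create_db_cluster with I/O-Optimized storage."""
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "aurora-postgresql",
        "EngineVersion": engine_version,      # placeholder; pick a supported version
        "StorageType": "aurora-iopt1",        # Aurora I/O-Optimized
        "MasterUsername": master_username,
        "ManageMasterUserPassword": True,     # let RDS keep the password in Secrets Manager
    }


def writer_params(cluster_id: str, instance_id: str,
                  instance_class: str = "db.r8gd.2xlarge") -> dict:
    """Arguments for create_db_instance using an instance class with local NVMe."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBClusterIdentifier": cluster_id,
        "Engine": "aurora-postgresql",
        "DBInstanceClass": instance_class,
    }


def create_optimized_reads_cluster(cluster_id: str, instance_id: str,
                                   master_username: str, region: str) -> None:
    """Create the cluster and its writer instance (makes real AWS API calls)."""
    import boto3  # imported here so the parameter helpers work without boto3

    rds = boto3.client("rds", region_name=region)
    rds.create_db_cluster(**cluster_params(cluster_id, master_username))
    rds.create_db_instance(**writer_params(cluster_id, instance_id))
```

A call such as `create_optimized_reads_cluster("demo-cluster", "demo-writer", "postgres", "us-west-2")` would create billable resources, so review the parameters first.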
For existing clusters
For current Aurora PostgreSQL clusters, upgrade to db.r8gd instances via an instance class modification with minimal downtime, utilizing Aurora’s high availability features. To disable the Optimized Reads-enabled tiered cache, modify your instance to a type without local NVMe SSD storage. For additional information, see Performance and scaling for Amazon Aurora PostgreSQL.
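The instance class modification can be sketched with boto3 as well. The instance identifier and Region are placeholders, and the choice of `ApplyImmediately` is an assumption to adjust: setting it to `False` defers the change to the next maintenance window. The sketch does not orchestrate a failover.

```python
def modify_params(instance_id: str, target_class: str = "db.r8gd.2xlarge",
                  apply_immediately: bool = False) -> dict:
    """Arguments for modify_db_instance to change the instance class."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": target_class,
        "ApplyImmediately": apply_immediately,
    }


def change_instance_class(instance_id: str, region: str, **kwargs) -> None:
    """Request the instance class change (makes a real AWS API call)."""
    import boto3  # imported here so modify_params works without boto3

    rds = boto3.client("rds", region_name=region)
    rds.modify_db_instance(**modify_params(instance_id, **kwargs))
```

The same call with a target class that has no local NVMe storage would move the instance off the tiered cache, as described above.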
Aurora PostgreSQL I/O-Optimized with an Optimized Reads-enabled tiered cache on db.r8gd is available for Aurora PostgreSQL version 17.4 and higher, 16.3 and higher, 15.7 and higher, and 14.12 and higher, in AWS Regions where Optimized Reads instance classes are supported. For more details, see Amazon Aurora DB instance classes.
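The version requirement above can be checked programmatically before planning an upgrade. The sketch below encodes only the minimums listed in this post and treats any major version not listed as unsupported, which is a conservative assumption. Versions are compared numerically because a string comparison would wrongly rank "14.9" above "14.12".

```python
# Minimum minor version per major version, as listed in this post.
MINIMUM_MINOR = {17: 4, 16: 3, 15: 7, 14: 12}


def supports_tiered_cache_on_r8gd(engine_version: str) -> bool:
    """Return True if the Aurora PostgreSQL version meets the listed minimum."""
    parts = engine_version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    minimum = MINIMUM_MINOR.get(major)
    # Major versions not listed in the post are treated as unsupported.
    return minimum is not None and minor >= minimum


for version in ["17.5", "16.2", "14.9", "14.12"]:
    print(version, supports_tiered_cache_on_r8gd(version))
```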