Optimize and troubleshoot database performance in Amazon Aurora PostgreSQL by analyzing execution plans using CloudWatch Database Insights

Amazon Web Services (AWS) provides a robust collection of monitoring tools that enhance visibility into performance and events for both Amazon Relational Database Service (Amazon RDS) and Amazon Aurora databases. This post shows how to use Amazon CloudWatch Database Insights to analyze SQL execution plans so you can troubleshoot and optimize SQL query performance in an Aurora PostgreSQL cluster.

PostgreSQL Query Optimizer and Query Access Plans

The PostgreSQL query optimizer serves as a fundamental element of the database engine, tasked with determining the most efficient execution path for SQL queries. Upon receiving a query, PostgreSQL generates multiple potential execution strategies, subsequently selecting the optimal one based on cost estimation.

A query access plan outlines the step-by-step execution strategy selected by the optimizer, detailing how PostgreSQL retrieves and processes data through various techniques such as sequential scans, index scans, joins, and sorting operations. Developers and database administrators can utilize the EXPLAIN command to analyze these query plans and gain insights into execution paths.
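
For example, here is a minimal sketch against the person table used later in this post: EXPLAIN prints the optimizer's chosen plan with cost estimates, while EXPLAIN ANALYZE also executes the query and reports actual row counts and timings per plan node.

-- Show the estimated plan without executing the query
EXPLAIN SELECT * FROM person WHERE age = 40;

-- Execute the query and report actual row counts and timings per plan node
EXPLAIN ANALYZE SELECT * FROM person WHERE age = 40;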

Understanding the PostgreSQL query optimizer and access plans is vital for optimizing database performance and resource utilization. For further reading, refer to How PostgreSQL Processes Queries and How to Analyze Them.

Solution Overview

In December 2024, AWS unveiled CloudWatch Database Insights, a comprehensive monitoring solution that supports both Aurora (PostgreSQL and MySQL variants) and Amazon RDS engines, including PostgreSQL, MySQL, MariaDB, SQL Server, and Oracle. This observability tool is tailored for DevOps engineers, developers, and DBAs, enabling them to swiftly identify and resolve database performance issues. By offering a unified view across database fleets, CloudWatch Database Insights enhances troubleshooting workflows and operational efficiency.

The tool features an Advanced mode and a Standard mode, with the SQL execution plan analysis feature available exclusively in Advanced mode. The following sections will guide you through enabling this feature in your Aurora PostgreSQL clusters, troubleshooting query performance issues stemming from plan changes, and optimizing query performance based on the query plan.

For demonstration purposes, we use a typical e-commerce application schema covering customer information, a product catalog, order details, and order line items, represented by four corresponding tables: customers, products, orders, and order_items.
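
The following DDL is a minimal sketch of such a schema. The table and column names match those referenced in the queries later in this post, but the data types and constraints are illustrative assumptions rather than the exact setup used here.

-- Illustrative schema; types and constraints are assumptions
CREATE TABLE customers (
    customer_id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_name TEXT NOT NULL
);

CREATE TABLE products (
    product_id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_name TEXT NOT NULL,
    price        NUMERIC(10, 2) NOT NULL
);

CREATE TABLE orders (
    order_id    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (customer_id),
    order_date  DATE NOT NULL
);

CREATE TABLE order_items (
    order_item_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_id      BIGINT NOT NULL REFERENCES orders (order_id),
    product_id    BIGINT NOT NULL REFERENCES products (product_id),
    quantity      INTEGER NOT NULL
);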

Prerequisites

To follow this guide, some additional configuration is necessary. Please refer to the prerequisites outlined in Analyzing Execution Plans with CloudWatch Database Insights for detailed information.

Analyze SQL Execution Plans

Let us examine a typical query executed against our application and how to analyze the execution plans using CloudWatch Database Insights. We begin by accessing CloudWatch Database Insights through the CloudWatch console. After navigating to Database Views, we identify our Aurora PostgreSQL DB instance and review its performance metrics on the Top SQL tab.

The Plans Count column in the Top SQL table indicates the number of collected plans for each digest query. If necessary, you can customize column visibility and ordering using the settings icon in the Top SQL table.

Next, we focus on the specific SQL statement executed by our e-commerce application. By selecting the digest query and examining its details on the SQL text tab, we can view the exact SQL statement being executed. The Plans tab provides a detailed view of the query execution plans.

For comparative analysis, we can select up to two plans simultaneously. This side-by-side comparison reveals crucial differences: one plan utilized a sequential scan approach, while the other employed an index scan. Such differences in access path selection offer valuable insights into variations in query performance and resource utilization patterns.

Following are the details of the execution plans as observed in the screenshots:

First execution plan (-74374649):

Gather  (cost=1000.00..266914.09 rows=95345 width=126) (actual time=0.264..307.518 rows=100799 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on person  (cost=0.00..256379.59 rows=39727 width=126) (actual time=0.023..367.948 rows=33600 loops=3)
        Filter: (age = 40)
        Rows Removed by Filter: 3299734

Second execution plan (616519750):

Gather  (cost=3904.96..220491.70 rows=105342 width=126) (actual time=26.699..158.131 rows=100799 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Bitmap Heap Scan on person  (cost=2904.96..208957.50 rows=43892 width=126) (actual time=21.423..209.867 rows=33600 loops=3)
        Recheck Cond: (age = 40)
        Rows Removed by Index Recheck: 534444
        Heap Blocks: exact=6437 lossy=4705
        ->  Bitmap Index Scan on idx_person_age_date_active  (cost=0.00..2878.62 rows=105342 width=0) (actual time=17.911..17.911 rows=100799 loops=1)
              Index Cond: (age = 40)

The first execution plan uses a sequential scan, forcing the PostgreSQL engine to read the entire person table and discard non-matching rows. The second plan instead uses a bitmap index scan on idx_person_age_date_active, significantly reducing execution time by visiting only the heap blocks that contain matching rows. By comparing these plan differences in CloudWatch Database Insights, we can pinpoint and optimize query performance bottlenecks.
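
If you want to reproduce this comparison yourself, you can capture a baseline plan, create the index, and capture the plan again. The following is a minimal sketch; the index name comes from the plan above, but its column list is an assumption inferred from the name.

-- Baseline: with no usable index on age, the planner falls back to a parallel sequential scan
EXPLAIN ANALYZE SELECT * FROM person WHERE age = 40;

-- The column list is an assumption; only the index name appears in the plan above
CREATE INDEX idx_person_age_date_active ON person (age, birth_date, active);
ANALYZE person;

-- Re-run: the planner can now choose a bitmap index scan on the new index
EXPLAIN ANALYZE SELECT * FROM person WHERE age = 40;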

Compare Execution Plans to Troubleshoot Performance Degradation

When addressing SQL query performance degradation, DBAs and DBEs frequently analyze query execution plans for changes in execution behavior that could affect performance, such as dropped indexes, inefficient join strategies, or suboptimal query patterns. However, manually comparing complex, nested execution plans over time can be laborious and prone to error. Consider the following statement executed by our sample application, which retrieves order details alongside associated customer and product information for orders placed after a specific date:

SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
WHERE o.order_date > '2024-01-01';

To simulate a real-world scenario involving accidental index deletion, we dropped certain indexes on the orders and order_items tables, particularly those pertinent to join conditions, and updated the table statistics. These modifications resulted in a suboptimal execution plan, causing the query to slow down significantly and adversely affecting overall application performance.
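
The following is a minimal sketch of that simulation. The index names match those visible in the original execution plan shown below; treat the exact set of dropped indexes as an assumption.

-- Simulate an accidental index drop during maintenance
DROP INDEX IF EXISTS idx_orders_date;
DROP INDEX IF EXISTS idx_order_items_product;

-- Refresh planner statistics so the optimizer re-plans against the new state
ANALYZE orders;
ANALYZE order_items;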

Utilizing CloudWatch Database Insights allows for the comparison of execution plans for the same SQL statement over time, simplifying troubleshooting by highlighting changes that may have led to performance degradation and enabling teams to swiftly diagnose and resolve performance issues. The following screenshots illustrate the SQL query and its execution plan information from CloudWatch Database Insights:


Below are the details of the execution plans as seen in the above screenshots:

First execution plan (1772079466):

Hash Join  (cost=68830.55..165811.40 rows=3999608 width=30) (actual time=744.721..5734.411 rows=4000000 loops=1)
  Hash Cond: (oi.product_id = p.product_id)
  ->  Hash Join  (cost=92463.15..289295.41 rows=4000000 width=22) (actual time=888.183..8057.716 rows=4000000 loops=1)
        Hash Cond: (o.customer_id = c.customer_id)
        ->  Hash Join  (cost=87066.86..273398.67 rows=4000000 width=12) (actual time=839.384..6391.227 rows=4000000 loops=1)
              Hash Cond: (oi.order_id = o.order_id)
              ->  Index Scan using idx_order_items_product on order_items oi  (cost=0.43..175832.22 rows=4000000 width=8) (actual time=0.086..3222.440 rows=4000000 loops=1)
              ->  Hash  (cost=62066.43..62066.43 rows=2000000 width=8) (actual time=829.311..829.313 rows=2000000 loops=1)
                    Buckets: 2097152  Batches: 1  Memory Usage: 94509kB
                    ->  Bitmap Heap Scan on orders o  (cost=22360.43..62066.43 rows=2000000 width=8) (actual time=64.681..330.625 rows=2000000 loops=1)
                          Recheck Cond: (order_date > '2024-01-01'::date)
                          Heap Blocks: exact=14706
                          ->  Bitmap Index Scan on idx_orders_date  (cost=0.00..21860.43 rows=2000000 width=0) (actual time=62.285..62.285 rows=2000000 loops=1)
                                Index Cond: (order_date > '2024-01-01'::date)
        ->  Hash  (cost=4146.29..4146.29 rows=100000 width=18) (actual time=48.149..48.149 rows=100000 loops=1)
              ->  Index Scan using customers_pkey on customers c  (cost=0.29..4146.29 rows=100000 width=18) (actual time=0.045..25.433 rows=100000 loops=1)
  ->  Hash  (cost=386.29..386.29 rows=10000 width=16) (actual time=3.996..3.997 rows=10000 loops=1)
        ->  Index Scan using products_pkey on products p  (cost=0.29..386.29 rows=10000 width=16) (actual time=0.035..2.257 rows=10000 loops=1)

Second execution plan (-1205501186):

Hash Join  (cost=92974.44..300311.14 rows=4000000 width=30) (actual time=892.288..8840.175 rows=4000000 loops=1)
  Hash Cond: (oi.product_id = p.product_id)
  ->  Hash Join  (cost=68492.55..154969.99 rows=3999608 width=22) (actual time=742.450..4850.368 rows=4000000 loops=1)
        Hash Cond: (o.customer_id = c.customer_id)
        ->  Hash Join  (cost=64703.55..140681.57 rows=3999608 width=12) (actual time=713.232..3392.665 rows=4000000 loops=1)
              Hash Cond: (oi.order_id = o.order_id)
              ->  Seq Scan on order_items oi  (cost=0.00..65478.00 rows=4000000 width=8) (actual time=0.005..724.224 rows=4000000 loops=1)
              ->  Hash  (cost=39706.00..39706.00 rows=1999804 width=8) (actual time=705.539..705.540 rows=2000000 loops=1)
                    Buckets: 2097152  Batches: 1  Memory Usage: 94509kB
                    ->  Seq Scan on orders o  (cost=0.00..39706.00 rows=1999804 width=8) (actual time=0.010..241.930 rows=2000000 loops=1)
                          Filter: (order_date > '2024-01-01'::date)
        ->  Hash  (cost=2539.00..2539.00 rows=100000 width=18) (actual time=29.162..29.162 rows=100000 loops=1)
              ->  Seq Scan on customers c  (cost=0.00..2539.00 rows=100000 width=18) (actual time=0.003..10.732 rows=100000 loops=1)
  ->  Hash  (cost=213.00..213.00 rows=10000 width=16) (actual time=2.246..2.246 rows=10000 loops=1)
        ->  Seq Scan on products p  (cost=0.00..213.00 rows=10000 width=16) (actual time=0.006..0.966 rows=10000 loops=1)

The execution plan on the lower left illustrates the original state, where appropriate indexes were in place, resulting in index scans on the involved tables and a total query cost of 165811.40. In contrast, the execution plan on the lower right reflects the state after the index deletion, leading to sequential scans on the tables and a significantly higher total cost of 300311.14. This cost increase explains the performance degradation experienced after the maintenance window in which the indexes were inadvertently dropped.

Employing the side-by-side execution plan comparison feature in CloudWatch Database Insights simplified the process of identifying the root cause of performance degradation, automating what would typically require manual query plan extraction and comparison.
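
With the root cause identified, the remediation is to recreate the dropped indexes and refresh statistics. Here is a minimal sketch, assuming the definitions below match the original indexes:

-- CONCURRENTLY avoids blocking writes while the index builds on a busy table
CREATE INDEX CONCURRENTLY idx_orders_date ON orders (order_date);
CREATE INDEX CONCURRENTLY idx_order_items_product ON order_items (product_id);

-- Refresh statistics so the planner picks up the restored access paths
ANALYZE orders;
ANALYZE order_items;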

Analyze Execution Plans to Optimize Query Performance

Analyzing a SQL query execution plan yields profound insights into the database engine’s processing of a query. By scrutinizing key elements such as join types, scan methods, sorting techniques, estimated and actual row counts, and operator costs, inefficiencies and performance bottlenecks can be identified. This analysis aids in recognizing issues like missing indexes, suboptimal join orders, outdated statistics, and misconfigured database parameters, enabling fine-tuning of queries for enhanced performance.
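
The actual row counts, timings, and buffer usage mentioned above come from EXPLAIN options. For example, a sketch using a simplified version of our application query:

-- ANALYZE executes the statement and reports actual timings and row counts;
-- BUFFERS adds shared, local, and temp block I/O per plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date > '2024-01-01';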

For instance, consider the following SQL statement executed by our e-commerce application, which summarizes customer spending per order date, sorted from highest to lowest total amount spent. Database monitoring has indicated that degraded query execution times are responsible for application slowdowns and overall performance bottlenecks. We will demonstrate how to optimize query performance using CloudWatch Database Insights, employing EXPLAIN ANALYZE to examine the actual execution plan and identify opportunities for improvement.

SELECT c.customer_name, o.order_date, SUM(oi.quantity * p.price) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY c.customer_name, o.order_date
ORDER BY total_spent DESC;

The following screenshot showcases the query execution plan in CloudWatch Database Insights:

Here are the details of the execution plan as observed in the screenshot:

Execution plan (-2135020023):

Sort  (cost=1384921.78..1394921.78 rows=4000000 width=50) (actual time=12885.186..13450.690 rows=1689458 loops=1)
  Sort Key: (sum(((oi.quantity)::numeric * p.price))) DESC
  Sort Method: external merge  Disk: 61216kB
  ->  HashAggregate  (cost=568161.91..672849.41 rows=4000000 width=50) (actual time=8197.706..11103.900 rows=1689458 loops=1)
        Group Key: c.customer_name, o.order_date
        Planned Partitions: 256  Batches: 257  Memory Usage: 8465kB  Disk Usage: 245544kB
        ->  Hash Join  (cost=73599.00..219411.91 rows=4000000 width=28) (actual time=576.103..5469.856 rows=4000000 loops=1)
              Hash Cond: (oi.product_id = p.product_id)
              ->  Hash Join  (cost=73261.00..208569.47 rows=4000000 width=26) (actual time=573.849..4605.587 rows=4000000 loops=1)
                    Hash Cond: (o.customer_id = c.customer_id)
                    ->  Hash Join  (cost=69472.00..194280.02 rows=4000000 width=16) (actual time=547.779..3304.456 rows=4000000 loops=1)
                          Hash Cond: (oi.order_id = o.order_id)
                          ->  Seq Scan on order_items oi  (cost=0.00..65478.00 rows=4000000 width=12) (actual time=0.004..474.077 rows=4000000 loops=1)
                          ->  Hash  (cost=34706.00..34706.00 rows=2000000 width=12) (actual time=547.369..547.370 rows=2000000 loops=1)
                                Buckets: 262144  Batches: 16  Memory Usage: 7427kB
                                ->  Seq Scan on orders o  (cost=0.00..34706.00 rows=2000000 width=12) (actual time=0.003..223.433 rows=2000000 loops=1)
                    ->  Hash  (cost=2539.00..2539.00 rows=100000 width=18) (actual time=25.914..25.915 rows=100000 loops=1)
                          ->  Seq Scan on customers c  (cost=0.00..2539.00 rows=100000 width=18) (actual time=0.003..10.583 rows=100000 loops=1)
              ->  Hash  (cost=213.00..213.00 rows=10000 width=10) (actual time=2.228..2.228 rows=10000 loops=1)
                    ->  Seq Scan on products p  (cost=0.00..213.00 rows=10000 width=10) (actual time=0.008..1.059 rows=10000 loops=1)

From the query execution plan, we observe a significant detail: Sort Method: external merge Disk: 61216kB. This finding indicates that the database resorted to disk-based sorting instead of performing the operation in memory, suggesting a performance bottleneck with the SQL statement.

Upon further investigation into PostgreSQL’s parameter group configuration, we discovered that sorting operations are governed by the work_mem parameter, currently set to its default value of 4 MB. While this setting sufficed when our dataset was smaller, recent data growth has necessitated higher memory requirements. Our analysis revealed that the query now requires approximately 61 MB of memory for sorting operations. With only 4 MB of work_mem available, PostgreSQL is compelled to spill sort operations to disk, incurring high I/O costs and substantial performance degradation.
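
You can confirm the effective value directly from a session:

-- Current value of work_mem for this session
SHOW work_mem;

-- Same value, with its origin (default, parameter group, session, and so on)
SELECT name, setting, unit, source FROM pg_settings WHERE name = 'work_mem';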

Based on our findings, we can optimize performance by increasing work_mem at the session or query level to 256 MB prior to executing the query. This adjustment allows Aurora PostgreSQL to utilize faster in-memory sort methods, such as quicksort, rather than relying on costly disk-based sorting. The following code illustrates this:

SET work_mem = '256MB';

EXPLAIN ANALYZE SELECT c.customer_name, o.order_date, SUM(oi.quantity * p.price) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY c.customer_name, o.order_date
ORDER BY total_spent DESC;
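
Because a plain SET persists for the rest of the session, you may prefer to scope the change to a single query, as mentioned above. SET LOCAL limits the setting to the enclosing transaction; a minimal sketch:

BEGIN;
-- Applies only until COMMIT or ROLLBACK; the rest of the session keeps the default
SET LOCAL work_mem = '256MB';
SELECT c.customer_name, o.order_date, SUM(oi.quantity * p.price) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY c.customer_name, o.order_date
ORDER BY total_spent DESC;
COMMIT;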

For additional insights on this subject, refer to Tune Sorting Operations in PostgreSQL with work_mem.

The subsequent screenshot demonstrates that the sort operation is now conducted in the database’s main memory, rather than on disk:

Here are the details of the execution plan as observed in the screenshot:

Execution plan (-2135020023):

Sort  (cost=1189605.29..1199605.29 rows=4000000 width=50) (actual time=12519.315..12811.902 rows=1689458 loops=1)
  Sort Key: (sum(((oi.quantity)::numeric * p.price))) DESC
  Sort Method: quicksort  Memory: 141532kB
  ->  HashAggregate  (cost=509565.92..614253.42 rows=4000000 width=50) (actual time=9644.389..11540.141 rows=1689458 loops=1)
        Group Key: c.customer_name, o.order_date
        Planned Partitions: 4  Batches: 5  Memory Usage: 524288kB  Disk Usage: 442432kB
        ->  Hash Join  (cost=63853.00..160815.92 rows=4000000 width=28) (actual time=789.621..5887.597 rows=4000000 loops=1)
              Hash Cond: (oi.product_id = p.product_id)
              ->  Hash Join  (cost=63495.00..149973.47 rows=4000000 width=26) (actual time=787.241..4973.765 rows=4000000 loops=1)
                    Hash Cond: (o.customer_id = c.customer_id)
                    ->  Hash Join  (cost=59706.00..135884.02 rows=4000000 width=16) (actual time=760.789..3448.767 rows=4000000 loops=1)
                          Hash Cond: (oi.order_id = o.order_id)
                          ->  Seq Scan on order_items oi  (cost=0.00..65478.00 rows=4000000 width=12) (actual time=0.005..721.684 rows=4000000 loops=1)
                          ->  Hash  (cost=34706.00..34706.00 rows=2000000 width=12) (actual time=751.073..751.075 rows=2000000 loops=1)
                                Buckets: 2097152  Batches: 1  Memory Usage: 102522kB
                                ->  Seq Scan on orders o  (cost=0.00..34706.00 rows=2000000 width=12) (actual time=0.012..226.164 rows=2000000 loops=1)
                    ->  Hash  (cost=2539.00..2539.00 rows=100000 width=18) (actual time=26.279..26.280 rows=100000 loops=1)
                          ->  Seq Scan on customers c  (cost=0.00..2539.00 rows=100000 width=18) (actual time=0.004..10.960 rows=100000 loops=1)
              ->  Hash  (cost=213.00..213.00 rows=10000 width=10) (actual time=2.355..2.354 rows=10000 loops=1)
                    ->  Seq Scan on products p  (cost=0.00..213.00 rows=10000 width=10) (actual time=0.012..1.068 rows=10000 loops=1)

By leveraging CloudWatch Database Insights, we simplified the optimization process, gaining clear visualizations of query execution patterns that led us to identify the work_mem parameter as the root cause of performance issues.
