Optimize and troubleshoot database performance in Amazon Aurora PostgreSQL by analyzing execution plans using CloudWatch Database Insights

Amazon Web Services (AWS) provides a range of monitoring tools that improve visibility into performance metrics and events for Amazon Relational Database Service (Amazon RDS) and Amazon Aurora. This post shows how to use Amazon CloudWatch Database Insights to analyze SQL execution plans so you can troubleshoot and optimize SQL query performance in an Aurora PostgreSQL cluster.

PostgreSQL Query Optimizer and Query Access Plans

The PostgreSQL query optimizer is a core component of the database engine, responsible for determining the most efficient way to execute a SQL query. When PostgreSQL receives a query, it does not execute it immediately; instead, it generates multiple candidate execution strategies and selects the one with the lowest estimated cost.

A query access plan describes the step-by-step execution strategy chosen by the optimizer, detailing how PostgreSQL retrieves and processes data through operations such as sequential scans, index scans, joins, and sorts. You can use the EXPLAIN command to inspect these plans and understand the chosen execution path.
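
For example, the following statement (using a hypothetical person table with an age column, similar to the table that appears in the plans later in this post) prints the plan along with actual run times and buffer usage:

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM person
WHERE age = 40;

Note that EXPLAIN ANALYZE actually executes the statement, so wrap data-modifying queries in a transaction you can roll back.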

For developers and database administrators, understanding the PostgreSQL query optimizer and query access plan is crucial for optimizing database performance and resource utilization. For further insights, refer to How PostgreSQL processes queries and how to analyze them.

Solution Overview

In December 2024, AWS launched CloudWatch Database Insights, a comprehensive monitoring solution for Aurora (PostgreSQL and MySQL) and Amazon RDS engines, including PostgreSQL, MySQL, MariaDB, SQL Server, and Oracle. This observability tool helps DevOps engineers, developers, and database administrators (DBAs) quickly identify and resolve database performance issues. By providing a unified view across database fleets, CloudWatch Database Insights streamlines troubleshooting workflows and improves operational efficiency.

Database Insights offers a Standard mode and an Advanced mode; the SQL execution plan analysis feature is available only in Advanced mode. The following sections show how to enable SQL execution plan analysis on your Aurora PostgreSQL clusters and troubleshoot query performance issues caused by plan changes.

For demonstration purposes, we use a typical e-commerce application schema covering customer information, a product catalog, order details, and order line items, represented by the tables customers, products, orders, and order_items.
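
The post does not include the original DDL; the following minimal sketch shows what such a schema could look like, with column names taken from the queries used later in this post (data types and constraints are illustrative assumptions):

CREATE TABLE customers (
    customer_id   SERIAL PRIMARY KEY,
    customer_name TEXT NOT NULL
);

CREATE TABLE products (
    product_id   SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    price        NUMERIC(10, 2) NOT NULL
);

CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers (customer_id),
    order_date  DATE NOT NULL
);

CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id      INTEGER REFERENCES orders (order_id),
    product_id    INTEGER REFERENCES products (product_id),
    quantity      INTEGER NOT NULL
);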

Prerequisites

To follow along with this post, additional configuration is necessary. Please refer to the prerequisites outlined in Analyzing execution plans with CloudWatch Database Insights for detailed information.

Analyze SQL Execution Plans

Let’s examine a typical query executed against this application and how we can analyze its execution plans in CloudWatch Database Insights. We start by opening CloudWatch Database Insights from the CloudWatch console. After navigating to Database Views, we locate our Aurora PostgreSQL DB instance and review its performance metrics on the Top SQL tab.

The Plans Count column in the Top SQL table indicates the number of collected plans for each digest query. Users can customize column visibility and ordering by selecting the settings icon within the Top SQL table.

Next, we focus on the specific SQL statement executed by our e-commerce application. By selecting the digest query and reviewing its details on the SQL text tab, we can observe the exact SQL statement being executed. The Plans tab provides a detailed view of the query execution plans.

For comparative analysis, users can select up to two plans simultaneously, revealing critical differences in execution strategies. For instance, one plan may utilize a sequential scan approach, while another employs an index scan, offering valuable insights into query performance variations and resource utilization patterns.

Following are the details of the execution plans as observed:

First Execution plan -74374649:
Gather  (cost=1000.00..266914.09 rows=95345 width=126) (actual time=0.264..307.518 rows=100799 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on person  (cost=0.00..256379.59 rows=39727 width=126) (actual time=0.023..367.948 rows=33600 loops=3)
        Filter: (age = 40)
        Rows Removed by Filter: 3299734

Second Execution plan 616519750:
Gather  (cost=3904.96..220491.70 rows=105342 width=126) (actual time=26.699..158.131 rows=100799 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Bitmap Heap Scan on person  (cost=2904.96..208957.50 rows=43892 width=126) (actual time=21.423..209.867 rows=33600 loops=3)
        Recheck Cond: (age = 40)
        Rows Removed by Index Recheck: 534444
        Heap Blocks: exact=6437 lossy=4705
        ->  Bitmap Index Scan on idx_person_age_date_active  (cost=0.00..2878.62 rows=105342 width=0) (actual time=17.911..17.911 rows=100799 loops=1)
              Index Cond: (age = 40)
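
The bitmap scan in the second plan is possible because of the idx_person_age_date_active index shown in the Bitmap Index Scan node. The post does not show the index definition; because age is the only column in the Index Cond, a plausible definition would be a multicolumn index with age as the leading column, for example:

CREATE INDEX idx_person_age_date_active
    ON person (age, created_date, active);  -- trailing columns are assumptions inferred from the index name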

Compare Execution Plans to Troubleshoot Performance Degradation

When investigating SQL query performance degradation, DBAs and database engineers often compare query execution plans, looking for changes in execution behavior that may affect performance, such as dropped indexes or inefficient join strategies. However, manually comparing complex, nested execution plans over time is labor-intensive and error prone. For example, consider the following SQL statement executed by our sample application, which retrieves order details along with associated customer and product information for orders placed after a specific date:

SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
WHERE o.order_date > '2024-01-01';

To simulate a real-world scenario of accidental index deletion, we dropped some indexes on the orders and order_items tables, particularly those involved in join conditions, and updated the table statistics, as sketched below. These changes resulted in a suboptimal execution plan, slowing the query significantly and degrading overall application performance.
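
The following is a sketch of how this scenario can be reproduced; the index names are taken from the first execution plan shown later in this section, so adjust them to match your own schema:

-- Simulate accidental deletion of the indexes that support the joins and the date filter
DROP INDEX IF EXISTS idx_order_items_product;
DROP INDEX IF EXISTS idx_orders_date;

-- Refresh planner statistics so the optimizer immediately reflects the change
ANALYZE orders;
ANALYZE order_items;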

Utilizing CloudWatch Database Insights allows for the comparison of execution plans for the same SQL statement over time, simplifying troubleshooting by highlighting changes that may have led to performance degradation. The following screenshots illustrate the SQL query and its execution plan information from CloudWatch Database Insights:


Here are the details of the execution plans as seen in the screenshots:

First Execution plan 1772079466:
Hash Join  (cost=68830.55..165811.40 rows=3999608 width=30) (actual time=744.721..5734.411 rows=4000000 loops=1)
  Hash Cond: (oi.product_id = p.product_id)
  ->  Hash Join  (cost=92463.15..289295.41 rows=4000000 width=22) (actual time=888.183..8057.716 rows=4000000 loops=1)
        Hash Cond: (o.customer_id = c.customer_id)
        ->  Hash Join  (cost=87066.86..273398.67 rows=4000000 width=12) (actual time=839.384..6391.227 rows=4000000 loops=1)
              Hash Cond: (oi.order_id = o.order_id)
              ->  Index Scan using idx_order_items_product on order_items oi  (cost=0.43..175832.22 rows=4000000 width=8) (actual time=0.086..3222.440 rows=4000000 loops=1)
              ->  Hash  (cost=62066.43..62066.43 rows=2000000 width=8) (actual time=829.311..829.313 rows=2000000 loops=1)
                    Buckets: 2097152  Batches: 1  Memory Usage: 94509kB
                    ->  Bitmap Heap Scan on orders o  (cost=22360.43..62066.43 rows=2000000 width=8) (actual time=64.681..330.625 rows=2000000 loops=1)
                          Recheck Cond: (order_date > '2024-01-01'::date)
                          Heap Blocks: exact=14706
                          ->  Bitmap Index Scan on idx_orders_date  (cost=0.00..21860.43 rows=2000000 width=0) (actual time=62.285..62.285 rows=2000000 loops=1)
                                Index Cond: (order_date > '2024-01-01'::date)
        ->  Hash  (cost=4146.29..4146.29 rows=100000 width=18) (actual time=48.149..48.149 rows=100000 loops=1)
              ->  Index Scan using customers_pkey on customers c  (cost=0.29..4146.29 rows=100000 width=18) (actual time=0.045..25.433 rows=100000 loops=1)
  ->  Hash  (cost=386.29..386.29 rows=10000 width=16) (actual time=3.996..3.997 rows=10000 loops=1)
        ->  Index Scan using products_pkey on products p  (cost=0.29..386.29 rows=10000 width=16) (actual time=0.035..2.257 rows=10000 loops=1)

Second Execution plan -1205501186:
Hash Join  (cost=92974.44..300311.14 rows=4000000 width=30) (actual time=892.288..8840.175 rows=4000000 loops=1)
  Hash Cond: (oi.product_id = p.product_id)
  ->  Hash Join  (cost=68492.55..154969.99 rows=3999608 width=22) (actual time=742.450..4850.368 rows=4000000 loops=1)
        Hash Cond: (o.customer_id = c.customer_id)
        ->  Hash Join  (cost=64703.55..140681.57 rows=3999608 width=12) (actual time=713.232..3392.665 rows=4000000 loops=1)
              Hash Cond: (oi.order_id = o.order_id)
              ->  Seq Scan on order_items oi  (cost=0.00..65478.00 rows=4000000 width=8) (actual time=0.005..724.224 rows=4000000 loops=1)
              ->  Hash  (cost=39706.00..39706.00 rows=1999804 width=8) (actual time=705.539..705.540 rows=2000000 loops=1)
                    Buckets: 2097152  Batches: 1  Memory Usage: 94509kB
                    ->  Seq Scan on orders o  (cost=0.00..39706.00 rows=1999804 width=8) (actual time=0.010..241.930 rows=2000000 loops=1)
                          Filter: (order_date > '2024-01-01'::date)
        ->  Hash  (cost=2539.00..2539.00 rows=100000 width=18) (actual time=29.162..29.162 rows=100000 loops=1)
              ->  Seq Scan on customers c  (cost=0.00..2539.00 rows=100000 width=18) (actual time=0.003..10.732 rows=100000 loops=1)
  ->  Hash  (cost=213.00..213.00 rows=10000 width=16) (actual time=2.246..2.246 rows=10000 loops=1)
        ->  Seq Scan on products p  (cost=0.00..213.00 rows=10000 width=16) (actual time=0.006..0.966 rows=10000 loops=1)

The execution plan differences reveal the cause of the degradation. The first plan uses index scans (idx_order_items_product, idx_orders_date, and the primary key indexes), whereas the second plan falls back to sequential scans because the supporting indexes no longer exist, increasing the total query cost and execution time (roughly 8.8 seconds versus 5.7 seconds at the top node). By using CloudWatch Database Insights, we can quickly pinpoint and address such query performance bottlenecks.
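
Once the missing indexes are identified as the root cause, remediation is to recreate them and refresh statistics. A sketch, using the index names from the first plan and the columns referenced in the join and filter conditions (verify the exact column lists against your original DDL):

CREATE INDEX idx_order_items_product ON order_items (product_id);
CREATE INDEX idx_orders_date ON orders (order_date);
ANALYZE order_items;
ANALYZE orders;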

Analyze Execution Plans to Optimize Query Performance

Analyzing a SQL query execution plan yields detailed insight into how the database engine processes a query. By examining key elements such as join types, scan methods, sorting techniques, estimated versus actual row counts, and operator costs, you can identify inefficiencies and performance bottlenecks. This analysis helps uncover issues like missing indexes, suboptimal join orders, outdated statistics, and misconfigured database parameters, allowing you to fine-tune queries for better performance.

Consider the following SQL statement executed by our e-commerce application, which summarizes customer spending per order date, sorted from highest to lowest total amount spent. Database monitoring has flagged degraded query execution times as a contributing factor to application slowdowns. We will illustrate how to optimize query performance using CloudWatch Database Insights, employing EXPLAIN ANALYZE to examine the actual execution plan and identify improvement opportunities:

SELECT c.customer_name, o.order_date, SUM(oi.quantity * p.price) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY c.customer_name, o.order_date
ORDER BY total_spent DESC;

The execution plan in CloudWatch Database Insights reveals critical information:

Execution plan -2135020023:
Sort  (cost=1384921.78..1394921.78 rows=4000000 width=50) (actual time=12885.186..13450.690 rows=1689458 loops=1)
  Sort Key: (sum(((oi.quantity)::numeric * p.price))) DESC
  Sort Method: external merge  Disk: 61216kB
  ->  HashAggregate  (cost=568161.91..672849.41 rows=4000000 width=50) (actual time=8197.706..11103.900 rows=1689458 loops=1)
        Group Key: c.customer_name, o.order_date
        Planned Partitions: 256  Batches: 257  Memory Usage: 8465kB  Disk Usage: 245544kB
        ->  Hash Join  (cost=73599.00..219411.91 rows=4000000 width=28) (actual time=576.103..5469.856 rows=4000000 loops=1)
              Hash Cond: (oi.product_id = p.product_id)
              ->  Hash Join  (cost=73261.00..208569.47 rows=4000000 width=26) (actual time=573.849..4605.587 rows=4000000 loops=1)
                    Hash Cond: (o.customer_id = c.customer_id)
                    ->  Hash Join  (cost=69472.00..194280.02 rows=4000000 width=16) (actual time=547.779..3304.456 rows=4000000 loops=1)
                          Hash Cond: (oi.order_id = o.order_id)
                          ->  Seq Scan on order_items oi  (cost=0.00..65478.00 rows=4000000 width=12) (actual time=0.004..474.077 rows=4000000 loops=1)
                          ->  Hash  (cost=34706.00..34706.00 rows=2000000 width=12) (actual time=547.369..547.370 rows=2000000 loops=1)
                                Buckets: 262144  Batches: 16  Memory Usage: 7427kB
                                ->  Seq Scan on orders o  (cost=0.00..34706.00 rows=2000000 width=12) (actual time=0.003..223.433 rows=2000000 loops=1)
                    ->  Hash  (cost=2539.00..2539.00 rows=100000 width=18) (actual time=25.914..25.915 rows=100000 loops=1)
                          ->  Seq Scan on customers c  (cost=0.00..2539.00 rows=100000 width=18) (actual time=0.003..10.583 rows=100000 loops=1)
              ->  Hash  (cost=213.00..213.00 rows=10000 width=10) (actual time=2.228..2.228 rows=10000 loops=1)
                    ->  Seq Scan on products p  (cost=0.00..213.00 rows=10000 width=10) (actual time=0.008..1.059 rows=10000 loops=1)

The execution plan contains a significant clue: Sort Method: external merge  Disk: 61216kB. This shows that the sort spilled to disk instead of completing in memory, a common performance bottleneck.

Upon further investigation of the PostgreSQL parameter group configuration, we found that sorting operations are governed by the work_mem parameter, which was still at its default value of 4 MB. While this was adequate for smaller datasets, recent data growth made it insufficient: the sort wrote roughly 61 MB to disk. With only 4 MB of work_mem available, PostgreSQL was forced to spill the sort to disk, incurring additional I/O and significant performance degradation.
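
You can verify the current value from any client session:

SHOW work_mem;
-- 4MB (the default in this parameter group)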

To optimize performance, we can increase work_mem to 256 MB at the session or query level before executing the query, enabling Aurora PostgreSQL to use a faster in-memory sort method such as quicksort instead of a costly disk-based merge:

SET work_mem = '256MB';

EXPLAIN ANALYZE SELECT c.customer_name, o.order_date, SUM(oi.quantity * p.price) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY c.customer_name, o.order_date
ORDER BY total_spent DESC;
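
To scope the override more tightly, SET LOCAL limits it to the current transaction, so the setting reverts automatically at COMMIT or ROLLBACK:

BEGIN;
SET LOCAL work_mem = '256MB';
-- run the reporting query here
COMMIT;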

For additional insights on this topic, see Tune sorting operations in PostgreSQL with work_mem.

The following screenshot shows that the sort operation now runs in the database’s main memory (Sort Method: quicksort) rather than on disk:

Execution plan -2135020023:
Sort  (cost=1189605.29..1199605.29 rows=4000000 width=50) (actual time=12519.315..12811.902 rows=1689458 loops=1)
  Sort Key: (sum(((oi.quantity)::numeric * p.price))) DESC
  Sort Method: quicksort  Memory: 141532kB
  ->  HashAggregate  (cost=509565.92..614253.42 rows=4000000 width=50) (actual time=9644.389..11540.141 rows=1689458 loops=1)
        Group Key: c.customer_name, o.order_date
        Planned Partitions: 4  Batches: 5  Memory Usage: 524288kB  Disk Usage: 442432kB
        ->  Hash Join  (cost=63853.00..160815.92 rows=4000000 width=28) (actual time=789.621..5887.597 rows=4000000 loops=1)
              Hash Cond: (oi.product_id = p.product_id)
              ->  Hash Join  (cost=63495.00..149973.47 rows=4000000 width=26) (actual time=787.241..4973.765 rows=4000000 loops=1)
                    Hash Cond: (o.customer_id = c.customer_id)
                    ->  Hash Join  (cost=59706.00..135884.02 rows=4000000 width=16) (actual time=760.789..3448.767 rows=4000000 loops=1)
                          Hash Cond: (oi.order_id = o.order_id)
                          ->  Seq Scan on order_items oi  (cost=0.00..65478.00 rows=4000000 width=12) (actual time=0.005..721.684 rows=4000000 loops=1)
                          ->  Hash  (cost=34706.00..34706.00 rows=2000000 width=12) (actual time=751.073..751.075 rows=2000000 loops=1)
                                Buckets: 2097152  Batches: 1  Memory Usage: 102522kB
                                ->  Seq Scan on orders o  (cost=0.00..34706.00 rows=2000000 width=12) (actual time=0.012..226.164 rows=2000000 loops=1)
                    ->  Hash  (cost=2539.00..2539.00 rows=100000 width=18) (actual time=26.279..26.280 rows=100000 loops=1)
                          ->  Seq Scan on customers c  (cost=0.00..2539.00 rows=100000 width=18) (actual time=0.004..10.960 rows=100000 loops=1)
              ->  Hash  (cost=213.00..213.00 rows=10000 width=10) (actual time=2.355..2.354 rows=10000 loops=1)
                    ->  Seq Scan on products p  (cost=0.00..213.00 rows=10000 width=10) (actual time=0.012..1.068 rows=10000 loops=1)

By utilizing CloudWatch Database Insights, we streamlined the optimization process, gaining clear visibility into query execution patterns and identifying the work_mem parameter as the root cause of performance issues.
