How to catch and optimize problematic queries in PostgreSQL
If you work with PostgreSQL, you have probably run into performance issues, especially as your database expands. Everything may have run smoothly at first, but as the database grew, query performance began to decline. This guide will help you identify and resolve problematic queries so that your PostgreSQL database keeps operating at optimal efficiency.
Identifying problematic queries
To begin, we need to pinpoint which queries require optimization.
Long queries
Slow performance is often attributed to long-running queries. However, the definition of ‘long’ can vary: while a 30-second delay for loading a website header may be excessive, a lengthy search for books about elephants in a vast library could be acceptable. Here’s how to identify what we consider long queries:
- Add the log_min_duration_statement parameter to the postgresql.conf file. Balance the threshold carefully: a lower value logs more queries and consumes additional resources.
- Apply the change with SELECT pg_reload_conf(); or restart the server.
- Test it with log_min_duration_statement = 10000 (10 seconds) by executing two simple queries:
SELECT pg_sleep(5);
SELECT pg_sleep(12);
The first query executes in under 10 seconds, while the second triggers logging as follows:
[2263] LOG:  duration: 12007.490 ms  rows: 1  size: 6 bytes
statement: SELECT pg_sleep(12);
This behavior is standard. For more detailed query information, further logging configurations can be set. Refer to the documentation for a comprehensive list of parameters.
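If you prefer not to edit postgresql.conf by hand, the same threshold can be set with ALTER SYSTEM. A minimal sketch, assuming a superuser session and a 10-second threshold:
ALTER SYSTEM SET log_min_duration_statement = '10s';
SELECT pg_reload_conf();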
Suspicious queries
To monitor the execution statistics of all SQL statements, enable the pg_stat_statements module:
- Add shared_preload_libraries = 'pg_stat_statements' to postgresql.conf (changing this parameter requires a server restart).
- Create the extension with CREATE EXTENSION pg_stat_statements;.
The pg_stat_statements view exposes many columns, the most informative being:
- query — the executed query text;
- calls — the number of times the query was executed;
- total_exec_time — the cumulative execution time of all calls;
- mean_exec_time — average execution time;
- stddev_exec_time — standard deviation in execution time;
- rows — total number of rows returned or affected.
Queries exhibiting a high standard deviation can be flagged as suspicious, as they may execute quickly at times and slowly at others. Possible reasons include:
- Execution time varies based on the number of rows: queries with fewer rows run swiftly, while those with many rows may require optimization.
- Execution time is inconsistent regardless of row count, indicating a potential error in the query or external blocking factors.
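Such queries are easy to surface by sorting on the standard deviation. A minimal sketch, assuming PostgreSQL 13 or later, where the columns carry the *_exec_time names used above (the calls filter is optional and merely skips rarely run statements):
SELECT query, calls, mean_exec_time, stddev_exec_time
FROM pg_stat_statements
WHERE calls > 10
ORDER BY stddev_exec_time DESC
LIMIT 10;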
Excessive queries
Sorting query statistics by total_exec_time helps identify queries that, while individually fast, collectively impose a significant load:
SELECT *
FROM pg_stat_statements
ORDER BY total_exec_time DESC;
This approach can reveal quick queries that cumulatively strain resources. Review them to see if:
- The query is executed unnecessarily as a safeguard, such as a developer pinging the database with SELECT 1 before or after every table change.
- The query repeatedly computes the same expression on infrequently changing data; the result could be cached, for example in a materialized view, to avoid redundant calculations (see the sketch below).
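A sketch of the second case. The aggregate is hypothetical, and the created_at column is assumed to exist on the income table used in the examples further down:
CREATE MATERIALIZED VIEW daily_income AS
SELECT date_trunc('day', created_at) AS day, sum(value) AS total
FROM income
GROUP BY 1;

-- refresh on a schedule instead of recomputing the aggregate on every request
REFRESH MATERIALIZED VIEW daily_income;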
Additionally, sorting queries by CPU usage can illustrate their overall performance impact:
SELECT (100 * total_exec_time / sum(total_exec_time) OVER ()) AS cpu_perc
FROM pg_stat_statements
ORDER BY cpu_perc DESC;
For the curious
To broaden the scope of suspicious queries, consider the pgpro_stats extension. It stores query plans, collects statistics on wait events, resource usage, and cache invalidation, and lets you adjust the collection frequency. For more details, consult the documentation.
Real-time query monitoring
Next, let’s investigate what is currently running. Are any queries blocked? Are they executing optimally?
What’s running now?
The pg_stat_activity system view provides a row for each server process (backend):
SELECT pid, backend_type, state, query, wait_event_type, wait_event
FROM pg_stat_activity;
If a backend is idle, the query column shows the last query it executed.
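In practice it often helps to narrow this down to statements that are running right now and sort them by elapsed time. One possible variant:
SELECT pid, now() - query_start AS duration, state,
       wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;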
Is a query blocked?
To check for blocked processes, you can use built-in IDE tools, third-party software, or construct your own query from the system views. Joining pg_locks (lock information) with pg_stat_activity (active processes) gives a first overview:
SELECT * FROM pg_locks
LEFT JOIN pg_stat_activity
ON pg_locks.pid = pg_stat_activity.pid;
A more refined query can clarify which query is blocking another. Although interpreting the results can be complex, searching for keywords like ‘pg_locks monitoring’ can yield queries that clearly indicate blocking processes.
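One possible formulation, assuming PostgreSQL 9.6 or later, where the pg_blocking_pids() function is available:
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS blocker(pid) ON true
JOIN pg_stat_activity AS blocking ON blocking.pid = blocker.pid;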
Is the query executing optimally?
The EXPLAIN command can provide clarity here. It displays the planner’s tree, where each node represents an operation at a specific stage, along with its cost and estimated number of rows. Note that EXPLAIN (without the ANALYZE option) does not execute the query; it describes the expected execution based on the planner’s statistics.
For instance, consider the output of EXPLAIN for a query calculating the maximum profit from sold goods:
EXPLAIN SELECT max(income.value - expense.value)
FROM income FULL JOIN expense ON income.id = expense.id
WHERE expense.value < income.value;
Query plans can be collected automatically using the auto_explain module, which logs execution plans for slow queries:
- auto_explain.log_min_duration (integer) — the number of milliseconds after which the plan is logged;
- auto_explain.log_analyze (boolean) — whether to log plain EXPLAIN output or EXPLAIN ANALYZE output.
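A minimal configuration sketch (the parameter values are illustrative):
# postgresql.conf; auto_explain can also be loaded per session with LOAD 'auto_explain'
shared_preload_libraries = 'pg_stat_statements, auto_explain'
auto_explain.log_min_duration = '5s'   # log plans of statements slower than 5 seconds
auto_explain.log_analyze = on          # log EXPLAIN ANALYZE output instead of plain EXPLAIN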
For complex EXPLAIN output, query plan visualizers can enhance understanding. A simple search for ‘explain visualizer postgres’ will yield useful tools.
Let’s visualize the query as a tree.
According to the planner, both tables consist of 500,000 rows, which will be scanned sequentially. The tables are joined using the Hash Join method.
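The plan for this query might look roughly as follows (node structure only; the cost, rows and width figures that a real EXPLAIN run prints are omitted here):
Aggregate
  ->  Hash Join
        Hash Cond: (expense.id = income.id)
        Join Filter: (expense.value < income.value)
        ->  Seq Scan on expense
        ->  Hash
              ->  Seq Scan on income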
To extract more detailed information from the plan, use the ANALYZE option. Unlike plain EXPLAIN, which merely describes the expected execution, EXPLAIN ANALYZE actually runs the query. Keep in mind:
- If you wish to avoid data modification while focusing on execution details, wrap the query in a transaction and finish with a ROLLBACK (see the sketch after this list).
- EXPLAIN runs almost instantly and imposes minimal load, whereas EXPLAIN ANALYZE may consume noticeable time and resources, since it executes the SQL query.
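A sketch of the first point; the UPDATE itself is hypothetical:
BEGIN;
EXPLAIN ANALYZE UPDATE expense SET value = value * 1.1;  -- the statement really runs here
ROLLBACK;  -- ...but its changes are discarded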
In addition to estimated costs, rows, and width, actual execution data is included:
- times — duration spent at each plan node;
- rows — actual number of rows processed by the operator;
- loops — number of processing cycles required.
The planner utilizes various methods for scanning and joining tables. To understand how it selects methods, we can examine the cost parameter in our plan.
Understanding cost
Cost is represented by two numbers:
- The cost of initial preparation before fetching the first result row, expressed in arbitrary units. For sequential scans, this value is zero, as the first row is immediately available.
- The total cost of the operation up to retrieving the last row.
Higher-level nodes await results from lower-level ones, causing total costs to accumulate at the root of the tree. In our example, lower nodes incur high costs due to sequential scans of large tables. Would indexing the tables enhance the query speed? In this case, the planner opts for sequential scanning over index scanning. To understand this choice, we must explore join methods.
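For a sequential scan, the estimate follows the formula given in the PostgreSQL documentation: disk pages read multiplied by seq_page_cost, plus rows scanned multiplied by cpu_tuple_cost. A rough sketch that reproduces it by hand for the income table from the example:
SELECT relpages, reltuples,
       relpages * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8 AS seq_scan_cost_estimate
FROM pg_class
WHERE relname = 'income';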
Join methods complexity
- Nested Loop requires no preparation (the first cost value is 0) and exhibits quadratic complexity.
- Merge Join preparation cost depends on the number of rows in the node, demonstrating linear complexity.
- Hash Join is similar to Merge Join but has a higher coefficient and necessitates more preparation time.
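As a rough back-of-the-envelope illustration, assuming two tables of 10,000 rows each and ignoring constant factors: a Nested Loop performs on the order of 10,000 × 10,000 = 100,000,000 row comparisons, while a Hash Join needs roughly 10,000 operations to build the hash table plus 10,000 probes, i.e. about 20,000 operations in total.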
If the estimated row count is below N1, a Nested Loop is chosen; if above N2, a Hash Join is utilized; otherwise, a Merge Join is selected. While the planner’s choices are mathematically justified, selecting the optimal option isn’t always guaranteed due to two primary factors:
- The estimated number of rows relies on statistics, which may be outdated or differ from actual data.
- The actual cost of an operation can diverge from theoretical expectations. For instance, with a Hash Join, the cost function is linear, but if an inefficient hash function is used or there are many duplicate identifiers, the cost may degrade towards quadratic.
Let’s visualize the estimated complexity of operations with dashed lines and the actual complexity with solid lines. The query planner bases its decisions on available information: if the estimated row count is less than N1, it selects the first method (shown in purple), and if greater, it opts for the second method (shown in blue). However, the actual complexity of the ‘blue’ method may exceed expectations, yet if the estimated row count reaches N1 or more, the planner will still favor it.
Forcing a different plan
Is it possible to compel the planner to choose a different scanning method? Each method has a corresponding enable_* configuration parameter. By default, all of them are enabled. To disable sequential scanning, you can execute:
SET enable_seqscan = off;
This command does not entirely disable the method but adds a cost of 10 billion units.
After disabling sequential scanning, the planner was compelled to use index scans, and the join method switched from Hash Join to Merge Join. Execution time decreased by a factor of 2.5! However, use this approach judiciously: the SET command affects the entire session, so remember to reset the parameter after the query, or use SET LOCAL within a transaction (see the sketch below).
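A sketch of the SET LOCAL variant, reusing the example query from above; the setting reverts automatically when the transaction ends:
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
SELECT max(income.value - expense.value)
FROM income FULL JOIN expense ON income.id = expense.id
WHERE expense.value < income.value;
ROLLBACK;  -- enable_seqscan returns to its previous value here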
Optimizing query execution
Several extensions can enhance query execution:
- sr_plan — allows saving a specific execution plan for reuse;
- pg_hint_plan — enables execution plan control through hints in SQL comments;
- AQO — employs machine learning to refine row count estimates.
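For instance, with pg_hint_plan installed, a scan or join method can be suggested in a special comment at the start of the statement. The hint names below follow the extension’s documentation, and the example reuses the tables from above (an index on expense is assumed to exist):
/*+ IndexScan(expense) MergeJoin(income expense) */
EXPLAIN SELECT max(income.value - expense.value)
FROM income FULL JOIN expense ON income.id = expense.id
WHERE expense.value < income.value;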
While we’ve explored how to analyze query plans, what if a query has already been running for a long time? Should we cancel it or wait for a result? There’s no definitive answer, and unfortunately, we may have neglected to save the plan. However, the pg_query_state module allows you to view EXPLAIN ANALYZE-style output for a query while it is still executing. A detailed discussion of pg_query_state will follow in a separate article coinciding with the module’s new version release. Stay tuned for updates on the Postgres Professional blog.
Assessing query progress
To monitor query progress, PostgreSQL offers dynamic pg_stat_progress_* views for commands such as ANALYZE, CREATE INDEX, VACUUM, CLUSTER, base backup, and COPY. By executing the same query twice:
SELECT * FROM pg_stat_progress_vacuum;
just seconds apart, you can observe changes in processed rows and estimate the completion percentage.
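The completion estimate can also be computed directly from the block counters this view exposes. A sketch:
SELECT pid, phase, heap_blks_scanned, heap_blks_total,
       round(100.0 * heap_blks_scanned / nullif(heap_blks_total, 0), 1) AS pct_scanned
FROM pg_stat_progress_vacuum;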
For user queries, a similar tool is the pg_query_state module mentioned above.