How to catch and optimize problematic queries in PostgreSQL
If you work with PostgreSQL, you have probably run into performance issues, especially as your database expands. Everything may have run smoothly at first, but as the database grew, query performance began to decline. This guide will help you identify and resolve problematic queries so that your PostgreSQL database keeps operating at optimal efficiency.
Identifying problematic queries
To begin, we need to pinpoint which queries require optimization.
Long queries
Slow performance is often attributed to long-running queries. However, the definition of ‘long’ can vary: while a 30-second delay for loading a website header may be excessive, a lengthy search for books about elephants in a vast library could be acceptable. Here’s how to identify what we consider long queries:
- Add the log_min_duration_statement parameter to the postgresql.conf file. Balance the threshold carefully: a lower value logs more queries and consumes additional resources.
- Apply the change with SELECT pg_reload_conf(); or restart the server.
- Test it with log_min_duration_statement = 10000 (10 seconds) by executing two simple queries:
SELECT pg_sleep(5);
SELECT pg_sleep(12);
The first query executes in under 10 seconds, while the second triggers logging as follows:
[2263] LOG:  duration: 12007.490 ms  rows: 1  size: 6 bytes
statement: SELECT pg_sleep(12);
This behavior is standard. For more detailed query information, further logging configurations can be set. Refer to the documentation for a comprehensive list of parameters.
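If you prefer not to edit postgresql.conf by hand, the same threshold can be set with ALTER SYSTEM. A minimal sketch, assuming a superuser session and a 10-second threshold:
ALTER SYSTEM SET log_min_duration_statement = '10s';
SELECT pg_reload_conf();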
Suspicious queries
To monitor the execution statistics of all SQL statements, enable the pg_stat_statements module:
- Add shared_preload_libraries = 'pg_stat_statements' to postgresql.conf (changing this parameter requires a server restart).
- Create the extension with CREATE EXTENSION pg_stat_statements;.
The pg_stat_statements view exposes many columns, the most informative being:
- query — the executed query text;
- calls — the number of times the query was executed;
- total_exec_time — the cumulative execution time of all calls;
- mean_exec_time — average execution time;
- stddev_exec_time — standard deviation in execution time;
- rows — total number of rows returned or affected.
Queries exhibiting a high standard deviation can be flagged as suspicious, as they may execute quickly at times and slowly at others. Possible reasons include:
- Execution time varies based on the number of rows: queries with fewer rows run swiftly, while those with many rows may require optimization.
- Execution time is inconsistent regardless of row count, indicating a potential error in the query or external blocking factors.
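Such queries are easy to surface by sorting on the standard deviation. A minimal sketch, assuming PostgreSQL 13 or later, where the columns carry the *_exec_time names used above (the calls filter is optional and merely skips rarely run statements):
SELECT query, calls, mean_exec_time, stddev_exec_time
FROM pg_stat_statements
WHERE calls > 10
ORDER BY stddev_exec_time DESC
LIMIT 10;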
Excessive queries
Sorting query statistics by total_exec_time helps identify queries that, while individually fast, collectively impose a significant load:
SELECT *
FROM pg_stat_statements
ORDER BY total_exec_time DESC;
This approach can reveal quick queries that cumulatively strain resources. Review them to see if:
- The query is executed unnecessarily as a safeguard, such as a developer pinging the database with SELECT 1 before or after every table change.
- The query repeatedly computes the same expression on infrequently changing data; the result could be cached, for example in a materialized view, to avoid redundant calculations (see the sketch below).
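A sketch of the second case. The aggregate is hypothetical, and the created_at column is assumed to exist on the income table used in the examples further down:
CREATE MATERIALIZED VIEW daily_income AS
SELECT date_trunc('day', created_at) AS day, sum(value) AS total
FROM income
GROUP BY 1;

-- refresh on a schedule instead of recomputing the aggregate on every request
REFRESH MATERIALIZED VIEW daily_income;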
Additionally, sorting queries by CPU usage can illustrate their overall performance impact:
SELECT (100 * total_exec_time / sum(total_exec_time) OVER ()) AS cpu_perc
FROM pg_stat_statements
ORDER BY cpu_perc DESC;
For the curious
To broaden the scope of suspicious queries, consider the pgpro_stats extension. It stores query plans, collects statistics on wait events, resource usage, and cache invalidation, and lets you adjust the collection frequency. For more details, consult the documentation.
Real-time query monitoring
Next, let’s investigate what is currently running. Are any queries blocked? Are they executing optimally?
What’s running now?
The pg_stat_activity system view provides a row for each server process (backend):
SELECT pid, backend_type, state, query, wait_event_type, wait_event
FROM pg_stat_activity;
If a backend is idle, the query column shows the last query it executed.
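In practice it often helps to narrow this down to statements that are running right now and sort them by elapsed time. One possible variant:
SELECT pid, now() - query_start AS duration, state,
       wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;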
Is a query blocked?
To check for blocked processes, you can use built-in IDE tools, third-party software, or construct your own query from the system views. Joining pg_locks (lock information) with pg_stat_activity (active processes) gives a first overview:
SELECT * FROM pg_locks
LEFT JOIN pg_stat_activity
ON pg_locks.pid = pg_stat_activity.pid;
A more refined query can clarify which query is blocking another. Although interpreting the results can be complex, searching for keywords like ‘pg_locks monitoring’ can yield queries that clearly indicate blocking processes.
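One possible formulation, assuming PostgreSQL 9.6 or later, where the pg_blocking_pids() function is available:
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS blocker(pid) ON true
JOIN pg_stat_activity AS blocking ON blocking.pid = blocker.pid;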
Is the query executing optimally?
The EXPLAIN command can provide clarity here. It displays the planner’s tree, where each node represents an operation at a specific stage, along with its cost and estimated number of rows. Note that EXPLAIN (without the ANALYZE option) does not execute the query; it describes the expected execution based on the planner’s statistics.
For instance, consider the output of EXPLAIN for a query calculating the maximum profit from sold goods:
EXPLAIN SELECT max(income.value - expense.value)
FROM income FULL JOIN expense ON income.id = expense.id
WHERE expense.value < income.value;
Query plans can be collected automatically using the auto_explain module, which logs execution plans for slow queries:
- auto_explain.log_min_duration (integer) — the number of milliseconds after which the plan is logged;
- auto_explain.log_analyze (boolean) — whether to log plain EXPLAIN output or EXPLAIN ANALYZE output.
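A minimal configuration sketch (the parameter values are illustrative):
# postgresql.conf; auto_explain can also be loaded per session with LOAD 'auto_explain'
shared_preload_libraries = 'pg_stat_statements, auto_explain'
auto_explain.log_min_duration = '5s'   # log plans of statements slower than 5 seconds
auto_explain.log_analyze = on          # log EXPLAIN ANALYZE output instead of plain EXPLAIN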
For complex EXPLAIN output, query plan visualizers can enhance understanding. A simple search for ‘explain visualizer postgres’ will yield useful tools.
Let’s visualize the query as a tree.
According to the planner, both tables consist of 500,000 rows, which will be scanned sequentially. The tables are joined using the Hash Join method.
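The plan for this query might look roughly as follows (node structure only; the cost, rows and width figures that a real EXPLAIN run prints are omitted here):
Aggregate
  ->  Hash Join
        Hash Cond: (expense.id = income.id)
        Join Filter: (expense.value < income.value)
        ->  Seq Scan on expense
        ->  Hash
              ->  Seq Scan on income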
To extract more detailed information from the plan, use the ANALYZE option. Unlike plain EXPLAIN, which merely describes the expected execution, EXPLAIN ANALYZE actually runs the query. Keep in mind:
- If you wish to avoid data modification while focusing on execution details, wrap the query in a transaction and finish with a ROLLBACK (see the sketch after this list).
- EXPLAIN runs almost instantly and imposes minimal load, whereas EXPLAIN ANALYZE may consume noticeable time and resources, since it executes the SQL query.
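A sketch of the first point; the UPDATE itself is hypothetical:
BEGIN;
EXPLAIN ANALYZE UPDATE expense SET value = value * 1.1;  -- the statement really runs here
ROLLBACK;  -- ...but its changes are discarded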
In addition to estimated costs, rows, and width, actual execution data is included:
- times — duration spent at each plan node;
- rows — actual number of rows processed by the operator;
- loops — number of processing cycles required.
The planner utilizes various methods for scanning and joining tables. To understand how it selects methods, we can examine the cost parameter in our plan.
Understanding cost
Cost is represented by two numbers:
- The cost of initial preparation before fetching the first result row, expressed in arbitrary units. For sequential scans, this value is zero, as the first row is immediately available.
- The total cost of the operation up to retrieving the last row.
Higher-level nodes await results from lower-level ones, causing total costs to accumulate at the root of the tree. In our example, lower nodes incur high costs due to sequential scans of large tables. Would indexing the tables enhance the query speed? In this case, the planner opts for sequential scanning over index scanning. To understand this choice, we must explore join methods.
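For a sequential scan, the estimate follows the formula given in the PostgreSQL documentation: disk pages read multiplied by seq_page_cost, plus rows scanned multiplied by cpu_tuple_cost. A rough sketch that reproduces it by hand for the income table from the example:
SELECT relpages, reltuples,
       relpages * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8 AS seq_scan_cost_estimate
FROM pg_class
WHERE relname = 'income';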
Join methods complexity
- Nested Loop requires no preparation (the first cost value is 0) and exhibits quadratic complexity.
- Merge Join preparation cost depends on the number of rows in the node, demonstrating linear complexity.
- Hash Join is similar to Merge Join but has a higher coefficient and necessitates more preparation time.
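As a rough back-of-the-envelope illustration, assuming two tables of 10,000 rows each and ignoring constant factors: a Nested Loop performs on the order of 10,000 × 10,000 = 100,000,000 row comparisons, while a Hash Join needs roughly 10,000 operations to build the hash table plus 10,000 probes, i.e. about 20,000 operations in total.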
If the estimated row count is below N1, a Nested Loop is chosen; if above N2, a Hash Join is utilized; otherwise, a Merge Join is selected. While the planner’s choices are mathematically justified, selecting the optimal option isn’t always guaranteed due to two primary factors:
- The estimated number of rows relies on statistics, which may be outdated or differ from actual data.
- The actual cost of an operation can diverge from theoretical expectations. For instance, with a Hash Join, the cost function is linear, but if an inefficient hash function is used or there are many duplicate identifiers, the cost may degrade towards quadratic.
Let’s visualize the estimated complexity of operations with dashed lines and the actual complexity with solid lines. The query planner bases its decisions on available information: if the estimated row count is less than N1, it selects the first method (shown in purple), and if greater, it opts for the second method (shown in blue). However, the actual complexity of the ‘blue’ method may exceed expectations, yet if the estimated row count reaches N1 or more, the planner will still favor it.
Forcing a different plan
Is it possible to compel the planner to choose a different scanning method? Each method has a corresponding enable_* configuration parameter. By default, all of them are enabled. To disable sequential scanning, you can execute:
SET enable_seqscan = off;
This command does not entirely disable the method but adds a cost of 10 billion units.
After disabling sequential scanning, the planner was compelled to use index scans, and the join method switched from Hash Join to Merge Join. Execution time decreased by a factor of 2.5! However, use this approach judiciously: the SET command affects the entire session, so remember to reset the parameter after the query, or use SET LOCAL within a transaction (see the sketch below).
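A sketch of the SET LOCAL variant, reusing the example query from above; the setting reverts automatically when the transaction ends:
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
SELECT max(income.value - expense.value)
FROM income FULL JOIN expense ON income.id = expense.id
WHERE expense.value < income.value;
ROLLBACK;  -- enable_seqscan returns to its previous value here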
Optimizing query execution
Several extensions can enhance query execution:
- sr_plan — allows saving a specific execution plan for reuse;
- pg_hint_plan — enables execution plan control through hints in SQL comments;
- AQO — employs machine learning to refine row count estimates.
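For instance, with pg_hint_plan installed, a scan or join method can be suggested in a special comment at the start of the statement. The hint names below follow the extension’s documentation, and the example reuses the tables from above (an index on expense is assumed to exist):
/*+ IndexScan(expense) MergeJoin(income expense) */
EXPLAIN SELECT max(income.value - expense.value)
FROM income FULL JOIN expense ON income.id = expense.id
WHERE expense.value < income.value;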
While we’ve explored how to analyze query plans, what if a query has already been running for a long time? Should we cancel it or wait for a result? There’s no definitive answer, and unfortunately, we may have neglected to save the plan. However, the pg_query_state module allows you to view EXPLAIN ANALYZE-style output for a query while it is still executing. A detailed discussion of pg_query_state will follow in a separate article coinciding with the module’s new version release. Stay tuned for updates on the Postgres Professional blog.
Assessing query progress
To monitor query progress, PostgreSQL offers dynamic pg_stat_progress_* views for commands such as ANALYZE, CREATE INDEX, VACUUM, CLUSTER, base backup, and COPY. By executing the same query twice:
SELECT * FROM pg_stat_progress_vacuum;
just seconds apart, you can observe changes in processed rows and estimate the completion percentage.
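The completion estimate can also be computed directly from the block counters this view exposes. A sketch:
SELECT pid, phase, heap_blks_scanned, heap_blks_total,
       round(100.0 * heap_blks_scanned / nullif(heap_blks_total, 0), 1) AS pct_scanned
FROM pg_stat_progress_vacuum;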
For user queries, a similar tool is the pg_query_state module mentioned above.