Improve PostgreSQL performance using the pgstattuple extension

As data volumes grow, managing large datasets efficiently has become a pivotal concern for businesses. PostgreSQL, an open-source relational database management system (RDBMS), has emerged as a strong choice for complex data workloads. A hallmark of PostgreSQL is its extensibility: developers can add capabilities through a diverse range of extensions tailored to specific needs, covering spatial data support, full-text search, advanced data types, and performance tuning. Among these, the pgstattuple extension is often underappreciated, yet it offers detailed insight into how PostgreSQL physically stores tables and indexes.

Overview of pgstattuple

The pgstattuple extension provides functions that report detailed tuple-level statistics for PostgreSQL tables and indexes. These statistics describe the physical storage layer, which the standard PostgreSQL statistics views do not expose in this detail. Key metrics reported by pgstattuple include:

  • tuple_count – The number of live tuples
  • dead_tuple_count – The number of dead tuples awaiting cleanup
  • tuple_len – The total length of live tuples in bytes
  • free_space – The total free space available in bytes
  • free_percent – The percentage of free space; higher values can indicate bloat
  • dead_tuple_len – The total length of dead tuples in bytes
  • dead_tuple_percent – The percentage of space occupied by dead tuples

These metrics serve as more than mere numbers; they act as an early warning system for potential database health and performance issues. By keeping an eye on these statistics, database administrators can proactively identify storage concerns that may be silently hindering performance, such as excessive table bloat or index fragmentation.
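
A minimal sketch of reading these metrics. The CREATE EXTENSION line is a no-op if pgstattuple is already installed, and the session needs superuser rights or membership in the pg_stat_scan_tables role. The system catalog pg_catalog.pg_class is used only because it exists in every database; substitute one of your own tables. Because tuple_len is a total, the query divides it by tuple_count to derive the average live tuple length.

```sql
-- Installs pgstattuple in the current database if it is not there yet.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- pg_catalog.pg_class stands in for any table you want to inspect.
SELECT table_len,                                    -- physical size in bytes
       tuple_count,                                  -- live tuples
       tuple_len,                                    -- total bytes in live tuples
       round(tuple_len::numeric / NULLIF(tuple_count, 0), 1)
         AS avg_live_tuple_len,                      -- derived average, in bytes
       dead_tuple_count,
       dead_tuple_len,
       dead_tuple_percent,
       free_space,
       free_percent
FROM pgstattuple('pg_catalog.pg_class');
```

Page headers and line pointers are not counted in any of the byte totals, so tuple_percent, dead_tuple_percent, and free_percent always add up to somewhat less than 100.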

Using pgstattuple in Aurora and Amazon RDS

Both Amazon Aurora PostgreSQL-Compatible Edition and Amazon RDS for PostgreSQL support the pgstattuple extension. To enable it, run CREATE EXTENSION pgstattuple; in each database where you need it. Once it is enabled, pgstattuple(relation) reports the physical storage used by a table, including its total length, live tuples, dead tuples, and free space. For quicker estimates, pgstattuple_approx(relation) skips pages that the visibility map marks as all-visible, while pgstatindex(index) reports statistics for B-tree indexes. Analyzing this low-level data can help pinpoint bloated tables that need vacuuming, identify tables with high dead tuple ratios that may benefit from rewriting, and optimize physical storage utilization.
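
A minimal sketch of the two lighter-weight functions mentioned above. pg_catalog.pg_class and its B-tree index pg_catalog.pg_class_oid_index are stand-ins chosen only because every database has them; replace them with your own table and index names. The scanned_percent column shows how much of the table pgstattuple_approx actually read rather than estimated.

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Approximate table statistics: pages marked all-visible in the
-- visibility map are skipped, and their free space is estimated
-- from the free space map instead of being measured.
SELECT table_len,
       scanned_percent,
       approx_tuple_count,
       dead_tuple_count,
       dead_tuple_percent,
       approx_free_percent
FROM pgstattuple_approx('pg_catalog.pg_class');

-- B-tree index statistics for a single index.
SELECT index_size,
       leaf_pages,
       avg_leaf_density,
       leaf_fragmentation
FROM pgstatindex('pg_catalog.pg_class_oid_index');
```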

Detecting and managing table bloat

One of the most practical applications of pgstattuple is identifying and managing bloat in PostgreSQL tables. Bloat occurs when UPDATE and DELETE operations leave behind dead row versions and free space that are not immediately reclaimed. PostgreSQL uses Multiversion Concurrency Control (MVCC) to maintain data consistency: each SQL statement sees a snapshot of the data as of a particular point in time, regardless of concurrent changes to the underlying data. This prevents inconsistencies between concurrent transactions and provides transaction isolation for each session.

In an MVCC system like PostgreSQL, a deleted row is not immediately removed from its data page. Instead, it is marked as deleted by the current transaction and remains visible to transactions whose snapshots predate the deletion. Once no transaction can still see these expired tuples, VACUUM can remove them and reclaim the space. An UPDATE effectively combines a DELETE and an INSERT: it marks the old row version as expired and inserts a new version. Over time, these expired versions accumulate until VACUUM removes them.
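
The row-versioning behavior described above can be observed directly. The following sketch uses a hypothetical table, mvcc_bloat_demo, created with autovacuum disabled so that autovacuum does not clean it up between steps. Exact dead-tuple counts before VACUUM will vary, because PostgreSQL also prunes dead tuples opportunistically when it reads a page. After VACUUM, with no older transactions still open, the dead tuple count should drop to zero while the live count stays at 5,000. Run the statements outside an explicit transaction block, because VACUUM cannot run inside one, and drop the table when you are done.

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Hypothetical demo table; autovacuum is disabled for this table only.
DROP TABLE IF EXISTS mvcc_bloat_demo;
CREATE TABLE mvcc_bloat_demo (id int, payload text)
  WITH (autovacuum_enabled = false);

INSERT INTO mvcc_bloat_demo
SELECT g, repeat('x', 100)
FROM generate_series(1, 10000) AS g;

-- DELETE only marks the even-numbered rows as expired.
DELETE FROM mvcc_bloat_demo WHERE id % 2 = 0;

-- UPDATE expires the old version of each matched row and inserts a new one.
UPDATE mvcc_bloat_demo SET payload = repeat('y', 100) WHERE id <= 1000;

-- Before VACUUM: expired tuples that have not been pruned yet show up as dead.
SELECT tuple_count, dead_tuple_count, dead_tuple_percent, free_percent
FROM pgstattuple('mvcc_bloat_demo');

-- VACUUM removes dead tuples that no transaction can still see.
VACUUM mvcc_bloat_demo;

-- After VACUUM: dead tuples are gone and their space counts as free.
SELECT tuple_count, dead_tuple_count, dead_tuple_percent, free_percent
FROM pgstattuple('mvcc_bloat_demo');
```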

The autovacuum process automates the cleanup of dead tuples and also refreshes the statistics used by the query planner. It vacuums a table once the number of dead tuples crosses a configurable threshold, so that storage occupied by dead tuples is reclaimed. However, autovacuum can fall behind, for example on tables with heavy write traffic or when long-running transactions keep old row versions visible, and highly bloated tables may then need manual intervention.
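
PostgreSQL documents the autovacuum trigger for a table as autovacuum_vacuum_threshold plus autovacuum_vacuum_scale_factor times the table's estimated row count (reltuples), with defaults of 50 and 0.2. The query below is a sketch that compares each table's n_dead_tup against that global threshold. It ignores per-table overrides, and n_dead_tup is an estimate from the statistics system rather than an exact count like the one pgstattuple produces. On PostgreSQL 14 and later, reltuples is -1 for tables that have never been vacuumed or analyzed, which the query clamps to zero.

```sql
-- Dead tuples per table versus the global autovacuum trigger point.
-- Per-table settings made with ALTER TABLE ... SET (...) are not applied here.
SELECT s.relname,
       s.n_dead_tup,
       round((current_setting('autovacuum_vacuum_threshold')::bigint
              + current_setting('autovacuum_vacuum_scale_factor')::float8
                * GREATEST(c.reltuples, 0))::numeric) AS autovacuum_trigger_point,
       s.last_autovacuum,
       s.last_vacuum
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 10;
```

Tables whose n_dead_tup sits well above the trigger point, with an old or empty last_autovacuum, are good candidates for a closer look with pgstattuple.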

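Dead tuples reside alongside live tuples in data pages. Bloat can also persist after autovacuum has removed dead tuples, because plain VACUUM marks the freed space as reusable but generally does not return it to the operating system. During query execution, PostgreSQL may read extra pages that contain mostly dead tuples or free space, which increases I/O and slows queries. If autovacuum cannot keep bloat under control, a manual VACUUM, or a full table rewrite with VACUUM FULL (which locks the table exclusively while it runs) or pg_repack, may be required.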
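
To find candidates for the cleanup described above, user tables can be ranked by their approximate free space plus dead-tuple space. This is a sketch: the 10 MB cut-off and the ranking by the sum of the two percentages are arbitrary choices, and pgstattuple_approx still reads every page that is not all-visible, so on large databases it is best run off-peak.

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Rank user tables by how much of their space is free or dead.
SELECT c.oid::regclass                           AS table_name,
       pg_size_pretty(a.table_len)               AS table_size,
       round(a.approx_free_percent::numeric, 1)  AS approx_free_pct,
       round(a.dead_tuple_percent::numeric, 1)   AS dead_tuple_pct
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
CROSS JOIN LATERAL pgstattuple_approx(c.oid::regclass) AS a
WHERE c.relkind = 'r'                              -- ordinary tables only
  AND n.nspname !~ '^pg_'                          -- skip catalog, TOAST, and temp schemas
  AND n.nspname <> 'information_schema'
  AND pg_relation_size(c.oid) > 10 * 1024 * 1024   -- arbitrary cut-off: 10 MB
ORDER BY a.approx_free_percent + a.dead_tuple_percent DESC
LIMIT 10;
```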

Automating manual vacuum

Regular monitoring for bloat makes it possible to identify maintenance needs before performance suffers. The metrics from pgstattuple can also help tune autovacuum settings for more aggressive cleanup when necessary. After identifying the most bloated tables, one can automate VACUUM with the pg_cron extension, a cron-based job scheduler that runs inside the database and schedules PostgreSQL commands directly. For example, the following call schedules a VACUUM of the pgbench_accounts_test table every day at 23:00:

SELECT cron.schedule('manual vacuum', '0 23 * * *', 'VACUUM pgbench_accounts_test');
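
As noted above, pgstattuple metrics can also guide autovacuum tuning for the tables that bloat fastest. A minimal sketch, assuming a hypothetical table autovac_tuning_demo in place of one of your top bloated tables: it lowers the per-table trigger point so that autovacuum runs once roughly 1 percent of rows (plus 1,000) are dead, instead of the default 20 percent (plus 50). The values 0.01 and 1000 are illustrative, not recommendations.

```sql
-- Hypothetical table standing in for one of your most bloated tables.
CREATE TABLE IF NOT EXISTS autovac_tuning_demo (id bigint, payload text);

-- Per-table overrides take precedence over the global autovacuum settings.
ALTER TABLE autovac_tuning_demo
  SET (autovacuum_vacuum_scale_factor = 0.01,
       autovacuum_vacuum_threshold = 1000);

-- Confirm the overrides stored for the table.
SELECT relname, reloptions
FROM pg_class
WHERE oid = 'autovac_tuning_demo'::regclass;
```

Checking pgstattuple again after a few days of normal traffic shows whether the tighter settings keep dead_tuple_percent in check.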

Diagnosing and resolving index bloat

Like tables, indexes in PostgreSQL can suffer from bloat, which wastes space and degrades performance. The pgstattuple extension detects index bloat through pgstatindex, which reports the B-tree version number, total index size, average leaf density (the percentage of leaf-page space holding index data), and leaf fragmentation, among other fields. Significantly bloated indexes can be rebuilt with REINDEX (REINDEX CONCURRENTLY, available in PostgreSQL 12 and later, avoids blocking writes) or with pg_repack to eliminate dead space and restore performance.
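
To apply pgstatindex across a database rather than one index at a time, the following sketch lists user B-tree indexes with the lowest average leaf density. A freshly built B-tree with the default fillfactor of 90 typically reports an avg_leaf_density of roughly 90, so values far below that suggest bloat; any exact cut-off is a judgment call. The REINDEX line is commented out because the index name in it is hypothetical.

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- User B-tree indexes, ordered from least to most densely packed.
SELECT i.indexrelid::regclass        AS index_name,
       i.indrelid::regclass          AS table_name,
       pg_size_pretty(s.index_size)  AS index_size,
       s.avg_leaf_density,
       s.leaf_fragmentation
FROM pg_index AS i
JOIN pg_class AS ic ON ic.oid = i.indexrelid
JOIN pg_namespace AS n ON n.oid = ic.relnamespace
JOIN pg_am AS am ON am.oid = ic.relam
CROSS JOIN LATERAL pgstatindex(i.indexrelid::regclass) AS s
WHERE am.amname = 'btree'            -- pgstatindex supports B-tree indexes only
  AND ic.relkind = 'i'               -- skip partitioned-index parents (no storage)
  AND i.indisvalid                   -- skip invalid indexes, e.g. from a failed CREATE INDEX CONCURRENTLY
  AND n.nspname !~ '^pg_'            -- skip catalog, TOAST, and temp schemas
  AND n.nspname <> 'information_schema'
ORDER BY s.avg_leaf_density
LIMIT 10;

-- Rebuild a bloated index without blocking writes (PostgreSQL 12+).
-- "my_bloated_index" is a hypothetical name; substitute a real one.
-- REINDEX INDEX CONCURRENTLY my_bloated_index;
```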

Best practices for using pgstattuple

When using pgstattuple for PostgreSQL monitoring and maintenance, consider the following best practices:

  • Use the check_postgres bloat query, described on the PostgreSQL wiki, for a quick estimate of bloat in PostgreSQL tables.
  • Use the pgstattuple extension for detailed statistics on the physical storage, space usage, and bloat of tables.
  • Rebuild significantly bloated tables and indexes to reclaim dead space.
  • Monitor dead_tuple_percent for tables, and avg_leaf_density and leaf_fragmentation for indexes, to spot bloat and fragmentation.
  • Prioritize maintenance on tables and indexes critical for workload performance.
  • Because pgstattuple reads the entire relation, avoid running it on large, highly active tables during peak load; pgstattuple_approx is a cheaper alternative for tables.
  • Use pgstattuple metrics to tune autovacuum settings.
  • Combine pgstattuple insights with query analysis and logs for a comprehensive understanding of database performance.