pg_stat_statements is a PostgreSQL extension for monitoring query statistics: it tracks execution counts, execution times, and rows returned. It stores the metrics for each query in a hash table, where each entry is identified by a key built from four fields: queryid, user OID, database OID, and a toplevel flag. In high-contention environments the extension can itself become a performance bottleneck because of the locking operations on this hash table.
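The keyed hash table described above can be modeled roughly as follows. This is a simplified Python sketch, not the extension's C implementation: the names StatsKey, Counters, and StatsTable are hypothetical, and a single mutex stands in for the extension's actual locking scheme.

```python
import threading
from dataclasses import dataclass
from typing import NamedTuple, Optional


class StatsKey(NamedTuple):
    """Hypothetical model of the four-field key that identifies an entry."""
    userid: int      # user OID
    dbid: int        # database OID
    queryid: int     # identifier computed for the query
    toplevel: bool   # whether the query ran at top level


@dataclass
class Counters:
    """Per-query metrics, limited to the ones named in the text."""
    calls: int = 0
    total_exec_time_ms: float = 0.0
    rows: int = 0


class StatsTable:
    """Shared table; every update takes the lock, which is where contention arises."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # assumption: one lock for the whole table
        self._entries: dict[StatsKey, Counters] = {}

    def record(self, key: StatsKey, exec_time_ms: float, rows: int) -> None:
        with self._lock:
            # Create the entry on first sight of this key, then update it.
            entry = self._entries.setdefault(key, Counters())
            entry.calls += 1
            entry.total_exec_time_ms += exec_time_ms
            entry.rows += rows

    def get(self, key: StatsKey) -> Optional[Counters]:
        with self._lock:
            return self._entries.get(key)


table = StatsTable()
key = StatsKey(userid=10, dbid=5, queryid=123456789, toplevel=True)
table.record(key, exec_time_ms=1.5, rows=3)
table.record(key, exec_time_ms=0.5, rows=1)
```

In this model every backend that finishes a query must pass through the same lock, so the more CPUs execute queries concurrently, the more time they spend waiting on each other rather than doing work.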
When many unique queries are executed, contention for the hash table can cause significant performance drops. For example, on a system with 48 CPUs executing unique queries, enabling pg_stat_statements reduced TPS from 237,437 to 32,112. A high volume of similar queries is also affected: on a 192-CPU machine, TPS was 484,338 with pg_stat_statements enabled, compared to 1,015,425 with it disabled.
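To put the two benchmark results on a common scale, the relative TPS loss can be computed from the figures quoted above. The helper name tps_drop_percent is illustrative, and the calculation assumes that the higher figure in each pair is the run with the extension disabled.

```python
def tps_drop_percent(tps_disabled: float, tps_enabled: float) -> float:
    """Relative throughput loss, in percent, caused by enabling the extension."""
    return (tps_disabled - tps_enabled) / tps_disabled * 100.0


# 48 CPUs, unique queries
unique_drop = tps_drop_percent(237_437, 32_112)
# 192 CPUs, high volume of similar queries
similar_drop = tps_drop_percent(1_015_425, 484_338)

print(f"unique queries:  {unique_drop:.1f}% lower TPS")   # 86.5%
print(f"similar queries: {similar_drop:.1f}% lower TPS")  # 52.3%
```

Both workloads lose more than half of their throughput, with the unique-query workload hit considerably harder.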
Query sampling is introduced as a way to mitigate these performance issues by recording metrics for only a fraction of executed queries. The pg_stat_statements.sample_rate parameter configures the proportion of queries that are tracked. However, sampling can lead to incomplete data, and to potential security risks if sensitive information is recorded in non-normalized form.
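A minimal sketch of the sampling idea, assuming the decision is an independent random draw per query compared against sample_rate. The text does not describe the actual decision logic, so the function should_track and the simulation below are illustrative only.

```python
import random


def should_track(sample_rate: float, rng: random.Random) -> bool:
    """Decide whether to record metrics for one query.

    rng.random() returns a float in [0.0, 1.0), so a rate of 1.0 tracks
    every query and a rate of 0.0 tracks none.
    """
    return rng.random() < sample_rate


def tracked_fraction(sample_rate: float, n_queries: int, seed: int = 42) -> float:
    """Simulate n_queries executions and return the share that was tracked."""
    rng = random.Random(seed)
    tracked = sum(should_track(sample_rate, rng) for _ in range(n_queries))
    return tracked / n_queries


for rate in (1.0, 0.5, 0.25, 0.0):
    print(f"sample_rate={rate}: tracked {tracked_fraction(rate, 100_000):.3f}")
```

Only the tracked queries would touch the shared hash table, which is why a lower rate reduces contention. The same property is the source of the incompleteness mentioned above: counters accumulated under a rate below 1.0 reflect only part of the real workload.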
Benchmark tests with varying sample_rate values showed that as sample_rate decreased, TPS increased and SpinDelay waits diminished. TPS was lowest at a sample_rate of 1.0, while at 0.25 and below SpinDelay effectively disappeared, indicating that sampling can significantly improve performance under high contention.