The project aims to make temporary tables usable in PostgreSQL's parallel query execution, starting with benchmarks of sequential writes and reads of temporary buffers. Functions were introduced to measure buffer flush operations, and tests were conducted to estimate the cost of flushing a temporary table's buffer pages to disk. The findings indicate that sequential writes are about 30% slower than reads, which forms the basis of a proposed cost model. Parallel use of temporary tables is currently limited, both because of their role as relational variables and because it is complex for parallel workers to access the leader process's local state. Two enterprise PostgreSQL distributions already support parallel operations on temporary tables, and discussions on logical replication of DDL highlight the need for better tooling around temporary tables.
A significant challenge is that temporary buffer pages are local to the leader process: until they are flushed to disk, parallel workers cannot see their contents. A comment from 2015 suggests flushing the leader's temporary buffers to disk before a parallel operation so that workers can scan the table in parallel. This raises the question of whether the cost of writing the buffers outweighs the benefit of parallelism.
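The trade-off can be illustrated with a rough break-even check. This is a minimal sketch, not PostgreSQL planner code: the function name `parallel_scan_pays_off` and its three inputs are hypothetical, and all values are assumed to be estimates expressed in the same arbitrary cost units.

```python
def parallel_scan_pays_off(serial_cost: float,
                           parallel_cost: float,
                           flush_cost: float) -> bool:
    """Return True when flushing first and then scanning in parallel is cheaper.

    All three arguments are hypothetical estimates in the same cost units.
    """
    return parallel_cost + flush_cost < serial_cost


# Illustrative numbers only, not measurements from the benchmarks.
cheap_flush = parallel_scan_pays_off(serial_cost=1000.0, parallel_cost=400.0, flush_cost=300.0)
costly_flush = parallel_scan_pays_off(serial_cost=1000.0, parallel_cost=400.0, flush_cost=700.0)
print(cheap_flush, costly_flush)
```

The point of the sketch is only that the flush is a one-time cost added to the parallel plan, so a table with many dirty pages can tip the decision back toward a serial scan.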
The benchmarking toolkit consists of modifications to PostgreSQL that make local buffers measurable. Two key commits added statistics infrastructure to track allocated and dirty local buffers, along with UI functions to manage buffer operations. The methodology involves creating a temporary table, flushing its buffers, measuring the I/O, and evaluating the scan overhead. The tests show that the cost of writing a temporary table page approximates the cost of a sequential page, which leads to a proposed formula for flushing costs.
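One way to turn such measurements into a planner-style factor is to compare write and read timings for the same set of pages. The sketch below is an assumption about how that comparison could be done, not the toolkit's actual code: the helper `write_to_read_ratio` is hypothetical, and the timings are made-up values chosen to mirror the "about 30% slower" figure, not benchmark data.

```python
from statistics import median


def write_to_read_ratio(write_ms, read_ms):
    """Ratio of the median write time to the median read time."""
    return median(write_ms) / median(read_ms)


# Made-up timings in milliseconds, for illustration only.
write_ms = [13.0, 12.5, 13.5]
read_ms = [10.0, 9.5, 10.5]

ratio = write_to_read_ratio(write_ms, read_ms)
# A ratio close to 1 means a page write costs about as much as a sequential page read.
print(f"write/read ratio: {ratio:.2f}")
```

Using the median rather than the mean is a design choice here: it keeps a single slow run from skewing the factor.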
The results show that larger buffer sizes lead to higher write overhead and increased variability. A more complete formula for the cost of pre-flushing temp buffers is suggested, which accounts for both the number of dirtied and the number of allocated local buffers. Next steps include adding planner flags for temporary objects, implementing executor support for worker access to temporary table storage, introducing a temp-buffer flush operation, and integrating the cost model into the planner for parallel scans of temporary tables.
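A cost function with that shape might look like the sketch below. It is a hedged illustration under stated assumptions, not the proposed formula itself: `WRITE_FACTOR = 1.3` is derived from the "about 30% slower" finding, `SEQ_PAGE_COST = 1.0` is PostgreSQL's default `seq_page_cost`, and `SCAN_FACTOR` is a purely hypothetical placeholder for the overhead of walking the allocated buffers. The function name `temp_flush_cost` is likewise invented for this example.

```python
SEQ_PAGE_COST = 1.0   # PostgreSQL's default seq_page_cost
WRITE_FACTOR = 1.3    # assumption: a write is ~30% more expensive than a sequential read
SCAN_FACTOR = 0.01    # hypothetical placeholder: per-buffer cost of checking allocated buffers


def temp_flush_cost(dirtied: int, allocated: int,
                    seq_page_cost: float = SEQ_PAGE_COST,
                    write_factor: float = WRITE_FACTOR,
                    scan_factor: float = SCAN_FACTOR) -> float:
    """Estimate the cost of flushing dirty local buffers before a parallel scan."""
    if dirtied < 0 or allocated < 0:
        raise ValueError("buffer counts must be non-negative")
    if dirtied > allocated:
        raise ValueError("dirtied buffers cannot exceed allocated buffers")
    # Dirty pages must be written; every allocated buffer must at least be inspected.
    return seq_page_cost * (write_factor * dirtied + scan_factor * allocated)


# Illustrative input: 1000 dirty pages out of 4000 allocated local buffers.
example_cost = temp_flush_cost(dirtied=1000, allocated=4000)
print(example_cost)
```

With no dirty pages the estimate reduces to the scan term alone, which reflects that a flush still has to look at each allocated buffer even when there is nothing to write.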