Databricks boosts Postgres writes 5x

Databricks has introduced an enhancement to its managed Postgres service that uses its Lakebase architecture to raise write throughput by up to five times. The change addresses a major bottleneck in high-scale Postgres applications by rethinking how data durability is achieved.

In traditional Postgres, durability mechanisms impose considerable overhead. To guard against pages left partially written by an unexpected crash, Postgres uses full page writes (FPW): the first time a page is modified after a checkpoint, its entire contents are written to the Write-Ahead Log (WAL). This can inflate WAL volume by as much as 15 times in write-heavy workloads and ultimately becomes a limiting factor for performance.
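The amplification is easier to see with a toy model. The sketch below is not Postgres code: it assumes every update touches one page, that a small change costs a fixed `delta_size` bytes of WAL, and that a checkpoint happens every `checkpoint_every` updates. All of those names and values are illustrative assumptions; only the 8 KB default page size comes from Postgres itself.

```python
PAGE_SIZE = 8192  # default Postgres block size, in bytes


def wal_bytes(updates, delta_size, checkpoint_every, full_page_writes):
    """Total WAL bytes for a sequence of single-page updates (toy model).

    updates: list of page ids, one per update.
    delta_size: assumed bytes logged for one small change.
    checkpoint_every: assumed number of updates between checkpoints.
    """
    total = 0
    imaged = set()  # pages already logged in full since the last checkpoint
    for i, page in enumerate(updates):
        if i % checkpoint_every == 0:
            imaged.clear()  # a checkpoint resets the "already imaged" state
        total += delta_size
        if full_page_writes and page not in imaged:
            total += PAGE_SIZE  # first touch after a checkpoint logs the whole page
            imaged.add(page)
    return total


if __name__ == "__main__":
    # Hypothetical workload: 1,000 updates spread evenly over 500 pages.
    updates = [i % 500 for i in range(1000)]
    with_fpw = wal_bytes(updates, 100, 1000, True)
    without_fpw = wal_bytes(updates, 100, 1000, False)
    print(f"amplification: {with_fpw / without_fpw:.1f}x")
```

The ratio depends entirely on the assumed parameters: the more distinct pages a workload touches between checkpoints, and the smaller each change, the larger the share of WAL taken up by full page images.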

Eliminating the ‘torn page’ problem

The Lakebase architecture decouples compute from storage. Compute nodes are stateless and stream WAL records to a distributed quorum of safekeepers. Because there is no local data directory, there are no local pages to tear, which removes the reason for full page writes on the compute node.
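As a rough illustration of what a quorum of safekeepers implies, the sketch below computes the highest WAL position that can be treated as durable. It assumes a simple majority quorum and that each safekeeper reports the last position (LSN) it has flushed; the article does not describe the actual protocol, so the function name and the majority rule are assumptions.

```python
def quorum_commit_lsn(flushed_lsns):
    """Highest LSN that a majority of safekeepers report as flushed.

    flushed_lsns: one integer per safekeeper (assumed reporting format).
    """
    if not flushed_lsns:
        raise ValueError("at least one safekeeper is required")
    ordered = sorted(flushed_lsns, reverse=True)
    quorum = len(ordered) // 2 + 1  # simple majority (assumption)
    # The quorum-th highest value is held by at least `quorum` safekeepers.
    return ordered[quorum - 1]


if __name__ == "__main__":
    # Three hypothetical safekeepers at different flush positions.
    print(quorum_commit_lsn([100, 90, 70]))
```

In this model a slow or failed safekeeper does not hold back commits, as long as a majority has flushed the record.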

However, disabling FPW creates a new challenge for read performance. Without periodic full page images, reconstructing a page for a read could require replaying an unbounded chain of small changes, which could significantly raise latency and resource consumption.

Image generation pushed to storage

Databricks has addressed this issue by moving the work to the storage layer. The pageserver reconstructs a data page by finding its latest materialized image and replaying the WAL deltas recorded after it. It generates a new full page image only when a page has accumulated a substantial number of delta records without an intervening image. This ‘image generation pushdown’ is driven by actual page changes rather than by the conventional Postgres checkpoint process.
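The mechanism described above can be sketched in a few lines. This is a simplified model, not the pageserver's implementation: deltas are modeled as byte overwrites at an offset, and `IMAGE_THRESHOLD`, `PageHistory`, `ingest`, and `reconstruct` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

IMAGE_THRESHOLD = 3  # hypothetical: deltas allowed before a new image is materialized


@dataclass
class PageHistory:
    image: bytes                 # latest materialized full page image
    image_lsn: int = 0           # WAL position the image corresponds to
    deltas: list = field(default_factory=list)  # (lsn, offset, data) since the image


def apply_delta(page, offset, data):
    """Overwrite len(data) bytes of the page starting at offset."""
    return page[:offset] + data + page[offset + len(data):]


def reconstruct(history):
    """Rebuild the current page: start from the image, replay deltas in order."""
    page = history.image
    for _lsn, offset, data in history.deltas:
        page = apply_delta(page, offset, data)
    return page


def ingest(history, lsn, offset, data):
    """Record a delta; materialize a new image once enough have accumulated."""
    history.deltas.append((lsn, offset, data))
    if len(history.deltas) >= IMAGE_THRESHOLD:
        history.image = reconstruct(history)
        history.image_lsn = lsn
        history.deltas.clear()  # replay cost for future reads is bounded again


if __name__ == "__main__":
    h = PageHistory(image=bytes(8))
    for lsn, (offset, data) in enumerate([(0, b"ab"), (2, b"cd"), (4, b"ef")], start=1):
        ingest(h, lsn, offset, data)
    print(reconstruct(h), h.image_lsn, len(h.deltas))
```

The point of the threshold is that a read never replays more than a bounded number of deltas, while pages that change rarely never pay for an image they do not need.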

The shift produces a sizable performance gain. In benchmark tests, compute nodes transmit only compact deltas, cutting WAL traffic by 94%. The work of generating page images moves from the single Postgres writer to a distributed storage layer that can scale independently.

Quantifiable performance leaps

Benchmarks using HammerDB TPROC-C show large gains. On a 32-vCPU instance, write throughput rose by more than 4.5 times, while WAL generation fell from 58 KB per transaction to under 4 KB. Production workloads showed similar results: on one 56-vCPU instance, steady-state WAL generation dropped from 30 MB/s to 1 MB/s, a reduction that has been linked to higher transaction throughput during peak load.

Read latencies improved as well. P99 read latencies dropped by 30% to 50%, and P50 latencies improved by approximately 30%. For Synced Tables, one customer reported ingestion throughput rising more than threefold, from 17,000 to 62,000 rows per second.
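The reported figures can be cross-checked with simple arithmetic. The helper names below are illustrative, and the inputs are the rounded numbers quoted in this article, so the results are approximate.

```python
def reduction_pct(before, after):
    """Percentage drop from before to after."""
    return (1 - after / before) * 100


def speedup(before, after):
    """How many times larger after is than before."""
    return after / before


if __name__ == "__main__":
    # WAL per transaction: 58 KB down to (at most) 4 KB.
    print(f"benchmark WAL reduction: at least {reduction_pct(58, 4):.1f}%")
    # Production steady-state WAL: 30 MB/s down to 1 MB/s.
    print(f"production WAL reduction: {reduction_pct(30, 1):.1f}%")
    # Synced Tables ingestion: 17,000 up to 62,000 rows per second.
    print(f"ingestion speedup: {speedup(17_000, 62_000):.2f}x")
```

Since the benchmark figure is "under 4 KB", the computed reduction is a lower bound, which is consistent with the 94% cut in WAL traffic if that figure refers to the same benchmark.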

Seamless deployment

This optimization has been rolled out across Databricks’ entire fleet for Serverless and Neon databases. The rollout was carried out through the control plane and required no restarts or interruptions for customers.

The change reflects a broader trend toward offloading heavy work from the transactional path to scalable background storage systems, easing the ‘write tax’ commonly associated with managed Postgres.

Tech Optimizer