In a lakebase, the architecture distinctly separates compute and storage, a design choice that not only enhances operational flexibility—allowing for scaling, branching, and instant recovery—but also opens the door to significant performance improvements. This decoupling enables the offloading of tasks from Postgres compute to a distributed storage system, a feat that traditional monolithic Postgres deployments cannot achieve. This article delves into how we leveraged this architectural advantage to overcome a long-standing bottleneck in Postgres performance.
The hidden cost of traditional Postgres durability
To appreciate the remarkable 5x improvement in managed Postgres performance, one must first understand how traditional Postgres provides durability. Each write operation is first logged in the write-ahead log (WAL). To bound crash-recovery time, Postgres periodically flushes modified pages to disk through a checkpoint process. Recovery then loads a page from disk and redoes any WAL records that altered it since it was last written. However, if a crash occurs mid-write, the page on disk may be left "torn" (partially written) and potentially corrupt. To mitigate this risk, Postgres logs the entire 8 KB page image on the first write to a page after each checkpoint, a mechanism known as full page writes (FPW), ensuring that recovery always starts from an intact base. In write-heavy scenarios, this FPW overhead can inflate WAL volume by up to 15x, making it a significant cost on the write path for many workloads.
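To make that overhead concrete, here is a back-of-the-envelope sketch. The page size is the real Postgres default; the delta size and per-page update rate are illustrative assumptions, not measurements of any particular workload:

```python
# Back-of-the-envelope estimate of WAL amplification from full page writes.
# PAGE_SIZE is the Postgres default; the other numbers are illustrative assumptions.

PAGE_SIZE = 8 * 1024     # Postgres heap page size: 8 KB
DELTA_SIZE = 120         # assumed size of a typical small-update WAL record
UPDATES_PER_PAGE = 5     # assumed updates hitting a given page per checkpoint cycle

# Without FPW: every update logs only its delta.
wal_without_fpw = UPDATES_PER_PAGE * DELTA_SIZE

# With FPW: the first write to the page after a checkpoint logs the full
# 8 KB image; subsequent updates to the same page log only deltas.
wal_with_fpw = PAGE_SIZE + (UPDATES_PER_PAGE - 1) * DELTA_SIZE

print(f"WAL per page without FPW: {wal_without_fpw} bytes")  # 600 bytes
print(f"WAL per page with FPW:    {wal_with_fpw} bytes")     # 8672 bytes
print(f"amplification: {wal_with_fpw / wal_without_fpw:.1f}x")  # ~14.5x
```

Colder pages fare even worse under these assumptions: a page touched only once per checkpoint cycle pays the full 8 KB image for a delta of a few hundred bytes.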
The lakebase solution: eliminating the risk of torn pages
Within the lakebase architecture, the compute layer is stateless: it keeps no local data directory. Instead, it streams WAL to a Paxos-based quorum of safekeepers. This design eliminates the risk of torn pages, as there is no local on-disk page to tear. However, simply disabling FPW introduces a new challenge: read performance. Without periodic full page images in the log, the storage layer would need to replay an unbounded series of small deltas to reconstruct a page for a read request, increasing read latency and resource consumption.
Innovation: image generation pushdown to distributed storage
We addressed this challenge by transferring the intelligence from the compute node to the storage layer, a process we refer to as image generation pushdown. When a Postgres compute node requests a page from storage, the pageserver—a component of the lakebase distributed storage system—reconstructs the page by locating the most recent materialized image and applying any WAL deltas. The full page images that were previously embedded in the WAL served as periodic reset points in the delta chain, maintaining a reasonable bound on the chain and ensuring fast reads. For a comprehensive exploration of this mechanism, refer to Deep dive into Neon storage engine.
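Conceptually, the read path looks like the following simplified sketch. This is illustrative Python, not the pageserver's actual implementation: real WAL redo replays typed WAL records rather than byte patches, and the real per-page history lives in layered files indexed by (page, LSN) rather than in memory.

```python
from dataclasses import dataclass, field

@dataclass
class Delta:
    lsn: int        # position of this change in the WAL stream
    offset: int     # byte offset within the 8 KB page (toy stand-in for WAL redo)
    payload: bytes  # bytes written at that offset

@dataclass
class PageHistory:
    """Per-page history: the newest materialized image plus the deltas after it."""
    image: bytes = bytes(8192)  # most recent materialized full page image
    image_lsn: int = 0          # LSN at which that image was taken
    deltas: list[Delta] = field(default_factory=list)

def get_page(history: PageHistory, read_lsn: int) -> bytes:
    """Reconstruct the page as of read_lsn: start from the image, replay deltas."""
    page = bytearray(history.image)
    for d in history.deltas:
        if history.image_lsn < d.lsn <= read_lsn:
            page[d.offset:d.offset + len(d.payload)] = d.payload
    return bytes(page)

# Example: a page with two deltas; a read at LSN 20 sees both applied.
h = PageHistory(deltas=[Delta(lsn=10, offset=0, payload=b"a"),
                        Delta(lsn=20, offset=1, payload=b"b")])
assert get_page(h, read_lsn=20)[:2] == b"ab"
```

The cost of `get_page` is proportional to the length of the delta chain, which is exactly why the chain must stay bounded.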
With FPW disabled, those reset points vanish. Without additional intelligence in the distributed storage system, frequently updated pages could accumulate long chains of small deltas with no intervening images, and read latency and resource consumption would grow as the pageserver replays the entire chain to serve a read. To counter this, we moved responsibility for image generation from the compute's WAL stream into the storage layer, preserving the bounded read behavior while eliminating the WAL overhead on the compute side. The pageserver now generates a full page image whenever a page accumulates more delta records than a configured threshold without an intervening image.
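Continuing the sketch above (reusing `PageHistory` and `get_page`), the materialization policy can be pictured like this. `IMAGE_THRESHOLD` is a made-up knob, and the real pageserver's policy, naming, and layer management differ:

```python
IMAGE_THRESHOLD = 16  # hypothetical policy knob: max deltas allowed without an image

def maybe_materialize(history: PageHistory, current_lsn: int) -> None:
    """Once a page accumulates more deltas than the threshold without an
    intervening image, replay the chain once and store a fresh image, so
    every future read starts from a short chain."""
    if len(history.deltas) <= IMAGE_THRESHOLD:
        return
    history.image = get_page(history, current_lsn)  # one replay, done in the background
    history.image_lsn = current_lsn
    # Deltas at or below the new image's LSN are now covered by the image.
    history.deltas = [d for d in history.deltas if d.lsn > current_lsn]
```

Because this runs as background work on the storage side, the cost of keeping chains short never lands on the compute's write path.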
Here’s why this is significantly better for performance:
- Network efficiency: The compute sends only the compact deltas, representing the actual changes, resulting in a 94% reduction in traffic according to our benchmarks.
- Scalability: Tasks are shifted from the single Postgres writer to the distributed, independently scalable storage layer. Image generation for a project branch is now shared across multiple pageservers in the background.
- Optimal reads: The timing of image generation is now based on actual changes to a page rather than the unrelated Postgres checkpoint process.
Quantifying the impact: from lab to production
We benchmarked this optimization using HammerDB TPROC-C, a TPC-C derived OLTP benchmark, and validated the results against real-world production workloads.
1. Serverless compute scaling
Throughput is measured in new orders per minute (NOPM), with gains scaling dramatically alongside the size of the compute instance:
| Compute size | Before (NOPM) | After (NOPM) | Throughput gain |
| --- | --- | --- | --- |
| 4 vCPU | 78,876 | 94,891 | 20% |
| 16 vCPU | 95,832 | 269,189 | 2.8x |
| 32 vCPU | 95,686 | 439,300 | 4.5x+ |
On a 32 vCPU compute instance, throughput improved to more than 4.5x the baseline. With full page images generated on compute, each transaction produces an average of 58 KB of WAL. With image generation pushed down, this figure drops to under 4 KB, a 94% reduction. The throughput improvement is a direct consequence: less WAL means less contention on the write path, lower network bandwidth consumption, and less work for the storage layer.
By eliminating the FPW bottleneck in Postgres, we enabled throughput to keep scaling with compute resources under heavy write loads, something traditional monolithic Postgres deployments struggle to achieve.
2. Real-world production validation
In production, enabling image pushdown for a prominent 56 vCPU project reduced steady-state WAL generation from 30 MB/s to just 1 MB/s.
This significant drop in volume translated directly into higher transaction throughput during peak periods. The optimization did not only benefit writes: with shorter delta chains, the number of WAL records replayed per read dropped considerably. We observed p99 read latencies decline by 30% to 50%, and p50 latencies by approximately 30%.
Across the fleet, the total WAL generated by computes decreased by up to 4x after the rollout. The p99 latency of reads from the storage engine improved by up to 3x and became significantly more stable.
3. Lakebase synced tables
For data-intensive Synced Tables, the impact was immediate. One customer saw ingestion throughput jump from 17k rows per second to 62k rows per second, a more than 3x increase, simply from enabling image pushdown.
Seamless rollout: performance without interruption
Since late March, this enhancement has been deployed across our entire fleet and is now active for all Lakebase Serverless and Neon databases globally. The transition was executed on running computes: our control plane and storage system coordinated the update automatically, using the existing Postgres XLOG_FPW_CHANGE WAL record mechanism, so no restarts or interruptions were necessary for our customers.
What is next for managed Postgres performance?
The lakebase architecture was conceived for flexibility, but it was also engineered for performance. Pushing full page writes down into storage is part of a systematic effort to harness the separation of storage and compute. Just as we previously introduced cache prewarming for zero-downtime patching, we will keep moving demanding work out of transactions and into our scalable background storage stack. The era of the Postgres write tax is now officially behind us.