Sixteen years ago, the author began a PhD at UC Berkeley and was advised to focus on analytics rather than OLTP databases, which were considered a solved problem. While building Databricks, however, the author found OLTP databases cumbersome, hard to scale, and fragile. That experience led to Lakebase, a serverless Postgres database that addresses these issues by externalizing the write-ahead log (WAL) and the data files into independent services.
The Lakebase architecture moves the WAL into a distributed service called SafeKeeper and the data files into a second service called PageServer, which improves durability, scalability, and performance. Because the log and the data files no longer live only on the database server's own disk, this design removes the data loss risks associated with disk failures and misconfiguration, simplifies high availability, and lets compute resources scale elastically.
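The split between a log service and a page service can be pictured with a small toy model. This is only a sketch under stated assumptions, not Lakebase's implementation or API: the class names mirror the service names above, but the majority-quorum acknowledgement rule, the `WalRecord` fields, and the `get_page` method are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WalRecord:
    lsn: int       # log sequence number, increasing with every write
    page_id: int   # page the change applies to
    value: str     # full new page contents (a real WAL stores deltas)


@dataclass
class SafeKeeper:
    """Toy WAL replica; the in-memory list stands in for durable storage."""
    up: bool = True
    log: list = field(default_factory=list)

    def append(self, record: WalRecord) -> bool:
        if not self.up:
            return False
        self.log.append(record)
        return True


class WalService:
    """Assumed rule: a write commits once a majority of replicas stores it."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.committed = []

    def commit(self, record: WalRecord) -> bool:
        acks = sum(replica.append(record) for replica in self.replicas)
        if acks > len(self.replicas) // 2:
            self.committed.append(record)
            return True
        return False


class PageServer:
    """Rebuilds a page by replaying committed WAL up to a requested LSN."""

    def __init__(self, wal: WalService):
        self.wal = wal

    def get_page(self, page_id: int, at_lsn: int):
        value = None
        for record in self.wal.committed:
            if record.lsn > at_lsn:
                break
            if record.page_id == page_id:
                value = record.value
        return value


keepers = [SafeKeeper(), SafeKeeper(), SafeKeeper()]
wal = WalService(keepers)
pages = PageServer(wal)

wal.commit(WalRecord(lsn=1, page_id=7, value="v1"))
keepers[0].up = False                                # one replica fails
wal.commit(WalRecord(lsn=2, page_id=7, value="v2"))  # 2 of 3 still acknowledge
latest = pages.get_page(7, at_lsn=2)
older = pages.get_page(7, at_lsn=1)
```

In this model the compute node holds no durable state of its own: losing one log replica does not lose a committed write, and any page version can be rebuilt from the log, which is the property the paragraph above attributes to the separated design.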
Lakebase also introduces LTAP (Lake Transactional/Analytical Processing), which lets transactions and analytics operate on a single copy of the data in real time, removing the need for separate copies and reducing cost. The system retains Postgres's ACID semantics while storing data in open columnar formats, so both transactional and analytical engines can read it without the delays and complexity of traditional replication.
The architecture offers unlimited storage, serverless compute, and instant branching, and it ensures that analytics can read the most current transactional data without affecting transactional workloads. LTAP aims to unify transactional and analytical processing by using a distinct compute engine for each workload while integrating the two at the storage layer.
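The idea of two engines integrated at the storage layer can be illustrated with a minimal sketch. Everything here is hypothetical and assumed for illustration (the `SharedColumnStore` class, its methods, and the two engine functions); it shows one shared columnar copy with atomic commits and snapshot reads, not how Lakebase stores data.

```python
import threading


class SharedColumnStore:
    """Toy single-copy store: one list per column, shared by both engines."""

    def __init__(self, columns):
        self._cols = {name: [] for name in columns}
        self._committed_rows = 0
        self._lock = threading.Lock()

    def commit_rows(self, rows):
        """Append rows atomically: a reader sees all of them or none."""
        for row in rows:
            if set(row) != set(self._cols):
                raise ValueError("row does not match the store's columns")
        with self._lock:
            for row in rows:
                for name, col in self._cols.items():
                    col.append(row[name])
            self._committed_rows += len(rows)

    def snapshot(self) -> int:
        """Return the number of rows committed so far."""
        with self._lock:
            return self._committed_rows

    def scan(self, column, snapshot):
        """Read one column as of a snapshot, ignoring later commits."""
        return self._cols[column][:snapshot]


def record_payment(store, account, amount):
    """Transactional engine: a small write committed as one unit."""
    store.commit_rows([{"account": account, "amount": amount}])


def total_amount(store):
    """Analytical engine: a column scan over the same copy of the data."""
    snap = store.snapshot()
    return sum(store.scan("amount", snap))


store = SharedColumnStore(["account", "amount"])
record_payment(store, "alice", 30)
record_payment(store, "bob", 12)
total = total_amount(store)
```

The analytical function sees each payment as soon as it commits because there is no second copy to replicate into, and it only takes the lock long enough to read the committed row count, which loosely mirrors the claim that analytics reads current data without slowing transactions.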