Sixteen years ago, the author began a PhD at UC Berkeley and was advised to focus on analytics rather than OLTP databases, which were considered a solved problem. That work led to the creation of Apache Spark and Databricks. While building Databricks, however, it became clear that OLTP databases were far from solved: they were difficult to scale and fragile. This realization led to the development of Lakebase, a serverless Postgres database designed with modern technology.
Lakebase architecture separates the write-ahead log (WAL) and data files into independent, scalable services. The WAL is externalized to a service called SafeKeeper, which uses Paxos-based replication for durability, while data files are managed by PageServer, which stores them in cloud object storage. This design addresses challenges such as data loss, scaling, and performance interference between transactional and analytical workloads.
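To make the durability idea concrete, here is a minimal sketch of a WAL append that counts as committed only once a majority of replicas have stored it. It is not Lakebase's implementation and not a full Paxos protocol: a plain majority-acknowledgement rule stands in for Paxos, and the names `WalRecord`, `SafekeeperReplica`, and `quorum_append` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    lsn: int        # log sequence number, increasing with each record
    page_id: str    # page the change applies to
    payload: str    # the change itself (opaque in this sketch)


@dataclass
class SafekeeperReplica:
    """Hypothetical stand-in for one replica of the WAL service."""
    name: str
    available: bool = True
    log: list = field(default_factory=list)

    def append(self, record: WalRecord) -> bool:
        # An unavailable replica cannot acknowledge the write.
        if not self.available:
            return False
        self.log.append(record)
        return True


def quorum_append(replicas, record):
    """Return True only if a strict majority of replicas stored the record.

    Assumption: simple majority acknowledgement, used here in place of a
    real Paxos round (no leader election, no recovery of partial writes).
    """
    acks = sum(1 for replica in replicas if replica.append(record))
    return acks > len(replicas) // 2


replicas = [
    SafekeeperReplica("sk-a"),
    SafekeeperReplica("sk-b"),
    SafekeeperReplica("sk-c", available=False),  # one replica is down
]
committed = quorum_append(
    replicas, WalRecord(lsn=1, page_id="p1", payload="insert row 1")
)
```

With three replicas and one down, two acknowledgements still form a majority, so the write is reported as committed. If a second replica fails, the same call reports failure, which is the property that lets the compute node treat an acknowledged write as durable.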
Lakebase maintains compatibility with Postgres and offers unlimited storage, serverless compute, durable writes, and simpler high availability. It also introduces LTAP (Lake Transactional/Analytical Processing), which lets transactional and analytical processing operate on a single copy of the data in real time, eliminating the need for separate data copies and reducing costs.
LTAP utilizes a unified storage layer that allows data to be materialized in both row and columnar formats, optimizing it for both transactional and analytical workloads. The system ensures that analytics can access the most current data without affecting transactional performance by using a log sequence number (LSN) to retrieve the latest changes.
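The following sketch illustrates reading "as of" an LSN and materializing the same changes in both row and columnar form. It is a toy model under stated assumptions, not the LTAP storage layer: the change log is an in-memory list, every row has the same fields, and `Change`, `rows_as_of`, and `columns_as_of` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    lsn: int    # log sequence number of the change
    key: int    # primary key of the affected row
    row: dict   # full new version of the row


def rows_as_of(changes, lsn):
    """Row view: the latest version of each key among changes with LSN <= lsn."""
    state = {}
    for change in sorted(changes, key=lambda c: c.lsn):
        if change.lsn > lsn:
            break
        state[change.key] = change.row
    return state


def columns_as_of(changes, lsn):
    """Columnar view of the same snapshot: one list of values per field.

    Assumption: all rows share the same fields, so the lists stay aligned.
    """
    rows = rows_as_of(changes, lsn)
    columns = {}
    for key in sorted(rows):
        for name, value in rows[key].items():
            columns.setdefault(name, []).append(value)
    return columns


changes = [
    Change(lsn=1, key=1, row={"id": 1, "amount": 10}),
    Change(lsn=2, key=2, row={"id": 2, "amount": 25}),
    Change(lsn=3, key=1, row={"id": 1, "amount": 40}),  # update to row 1
]

snapshot_rows = rows_as_of(changes, 2)        # state before the update
latest_columns = columns_as_of(changes, 3)    # state including the update
total_amount = sum(latest_columns["amount"])  # analytical-style aggregate
```

Both views derive from one log, so an analytical reader that asks for the newest LSN sees the update to row 1 without the transactional side maintaining a second copy of the table.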
Unlike traditional change data capture (CDC) approaches, LTAP requires no explicit table replication, because all data is stored in a single governed copy. This architecture avoids common problems of hybrid transactional/analytical processing (HTAP) systems, such as incomplete feature sets, lack of ecosystem support, and performance contention.
The Lakebase architecture has unlocked capabilities such as unlimited storage, elastic compute, durable writes, and instant branching, with further developments expected.