Today we are launching Lakebase Search, a hybrid vector and full-text retrieval system built into Lakebase and available in beta on both AWS and Azure. It is powered by two native Postgres extensions, lakebase_vector and lakebase_text, so your entire agent loop can run on a single data backend: Lakebase.
This brings a new level of scale, cost efficiency, and ergonomics for agents. Agents turn search into an operational workload: they retrieve context, reason, act, and remember. That loop ties the read path (retrieval) closely to the write path (memory), so retrieval must reflect newly written data in real time.
For agents, search is actually an operational workload
Agents now manage four times more databases on Lakebase than human users, and their requirements differ significantly from those of humans. Traditional search engines serve a read-only snapshot that lags behind the source data, while agents treat search as a live operational database.
Consider a typical agent schema: chunked documents and embeddings sit alongside an active conversational memory log, creating a continuous read/write loop. Agents write what they learn to memory in one interaction and need that data fully indexed and searchable in the next. They need not just fast retrieval but immediate access to the most recent writes.
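The read/write loop described above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: the `AgentMemory` class, its method names, and the keyword-overlap scoring are hypothetical stand-ins, not the Lakebase API. The point it shows is that a write made in one turn is visible to a search in the next.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical stand-in for an operational store: writes are searchable at once."""
    entries: list[str] = field(default_factory=list)

    def write(self, text: str) -> None:
        # In a real system this would be a transactional insert plus an index update.
        self.entries.append(text)

    def search(self, query: str, k: int = 3) -> list[str]:
        # Naive keyword-overlap scoring, used here only to keep the sketch short.
        terms = set(query.lower().split())
        scored = [(len(terms & set(e.lower().split())), e) for e in self.entries]
        hits = [(score, e) for score, e in scored if score > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in hits[:k]]


memory = AgentMemory()

# Turn 1: the agent learns something and writes it to memory.
memory.write("user prefers invoices in EUR")

# Turn 2: the next interaction must already see that write.
turn2_hits = memory.search("which currency for invoices")
```

A snapshot-based search engine would return nothing in turn 2 until its next reindex; the operational model makes the write visible immediately.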
Search is a strange workload
Search is an unusual workload with two key properties. First, the volume of stored data far exceeds the amount queried, so much of it sits dormant. Second, vector search inflates the data: a 1 KB text file grows substantially once vectorized, because each document is split into multiple chunks, each chunk produces its own high-dimensional embedding, and the index adds further overhead.
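To make the expansion concrete, here is a rough back-of-the-envelope calculation. The chunk size (256 bytes) and the embedding format (768-dimensional float32) are assumptions chosen for illustration; the function name is hypothetical, and index overhead is not included.

```python
import math


def vector_footprint_bytes(text_bytes, chunk_bytes, dims, bytes_per_dim=4):
    """Raw embedding bytes for one document, before any index overhead."""
    chunks = math.ceil(text_bytes / chunk_bytes)
    return chunks * dims * bytes_per_dim


# Assumed values: 1 KB document, 256-byte chunks, 768-dim float32 embeddings.
raw = vector_footprint_bytes(text_bytes=1024, chunk_bytes=256, dims=768)
expansion = raw / 1024  # how many times larger than the source text
```

Under these assumptions the 1 KB file yields four embeddings of 3,072 bytes each, about 12 KB of raw vectors, or a 12x expansion before the index is built.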
When applied across thousands of mostly inactive tenants, traditional search architectures falter. Industry-standard vector indexes like HNSW are inherently memory-bound, as efficient graph traversal relies on the index remaining resident in RAM, making the hosting of cold multi-tenant data prohibitively expensive.
Search needs a lakebase
Last year, we introduced Lakebase, a serverless Postgres OLTP architecture where data resides in cost-effective cloud object storage. A tiered cache (RAM, local NVMe, pageserver) ensures that frequently accessed pages can be read with local-disk latency. We recognized this architecture as precisely what modern search requires. However, to unlock these economic advantages without compromising query speed, an index layout explicitly designed for a tiered storage hierarchy was essential. Lakebase initially lacked this, so we developed it.
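The tiered read path can be pictured with a small sketch. This is a simplified model, not Lakebase's implementation: the class name, the LRU eviction policy, and the exclusive placement of pages (a page lives in RAM or NVMe, not both) are all assumptions, and a plain dict stands in for the pageserver and object storage.

```python
from collections import OrderedDict


class TieredPageCache:
    """Toy read path: RAM -> local NVMe -> remote page store, with LRU eviction."""

    def __init__(self, remote, ram_pages, nvme_pages):
        self.remote = remote        # dict standing in for pageserver/object storage
        self.ram = OrderedDict()    # hottest pages
        self.nvme = OrderedDict()   # warm pages
        self.ram_pages = ram_pages
        self.nvme_pages = nvme_pages

    def _put(self, tier, capacity, page_id, data):
        # Insert as most recently used; return the evicted (id, data) pair, if any.
        tier[page_id] = data
        tier.move_to_end(page_id)
        if len(tier) > capacity:
            return tier.popitem(last=False)
        return None

    def read(self, page_id):
        if page_id in self.ram:
            self.ram.move_to_end(page_id)
            return self.ram[page_id], "ram"
        if page_id in self.nvme:
            data, source = self.nvme.pop(page_id), "nvme"
        else:
            data, source = self.remote[page_id], "remote"
        # Promote the page to RAM; demote whatever RAM evicts down to NVMe.
        evicted = self._put(self.ram, self.ram_pages, page_id, data)
        if evicted is not None:
            self._put(self.nvme, self.nvme_pages, *evicted)
        return data, source


remote = {i: f"page-{i}" for i in range(10)}
cache = TieredPageCache(remote, ram_pages=1, nvme_pages=2)
sources = [cache.read(p)[1] for p in (1, 1, 2, 1)]
```

In this run the first read of page 1 goes to the remote store, the second is served from RAM, and after page 2 displaces it, the final read of page 1 is served from NVMe rather than remote storage.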
By combining a tiered architecture with a purpose-built tiered index, we achieve:
- Next-level scale without the speed penalty: By intelligently fetching only the necessary pages from object storage into a local cache, smaller Postgres instances can deliver the same recall and latency without the need for extensive compute resources.
- Next-level economics: The cold tail of vectors is stored in nearly-free object storage, while the hot working set resides on NVMe. You only pay for what you query, not what you store.
The economic benefits are starkly illustrated in the following table, reflecting cloud list prices per terabyte per month:
| Where the data lives | Cost |
|---|---|
| RAM | ~,000 / TB / month |
| Local NVMe (cache) | ~0 / TB / month |
| Object storage | ~ / TB / month |
Our indexing method enables Lakebase to retain only the active working set in RAM, while the majority of cold data is stored in object storage. This architecture results in a system that is two orders of magnitude more cost-effective while still delivering the high-performance search that applications demand.
Bringing lake-native search indexes to Postgres
In the development of Lakebase Search, we adhered to two essential principles: it had to be fully Postgres-native, utilizing standard pgvector and tsvector types along with existing ecosystem tools, and the indexing had to be specifically designed for tiered cloud object storage.
To fulfill these criteria, we are launching two new Postgres extensions in Beta today, both aimed at providing state-of-the-art search relevance without necessitating excessive RAM provisioning:
- lakebase_vector: 32x compression and 1B+ scale. We preserved standard pgvector data types and operators while altering the underlying index type. By clustering and compressing vectors using RaBitQ (Randomized Binary Quantization), we achieved a 32x reduction in index footprint while maintaining high recall. A 100-million-vector index that previously required 300GB of RAM can now fit into under 10GB, allowing a single index to scale to over 1 billion vectors, with the active working set cached on local NVMe and the cold tail stored in object storage.
- lakebase_text: True BM25 without the GIN memory bloat. Postgres utilizes GIN indexes for exact keyword matching, which must remain in RAM to sustain performance, leading to linear memory costs as dataset size increases. lakebase_text replaces GIN with an index optimized for sequential reads from cloud object storage, introducing native BM25 relevance ranking to Postgres without the associated RAM footprint.
Both extensions operate within the same engine, enabling hybrid search to run in a single SQL query. Vector similarity and keyword relevance are integrated through reciprocal rank fusion (RRF), allowing results to be joined and filtered against operational tables.
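Reciprocal rank fusion itself is simple enough to sketch in a few lines. The version below is the textbook formulation, where each document scores the sum of 1 / (k + rank) across the ranked lists; the constant k = 60 is the commonly used default from the RRF literature, not a value confirmed for Lakebase Search, and the chunk IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from a vector query and a keyword query.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF uses only ranks, it needs no score normalization between cosine distances and BM25 scores, which is why it is a common choice for hybrid search. Here `c1` wins because it ranks near the top of both lists.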
Postgres is ready for large-scale, serious search workloads
We benchmarked Lakebase Search on LAION-100M, which consists of 100 million 768-dimensional vectors, measuring top-10 retrieval on a single instance. The figures below were measured with a warm cache, a single connection, and no bloat, with recall computed against exact nearest neighbors:
| Recall@10 | P99 latency | QPS |
|---|---|---|
| 0.955 | 30 ms | 51 |
| 0.942 | 18 ms | 104 |
| 0.926 | 14 ms | 142 |
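For readers unfamiliar with the Recall@10 column, the metric can be computed as follows. This is a generic sketch of the standard definition, not the benchmark harness used for the table; the function name and the sample IDs are illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors that appear in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    found = sum(1 for i in approx_ids[:k] if i in exact_top)
    return found / len(exact_top)


# Exact top-10 from a brute-force scan versus an approximate index result
# that misses one true neighbor (ID 9) and returns ID 42 instead.
exact = list(range(10))
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]
r = recall_at_k(approx, exact)
```

A reported Recall@10 of 0.955 would then mean that, averaged over the query set, about 9.5 of the 10 true nearest neighbors are returned.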
Achieving this scale with traditional architectures typically necessitates a memory-bound structure. A standard pgvector HNSW index requires that the neighbor graph and its target heap pages remain in RAM for efficient traversal. For 100 million vectors:
- pgvector: Requires a 512 GB (64 CPU) instance, with an index build time of approximately 40 hours. Due to the reliance on un-localized random access for graph traversal, cold restarts result in significant disk-read latencies, causing the first query to take several minutes.
- lakebase_vector: Operates on a 192 GB (96 CU / 24 CPU) instance, with an index build time of 1.5 hours. Although traversal remains random access, the index layout clusters data so that random lookups are localized within a hot working set on NVMe cache, while the cold tail is stored in object storage. The instance can scale to zero when idle, with the first cold-cache query taking just 1.13 seconds.
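The memory figures above, and the 32x compression quoted earlier, follow from simple arithmetic on the vector payload. The sketch below assumes float32 source vectors and a one-bit-per-dimension code. The `sign_bits` function is a deliberately simplified binary quantizer for illustration only: RaBitQ additionally applies a random rotation and stores correction factors, which this sketch omits, and a real index also carries cluster and metadata overhead.

```python
def raw_float32_bytes(num_vectors, dims):
    # 4 bytes per dimension for float32.
    return num_vectors * dims * 4


def one_bit_code_bytes(num_vectors, dims):
    # 1 bit per dimension, packed 8 to a byte.
    return num_vectors * dims // 8


def sign_bits(vector):
    """Pack one sign bit per dimension into bytes (simplified binary code)."""
    bits = 0
    for x in vector:
        bits = (bits << 1) | (1 if x >= 0 else 0)
    return bits.to_bytes((len(vector) + 7) // 8, "big")


def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


n, d = 100_000_000, 768
raw_gb = raw_float32_bytes(n, d) / 1e9     # about 307 GB of raw vectors
code_gb = one_bit_code_bytes(n, d) / 1e9   # about 9.6 GB of 1-bit codes
ratio = raw_float32_bytes(n, d) // one_bit_code_bytes(n, d)

code = sign_bits([0.5, -1.0, 2.0, -0.1, 0.0, 3.0, -2.0, 1.0])
```

Going from 32 bits to 1 bit per dimension is exactly where a 32x factor comes from, and it is consistent with a roughly 300 GB payload shrinking to under 10 GB. Comparing compact codes with cheap bit operations such as `hamming` is what lets the hot working set stay small.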
This architecture fundamentally alters the approach to total cost of ownership. Legacy search systems impose a fixed baseline cost regardless of query volume, while Lakebase aligns costs with actual usage:
| Workload Type | Traditional Architecture (Memory-Bound) | Lakebase Search Architecture |
|---|---|---|
| Large Knowledge Bases (Mostly idle) | Fixed baseline costs to keep idle datasets resident in RAM. | Scales compute to zero; you only pay for object storage. |
| Agent Memory & Chat (Bursty) | Over-provisioned RAM and compute to handle traffic spikes. | Dynamically scales compute for spikes, then scales down to zero. |
| Search Bars (Sustained) | Massive instances sized to fit the entire dataset in RAM. | Smaller, more economical instances as the dataset bypasses RAM residency. |
Lakebase Search enables agent-first ergonomics
A single backend for memory and context: Agents should not need to stitch together a vector database for context and a transactional database for memory. With retrieval logic embedded directly in the database, the entire agent loop can run on one backend. Because Lakebase Search is built on Postgres and uses standard pgvector and tsvector types, it works natively with existing MCPs, standard drivers, and connectors. And because search sits next to operational data, a single SQL query can run a hybrid search, join the results against application tables, and apply secure tenant filtering.
Continuous search experimentation: Optimizing chunking strategies or hybrid weights often requires iterative testing. Instead of exporting data to external batch systems for reprocessing, Lakebase Search connects with the Lakehouse to establish a tight feedback loop. You can branch multi-terabyte datasets instantly at no cost, construct indexes out-of-band using parallel compute, and route agent feedback back to the Lakehouse for offline evaluation.
A dedicated retrieval engine per agent: Traditional architectures necessitate sharing a single search cluster among all agents. With Lakebase, idle indexes incur nearly zero storage costs, allowing for the provisioning of thousands of isolated corpora dedicated to specific agents, users, or sessions. This transforms search from a static snapshot into an operational read/write loop, ensuring that data written by an agent in one interaction is immediately available for retrieval in the next, complete with full transactional guarantees.
A single foundation for the agent loop
Lakebase eliminates the need to connect disparate vector stores, search clusters, and transactional databases. By consolidating the entire lifecycle within a single Postgres system, it delivers the scalability and low cost of tiered cloud object storage alongside the real-time read/write ergonomics essential for agentic workflows. Lakebase Search is now available in Beta on AWS and Azure. Try it and see what your agents can create.