Why Instacart Moved to Postgres & pgvector to Boost Semantic Search

Delivering fast and precise search is paramount for platforms like Instacart, which serves 14 million daily users navigating billions of products. The task extends beyond mere keyword matching: it requires semantic understanding to accurately decipher user intent, particularly for ambiguous queries such as “healthy food.”

The search system must not only identify relevant products beyond exact text matches but also reflect real-time changes in inventory, pricing, and ranking. This necessity subjects the search database to significant read and write workloads, ensuring that results remain current and accurate.

Previously, Instacart relied on Elasticsearch for search and on Facebook AI Similarity Search (FAISS) for semantic search. The company has since transitioned to a hybrid search architecture built on Postgres and pgvector, which markedly improved search performance. It detailed the migration in a blog post published last month.

As Instacart’s database required frequent updates based on inventory fluctuations, the denormalized data model employed in Elasticsearch necessitated continual partial writes across billions of items. “Over time, the indexing load and throughput caused the cluster to struggle so much that fixing erroneous data would take days to be corrected,” the company noted.

Moreover, Instacart aimed to integrate machine learning models into its search features, which further exacerbated the already high indexing load and costs. This situation adversely affected read performance, rendering the overall search capabilities unsustainable, according to Instacart.

‘Somewhat Unconventional, But Made Sense for Our Case’

In response, Instacart migrated its text retrieval stack to sharded Postgres instances, embracing a high degree of data normalization. “While this might seem somewhat unconventional, it made sense for our use case,” the company explained.

A normalized data model enabled a tenfold reduction in write workload compared to the denormalized structure previously utilized in Elasticsearch. Instacart indicated that this shift to Postgres resulted in substantial savings in both storage and indexing.
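As a rough illustration of that normalization (the tables, columns, and connection string below are hypothetical, not Instacart’s actual schema), volatile attributes such as availability and pricing can live in their own narrow tables, so a price change becomes a single-row update rather than a rewrite of a large denormalized document:

```python
# Minimal sketch of a normalized catalog layout; names are illustrative only.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS items (
    item_id  BIGINT PRIMARY KEY,
    name     TEXT NOT NULL,
    brand    TEXT
);

-- Volatile attributes live in their own tables, so frequent updates
-- (inventory, pricing) stay small and cheap.
CREATE TABLE IF NOT EXISTS item_availability (
    item_id  BIGINT REFERENCES items(item_id),
    store_id BIGINT,
    in_stock BOOLEAN NOT NULL,
    PRIMARY KEY (item_id, store_id)
);

CREATE TABLE IF NOT EXISTS item_pricing (
    item_id     BIGINT REFERENCES items(item_id),
    store_id    BIGINT,
    price_cents INTEGER NOT NULL,
    PRIMARY KEY (item_id, store_id)
);
"""

with psycopg2.connect("dbname=search") as conn, conn.cursor() as cur:
    cur.execute(DDL)
    # A price change is a single narrow-row update rather than a
    # partial rewrite of a large denormalized item document.
    cur.execute(
        "UPDATE item_pricing SET price_cents = %s WHERE item_id = %s AND store_id = %s",
        (499, 12345, 42),
    )
```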

Additionally, a significant advantage of using Postgres was the capability to store machine learning features and model coefficients in separate tables. This architecture allowed for varying update frequencies across datasets, which could be combined on-demand using SQL, thus providing the necessary flexibility for more advanced machine learning retrieval models.

Furthermore, by moving compute closer to storage with Postgres running on non-volatile memory express (NVMe) drives, Instacart achieved a twofold increase in search performance. Pushing logic down to the data layer eliminated multiple network round trips and data overfetching, effectively halving latency and simplifying the application.
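A minimal sketch of the two ideas above, assuming hypothetical item_features, model_coefficients, and item_availability tables: features and coefficients are stored separately, updated on their own schedules, and combined on demand in SQL, with scoring and ranking done inside Postgres so only the top results cross the network:

```python
# Illustrative only -- table names, columns, and weights are not Instacart's schema.
import psycopg2

TOP_K_SQL = """
SELECT i.item_id,
       i.name,
       -- Score computed inside the database from per-item features and
       -- per-model coefficients, so raw feature rows never leave Postgres.
       (f.popularity * c.w_popularity + f.conversion_rate * c.w_conversion) AS score
FROM items i
JOIN item_features f      ON f.item_id = i.item_id
JOIN model_coefficients c ON c.model_id = %(model_id)s
JOIN item_availability a  ON a.item_id = i.item_id AND a.store_id = %(store_id)s
WHERE a.in_stock
ORDER BY score DESC
LIMIT %(k)s;
"""

with psycopg2.connect("dbname=search") as conn, conn.cursor() as cur:
    cur.execute(TOP_K_SQL, {"model_id": 7, "store_id": 42, "k": 20})
    for item_id, name, score in cur.fetchall():
        print(item_id, name, score)
```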

FAISS to pgvector Migration ‘Was a Great Success’

Initially, Instacart implemented semantic search through a standalone FAISS service for Approximate Nearest Neighbour (ANN) search, while full-text search remained on Postgres. This hybrid configuration combined results at the application layer, enhancing search quality.
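A simplified sketch of that application-layer blending, with illustrative merge logic (Instacart’s actual blending strategy isn’t described in the post): candidate item IDs from the FAISS ANN service and from Postgres full-text search are deduplicated and interleaved into one list:

```python
# Generic result-blending sketch; the interleaving strategy is an assumption.
from typing import List

def merge_results(ann_ids: List[int], fulltext_ids: List[int], k: int) -> List[int]:
    """Interleave ANN and full-text candidates, dropping duplicates."""
    merged, seen = [], set()
    for pair in zip(ann_ids, fulltext_ids):
        for item_id in pair:
            if item_id not in seen:
                seen.add(item_id)
                merged.append(item_id)
    # Append leftovers when one list is longer than the other.
    for item_id in ann_ids + fulltext_ids:
        if item_id not in seen:
            seen.add(item_id)
            merged.append(item_id)
    return merged[:k]

print(merge_results([3, 1, 7, 9], [1, 2, 3, 8], k=5))  # -> [3, 1, 2, 7, 9]
```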

However, the limitations of FAISS regarding attribute filtering, overfetching, and the complexity of maintaining two separate, potentially inconsistent systems prompted Instacart to pursue a unified solution.

The company opted for pgvector, a Postgres extension, to merge both retrieval mechanisms. This move eliminated data duplication, reduced operational complexity, and enabled finer-grained control over result sets, all while leveraging Postgres’ existing capabilities for real-time filtering, ultimately improving search performance and user satisfaction.
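A hedged sketch of what such a unified query can look like with pgvector, assuming the extension is installed and a hypothetical item_embeddings table with an embedding column; the schema and the 768-dimensional placeholder vector are illustrative, not Instacart’s production setup:

```python
# Vector similarity and real-time attribute filtering in a single Postgres query.
import psycopg2

SEMANTIC_SQL = """
SELECT i.item_id, i.name
FROM items i
JOIN item_embeddings e   ON e.item_id = i.item_id
JOIN item_availability a ON a.item_id = i.item_id AND a.store_id = %(store_id)s
WHERE a.in_stock                                 -- real-time filtering in the same query
ORDER BY e.embedding <-> %(query_vec)s::vector   -- pgvector nearest-neighbor distance
LIMIT %(k)s;
"""

# Placeholder query embedding; in practice this comes from an embedding model.
query_vec = "[" + ",".join("0.0" for _ in range(768)) + "]"

with psycopg2.connect("dbname=search") as conn, conn.cursor() as cur:
    cur.execute(SEMANTIC_SQL, {"store_id": 42, "query_vec": query_vec, "k": 20})
    for item_id, name in cur.fetchall():
        print(item_id, name)
```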

“Based on the offline performance of pgvector, we launched a production A/B test to a section of users. We saw a 6% drop in the number of searches with zero results due to better recall,” Instacart reported. “This led to a substantial increase in incremental revenue for the platform as users encountered fewer dead-end searches and were able to locate the items they sought more effectively.”

A Modern Search Infra is the Need of the Hour

Beyond Instacart, numerous companies globally have embraced modern infrastructures for search. Last year, Shopify, another e-commerce leader, outlined in a blog post how it enhanced consumer search intent through real-time machine learning capabilities.

Shopify improved its storefront search with AI-powered semantic features, moving beyond simple keyword matching to gain a deeper understanding of consumer intent. This advancement was achieved by developing foundational machine learning assets, particularly real-time text and image embeddings.

Shopify’s real-time embedding pipeline processes 2,500 embeddings per second on Google Cloud Dataflow, yet scaling GPU-accelerated streaming inference presented significant optimization challenges.

Dataflow initiated 16 processes with 12 threads each, loading 192 images into memory simultaneously, which led to frequent crashes. Rather than incurring a 14% increase in costs for high-memory instances, Shopify reduced the thread count to four. This adjustment limited concurrent images to 64, effectively cutting memory usage by 2.6 times without compromising performance, given that the GPU was already the bottleneck.
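The arithmetic behind those figures, written out as a small sanity check (the interpretation of the 2.6x number in the final comment is an inference, not a detail from Shopify’s post):

```python
# Concurrent images in flight before and after the thread-count change.
processes = 16

images_before = processes * 12   # 12 threads per process -> 192 images in memory
images_after  = processes * 4    #  4 threads per process ->  64 images in memory

print(images_before, images_after)  # 192 64
# Concurrent images fall by 3x; the observed memory reduction was ~2.6x,
# plausibly because fixed per-process overhead (such as each process's model
# copy) does not shrink with the thread count.
```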

Each process loaded its own copy of the machine learning model, consuming GPU memory but ensuring rapid inference through parallelism. Conversely, sharing a single model across processes saved memory but significantly decreased throughput.

Unpredictable traffic surges meant images arrived one at a time rather than in efficient batches. While enforcing batching improved GPU utilization, it introduced excessive latency.
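One common way to manage that trade-off in streaming inference is a micro-batching loop with a maximum wait: collect items until the batch is full or a small deadline passes, then flush to the GPU. The sketch below is a generic pattern, not Shopify’s actual Dataflow code, and the max_batch and max_wait_s parameters are illustrative:

```python
# Generic micro-batching sketch for streaming inference.
import queue
import time

def collect_batch(q: "queue.Queue", max_batch: int = 32, max_wait_s: float = 0.05):
    """Block until max_batch items arrive or max_wait_s elapses, then return the batch."""
    batch = [q.get()]                      # wait for at least one item
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

Raising max_wait_s or max_batch improves GPU utilization at the cost of added latency, which is exactly the tension described above.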

Shopify navigated these trade-offs by keeping multiple model copies, since its models were lightweight enough to allow it, and by accepting inefficient batching, since parallel processing kept the GPUs busy enough to meet its performance targets.

“When a merchant edits their products or uploads a new image, they want these updates to be available on their website instantly. Additionally, the ultimate objective is to boost sales for our merchants and offer pleasant interactive experiences for their consumers,” Shopify stated.

“Our data suggests that up-to-date embeddings achieved through a streaming pipeline allow us to optimize for this, despite the additional complexity it incurs when compared with a batch solution,” it added.
