Instacart has replaced Elasticsearch with PostgreSQL in its search infrastructure, bringing keyword and embedding-based retrieval into a single system. By consolidating catalog and search data in PostgreSQL, the company aims to simplify operations, reduce synchronization problems, and improve the precision and recall of search results.
Enhancing Search Result Retrieval
A key part of the redesign is result retrieval. Keyword search excels at matching exact product attributes: a query such as “pesto pasta sauce 8oz” benefits from precise lexical matching. Broader, intent-driven queries like “healthy foods” are better served by semantic retrieval, which captures relationships between terms and concepts. By combining the two methods in PostgreSQL, Instacart balances precision (returning only relevant results) against recall (capturing as many relevant items as possible), so customers see both the specific products they are looking for and useful options to explore.
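To make the idea of combining the two retrieval modes concrete, here is a minimal Python sketch that merges a keyword ranking with an embedding ranking using reciprocal rank fusion (RRF). The article does not say how Instacart merges the two result sets, so RRF, the toy catalog, the hand-written three-dimensional vectors, and every function name below are assumptions made purely for illustration.

```python
import math

# Hypothetical toy catalog: product name -> hand-made embedding
# (illustrative numbers, not the output of a real embedding model).
CATALOG = {
    "pesto pasta sauce 8oz": [0.1, 0.9, 0.2],
    "tomato pasta sauce 24oz": [0.2, 0.8, 0.1],
    "organic kale salad": [0.9, 0.1, 0.3],
    "mixed berry smoothie": [0.8, 0.2, 0.4],
}


def keyword_rank(query, catalog):
    """Lexical retrieval: rank products by the number of shared tokens."""
    tokens = set(query.lower().split())
    scored = [(len(tokens & set(name.split())), name) for name in catalog]
    hits = [(score, name) for score, name in scored if score > 0]
    return [name for _, name in sorted(hits, key=lambda p: (-p[0], p[1]))]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_rank(query_vec, catalog, k=3):
    """Embedding retrieval: the k products closest to the query vector."""
    ranked = sorted(catalog, key=lambda name: -cosine(query_vec, catalog[name]))
    return ranked[:k]


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    scores = {}
    for ranking in rankings:
        for position, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda name: (-scores[name], name))


# "healthy foods" matches no product name lexically, so only the
# embedding ranking contributes to the fused result.
lexical = keyword_rank("healthy foods", CATALOG)
semantic = semantic_rank([0.9, 0.1, 0.3], CATALOG, k=2)
fused = rrf([lexical, semantic])
```

In this toy setup the exact query “pesto pasta sauce 8oz” is carried by the lexical ranking, while “healthy foods” returns nothing lexically and depends entirely on the embedding ranking, which is the precision and recall trade-off the paragraph describes.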
The engineering team at Instacart reports that the migration has increased development velocity by eliminating the need to reconcile data across separate systems. The hybrid infrastructure also offers more flexibility in handling dynamic inventory and complex user preferences while the platform processes millions of search requests daily. Updates to prices, availability, and discounts are now reflected in search results in real time, supporting a more personalized and efficient shopping experience for users.
Ankit Mittal, an engineer at Instacart, noted:
“A normalized data model allowed us to achieve a 10x reduction in write workload compared to the denormalized data model we used in Elasticsearch. This resulted in nearly 80% savings on storage and indexing costs, reduced dead-end searches, and improved the overall customer experience.”
Previously, Elasticsearch handled full-text queries while transactional data resided in PostgreSQL. This dual-database approach led to synchronization issues and higher operational costs. To add semantic search, the team initially deployed FAISS before moving to a hybrid model built on the pgvector extension for PostgreSQL. This approach lets both lexical and embedding-based retrieval run within a single system, reducing data duplication and complexity.
Previous retrieval architecture with FAISS and Postgres (Source: Instacart Engineering Blog)
The new architecture uses sharded PostgreSQL instances with a normalized data model to scale horizontally. Each shard holds catalog and search indexes, and a service layer routes each query to the appropriate shard. Instacart engineers report that PostgreSQL’s GIN indexes and a modified ts_rank function deliver high-performance text matching. The relational model allows machine learning features and model coefficients to be stored in separate tables. Normalization reduced the write workload tenfold compared to Elasticsearch, significantly lowering storage and indexing costs, while the system accommodates hundreds of gigabytes of machine learning feature data for more sophisticated retrieval models.
Hybrid retrieval architecture with pgvector and Postgres (Source: Instacart Engineering Blog)
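As a rough illustration of how a service layer might route a query to a shard, the Python sketch below maps a routing key to one of several shards with a stable hash. The article does not describe Instacart's shard key or routing logic, so the retailer-ID key, the hash-modulo scheme, and the connection strings are all hypothetical.

```python
import hashlib

# Hypothetical shard connection strings; a real service would load
# these from configuration rather than hard-coding them.
SHARDS = [
    "postgresql://search-shard-0/catalog",
    "postgresql://search-shard-1/catalog",
    "postgresql://search-shard-2/catalog",
    "postgresql://search-shard-3/catalog",
]


def shard_for(retailer_id: str, shards=SHARDS) -> str:
    """Map a routing key to one shard.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings and would route the same
    key to different shards after a restart.
    """
    digest = hashlib.sha256(retailer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Every query for the same (assumed) key lands on the same shard, so the
# catalog rows and the search indexes it needs are co-located.
target = shard_for("retailer-42")
```

Plain hash-modulo routing is the simplest option, but it remaps most keys whenever the shard count changes; a lookup table or consistent hashing would be a likely alternative where resharding matters.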
PostgreSQL extensions play a central role in the redesign. Extensions such as pg_trgm, for trigram-based text matching, and pgvector, for embedding-based search, let the database handle both traditional keyword and semantic search. A routing layer directs each query to the shards that hold the necessary indexes, so results are retrieved quickly without the complications of cross-system synchronization.
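For readers unfamiliar with trigram matching, the following Python sketch approximates the kind of similarity score pg_trgm computes: each word is lowercased, padded with two leading spaces and one trailing space, split into three-character windows, and two strings are compared by the share of trigrams they have in common. It is a simplified stand-in written for illustration, not the extension's implementation, and the function names are hypothetical; details such as locale handling are ignored.

```python
import re


def trigrams(text: str) -> set:
    """Return the set of padded three-character windows for each word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# A misspelled query still shares most trigrams with the intended word,
# which is what makes trigram matching tolerant of typos.
typo_score = similarity("pesto", "pestto")
unrelated_score = similarity("pesto", "kale")
```

Here the misspelling “pestto” shares five of eight distinct trigrams with “pesto”, while “kale” shares none, so a similarity threshold can separate likely typos from unrelated terms.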
About the Author
Leela Kumili