Vector embeddings are numerical representations of unstructured data that power applications such as semantic search and recommendation systems. Keeping embeddings current is essential for AI functionality, especially in Retrieval-Augmented Generation (RAG) solutions. Amazon Bedrock offers managed options for embedding generation, but organizations with specific requirements may instead build a custom vector database on PostgreSQL with the pgvector extension.
A vector database setup requires a pipeline that generates and updates embeddings in response to data changes. The workflow identifies changed data, sends the content to an embedding model, receives the resulting embeddings, and stores them alongside the original data. Amazon Titan is highlighted for its performance in generating embeddings.
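The core of that workflow can be sketched in Python with boto3. Everything here is an illustrative assumption rather than part of the original text: the Titan model ID, the request/response shapes, and the helper names are placeholders to adapt to your deployment.

```python
import json

# Assumed Titan embedding model ID -- substitute the model your account uses.
TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def build_titan_request(text: str) -> str:
    """Serialize the JSON request body a Titan text-embedding model expects."""
    return json.dumps({"inputText": text})


def generate_embedding(bedrock_runtime, text: str) -> list:
    """Send changed content to the embedding model and return the vector.

    `bedrock_runtime` is assumed to be a boto3 client for the
    "bedrock-runtime" service, e.g. boto3.client("bedrock-runtime").
    """
    response = bedrock_runtime.invoke_model(
        modelId=TITAN_MODEL_ID,
        body=build_titan_request(text),
    )
    payload = json.loads(response["body"].read())
    # The returned vector is what gets stored alongside the original row.
    return payload["embedding"]
```

The returned list would then be written to a pgvector column next to the source text, which is the storage step the automation approaches below differ on.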
Five implementation approaches for automating the embedding workflow are discussed:
1. **Database triggers with the aws_ml extension (synchronous)**: This method uses PostgreSQL triggers to generate embeddings immediately upon data changes, ensuring real-time consistency but potentially increasing transaction duration.
2. **Database triggers with the aws_lambda extension (synchronous)**: This approach moves embedding generation out of the database by invoking a Lambda function synchronously, which allows more complex processing logic but still blocks the transaction until the function returns.
3. **Database triggers with the aws_lambda extension (asynchronous)**: Here, triggers invoke Lambda functions asynchronously, allowing database operations to proceed without waiting for embedding generation, which enhances performance but introduces eventual consistency.
4. **Amazon SQS queue with Lambda batch processing (asynchronous)**: This method sends messages to an SQS queue for batch processing by a Lambda function, optimizing for scalability and resilience but increasing latency between data insertion and embedding availability.
5. **Periodic updates scheduled with the pg_cron extension (asynchronous)**: This approach schedules jobs to process embeddings in batches, improving throughput and cost-efficiency while increasing the delay between data changes and embedding updates.
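As a sketch of the asynchronous trigger variant (approach 3), a PL/pgSQL trigger can call `aws_lambda.invoke` with the `'Event'` invocation type so the transaction does not wait for the function. The DDL is held in a Python string for illustration; the table schema, function ARN, and account details are placeholders.

```python
# Hypothetical schema: documents(id, content, embedding).
# The function ARN is a placeholder. invocation_type := 'Event' requests
# asynchronous invocation, so the INSERT/UPDATE commits without waiting
# for embedding generation to finish (eventual consistency).
EMBEDDING_TRIGGER_DDL = """
CREATE OR REPLACE FUNCTION queue_embedding_refresh() RETURNS trigger AS $$
BEGIN
    PERFORM aws_lambda.invoke(
        aws_commons.create_lambda_function_arn(
            'arn:aws:lambda:us-east-1:123456789012:function:generate-embedding'),
        json_build_object('id', NEW.id, 'content', NEW.content),
        invocation_type := 'Event');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_embedding_refresh
AFTER INSERT OR UPDATE OF content ON documents
FOR EACH ROW EXECUTE FUNCTION queue_embedding_refresh();
"""
```

Executing this DDL once (e.g. via psql or psycopg2) installs the trigger; from then on every insert or content update fires the Lambda without lengthening the transaction.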
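For the SQS-based variant (approach 4), a minimal Lambda batch handler might look like the following. The queue message shape and `documents(id, content, embedding)` schema are assumptions, and `generate_embedding` / `get_connection` are hypothetical helpers (a Bedrock Titan wrapper and a psycopg2 connection factory) rather than anything defined in the original text.

```python
import json


def parse_sqs_batch(event: dict) -> list:
    """Extract (row_id, content) pairs from an SQS-triggered Lambda event."""
    return [
        (body["id"], body["content"])
        for body in (json.loads(r["body"]) for r in event.get("Records", []))
    ]


def handler(event, context):
    """Generate embeddings for every queued row, then write them back.

    `generate_embedding` and `get_connection` are assumed helpers: the
    former wraps a Bedrock Titan embedding call, the latter returns a
    psycopg2 connection to the pgvector-enabled database.
    """
    rows = parse_sqs_batch(event)
    conn = get_connection()
    with conn, conn.cursor() as cur:
        for row_id, content in rows:
            vector = generate_embedding(content)
            # pgvector accepts a JSON-style "[1.0, 2.0, ...]" literal cast
            # to the vector type.
            cur.execute(
                "UPDATE documents SET embedding = %s::vector WHERE id = %s",
                (json.dumps(vector), row_id),
            )
```

Because SQS delivers records in batches, one invocation can amortize model and database round trips across many rows, which is the scalability benefit the approach trades against added latency.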
Each approach trades off transaction blocking, error handling, latency, and scalability differently. The right choice depends on the use case: whether it requires real-time updates or can tolerate the delay of batch processing in exchange for efficiency.