Vector embeddings are numerical representations of unstructured data that power applications such as semantic search and recommendation systems. Keeping embeddings current is essential for AI functionality, especially in Retrieval-Augmented Generation (RAG) solutions. Amazon Bedrock offers managed options for embedding generation, but organizations with specific requirements may instead build a custom vector database on PostgreSQL with the pgvector extension.
A vector database setup requires a pipeline that generates and updates embeddings in response to data changes. The workflow identifies changed data, sends the content to an embedding model, receives the resulting embeddings, and stores them alongside the original data. Amazon Titan is highlighted for its performance in generating embeddings.
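The core of that workflow can be sketched in Python with boto3. Everything here is an illustrative assumption rather than part of the original text: the Titan model ID, the request/response shapes, and the helper names are placeholders to adapt to your deployment.

```python
import json

# Assumed Titan embedding model ID -- substitute the model your account uses.
TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def build_titan_request(text: str) -> str:
    """Serialize the JSON request body a Titan text-embedding model expects."""
    return json.dumps({"inputText": text})


def generate_embedding(bedrock_runtime, text: str) -> list:
    """Send changed content to the embedding model and return the vector.

    `bedrock_runtime` is assumed to be a boto3 client for the
    "bedrock-runtime" service, e.g. boto3.client("bedrock-runtime").
    """
    response = bedrock_runtime.invoke_model(
        modelId=TITAN_MODEL_ID,
        body=build_titan_request(text),
    )
    payload = json.loads(response["body"].read())
    # The returned vector is what gets stored alongside the original row.
    return payload["embedding"]
```

The returned list would then be written to a pgvector column next to the source text, which is the storage step the automation approaches below differ on.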
Five implementation approaches for automating the embedding workflow are discussed:
1. **Database triggers with the aws_ml extension (synchronous)**: This method uses PostgreSQL triggers to generate embeddings immediately upon data changes, ensuring real-time consistency but potentially increasing transaction duration.
2. **Database triggers with the aws_lambda extension (synchronous)**: This approach moves embedding generation out of the database by invoking a Lambda function synchronously, which allows more complex processing logic but still blocks the transaction until the function returns.
3. **Database triggers with the aws_lambda extension (asynchronous)**: Here, triggers invoke Lambda functions asynchronously, allowing database operations to proceed without waiting for embedding generation, which enhances performance but introduces eventual consistency.
4. **Amazon SQS queue with Lambda batch processing (asynchronous)**: This method sends messages to an SQS queue for batch processing by a Lambda function, optimizing for scalability and resilience but increasing latency between data insertion and embedding availability.
5. **Periodic updates scheduled with the pg_cron extension (asynchronous)**: This approach schedules jobs to process embeddings in batches, improving throughput and cost-efficiency while increasing the delay between data changes and embedding updates.
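As a sketch of the asynchronous trigger variant (approach 3), a PL/pgSQL trigger can call `aws_lambda.invoke` with the `'Event'` invocation type so the transaction does not wait for the function. The DDL is held in a Python string for illustration; the table schema, function ARN, and account details are placeholders.

```python
# Hypothetical schema: documents(id, content, embedding).
# The function ARN is a placeholder. invocation_type := 'Event' requests
# asynchronous invocation, so the INSERT/UPDATE commits without waiting
# for embedding generation to finish (eventual consistency).
EMBEDDING_TRIGGER_DDL = """
CREATE OR REPLACE FUNCTION queue_embedding_refresh() RETURNS trigger AS $$
BEGIN
    PERFORM aws_lambda.invoke(
        aws_commons.create_lambda_function_arn(
            'arn:aws:lambda:us-east-1:123456789012:function:generate-embedding'),
        json_build_object('id', NEW.id, 'content', NEW.content),
        invocation_type := 'Event');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_embedding_refresh
AFTER INSERT OR UPDATE OF content ON documents
FOR EACH ROW EXECUTE FUNCTION queue_embedding_refresh();
"""
```

Executing this DDL once (e.g. via psql or psycopg2) installs the trigger; from then on every insert or content update fires the Lambda without lengthening the transaction.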
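For the SQS-based variant (approach 4), a minimal Lambda batch handler might look like the following. The queue message shape and `documents(id, content, embedding)` schema are assumptions, and `generate_embedding` / `get_connection` are hypothetical helpers (a Bedrock Titan wrapper and a psycopg2 connection factory) rather than anything defined in the original text.

```python
import json


def parse_sqs_batch(event: dict) -> list:
    """Extract (row_id, content) pairs from an SQS-triggered Lambda event."""
    return [
        (body["id"], body["content"])
        for body in (json.loads(r["body"]) for r in event.get("Records", []))
    ]


def handler(event, context):
    """Generate embeddings for every queued row, then write them back.

    `generate_embedding` and `get_connection` are assumed helpers: the
    former wraps a Bedrock Titan embedding call, the latter returns a
    psycopg2 connection to the pgvector-enabled database.
    """
    rows = parse_sqs_batch(event)
    conn = get_connection()
    with conn, conn.cursor() as cur:
        for row_id, content in rows:
            vector = generate_embedding(content)
            # pgvector accepts a JSON-style "[1.0, 2.0, ...]" literal cast
            # to the vector type.
            cur.execute(
                "UPDATE documents SET embedding = %s::vector WHERE id = %s",
                (json.dumps(vector), row_id),
            )
```

Because SQS delivers records in batches, one invocation can amortize model and database round trips across many rows, which is the scalability benefit the approach trades against added latency.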
Each approach trades off transaction blocking, error handling, latency, and scalability differently. The right choice depends on the use case: whether it requires real-time updates or can tolerate the delay of batch processing in exchange for efficiency.