Amazon Bedrock Knowledge Bases offers a fully managed Retrieval Augmented Generation (RAG) feature that connects large language models (LLMs) to internal data sources, improving the relevance and accuracy of responses by grounding them in contextual information from private datasets. At AWS re:Invent 2024, AWS announced that the feature supports natural language querying of structured data in Amazon Redshift and Amazon SageMaker Lakehouse, so generative AI applications can draw on both structured and unstructured data sources. The service translates a user's natural language question into a SQL query, so users can retrieve data without knowing SQL syntax.
Amazon Bedrock Knowledge Bases currently supports structured data retrieval only from Amazon Redshift and SageMaker Lakehouse. Aurora PostgreSQL-Compatible is not supported directly, but users can make their Aurora data accessible through a zero-ETL integration between Aurora PostgreSQL-Compatible and Amazon Redshift. The integration replicates Aurora PostgreSQL tables to Amazon Redshift in near real time, so there is no separate ETL pipeline to build or maintain.
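As a rough illustration of how such an integration might be requested programmatically, the sketch below builds the arguments for the RDS CreateIntegration API and passes them to boto3. The function names, ARNs, integration name, and filter string are placeholders introduced here, not values from the original walkthrough, and the data filter syntax in particular is an assumption to check against the current Aurora zero-ETL documentation.

```python
def build_integration_request(source_cluster_arn, target_namespace_arn,
                              integration_name, data_filter=None):
    """Build keyword arguments for the RDS CreateIntegration call.

    All values are caller-supplied placeholders: the source is the Aurora
    cluster ARN and the target is the Redshift namespace ARN.
    """
    request = {
        "SourceArn": source_cluster_arn,
        "TargetArn": target_namespace_arn,
        "IntegrationName": integration_name,
    }
    if data_filter:
        # Assumed filter format, for example "include: appdb.*.*".
        request["DataFilter"] = data_filter
    return request


def create_zero_etl_integration(request, region_name):
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    rds = boto3.client("rds", region_name=region_name)
    return rds.create_integration(**request)
```

Keeping the request builder separate from the AWS call makes it possible to inspect or log the arguments before anything is created in the account.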
To enable natural language querying of structured application data stored in Aurora, organizations first set up an Aurora PostgreSQL database, create a schema with three related tables (products, customers, and orders), and populate the tables with sample data that preserves referential integrity. They then establish the zero-ETL integration with Amazon Redshift, which involves creating a Redshift Serverless workgroup and mapping the database for synchronization.
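The shape of such a schema can be sketched locally. The example below uses Python's built-in sqlite3 module as a stand-in for Aurora PostgreSQL, purely so it runs without any AWS resources. The column names and sample rows are hypothetical; only the three table names come from the text above.

```python
import sqlite3

# Hypothetical columns; only the table names come from the walkthrough.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0)
);
"""


def build_sample_db():
    """Create an in-memory database with the three related tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ana"), (2, "Ben")])
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                     [(10, "Keyboard", 49.0), (11, "Mouse", 19.0)])
    # Every order refers to an existing customer and product.
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     [(100, 1, 10, 1), (101, 1, 11, 2), (102, 2, 11, 1)])
    conn.commit()
    return conn


def rejects_orphan_order(conn):
    """Return True if an order for a nonexistent customer is refused."""
    try:
        conn.execute("INSERT INTO orders VALUES (999, 42, 10, 1)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling build_sample_db() and then rejects_orphan_order(conn) shows the foreign keys doing their job: the order for the unknown customer 42 is refused. In PostgreSQL the same REFERENCES clauses apply, though the column types would typically differ.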
Once the zero-ETL integration is verified, organizations can create an Amazon Bedrock knowledge base for natural language querying. This requires granting the AWS Identity and Access Management (IAM) role used by Amazon Bedrock Knowledge Bases the permissions it needs on the Redshift data, and then syncing the knowledge base with Amazon Redshift.
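One way the permission step might look on the Redshift side is sketched below. It relies on the Redshift convention of mapping an IAM role to a database user named IAMR:<role name>. The helper name, role name, and schema are hypothetical, and the exact grants required for a database created from a zero-ETL integration may differ from this minimal pattern.

```python
import re

# Conservative allow-lists so the values can be embedded in SQL text.
_ROLE_NAME = re.compile(r"^[A-Za-z0-9_+=,.@-]+$")
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def redshift_grant_statements(role_name, schema="public"):
    """Return SQL that lets the given IAM role read one schema.

    Assumption: the knowledge base role connects to Redshift as the
    database user "IAMR:<role_name>".
    """
    if not _ROLE_NAME.match(role_name):
        raise ValueError(f"unexpected characters in role name: {role_name!r}")
    if not _IDENTIFIER.match(schema):
        raise ValueError(f"unexpected characters in schema: {schema!r}")
    user = f'"IAMR:{role_name}"'
    return [
        f"CREATE USER {user} WITH PASSWORD DISABLE;",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    ]
```

The statements would be run by a Redshift administrator, for example in the query editor. Granting only SELECT keeps the knowledge base read-only.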
After the knowledge base is set up, users can ask questions in natural language; each question is translated into SQL, and the query results are used to generate a human-readable response. Example questions include how many unique customers there are and which customers have purchased the most products. Finally, clean up the resources after use to avoid ongoing charges.
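A question like these could also be sent from code. The sketch below uses the RetrieveAndGenerate operation of the boto3 bedrock-agent-runtime client. The helper names are introduced here for illustration, and the knowledge base ID, model ARN, and Region are placeholders to be replaced with values from your own account.

```python
def build_rag_request(question, knowledge_base_id, model_arn):
    """Build the arguments for a RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,  # placeholder ID
                "modelArn": model_arn,                 # placeholder ARN
            },
        },
    }


def ask_knowledge_base(question, knowledge_base_id, model_arn, region_name):
    """Send the question to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("bedrock-agent-runtime", region_name=region_name)
    response = client.retrieve_and_generate(
        **build_rag_request(question, knowledge_base_id, model_arn)
    )
    # The generated natural language answer is returned under output.text.
    return response["output"]["text"]
```

For example, ask_knowledge_base("How many unique customers do we have?", ...) would return the generated answer as a string, assuming the knowledge base has been synced and the role has the grants it needs.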