Generative artificial intelligence (AI) applications are increasingly being integrated with relational databases, leading to innovative solutions across various sectors. This synergy enables organizations to leverage extensive structured data stored in relational databases for training and refining AI models. The result is a powerful toolset for generating insights, predicting trends, and enhancing database management tasks.
In this discussion, we explore a method for integrating generative AI applications with relational databases such as Amazon Aurora PostgreSQL-Compatible Edition. This integration utilizes the RDS Data API for simplified database interactions, Amazon Bedrock for accessing AI models, Amazon Bedrock Agents for automating tasks, and Amazon Bedrock Knowledge Bases for retrieving contextual information. Currently, Data API support is exclusive to Aurora databases; however, users can customize the integration for Amazon Relational Database Service (Amazon RDS) using traditional database connectivity methods.
Solution overview
This solution merges the AI capabilities of Amazon Bedrock Agents with the robust functionality of Aurora PostgreSQL via the Data API. Amazon Bedrock Agents, powered by large language models (LLMs), interpret natural language queries and generate corresponding SQL statements through action groups executed by AWS Lambda functions and schema artifacts stored in Amazon Simple Storage Service (Amazon S3). These queries are then executed against Aurora PostgreSQL using the Data API, which provides a serverless, connection-free approach to database interactions. For example, a user request such as “show sales data for the last quarter” is transformed into SQL, executed via the Data API, and the results are presented in an accessible format. The architecture of this solution is illustrated in the accompanying diagram.
The detailed steps in this architecture are as follows:
- The generative AI application invokes the Amazon Bedrock agent with natural language input to orchestrate integration with the backend relational database.
- The agent calls the foundation model (FM) on Amazon Bedrock to preprocess the prompt and determine the necessary actions.
- The agent selects the generate-query action group.
- The agent invokes the /generate API implemented by the Lambda function.
- The Lambda function utilizes schema artifacts from the Amazon S3 bucket to augment the prompt.
- The Lambda function invokes an LLM on Amazon Bedrock to generate the SQL query and returns it to the agent.
- The agent then selects the execute-query action group.
- The agent invokes the /execute API implemented by the Lambda function, passing the generated SQL query.
- The Lambda function uses the Data API with a read-only role to execute the SQL query against the Aurora PostgreSQL database.
- The agent returns the formatted query results to the application.
While the solution technically supports write operations, allowing AI-generated queries to modify the database poses significant risks to data integrity and security. Therefore, production implementations should restrict access to read-only operations through appropriate IAM policies and database role permissions. From a security and data integrity standpoint, it is advisable to implement this solution exclusively for read-only workloads such as analytics, reporting, and data exploration.
Security guardrails
The solution incorporates multiple layers of security controls to ensure safe and controlled access to the database, maintaining data integrity while enabling natural language query capabilities:
- Agent-level instructions: The Bedrock agents are explicitly configured for read-only operations. Embedded instructions prevent the generation of queries that could modify the database (INSERT, UPDATE, DELETE).
- Action group validation: The generate-query function includes validation of user input to thwart injection attacks and unauthorized operations. The execute-query function validates generated SQL queries against an allowlist of operations and syntax patterns, ensuring query safety before execution.
- Read-only database access: Database interactions are conducted exclusively using a read-only role (configured via READONLY_SECRET_ARN), establishing a critical security boundary that prevents potential write operations. A sketch of how such a role can be created follows this list.
- Bedrock guardrails: Additional security is enforced through Amazon Bedrock Guardrails, which filter specific words or phrases like INSERT, UPDATE, DELETE from user prompts to prevent harmful or unauthorized requests before they reach the query generation stage.
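To make the read-only boundary concrete, the following is a minimal sketch, not the stack's actual code, of how a read-only PostgreSQL role could be created through the Data API using the administrative credentials. The role name, ARNs, and database name are placeholders:

import boto3

# Placeholder values; substitute the cluster ARN, admin secret ARN, and database
# name from your own deployment
CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:aurora-cluster"
ADMIN_SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:admin-secret"
DATABASE = "postgres"

rds_data = boto3.client("rds-data")

# Create a login role that is limited to SELECT on the public schema
statements = [
    "CREATE ROLE readonly_user WITH LOGIN PASSWORD 'change-me'",
    "GRANT USAGE ON SCHEMA public TO readonly_user",
    "GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_user",
    "ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readonly_user",
]
for sql in statements:
    rds_data.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=ADMIN_SECRET_ARN,
        database=DATABASE,
        sql=sql,
    )

Storing the read-only role's credentials in their own Secrets Manager secret (referenced by READONLY_SECRET_ARN) is what keeps query execution from ever acquiring write permissions.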
Prerequisites
To implement the steps outlined in this post, you need an AWS account with access to Amazon Bedrock and Amazon Aurora, along with a development environment that has Python and the AWS CDK installed.
Clone the sample Python project from the AWS Samples repository and follow the development environment setup instructions in the readme:
git clone https://github.com/aws-samples/sample-to-connect-bedrock-agent-with-aurora
cd sample-to-connect-bedrock-agent-with-aurora
Setting up the database environment
Begin by deploying an Aurora PostgreSQL cluster using the AWS CDK. To provision the database infrastructure, execute the command:
cdk deploy RDSAuroraStack
The RDSAuroraStack is an AWS CDK construct that provisions an Aurora PostgreSQL Serverless v2 database within a secure VPC environment. It establishes a dedicated VPC with public and private subnets, configures security groups, and manages database credentials through AWS Secrets Manager. The stack also implements a custom Lambda-based solution to create a read-only database user with appropriate permissions, making it suitable for applications requiring segregated database access levels, such as connecting Amazon Bedrock agents to Aurora PostgreSQL databases.
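For reference, the following is a minimal AWS CDK (Python) sketch of the kind of construct the stack contains: an Aurora PostgreSQL Serverless v2 cluster in a dedicated VPC with the Data API enabled. It is a simplified stand-in for the actual RDSAuroraStack; the construct IDs, capacity settings, and engine version are assumptions:

from aws_cdk import Stack, aws_ec2 as ec2, aws_rds as rds
from constructs import Construct

class MinimalAuroraStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Dedicated VPC with public and private subnets (CDK defaults)
        vpc = ec2.Vpc(self, "AuroraVpc", max_azs=2)

        # Aurora PostgreSQL Serverless v2 cluster with the Data API enabled;
        # credentials are generated and stored in AWS Secrets Manager by default
        rds.DatabaseCluster(
            self,
            "AuroraCluster",
            engine=rds.DatabaseClusterEngine.aurora_postgres(
                version=rds.AuroraPostgresEngineVersion.VER_15_5
            ),
            writer=rds.ClusterInstance.serverless_v2("writer"),
            serverless_v2_min_capacity=0.5,
            serverless_v2_max_capacity=2,
            vpc=vpc,
            enable_data_api=True,
        )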
Deploy the agent
Agents facilitate interactions between foundation models (FMs), data sources, software applications, and user conversations. They can also automatically call APIs and invoke Amazon Bedrock Knowledge Bases to enhance contextual information. By integrating the agent with Amazon Aurora, you gain the ability to convert natural language inputs into precise SQL queries using generative AI capabilities. Deploy the agent using the AWS CDK command:
cdk deploy BedrockAgentStack
The BedrockAgentStack is an AWS CDK construct that creates an Amazon Bedrock agent designed to interact with an Aurora PostgreSQL database through natural language queries. It provisions a Lambda function capable of generating and executing SQL queries, implements a comprehensive guardrail system to prevent data modification operations (allowing only SELECT queries), and establishes the necessary IAM roles and permissions for secure communication between Bedrock and Aurora. The stack creates two action groups—one for generating SQL queries from natural language prompts and another for executing those queries—while integrating with the previously deployed Aurora PostgreSQL database using the Data API.
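As an illustration of the guardrail portion of the stack, the following is a minimal boto3 sketch of a word-filter guardrail similar to the one described; the guardrail name, blocked messages, and word list are illustrative only and do not reproduce the stack's exact configuration:

import boto3

bedrock = boto3.client("bedrock")

# Guardrail that blocks prompts containing data-modification keywords
response = bedrock.create_guardrail(
    name="read-only-sql-guardrail",
    description="Blocks prompts that ask for data modification",
    wordPolicyConfig={
        "wordsConfig": [
            {"text": "INSERT"},
            {"text": "UPDATE"},
            {"text": "DELETE"},
        ]
    },
    blockedInputMessaging="This assistant only supports read-only queries.",
    blockedOutputsMessaging="This assistant only supports read-only queries.",
)
print(response["guardrailId"])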
After deploying the CDK stack, you can access the Amazon Bedrock agent builder console to review the configuration of the query-agent. You will observe that the agent is configured to use Anthropic’s Claude LLM. Additionally, examine the agent’s instructions, which define its functionality. The agent features two key action groups: generate-sql-query and execute-sql-query. An action group comprises a logical collection of related functions (actions) that an agent can perform to accomplish specific tasks. The generate-sql-query group invokes the generate-query Lambda function, accepting the user’s question as input and returning the generated SQL query. The execute-sql-query group invokes the execute-query Lambda function, accepting the query and parameter values. We will delve into these functions in the subsequent sections.
The generate-query function
The generate_sql_query function employs an LLM to create SQL queries from natural language questions and a provided database schema. It utilizes a detailed prompt containing instructions for SQL generation, the database schema, and example question-SQL pairs. The function formats this prompt and the user’s question into a structured input for the LLM. It then calls an invoke_llm method to obtain a response from the LLM, extracting the SQL query from the output and returning it. This approach enables dynamic SQL generation based on natural language input while providing the LLM with the necessary context about the database structure for accurate query creation. Below is the function that builds the prompt and invokes the LLM:
def generate_sql_query(question):
    validated_question = validate_input(question)
    schema_content = read_schema_file()
    # Construct the prompt with schema context
    contexts = f"""
    Read database schema inside the <schema></schema> tags which contains the tables and schema information to do the following:
    1. Create a syntactically correct SQL query to answer the question.
    2. Format the query to remove any new line with space and produce a single line query.
    3. Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
    4. Pay attention to use only the column names that you can see in the schema description.
    5. Be careful to not query for columns that do not exist.
    6. Pay attention to which column is in which table.
    7. Qualify column names with the table name when needed.
    8. Return only the sql query without any tags.

    <schema>
    {schema_content}
    </schema>

    <examples>
    "How many users do we have?"
    SELECT SUM(users) FROM customers

    "How many users do we have for Mobile?"
    SELECT SUM(users) FROM customer WHERE source_medium='Mobile'
    </examples>

    {validated_question}

    Return only the SQL query without any explanations.
    """
    prompt = f"""
    Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

    {contexts}

    Question: {validated_question}

    Assistant:
    """
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt}
            ],
        }
    ]
    llm_response = invoke_llm(messages)
    return llm_response["content"][0]["text"]
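The validate_input helper called at the top of generate_sql_query is not shown in this post. A minimal sketch of the kind of checks such a function might perform, assuming length limits and keyword filtering rather than the sample's exact logic, looks like this:

import re

# Reject data-modification keywords, statement terminators, and SQL comments
BLOCKED_PATTERNS = re.compile(
    r"\b(insert|update|delete|drop|truncate|alter|grant|revoke)\b|;|--",
    re.IGNORECASE,
)

def validate_input(question: str) -> str:
    # Reject empty or excessively long questions
    question = question.strip()
    if not question or len(question) > 500:
        raise ValueError("Question is empty or too long")
    # Reject questions that contain disallowed keywords or characters
    if BLOCKED_PATTERNS.search(question):
        raise ValueError("Question contains disallowed keywords or characters")
    return question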
In this example, the entire schema file is embedded as context into the prompt due to its small and simple nature. For larger and more complex schemas, you can enhance the agent’s capabilities by integrating Amazon Bedrock Knowledge Bases through vector embeddings and semantic search, storing comprehensive schema definitions, table relationships, sample queries, and business context documents. By employing a retrieval-augmented generation (RAG) approach, the agent first searches the knowledge base for contextual information before invoking the generate-query action group function, significantly reducing model inference time.
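If you adopt that RAG approach, the schema context can be fetched with the Knowledge Bases Retrieve API before the prompt is built. The following is a minimal sketch; the knowledge base ID and the number of results are placeholders:

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

def retrieve_schema_context(question, knowledge_base_id="KBEXAMPLE123"):
    # Semantic search over the knowledge base that stores schema definitions and sample queries
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
    )
    # Concatenate the retrieved chunks into a context string for the prompt
    return "\n\n".join(result["content"]["text"] for result in response["retrievalResults"])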
The execute-query function
The execute_query function utilizes the Data API to execute a SQL query against an Aurora PostgreSQL database, taking the SQL query and parameters as inputs and returning the response from the database. The Data API simplifies the management of database connections in serverless architectures by providing a secure HTTPS endpoint that handles connection management automatically, eliminating the need for VPC configurations and connection pools in Lambda functions.
The lambda_handler processes incoming events, extracting parameters and their values into a dictionary. It retrieves the SQL query from these parameters, replacing newlines with spaces for proper formatting. Finally, it calls the execute_query function with the extracted query and parameters to execute the SQL statement. This setup allows for dynamic SQL query execution in a serverless environment, accommodating different queries and parameters for each invocation of the Lambda function:
import os
import boto3

# Data API client and connection settings; here they are assumed to come from
# Lambda environment variables, and the secret belongs to the read-only database role
rds_data = boto3.client("rds-data")
DB_CLUSTER_ARN = os.environ["DB_CLUSTER_ARN"]
DB_SECRET_ARN = os.environ["DB_SECRET_ARN"]
DB_NAME = os.environ["DB_NAME"]

def execute_query(query, parameters=None):
    try:
        # Base request parameters
        request_params = {
            "resourceArn": DB_CLUSTER_ARN,
            "secretArn": DB_SECRET_ARN,
            "database": DB_NAME,
            "sql": query,
        }
        # Only add parameters if they exist and are not empty
        if parameters and len(parameters) > 0:
            request_params["parameters"] = parameters
        # Execute the query
        response = rds_data.execute_statement(**request_params)
        return response
    except Exception as e:
        print(f"Error executing query: {str(e)}")
        raise
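The lambda_handler itself is not reproduced in this post. A minimal sketch of such a handler, assuming the action group is defined with an OpenAPI schema, a parameter named query, and the standard Amazon Bedrock agent response wrapper, could look like this:

import json

def lambda_handler(event, context):
    # Collect the action group parameters into a dictionary
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}

    # Normalize the generated SQL to a single line before execution
    query = params.get("query", "").replace("\n", " ")
    result = execute_query(query)

    # Return the result in the response format expected by Amazon Bedrock agents
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {"body": json.dumps(result, default=str)}
            },
        },
    }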
Test the solution
Before testing, create a sample schema with data using scripts/create_schema.py. This script establishes schemas and tables and ingests sample data. The testing process is straightforward: send a natural language prompt as input to the agent, which generates the necessary SQL query and executes it against the configured Aurora PostgreSQL database using the Data API. The agent then returns a response based on the results queried from the database. Before running the scripts/create_schema.py script from your IDE, update it with the DB_CLUSTER_ARN, DB_SECRET_ARN, and DB_NAME values noted from your RDSAuroraStack CDK deployment output.
python3 scripts/create_schema.py
Once the schema is created and test data is loaded, there are several ways to test the deployed agent. One method is to utilize the Amazon Bedrock console, while another is to employ the InvokeAgent API.
Using the test window of Amazon Bedrock Agents
To test the solution using the test window, follow these steps:
- On the Amazon Bedrock Agents console, open the Test window on the right panel.
- Input your test query for the agent, as illustrated in the accompanying screenshot.
- To troubleshoot and review all the steps the agent utilized to generate the response, expand the test window and examine the Trace section, as shown in the accompanying screenshot.
Using the Amazon Bedrock InvokeAgent API
Another method for testing is to call the InvokeAgent API through the AWS SDK. Applications utilize this API to interact with the agent. A utility script, scripts/test_agent.py, is available in the repository for testing the agent and its integration with the Aurora PostgreSQL database. Be sure to update the script with the Amazon Bedrock agent ID before executing it. The test script offers options to run with a single test prompt or multiple prompts, and it can also be run with trace enabled to review the steps and reasoning the agent employed to fulfill the request.
Run a single test without trace:
python3 scripts/test_agent.py --test-type single
Run a single test with trace:
python3 scripts/test_agent.py --test-type single --trace
Run all tests without trace:
python3 scripts/test_agent.py --test-type all
Run all tests with trace:
python3 scripts/test_agent.py --test-type all --trace
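If you prefer to invoke the agent from your own application code rather than the test script, the following is a minimal boto3 sketch of the InvokeAgent call; the agent ID, alias ID, and prompt are placeholders:

import uuid
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Placeholder identifiers; use the values from your BedrockAgentStack deployment
response = bedrock_agent_runtime.invoke_agent(
    agentId="AGENT12345",
    agentAliasId="ALIAS12345",
    sessionId=str(uuid.uuid4()),
    inputText="How many users do we have?",
)

# The completion is returned as an event stream of text chunks
answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
print(answer)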
A successful test returns the actual data retrieved from the Aurora PostgreSQL database table. Running a test with trace enabled additionally shows all the steps the agent took to generate the response.
Another example is the agent’s response to an input prompt requesting data insertion into the database: because this solution strictly permits read operations (SELECT), the guardrails block such requests. From a security and data integrity perspective, we do not advocate implementing this solution for write operations. Should you require your agent to support inserts and updates, it is advisable to do so via an API that provides a layer of abstraction over the database, coupled with validations and controls to ensure data consistency.
Considerations and best practices
When integrating Amazon Bedrock Agents with Aurora PostgreSQL and employing generative AI capabilities for generating and executing SQL queries, several key considerations should be observed:
- Enable this integration approach solely for read-only workloads such as analytics and reporting, where flexible data querying access is needed through natural language. For read-write and transactional workloads, utilize well-defined APIs to interface with the database instead of generating SQL queries.
- Database schemas subject to frequent changes in columns and data types can benefit from this generative AI-based integration approach, which generates SQL queries on-the-fly based on the latest schema. Ensure that schema changes are made visible to the query generation function.
- Implement parameter validation in Amazon Bedrock Agents and the action group Lambda functions to prevent SQL injection and uphold data integrity. Refer to Safeguard your generative AI workloads from prompt injections.
- Employ caching strategies where appropriate to alleviate database load for frequently requested information; a minimal caching sketch follows this list. For further details, refer to Database Caching Strategies Using Redis.
- Implement comprehensive logging and auditing to track interactions between Amazon Bedrock Agents and your database, promoting compliance and facilitating troubleshooting. Regularly monitor and analyze generated query patterns to identify opportunities for performance tuning. For more details, review the blog: Improve visibility into Amazon Bedrock usage and performance with Amazon CloudWatch.
- If the application is multi-tenant, ensure appropriate isolation controls are in place. For details on implementing row-level security with the Data API, see Enforce row-level security with the RDS Data API.
- For a managed implementation of text-to-SQL query generation, consider utilizing the GenerateQuery API supported with Amazon Bedrock Knowledge Bases.
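As a simple illustration of the caching consideration above, frequently asked questions can be memoized before the database is queried. A production setup would typically use a shared cache such as Amazon ElastiCache, but a minimal in-process sketch with a hypothetical fetch function looks like this:

import time

# Simple in-process cache: question -> (expiry timestamp, result)
_CACHE = {}
_TTL_SECONDS = 300

def cached_answer(question, fetch_fn):
    # Return a cached result if it has not expired yet
    entry = _CACHE.get(question)
    now = time.time()
    if entry and entry[0] > now:
        return entry[1]
    # Otherwise run the query (fetch_fn) and cache the result
    result = fetch_fn(question)
    _CACHE[question] = (now + _TTL_SECONDS, result)
    return result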
Clean up
To prevent incurring future charges, delete all resources created through CDK:
cdk destroy --all