This article continues the exploration of multi-tenant vector stores utilizing Amazon Aurora PostgreSQL-Compatible Edition. In Part 1, we examined a self-managed method for creating a multi-tenant vector search system. This approach leverages direct SQL queries alongside the RDS Data API for data ingestion and retrieval, while ensuring data isolation through built-in row-level security policies.
Solution overview
Imagine a scenario where users submit home survey requests for properties they are interested in purchasing. Home surveyors conduct evaluations and update their findings, which are subsequently stored in an Amazon Simple Storage Service (Amazon S3) bucket. The home survey company aims to enhance user experience by allowing natural language inquiries about the property. To facilitate this, embedding models convert the home survey documents into vector embeddings, which are then ingested into a vector store alongside the original document data. The Retrieval Augmented Generation (RAG) technique enriches prompts sent to a large language model (LLM) with contextual information, thereby generating informed responses for users.
Amazon Bedrock Knowledge Bases offers a fully managed solution that streamlines the entire RAG workflow—from data ingestion to retrieval and prompt enhancement—eliminating the need for custom integrations and data flow management.
While Part 1 focused on a self-managed approach, the fully managed alternative utilizes Amazon Bedrock Knowledge Bases to simplify the creation and maintenance of vector embeddings within an Aurora PostgreSQL vector store. The following diagram illustrates an example implementation using these technologies.
The architecture comprises several high-level steps:
- Data is ingested from an S3 bucket via Amazon Bedrock Knowledge Bases.
- Amazon Bedrock Knowledge Bases invokes an embeddings model to convert documents into vector embeddings.
- The vector embeddings, along with data chunks and metadata, are stored in Aurora utilizing pgvector.
- A user submits a natural language query.
- The embeddings model in Amazon Bedrock Knowledge Bases transforms the query into embeddings, mirroring the model used for data ingestion.
- Amazon Bedrock Knowledge Bases executes a query against the vector store to find similar documents.
- The relevant documents are forwarded to an LLM in Amazon Bedrock for response augmentation.
- The final response is delivered back to the user.
In the subsequent sections, we will detail the steps necessary to create a vector store, ingest and retrieve vector data, and enforce multi-tenant data isolation.
Prerequisites
To follow the steps outlined in this post, you will need the following resources:
- Access to an Amazon Aurora PostgreSQL-compatible database.
- Amazon S3 for document storage.
- Amazon Bedrock for managing knowledge bases.
Additionally, clone the data-for-saas-patterns repository from AWS Samples and navigate to the samples/multi-tenant-vector-database/amazon-aurora/aws-managed folder:
git clone https://github.com/aws-samples/data-for-saas-patterns.git
cd data-for-saas-patterns/samples/multi-tenant-vector-database/amazon-aurora/aws-managed
Create a vector store with Amazon Aurora PostgreSQL-Compatible Edition
Begin by configuring the Aurora PostgreSQL database to enable the pgvector extension and establish the necessary schema for the vector store. Note that the steps differ slightly between the self-managed and fully managed approaches, necessitating distinct schema and table names. Execute all SQL commands from the 1_build_vector_db_on_aurora.sql script using psql, the Amazon RDS console query editor, or any PostgreSQL query editor tool to configure the vector store.
- Create and verify the pgvector extension:
CREATE EXTENSION IF NOT EXISTS vector;
SELECT extversion FROM pg_extension WHERE extname='vector';
- Create a schema and vector table:
CREATE SCHEMA aws_managed;
CREATE TABLE aws_managed.kb (id uuid PRIMARY KEY, embedding vector(1024), chunks text, metadata jsonb, tenantid varchar(10));
- Create the index:
CREATE INDEX ON aws_managed.kb USING hnsw (embedding vector_cosine_ops);
- Create a user and grant permissions:
CREATE ROLE bedrock_user LOGIN;
\password bedrock_user
GRANT ALL ON SCHEMA aws_managed TO bedrock_user;
GRANT ALL ON TABLE aws_managed.kb TO bedrock_user;
Upon executing these commands, the schema should reflect the vector table and index:
\d aws_managed.kb
Table "aws_managed.kb"
Column | Type | Collation | Nullable | Default
-----------+-----------------------+-----------+----------+---------
id | uuid | | not null |
embedding | vector(1024) | | |
chunks | text | | |
metadata | jsonb | | |
tenantid | character varying(10) | | |
Indexes:
"kb_pkey" PRIMARY KEY, btree (id)
"kb_embedding_idx" hnsw (embedding vector_cosine_ops)
The vector table comprises the following fields:
- id – A UUID serving as the primary key for the vector store.
- embedding – The vector field for storing vector embeddings, where the dimension (1024) must match the output dimension of the embedding model. The Amazon Titan Text Embeddings V2 model supports flexible dimensions (1024, 512, and 256).
- chunks – A text field for storing raw text from the source data in segments.
- metadata – A JSON metadata field (utilizing the jsonb data type) for source attribution, particularly relevant when employing managed ingestion via Amazon Bedrock Knowledge Bases.
- tenantid – This field identifies and associates data and chunks with specific tenants in a SaaS multi-tenant environment, also serving as a key for data retrieval filtering.
Ingest the vector data
To ingest data, create the knowledge base and configure the data source, embedding model, and vector store. Knowledge bases can be established through the Amazon Bedrock console or via code. For code-based creation, refer to the bedrock_knowledgebase_managed_rag.ipynb notebook, which details the necessary IAM policies, roles, and step-by-step instructions.
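For orientation, the following is a minimal sketch of the code-based path using the boto3 bedrock-agent client. The knowledge base name, ARNs, IAM role, and database name are placeholder assumptions, and the sketch omits the S3 data source and IAM setup that the notebook walks through:

import boto3

bedrock_agent_client = boto3.client("bedrock-agent")

# Placeholders: substitute your Aurora cluster ARN, Secrets Manager secret ARN,
# database name, and the IAM role that Amazon Bedrock assumes.
create_kb_response = bedrock_agent_client.create_knowledge_base(
    name="multi-tenant-kb",
    roleArn="arn:aws:iam::111122223333:role/BedrockKnowledgeBaseRole",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
        },
    },
    storageConfiguration={
        "type": "RDS",
        "rdsConfiguration": {
            "resourceArn": "arn:aws:rds:us-east-1:111122223333:cluster:aurora-vector-cluster",
            "credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:bedrock-user-secret",
            "databaseName": "postgres",
            "tableName": "aws_managed.kb",
            "fieldMapping": {
                "primaryKeyField": "id",
                "vectorField": "embedding",
                "textField": "chunks",
                "metadataField": "metadata",
            },
        },
    },
)
kb_id = create_kb_response["knowledgeBase"]["knowledgeBaseId"]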
After configuring the knowledge base with the data source and vector store, begin uploading documents to an S3 bucket for ingestion into the vector store. Amazon Bedrock Knowledge Bases simplifies this process with a single API call:
upload_file_to_s3(
    "../multi_tenant_survey_reports/Home_Survey_Tenant1.pdf",
    bucket_name,
    object_name="multi_tenant_survey_reports/Home_Survey_Tenant1.pdf",
)
start_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id, dataSourceId=ds_id
)
wait_for_ingestion(start_job_response)
Once the ingestion job is complete, query the Aurora PostgreSQL-compatible database to confirm that the vector embeddings have been successfully ingested and stored in the vector table. The number of rows created corresponds to the number of chunks the document was divided into, based on the standard fixed-size chunking strategy. Efficient chunking can significantly influence the quality of data retrieval. For further insights on chunking strategies, refer to How content chunking works for knowledge bases. Verify the stored data and chunk count by executing the following SQL command:
SELECT count(*) FROM aws_managed.kb;
count
-------
2
(1 row)
Prepare for prompt augmentation with vector similarity search
In RAG, a vector similarity search is performed to identify relevant source data (text chunks) that will enhance the prompt with domain-specific information before it is sent to the LLM. The user’s natural language question is first converted into vector embeddings, followed by a retrieval of vector data from the database that closely aligns with the input vector embedding.
Utilizing the Amazon Bedrock Knowledge Bases APIs abstracts the retrieval process based on the configured vector store, alleviating the need for complex SQL-based search queries. Below is an example showcasing the Retrieve API of Amazon Bedrock Knowledge Bases:
def retrieve(query, kbId, numberOfResults=5):
    # Run a vector similarity search against the knowledge base using the
    # boto3 bedrock-agent-runtime client.
    response = bedrock_agent_runtime.retrieve(
        retrievalQuery={"text": query},
        knowledgeBaseId=kbId,
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": numberOfResults}
        },
    )
    return response
This Retrieve API can be employed to extract vector data chunks from the Tenant1 document based on any natural language question posed by the user:
question = "What is the condition of the roof in my survey report?"
response = retrieve(question, kb_id)
print(response)
Build an augmented prompt
The subsequent step involves constructing the augmented prompt to be sent to a foundation model (FM). Within Amazon Bedrock, users can select from various foundation models. In this instance, we utilize Anthropic Claude on Amazon Bedrock to generate responses to user inquiries, enriched with contextual data. The retrieved data chunks from the vector store serve to enhance the prompt with pertinent domain-specific information prior to submission to the FM. Below is an example of invoking Anthropic’s Claude FM on Amazon Bedrock using the InvokeModel API:
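The following is a minimal sketch of that call; the model ID, prompt template, and the way the retrieved chunks are concatenated are illustrative assumptions rather than the notebook's exact code:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Concatenate the retrieved chunks into a context block (illustrative).
context = "\n\n".join(
    result["content"]["text"] for result in response["retrievalResults"]
)
augmented_prompt = (
    "Use the following survey report excerpts to answer the question.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

# The model ID is an assumption; use any Anthropic Claude model enabled in your Region.
invoke_response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": augmented_prompt}],
    }),
)
output = json.loads(invoke_response["body"].read())
print(output["content"][0]["text"])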
Enforce multi-tenant data isolation
To delve into multi-tenancy and data isolation, additional tenants can be onboarded by uploading their documents into the knowledge base data source. Following the ingestion of these new documents, retrieval can be executed using the same natural language question as before. This will yield data chunks from all documents, as the data is pooled into a single table for all tenants. The output from the Retrieve API will reflect data chunks from multiple tenant documents:
question = "What is the condition of the roof in my survey report?"
response = retrieve(question, kb_id)
print(response)
Ensuring tenant data isolation is paramount in a multi-tenant SaaS deployment. The Retrieve API must be tenant-aware, retrieving only tenant-scoped data from the vector store. Amazon Bedrock Knowledge Bases features metadata and filtering, which can be utilized to implement tenant data isolation within vector stores. To enable filters, all documents in the data source must be tagged with their respective tenant metadata. Each document should include a metadata.json file containing its corresponding tenantid metadata tag, as illustrated in the following code:
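For example, a metadata file for Tenant1's survey report (named after the source document, such as Home_Survey_Tenant1.pdf.metadata.json, with tenant1 as an illustrative tenant identifier) could look like the following:

{
    "metadataAttributes": {
        "tenantid": "tenant1"
    }
}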
Upon completing the tagging for all documents, upload the metadata.json files to the S3 bucket and ingest them into the knowledge base:
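A sketch of this step, reusing the upload_file_to_s3 helper and the ingestion calls shown earlier (the file paths are illustrative):

# Upload the per-document metadata file next to its source document,
# then re-run the ingestion job so the tenantid tags are indexed.
upload_file_to_s3(
    "../multi_tenant_survey_reports/Home_Survey_Tenant1.pdf.metadata.json",
    bucket_name,
    object_name="multi_tenant_survey_reports/Home_Survey_Tenant1.pdf.metadata.json",
)
start_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id, dataSourceId=ds_id
)
wait_for_ingestion(start_job_response)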
Next, update the retrieval function to incorporate filtering based on the tenantid tag. During retrieval, a filter configuration utilizes the tenantid to ensure that only tenant-specific data chunks are retrieved from the knowledge base's underlying vector store. The following code demonstrates the updated retrieve function with metadata filtering enabled:
Finally, you can pose the same question and retrieve tenant-specific document chunks using the knowledge base's metadata and filtering feature. The output will consist exclusively of document chunks pertinent to the tenantid specified as the filter key value.
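For example, with tenant1 as an illustrative tenant identifier:

question = "What is the condition of the roof in my survey report?"
response = retrieve(question, kb_id, "tenant1")
print(response)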
Best practices for multi-tenant vector store deployments
When selecting a vector store for generative AI applications, various factors such as performance, indexing strategies, and semantic search capabilities warrant consideration. For further insights, refer to Key considerations when choosing a database for your generative AI applications.
When deploying a multi-tenant vector store solution in production, adhere to the following best practices for scaling:
- Optimize chunk size and strategy according to your specific use case. Smaller chunk sizes suit smaller documents or scenarios where some context loss is acceptable, such as in simple Q&A applications. Conversely, larger chunks maintain extended context but may exceed model context limits and increase costs. For more details, see A practitioner’s guide to data for Generative AI.
- Evaluate and validate an appropriate embedding model for your use case. The characteristics of the embedding model (including its dimensions) influence both query performance and search quality. Different models exhibit varying recall rates, with some smaller models potentially outperforming larger ones.
- For highly selective queries that filter out most results, consider employing a B-tree index (for example, on the tenantid attribute) to assist the query planner; see the sketch after this list. For low-selectivity queries, an approximate index such as HNSW may be more suitable. For further details on configuring HNSW indexes, see Best practices for querying vector data for gen AI apps in PostgreSQL.
- Utilize Amazon Aurora Optimized Reads to enhance query performance when your vector workload surpasses available memory. For more information, see Improve the performance of generative AI workloads on Amazon Aurora with Optimized Reads and pgvector.
- Monitor and optimize query performance using PostgreSQL query plans. For more details, see Monitor query plans for Amazon Aurora PostgreSQL.
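As an illustration of the B-tree suggestion above, a simple index on the tenant identifier column of the vector table created earlier might look like the following (a sketch; validate it against your own query plans):

CREATE INDEX kb_tenantid_idx ON aws_managed.kb (tenantid);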
In general, fully managed features alleviate the burden of undifferentiated work, allowing you to concentrate on developing features that enhance customer satisfaction.
Clean up
To avoid incurring future charges, delete all resources created in the prerequisites section, along with the knowledge bases you created.
About the Authors
Josh Hart is a Principal Solutions Architect at AWS, specializing in assisting ISV customers in the UK with building and modernizing their SaaS applications on AWS.
Nihilson Gnanadason is a Senior Solutions Architect at AWS, dedicated to helping ISVs in the UK build, run, and scale their software products on AWS.