Self-managed multi-tenant vector search with Amazon Aurora PostgreSQL

Organizations are increasingly harnessing the capabilities of generative AI and machine learning (ML) tools to extract value from vast amounts of unstructured data, including text, images, and audio files. A particularly effective method for navigating this unstructured data landscape is the application of specialized machine learning models that convert source data, such as text, into vectors. This transformation allows for efficient searching and retrieval of information. With advancements in natural language processing (NLP), computer vision, and Retrieval Augmented Generation (RAG), vector-based searching has emerged as a vital resource for organizations aiming to fully leverage their data assets.

Amazon Aurora PostgreSQL-Compatible Edition, enhanced by the open-source pgvector vector search extension, provides a robust and scalable framework for establishing a vector store. This integration empowers developers to utilize the extensibility of PostgreSQL while seamlessly incorporating vector search functionalities into their generative AI applications.

Many organizations aspire to integrate generative AI applications into their software-as-a-service (SaaS) offerings. These applications typically adopt a “multi-tenant” architecture, which ensures that the data of one tenant (for instance, a customer of the SaaS application) remains isolated from that of others, thereby adhering to security and privacy standards. In a multi-tenant setup, multiple clients share the same application instance and often the same database. Without adequate isolation measures, there exists a risk that one tenant could inadvertently access or alter another tenant’s data, potentially leading to significant privacy violations and security issues. When querying to retrieve vector embeddings, it is essential that these queries are tenant-aware, ensuring that only tenant-specific data is retrieved from the vector store, necessitating a robust mechanism for enforcing tenant isolation.

Solution overview

Consider a multi-tenant scenario where users submit home survey requests for properties they intend to purchase. Home surveyors conduct evaluations of these properties and subsequently update their findings. The resulting home survey report, containing the updated findings, is stored in an Amazon Simple Storage Service (Amazon S3) bucket. The home survey company aims to introduce a feature that enables users to pose natural language inquiries regarding the property. To facilitate this, embedding models are employed to convert the home survey documents into vector embeddings. These vector embeddings, along with the original document data, are then ingested into a vector store. The RAG approach further enriches the prompt sent to the large language model (LLM) with contextual data, allowing for more accurate responses to user queries.

There are two primary methodologies for constructing a vector-based application. The first is a self-managed approach, wherein developers directly manage the conversion of vector embeddings and craft SQL queries to insert and retrieve data from the vector table. This method places the onus of pipeline management on the developer. For those seeking a more streamlined experience, a fully-managed approach utilizing Amazon Bedrock Knowledge Bases is available. This option alleviates complexities such as vector embedding conversion, SQL query management, and data pipeline integration, offering a low-code implementation (discussed in Part 2).

This article focuses on the self-managed approach, which entails writing code to process structured or unstructured data, segment it into smaller chunks, send it to an embedding model, and subsequently store the resulting vector embeddings in Amazon Aurora. An illustrative implementation using Aurora PostgreSQL-Compatible is depicted in the accompanying diagram.

The architecture’s high-level steps include:

  1. Data ingestion via an AWS Lambda function.
  2. The Lambda function segments the document into smaller chunks and invokes an embedding model in Amazon Bedrock to convert each chunk into vector embeddings.
  3. The vector embeddings are stored in Aurora using pgvector.
  4. A user submits a natural language query.
  5. A query processor Lambda function calls an embedding model in Amazon Bedrock to convert the natural language query into embeddings.
  6. A SQL query is executed against the vector store to retrieve similar documents.
  7. The matching documents are sent to a text-based model in Amazon Bedrock to enhance the response.
  8. The final response is delivered to the user.

Key terminologies related to vectors include:

  • Vector embeddings – Numerical floating-point representations of data in a multi-dimensional format, encapsulating meaningful information and semantic relationships.
  • Embedding model – Models such as Amazon Titan Embeddings G1 that convert raw data into vector embeddings.
  • Dimension – The size of the vector output from the embeddings model. Amazon Titan Embeddings V2 supports flexible dimensions (1,024, 512, 256).
  • Chunking – The process of dividing data into smaller pieces prior to conversion into vector embeddings, enhancing retrieval efficiency. For more on chunking strategies, refer to How content chunking and parsing works for Amazon Bedrock knowledge bases.
  • pgvector – An open-source PostgreSQL extension that supports vector and similarity search capabilities.
  • HNSW index – A graph-based indexing method that organizes vectors into a hierarchy of layers, descending from sparse upper layers to denser lower ones so that each search traverses only a small subset of the vectors.
  • IVFFlat index – An indexing method that categorizes vectors into lists, enabling searches within a subset of those lists closest to the query vector.

The subsequent sections will guide you through the steps necessary to create a vector store, ingest and retrieve vector data, and enforce multi-tenant data isolation.

Prerequisites

To follow the steps outlined in this article, you will require the following resources:

  • An AWS account.
  • An Aurora PostgreSQL-Compatible cluster with the RDS Data API enabled.
  • Access to the Amazon Titan Text Embeddings V2 and Anthropic Claude models in Amazon Bedrock.
  • psql or another PostgreSQL client for running the SQL commands.

Additionally, clone the AWS samples repository for data-for-saas-patterns and navigate to the folder samples/multi-tenant-vector-database/amazon-aurora/self-managed:

git clone https://github.com/aws-samples/data-for-saas-patterns.git
cd data-for-saas-patterns/samples/multi-tenant-vector-database/amazon-aurora/self-managed

Create a vector store with Aurora PostgreSQL-Compatible

Begin by configuring the Aurora PostgreSQL database to enable the pgvector extension and establish the necessary schema for the vector store. Ensure you utilize a user with rds_superuser privileges to enable the pgvector extension. The steps may vary slightly between the self-managed and fully managed approaches, allowing for distinct schema and table names for separate testing. Execute all SQL commands from the 1_build_vector_db_on_aurora.sql script using psql or the Amazon RDS console query editor to configure the vector store:

  1. Create and verify the pgvector extension (requires rds_superuser privileges). The following query returns the installed version of pgvector:
CREATE EXTENSION IF NOT EXISTS vector;
SELECT extversion FROM pg_extension WHERE extname='vector';

extversion
-----------
0.7.0
  2. Create a schema and the vector table (requires database owner privileges):
CREATE SCHEMA self_managed;
CREATE TABLE self_managed.kb (id uuid PRIMARY KEY, embedding vector(1024), 
chunks text, metadata jsonb, tenantid bigint);
  3. Create the index:
CREATE INDEX on self_managed.kb USING hnsw (embedding vector_cosine_ops);
  4. Enable row-level security:
ALTER TABLE self_managed.kb ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_policy ON self_managed.kb USING 
(tenantid = current_setting('self_managed.kb.tenantid')::bigint);
  5. Create the app_user role and grant permissions:
CREATE ROLE app_user LOGIN;
\password app_user
GRANT ALL ON SCHEMA self_managed to app_user;
GRANT SELECT ON TABLE self_managed.kb to app_user;

Upon executing all commands, the schema should encompass the vector table, index, and row-level security policy:

\d self_managed.kb
                      Table "self_managed.kb"
  Column   |         Type          | Collation | Nullable | Default 
-----------+-----------------------+-----------+----------+---------
 id        | uuid                  |           | not null | 
 embedding | vector(1024)          |           |          | 
 chunks    | text                  |           |          | 
 metadata  | jsonb                 |           |          | 
 tenantid  | bigint                |           |          | 
Indexes:
    "kb_pkey" PRIMARY KEY, btree (id)
    "kb_embedding_idx" hnsw (embedding vector_cosine_ops)
Policies:
    POLICY "tenant_policy"
      USING ((tenantid = (current_setting('self_managed.kb.tenantid'::text))::bigint))

The fields in the vector table are defined as follows:

  • id – The UUID field serves as the primary key for the vector store.
  • embedding – The vector field designated for storing vector embeddings, with the argument 1024 indicating the number of dimensions.
  • chunks – A text field for storing raw text from the source data in chunks.
  • metadata – A JSON metadata field (utilizing the jsonb data type) for storing source attribution, particularly relevant when employing managed ingestion through Amazon Bedrock Knowledge Bases.
  • tenantid – This field identifies and associates data and chunks with specific tenants in a multi-tenant SaaS environment.

To enhance retrieval performance, pgvector supports the HNSW and IVFFlat index types. Starting with an HNSW index is generally simpler due to its ease of management and strong query performance. For highly selective queries that filter out most results, consider a B-tree index or iterative index scans instead. For further insights into these indexing techniques, refer to A deep dive into IVFFlat and HNSW techniques.
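For illustration, an equivalent IVFFlat index on the same column could be defined as follows; the lists value is an assumption to tune for your data volume (a common starting point is rows/1000):

CREATE INDEX ON self_managed.kb USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);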

Ingest the vector data

The self-managed ingestion process involves coding to take your data, send it to an embedding model, and store the resulting vector embeddings in your vector store. Embeddings serve as numerical representations of real-world objects in a multi-dimensional space, capturing the properties and relationships of real-world data. To convert data into embeddings, the Amazon Titan Text Embeddings model can be utilized. Below is an example code snippet demonstrating how to convert text data into vector embeddings using the Amazon Titan embeddings model. The sample code for the self-managed approach can be found in the notebook 2_sql_based_self_managed_rag.ipynb.

def generate_vector_embeddings(data):
    body = json.dumps({
        "inputText": data,
    })

    # Invoke embedding model 
    response = bedrock_runtime.invoke_model(
        body=body, 
        modelId='amazon.titan-embed-text-v2:0',
        accept='application/json', 
        contentType='application/json'
    )
    
    response_body = json.loads(response['body'].read())
    embedding = response_body.get('embedding')
    return embedding

With the function for generating vector embeddings established, the next step is to insert these embeddings into the database. The RDS Data API can be employed to execute insert queries and simplify the connection to the Aurora database cluster. This API streamlines secure network access to your database, eliminating the need to manage connections and drivers for the Aurora PostgreSQL database. Below is the function for inserting vector embeddings using the RDS Data API:

def insert_into_vector_db(embedding, chunk, metadata, tenantid):
    # Insert query parameters
    params = []
    params.append({"name": "id", "value": {"stringValue": str(uuid.uuid4())}})
    params.append({"name": "embedding", "value": {"stringValue": str(embedding)}})
    params.append({"name": "chunks", "value": {"stringValue": chunk}})
    params.append({"name": "metadata", "value": {"stringValue": json.dumps(metadata)}, "typeHint": "JSON"})
    params.append({"name": "tenantid", "value": {"longValue": tenantid}})

    # Invoke the Insert query using RDS Data API
    response = rdsData.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,
        database=db_name,
        sql="INSERT INTO self_managed.kb(id, embedding, chunks, metadata, tenantid) VALUES (:id::uuid,:embedding::vector,:chunks, :metadata::jsonb, :tenantid::bigint)",
        parameters=params
    )
    return response

The input document must then be divided into smaller chunks. In this instance, PyPDFLoader is utilized to load and parse PDF documents, while the LangChain framework’s RecursiveCharacterTextSplitter class is employed for chunking the documents. Developers can select their chunking strategy and implement custom code as needed.

# Load the document
file_name = "../multi_tenant_survey_reports/Home_Survey_Tenant1.pdf"
loader = PyPDFLoader(file_name)
doc = loader.load()

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=10000,
   chunk_overlap=150
)
chunks = text_splitter.split_documents(doc)

# Generate vector embeddings and insert into vector db
for chunk in chunks:
   embedding = generate_vector_embeddings(chunk.page_content)
   insert_response = insert_into_vector_db(embedding, chunk.page_content, file_name, 1)  # tenantid is bigint; 1 identifies Tenant1

After inserting the vector embeddings, connect to the Aurora PostgreSQL database using psql to verify that the data has been successfully added to the vector table:

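For example, a quick query along the following lines (the column choice and LIMIT are illustrative) confirms that the chunks and their tenant IDs landed in the table:

SELECT id, tenantid, left(chunks, 40) AS chunk_preview
FROM self_managed.kb
LIMIT 5;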

Retrieve the vector data

The retrieval of vector data plays a crucial role in the RAG process, enhancing the prompt with domain-specific information before it is sent to the LLM. The natural language question posed by the end-user is first converted into a vector embedding. The query retrieves vectors from the database that are most similar to the input vector embedding, measured by the distance between the vectors, returning the closest matches. The pgvector extension supports various distance functions (including L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance) for identifying nearest neighbor matches to a vector.
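As a quick, self-contained illustration of the operator syntax pgvector uses for these distances:

SELECT '[1,2,3]'::vector <-> '[4,5,6]'::vector AS l2_distance,
       '[1,2,3]'::vector <#> '[4,5,6]'::vector AS negative_inner_product,
       '[1,2,3]'::vector <=> '[4,5,6]'::vector AS cosine_distance;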

The following function illustrates self-managed data retrieval using the cosine distance operator (<=>):

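The notebook contains the full implementation; the following is a minimal sketch, reusing the rdsData client and the cluster_arn, secret_arn, and db_name variables from the insert example (the LIMIT of 5 is illustrative):

def query_vector_database(embedding):
    # Retrieve the chunks closest to the input embedding by cosine distance
    response = rdsData.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,
        database=db_name,
        sql="SELECT chunks FROM self_managed.kb ORDER BY embedding <=> :embedding::vector LIMIT 5",
        parameters=[{"name": "embedding", "value": {"stringValue": str(embedding)}}]
    )
    return response['records']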

By combining the generate_vector_embeddings() and query_vector_database() retrieval functions, developers can implement self-managed vector searches for any user query:

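A sketch of how the two functions compose (the question text is illustrative):

question = "What is the condition of the roof?"
embedding = generate_vector_embeddings(question)
records = query_vector_database(embedding)
for record in records:
    print(record[0]['stringValue'][:80])  # preview each matching chunk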

Augment the vector data

The next step is to augment the prompt sent to an LLM with the retrieved raw data. Within Amazon Bedrock, users can select from various foundation models (FMs). This example utilizes Anthropic Claude on Amazon Bedrock to generate responses to user inquiries, supplemented with augmented context. The data chunks retrieved from the vector store can enhance the prompt with contextual and domain-specific information prior to submission to the LLM. Below is an example of how to invoke the LLM on Amazon Bedrock using the InvokeModel API:

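A minimal sketch of such a helper, assuming the Claude 3 messages format and a Claude 3 Sonnet model ID (substitute the model your account has access to):

def invoke_llm(prompt):
    # Claude 3 models on Amazon Bedrock use the messages API format
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}]
    })
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',  # assumed model ID
        accept='application/json',
        contentType='application/json'
    )
    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']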

Initially, generate the vector embedding of the user’s natural language question. Subsequently, query the vector store to retrieve all data chunks that are semantically related to the generated embedding. These retrieved data chunks are then incorporated into the prompt as context before being sent to the LLM. The following code illustrates how to define the prompt and augment it with domain data retrieved from the vector store. This example is also accessible in the samples repository:

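A sketch of the augmentation flow, combining the helpers above (the prompt template is illustrative):

question = "What is the condition of the roof?"
embedding = generate_vector_embeddings(question)
records = query_vector_database(embedding)

# Concatenate the retrieved chunks into a context block
context = "\n\n".join(record[0]['stringValue'] for record in records)

prompt = f"""Answer the question using only the context below.

Context:
{context}

Question: {question}"""

print(invoke_llm(prompt))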

Enforce multi-tenant data isolation

To explore multi-tenancy and data isolation, additional tenants can be onboarded by inserting their documents into the vector store:

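For example, a second tenant's document can be ingested with the same pipeline (the file path is hypothetical, mirroring the Tenant1 report):

# Hypothetical second-tenant document; ingestion mirrors the Tenant1 flow
file_name = "../multi_tenant_survey_reports/Home_Survey_Tenant2.pdf"
loader = PyPDFLoader(file_name)
chunks = text_splitter.split_documents(loader.load())

for chunk in chunks:
    embedding = generate_vector_embeddings(chunk.page_content)
    insert_into_vector_db(embedding, chunk.page_content, file_name, 2)  # tenantid 2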

After inserting new documents, retrieve the vector data using the same question as in previous examples. This will yield data chunks from all documents, as the data is pooled within a single vector store for all tenants. The sample output of the following query will demonstrate that the retrieved data contains chunks from multiple tenant documents:

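Re-running the earlier retrieval with no tenant filter illustrates the pooled behavior:

records = query_vector_database(generate_vector_embeddings(question))
# Without a tenant filter, chunks from both tenants' documents can be returned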

In this self-managed approach, PostgreSQL’s built-in row-level security is utilized to achieve tenant isolation. This mechanism provides an additional layer of protection against misconfigurations and prevents data from crossing tenant boundaries.

For insights on using row-level security with the RDS Data API for multi-tenant data access, refer to Enforce row-level security with the RDS Data API.

To implement tenant isolation, the query_vector_database() function must be updated to support the row-level security feature of PostgreSQL. As part of querying the vector store, the current tenant ID must be set for the request using the SET command in conjunction with the SQL command to query the vector table. Review the modified query_vector_database_using_rls() function that enables tenant isolation:

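A sketch of the modified function, assuming RDS Data API transactions so that the SET and the SELECT run in the same session; for row-level security to apply, the secret used must belong to a role subject to RLS (such as app_user), not an rds_superuser:

def query_vector_database_using_rls(embedding, tenantid):
    # SET and SELECT must share a session, so run both in one transaction
    tx = rdsData.begin_transaction(
        resourceArn=cluster_arn, secretArn=secret_arn, database=db_name
    )
    rdsData.execute_statement(
        resourceArn=cluster_arn, secretArn=secret_arn, database=db_name,
        transactionId=tx['transactionId'],
        sql=f"SET self_managed.kb.tenantid = {int(tenantid)}"  # scope the session to one tenant
    )
    response = rdsData.execute_statement(
        resourceArn=cluster_arn, secretArn=secret_arn, database=db_name,
        transactionId=tx['transactionId'],
        sql="SELECT chunks FROM self_managed.kb ORDER BY embedding <=> :embedding::vector LIMIT 5",
        parameters=[{"name": "embedding", "value": {"stringValue": str(embedding)}}]
    )
    rdsData.commit_transaction(
        resourceArn=cluster_arn, secretArn=secret_arn,
        transactionId=tx['transactionId']
    )
    return response['records']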

Finally, submit the same question to retrieve tenant-specific document chunks using PostgreSQL’s row-level security feature. The output will reflect document chunks specific to the tenant ID provided as the filter key:

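A sketch of the tenant-scoped retrieval (tenant ID 1 is illustrative):

embedding = generate_vector_embeddings(question)
records = query_vector_database_using_rls(embedding, 1)
# Only chunks belonging to tenant 1 are returned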

Clean up

To prevent incurring future charges, it is advisable to delete all resources created in the prerequisites section.
