Vector embeddings have revolutionized our engagement with unstructured data across various applications, primarily through the capabilities of generative AI. These embeddings serve as mathematical representations that facilitate semantic search, recommendation systems, and an array of natural language processing tasks by encapsulating the core meaning of text, images, and other content in a machine-readable format.
For organizations developing applications that leverage Retrieval-Augmented Generation (RAG) or other AI-driven solutions, it is crucial to maintain current vector embeddings. As new data is added or modified within a database, ensuring that the corresponding embeddings are generated promptly and accurately is vital for preserving the quality and relevance of AI functionalities.
While Amazon Bedrock provides managed RAG solutions that automate embedding generation and retrieval, many organizations have specific needs that prompt them to develop custom vector database solutions. This can be accomplished using PostgreSQL alongside the open-source pgvector extension. Such requirements may include seamless integration with existing applications, tailored performance optimizations, or specific data processing workflows.
Solution overview
When establishing a vector database utilizing Aurora PostgreSQL with the pgvector extension, a dependable system for creating and updating embeddings in response to data changes is essential. The general workflow encompasses the following steps:
- Identify when new or modified data necessitates embedding.
- Transmit the content to Amazon Bedrock embedding models.
- Receive the generated embedding vectors.
- Store these vectors alongside the original data.
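The four steps above can be sketched end to end as follows. This is a minimal illustration, not AWS API code: the helper names (`processChangedRow`, `toVectorLiteral`) are hypothetical, and the Bedrock call and database write are injected as callbacks so the flow itself stays self-contained.

```typescript
// A change-detection hook hands us (id, content); the embedder and store are
// injected, standing in for the Bedrock call and the database write.
type Embedder = (text: string) => Promise<number[]>;
type Store = (id: string, vectorLiteral: string) => Promise<void>;

// pgvector accepts vectors as a bracketed, comma-separated text literal.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}

async function processChangedRow(
  id: string,
  content: string,
  embed: Embedder,
  save: Store,
): Promise<void> {
  const vector = await embed(content);      // steps 2-3: request and receive the embedding
  await save(id, toVectorLiteral(vector));  // step 4: store it alongside the original data
}
```

The five approaches that follow differ mainly in *where* this loop runs (inside a trigger, in Lambda, behind a queue, or on a schedule), not in its shape.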
This discussion employs Amazon Bedrock with the Amazon Titan foundation model (FM), which delivers production-grade vector embeddings without the burden of additional infrastructure management. Amazon Titan embeddings excel at capturing semantic relationships within text data, facilitating similarity searches and recommendations directly where the data resides.
While we focus on Amazon Titan for its balance of performance and simplicity for most production workloads, Amazon Bedrock also supports other embedding models, such as Cohere Embed, giving you the flexibility to select the embedding solution that aligns with your specific semantic search or document retrieval needs. Beyond Amazon Bedrock, options like Amazon SageMaker AI with custom models, or open-source solutions such as Sentence Transformers for smaller datasets, are also available.
Prerequisites
Before implementing any of the approaches discussed in this post, make sure that your environment meets the required prerequisites.
The GitHub repository offers a preconfigured environment that fulfills these requirements. It contains an AWS Cloud Development Kit (AWS CDK) application that provisions the necessary components.
For detailed deployment instructions and access to the source code referenced, please consult the README.md file in the repository.
Implementation approaches
We delve into five distinct implementation approaches for automating this workflow, each characterized by unique attributes:
- Direct synchronous calls using database triggers and the aws_ml extension, offering simplicity and immediate consistency, albeit at the expense of slower performance during transactions.
- AWS Lambda-orchestrated synchronous calls via database triggers and the aws_lambda extension, providing enhanced separation of duties while still experiencing slower performance during transactions.
- Lambda-orchestrated asynchronous calls using event-driven invocations that enhance database performance, albeit with temporary inconsistency.
- Queue-based asynchronous processing utilizing Amazon SQS and batch processing, which offers superior scalability and resilience for high-volume scenarios, albeit introducing additional architectural components.
- Scheduled periodic asynchronous updates via the pg_cron extension, presenting a straightforward method for applications where real-time embedding updates are not critical.
For our tests, we utilize two database tables:
- documents: This table stores document metadata and content, including fields for title, content, and processing status tracking (PENDING/PROCESSING/COMPLETED/ERROR).
- document_embeddings: This table retains vector embeddings (1536 dimensions for Titan) linked to documents through a foreign key.
We assume that client applications will store and modify text within the documents table. Further references to this setup can be found in the GitHub repository, which includes:
- The AWS CDK stack for deploying an Aurora Serverless PostgreSQL database with an Amazon EC2 bastion host.
- The init-public.sql script for installing database extensions in the public database schema.
- A dedicated folder containing AWS CDK, SQL, and Lambda code for each approach.
Now, let us explore the five approaches and analyze their respective benefits and limitations.
Design considerations
When implementing these approaches in a production environment, it is advisable to carefully assess your specific requirements and consider the following limitations to design an architecture that best meets your needs:
- API rate limits – Amazon Bedrock imposes rate limits that vary by model and account. High-volume applications may necessitate request throttling or batching.
- Token limits – Text embedding models have maximum token limits. Extended text fields may require chunking strategies not addressed in these examples.
- Cost implications – Each approach carries distinct cost implications based on the frequency of API calls, Lambda invocations, and additional AWS services utilized.
- Latency requirements – The trade-off between real-time embedding generation and system performance must be evaluated according to your application needs and business requirements.
- Database performance – Synchronous approaches may impact database throughput and ingestion times, particularly during peak load periods.
- Error handling – More complex approaches provide enhanced error handling and retry capabilities.
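The token-limit consideration above usually means splitting long documents into chunks before embedding them. The following is a naive sketch that budgets by word count rather than real tokens; `chunkText` and `maxWords` are illustrative names, and a production chunker would use the model's tokenizer and respect sentence or paragraph boundaries.

```typescript
// Split text into chunks of at most maxWords whitespace-separated words.
// A rough stand-in for token-aware chunking: word count only approximates
// the model's actual token count.
function chunkText(text: string, maxWords = 200): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}
```

Each chunk would then be embedded separately and stored as its own row in `document_embeddings`, keyed back to the parent document.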
Approach 1: Database triggers with the aws_ml extension (synchronous)
This approach employs PostgreSQL triggers to detect data changes and immediately invokes Amazon Bedrock using the aws_ml extension to generate embeddings. The workflow is illustrated in the accompanying diagram.
This trigger monitors the text column in the documents table. Whenever content changes, the trigger calls the store_embedding function, which performs the following actions:
- Generate vector embeddings by invoking the generate_embedding function.
- Store the embedding results in the document_embeddings table.
The generate_embedding function executes a synchronous call to Amazon Bedrock, passing the modified text and record identifier as parameters. The following sequence diagram illustrates the step-by-step workflow, depicting how the function interacts with Amazon Bedrock and how the embedding is generated and returned:
The PostgreSQL trigger automatically executes in response to specific database events, such as:
- Data insertions (INSERT)
- Changes to existing records (UPDATE)
For this embedding generation workflow, we configure triggers to run after data insertions or updates to the content column. An example and full implementation can be found in the project repository under the dedicated folder, 01_rds_bedrock.
The database trigger invokes a function that utilizes the aws_ml PostgreSQL extension, which facilitates synchronous calls to Amazon Bedrock directly from within the Aurora database. To implement this, ensure that your cluster has a role with the necessary permissions associated with Amazon Bedrock.
Aurora supports the required minimum version 2.0 of the aws_ml extension starting from PostgreSQL versions 16.1, 15.5, and 14.10. The aws_ml 2.0 version provides two additional functions for invoking Amazon Bedrock services: aws_bedrock.invoke_model and aws_bedrock.invoke_model_get_embeddings. The following code snippet demonstrates how to utilize the aws_ml extension from within a database function and employ the result to store vector embeddings in the dedicated database table:
-- Function to generate embeddings directly in the database using Amazon Bedrock
CREATE OR REPLACE FUNCTION generate_embedding(input_text TEXT)
RETURNS vector(1536) AS $$
DECLARE
embedding_result vector(1536);
BEGIN
-- Call Amazon Bedrock to generate embedding
EXECUTE $embed$ SELECT aws_bedrock.invoke_model_get_embeddings(
model_id := 'amazon.titan-embed-text-v2:0',
content_type := 'application/json',
json_key := 'embedding',
model_input := json_build_object('inputText', $1)::text)$embed$
INTO embedding_result
USING input_text;
RETURN embedding_result;
END;
$$ LANGUAGE plpgsql;
-- Function to process the embedding result and store it
CREATE OR REPLACE FUNCTION store_embedding()
RETURNS TRIGGER AS $$
DECLARE
embedding_vector vector(1536);
BEGIN
-- Generate embedding using Bedrock
embedding_vector := generate_embedding(NEW.content);
-- Insert or update the embedding in document_embeddings table
INSERT INTO document_embeddings (document_id, embedding)
VALUES (NEW.id, embedding_vector)
ON CONFLICT (document_id)
DO UPDATE SET
embedding = embedding_vector,
updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Two triggers are needed: an INSERT trigger's WHEN clause cannot reference
-- OLD, and AFTER timing ensures the documents row exists before the
-- foreign-key insert into document_embeddings runs.
CREATE TRIGGER trigger_store_embedding_insert
AFTER INSERT ON documents
FOR EACH ROW
EXECUTE FUNCTION store_embedding();
CREATE TRIGGER trigger_store_embedding_update
AFTER UPDATE OF content ON documents
FOR EACH ROW WHEN (OLD.content IS DISTINCT FROM NEW.content)
EXECUTE FUNCTION store_embedding();
This approach combines simplicity with real-time consistency by integrating functionality into database workflows. It eliminates the need for additional infrastructure while ensuring that content and vector representations remain synchronized. However, the synchronous processing model affects transaction times and presents scaling considerations, offering a straightforward implementation path with minimal development effort. The following table outlines additional pros and cons of this first approach.
| Pros | Cons |
| Minimalist implementation approach: This solution requires the fewest components compared to other solutions mentioned in this post, eliminating the need for external services or middleware layers and simplifying the debugging process. | Extended transaction duration: As embedding generation occurs synchronously within database transactions, insert and update operations may take significantly longer to complete, increasing lock contention and potentially impacting application performance, especially for operations modifying multiple rows simultaneously. |
| Real-time consistency: Embeddings are generated at the moment data is written, ensuring that vector representations are always in sync with the underlying content. This reduces scenarios where stale embeddings might exist, providing more accurate search and recommendation results immediately after content changes. | Timeout risks: When processing large documents or high transaction volumes, the time required for embedding generation can exceed database connection timeout settings, posing significant operational risks and potentially causing application errors or data inconsistencies if transactions are interrupted. |
| Simplicity: The architecture operates without additional AWS services beyond the database and Amazon Bedrock, reducing complexity and operational costs. This solution is particularly appealing for organizations with limited DevOps resources. | Limited error resilience: The trigger-based approach provides minimal capabilities for handling API errors, rate limiting, and retry logic. Failed embedding generation attempts can block critical database operations without built-in fallback mechanisms, necessitating custom error handling implementation. |
| | Scaling limitations: The embedding generation workload scales directly with database write operations, creating a tight coupling between database activity and API usage. During high-traffic periods, this can lead to Amazon Bedrock API throttling or quota issues that directly impact database performance. |
Approach 2: Database triggers with the aws_lambda extension (synchronous)
This approach utilizes PostgreSQL triggers to invoke a Lambda function synchronously, which then calls Amazon Bedrock to generate embeddings. The workflow is illustrated in the accompanying diagram.
With this approach, embeddings for new content are automatically created through the following process:
- A database trigger activates whenever new content is inserted into or updated in the documents table.
- For each newly inserted or updated row, the system synchronously invokes the Lambda function.
- The Lambda function utilizes Amazon Bedrock to generate embeddings.
- Generated embedding vectors are returned to the database trigger function.
- The embedding data is subsequently stored in the document_embeddings table.
This process occurs within the same database transaction, ensuring that documents are processed immediately while maintaining a separation between database and AI components.
When external systems need to react to database changes, invoking Lambda functions from within PostgreSQL can effectively decouple the embedding vector generation logic from the database.
Both Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Aurora PostgreSQL-Compatible support this integration through the aws_lambda extension, which provides the invoke method. This capability alleviates the need for intermediate polling mechanisms or additional application logic.
In this approach, we employ the aws_lambda PostgreSQL extension’s invoke method with the RequestResponse invocation type parameter, ensuring synchronous execution so that the database waits for the Lambda function to return a response before proceeding with subsequent operations. The following code illustrates this in practice:
-- Create function to generate embeddings using Lambda
CREATE OR REPLACE FUNCTION generate_embeddings_from_lambda(text_content TEXT)
RETURNS vector(1536)
LANGUAGE plpgsql
AS $$
DECLARE
lambda_response JSON;
embedding_vector vector(1536);
BEGIN
-- Invoke Lambda function synchronously (RequestResponse)
SELECT payload FROM aws_lambda.invoke(
aws_commons.create_lambda_function_arn('arn:aws:lambda:::function:embeddings_function_sync'),
json_build_object('inputText', text_content)::json,
'RequestResponse'
) INTO lambda_response;
SELECT (((lambda_response->>'body')::jsonb)->>'embedding')::vector
INTO embedding_vector;
RETURN embedding_vector;
END;
$$;
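The SQL above unwraps the Lambda payload as `payload->>'body'` and then reads its `embedding` key, so the Lambda function must return an API-Gateway-style envelope whose `body` is a JSON string. The following sketch shows the expected shape on both sides; `buildResponse` and `extractEmbedding` are illustrative helpers, not part of the repository code.

```typescript
// What the Lambda handler returns: a JSON string body wrapping the vector.
function buildResponse(embedding: number[]): { statusCode: number; body: string } {
  return {
    statusCode: 200,
    body: JSON.stringify({ embedding }),
  };
}

// Mirrors the SQL extraction: parse the body string, then take 'embedding'.
function extractEmbedding(payload: { body: string }): number[] {
  return JSON.parse(payload.body).embedding;
}
```

If the handler instead returned the embedding at the top level of the payload, the SQL would read it directly from `lambda_response` without the intermediate `body` step.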
The Lambda function code that follows invokes the Amazon Titan model through Amazon Bedrock. This pattern decouples the database from the large language model (LLM), allowing flexibility in utilizing various APIs and services for generating embedding vectors as needed:
async function generateEmbedding(text: string): Promise<number[]> {
console.log('generateEmbedding - Input text:', text);
const command = new InvokeModelCommand({
modelId: 'amazon.titan-embed-text-v2:0',
contentType: 'application/json',
accept: 'application/json',
body: JSON.stringify({
inputText: text
}),
});
const response = await bedrockClient.send(command);
const responseBody = JSON.parse(new TextDecoder().decode(response.body));
console.log('generateEmbedding - Embedding length:', responseBody.embedding.length);
console.log('generateEmbedding - First few values:', responseBody.embedding.slice(0, 5));
return responseBody.embedding;
}
This approach, utilizing database triggers with the aws_lambda extension, decouples embedding generation from core database functions while maintaining synchronous processing. This architecture allows for more sophisticated processing and improved error handling through Lambda, although it still faces transaction duration challenges and introduces Lambda-specific considerations such as cold starts. The following table outlines additional pros and cons of this second approach.
| Pros | Cons |
| Logic decoupling: Separates embedding generation logic from database code, allowing for independent updates and management of each component. | Transaction blocking: Still blocks database transactions while waiting for both Lambda execution and Amazon Bedrock API responses. |
| Enhanced processing capabilities: Enables more complex preprocessing and postprocessing operations in Lambda using full programming languages rather than PL/pgSQL. | Lambda cold starts: Introduces additional latency when Lambda functions need to initialize from cold starts, particularly with infrequent writes. |
| Improved monitoring: Offers better error handling and observability through Amazon CloudWatch logs, metrics, and alarms for operational insights. | Timeout risks: Database operations may fail if Lambda execution exceeds configured timeouts during embedding generation. |
| | Additional configuration: Requires setup of IAM permissions, VPC configurations, and network access between the database and Lambda. |
Approach 3: Database triggers with the aws_lambda extension (asynchronous)
This approach employs PostgreSQL triggers to invoke a Lambda function asynchronously, which subsequently generates embeddings and writes them back to the database. The workflow is illustrated in the accompanying diagram.
In this scenario, the database trigger invokes the Lambda function asynchronously, allowing it to return immediately without blocking the database transaction. Consequently, database operations can proceed without waiting for Amazon Bedrock to complete the embedding vector generation. This approach is advantageous when minimizing overhead in database transactions is a priority.
Once Amazon Bedrock generates the vector embeddings, the Lambda function utilizes the Amazon RDS Data API to write the results to the document_embeddings table. For a detailed understanding of this workflow, refer to the following sequence diagram, which illustrates each step of the embedding process.
The Lambda function writes the vector embeddings through the Amazon RDS Data API, an HTTP-based interface for accessing Amazon RDS databases without managing connections, so no persistent database connection is required. The Data API also simplifies authentication through IAM instead of managed credentials, conserves database resources during idle periods, and enhances scalability.
To see this in action, here’s the Lambda function implementation code:
async function updateDatabaseEmbedding(documentId: string, embedding: number[]): Promise<void> {
console.log('Updating database for document:', documentId);
const params = {
secretArn: process.env.DB_SECRET_ARN,
resourceArn: process.env.DB_CLUSTER_ARN,
database: process.env.DB_NAME,
sql: 'SELECT "03_rds_lambda_bedrock_async".update_document_embedding(:documentId::UUID, :embedding::vector)',
parameters: [
{
name: 'documentId',
value: { stringValue: documentId }
},
{
name: 'embedding',
value: { stringValue: `[${embedding.join(',')}]` }
}
]
};
console.log('RDS Params:', JSON.stringify(params, null, 2));
try {
const command = new ExecuteStatementCommand(params);
await rdsClient.send(command);
console.log('Successfully updated embedding for document:', documentId);
} catch (error) {
console.error('Error updating database:', error);
throw error;
}
}
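Because the Data API has no native vector parameter type, the embedding travels as the text literal that pgvector accepts (`[v1,v2,...]`) and is cast with `::vector` inside the SQL statement. A small guard like the following (`vectorParameter` is a hypothetical helper, not in the repository) can build the literal and verify the dimensionality up front, since a wrong-sized vector would otherwise only fail later inside the database.

```typescript
// Build the pgvector text literal and fail fast if the embedding does not
// have the expected dimensionality (1536 for Titan Text Embeddings V2).
function vectorParameter(embedding: number[], expectedDims = 1536): string {
  if (embedding.length !== expectedDims) {
    throw new Error(`expected ${expectedDims} dimensions, got ${embedding.length}`);
  }
  return `[${embedding.join(',')}]`;
}
```

The `stringValue` in the `parameters` array above would then be `vectorParameter(embedding)` instead of an inline template string.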
This asynchronous Lambda approach prioritizes database performance by decoupling embedding generation from transaction processing. This creates a nonblocking architecture that significantly enhances write operation speed and eliminates timeout risks, although it introduces eventual consistency and more complex error handling patterns. This design is particularly well-suited for high-volume write scenarios where immediate embedding availability isn’t strictly required. The following table outlines additional pros and cons of this third approach.
| Pros | Cons |
| Nonblocking transactions: Database operations complete quickly without waiting for embedding generation, enhancing overall application responsiveness. | Eventual consistency: Data may temporarily exist without embeddings, creating a time window where vector search results might be incomplete or inaccurate. |
| Enhanced write performance: Database write operations complete faster, supporting higher throughput for content creation and updates. | Complex error handling: Managing and retrying failed Lambda invocations becomes more challenging without an immediate feedback loop. |
| Timeout elimination: No risk of transaction timeouts caused by Amazon Bedrock API latency since processing occurs after database commit. | Status tracking complexity: Monitoring embedding generation progress and handling edge cases where embeddings fail to generate becomes more difficult. |
| High-volume scalability: Better scalability for high-volume insert and update scenarios, with the embedding generation workload distributed over time. |
Approach 4: Amazon SQS queue with Lambda batch processing (asynchronous)
This approach leverages database triggers to send messages to an Amazon SQS queue, which are subsequently batch processed by a Lambda function that generates embeddings for multiple records simultaneously. The workflow is illustrated in the accompanying diagram.
In this solution, the trigger still invokes the Lambda function asynchronously, reducing transaction time after inserting text into the documents table. Incorporating Amazon SQS facilitates batch processing of LLM requests, enhancing the retry mechanism and allowing systems to operate independently while scaling during high-volume periods.
To implement this approach, you can refer to the repository instructions in the 04_rds_lambda_sqs section, which includes:
- AWS CDK code to implement this pattern.
- SQL code for triggers and procedures.
- TypeScript code for Lambda functions.
This architecture takes advantage of the flexible batching capabilities of Amazon SQS, allowing you to configure both the number of messages processed together and the time window for accumulating messages. You can tailor these parameters to your specific requirements, as demonstrated in the following AWS CDK code that creates a Lambda function with SQS integration:
// Create Consumer Lambda
const consumerFunction = new nodejs.NodejsFunction(this, 'ConsumerFunction', {
runtime: lambda.Runtime.NODEJS_LATEST,
handler: 'index.handler',
functionName: 'embeddings_function_consumer',
entry: path.join(__dirname, '../lambda/consumer.ts'),
// additional configurations
});
// Add SQS event source to consumer lambda with batching
consumerFunction.addEventSource(new lambdaEventSources.SqsEventSource(queue, {
batchSize: 10, // Process up to 10 messages per batch
maxBatchingWindow: cdk.Duration.seconds(30), // Wait up to 30 seconds to gather messages
}));
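On the consumer side, a handler that reports partial batch failures lets SQS redeliver only the messages that failed rather than the whole batch. This is a simplified sketch: the record shape is a stand-in for the real `SQSEvent` types, and `processOne` is a placeholder for the Bedrock call plus database write.

```typescript
// Minimal stand-in for an SQS record; the real event carries more fields.
interface SqsRecord { messageId: string; body: string; }

// Process each message; collect IDs of failures so only those are retried.
async function handleBatch(
  records: SqsRecord[],
  processOne: (body: string) => Promise<void>,
): Promise<{ batchItemFailures: { itemIdentifier: string }[] }> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of records) {
    try {
      await processOne(record.body);
    } catch {
      // Reporting the message ID makes SQS redeliver just this message.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

For the partial-failure response to take effect, the event source mapping must have `reportBatchItemFailures: true` set on the `SqsEventSource`; otherwise a thrown error fails the entire batch.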
This architecture optimizes for scale and resilience by introducing message queuing and batch processing. It can enhance cost-efficiency and throughput when managing high volumes of embedding requests while providing robust error handling through Amazon SQS’s built-in retry mechanisms. However, the trade-off involves increased latency between content creation and embedding availability, making it more suitable for high-scale production systems where operational robustness is paramount. The following table outlines additional pros and cons of this fourth approach.
| Pros | Cons |
| Scalability: Highly scalable and resilient architecture capable of managing production workloads with traffic spikes and sustained high volume. | Increased embedding latency: Longer delays between data insertion and embedding availability due to queuing and batch processing. |
| Efficient resource usage: Batching embedding requests to Amazon Bedrock reduces API calls and optimizes for cost and throughput limits. | Operational overhead: Additional AWS services increase monitoring requirements and operational complexity. |
| Built-in resilience: Amazon SQS provides automatic retry mechanisms with configurable visibility timeouts and dead-letter queue (DLQ) support. | Integration complexity: Requires careful monitoring and DLQ configuration to handle error cases appropriately. |
| Cost optimization: More cost-effective for high-volume scenarios through batching and efficient resource utilization. | |
| Workload management: Decoupling components allows for better rate limiting and load management across the system. | |
Approach 5: Periodic updates scheduled with the pg_cron extension (asynchronous)
This approach employs pg_cron to schedule periodic jobs that check for new or modified records and generate embeddings in batches. pg_cron is an open-source, cron-based job scheduler for PostgreSQL that operates as an extension of the database. The workflow is illustrated in the accompanying diagram.
In this approach, documents are written with a status column set to PENDING to mark them for future processing. A pg_cron job is scheduled to run periodically (every two minutes in this case, but this is configurable) to perform the following actions:
- Fetch the rows requiring processing (SELECT … WHERE processing_status = 'PENDING' … LIMIT batch_size FOR NO KEY UPDATE SKIP LOCKED). The job can be configured to fetch a limited number of rows (LIMIT clause) to standardize the batch size.
- Mark the status of the fetched rows as PROCESSING.
- Generate embedding vectors using Amazon Bedrock.
- Update the document_embeddings table with the resulting embedding values.
- Update the documents table status to COMPLETED for each processed row.
This approach introduces batch processing on the database side, which can scale further by scheduling additional cron jobs. Since this implementation does not utilize database triggers, it has no performance impact on the original database transactions that write documents. However, this approach also has drawbacks, which are summarized in the pros and cons table at the end of this section.
For implementation details of this setup, see the SQL code available in the 05_rds_polling section of the repository.
The following code illustrates how we configured the pg_cron schedule to execute every two minutes. pg_cron offers flexible scheduling options to customize job periodicity. You can use standard cron expressions, conveniently named schedules such as @hourly or @daily, or simple interval syntax like 1 minute or 1 day. You can adjust these parameters through the cron.schedule function to control execution frequency based on your workload requirements and processing windows.
Additionally, the pg_cron extension provides various administrative commands that allow you to:
- View currently scheduled jobs.
- Examine job execution outputs and details.
- Remove scheduled jobs when necessary.
-- Schedule the job to run every 2 minutes
SELECT cron.schedule('process_embeddings', '*/2 * * * *', 'SELECT "05_rds_polling".process_embedding_queue()');
-- To check the scheduled job:
SELECT * FROM cron.job;
-- To check job runs:
SELECT * FROM cron.job_run_details;
-- To stop the job:
SELECT cron.unschedule('process_embeddings');
As previously mentioned, we utilize a dedicated function to retrieve all rows with PENDING status, iterating through them to generate embedding vectors for each document. Throughout this process, we track any processing errors, updating document statuses and maintaining an error counter. If errors are detected upon completion, we use the PostgreSQL RAISE function to notify administrators of issues. In a production environment, you would likely implement more sophisticated error handling strategies—such as an automatic retry mechanism with exponential backoff—alerting systems, or dedicated error logging to ensure reliable processing of all documents.
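To illustrate the backoff idea mentioned above, here is a generic retry sketch; `withRetry` is a hypothetical helper, not part of the repository code, and the sleep function is injectable so the backoff schedule can be inspected without real waiting.

```typescript
// Retry fn up to maxAttempts times, doubling the delay after each failure:
// baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

In practice you would also cap the maximum delay and add jitter so that many failing workers do not retry in lockstep against the Amazon Bedrock API.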
Let’s examine the complete implementation of the database function to better understand these concepts:
-- Function to process a batch of pending documents
CREATE OR REPLACE FUNCTION process_embedding_queue(batch_size INT DEFAULT 5)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
doc_record RECORD;
processed_count INT := 0;
error_count INT := 0;
BEGIN
SET search_path TO "05_rds_polling", public;
FOR doc_record IN
SELECT id
FROM documents
WHERE processing_status = 'PENDING'
ORDER BY created_at ASC
LIMIT batch_size
FOR NO KEY UPDATE SKIP LOCKED
LOOP
BEGIN
-- Update row setting status as 'PROCESSING'
UPDATE documents
SET processing_status = 'PROCESSING'
WHERE id = doc_record.id;
-- Generate embedding for each document
PERFORM generate_embedding(doc_record.id);
processed_count := processed_count + 1;
EXCEPTION WHEN OTHERS THEN
-- Log error and continue with next document
RAISE NOTICE 'Error processing document %: %', doc_record.id, SQLERRM;
error_count := error_count + 1;
-- Update document status to ERROR
UPDATE documents
SET processing_status = 'ERROR'
WHERE id = doc_record.id;
-- Continue with next document
CONTINUE;
END;
END LOOP;
-- Log processing summary
IF processed_count > 0 OR error_count > 0 THEN
RAISE NOTICE 'Embedding processing complete. Successfully processed: %, Errors: %',
processed_count, error_count;
END IF;
END;
$$;
The pg_cron approach offers a balanced solution that prioritizes database performance and operational simplicity over real-time consistency. By processing embeddings in scheduled batches, this method reduces API pressure and provides robust error handling while keeping the entire solution within the database ecosystem. The primary trade-off is increased latency between content updates and embedding availability, making it well-suited for systems where near real-time vector search is acceptable and operational simplicity is valued. The following table outlines additional pros and cons of this fifth approach.
| Pros | Cons |
| Self-contained architecture: Simplifies implementation with minimal external dependencies, keeping the entire solution within the database. | Increased update latency: Higher delay between data changes and embedding updates based on scheduled frequency. |
| Efficient batch processing: Improves throughput and cost-efficiency with the Amazon Bedrock API through optimized batch requests. | Database load impact: Resource-intensive periodic scans may affect database performance during busy periods. |
| Robust error management: Built-in error handling and retry logic with detailed status tracking for operational visibility. | Query complexity: Requires carefully optimized queries to efficiently identify changed records without excessive table scanning. |
| Processing control: Fine-grained control over processing frequency and batch size to balance resource usage and latency. | |
| API protection: Reduced risk of rate limiting or throttling through controlled, predictable API call patterns. | |
Decision tree
After reviewing these five approaches for generating embeddings in Aurora PostgreSQL-Compatible, you may wonder which method is best suited for your specific use case. Here’s a practical decision tree to assist you in making your choice:
Remember, you can always start with a simpler approach, such as Approach 1 or 5, and evolve into a more sophisticated solution as your needs grow. The primary differences lie in how they handle scaling, reliability, and operational complexity.
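One possible encoding of this selection logic, derived from the pros and cons discussed above, is shown below. The requirement flags and the mapping are illustrative; the decision tree in the diagram may weigh factors differently for your workload.

```typescript
// Illustrative requirement flags for picking one of the five approaches.
interface Requirements {
  realTimeConsistency: boolean; // embeddings must be in sync at commit time
  highWriteVolume: boolean;     // sustained spikes of inserts and updates
  customProcessing: boolean;    // pre/post-processing beyond PL/pgSQL
}

// Returns the approach number (1-5) as discussed in this post.
function chooseApproach(req: Requirements): number {
  if (req.realTimeConsistency) {
    // Synchronous options: aws_ml trigger (1) or Lambda trigger (2).
    return req.customProcessing ? 2 : 1;
  }
  if (req.highWriteVolume) {
    // Queue-based batching (4) absorbs spikes and adds retry resilience.
    return 4;
  }
  // Async Lambda (3) for event-driven updates; pg_cron (5) when a
  // scheduled batch is the simplest fit.
  return req.customProcessing ? 3 : 5;
}
```

Because the approaches share the same table schema and embedding model, moving between them later mostly means swapping the trigger and orchestration layer, not the data.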
Note: Regarding performance, indexes are crucial in this context. Depending on the type of index you choose for the vector embeddings, periodic maintenance may be necessary to keep those indexes performing optimally; otherwise you may see degraded query performance or lower recall for semantic searches.
For implementation details and code examples, refer to our GitHub repository and the additional resources listed in this post.
Clean up
To avoid unnecessary costs, please clean up the AWS resources you deployed as part of implementing the approaches discussed in this post:
- Delete the AWS CloudFormation stacks in the CloudFormation console.
- Delete any additional resources you created.