Supercharging vector search performance and relevance with pgvector 0.8.0 on Amazon Aurora PostgreSQL

June 1, 2025

Efficient vector similarity search has emerged as a cornerstone for the successful implementation of semantic search, recommendation systems, and Retrieval Augmented Generation (RAG). The Amazon Aurora PostgreSQL-Compatible Edition has recently integrated support for pgvector 0.8.0, which significantly enhances vector search capabilities, positioning Aurora as a prime choice for AI-driven applications leveraging PostgreSQL that require semantic search and RAG.

This article delves into how pgvector 0.8.0 on Aurora PostgreSQL-Compatible achieves up to 9x faster query processing and returns up to 100x more complete result sets for filtered searches, effectively addressing the scaling challenges faced by enterprise AI applications when implementing vector search at scale.

pgvector 0.8.0 improvements

As vector databases become integral infrastructure components, the ability to perform effective vector searches is essential for powering semantic applications. As organizations expand their AI applications to handle millions or even billions of vectors, the limitations of previous vector search implementations become increasingly evident. pgvector 0.8.0 introduces several crucial enhancements that directly address these production challenges, particularly when executing filtered queries against large datasets:

  • Performance improvements – pgvector 0.8.0 boasts up to a 5.7x improvement in query performance for specific query patterns compared to version 0.7.4. Detailed exploration of these enhancements follows.
  • Complete result sets – The new iterative_scan feature in 0.8.0 enhances recall for filter queries requiring an approximate nearest neighbor (ANN) index search, a vital improvement over earlier versions that could yield incomplete results.
  • Enhanced query planning – Improved cost estimation in 0.8.0 facilitates more efficient execution paths, such as selecting a traditional index like a B-tree for complex filtered searches.
  • Flexible performance tuning – The introduction of iterative_scan with two modes—relaxed_order and strict_order—lets you tune the trade-off between accuracy and performance.

Challenges of overfiltering

Understanding the significance of this release necessitates an awareness of a fundamental challenge many developers face when transitioning to production with vector searches. In previous iterations of pgvector, combining vector similarity search with traditional SQL filters resulted in filtering occurring after the vector index scan had completed. This led to a phenomenon known as overfiltering, where queries returned fewer results than anticipated, or even none at all. This approach also introduced performance and scalability issues, as the system would retrieve numerous vectors only to discard most during filtering.

Consider an e-commerce service with millions of product embeddings. When searching for “summer dresses” with filters for “women’s clothing” and “size medium,” earlier versions of pgvector would execute the following steps:

  1. Scan the vector index to identify the nearest neighbors to “summer dresses.”
  2. Apply SQL filters such as category = “women’s clothing” and size = “medium” to those neighbors.
  3. Return the remaining results, which could be insufficient or even empty, especially if the filters matched only a small fraction of the data.

The HNSW (Hierarchical Navigable Small World) indexing algorithm in pgvector accelerates vector similarity searches by creating a multi-layered graph structure where vectors are interconnected with their nearest neighbors, enabling efficient navigation through the vector space. With an HNSW index using default search settings (hnsw.ef_search = 40), if only 10% of the data matched the filter, approximately four usable results would be returned, regardless of the number of relevant vectors stored.
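To make that arithmetic concrete, here is a minimal sketch (hnsw.ef_search is a real pgvector setting; the table, filter, and vector placeholder are illustrative):

-- Default candidate list size: the index scan retrieves about 40 nearest neighbors
SET hnsw.ef_search = 40;

-- If only ~10% of rows pass the filter, roughly 40 x 0.10 = 4 rows survive,
-- regardless of how many matching products actually exist in the table
SELECT id
FROM products
WHERE category = 'women''s clothing'
ORDER BY embedding <=> '[vector for "summer dresses"]'
LIMIT 20;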

Iterative index scans

pgvector 0.8.0 introduces iterative index scans, which significantly enhance query reliability and performance in filtered vector searches. The process unfolds as follows:

  1. Scan the vector index.
  2. Apply any filters (e.g., metadata conditions).
  3. Check if enough results meet both the vector similarity and filter criteria.
  4. If not, continue scanning incrementally until either the required number of matches is found or a configurable limit is reached.

This method prevents premature termination due to overly strict filters (a common issue in prior versions), reducing false negatives and enhancing performance by avoiding full rescans or returning too few results. This feature is particularly beneficial for production-grade vector search applications with complex filtering requirements. For instance, let’s create a table with sample product data:

CREATE TABLE products (
    id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    title TEXT,
    description TEXT,
    category TEXT,
    embedding VECTOR(384)
);

-- Create an index on the embedding column using HNSW
CREATE INDEX products_embedding_idx ON products USING hnsw (embedding vector_cosine_ops);

-- Also create an index on the category column for efficient filtering
CREATE INDEX ON products (category);

Now, imagine we have populated this table with tens of millions of product embeddings from various categories. When a user searches for products similar to “comfortable hiking boots” but wants only items from the outdoor gear category, they would execute a query like the following:

SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 20;

Before pgvector 0.8.0

With previous versions, if you had 10 million products but only 50,000 (0.5%) fell into the outdoor gear category, the default HNSW scan would likely return only a few results, missing many relevant products. The workarounds were far from optimal:

  • Increase hnsw.ef_search to scan more vectors (which negatively impacted performance).
  • Create separate indexes for each category (complex to maintain).
  • Implement application-level paging (adding unnecessary complexity).

With pgvector 0.8.0 on Aurora PostgreSQL

Let’s enable iterative scanning and observe the difference:

-- Enable iterative scanning
SET hnsw.iterative_scan = 'relaxed_order';

-- Run the same query
SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 20;

Now, pgvector automatically continues scanning the index until it finds enough results to satisfy the query, ensuring users see a complete and relevant set of results while maintaining performance. The threshold for “enough” is configurable, allowing control over how many tuples the system will scan before stopping. For HNSW indexes, this is governed by the hnsw.max_scan_tuples parameter, which defaults to 20,000. Adjusting this based on your dataset and performance goals can yield fine-grained control over the trade-off between recall (the percentage of relevant results actually found) and performance during filtered vector searches.
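The scan ceiling can be raised for workloads that need deeper searches; for example (the value 40,000 is purely illustrative):

-- Allow the iterative scan to visit up to 40,000 tuples before stopping
SET hnsw.max_scan_tuples = 40000;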

Note: When using relaxed_order, a final reorder operation may be necessary to ensure proper ordering:

SELECT * FROM (
  -- original query
) p ORDER BY p.distance * 1;  -- multiplying by 1 forces an explicit re-sort of the returned rows
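An equivalent pattern uses a materialized CTE so the final ORDER BY re-sorts the approximately ordered results; a sketch following the approach shown in the pgvector documentation:

WITH relaxed_results AS MATERIALIZED (
    SELECT id, embedding <=> '[vector for "comfortable hiking boots"]' AS distance
    FROM products
    WHERE category = 'outdoor gear'
    ORDER BY distance
    LIMIT 20
)
SELECT * FROM relaxed_results ORDER BY distance;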

Configuration options for iterative scanning

pgvector 0.8.0 provides three modes for iterative scanning:

  • off – Traditional behavior, no iterative scanning (default).
  • strict_order – Iteratively scan while preserving exact distance ordering.
  • relaxed_order – Iteratively scan with approximate ordering (better performance).

For most production use cases, relaxed_order strikes the best balance between performance and accuracy. This mode allows pgvector to prioritize speed by returning results as they are discovered rather than sorting them perfectly, significantly reducing query latency while typically maintaining 95-99% of result quality compared to strict ordering. In scenarios where sub-second response times are critical (such as recommendation systems and semantic search), this trade-off offers substantial performance gains with minimal impact on user experience. Additionally, the hnsw.scan_mem_multiplier parameter can be configured to enhance recall, specifying the maximum amount of memory to use as a multiple of work_mem (default is 1).
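For example, to give the scan phase more working memory (both values are illustrative and should be tuned for your instance class):

-- Let iterative index scans use up to 2 x work_mem
SET work_mem = '64MB';
SET hnsw.scan_mem_multiplier = 2;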

Scaling RAG applications on Aurora PostgreSQL-Compatible

To illustrate how these improvements affect a real-world RAG application, consider an online marketplace with 10 million products, each represented by a 384-dimensional vector embedding derived from product descriptions. Customers can search across the entire catalog or filter by category, price range, or rating. With previous versions of pgvector, filtered searches often missed relevant products unless parameters were meticulously tuned for each query pattern. However, with pgvector 0.8.0 on Aurora PostgreSQL-Compatible, the database automatically adjusts to produce complete results.

Benchmark setup

We created a synthetic dataset of 10 million products with realistic e-commerce characteristics spanning multiple categories. For reproducibility, here’s how we generated the dataset:

Data Generation Process

  1. Product Metadata Generation: Using a Python script with libraries like faker and numpy, we generated realistic product metadata:
import pandas as pd
import numpy as np
from faker import Faker
from sentence_transformers import SentenceTransformer
import random

# Initialize faker for generating realistic text
fake = Faker()
# Initialize the sentence transformer model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Define product categories with realistic distribution
categories = {
    'electronics': 0.20,  # 20% of products
    'clothing': 0.25,
    'home_goods': 0.15,
    'beauty': 0.10,
    'books': 0.05,
    'toys': 0.05,
    'sports': 0.05,
    'grocery': 0.10,
    'office': 0.05
}

# Generate synthetic product data
num_products = 10_000_000  # 10 million products
products = []
for i in range(num_products):
    # Select category based on distribution
    category = np.random.choice(
        list(categories.keys()), 
        p=list(categories.values())
    )
    
    # Generate title with category context
    title = fake.catch_phrase()
    
    # More detailed description
    description = fake.paragraph(nb_sentences=3)
    
    # Add some "smart" products for testing Query D
    if i % 50 == 0:  # 2% of products have "smart" in the title
        title = "Smart " + title
        
    products.append({
        'id': i+1,
        'title': title,
        'description': description,
        'category': category,
        # Embedding will be added in the next step
    })
    
    # Print progress
    if i % 100000 == 0:
        print(f"Generated {i} products")

# Convert to DataFrame
df = pd.DataFrame(products)
  2. Embedding Generation: We generated 384-dimensional embeddings using the all-MiniLM-L6-v2 SentenceTransformer model:
# Generate embeddings in batches to manage memory
df['embedding'] = None  # create an object-dtype column so each cell can hold a Python list
batch_size = 1000
for i in range(0, len(df), batch_size):
    end = min(i + batch_size, len(df))
    batch = df.iloc[i:end]
    
    # Generate embeddings from title + description
    texts = [f"{row.title}. {row.description}" for _, row in batch.iterrows()]
    embeddings = model.encode(texts)
    
    # Store embeddings
    for j, embedding in enumerate(embeddings):
        df.at[i+j, 'embedding'] = embedding.tolist()
    
    print(f"Generated embeddings for products {i} to {end}")
  3. Data Loading to PostgreSQL: We utilized PostgreSQL’s COPY command for efficient data loading:
# Export data to CSV file (without embeddings column for faster export)
csv_file = "product_metadata.csv"
df[['id', 'title', 'description', 'category']].to_csv(csv_file, index=False)

# Export embeddings separately as binary file for efficient loading
embedding_file = "product_embeddings.binary"
with open(embedding_file, 'wb') as f:
    for _, row in df.iterrows():
        embedding = np.array(row['embedding'], dtype=np.float32)
        f.write(embedding.tobytes())

-- SQL to load data
CREATE TABLE products (
    id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    title TEXT,
    description TEXT,
    category TEXT,
    embedding vector(384)
);

-- Load metadata using COPY
COPY products (id, title, description, category)
FROM '/path/to/product_metadata.csv' 
CSV HEADER;

-- Load embeddings using a custom (user-defined) function that reads the binary data
SELECT load_embeddings('/path/to/product_embeddings.binary', 'products', 'embedding', 384);
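Note that load_embeddings is a user-defined helper, not a built-in. Because pgvector’s vector type accepts text input such as '[0.12,0.53,...]', a simpler (if bulkier) alternative is to serialize embeddings as text in the CSV and load everything with a single COPY; a sketch under that assumption:

-- Assumes the CSV's embedding column holds text like "[0.12,0.53,...]"
COPY products (id, title, description, category, embedding)
FROM '/path/to/products_with_embeddings.csv'
CSV HEADER;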

The dataset we generated exhibited the following characteristics:

  • 10 million products across 9 categories with a realistic distribution.
  • 384-dimensional embeddings generated from product titles and descriptions.
  • 2% of products containing “smart” in the title for filtered query testing.
  • Natural language text generated using Faker to ensure variety and realistic content.

Additionally, a B-tree index on the category column was included to optimize filter operations commonly used in vector similarity searches. This dataset mirrors what organizations build for comprehensive product search systems. This setup can be reproduced using the code snippets above, adjusting the scale as needed for your testing environment.

For these tests, we used a product catalog schema:

CREATE TABLE products (
    id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    title TEXT,
    description TEXT,
    category TEXT,
    embedding vector(384)
);

-- Create HNSW index for vector similarity search
CREATE INDEX products_embedding_idx ON products 
    USING hnsw (embedding vector_cosine_ops);

-- Create index on category for efficient filtering
CREATE INDEX ON products (category);

We executed the following sample queries:

  • Query A – Basic search (top 10):
SELECT 
    id, 
    title, 
    description, 
    embedding <=> %s::vector AS distance
FROM products
ORDER BY distance
LIMIT 10;
  • Query B – Large result set (top 1,000):
SELECT 
    id, 
    title, 
    description, 
    embedding <=> %s::vector AS distance
FROM products
ORDER BY distance
LIMIT 1000;
  • Query C – Category-filtered search:
SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 20;
  • Query D – Complex filtered search (category filter plus a keyword condition on the title, targeting the “smart” products seeded in the dataset):
SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
  AND title ILIKE '%smart%'
ORDER BY distance
LIMIT 20;
  • Query E – Very large result set (top 10,000):
SELECT 
    id, 
    title, 
    description, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products
ORDER BY distance
LIMIT 10000;

Testing methodology

The benchmark was designed to replicate real-world vector search scenarios while providing consistent measurements:

  • Infrastructure – Two separate Aurora PostgreSQL clusters running on db.r8g.4xlarge instances (powered by AWS Graviton4 processors).
  • Dataset – 10 million products with 384-dimensional embeddings.
  • Index configuration – HNSW indexes with identical parameters across tests for fair comparison.
  • Cache management – Buffer cache cleared between tests to provide consistent cold-start performance.
  • Query runs – Queries A, B, and C were executed 100 times each, while the more intensive Queries D and E were run 20 and 5 times, respectively. Reported latency values are averages across runs, minimizing the impact of outliers.
  • Test configurations – We employed the following configurations, applied as session settings (see the sketch after this list):
    • 0.7.4 baseline: ef_search=40
    • 0.7.4: ef_search=200
    • 0.8.0 baseline: ef_search=40, iterative_scan=off
    • 0.8.0: ef_search=40, iterative_scan=strict_order
    • 0.8.0: ef_search=40, iterative_scan=relaxed_order
    • 0.8.0: ef_search=200, iterative_scan=strict_order
    • 0.8.0: ef_search=200, iterative_scan=relaxed_order
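Each configuration is just a pair of session-level settings; for example, the final configuration in the list corresponds to:

SET hnsw.ef_search = 200;
SET hnsw.iterative_scan = 'relaxed_order';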

Performance improvements

Our performance tests revealed significant enhancements with pgvector 0.8.0 across various query patterns. The following table displays p99 latency measurements (in milliseconds) for different configurations.

Query Type | 0.7.4 baseline (ef_search=40) | 0.7.4 (ef_search=200) | 0.8.0 best config | Best configuration | Improvement
A | 123.3 ms | 394.1 ms | 13.1 ms | ef_search=40, relaxed_order | 9.4x faster
B | 104.2 ms | 341.4 ms | 83.5 ms | ef_search=200, relaxed_order | 1.25x faster
C | 128.5 ms | 333.4 ms | 85.7 ms | ef_search=200, relaxed_order | 1.5x faster
D | 127.4 ms | 318.6 ms | 70.7 ms | ef_search=200, relaxed_order | 1.8x faster
E | 913.4 ms | 427.4 ms | 160.3 ms | ef_search=200, relaxed_order | 5.7x faster

The performance improvements with pgvector 0.8.0 were substantial across various query patterns, even at the scale of 10 million products. For typical e-commerce queries that search within specific categories for products matching certain criteria, runtime decreased from over 120 milliseconds with pgvector 0.7.4 to just 70 milliseconds with 0.8.0, while returning more comprehensive results. Notably, pgvector 0.8.0’s enhanced cost estimation capabilities automatically selected more efficient execution plans. In our filtered query tests, the planner accurately estimated costs, leading to better execution plan selections and more complete result sets.

Recall and result completeness enhancements

Beyond raw performance, the substantial improvement in result quality when working with millions of vectors is noteworthy. Our tests demonstrated significant differences in result completeness. Here, recall is the fraction of expected results actually returned, with 100% indicating perfect recall:

Query | 0.7.4 baseline (ef_search=40) | 0.7.4 (ef_search=200) | 0.8.0 with strict_order | 0.8.0 with relaxed_order
Category-filtered search | 10% | 0% | 100% | 100%
Complex filtered search | 1% | 0% | 100% | 100%
Very large result set | 5% | 5% | 100% | 100%

For highly selective queries (such as products in a specific category), pgvector 0.7.4 returned only a fraction of the requested results. With iterative scanning enabled in 0.8.0, we observed up to a 100 times improvement in result completeness, substantially enhancing the user experience. The following query pattern tested these improvements:

SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 20;

Different iterative scan modes and ef_search values

A detailed comparison of various pgvector 0.8.0 configurations was conducted to understand the trade-offs between different iterative scan modes and ef_search values.

Configuration | Query A (top 10) | Query B (top 1000) | Query C (filtered) | Query D (complex) | Query E (large)
0.8.0 baseline (ef_search=40, iterative_scan=off) | 19.3 ms | 18.8 ms | 20.0 ms | 15.7 ms | 99.8 ms
0.8.0 (ef_search=40, iterative_scan=strict_order) | 18.1 ms | 277.9 ms | 197.1 ms | 203.2 ms | 344.0 ms
0.8.0 (ef_search=40, iterative_scan=relaxed_order) | 13.1 ms | 164.1 ms | 150.8 ms | 99.1 ms | 397.9 ms
0.8.0 (ef_search=200, iterative_scan=strict_order) | 28.8 ms | 133.7 ms | 128.5 ms | 57.9 ms | 207.6 ms
0.8.0 (ef_search=200, iterative_scan=relaxed_order) | 30.7 ms | 83.5 ms | 85.7 ms | 70.7 ms | 160.3 ms

This breakdown illustrates how different combinations affect performance across query types. For simple queries (A), a lower ef_search with relaxed_order yields the best performance. For complex filtered queries (C, D) and large result sets (B, E), higher ef_search values with relaxed_order typically provide the best balance of performance and completeness. The relaxed_order mode significantly enhances performance for most query types while still delivering complete result sets. For applications where exact distance ordering is less critical (such as product recommendations), this mode offers an excellent balance of performance and result quality.

Enhanced cost estimation and query planning

Cost estimation in PostgreSQL pertains to how the database predicts the computational resources (primarily CPU time and memory) required to execute a query. The query planner relies on these cost estimates to determine the most efficient execution path. The query planning with pgvector 0.8.0 demonstrates significant improvements in cost estimation accuracy and planning decisions. These enhancements empower PostgreSQL to make smarter choices regarding when to utilize vector indexes versus sequential scans, resulting in faster query execution, especially for complex queries that combine vector similarity with traditional filters.

To illustrate this, let’s examine how both versions plan a filtered query (Query C) using EXPLAIN ANALYZE. Under pgvector 0.7.4 (category filter):

EXPLAIN ANALYZE
SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 10;

The same analysis under pgvector 0.8.0 with iterative_scan=relaxed_order:

SET hnsw.iterative_scan = 'relaxed_order';

EXPLAIN ANALYZE
SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 10;

These query plans reveal several key improvements in 0.8.0:

  • More realistic startup costs – The 0.8.0 planner estimates a startup cost of 7,224.63 cost units versus only 116.84 cost units in 0.7.4, which much better reflects the actual computational complexity of vector operations.
  • Better row estimation – The 0.8.0 planner estimates 1,017,000 filtered rows compared to 987,333 in 0.7.4, indicating a more accurate assessment of the filter’s selectivity.
  • Complete results – Most importantly, 0.8.0 returns the 10 requested rows, whereas 0.7.4 only found 6.
  • Efficient use of indexes – With the addition of a category index, both versions can efficiently filter results, but 0.8.0 is more thorough in its index traversal due to iterative scanning.

For complex filters (Query D), the differences are even more pronounced. Under pgvector 0.7.4 (complex filter):

EXPLAIN ANALYZE
SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
  AND title ILIKE '%smart%'
ORDER BY distance
LIMIT 100;

And under pgvector 0.8.0 with iterative_scan=relaxed_order:

SET hnsw.iterative_scan = 'relaxed_order';

EXPLAIN ANALYZE
SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
  AND title ILIKE '%smart%'
ORDER BY distance
LIMIT 100;

The key difference here is that while 0.7.4 stops after finding only 39 rows (despite requesting 100), the 0.8.0 planner with iterative scanning continues searching until it locates the 100 requested rows, achieving even better runtime. These examples illustrate how the enhanced cost estimation in pgvector 0.8.0 leads to superior execution strategies, particularly when combining vector searches with traditional database filters. The more accurate cost model assists the PostgreSQL optimizer in making smarter decisions about execution paths, resulting in both improved performance and complete result sets.

Scaling to production workloads

The Amazon Aurora I/O-Optimized cluster configuration provides enhanced price-performance and predictable pricing for I/O-intensive workloads, including e-commerce services, payment processing systems, recommendation systems, and RAG applications. This configuration enhances I/O performance with Aurora Optimized Reads through improved buffer cache management, increasing write throughput and lowering latency. For dynamic or variable workloads, Amazon Aurora Serverless v2 offers a production-ready, auto-scaling option that adjusts capacity in fine-grained increments—ideal for quick starts and elastic scaling without compromising performance or availability.

The capability of Aurora PostgreSQL-Compatible to scale read capacity through read replicas, combined with pgvector 0.8.0’s more efficient query processing, provides a robust foundation for enterprise-scale e-commerce applications. Businesses can now confidently develop semantic search, recommendation systems, and RAG applications that maintain high performance and result quality even as their product catalogs expand into millions or billions of vectors.

Semantic search systems

A semantic search use case might encompass product search, document retrieval, and content recommendation. Version 0.8.0 excels in the following aspects:

  • The noticeable speed improvements (up to 9.4 times faster for basic queries) facilitate real-time search experiences.
  • The relaxed_order mode is ideal for search interfaces where slight variations in result ordering are imperceptible to users.
  • Improved filtered queries (Queries C and D) enhance faceted or category-filtered search implementations.
  • Complete result sets ensure users see the most relevant items, unlike version 0.7.4, which often overlooked key results.

An example implementation might involve e-commerce product search where users expect sub-second results with filtering by product attributes.

Large-scale recommendation systems

A recommendation use case might include content recommendation, “similar items” features, and personalization. Version 0.8.0 provides the following advantages:

  • Much faster retrieval of larger result sets (Queries B and E) enables systems to fetch more candidates for postprocessing.
  • Lower latency supports real-time recommendations in high-traffic systems.
  • The performance on filtered queries bolsters contextual recommendations (for example, “similar products in this category”).
  • Enhanced recall fosters diversity in recommendations.

An example implementation might involve media streaming services that need to recommend thousands of content items from a catalog of millions in real time.
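A “similar items” lookup of this kind maps directly onto a single query; a minimal sketch against the products schema used earlier (product id 12345 is a hypothetical seed item):

-- Find the 50 products most similar to a given seed product
SELECT id, title
FROM products
WHERE id <> 12345
ORDER BY embedding <=> (SELECT embedding FROM products WHERE id = 12345)
LIMIT 50;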

RAG applications

A RAG use case might involve AI systems that retrieve relevant context before generating responses. Version 0.8.0 offers the following enhancements:

  • Lower latency improves end-to-end response time for AI systems.
  • Better performance on filtered queries enables domain-specific retrieval.
  • Complete result sets ensure the AI has access to the relevant context.
  • Relaxed ordering is ideal because RAG typically employs top-k retrieval where exact ordering isn’t critical.

An example implementation might involve enterprise AI assistants that query company knowledge bases to answer user inquiries.
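The retrieval step of such an assistant reduces to a filtered top-k query; a sketch using the products table from earlier as a stand-in for a knowledge base (the category filter and k value are illustrative):

SET hnsw.iterative_scan = 'relaxed_order';

-- Fetch the top 5 most relevant chunks for the user's question
SELECT id, title, description
FROM products
WHERE category = 'books'
ORDER BY embedding <=> '[embedding of the user question]'
LIMIT 5;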

Get started with pgvector 0.8.0 on Aurora PostgreSQL-Compatible

To begin utilizing pgvector 0.8.0, follow these steps:

  1. Launch a new Aurora PostgreSQL cluster running versions 17.4, 16.8, 15.12, 14.17, or 13.20 and higher.
  2. Connect to your DB cluster.
  3. After establishing a connection to your database, enable the extension:

CREATE EXTENSION IF NOT EXISTS vector;

  4. Confirm you’re running the latest version of pgvector:

SELECT extversion FROM pg_extension WHERE extname = 'vector';

Best practices for pgvector 0.8.0 on Aurora PostgreSQL-Compatible

When deploying pgvector 0.8.0 in production, consider the following best practices to balance performance, recall, and filtering accuracy:

  1. If you don’t need a vector index, don’t use it – For 100% recall and good performance with smaller datasets, a sequential scan might be more appropriate than a vector index. Only utilize vector indexes when you require the performance benefits for large datasets.

For instance, if you have a table with only 10,000 product embeddings, a sequential scan might actually be faster than using a vector index:

SELECT 
    title, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 20;

Creating vector indexes incurs overhead for maintenance and storage, which only becomes beneficial when your dataset grows large enough that sequential scans become prohibitively expensive.
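You can confirm which strategy the planner chooses with EXPLAIN; on a small, unindexed table you should see a sequential scan feeding a top-N sort:

EXPLAIN
SELECT title, category
FROM products
ORDER BY embedding <=> '[vector for "comfortable hiking boots"]'
LIMIT 20;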

  2. Indexing recommendations
    1. Use HNSW with recommended parameters to ensure high search quality and efficient index construction. The values below are pgvector’s defaults; increase m and ef_construction for higher recall at the cost of longer index builds:

-- Create the HNSW index with explicit build parameters
CREATE INDEX products_embedding_idx ON products
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
    2. Create additional indexes on commonly filtered metadata columns (e.g., category, status, org_id) to enhance performance during post-vector-filtering:

CREATE INDEX my_table_category_idx ON my_table(category);

  3. Query-time tuning (search parameters)

Depending on your use case, adjust these parameters to optimize for recall or performance:

    1. For maximum recall with filtering (such as strict compliance or analytical use cases):

-- Favor completeness: exact ordering and a deeper scan ceiling (100,000 is illustrative)
SET hnsw.iterative_scan = 'strict_order';
SET hnsw.ef_search = 200;
SET hnsw.max_scan_tuples = 100000;
    2. For best performance (e.g., interactive or latency-sensitive workloads):

-- Favor latency: approximate ordering with the default candidate list size
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 40;
    3. For balanced scenarios (e.g., general-purpose retrieval):

-- Approximate ordering with a larger candidate list
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 200;

These recommendations are domain-agnostic and should be tailored to your workload. As a general guideline:

  • Use strict_order when completeness is critical.
  • Use relaxed_order when latency is more important than recall.
  • Tune ef_search higher for complex filtering or larger graphs.

Additionally, consider the following operational best practices:

  • Graviton4-based instances (R8g series) – These instances deliver excellent vector operation performance. Start with r8g.large for development and testing, and scale to r8g.2xlarge or 4xlarge for production workloads.
  • Balance memory and performance – Higher values of hnsw.ef_search yield more accurate results but consume more memory.
  • Index your filter columns – Create standard PostgreSQL indexes on columns used in WHERE clauses.
  • Monitor and tune – Utilize Amazon CloudWatch Database Insights to identify and optimize slow vector queries.
  • Consider partitioning for very large tables – For billions of vectors, table partitioning can enhance both query performance and manageability (see the sketch after this list).
  • Configure iterative scanning appropriately – Start with relaxed_order and adjust the threshold based on your application’s needs.
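For the partitioning bullet above, a list-partitioned layout by category is one option; this is a sketch rather than a drop-in migration (note that the partition key must be part of the primary key):

CREATE TABLE products_partitioned (
    id bigint GENERATED BY DEFAULT AS IDENTITY,
    title TEXT,
    description TEXT,
    category TEXT,
    embedding vector(384),
    PRIMARY KEY (id, category)
) PARTITION BY LIST (category);

CREATE TABLE products_electronics PARTITION OF products_partitioned
    FOR VALUES IN ('electronics');

-- Each partition gets its own, smaller HNSW index
CREATE INDEX ON products_electronics USING hnsw (embedding vector_cosine_ops);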