How LeadSquared accelerated chatbot deployments with generative AI using Amazon Bedrock and Amazon Aurora PostgreSQL | Amazon Web Services

LeadSquared is a cutting-edge SaaS CRM platform offering comprehensive sales, marketing, and onboarding solutions. Designed for industries like BFSI, healthcare, education, real estate, and more, LeadSquared delivers a customized approach for businesses of all sizes. The LeadSquared Service CRM extends beyond basic ticketing by providing centralized support through their Converse omnichannel communications platform, personalized interactions, AI-driven ticket routing, and data-driven insights.

Within Service CRM, the Converse multi-channel messaging platform enables real-time conversations with leads and customers via WhatsApp, SMS, and chatbots. Users of Converse sought quicker onboarding for chatbot functionality and more relevant responses based on their business data. Accelerating this onboarding process posed several challenges, including training the bot to handle frequently asked questions, understanding the customer domain, identifying high-volume yet low-value queries, and managing dialogue effectively. Insufficient training data could lead to chatbots misinterpreting user intent and struggling with unexpected questions. Additionally, dialogue management required careful consideration of user context in responses.

LeadSquared adopted an approach using a large language model (LLM) augmented with customer-specific data to enhance chatbot response quality and streamline the onboarding process. The solution was built using a combination of an Amazon Aurora PostgreSQL-Compatible Edition database, the pgvector extension of PostgreSQL, the supported LLMs in Amazon Bedrock, and Retrieval Augmented Generation (RAG). LeadSquared already uses Amazon Aurora to store essential data, making it logical to use Aurora for storing vector embeddings and performing hybrid searches as needed. Amazon Bedrock offers foundation models (FMs) from Amazon and leading AI startups through an API, allowing LeadSquared to experiment with and select the best models for their needs. The API-based pricing model of Amazon Bedrock provided an affordable scaling solution without the burden of managing the infrastructure for hosting LLMs independently.

The solution retrieves data from outside the language model, such as videos, help documents, case history, existing FAQs, and their knowledge base. It then augments the prompt with the pertinent retrieved data as context, which improves the LLM’s responses to users. Consequently, LeadSquared can now offer easier chatbot setup, provide a more personalized experience based on end customer-specific data, better understand user intent, improve dialogue management, and automate repetitive tasks.

Prashant Singh, COO and Cofounder at LeadSquared, says, “The integration of RAG capabilities using Amazon Aurora PostgreSQL with the pgvector extension and LLMs available in Amazon Bedrock has empowered our chatbots to deliver natural language responses to out-of-domain inquiries, enhanced dialogue management, and reduced our manual efforts. Consequently, we have observed a 20% improvement in customer onboarding times.”

This post shows how to build a chatbot similar to LeadSquared’s chatbot. We demonstrate how to use domain-specific knowledge from multiple document formats, including PDFs, videos, text files, and presentations, to produce more effective textual prompts for the underlying generative AI models. Additionally, we show how to code such a chatbot using the pgvector extension and LLMs available in Amazon Bedrock. The built-in PostgreSQL pgvector extension in Aurora facilitates the storage of vector embeddings, giving the system semantic search capabilities. The Amazon Bedrock APIs play a dual role: generating vector embeddings and producing pertinent responses to user queries using domain-specific knowledge.

Solution overview

Suppose that your company deploys a Q&A bot on your website. When a customer reaches out with a specific query or issue, the bot retrieves relevant information from your customer database, product manuals, FAQs, and previous support interactions. Your chatbot application uses this data to build a detailed prompt that includes the relevant context. Then, an LLM can use this prompt to generate a coherent and contextually appropriate response that incorporates your business data. This two-step mechanism for producing better results rather than feeding the user’s prompt directly to the LLM is known as Retrieval Augmented Generation (RAG).

The well-known LLMs are trained on general bodies of knowledge, making them less effective for domain-specific tasks. For specialized knowledge-intensive tasks, you can build a language model-based system that accesses knowledge sources beyond the original training data for the LLM. This approach enables more factual consistency, improves the reliability of the generated responses, and helps reduce the problem of hallucination.

Let’s look at the RAG mechanism used in this post in more detail.

Retrieval

RAG first retrieves relevant text from a knowledge base, like Aurora, using similarity search. The text produced in this step contains information related to the user’s query, although not necessarily the exact answer. For example, this first step might retrieve the most relevant FAQ entries, documentation sections, or previous support cases based on the customer’s question. The external data can come from sources such as document repositories, databases, or APIs.
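
The idea behind similarity search can be sketched in a few lines of plain Python. The example below is only an illustration: the three-dimensional vectors and FAQ texts are made up, and a real system would use model-generated embeddings and a vector index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical embeddings; real models emit hundreds or thousands of dimensions.
corpus = [
    ("How do I reset my password?", [0.9, 0.1, 0.0]),
    ("What are your support hours?", [0.1, 0.9, 0.1]),
    ("How do I change my email address?", [0.7, 0.2, 0.1]),
]
query = [0.85, 0.15, 0.05]  # stands in for the embedding of a user question

results = top_k(query, corpus, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The two account-related entries rank highest because their vectors point in nearly the same direction as the query vector. The retrieved text is related to the question without necessarily being the exact answer.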

This post uses the pgvector extension to do the initial retrieval step for the RAG technique. pgvector is an open-source extension for PostgreSQL that adds the ability to efficiently store and rapidly search machine learning (ML)-generated vector embeddings representing textual data. It’s designed to work with other PostgreSQL features, including indexing and querying. In an application such as a Q&A bot, you might transform the documents from your knowledge base and frequently asked questions and store them as vectors.
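
As a rough sketch of what storing and searching vectors with pgvector can look like, the following Python module holds the SQL statements an application might run. The table name `documents`, its columns, and the index name are hypothetical. The dimension 1536 assumes the output size of Amazon Titan Text Embeddings G1, and the `%s` placeholders assume a psycopg-style database driver.

```python
# Hypothetical schema: one row per text chunk, with its embedding.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)
);
"""

# HNSW index for cosine distance (available in pgvector 0.5.0 and later).
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents USING hnsw (embedding vector_cosine_ops);
"""

INSERT_SQL = "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector);"

# <=> is pgvector's cosine distance operator; smaller values are more similar.
SEARCH_SQL = """
SELECT content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def to_vector_literal(values):
    """Format numbers in pgvector's text form, for example [0.1,0.2,0.3]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

print(to_vector_literal([0.1, 0.2, 0.3]))
```

With a driver such as psycopg2 (an assumption, not something this post prescribes), a search would pass `to_vector_literal(query_embedding)` and the number of rows wanted as the two parameters of `SEARCH_SQL`.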

In this post, knowledge resources in various formats, such as PDFs, text files, and videos, are transformed into vector representations using the Amazon Titan Text Embeddings model. These embeddings are stored in Amazon Aurora PostgreSQL with the pgvector extension for use in subsequent stages.
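
A minimal sketch of generating one embedding through the Amazon Bedrock runtime API might look like the following. It assumes the Titan Text Embeddings G1 model ID `amazon.titan-embed-text-v1` and takes the client as a parameter, so the function can be exercised without AWS credentials. The function name `embed_text` is illustrative, not part of the sample application.

```python
import json

TITAN_EMBEDDINGS_MODEL_ID = "amazon.titan-embed-text-v1"

def embed_text(bedrock_runtime, text):
    """Return the embedding vector for text.

    bedrock_runtime is expected to behave like boto3.client("bedrock-runtime").
    """
    response = bedrock_runtime.invoke_model(
        modelId=TITAN_EMBEDDINGS_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text}),
    )
    # The response body is a stream containing a JSON document.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

In an environment with access to Amazon Bedrock, you would create the client with `boto3.client("bedrock-runtime")` and call `embed_text(client, chunk)` for each chunk of text before storing the result in Aurora.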

Generation

After the relevant text is retrieved, RAG uses an LLM to generate a coherent and contextually relevant response. The prompt for the LLM includes the retrieved information and the user’s question. The LLM generates human-like text that’s tailored to the combination of general knowledge from the FM and the domain-specific information retrieved from your knowledge base.
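
To make the prompt construction concrete, here is one possible way to combine retrieved snippets with the user's question. The wording of the instructions and the function name `build_prompt` are assumptions for illustration. The actual application may phrase its prompt differently.

```python
def build_prompt(question, snippets):
    """Combine retrieved snippets and the user's question into one prompt."""
    # Number the passages so the model (and a human reader) can refer to them.
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Use only the numbered context passages below to answer the question. "
        "If they do not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is pgvector?",
    ["pgvector is a PostgreSQL extension.", "Aurora supports it."],
)
print(prompt)
```

Instructing the model to rely on the supplied context is a common way to keep answers grounded in your own data rather than in the model's general knowledge alone.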

The following diagram illustrates the process.

The steps are as follows:

1. The first step is to transform documents and user queries into a common format that enables effective search and comparison. This transformation, known as embedding, converts text into numerical arrays (vectors) that capture its semantic meaning.
2. Next, the user submits a query to the application, which orchestrates the entire workflow.
3. The application generates an embedding for the user query. Use the same embeddings model here as in Step 1.
4. The query embedding is compared with the embeddings of the documents in the knowledge base. This comparison identifies the documents that are most relevant to the query. The application extracts snippets from these documents and appends them to the original query.
5. The resulting augmented prompt is enriched with relevant context from the knowledge base, chat history, and user profile.
6. The augmented prompt is forwarded to the LLM to generate the response that the user sees. One advantage of using Amazon Bedrock is that the LLM endpoints are always available, with no model hosting infrastructure to manage.
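
The query-time part of this process can be outlined as a small orchestration function. This is a sketch under stated assumptions: `embed`, `search`, and `generate` are hypothetical callables standing in for the embeddings model, the vector search, and the LLM, and the toy stand-ins at the bottom exist only so the outline runs without AWS access.

```python
def answer_question(question, embed, search, generate, k=3):
    """Run the query-time RAG steps with pluggable components."""
    query_vec = embed(question)        # Step 3: embed the user query
    snippets = search(query_vec, k)    # Step 4: find the most relevant snippets
    # Step 5: augment the original query with the retrieved context
    prompt = "Context:\n" + "\n".join(snippets) + f"\n\nQuestion: {question}"
    return generate(prompt)            # Step 6: let the LLM produce the answer

# Toy stand-ins, not real model or database calls.
def fake_embed(text):
    return [float(len(text))]

def fake_search(vec, k):
    return ["Aurora supports the pgvector extension."][:k]

def fake_generate(prompt):
    return f"(model output for a {len(prompt)}-character prompt)"

print(answer_question("Does Aurora support vectors?", fake_embed, fake_search, fake_generate))
```

Passing the components in as functions keeps the orchestration independent of any particular model or database, which makes it straightforward to swap models while experimenting.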

You can keep the system up to date by updating the knowledge base and its corresponding embeddings as needed to provide accurate responses based on the updated data.
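
One simple way to decide which embeddings need refreshing is to compare a hash of each document's current text with the hash recorded when it was last embedded. This is an assumed approach for illustration, not the mechanism used by the sample application. The dictionaries below stand in for whatever bookkeeping your knowledge base uses.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def documents_to_reembed(current_docs, stored_hashes):
    """IDs of documents that are new or whose text changed since embedding."""
    return sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    )

def documents_to_delete(current_docs, stored_hashes):
    """IDs whose embeddings remain stored but whose documents were removed."""
    return sorted(set(stored_hashes) - set(current_docs))

# Hypothetical state: faq-1 unchanged, faq-2 edited, faq-3 new, faq-4 removed.
stored = {
    "faq-1": content_hash("Support is open 9-5."),
    "faq-2": content_hash("Old refund policy."),
    "faq-4": content_hash("Retired article."),
}
current = {
    "faq-1": "Support is open 9-5.",
    "faq-2": "New refund policy.",
    "faq-3": "Brand-new article.",
}
print(documents_to_reembed(current, stored))
print(documents_to_delete(current, stored))
```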

Solution architecture

The following diagram shows the technical components involved in the sample chatbot application and how data flows through the system.

The workflow steps are as follows:

1. You start from an existing repository of knowledge resources. The formats can include PDFs, text documents, videos, and more.
2. Using the Amazon Titan Text Embeddings G1 model from Amazon Bedrock, these resources are transformed into vector representations that preserve their semantic meaning.
3. The generated vector embeddings are stored in an Aurora PostgreSQL database, using pgvector for efficient vector storage and retrieval.
4. A user initiates the process by posing a question, for instance, “How can AWS support vector databases?”
5. The user’s question is converted into a vector embedding with the same model used in Step 2, so that it can be compared with the embeddings generated from the knowledge base. A semantic search is then run on the Aurora PostgreSQL database, using these vector representations to identify the chunks of knowledge resources that contain relevant information.
6. The chunks returned by the search are passed to the Amazon Bedrock model along with the user query. You can use any of the LLMs available in Amazon Bedrock. For demonstration purposes, we use the Anthropic Claude v2.1 model. You can also deploy your own models in Amazon SageMaker JumpStart and use its endpoint.
7. Using the enhanced context derived from the semantic search, the Claude v2.1 model generates a comprehensive response, which is delivered back to the user.
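
The final generation call might be sketched as follows. It assumes the Claude v2.1 model ID `anthropic.claude-v2:1` and the text-completions request format that model uses on Amazon Bedrock. The function name `generate_answer` and the parameter values are illustrative choices, and the client is passed in so that the function can be tried with a stub.

```python
import json

CLAUDE_MODEL_ID = "anthropic.claude-v2:1"

def generate_answer(bedrock_runtime, prompt, max_tokens=500):
    """Send an augmented prompt to Claude v2.1 and return the completion text.

    bedrock_runtime is expected to behave like boto3.client("bedrock-runtime").
    """
    body = json.dumps({
        # This request format expects Human/Assistant turn markers.
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": 0.0,  # assumed setting: favor repeatable answers
    })
    response = bedrock_runtime.invoke_model(
        modelId=CLAUDE_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=body,
    )
    return json.loads(response["body"].read())["completion"].strip()
```

Here `prompt` would be the user's question combined with the chunks returned by the semantic search in the previous step.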

For pricing details for Amazon Bedrock, refer to Amazon Bedrock Pricing.

Prerequisites

Complete the following prerequisite steps before you deploy the application:

Set up an Aurora PostgreSQL cluster in a private subnet with pgvector support. The pgvector extension version 0.5.0 is available on Amazon Aurora PostgreSQL 15.4, 14.9, 13.12, 12.16, and higher in all AWS Regions. pgvector v0.5.0 supports Hierarchical Navigable Small World (HNSW) indexing, parallelization of ivfflat index builds, and improved performance of distance calculator functions. Refer to Extension versions for Amazon Aurora PostgreSQL for details about the pgvector versions supported.
Run the following command in the database to enable the pgvector extension:
```
CREATE EXTENSION vector;
```
Set up an Amazon Elastic Compute Cloud (Amazon EC2) instance in a public subnet. This instance acts as a client to deploy the application and to access the database. For instructions, refer to Tutorial: Get started with Amazon EC2 Linux instances. The EC2 instance must have a public IP and a permissive security group for the port to be used by the Streamlit application (default port: 8501).
Create an AWS Secrets Manager database secret for the application to access Aurora. For instructions, see Create an AWS Secrets Manager database secret.
To start using FMs in Amazon Bedrock, request access to the models you intend to use. We use the Claude v2.1 model for the application.
The EC2 instance requires access to the Aurora database cluster, Secrets Manager, and Amazon Bedrock. For the steps to create an AWS Identity and Access Management (IAM) role and attach specific policies granting access to those resources, refer to Creating a role to delegate permissions to an AWS service.
(Optional) Depending on the document formats you plan to use as input sources, set up the following prerequisites:
1. PPT or .docx as a source – Install LibreOffice on your EC2 instance. It is used to read presentations and .docx files.
2. Amazon S3 as a source – Create an Amazon Simple Storage Service (Amazon S3) bucket and attach a policy to the role you created earlier to allow access to this bucket. For instructions, refer to Create your first S3 bucket. You can upload the following example document to the S3 bucket under the documentEmbeddings/ prefix to use while testing the application.
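
With the Secrets Manager prerequisite in place, the application can read the database credentials at startup instead of hard-coding them. The following is a minimal sketch, assuming the secret uses the standard JSON fields of a database secret (`username`, `password`, `host`, `port`, and optionally `dbname`). The function name and the fallback database name `postgres` are assumptions.

```python
import json

def load_db_settings(secrets_client, secret_id):
    """Fetch a database secret and return PostgreSQL connection settings.

    secrets_client is expected to behave like boto3.client("secretsmanager").
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {
        "host": secret["host"],
        "port": int(secret.get("port", 5432)),
        "dbname": secret.get("dbname", "postgres"),  # assumed default
        "user": secret["username"],
        "password": secret["password"],
    }
```

The returned keys match the keyword arguments accepted by common PostgreSQL drivers such as psycopg2, so the dictionary can be unpacked directly into a connect call.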

Code walkthrough

The application code is written in Python 3.10 using LangChain. LangChain is a framework for developing applications powered by language models. LangChain helps you connect a language model to other sources of data, and also to interact with its environment.

The following is the code walkthrough of the main functions in the application.

Load the source data

The initial step is to identify the source. Depending on the specific source, you call the appropriate LangChain document loader function, as shown in the following code. To learn more about the document loader functions, refer to Document loaders.

```
# PDF file: load the document and split it into pages
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("workdocs-dg.pdf")
pages = loader.load_and_split()

# Text file in Amazon S3: the first argument is the bucket name (blank here)
from langchain.document_loaders import S3FileLoader
loader = S3FileLoader("", "Aurora+FAQs.txt")
data = loader.load()

# YouTube video: load the video transcript
from langchain.document_loaders import YoutubeLoader
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=CnWuS-KNjPM&t=1s&ab_channel=AmazonWebServices", add_video_info=True
)
data = loader.load()

# PowerPoint presentation
from langchain.document_loaders import UnstructuredPowerPointLoader
loader = UnstructuredPowerPointLoader("Containers on AWS.pptx")
data = loader.load()

# Word document
from langchain.document_loaders import Docx2txtLoader
loader = Docx2txtLoader("Aurora+FAQ.docx")
data = loader.load()
```
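
Choosing a loader by source type could be sketched as a small lookup. The function below returns only the name of the LangChain loader class used above, so it runs without LangChain installed. The `s3://` URL convention and the extension mapping are assumptions for illustration, not part of the sample application.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed mapping from file extension to the loader classes shown above.
LOADER_BY_EXTENSION = {
    ".pdf": "PyPDFLoader",
    ".pptx": "UnstructuredPowerPointLoader",
    ".docx": "Docx2txtLoader",
}

def pick_loader_name(source):
    """Return the name of the loader class suited to a source string."""
    parsed = urlparse(source)
    if parsed.scheme == "s3":
        return "S3FileLoader"
    if parsed.netloc.endswith("youtube.com") or parsed.netloc == "youtu.be":
        return "YoutubeLoader"
    suffix = PurePosixPath(source).suffix.lower()
    if suffix not in LOADER_BY_EXTENSION:
        raise ValueError(f"No loader configured for {source!r}")
    return LOADER_BY_EXTENSION[suffix]

for src in ["workdocs-dg.pdf", "Containers on AWS.pptx", "Aurora+FAQ.docx"]:
    print(src, "->", pick_loader_name(src))
```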

Chunk the data

After you’ve loaded
