Agentic Retrieval Augmented Generation (RAG) applications represent an advanced approach in artificial intelligence that integrates foundation models (FMs) with external knowledge retrieval and autonomous agent capabilities. These systems dynamically access and process information, break down complex tasks, use external tools, apply reasoning, and adapt to diverse contexts. They go beyond simple question answering by performing multi-step processes, making decisions, and generating complex outputs.

In this post, we demonstrate an example of building an agentic RAG application using the LlamaIndex framework. LlamaIndex is a framework that connects FMs with external data sources. It helps ingest, structure, and retrieve information from databases, APIs, PDFs, and more, enabling agents and RAG for AI applications.

The application serves as a research tool, using the Mistral Large 2 FM on Amazon Bedrock to generate responses for the agentic flow. The sample application interacts with well-known websites, such as arXiv, GitHub, TechCrunch, and DuckDuckGo, and can access a knowledge base containing documents and internal knowledge.

The application can be further expanded to accommodate a broader range of use cases that require dynamic interaction with internal and external APIs, as well as the integration of internal knowledge bases, to provide more context-aware responses to user queries.

Solution overview

This solution uses the LlamaIndex framework to build an agentic flow with two main components: AgentRunner and AgentWorker. The AgentRunner acts as an orchestrator that manages conversation history, creates and maintains tasks, runs task steps, and provides a user-friendly interface for interactions. The AgentWorker handles the step-by-step reasoning and task execution.

For reasoning and task planning, we use Mistral Large 2 on Amazon Bedrock. You can use other text generation FMs available on Amazon Bedrock; for the full list of supported models, see Supported foundation models in Amazon Bedrock. The agent integrates with the GitHub, arXiv, TechCrunch, and DuckDuckGo APIs, and also accesses internal knowledge through a RAG framework to provide context-aware answers.

In this solution, we present two options for building the RAG framework:

  • Document integration with Amazon OpenSearch Serverless – The first option uses LlamaIndex to load and process documents programmatically. It splits the documents into chunks using various chunking strategies and then stores these chunks in an Amazon OpenSearch Serverless vector store for future retrieval.
  • Document integration with Amazon Bedrock Knowledge Bases – The second option uses Amazon Bedrock Knowledge Bases, a fully managed service that handles the loading, processing, and chunking of documents. It can quickly create a new vector store for you with a few configurations and clicks. You can choose from Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL-Compatible Edition Serverless, and Amazon Neptune Analytics. In addition, this solution includes document retrieval reranking to enhance the relevance of the responses.

You can choose the RAG implementation option that best suits your preference and developer skill level.

The following diagram illustrates the solution architecture.

In the following sections, we present the steps to implement the agentic RAG application. You can also find the sample code in the GitHub repository.

Prerequisites

The solution has been tested in the AWS Region us-west-2. Complete the following steps before proceeding:

  1. Set up the following resources:
    1. Create an Amazon SageMaker domain.
    2. Create a SageMaker domain user profile.
    3. Launch Amazon SageMaker Studio, select JupyterLab, and create a space.
    4. Select the instance t3.medium and the image SageMaker Distribution 2.3.1, then run the space.
  2. Request model access:
    1. On the Amazon Bedrock console, choose Model access in the navigation pane.
    2. Choose Modify model access.
    3. Select the models Mistral Large 2 (24.07), Amazon Titan Text Embeddings V2, and Rerank 1.0 from the list, and request access to these models.
  3. Configure AWS Identity and Access Management (IAM) permissions:
    1. On the SageMaker console, go to the SageMaker user profile details and find the execution role that the SageMaker notebook uses. It should look like AmazonSageMaker-ExecutionRole-20250213T123456.
  4. In the IAM console, create an inline policy for this execution role so that it can perform the following actions (an example policy is sketched after these sub-items):
    1. Access to Amazon Bedrock services including:
      • Reranking capabilities
      • Retrieving information
      • Invoking models
      • Listing available foundation models
    2. IAM permissions to:
      • Create policies
      • Attach policies to roles within your account
    3. Full access to Amazon OpenSearch Serverless service
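
The exact policy you need depends on your setup; the following is a minimal, illustrative example of such an inline policy covering the actions listed above. The broad resource scoping is only for brevity in this sketch; scope the resources to your account and Region in practice.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:Retrieve",
                "bedrock:Rerank",
                "bedrock:ListFoundationModels"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreatePolicy",
                "iam:AttachRolePolicy"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "aoss:*",
            "Resource": "*"
        }
    ]
}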
  5. Run the following command in the JupyterLab notebook terminal to download the sample code from GitHub:
git init
git remote add origin https://github.com/aws-samples/mistral-on-aws.git
git sparse-checkout init
git sparse-checkout set "notebooks/mistral-llamaindex-agentic-rag"
git pull origin main
  6. Finally, install the required Python packages by running the following command in the terminal:
cd mistral-llamaindex-agentic-rag
pip install -r requirements.txt

Initialize the models

Initialize the FM used for orchestrating the agentic flow with Amazon Bedrock Converse API. This API provides a unified interface for interacting with various FMs available on Amazon Bedrock. This standardization simplifies the development process, allowing developers to write code one time and seamlessly switch between different models without adjusting for model-specific differences. In this example, we use the Mistral Large 2 model on Amazon Bedrock.

Next, initialize the embedding model from Amazon Bedrock, which is used for converting document chunks into embedding vectors. For this example, we use Amazon Titan Text Embeddings V2. See the following code:

# Initialise and configure the BedrockConverse LLM with the Mistral Large 2 model and set it as the default in Settings

from llama_index.llms.bedrock_converse import BedrockConverse
from llama_index.core import Settings
llm = BedrockConverse(model="mistral.mistral-large-2407-v1:0", max_tokens = 2048)
Settings.llm = BedrockConverse(model="mistral.mistral-large-2407-v1:0", max_tokens = 2048)

# Initialise and configure the embedding model with Amazon Titan Text Embeddings V2, and set it as the default in Settings

from llama_index.embeddings.bedrock import BedrockEmbedding
embed_model = BedrockEmbedding(model_name="amazon.titan-embed-text-v2:0")
Settings.embed_model = BedrockEmbedding(model_name="amazon.titan-embed-text-v2:0")

Integrate API tools

Implement two functions to interact with the GitHub and TechCrunch APIs. The APIs shown in this post don’t require credentials. To provide clear communication between the agent and the foundation model, follow Python function best practices, including:

  • Type hints for parameter and return value validation
  • Detailed docstrings explaining function purpose, parameters, and expected returns
  • Clear function descriptions

The following code sample shows the function that integrates with the GitHub API. After the function is created, use the FunctionTool.from_defaults() method to wrap the function as a tool and integrate it seamlessly into the LlamaIndex workflow.

See the code repository for the full code samples of the function that integrates with the TechCrunch API; a hedged sketch of that function also follows the GitHub example below.

# Define a function to search GitHub repositories by topic, sorting by stars or update date, and return top results
import requests
from llama_index.core.tools import FunctionTool
def github_search(topic: str, num_results: int = 3, sort_by: str = "stars") -> list:
    """
    Retrieve a specified number of GitHub repositories based on a given topic, 
    ranked by the specified criteria.

    This function uses the GitHub API to search for repositories related to a 
    specific topic or keyword. The results can be sorted by the number of stars 
    (popularity) or the most recent update, with the most relevant repositories 
    appearing first according to the chosen sorting method.

    Parameters:
    -----------
    topic : str
        The topic or keyword to search for in GitHub repositories.
        The topic cannot contain blank spaces.
    num_results : int, optional
        The number of repository results to retrieve. Defaults to 3.
    sort_by : str, optional
        The criterion for sorting the results. Options include:
        - 'stars': Sort by the number of stars (popularity).
        - 'updated': Sort by the date of the last update (most recent first).
        Defaults to 'stars'.

    Returns:
    --------
    list
        A list of dictionaries, where each dictionary contains information 
        about a repository. Each dictionary includes:
        - 'html_url': The URL of the repository.
        - 'description': A brief description of the repository.
        - 'stargazers_count': The number of stars (popularity) the repository has.
    """

    url = f"https://api.github.com/search/repositories?q=topic:{topic}&sort={sort_by}&order=desc"
    response = requests.get(url).json()
    code_repos = [
        {
            'html_url': item['html_url'],
            'description': item['description'],
            'stargazers_count': item['stargazers_count'],
        }
        for item in response['items'][:num_results]
    ]
    return code_repos

github_tool = FunctionTool.from_defaults(fn=github_search)
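
The TechCrunch news function is wrapped as news_tool and consolidated with the other tools later in this post; the code repository contains the exact implementation. As a rough illustration only, a minimal sketch of such a function might look like the following, assuming TechCrunch's public WordPress REST API endpoint; the endpoint, returned fields, and lack of error handling here are assumptions for illustration, not the repository's code.

# Hedged sketch of a TechCrunch search tool; refer to the repository for the actual implementation.
import requests
from llama_index.core.tools import FunctionTool

def techcrunch_search(query: str, num_results: int = 3) -> list:
    """
    Retrieve recent TechCrunch articles matching a query.

    Parameters:
    -----------
    query : str
        The keyword to search for in TechCrunch articles.
    num_results : int, optional
        The number of articles to retrieve. Defaults to 3.

    Returns:
    --------
    list
        A list of dictionaries, each containing the article 'title', 'link', and 'date'.
    """
    # Assumption: TechCrunch exposes the standard WordPress REST API for posts.
    url = f"https://techcrunch.com/wp-json/wp/v2/posts?search={query}&per_page={num_results}"
    response = requests.get(url).json()
    return [
        {
            'title': item['title']['rendered'],
            'link': item['link'],
            'date': item['date'],
        }
        for item in response
    ]

news_tool = FunctionTool.from_defaults(fn=techcrunch_search)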

For arXiv and DuckDuckGo integration, we use LlamaIndex’s pre-built tools instead of creating custom functions. You can explore other available pre-built tools in the LlamaIndex documentation to avoid duplicating existing solutions.

# Import and configure the ArxivToolSpec and DuckDuckGoSearchToolSpec from LlamaIndex prebuilt tools

from llama_index.tools.arxiv import ArxivToolSpec
from llama_index.tools.duckduckgo import DuckDuckGoSearchToolSpec

arxiv_tool = ArxivToolSpec()
search_tool = DuckDuckGoSearchToolSpec()

api_tools = arxiv_tool.to_tool_list() + search_tool.to_tool_list()

# Consolidate all tools into one list. 
api_tools.extend([news_tool, github_tool])

RAG option 1: Document integration with Amazon OpenSearch Serverless

Next, programmatically build the RAG component using LlamaIndex to load, process, and chunk documents, and store the embedding vectors in Amazon OpenSearch Serverless. This approach offers greater flexibility for advanced scenarios, such as loading various file types (including .epub and .ppt) and selecting advanced chunking strategies based on file types (such as HTML, JSON, and code).

Before moving forward, you can download some PDF documents for testing from the AWS website using the following command, or you can use your own documents. The following documents are AWS guides that help in choosing the right generative AI service (such as Amazon Bedrock or Amazon Q) based on use case, customization needs, and automation potential. They also assist in selecting AWS machine learning (ML) services (such as SageMaker) for building models, using pre-trained AI, and using cloud infrastructure.

# download test documents from the following links
!wget -O docs/genai_on_aws.pdf "https://docs.aws.amazon.com/pdfs/decision-guides/latest/generative-ai-on-aws-how-to-choose/generative-ai-on-aws-how-to-choose.pdf?did=wp_card&trk=wp_card#guide"
!wget -O docs/ml_on_aws.pdf "https://docs.aws.amazon.com/pdfs/decision-guides/latest/machine-learning-on-aws-how-to-choose/machine-learning-on-aws-how-to-choose.pdf?did=wp_card&trk=wp_card#guide"

Load the PDF documents using SimpleDirectoryReader() in the following code. For a full list of supported file types, see the LlamaIndex documentation.

# use Llamaindex to load documents 
from llama_index.core import SimpleDirectoryReader
loader = SimpleDirectoryReader('docs/')
documents = loader.load_data()

Next, create an Amazon OpenSearch Serverless collection as the vector database. Check the utils.py file for details on the create_collection() function.

# Create Amazon OpenSearch Serverless collection 
from utils import *
import sagemaker 
import random

region_name = "us-west-2"
suffix = random.randrange(1, 500)
collection_name = "llamaindex-blog-"+str(suffix)
notebook_execution_role = sagemaker.get_execution_role()
endpoint = create_collection(collection_name, notebook_execution_role)

After you create the collection, create an index to store embedding vectors:

## create an index in the collection
index_name = "pdf-rag"
create_index(index_name, endpoint, emb_dim=1024)

Next, use the following code to implement a document search system using LlamaIndex integrated with Amazon OpenSearch Serverless. It first sets up AWS authentication to securely access OpenSearch Service, then configures a vector client that can handle 1024-dimensional embeddings (specifically designed for the Amazon Titan Embedding V2 model). The code processes input documents by breaking them into manageable chunks of 1,024 tokens with a 20-token overlap, converts these chunks into vector embeddings, and stores them in the OpenSearch Serverless vector index. You can select a different or more advanced chunking strategy by modifying the transformations parameter in the VectorStoreIndex.from_documents() method. For more information and examples, see the LlamaIndex documentation.

import boto3
from llama_index.vector_stores.opensearch import OpensearchVectorStore, OpensearchVectorClient
from opensearchpy import RequestsHttpConnection, AWSV4SignerAuth
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.node_parser import SentenceSplitter

## integrate Amazon OpenSearch Serverless collection and index to llamaindex 

dim = 1024 # Amazon Titan Embedding V2 model dimension 
service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWSV4SignerAuth(credentials, region_name, service)

client = OpensearchVectorClient(
    endpoint, 
    index_name, 
    dim, 
    embedding_field="vector", 
    text_field="chunk",
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# initialise vector store and save document chunks to the vector store 
vector_store = OpensearchVectorStore(client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents, 
    storage_context=storage_context,
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)]
)

You can add a reranking step in the RAG pipeline, which improves the quality of information retrieved by making sure that the most relevant documents are presented to the language model, resulting in more accurate and on-topic responses:

from llama_index.postprocessor.bedrock_rerank import AWSBedrockRerank
reranker = AWSBedrockRerank(
    top_n=3,
    model_id="amazon.rerank-v1:0",#  another rerank model option is: cohere.rerank-v3-5:0
    region_name="us-west-2",
)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)
Use the following code to test the RAG framework. You can compare results by enabling or disabling the reranker model.
response = query_engine.query(
    "In which situation should I use Amazon Bedrock over Amazon SageMaker?",
)
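
As a quick way to compare the two settings, you can print the answer and inspect the retrieved source chunks with their relevance scores. The following is only a brief illustration; source_nodes and get_score() are standard attributes of the LlamaIndex response object, and the snippet prints just a preview of each chunk.

# Print the generated answer.
print(response)

# Inspect which chunks were retrieved and their relevance scores;
# with the reranker enabled, only the top_n reranked chunks are returned.
for node in response.source_nodes:
    print(f"score={node.get_score():.3f} | {node.node.get_content()[:200]}")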

Next, convert the vector store into a LlamaIndex QueryEngineTool, which requires a tool name and a comprehensive description. This tool is then combined with other API tools to create an agent worker that executes tasks in a step-by-step manner. The code initializes an AgentRunner to orchestrate the entire workflow, analyzing text inputs and generating responses. The system can be configured to support parallel tool execution for improved efficiency.

# create QueryEngineTool based on the OpenSearch vector store 

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner
oss_tool = QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="oss_guide_tool",
            description="""
            These decision guides help users select appropriate AWS machine learning and generative AI services based on specific needs. 
            They cover pre-built solutions, customizable platforms, and infrastructure options for ML workflows, 
            while outlining how generative AI can automate processes, personalize content, augment data, reduce costs, 
            and enable faster experimentation in various business contexts.""",
        ),
    )

all_tools = api_tools + [oss_tool]

agent_worker = FunctionCallingAgentWorker.from_tools(
    all_tools, 
    llm=llm, 
    verbose=True, # Set verbose=True to display the full trace of steps. 
    system_prompt = system_prompt,
    # allow_parallel_tool_calls = True  # Uncomment this line to allow multiple tool invocations
)
agent = AgentRunner(agent_worker)
response = agent.chat(text_input)

You have now completed building the agentic RAG application using LlamaIndex and Amazon OpenSearch Serverless. You can test the chatbot application with your own questions. For example, ask about the latest news and features regarding Amazon Bedrock, or inquire about the latest papers and most popular GitHub repositories related to generative AI.
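
For instance, the following illustrative interaction exercises several tools at once; the question is only a placeholder you can adapt.

# Illustrative question that can trigger the GitHub, arXiv, and news tools.
text_input = "What are the most popular GitHub repositories and the latest arXiv papers on agentic RAG?"
response = agent.chat(text_input)
print(response)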

RAG option 2: Document integration with Amazon Bedrock Knowledge Bases

In this section, you use Amazon Bedrock Knowledge Bases to build the RAG framework. You can create an Amazon Bedrock knowledge base on the Amazon Bedrock console or follow the provided notebook example to create it programmatically. Create a new Amazon Simple Storage Service (Amazon S3) bucket for the knowledge base, then upload the previously downloaded files to this S3 bucket. You can select different embedding models and chunking strategies that work better for your data. After you create the knowledge base, remember to sync the data. Data synchronization might take a few minutes.
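
If you prefer to script the document upload and data sync instead of using the console, a minimal sketch might look like the following. The bucket name, knowledge base ID, and data source ID are placeholders for your own resources; the notebook example in the repository covers the full programmatic creation.

import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")

# Placeholders: substitute your own bucket and the IDs of the knowledge base
# and data source you created (on the console or via the notebook example).
bucket_name = "your-kb-documents-bucket"
knowledge_base_id = "YOUR_KB_ID"
data_source_id = "YOUR_DATA_SOURCE_ID"

# Upload the previously downloaded PDFs to the knowledge base's S3 bucket.
for doc in ["docs/genai_on_aws.pdf", "docs/ml_on_aws.pdf"]:
    s3.upload_file(doc, bucket_name, doc.split("/")[-1])

# Start an ingestion job to sync the data source; this can take a few minutes.
bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
)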

To enable your newly created knowledge base to invoke the rerank model, you need to modify its permissions. First, open the Amazon Bedrock console and locate the service role that matches the one shown in the following screenshot.

Choose the role and add the following provided IAM permission policy as an inline policy. This additional authorization grants your knowledge base the necessary permissions to successfully invoke the rerank model on Amazon Bedrock.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0"
        },
        {
            "Effect": "Allow",
            "Action": "bedrock:Rerank",
            "Resource": "*"
        }
    ]
}

Use the following code to integrate the knowledge base into the LlamaIndex framework. Specific configurations can be provided in the retrieval_config parameter, where numberOfResults is the maximum number of retrieved chunks from the vector store, and overrideSearchType has two valid values: HYBRID and SEMANTIC. In the rerankConfiguration, you can optionally provide a rerank modelConfiguration and numberOfRerankedResults to sort the retrieved chunks by relevancy scores and select only the defined number of results. For the full list of available configurations for retrieval_config, refer to the Retrieve API documentation.

# Configure a knowledge base retriever using AmazonKnowledgeBasesRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.retrievers.bedrock import AmazonKnowledgeBasesRetriever

# maximum number of relevant text chunks that will be retrieved
# If you need quick, focused answers: lower numbers (1-3)
# If you need detailed, comprehensive answers: higher numbers (5-10)
top_k = 10

# search mode options: HYBRID, SEMANTIC
# HYBRID search combines the strengths of semantic search and keyword search 
# Balances semantic understanding with exact matching
# https://docs.llamaindex.ai/en/stable/examples/retrievers/bedrock_retriever/
search_mode = "HYBRID"

kb_retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=knowledge_base_id,
    retrieval_config={
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "overrideSearchType": search_mode,
            'rerankingConfiguration': {
                'bedrockRerankingConfiguration': {
                    'modelConfiguration': {
                        'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0'
                    },
                    'numberOfRerankedResults': 3
                },
                'type': 'BEDROCK_RERANKING_MODEL'
            }
        },
    }
)
kb_engine = RetrieverQueryEngine(retriever=kb_retriever)
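
Before wiring the knowledge base into the agent, you can sanity-check the retriever with a direct query, for example:

# Quick test of the knowledge base query engine with the reranking configuration.
response = kb_engine.query(
    "In which situation should I use Amazon Bedrock over Amazon SageMaker?"
)
print(response)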

Like the first option, you can create the knowledge base as a QueryEngineTool in LlamaIndex and combine it with other API tools. Then, you can create a FunctionCallingAgentWorker using these combined tools and initialize an AgentRunner to interact with them. By using this approach, you can chat with and take advantage of the capabilities of the integrated tools.

# Create a query tool for Bedrock Knowledge Base
kb_tool = QueryEngineTool(
        query_engine=kb_engine,
        metadata=ToolMetadata(
            name="kb_tool",
            description="""
            These decision guides help users select appropriate AWS machine learning and generative AI services based on specific needs. 
            They cover pre-built solutions, customizable platforms, and infrastructure options for ML workflows, 
            while outlining how generative AI can automate processes, personalize content, augment data, reduce costs, 
            and enable faster experimentation in various business contexts.""",
        ),
    )

# Update the agent to include all API tools and the Knowledge Base tool.
all_tools = api_tools + [kb_tool]

agent_worker = FunctionCallingAgentWorker.from_tools(
    all_tools, 
    llm=llm, 
    verbose=True, # Set verbose=True to display the full trace of steps. 
    system_prompt = system_prompt,
    # allow_parallel_tool_calls = True  # Uncomment this line to allow multiple tool invocations
)
agent = AgentRunner(agent_worker)
response = agent.chat(text_input)

Now you have built the agentic RAG solution using LlamaIndex and Amazon Bedrock Knowledge Bases.

Clean up

When you're done experimenting with this solution, complete the following steps to clean up the AWS resources and avoid unnecessary costs (a scripted alternative is sketched after this list):

  • On the Amazon S3 console, delete the S3 bucket and data created for this solution.
  • On the OpenSearch Service console, delete the collection that was created for storing the embedding vectors.
  • On the Amazon Bedrock Knowledge Bases console, delete the knowledge base you created.
  • On the SageMaker console, navigate to your domain and user profile, then launch SageMaker Studio to stop or delete the JupyterLab instance.
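
If you prefer to clean up programmatically, a minimal sketch might look like the following; the bucket name, collection ID, and knowledge base ID are placeholders for your own resources, and the JupyterLab space is still stopped or deleted from SageMaker Studio as described above.

import boto3

region = "us-west-2"
s3 = boto3.resource("s3", region_name=region)
aoss = boto3.client("opensearchserverless", region_name=region)
bedrock_agent = boto3.client("bedrock-agent", region_name=region)

# Placeholders: substitute your own resource identifiers.
bucket_name = "your-kb-documents-bucket"
collection_id = "your-aoss-collection-id"
knowledge_base_id = "YOUR_KB_ID"

# Empty and delete the S3 bucket created for the knowledge base documents.
bucket = s3.Bucket(bucket_name)
bucket.objects.all().delete()
bucket.delete()

# Delete the OpenSearch Serverless collection that stores the embedding vectors.
aoss.delete_collection(id=collection_id)

# Delete the Amazon Bedrock knowledge base.
bedrock_agent.delete_knowledge_base(knowledgeBaseId=knowledge_base_id)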

Conclusion

This post demonstrated how to build a powerful agentic RAG application with LlamaIndex and Amazon Bedrock that goes beyond traditional question answering systems. By integrating Mistral Large 2 as the orchestrating model with external APIs (GitHub, arXiv, TechCrunch, and DuckDuckGo) and an internal knowledge base, you created a versatile technology discovery and research tool.

We showed you two complementary approaches to implement the RAG framework: a programmatic implementation using LlamaIndex with Amazon OpenSearch Serverless, which offers maximum flexibility for advanced use cases, and a managed solution using Amazon Bedrock Knowledge Bases, which simplifies document processing and storage with minimal configuration. You can try out the solution using the provided code samples.

For more information, refer to Amazon Bedrock, Amazon Bedrock Knowledge Bases, Amazon OpenSearch Serverless, and Use a reranker model in Amazon Bedrock. Refer to Mistral AI in Amazon Bedrock to learn about the latest Mistral models available on Amazon Bedrock and AWS Marketplace.
