[Multi-modal, Part 1] Generative AI and multi-modal agents in AWS: The key to unlocking new value in financial markets


Solution overview

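The following diagram illustrates the conceptual architecture for using generative AI with multi-modal data through agents. The steps involved are as follows: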

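1. The financial analyst poses questions through a platform such as a chatbot.
2. The platform uses a framework to determine the most suitable multi-modal agent tool to answer the question.
3. Once the tool is identified, the platform runs the code linked to it.
4. The tool generates an analysis of the financial data as requested by the financial analyst.
5. When summarizing the results, the large language model retrieves them and reports back to the financial analyst.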
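In this solution an LLM chooses the tool. To make the control flow concrete, the following sketch swaps the LLM for a crude keyword-overlap score between the question and each tool's description, then runs the chosen tool. The tool names come from this post; the descriptions, stub implementations, and the `route` and `answer` functions are hypothetical.

```python
import re
from typing import Callable, Dict, Set, Tuple

# Hypothetical registry entry: (description, implementation)
ToolSpec = Tuple[str, Callable[[str], str]]


def _words(text: str) -> Set[str]:
    """Lowercase word set used for a crude relevance score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def route(question: str, tools: Dict[str, ToolSpec]) -> str:
    """Pick the tool whose description shares the most words with the question.

    Keyword overlap is only a stand-in for the LLM-based tool selection
    that the agent actually performs.
    """
    q = _words(question)
    return max(tools, key=lambda name: len(q & _words(tools[name][0])))


def answer(question: str, tools: Dict[str, ToolSpec]) -> str:
    """Route the question, run the chosen tool, and return its labeled result."""
    name = route(question, tools)
    result = tools[name][1](question)
    # In the real solution, an LLM would summarize `result` for the analyst
    return f"[{name}] {result}"


tools = {
    "Stocks Querying Tool": ("query closing stock prices from the stocks database",
                             lambda q: "stock rows"),
    "Sentiment Analysis Tool": ("analyze the sentiment of a topic in the earnings call",
                                lambda q: "NEGATIVE"),
}
print(answer("What is the sentiment around inflation in the earnings call?", tools))
```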

Technical architecture

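The multi-modal agent orchestrates various tools based on natural language prompts from business users to generate insights. For unstructured data, the agent uses AWS Lambda functions with AI services such as Amazon Textract for document analysis, Amazon Transcribe for speech recognition, Amazon Comprehend for NLP, and Amazon Kendra for intelligent search. For structured data, the agent uses a SQL connector and SQLAlchemy to analyze databases, including Amazon Athena. The agent also uses Python in Lambda and the Amazon SageMaker SDK for computations and quantitative modeling. In addition, the agent has long-term memory for storing prompts and results in Amazon DynamoDB. The multi-modal agent resides in a SageMaker notebook and coordinates these tools based on English prompts from business users in a Streamlit UI.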

The key components of the technical architecture are as follows:

Data storage and analytics – The quarterly earnings call recordings (audio files), annual financial reports (PDF files), and S&P stock data (CSV files) are hosted on Amazon Simple Storage Service (Amazon S3). Data exploration on the stock data is done using Athena.
Large language models – The large language models (LLMs) are available via Amazon Bedrock, SageMaker JumpStart, or an API.
Agents – We use LangChain agents to make a non-predetermined chain of calls to LLMs and other tools, driven by user input. In these types of chains, an agent has access to a suite of tools, each built for a specific task. Depending on the user input, the agent decides which tool, or combination of tools, to call to answer the question. We created the following purpose-built agent tools for our scenario:
- Stocks Querying Tool – To query S&P stocks data using Athena and SQLAlchemy.
- Portfolio Optimization Tool – To build a portfolio based on the chosen stocks.
- Financial Information Lookup Tool – To search for financial earnings information stored in multi-page PDF files using Amazon Kendra.
- Python Calculation Tool – To use for mathematical calculations.
- Sentiment Analysis Tool – To identify and score sentiments on a topic using Amazon Comprehend.
- Detect Phrases Tool – To find key phrases in recent quarterly reports using Amazon Comprehend.
- Text Extraction Tool – To convert the PDF versions of quarterly reports to text files using Amazon Textract.
- Transcribe Audio Tool – To convert audio recordings to text files using Amazon Transcribe.
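The Text Extraction Tool converts multi-page PDF reports with Amazon Textract. Multi-page PDFs stored in S3 go through Textract's asynchronous API, so the core logic might start a text-detection job, poll until it finishes, and then page through the results with `NextToken`, keeping only `LINE` blocks. This is a sketch, not code from the GitHub repo: the function name and polling interval are assumptions, and you would pass a client such as `boto3.client('textract')`.

```python
import time
from typing import List


def extract_pdf_text(textract_client, bucket: str, key: str, poll_seconds: float = 2.0) -> str:
    """Extract the text of a multi-page PDF in S3 with Amazon Textract's asynchronous API."""
    job_id = textract_client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )["JobId"]

    # Poll until the asynchronous job leaves the IN_PROGRESS state
    while True:
        response = textract_client.get_document_text_detection(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated: follow NextToken and keep only LINE blocks
    lines: List[str] = []
    while True:
        lines.extend(b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE")
        token = response.get("NextToken")
        if not token:
            break
        response = textract_client.get_document_text_detection(JobId=job_id, NextToken=token)
    return "\n".join(lines)
```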

The agent memory that holds the chain of user interactions with the agent is saved in DynamoDB.

The following sections explain some of the primary steps with associated code. To dive deeper into the solution and code for all the steps shown here, refer to the GitHub repo.

Prerequisites

To run this solution, you must have an API key to an LLM such as Anthropic Claude v2, or have access to Amazon Bedrock foundation models.

To generate responses from structured and unstructured data using LLMs and LangChain, you need access to LLMs through Amazon Bedrock, SageMaker JumpStart, or API keys, and you need databases that are compatible with SQLAlchemy. You also need AWS Identity and Access Management (IAM) policies, the details of which you can find in the GitHub repo.

Key components of a multi-modal agent

There are a few key components of the multi-modal agent:

Functions defined for tools of the multi-modal agent
Tools defined for the multi-modal agent
Long-term memory for the multi-modal agent
Planner-executor based multi-modal agent (defined with tools, LLMs, and memory)

In this section, we illustrate the key components with associated code snippets.

Functions defined for tools of the multi-modal agent

The multi-modal agent needs to use various AI services to process different types of data—text, speech, images, and more. Some of these functions may need to call AWS AI services like Amazon Comprehend to analyze text, Amazon Textract to analyze images and documents, and Amazon Transcribe to convert speech to text. These functions can either be called locally within the agent or deployed as Lambda functions that the agent can invoke. The Lambda functions internally call the relevant AWS AI services and return the results to the agent. This approach modularizes the logic and makes the agent more maintainable and extensible.

The following code defines a tool that calculates the optimized portfolio based on the chosen stocks. One way to convert a Python function into an LLM tool is to subclass LangChain's BaseTool class.

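```python
from typing import List

import boto3
from langchain.tools import BaseTool


class OptimizePortfolio(BaseTool):
    # The agent's LLM reads the name and description when deciding which tool to call
    name: str = "Portfolio Optimization Tool"
    description: str = """
    use this tool when you need to build optimal portfolio or for optimization of stock price.
    The stock_ls should be a list of stock symbols, such as ['WWW', 'AAA', 'GGGG'].
    """

    def _run(self, stock_ls: List):
        # region_name, database_name, and table_Name are set earlier in the notebook
        session = boto3.Session(region_name=region_name)
        athena_client = session.client('athena')

        database = database_name
        table = table_Name
        ...
```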
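The body of `_run` is elided above, and this post doesn't say which optimization method the tool applies. As a stand-in, the following sketch computes global minimum-variance weights in closed form with NumPy from a days-by-stocks array of closing prices. The function name is hypothetical, and the technique is a swapped-in example rather than the tool's actual method. Note that this closed form allows negative (short) weights, whereas a long-only optimizer would clip allocations at 0%.

```python
import numpy as np


def min_variance_weights(prices: np.ndarray) -> np.ndarray:
    """Global minimum-variance weights from a (days x stocks) array of closing prices.

    Closed form: w = inv(C) @ 1 / (1' @ inv(C) @ 1), where C is the covariance
    of daily returns. Weights sum to 1 but may be negative (short positions).
    Requires at least two stocks (columns).
    """
    returns = np.diff(prices, axis=0) / prices[:-1]  # simple daily returns
    cov = np.cov(returns, rowvar=False)              # stocks are columns
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)                 # inv(C) @ 1 without forming inv(C)
    return raw / raw.sum()
```

With two stocks whose daily returns are uncorrelated, the weights come out proportional to the inverse of each stock's return variance, so the less volatile stock receives the larger allocation.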

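The following code invokes a Lambda function that calls an AWS AI service API. This example uses the sentiment detection function (Amazon Comprehend); the functions for Amazon Textract and Amazon Transcribe follow the same pattern: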

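```python
import json

import boto3


def SentimentAnalysis(inputString):
    """Invoke the sentiment-detection Lambda function and return its response as text."""
    print(inputString)
    lambda_client = boto3.client('lambda')
    # Synchronous invocation; the input text is sent as a JSON-encoded string
    response = lambda_client.invoke(FunctionName='FSI-SentimentDetecttion',
                                    InvocationType='RequestResponse',
                                    Payload=json.dumps(inputString))
    # The Payload stream can be read only once, so read and decode it a single time
    result = response['Payload'].read().decode('utf-8')
    print(result)
    return result
```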
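The Lambda function behind `FSI-SentimentDetecttion` isn't shown in this post. The following is a hypothetical sketch of such a handler: it assumes the event is the plain string that `SentimentAnalysis` sends (that function JSON-encodes `inputString` directly), scores it with Amazon Comprehend's `detect_sentiment` API, and returns the label and scores. The optional `comprehend_client` parameter is an addition so the handler can be exercised without AWS credentials.

```python
# Amazon Comprehend's synchronous DetectSentiment API accepts up to 5 KB of UTF-8 text;
# 5,000 bytes keeps each request under that limit
MAX_BYTES = 5000


def truncate_utf8(text: str, max_bytes: int = MAX_BYTES) -> str:
    """Trim text to at most max_bytes of UTF-8 without splitting a multi-byte character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def lambda_handler(event, context, comprehend_client=None):
    """Hypothetical handler: score the sentiment of the input string with Amazon Comprehend."""
    if comprehend_client is None:
        import boto3  # boto3 ships with the AWS Lambda Python runtime
        comprehend_client = boto3.client("comprehend")
    result = comprehend_client.detect_sentiment(Text=truncate_utf8(event), LanguageCode="en")
    # Lambda serializes this dict to JSON as the invocation's response payload
    return {"Sentiment": result["Sentiment"], "SentimentScore": result["SentimentScore"]}
```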

Tools defined for the multi-modal agent

The multi-modal agent has access to various tools to enable its functionality. It can query a stocks database to answer questions on stocks, optimize a portfolio using a dedicated tool, and retrieve information from Amazon Kendra, Amazon’s enterprise search service. A Python REPL tool allows the agent to run Python code. The following code shows an example of the structure of the tools, including their names and descriptions. The toolbox used in this post has eight tools: Stocks Querying Tool, Portfolio Optimization Tool, Financial Information Lookup Tool, Python Calculation Tool, Sentiment Analysis Tool, Detect Phrases Tool, Text Extraction Tool, and Transcribe Audio Tool.

```python
from langchain.agents import Tool

# run_chain, SentimentAnalysis, and DetectKeyPhrases are defined earlier in the notebook
tools = [
    Tool(
        name="Financial Information Lookup Tool",
        func=run_chain,
        description="""
        Useful for when you need to look up financial information using Kendra.
        """
    ),
    Tool(
        name="Sentiment Analysis Tool",
        func=SentimentAnalysis,
        description="""
        Useful for when you need to analyze the sentiment of a topic.
        """
    ),
    Tool(
        name="Detect Phrases Tool",
        func=DetectKeyPhrases,
        description="""
        Useful for when you need to detect key phrases in recent quarterly reports.
        """
    ),
    # ... (remaining tools elided)
]
```

Long-term memory for the multi-modal agent

The following code illustrates the configuration of long-term memory for the multi-modal agent. In this code, a DynamoDB table is added as memory to store prompts and answers for future reference.

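```python
from langchain.memory import ConversationBufferMemory
from langchain.memory.chat_message_histories import DynamoDBChatMessageHistory

# dynamodb_table_name and chat_session_id are set earlier in the notebook
chat_history_table = dynamodb_table_name

chat_history_memory = DynamoDBChatMessageHistory(table_name=chat_history_table, session_id=chat_session_id)
memory = ConversationBufferMemory(memory_key="chat_history",
                                  chat_memory=chat_history_memory, return_messages=True)
```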
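DynamoDBChatMessageHistory expects its table to exist already. A sketch of creating it follows, assuming LangChain's default partition key name `SessionId` and on-demand billing. The function name is hypothetical; you would pass a client such as `boto3.client('dynamodb')`.

```python
def create_chat_history_table(dynamodb_client, table_name: str) -> dict:
    """Create an on-demand DynamoDB table for chat history and wait until it exists.

    Assumes the partition key name SessionId, which LangChain's
    DynamoDBChatMessageHistory uses by default.
    """
    response = dynamodb_client.create_table(
        TableName=table_name,
        KeySchema=[{"AttributeName": "SessionId", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "SessionId", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",  # no capacity planning needed for a low-traffic demo
    )
    # Block until DynamoDB reports the table as created
    dynamodb_client.get_waiter("table_exists").wait(TableName=table_name)
    return response["TableDescription"]
```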

Planner-executor based multi-modal agent

The planner-executor based multi-modal agent architecture has two main components: a planner and an executor. The planner generates a high-level plan with steps required to run and answer the prompt question. The executor then runs this plan by generating appropriate system responses for each plan step using the language model with necessary tools. See the following code:

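```python
from langchain.chat_models import ChatAnthropic
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate
# Older LangChain releases exposed these under langchain.experimental.plan_and_execute
from langchain_experimental.plan_and_execute import (
    PlanAndExecute, load_agent_executor, load_chat_planner)

# ANTHROPIC_API_KEY, combo_template, and tools are defined earlier in the notebook
llm = ChatAnthropic(temperature=0, anthropic_api_key=ANTHROPIC_API_KEY, max_tokens_to_sample=512)
model = llm

planner = load_chat_planner(model)

# Replace the planner's system prompt with a custom template, keeping its human message
system_message_prompt = SystemMessagePromptTemplate.from_template(combo_template)
human_message_prompt = planner.llm_chain.prompt.messages[1]
planner.llm_chain.prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

executor = load_agent_executor(model, tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True, max_iterations=2)
```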
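PlanAndExecute handles the planning and execution loop internally. Conceptually, the loop resembles the following framework-free sketch, in which a planner returns a list of (tool, input) steps and each tool receives the results of earlier steps. This illustrates the idea only and is not LangChain's implementation; all names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A plan is a list of (tool_name, tool_input) steps
Plan = List[Tuple[str, str]]
StepTool = Callable[[str, List[str]], str]


def plan_and_execute(question: str,
                     planner: Callable[[str], Plan],
                     tools: Dict[str, StepTool]) -> List[str]:
    """Run each planned step with its tool, passing along earlier step results."""
    results: List[str] = []
    for tool_name, tool_input in planner(question):
        # Each tool sees the outputs of previous steps (e.g. queried prices)
        results.append(tools[tool_name](tool_input, results))
    return results


def toy_planner(question: str) -> Plan:
    return [("Stocks Querying Tool", "AAAA,WWW"),
            ("Portfolio Optimization Tool", "AAAA,WWW")]


toy_tools: Dict[str, StepTool] = {
    "Stocks Querying Tool": lambda inp, prev: f"prices for {inp}",
    "Portfolio Optimization Tool": lambda inp, prev: f"weights using {prev[-1]}",
}
print(plan_and_execute("Build a portfolio", toy_planner, toy_tools))
# ['prices for AAAA,WWW', 'weights using prices for AAAA,WWW']
```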

Example scenarios based on questions asked by a financial analyst

In this section, we explore two example scenarios to illustrate the end-to-end steps performed by the multi-modal agent based on questions asked by a financial analyst.

Scenario 1: Questions by a financial analyst related to structured data

In this scenario, the financial analyst asks the multi-modal agent a question in English about companies’ stocks. The multi-modal LangChain agent comes up with a multi-step plan and decides which tools to use for each step. The following diagram illustrates an example workflow with the following steps:

1. The financial analyst asks the multi-modal agent a financial question in English through the UI.
2. The agent identifies that it requires the database tool to answer the question. It generates a SQL query using an LLM based on the question and queries the Athena database.
3. Athena runs the SQL query, retrieves the relevant result (the stock price time series of the companies), and passes the result to the agent.
4. The agent identifies that it requires a second tool to answer the question. It passes the retrieved data to the Python tool for portfolio optimization.
5. The Python tool calculates the required optimal portfolio, including weights, and passes the answer to the LLM within the agent.
6. The agent compiles the required information and calculations using the LLM and answers the financial analyst in English through the UI.

The financial analyst asks the following question:

“What are the closing prices of stocks AAAA, WWW, DDD in year 2018? Can you build an optimized portfolio using these three stocks?”

The following code shows the steps identified by the planner of the multi-modal agent:

```python
steps=[Step(value='Use Stocks Querying Tool to find closing stock prices of AAAA, WWW and DDD in year 2018.'),
       Step(value='Use Portfolio Optimization Tool to build an optimized portfolio using the three stocks.'),
       Step(value='Provide the closing stock prices and optimized portfolio allocation.')]
```

The agent translated the question prompt into the following SQL query:

```sql
SELECT date, aaaa, www, ddd
FROM "blog-stock-prices-db"."stock_prices"
WHERE date BETWEEN '2018-01-01' AND '2018-12-31'
ORDER BY date DESC
```
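The architecture runs SQL against Athena through SQLAlchemy. To test a generated query like this one outside the agent, a minimal alternative with the boto3 Athena client might start the query, poll until it reaches a terminal state, and convert the first page of results (`get_query_results` returns at most 1,000 rows per call) into dictionaries. The function name and the S3 output location are assumptions.

```python
import time
from typing import Dict, List


def run_athena_query(athena_client, query: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> List[Dict[str, str]]:
    """Run a SQL query on Athena and return the first page of results as dicts."""
    query_id = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # S3 prefix for result files
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} finished in state {state}")

    # The first row of a SELECT result holds the column names
    rows = athena_client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
            for row in rows[1:]]
```

For example, `run_athena_query(boto3.client('athena'), sql, 'blog-stock-prices-db', 's3://your-bucket/athena-results/')`, where the bucket is a placeholder. All values come back as strings, so convert prices to numbers before optimizing.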

The following is the final answer from the multi-modal agent:

```
The closing stock prices for AAAA, WWW and DDD in 2018 are:

AAAA: $172.44
WWW: $85.91
DDD: $9.82

The optimized portfolio allocation for stocks AAAA, WWW and DDD is:

AAAA: 55%
WWW: 45%
DDD: 0%
```

Scenario 2: Questions by a financial analyst related to unstructured data

In this scenario, a financial analyst asks the multi-modal agent multiple questions about the company’s financial health. The agent invokes two tools, the Amazon Kendra tool over unstructured data and a Python compute tool, to gather information and perform the necessary calculations. The agent then compiles the information from the tools using its language model and provides the full answer to the analyst. The key point of this scenario is that the agent remembers the previous questions and answers and incorporates that context into the conversation. The following diagram illustrates the workflow, which consists of the following steps:

1. The financial analyst asks the multi-modal agent a financial question in English through the UI.
2. The agent identifies that it requires the Amazon Kendra tool to answer the question and invokes the Amazon Kendra agent tool.
3. Amazon Kendra searches the unstructured data corpus in Amazon S3 and retrieves the relevant document containing the answer.
4. Using Retrieval Augmented Generation (RAG), Amazon Kendra passes the answer from the relevant document to the LLM within the agent.
5. The agent identifies that it requires a second tool to answer the second part of the question. It passes information from the first tool to the Python compute tool.
6. The Python tool calculates the required ratio and passes the answer to the LLM within the agent.
7. The agent compiles the required information and calculations using the LLM and answers the financial analyst through the UI.
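The retrieval half of this RAG flow can be sketched with Amazon Kendra's Retrieve API, which returns relevant passages that are then folded into the LLM prompt. This is an illustration under assumptions: the prompt wording, the `top_k` default, and the function names are hypothetical; pass a client such as `boto3.client('kendra')` and your own index ID.

```python
from typing import List


def retrieve_passages(kendra_client, index_id: str, question: str, top_k: int = 3) -> List[str]:
    """Fetch the most relevant passages for a question from an Amazon Kendra index."""
    response = kendra_client.retrieve(IndexId=index_id, QueryText=question, PageSize=top_k)
    # Tag each passage with its source document so the LLM can cite it
    return [f"[{item['DocumentTitle']}] {item['Content']}" for item in response["ResultItems"]]


def build_rag_prompt(question: str, passages: List[str]) -> str:
    """Ground the LLM in retrieved text; the instruction wording is a hypothetical example."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```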

The following are questions and answers between the analyst and multi-modal agent.

The agent answers the question “What is Amazon’s net loss for fiscal year ending in December 2022?” with “Amazon reported total net sales of $514.0 billion and a net loss of $2.7 billion for fiscal year 2022.” The source is the Amazon 10-K report for the fiscal year ending December 2022, retrieved using the Retrive_Kendra Tool.

For the question “What is the net loss for the same year?” the agent answers “$2.7 billion.” The source is the Amazon 10-K report for the fiscal year ending December 2022, retrieved using the Retrive_Kendra Tool. Note that even though this question is similar to the previous one, it doesn’t specify the company name or the year. However, the agent was able to use the correct company name and year based on the context in conversation memory.

For the question “What is the formula for Net Profit Margin Ratio which is a type of profitability ratio?” the agent answers “Amazon reported total net sales of $514.0 billion and a net loss of $2.7 billion for fiscal year 2022. The Net Profit Margin Ratio for Amazon for fiscal year 2022 is -0.5%.” The sources are Retrieve_Kendra and PythonREPLTool().
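The ratio in the last answer can be checked by hand. Net profit margin is net income divided by net sales (revenue), expressed as a percentage; with the figures the agent retrieved, it is -2.7 / 514.0 × 100 ≈ -0.53%, which rounds to the -0.5% reported. A minimal version of such a calculation, with a hypothetical function name:

```python
def net_profit_margin(net_income: float, net_sales: float) -> float:
    """Net profit margin in percent: net income (negative for a loss) / net sales * 100."""
    return net_income / net_sales * 100


# Figures reported above for fiscal year 2022, in billions of USD
margin = net_profit_margin(net_income=-2.7, net_sales=514.0)
print(f"{margin:.1f}%")  # -0.5%
```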

Dive deeper into the solution

To dive deeper into the solution and the code shown in this post, check out the GitHub repo.

In the GitHub repo, you can find a Python notebook with the end-to-end solution, an AWS CloudFormation template for provisioning the infrastructure, unstructured data (earnings report PDF files and earnings call audio files), and structured data (stock time series).

The appendix at the end of this post tabulates different questions asked by a financial analyst, the agent tools invoked, and the answers from the multi-modal agent.

Clean up

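After you run the multi-modal agent, make sure to clean up any resources that won’t be used. Shut down and delete the database used (Athena). In addition, delete the data in Amazon S3 and stop any SageMaker Studio notebook instances so that you don’t incur further charges. Delete any unused Lambda functions and the DynamoDB tables used as long-term memory. If you used SageMaker JumpStart to deploy an LLM as a SageMaker real-time endpoint, delete the endpoint through either the SageMaker console or SageMaker Studio.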
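The SageMaker endpoint, DynamoDB table, and Lambda functions can also be deleted programmatically. The following sketch takes boto3 clients (for example, `boto3.client('sagemaker')`) and resource names that you supply; the function name and parameters are hypothetical. Deleting an endpoint stops its instance charges, but the endpoint configuration and model remain and can be removed separately with `delete_endpoint_config` and `delete_model`.

```python
from typing import Iterable, List, Optional


def clean_up(sagemaker_client=None, dynamodb_client=None, lambda_client=None,
             endpoint_name: Optional[str] = None, table_name: Optional[str] = None,
             function_names: Iterable[str] = ()) -> List[str]:
    """Delete the resources named by the caller, skipping any that are not given."""
    deleted: List[str] = []
    if sagemaker_client is not None and endpoint_name:
        # Real-time endpoint created with SageMaker JumpStart
        sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
        deleted.append(f"endpoint:{endpoint_name}")
    if dynamodb_client is not None and table_name:
        # DynamoDB table used as the agent's long-term memory
        dynamodb_client.delete_table(TableName=table_name)
        deleted.append(f"table:{table_name}")
    if lambda_client is not None:
        for name in function_names:
            lambda_client.delete_function(FunctionName=name)
            deleted.append(f"function:{name}")
    return deleted
```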

Conclusion

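This post showcased the wide range of AWS storage, AI/ML, and compute services that you can use together with the LangChain framework and generative AI to build an advanced multi-modal AI solution. By integrating NLP, speech recognition, and ML technologies, the multi-modal agent developed in this scenario can analyze diverse modalities of financial data (structured and unstructured) and respond to prompts in natural language (English). This boosts the productivity and efficiency of financial analysts, allowing them to focus on high-value, strategic work. Multi-modal AI solutions powered by AWS have the potential to transform financial markets by automating repetitive tasks, enhancing human capabilities, and efficiently generating data-driven insights. For structured data, a modern data architecture with multiple databases can be combined with generative AI and LLMs. For more details, refer to Reinventing the data experience: Use generative AI and modern data architecture to unlock insights. In addition, refer to Amazon Bedrock for use cases of generative AI, foundation models, and large language models.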

Appendix

The following table contains example questions by a financial analyst to ask the multi-modal agent based on structured and unstructured data.

| Question purpose | Question asked by financial analyst through UI | Agent tools invoked | Primary AWS services used by agent tools | Answer from multi-modal agent |
| --- | --- | --- | --- | --- |
| Prompt the LangChain agent to preprocess and extract information from the collected data | Convert Amazon's 10-K and 10-Q files from recent quarters to text format | Text Extraction Tool | Amazon Textract, AWS Lambda, Amazon S3 | The text files containing Amazon's financial filings have been cleaned and are ready for analysis. |
| Prompt the LangChain agent to transcribe the audio recordings to text | Can you transcribe the most recent quarterly audio recording of the earnings call to text format? | Transcribe Audio Tool | Amazon Transcribe, AWS Lambda, Amazon S3 | I have the earnings call transcripts, now I can extract key phrases. |
| Prompt the LangChain agent to analyze the company’s financial health and performance | Analyze Amazon's financial health based on the 10k report. Calculate important financial ratios. Please limit your answer to 5 sentences. | Financial Information Lookup Tool, Python Calculation Tool | Amazon Kendra, AWS Lambda, Amazon S3 | Here is a summary of Amazon’s financial health in 5 sentences: Amazon has a low profit margin indicating low profitability relative to revenue. However, Amazon has a high return on assets indicating efficient use of assets to generate profits. Amazon has a low debt-to-equity ratio indicating low financial leverage and risk. Amazon has a high current ratio indicating good short-term liquidity position and ability to pay off short-term obligations. Overall, Amazon seems to be in a good financial position with efficient operations and low risk. |
| Prompt the LangChain agent to analyze the company’s financial performance from the company’s earnings reports | What were the main drivers of the company’s revenue growth in the quarter? Please limit your answers to 5 sentences. | Detect Phrases Tool, Financial Information Lookup Tool | Amazon Comprehend, Amazon Kendra, AWS Lambda, Amazon S3 | Here is a summary of the main drivers of revenue growth in 5 sentences: Amazon's revenue grew due to an increase in subscription and support revenue from new customers. Existing customer accounts also expanded, contributing to revenue growth. Higher net sales drove increased revenue for the company. Although I could not calculate the exact revenue growth percentage, the key phrases and information extracted point to revenue growth being driven primarily by new subscriptions, existing customer expansion, and greater net sales. |
| Prompt the LangChain agent to analyze the company’s management team and their plans for the future from the company’s earnings reports | What are the company's top priorities for the coming year? What are the biggest risks facing the company? Please limit your answers to 5 sentences. | Sentiment Analysis Tool, Detect Phrases Tool, Financial Information Lookup Tool | Amazon Comprehend, Amazon Kendra, AWS Lambda, Amazon S3 | Here is a summary response in 5 sentences: Amazon's priorities are investing in growth and expanding into new markets. However, risks like changes in consumer demand and competition threaten their business. By analyzing key phrases in financial reports, their focus on fulfillment, technology, and new products emerges, as do risks around suppliers, exchange rates, and retail competition. Though facing pressures, Amazon aims to innovate, improve customer experience, and strengthen their position. Continuous progress on price, speed, and selection will drive their competitive advantage. |
| Prompt the LangChain agent to analyze the company’s strategy from the company’s earnings reports | What are company’s financial and business goals for upcoming financial year/quarter? Please limit your answers to 5 sentences. | Financial Information Lookup Tool, Detect Phrases Tool | Amazon Comprehend, Amazon Kendra, AWS Lambda, Amazon S3 | Here is a summary of Amazon's main goals in 5 sentences: Amazon seeks to grow by improving infrastructure and entering new markets. They will invest in fulfillment and technology to enable expansion. By moving into groceries, healthcare, and entertainment, Amazon can diversify their offerings. Focusing on the customer through competitive pricing, selection, and innovation is key. If Amazon achieves operational excellence and continuous progress, they will strengthen their position. |
| Prompt the LangChain agent to perform sentiment analysis on the earnings call recordings | What is the sentiment around inflation in Amazon's earnings call? | Sentiment Analysis Tool | Amazon Comprehend, AWS Lambda, Amazon S3 | The sentiment of mentions of 'inflation' in the Amazon earnings call transcript is negative. Justification: Quote 1: "We're seeing inflationary pressures, especially in fuel and transportation costs." Quote 2: "Inflation is a concern and we're seeing cost inflation, which is an unusual thing for us." |
| Prompt the LangChain agent to build an optimal portfolio using the collected data | What are the closing prices of stocks AAAA, WWW, DDD in year 2018? Can you build an optimized portfolio using these three stocks? | Portfolio Optimization Tool, Stocks Querying Tool | Amazon SageMaker, Amazon Athena | The closing stock prices for AAAA, WWW and DDD in 2018 are: AAAA: $172.44, WWW: $85.91, DDD: $9.82. The optimized portfolio allocation for stocks AAAA, WWW and DDD is: AAAA: 55%, WWW: 45%, DDD: 0%. |