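Multi-modal data is a valuable component of the financial industry, encompassing market, economic, customer, news and social media, and risk data. Financial organizations generate, collect, and use this data to gain insights into financial operations, make better decisions, and improve performance. However, multi-modal data also brings challenges, due to the complexity and lack of standardization in financial systems, data formats, and data quality, as well as the fragmented and unstructured nature of the data. Financial clients frequently describe the operational overhead of gaining financial insights from multi-modal data, which requires complex extraction and transformation logic and leads to excessive effort and cost. Technical challenges with multi-modal data further include the complexity of integrating and modeling different data types, the difficulty of combining data from multiple modalities (text, images, audio, video), and the need for advanced computer science skills and sophisticated analysis tools.
An increasingly popular way to handle multi-modal data is to use multi-modal agents. A multi-modal agent is an AI system that can understand and analyze data in multiple modalities by using the right tools from its toolkit. It can connect insights across these diverse data types to gain a more comprehensive understanding and generate appropriate responses. Multi-modal agents, combined with generative AI, are finding wide application in financial markets. The following are a few popular use cases:
- Smart reporting and market intelligence – AI can analyze various sources of financial information to generate market intelligence reports, helping analysts, investors, and companies stay on top of the latest trends. Multi-modal agents can quickly summarize lengthy financial reports, saving analysts significant time and effort.
- Quantitative modeling and forecasting – Generative models can synthesize large volumes of financial data to train machine learning (ML) models for applications such as stock price forecasting, portfolio optimization, and risk modeling. Multi-modal models that understand diverse data sources can provide more robust forecasts.
- Compliance and fraud detection – The solution can be extended to include monitoring tools that analyze communication channels such as calls, emails, chats, and access logs to identify potential insider trading or market manipulation. Detecting fraudulent collusion across data types requires multi-modal analysis.
Multi-modal agents with generative AI boost the productivity of financial analysts by automating repetitive and routine tasks, freeing up time for analysts to focus on high-value work. They can amplify an analyst's ability to gain insights by assisting with research and analysis. They can also generate enhanced quantitative analysis and financial models, enabling analysts to work faster and with greater accuracy.
Implementing a multi-modal agent with AWS consolidates key insights from diverse structured and unstructured data at scale. Multi-modal agents can combine the generative AI offerings of Amazon Bedrock and Amazon SageMaker JumpStart with the data processing capabilities of AWS Analytics and AI/ML services to provide agile solutions that enable financial analysts to efficiently analyze and gather insights from multi-modal data in a secure and scalable manner within AWS. Amazon offers a suite of AI services that enable natural language processing (NLP), speech recognition, text extraction, and search: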
- Amazon Comprehend is an NLP service that can analyze text for key phrases and analyze sentiment
- Amazon Textract is an intelligent document processing service that can accurately extract text and data from documents
- Amazon Transcribe is an automatic speech recognition service that can convert speech to text
- Amazon Kendra is an enterprise search service powered by ML that finds information across a variety of data sources, including documents and knowledge bases
In this post, we showcase a scenario where a financial analyst interacts with the organization’s multi-modal data, residing on purpose-built data stores, to gather financial insights. In the interaction, we demonstrate how multi-modal agents plan and run the user query and retrieve the results from the relevant data sources. All this is achieved using AWS services, which makes the financial analyst more efficient at analyzing multi-modal financial data (text, speech, and tabular data) holistically.
The following screenshot shows an example of the UI.
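Solution overview
The following diagram illustrates the conceptual architecture for using generative AI with multi-modal data using agents. The steps involved are as follows:
- The financial analyst poses questions through a platform such as a chatbot.
- The platform uses a framework to determine the most suitable multi-modal agent tool to answer the question.
- Once identified, the platform runs the code that is linked to the previously identified tool.
- The tool generates an analysis of the financial data as requested by the financial analyst.
- The large language model summarizes the results and reports back to the financial analyst.
Technical architecture
The multi-modal agent orchestrates various tools based on natural language prompts from business users to generate insights. For unstructured data, the agent uses AWS Lambda functions with AI services such as Amazon Textract for document analysis, Amazon Transcribe for speech recognition, Amazon Comprehend for NLP, and Amazon Kendra for intelligent search. For structured data, the agent uses the SQL Connector and SQLAlchemy to analyze databases, which include Amazon Athena. The agent also uses Python in Lambda and the Amazon SageMaker SDK for computations and quantitative modeling. The agent also has long-term memory for storing prompts and results in Amazon DynamoDB. The multi-modal agent resides in a SageMaker notebook and coordinates these tools based on English prompts from business users in a Streamlit UI.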
The key components of the technical architecture are as follows:
- Data storage and analytics – The quarterly earnings call recordings (audio files), annual financial reports (PDF files), and S&P stock data (CSV files) are hosted on Amazon Simple Storage Service (Amazon S3). Data exploration on stock data is done using Athena.
- Large language models – The large language models (LLMs) are available via Amazon Bedrock, SageMaker JumpStart, or an API.
- Agents – We use LangChain agents, in which the chain of calls to LLMs and other tools is not predetermined but depends on the user input. In these types of chains, there is an agent that has access to a suite of tools. Each tool has been built for a specific task. Depending on the user input, the agent decides which tool or combination of tools to call to answer the question. We created the following purpose-built agent tools for our scenario:
- Stocks Querying Tool – To query S&P stocks data using Athena and SQLAlchemy.
- Portfolio Optimization Tool – To build a portfolio based on the chosen stocks.
- Financial Information Lookup Tool – To search for financial earnings information stored in multi-page PDF files using Amazon Kendra.
- Python Calculation Tool – To use for mathematical calculations.
- Sentiment Analysis Tool – To identify and score sentiments on a topic using Amazon Comprehend.
- Detect Phrases Tool – To find key phrases in recent quarterly reports using Amazon Comprehend.
- Text Extraction Tool – To convert the PDF versions of quarterly reports to text files using Amazon Textract.
- Transcribe Audio Tool – To convert audio recordings to text files using Amazon Transcribe.
The agent memory that holds the chain of user interactions with the agent is saved in DynamoDB.
The following sections explain some of the primary steps with associated code. To dive deeper into the solution and code for all the steps shown here, refer to the GitHub repo.
Prerequisites
To run this solution, you must have an API key to an LLM such as Anthropic Claude v2, or have access to Amazon Bedrock foundation models.
To generate responses from structured and unstructured data using LLMs and LangChain, you need access to LLMs through either Amazon Bedrock, SageMaker JumpStart, or API keys, and you need to use databases that are compatible with SQLAlchemy. AWS Identity and Access Management (IAM) policies are also required; you can find the details in the GitHub repo.
Key components of a multi-modal agent
There are a few key components of the multi-modal agent:
- Functions defined for tools of the multi-modal agent
- Tools defined for the multi-modal agent
- Long-term memory for the multi-modal agent
- Planner-executor based multi-modal agent (defined with tools, LLMs, and memory)
In this section, we illustrate the key components with associated code snippets.
Functions defined for tools of the multi-modal agent
The multi-modal agent needs to use various AI services to process different types of data—text, speech, images, and more. Some of these functions may need to call AWS AI services like Amazon Comprehend to analyze text, Amazon Textract to analyze images and documents, and Amazon Transcribe to convert speech to text. These functions can either be called locally within the agent or deployed as Lambda functions that the agent can invoke. The Lambda functions internally call the relevant AWS AI services and return the results to the agent. This approach modularizes the logic and makes the agent more maintainable and extensible.
The following function defines how to calculate the optimized portfolio based on the chosen stocks. One way to convert a Python-based function to an LLM tool is to use the BaseTool wrapper.
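The repository holds the actual portfolio function and its BaseTool wrapper. As a rough stand-in, the sketch below shows one simple way such a function could work: a long-only grid search over weights that maximizes mean return divided by volatility. The function names, the grid step, and the scoring rule are assumptions for illustration, not the method used in the post.

```python
import itertools
import statistics


def period_returns(prices):
    """Convert a price series into simple period-over-period returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def optimize_portfolio(prices_by_ticker, step=0.05):
    """Grid-search long-only weights that maximize mean return / volatility.

    prices_by_ticker maps ticker -> list of closing prices (equal lengths).
    Returns a dict mapping ticker -> weight; the weights sum to 1.
    """
    tickers = sorted(prices_by_ticker)
    returns = [period_returns(prices_by_ticker[t]) for t in tickers]
    n_steps = round(1 / step)

    best_score, best_weights = float("-inf"), None
    # Enumerate every weight vector on the grid whose parts sum to 1.
    for combo in itertools.product(range(n_steps + 1), repeat=len(tickers)):
        if sum(combo) != n_steps:
            continue
        weights = [c / n_steps for c in combo]
        portfolio = [
            sum(w * r[i] for w, r in zip(weights, returns))
            for i in range(len(returns[0]))
        ]
        volatility = statistics.pstdev(portfolio)
        if volatility == 0:
            continue  # skip degenerate portfolios to avoid dividing by zero
        score = statistics.mean(portfolio) / volatility
        if score > best_score:
            best_score, best_weights = score, weights

    if best_weights is None:
        raise ValueError("No portfolio with non-zero volatility was found")
    return dict(zip(tickers, best_weights))
```

A function like this takes the price series returned by the stocks query and hands back weights that the LLM can then report in plain English.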
The following is the code for Lambda calling the AWS AI service (Amazon Comprehend, Amazon Textract, Amazon Transcribe) APIs:
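As a minimal sketch of this pattern, the handler below calls Amazon Comprehend's `detect_sentiment` operation and returns a compact result. The event shape (a `text` key), the handler name, and the optional `comprehend` parameter used for local testing are assumptions; the full Lambda code, including the Amazon Textract and Amazon Transcribe calls, is in the repository.

```python
def detect_sentiment_handler(event, context, comprehend=None):
    """Lambda-style handler that returns the sentiment of event["text"].

    The `comprehend` argument is a testing convenience (an assumption of this
    sketch); Lambda itself only passes `event` and `context`.
    """
    if comprehend is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS

        comprehend = boto3.client("comprehend")

    response = comprehend.detect_sentiment(
        Text=event["text"],
        LanguageCode="en",
    )
    # Return only the fields the agent needs to reason about.
    return {
        "sentiment": response["Sentiment"],
        "scores": response["SentimentScore"],
    }


class FakeComprehend:
    """Local stand-in that mimics the shape of the Comprehend response."""

    def detect_sentiment(self, Text, LanguageCode):
        return {
            "Sentiment": "NEGATIVE",
            "SentimentScore": {
                "Positive": 0.01,
                "Negative": 0.90,
                "Neutral": 0.08,
                "Mixed": 0.01,
            },
        }


result = detect_sentiment_handler(
    {"text": "Inflation is a concern."}, None, comprehend=FakeComprehend()
)
```

Passing the client in makes the handler easy to test locally; when deployed, the default branch creates a real boto3 client.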
Tools defined for the multi-modal agent
The multi-modal agent has access to various tools to enable its functionality. It can query a stocks database to answer questions on stocks. It can optimize a portfolio using a dedicated tool. It can retrieve information from Amazon Kendra, Amazon’s enterprise search service. A Python REPL tool allows the agent to run Python code. An example of the structure of the tools, including their names and descriptions, is shown in the following code. The actual toolbox in this post has eight tools: Stocks Querying Tool, Portfolio Optimization Tool, Financial Information Lookup Tool, Python Calculation Tool, Sentiment Analysis Tool, Detect Phrases Tool, Text Extraction Tool, and Transcribe Audio Tool.
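To make that structure concrete, here is a minimal sketch in plain Python. `AgentTool` is a hypothetical stand-in that mirrors the name, callable, and description shape of a LangChain tool; the callables are placeholders and the descriptions paraphrase the tool list earlier in the post.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentTool:
    """Hypothetical stand-in for a LangChain-style tool definition."""

    name: str
    func: Callable[[str], str]
    description: str


def _stub(tool_name: str) -> Callable[[str], str]:
    """Return a placeholder callable; a real tool would call AWS services here."""
    return lambda query: f"[{tool_name}] would handle: {query}"


TOOL_SPECS = [
    ("Stocks Querying Tool", "Query S&P stocks data using Athena and SQLAlchemy."),
    ("Portfolio Optimization Tool", "Build a portfolio based on the chosen stocks."),
    ("Financial Information Lookup Tool", "Search earnings information in PDF files using Amazon Kendra."),
    ("Python Calculation Tool", "Perform mathematical calculations."),
    ("Sentiment Analysis Tool", "Identify and score sentiment on a topic using Amazon Comprehend."),
    ("Detect Phrases Tool", "Find key phrases in recent quarterly reports using Amazon Comprehend."),
    ("Text Extraction Tool", "Convert PDF quarterly reports to text files using Amazon Textract."),
    ("Transcribe Audio Tool", "Convert audio recordings to text files using Amazon Transcribe."),
]

tools: List[AgentTool] = [
    AgentTool(name, _stub(name), description) for name, description in TOOL_SPECS
]


def render_tool_menu(tool_list: List[AgentTool]) -> str:
    """Format names and descriptions the way they might be shown to an LLM."""
    return "\n".join(f"{t.name}: {t.description}" for t in tool_list)
```

The descriptions matter as much as the code behind them, because the agent relies on them to decide which tool fits a given question.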
Long-term memory for the multi-modal agent
The following code illustrates the configuration of long-term memory for the multi-modal agent. In this code, a DynamoDB table is added as memory to store prompts and answers for future reference.
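A minimal sketch of the idea, assuming a table keyed by a `SessionId` attribute with the conversation stored under `History`: the class below only relies on the `get_item` and `put_item` calls that a boto3 DynamoDB `Table` resource provides. The class names, key names, and the in-memory table used for local runs are assumptions, not the configuration from the repository.

```python
import time


class SessionMemory:
    """Store prompt/answer pairs for one session in a DynamoDB-like table."""

    def __init__(self, table, session_id):
        self.table = table  # e.g. boto3.resource("dynamodb").Table(<table name>)
        self.session_id = session_id

    def load(self):
        """Return the stored history, or an empty list for a new session."""
        response = self.table.get_item(Key={"SessionId": self.session_id})
        return response.get("Item", {}).get("History", [])

    def append(self, prompt, answer):
        """Add one exchange and write the whole history back."""
        history = self.load()
        history.append(
            {"prompt": prompt, "answer": answer, "timestamp": int(time.time())}
        )
        self.table.put_item(
            Item={"SessionId": self.session_id, "History": history}
        )


class InMemoryTable:
    """Local stand-in exposing the two Table methods the sketch uses."""

    def __init__(self):
        self._items = {}

    def get_item(self, Key):
        item = self._items.get(Key["SessionId"])
        return {"Item": item} if item is not None else {}

    def put_item(self, Item):
        self._items[Item["SessionId"]] = Item


memory = SessionMemory(InMemoryTable(), session_id="analyst-1")
memory.append("What is Amazon's net loss for 2022?", "$2.7 billion")
```

Because each session has its own key, a follow-up question can be answered with the earlier exchanges loaded as context.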
Planner-executor based multi-modal agent
The planner-executor based multi-modal agent architecture has two main components: a planner and an executor. The planner generates a high-level plan with steps required to run and answer the prompt question. The executor then runs this plan by generating appropriate system responses for each plan step using the language model with necessary tools. See the following code:
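The split between planning and execution can be sketched without any framework. In the example below, the planner and the tools are hypothetical stubs (a real planner would be an LLM call and real tools would query Athena or run an optimizer); only the control flow is meant to be illustrative.

```python
def execute_plan(question, planner, tools):
    """Run each planned step in order, passing the previous output forward."""
    trace = []
    previous_output = question
    for tool_name in planner(question):
        if tool_name not in tools:
            raise KeyError(f"Planner requested unknown tool: {tool_name}")
        previous_output = tools[tool_name](previous_output)
        trace.append((tool_name, previous_output))
    return previous_output, trace


def stub_planner(question):
    """Hypothetical planner: returns a fixed two-step plan."""
    return ["Stocks Querying Tool", "Portfolio Optimization Tool"]


stub_tools = {
    "Stocks Querying Tool": lambda text: f"prices for: {text}",
    "Portfolio Optimization Tool": lambda text: f"weights from ({text})",
}

answer, trace = execute_plan("AAAA, WWW, DDD in 2018", stub_planner, stub_tools)
print(answer)  # weights from (prices for: AAAA, WWW, DDD in 2018)
```

Keeping the plan as data makes it easy to log each step, which helps when checking why the agent chose a particular tool.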
Example scenarios based on questions asked by a financial analyst
In this section, we explore two example scenarios to illustrate the end-to-end steps performed by the multi-modal agent based on questions asked by a financial analyst.
Scenario 1: Questions by a financial analyst related to structured data
In this scenario, the financial analyst asks a question in English related to companies’ stocks to the multi-modal agent. The multi-modal LangChain agent comes up with a multi-step plan and decides what tools to use for each step. The following diagram illustrates an example workflow with the following steps:
- The financial analyst asks a financial question in English through the UI to the multi-modal agent.
- The agent identifies that it requires the database tool to answer the question. It generates a SQL query using an LLM based on the question and queries the Athena database.
- Athena runs the SQL query, retrieves the relevant result (stock price time series of the five companies), and passes the result with relevant data to the agent.
- The agent identifies that it requires a second tool to answer the question. It passes the retrieved data to the Python tool for portfolio optimization.
- The Python tool calculates the required optimal portfolio, including weights, and passes the answer to the LLM within the agent.
- The agent compiles the required information and calculations using the LLM and answers the financial analyst via the UI in English.
The financial analyst asks the following question:
“What are the closing prices of stocks AAAA, WWW, DDD in year 2018? Can you build an optimized portfolio using these three stocks?”
The following code shows the steps identified by the planner of the multi-modal agent:
The agent translated the question prompt into the following SQL query:
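As a rough illustration of what a generated query of this kind can look like, the sketch below runs one against an in-memory SQLite table standing in for Athena. The table name `stock_prices`, its columns, and the sample rows are assumptions made for the example, not the schema or data from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (date TEXT, ticker TEXT, close REAL)")
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?)",
    [
        # Made-up rows for illustration only.
        ("2018-01-02", "AAAA", 100.0),
        ("2018-01-02", "WWW", 50.0),
        ("2018-01-02", "DDD", 10.0),
        ("2018-01-03", "AAAA", 101.0),
        ("2019-01-02", "AAAA", 120.0),  # outside 2018, filtered out
        ("2018-01-02", "ZZZ", 7.0),     # other ticker, filtered out
    ],
)

QUERY = """
SELECT date, ticker, close
FROM stock_prices
WHERE ticker IN ('AAAA', 'WWW', 'DDD')
  AND date >= '2018-01-01' AND date < '2019-01-01'
ORDER BY date, ticker
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The result set is the price time series that the agent then passes on to the portfolio optimization step.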
The following code shows the final answer by the multi-modal agent based on the question:
Scenario 2: Questions by a financial analyst related to unstructured data
In this scenario, a financial analyst asks the multi-modal agent multiple questions about the company’s financial health. The agent invokes two agent tools, Amazon Kendra for the unstructured data and a Python compute tool, to gather information and perform the necessary calculations. The agent then compiles the information from the tools using its language model and provides the full answer to the analyst. The key point of this scenario is that the agent remembers the earlier questions and answers and carries that context into the conversation. The following diagram illustrates the workflow, which consists of the following steps:
- The financial analyst asks a financial question in English through the UI to the multi-modal agent.
- The agent identifies that it requires the Amazon Kendra tool to answer the question and invokes the Amazon Kendra agent tool.
- Amazon Kendra searches the unstructured data corpus in Amazon S3 and retrieves the relevant document containing the answer.
- Using Retrieval Augmented Generation (RAG), Amazon Kendra passes the answer from the relevant document to the LLM within the agent.
- The agent identifies that it requires a second tool to answer the second part of the question. It passes information from the first tool to the Python compute tool.
- The Python tool calculates the required ratio and passes the answer to the LLM within the agent.
- The agent compiles the required information and calculations using the LLM and answers the financial analyst via the UI.
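The retrieval part of these steps can be sketched as follows. The function calls the Amazon Kendra `query` operation and builds a prompt from the returned excerpts; the function names, the prompt wording, the `top_k` cutoff, and the fake client used for a local run are assumptions for illustration.

```python
def retrieve_excerpts(question, index_id, kendra=None, top_k=3):
    """Query a Kendra index and return titles and excerpts of the top results."""
    if kendra is None:
        import boto3  # imported lazily so the sketch can run without AWS

        kendra = boto3.client("kendra")

    response = kendra.query(IndexId=index_id, QueryText=question)
    excerpts = []
    for item in response.get("ResultItems", [])[:top_k]:
        excerpts.append(
            {
                "title": item.get("DocumentTitle", {}).get("Text", ""),
                "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            }
        )
    return excerpts


def build_rag_prompt(question, excerpts):
    """Combine retrieved excerpts and the question into one LLM prompt."""
    context = "\n\n".join(e["excerpt"] for e in excerpts)
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


class FakeKendra:
    """Local stand-in that mimics the shape of a Kendra query response."""

    def query(self, IndexId, QueryText):
        return {
            "ResultItems": [
                {
                    "DocumentTitle": {"Text": "Annual report"},
                    "DocumentExcerpt": {"Text": "Net loss was $2.7 billion."},
                }
            ]
        }


found = retrieve_excerpts(
    "What is the net loss?", "hypothetical-index-id", kendra=FakeKendra()
)
prompt = build_rag_prompt("What is the net loss?", found)
```

Restricting the prompt to retrieved excerpts is what lets the agent cite the source document alongside its answer.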
The following are questions and answers between the analyst and multi-modal agent.
The agent answers the question “What is Amazon’s net loss for fiscal year ending in December 2022?” with “Amazon reported total net sales of $514.0 billion and a net loss of $2.7 billion for fiscal year 2022.” The source is the Amazon 10K report for the fiscal year ending December 2022, retrieved using the Retrive_Kendra Tool.
For the question “What is the net loss for the same year ?” the agent answers “$2.7 billion.” The source is the Amazon 10K report for the fiscal year ending December 2022, retrieved using the Retrive_Kendra Tool. Note that even though this question is similar to the first question, it didn’t specify the company name or the year. However, the agent was able to use the correct company name and year based on the context in conversation memory.
For the question “What is the formula for Net Profit Margin Ratio which is a type of profitability ratio ?” the agent answers “Amazon reported total net sales of $514.0 billion and a net loss of $2.7 billion for fiscal year 2022. The Net Profit Margin Ratio for Amazon for fiscal year 2022 is -0.5%.” The source is Retrieve_Kendra and PythonREPLTool().
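The ratio in the last answer follows from the two figures quoted above: net profit margin is net income divided by net sales. A short check of the arithmetic, with a function name chosen for illustration:

```python
def net_profit_margin(net_income, net_sales):
    """Return net profit margin as a percentage of net sales."""
    if net_sales == 0:
        raise ValueError("net_sales must be non-zero")
    return net_income / net_sales * 100


# Figures in billions of dollars, taken from the agent's answer above.
margin = net_profit_margin(net_income=-2.7, net_sales=514.0)
print(f"{margin:.1f}%")  # -0.5%
```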
Dive deeper into the solution
To dive deeper into the solution and the code shown in this post, check out the GitHub repo.
In the GitHub repo, you will be able to find a Python notebook that has the end-to-end solution, an AWS CloudFormation template for provisioning the infrastructure, unstructured data (earnings reports PDF files, earnings call audio files), and structured data (stocks time series).
The appendix at the end tabulates different questions asked by a financial analyst, the agent tools invoked, and the answers from the multi-modal agent.
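Clean up
After you run the multi-modal agent, make sure to clean up any resources that won’t be used. Shut down and delete the databases used (Athena). In addition, delete the data in Amazon S3 and stop any SageMaker Studio notebook instances to avoid incurring further charges. Delete the Lambda functions and the DynamoDB tables used for long-term memory if they are no longer in use. If you used SageMaker JumpStart to deploy an LLM as a SageMaker real-time endpoint, delete the endpoint through either the SageMaker console or SageMaker Studio.
Conclusion
This post demonstrated the wide range of AWS storage, AI/ML, and compute services that you can use with the LangChain framework and generative AI to build an advanced multi-modal AI solution. By integrating NLP, speech recognition, and ML technologies, the multi-modal agent developed in this scenario is able to analyze diverse financial data modalities (structured and unstructured) and respond to prompts in natural language (English). This boosts the productivity and efficiency of financial analysts, allowing them to focus on high-value, strategic work. Multi-modal AI solutions powered by AWS have the potential to transform financial markets by automating repetitive tasks, enhancing human capabilities, and generating data-driven insights efficiently. For structured data, a modern data architecture with multiple databases can be combined with generative AI and LLMs. For more details, refer to Reinventing the data experience: Use generative AI and modern data architecture to unlock insights. Also, refer to Amazon Bedrock for use cases on generative AI, foundation models, and large language models.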
Appendix
The following table contains example questions, based on structured and unstructured data, that a financial analyst can ask the multi-modal agent.
Question purpose | Question asked by financial analysts through UI | Agent tools invoked | Primary AWS services used by agent tools | Answer from multi-modal agent |
Prompt the LangChain agent to preprocess and extract information from the collected data | Convert the Amazon's recent quarters 10k and 10q files to text format | 1. Text Extraction Tool | | The text files containing Amazon's financial filings have been cleaned and are ready for analysis. |
Prompt the LangChain agent to transcribe the audio recordings to texts | Can you transcribe most recent quarterly audio recording of earnings call to text format? | | | I have the earnings call transcripts, now I can extract key phrases. |
Prompt the LangChain agent to analyze the company’s financial health and performance | Analyze Amazon's financial health based on the 10k report. Calculate important financial ratios. Please limit your answer to 5 sentences. | | | Here is a summary of Amazon’s financial health in 5 sentences: Amazon has a low profit margin indicating low profitability relative to revenue. However, Amazon has a high return on assets indicating efficient use of assets to generate profits. Amazon has a low debt-to-equity ratio indicating low financial leverage and risk. Amazon has a high current ratio indicating good short-term liquidity position and ability to pay off short-term obligations. Overall, Amazon seems to be in a good financial position with efficient operations and low risk. |
Prompt the LangChain agent to analyze the company’s financial performance from company’s earnings reports | What were the main drivers of the company’s revenue growth in the quarter? Please limit your answers to 5 sentences. | | | Here is a summary of the main drivers of revenue growth in 5 sentences: Amazon's revenue grew due to an increase in subscription and support revenue from new customers. Existing customer accounts also expanded, contributing to revenue growth. Higher net sales drove increased revenue for the company. Although I could not calculate the exact revenue growth percentage, the key phrases and information extracted point to revenue growth being driven primarily by new subscriptions, existing customer expansion, and greater net sales. |
Prompt the LangChain agent to analyze the company’s management team and their plans for the future from company’s earnings reports | What are the company's top priorities for the coming year? What are the biggest risks facing the company? Please limit your answers to 5 sentences. | | | Here is a summary response in 5 sentences: Amazon's priorities are investing in growth and expanding into new markets. However, risks like changes in consumer demand and competition threaten their business. By analyzing key phrases in financial reports, their focus on fulfillment, technology, and new products emerges, as do risks around suppliers, exchange rates, and retail competition. Though facing pressures, Amazon aims to innovate, improve customer experience, and strengthen their position. Continuous progress on price, speed, and selection will drive their competitive advantage. |
Prompt the LangChain agent to analyze the company’s strategy from company’s earnings reports | What are company’s financial and business goals for upcoming financial year/quarter? Please limit your answers to 5 sentences. | | | Here is a summary of Amazon's main goals in 5 sentences: Amazon seeks to grow by improving infrastructure and entering new markets. They will invest in fulfillment and technology to enable expansion. By moving into groceries, healthcare, and entertainment, Amazon can diversify their offerings. Focusing on the customer through competitive pricing, selection, and innovation is key. If Amazon achieves operational excellence and continuous progress, they will strengthen their position. |
Prompt the LangChain agent to perform sentiment analysis on the earnings call recordings | What is the sentiment around inflation in Amazon's earnings call? | | | The sentiment of mentions of 'inflation' in the Amazon earnings call transcript is negative. Justification: Quote 1: "We're seeing inflationary pressures, especially in fuel and transportation costs." Quote 2: "Inflation is a concern and we're seeing cost inflation, which is an unusual thing for us." |
Prompt the LangChain agent to build an optimal portfolio using the collected data | What are the closing prices of stocks AAAA, WWW, DDD in year 2018? Can you build an optimized portfolio using these three stocks? | | | The closing stock prices for AAAA, WWW and DDD in 2018 are: AAAA: $172.44, WWW: $85.91, DDD: $9.82. The optimized portfolio allocation for stocks AAAA, WWW and DDD is: AAAA: 55%, WWW: 45%, DDD: 0%. |