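Agents are one of the most powerful and fascinating approaches to using large language models (LLMs). The surge of interest in LLMs has made agents very common in AI-driven use cases.
Using agents gives LLMs access to tools, which greatly widens what they can do: with tools, an LLM can search the web, do math, run code, and more.
The LangChain library provides a substantial selection of prebuilt tools. In many real-world projects, however, we often find that existing tools only go so far. That means we must modify existing tools or build entirely new ones.
This chapter explores how to build custom tools for agents in LangChain. We'll start with a couple of simple tools to understand the typical tool-building pattern, then move on to more complex tools that use other ML models to give us more capabilities, such as describing images.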
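Building Tools
At their core, tools are objects that consume some input, typically in the format of a string (text), and output some useful information, also as a string.
In reality, they are little more than a simple function of the kind found in any code. The only difference is that tools take their input from an LLM and feed their output back to an LLM.
With that in mind, tools are relatively simple, and we can build tools for our agents quickly.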
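To make the "a tool is just a function" idea concrete, here is a minimal sketch with no LangChain involved. The function name `word_count_tool` and its behavior are hypothetical and chosen only for illustration; the point is the shape: text in, text out.

```python
def word_count_tool(text: str) -> str:
    """Hypothetical tool body: count the words in `text`."""
    count = len(text.split())
    # Tools hand text back to the LLM, so the number is formatted as a string.
    return f"The text contains {count} words."


# In an agent, the argument would be produced by the LLM instead of typed by hand.
print(word_count_tool("tools are just functions"))
```

Wrapping a function like this in a tool class mainly adds the name and description that the agent reads when choosing what to call.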
Simple Calculator Tool
We'll start with a simple custom tool: a calculator that computes the circumference of a circle from its radius.
To create the tool, we do the following:
from math import pi
from typing import Union

from langchain.tools import BaseTool


class CircumferenceTool(BaseTool):
    name = "Circumference calculator"
    description = (
        "use this tool when you need to calculate a circumference "
        "using the radius of a circle"
    )

    def _run(self, radius: Union[int, float]):
        # circumference = 2 * pi * r
        return float(radius) * 2.0 * pi

    def _arun(self, radius: Union[int, float]):
        raise NotImplementedError("This tool does not support async")
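Here we initialized our custom CircumferenceTool class using the BaseTool object from LangChain. We can think of BaseTool as the required template for a LangChain tool.
LangChain requires two attributes to recognize an object as a valid tool: the name and description attributes.
The description is a natural-language description of the tool that the LLM uses to decide whether it needs to use it. Tool descriptions should be very explicit about what they do, when to use them, and when not to use them.
In our description, we did not define when not to use the tool. That is because the LLM seemed capable of identifying when this tool is needed. If a tool is overused, adding "when not to use it" to the description can help.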
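As an illustration of a "when not to use it" clause, the sketch below extends the circumference description. The extra wording is hypothetical and was not tested against a model here, so treat it as a starting point to adapt rather than a known-good prompt.

```python
# Hypothetical wording: the article's tool only uses the first sentence.
description = (
    "use this tool when you need to calculate a circumference "
    "using the radius of a circle. "
    "Do NOT use this tool for areas, diameters, or any question "
    "that does not provide a radius."
)
print(description)
```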
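Next come two methods, _run and _arun. When a tool is used, the _run method is called by default. The _arun method is called when the tool is used asynchronously. We do not cover async tools in this chapter, so for now we have it raise a NotImplementedError.
From here, we need to initialize the LLM and conversational memory for the conversational agent. For the LLM, we will use OpenAI's gpt-3.5-turbo model. To use it, we need an OpenAI API key.
When ready, we initialize the LLM and memory like so: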
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# initialize LLM (we use ChatOpenAI because we'll later define a `chat` agent)
llm = ChatOpenAI(
    openai_api_key="OPENAI_API_KEY",
    temperature=0,
    model_name='gpt-3.5-turbo'
)

# initialize conversational memory
conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True
)
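Here we initialize the LLM with a temperature of 0. A low temperature is useful when using tools because it reduces the amount of "randomness" or "creativity" in the text the LLM generates, which is ideal for encouraging it to follow the strict instructions that tool usage requires.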
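The effect of temperature can be sketched with the textbook softmax-with-temperature formula. This is an illustration only: the function name and the logits are made up, and how the OpenAI API handles a temperature of 0 internally is an assumption here (it is modeled as greedy selection of the top-scoring token).

```python
from math import exp


def token_probabilities(logits, temperature):
    """Softmax over `logits`, sharpened or flattened by `temperature`."""
    if temperature == 0:
        # Assumption: treat temperature 0 as greedy decoding (all mass on the top logit).
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

As the temperature drops, more probability concentrates on the highest-scoring token, which is why low values make the agent's tool-calling output more predictable.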
In the conversational_memory object, we set k=5 to "remember" the previous five human-AI interactions.
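The windowing idea behind k can be sketched with a bounded deque. The `WindowMemory` class below is a hypothetical stand-in written for this illustration; it is not LangChain's implementation of ConversationBufferWindowMemory, only the same "keep the last k exchanges" behavior.

```python
from collections import deque


class WindowMemory:
    """Keep only the `k` most recent (human, ai) exchanges."""

    def __init__(self, k: int):
        self.exchanges = deque(maxlen=k)  # older exchanges fall off the left

    def save(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    def history(self) -> list:
        return list(self.exchanges)


memory = WindowMemory(k=5)
for i in range(7):
    memory.save(f"question {i}", f"answer {i}")
print(memory.history()[0])  # the oldest exchange still remembered
```

A small window keeps the prompt short at the cost of forgetting older turns, so k is a trade-off between context and token usage.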
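Now we initialize the agent itself. It requires that llm and conversational_memory are already initialized. It also requires a list of tools to use. We have only one tool, but we still place it in a list.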
from langchain.agents import initialize_agent

tools = [CircumferenceTool()]

# initialize agent with tools
agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
    early_stopping_method='generate',
    memory=conversational_memory
)
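The agent type chat-conversational-react-description tells us a few things about this agent:
- chat means the LLM being used is a chat model. Both gpt-4 and gpt-3.5-turbo are chat models, as they consume conversation history and produce conversational responses. A model like text-davinci-003 is not a chat model, as it is not designed to be used this way.
- conversational means we will be including conversational_memory.
- react refers to the ReAct framework, which enables multi-step reasoning and tool usage by giving the model the ability to "converse with itself".
- description tells us that the LLM/agent decides which tool to use based on the tools' descriptions, which we created in the earlier tool definition.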
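The loop that such an agent runs can be sketched in plain Python. Everything below is a simplified, hypothetical stand-in: `scripted_llm` replaces the real chat model with fixed replies, and `run_agent` mimics the JSON "action" / "action_input" format seen in this chapter rather than reproducing LangChain's AgentExecutor.

```python
import json
from math import pi

# Tools keyed by the name the model must emit in its "action" field.
TOOLS = {"Circumference calculator": lambda radius: str(float(radius) * 2.0 * pi)}


def scripted_llm(transcript):
    """Stand-in for the chat model: picks a tool first, then answers."""
    if "Observation:" not in transcript:
        return json.dumps({"action": "Circumference calculator", "action_input": "7.81"})
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return json.dumps({"action": "Final Answer", "action_input": observation})


def run_agent(question, llm, tools, max_iterations=3):
    transcript = question
    for _ in range(max_iterations):
        step = json.loads(llm(transcript))
        if step["action"] == "Final Answer":
            return step["action_input"]
        # Run the chosen tool and feed its output back as an observation.
        result = tools[step["action"]](step["action_input"])
        transcript += f"\nObservation: {result}"
    return "Stopped: iteration limit reached."


print(run_agent("circumference for a radius of 7.81mm?", scripted_llm, TOOLS))
```

The real agent differs mainly in that the model, not a script, decides at each step whether to call a tool or to emit the final answer.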
With that, we can ask our agent to calculate the circumference of a circle.
agent("can you calculate the circumference of a circle that has a radius of 7.81mm")
> Entering new AgentExecutor chain...
{
    "action": "Final Answer",
    "action_input": "The circumference of a circle with a radius of 7.81mm is approximately 49.03mm."
}
> Finished chain.
{'input': 'can you calculate the circumference of a circle that has a radius of 7.81mm',
 'chat_history': [],
 'output': 'The circumference of a circle with a radius of 7.81mm is approximately 49.03mm.'}
(7.81 * 2) * pi
49.071677249072565
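The agent is close, but not accurate: something is wrong. We can see in the output of the AgentExecutor chain that the agent jumped straight to the Final Answer action:
{ "action": "Final Answer", "action_input": "The circumference of a circle with a radius of 7.81mm is approximately 49.03mm." }
The Final Answer action is what the agent uses when it has decided it has completed its reasoning and action steps and has all the information it needs to answer the user's query. That means the agent decided not to use the circumference calculator tool.
LLMs are generally bad at math, but that doesn't stop them from trying to do math. The problem here is the LLM's overconfidence in its mathematical ability. To fix this, we must tell the model that it cannot do math. First, let's look at the prompt currently in use: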
# existing prompt
print(agent.agent.llm_chain.prompt.messages[0].prompt.template)
Assistant is a large language model trained by OpenAI.
Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.
Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.
Overall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.
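We will add a short passage that tells the model that it is "terrible at math" and should never attempt to do it:
Unfortunately, Assistant is terrible at maths. When provided with math questions, no matter how simple, assistant always refers to its trusty tools and absolutely does NOT try to answer math questions by itself
With this added to the original prompt text, we create a new prompt using agent.agent.create_prompt, which builds the correct prompt structure for our agent, including tool descriptions. Then we update agent.agent.llm_chain.prompt.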
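Rather than retyping the whole system message, the extra instruction could also be spliced in programmatically. The helper below, `add_instruction`, is a hypothetical convenience written for this sketch; it assumes the prompt separates paragraphs with blank lines and places the new rule just before the closing paragraph.

```python
def add_instruction(system_message: str, instruction: str) -> str:
    """Insert `instruction` as its own paragraph before the closing paragraph."""
    paragraphs = [p.strip() for p in system_message.strip().split("\n\n")]
    # Keep the closing summary last so the new rule sits in the prompt body.
    paragraphs.insert(len(paragraphs) - 1, instruction.strip())
    return "\n\n".join(paragraphs)


base = "Assistant is a large language model.\n\nOverall, Assistant is here to assist."
rule = "Unfortunately, Assistant is terrible at maths and always refers to its tools."
print(add_instruction(base, rule))
```

The result would then be passed as the system message when building the new prompt.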
sys_msg = """Assistant is a large language model trained by OpenAI.
Assistant is designed to be able to assist with a wide range of tasks,
from answering simple questions to providing in-depth explanations
and discussions on a wide range of topics. As a language model,
Assistant is able to generate human-like text based on the input it
receives, allowing it to engage in natural-sounding conversations and
provide responses that are coherent and relevant to the topic at hand.
Assistant is constantly learning and improving, and its capabilities
are constantly evolving. It is able to process and understand large
amounts of text, and can use this knowledge to provide accurate and
informative responses to a wide range of questions. Additionally,
Assistant is able to generate its own text based on the input it
receives, allowing it to engage in discussions and provide
explanations and descriptions on a wide range of topics.
Unfortunately, Assistant is terrible at maths. When provided with math
questions, no matter how simple, assistant always refers to its trusty
tools and absolutely does NOT try to answer math questions by itself
Overall, Assistant is a powerful system that can help with a
wide range of tasks and provide valuable insights and information
on a wide range of topics. Whether you need help with a specific
question or just want to have a conversation about a particular
topic, Assistant is here to assist.
"""

# rebuild the agent prompt with the new system message and tool descriptions
new_prompt = agent.agent.create_prompt(
    system_message=sys_msg,
    tools=tools
)
agent.agent.llm_chain.prompt = new_prompt
Now we can try again:
agent("can you calculate the circumference of a circle that has a radius of 7.81mm")
> Entering new AgentExecutor chain...
{
    "action": "Circumference calculator",
    "action_input": "7.81"
}
Observation: 49.071677249072565
Thought:
{
    "action": "Final Answer",
    "action_input": "The circumference of a circle with a radius of 7.81mm is approximately 49.07mm."
}
> Finished chain.
{'input': 'can you calculate the circumference of a circle that has a radius of 7.81mm',
 'chat_history': [HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.03mm.', additional_kwargs={})],
 'output': 'The circumference of a circle with a radius of 7.81mm is approximately 49.07mm.'}
We can see that the agent now uses the Circumference calculator tool and consequently gets the correct answer.
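Tools With Multiple Parameters
In the circumference calculator, we could only input a single value, the radius. More often than not, we will need multiple parameters.
To demonstrate how to do this, we will build a hypotenuse calculator. The tool will help us calculate the hypotenuse of a triangle given a combination of triangle side lengths and/or angles.
We want multiple inputs here because we calculate the triangle hypotenuse with different values (the sides and the angle). Additionally, we don't need all values. We can calculate the hypotenuse with any combination of two or more parameters.
We define our new tool like so: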
from typing import Optional
from math import sqrt, cos, sin

desc = (
    "use this tool when you need to calculate the length of a hypotenuse "
    "given one or two sides of a triangle and/or an angle (in degrees). "
    "To use the tool, you must provide at least two of the following parameters "
    "['adjacent_side', 'opposite_side', 'angle']."
)


class PythagorasTool(BaseTool):
    name = "Hypotenuse calculator"
    description = desc

    def _run(
        self,
        adjacent_side: Optional[Union[int, float]] = None,
        opposite_side: Optional[Union[int, float]] = None,
        angle: Optional[Union[int, float]] = None
    ):
        # check for the values we have been given
        # NOTE: cos() and sin() expect radians, but `angle` is passed through
        # unconverted here even though the description asks for degrees
        if adjacent_side and opposite_side:
            return sqrt(float(adjacent_side)**2 + float(opposite_side)**2)
        elif adjacent_side and angle:
            return float(adjacent_side) / cos(float(angle))
        elif opposite_side and angle:
            return float(opposite_side) / sin(float(angle))
        else:
            return (
                "Could not calculate the hypotenuse of the triangle. "
                "Need two or more of `adjacent_side`, `opposite_side`, or `angle`."
            )

    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")


tools = [PythagorasTool()]
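In the tool description, we describe the tool's functionality in natural language and specify that "to use the tool, you must provide at least two of the following parameters ['adjacent_side', 'opposite_side', 'angle']". This instruction is all gpt-3.5-turbo needs to understand the required input format for the function.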
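One detail worth checking: the description promises an angle in degrees, while Python's math.sin and math.cos take radians, and the tool passes the angle through unconverted. The standalone sketch below shows one way the calculation could handle degrees. The function name `hypotenuse` and the choice to raise ValueError are assumptions made for this illustration; inside a tool it may be preferable to return the error message as text so the LLM can read it.

```python
from math import cos, radians, sin, sqrt
from typing import Optional


def hypotenuse(
    adjacent_side: Optional[float] = None,
    opposite_side: Optional[float] = None,
    angle: Optional[float] = None,  # degrees, as promised by the description
) -> float:
    """Return the hypotenuse of a right triangle from any two known values."""
    if adjacent_side is not None and opposite_side is not None:
        return sqrt(float(adjacent_side) ** 2 + float(opposite_side) ** 2)
    if angle is not None:
        theta = radians(float(angle))  # sin/cos work in radians
        if adjacent_side is not None:
            return float(adjacent_side) / cos(theta)
        if opposite_side is not None:
            return float(opposite_side) / sin(theta)
    raise ValueError("Need two or more of adjacent_side, opposite_side, angle.")


print(round(hypotenuse(adjacent_side=34, opposite_side=51), 2))
print(round(hypotenuse(opposite_side=51, angle=20), 2))
```

Checking against `None` instead of truthiness also means a legitimate value of 0 is not mistaken for a missing argument.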
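As before, we must update the agent's prompt. We don't need to modify the system message as we did before, but we do need to update the available tools described in the prompt.
new_prompt = agent.agent.create_prompt(
    system_message=sys_msg,
    tools=tools
)
agent.agent.llm_chain.prompt = new_prompt
Unlike before, we must also update the agent.tools attribute with our new tools: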
agent.tools = tools
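Because the prompt and the tools list have to change together, the two updates could be wrapped in one helper. `set_agent_tools` is a hypothetical name; it only uses the attributes already shown in this chapter (`agent.agent.create_prompt`, `agent.agent.llm_chain.prompt`, `agent.tools`). The stub object below exists purely so the sketch runs without an LLM, and the comments about what each attribute is used for describe our understanding of the executor rather than a guarantee.

```python
from types import SimpleNamespace


def set_agent_tools(agent, tools, system_message):
    """Rebuild the prompt for `tools` and register them for execution."""
    # The prompt tells the LLM which tools exist ...
    agent.agent.llm_chain.prompt = agent.agent.create_prompt(
        system_message=system_message, tools=tools
    )
    # ... while agent.tools is the list the executor uses to run the chosen tool.
    agent.tools = tools


# Stub with the same attribute layout, only to show the helper without an LLM.
stub = SimpleNamespace(
    tools=[],
    agent=SimpleNamespace(
        llm_chain=SimpleNamespace(prompt=None),
        create_prompt=lambda system_message, tools: f"{system_message} | tools: {tools}",
    ),
)
set_agent_tools(stub, ["Hypotenuse calculator"], "system message")
print(stub.tools, stub.agent.llm_chain.prompt)
```

Updating only one of the two tends to fail in confusing ways: the model either cannot see a tool that exists or asks for a tool the executor does not know.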
Now we ask a question specifying two of the three required parameters:
agent("If I have a triangle with two sides of length 51cm and 34cm, what is the length of the hypotenuse?")
> Entering new AgentExecutor chain...
WARNING:langchain.chat_models.openai:Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
{
    "action": "Hypotenuse calculator",
    "action_input": {
        "adjacent_side": 34,
        "opposite_side": 51
    }
}
Observation: 61.29437168288782
Thought:
{
    "action": "Final Answer",
    "action_input": "The length of the hypotenuse is approximately 61.29cm."
}
> Finished chain.
{'input': 'If I have a triangle with two sides of length 51cm and 34cm, what is the length of the hypotenuse?',
 'chat_history': [HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.03mm.', additional_kwargs={}),
  HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.07mm.', additional_kwargs={})],
 'output': 'The length of the hypotenuse is approximately 61.29cm.'}
The agent correctly identifies the right parameters and passes them to our tool. We can try again with different parameters:
agent("If I have a triangle with the opposite side of length 51cm and an angle of 20 deg, what is the length of the hypotenuse?")
> Entering new AgentExecutor chain...
{
    "action": "Hypotenuse calculator",
    "action_input": {
        "opposite_side": 51,
        "angle": 20
    }
}
Observation: 55.86315275680817
Thought:
{
    "action": "Final Answer",
    "action_input": "The length of the hypotenuse is approximately 55.86cm."
}
> Finished chain.
{'input': 'If I have a triangle with the opposite side of length 51cm and an angle of 20 deg, what is the length of the hypotenuse?',
 'chat_history': [HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.03mm.', additional_kwargs={}),
  HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.07mm.', additional_kwargs={}),
  HumanMessage(content='If I have a triangle with two sides of length 51cm and 34cm, what is the length of the hypotenuse?', additional_kwargs={}),
  AIMessage(content='The length of the hypotenuse is approximately 61.29cm.', additional_kwargs={})],
 'output': 'The length of the hypotenuse is approximately 55.86cm.'}
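Again, we see correct tool usage: even with our short tool description, the agent consistently uses the tool as intended and with multiple parameters. Note, however, that the returned value is what the tool computes when 20 is read as radians (51 / sin(20) ≈ 55.86), because math.sin expects radians; for an angle of 20 degrees the hypotenuse would be about 149.11.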
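More Advanced Tool Usage
We've already seen two examples of custom tools. In most scenarios, we'd likely want to do something more powerful, so let's give that a go.
Taking inspiration from the HuggingGPT paper [1], we will take an existing open-source expert model that has been trained for a specific task that our LLM cannot do.
That model will be the Salesforce/blip-image-captioning-large model hosted on Hugging Face. This model takes an image and describes it, something that we cannot do with our LLM.
To begin, we need to initialize the model like so: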
# !pip install transformers
import torch
from transformers import BlipProcessor, BlipForConditionalGeneration

# specify model to be used
hf_model = "Salesforce/blip-image-captioning-large"
# use GPU if it's available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# preprocessor will prepare images for the model
processor = BlipProcessor.from_pretrained(hf_model)
# then we initialize the model itself
model = BlipForConditionalGeneration.from_pretrained(hf_model).to(device)
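The process we will follow is:
- Download an image.
- Open it as a Python PIL object (an image datatype).
- Resize and normalize the image using the processor.
- Create a caption using the model.
Let's start with steps one and two: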
import requests
from PIL import Image

img_url = 'https://images.unsplash.com/photo-1616128417859-3a984dd35f02?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2372&q=80'
image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
image
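With this, we've downloaded an image of a young orangutan sitting in a tree. We can go ahead and see what the predicted caption for this image is: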
# unconditional image captioning
inputs = processor(image, return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
there is a monkey that is sitting in a tree
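Although an orangutan isn't technically a monkey, this is still reasonably accurate. Our code works. Now let's distill these steps into a tool our agent can use.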
desc = (
    "use this tool when given the URL of an image that you'd like to be "
    "described. It will return a simple caption describing the image."
)


class ImageCaptionTool(BaseTool):
    name = "Image captioner"
    description = desc

    def _run(self, url: str):
        # download the image from the URL passed in by the agent
        # and convert it to a PIL object
        image = Image.open(requests.get(url, stream=True).raw).convert('RGB')
        # preprocess the image
        inputs = processor(image, return_tensors="pt").to(device)
        # generate the caption
        out = model.generate(**inputs, max_new_tokens=20)
        # get the caption
        caption = processor.decode(out[0], skip_special_tokens=True)
        return caption

    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support async")


tools = [ImageCaptionTool()]
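Since the URL reaching the tool is written by the LLM, it may be worth checking it before downloading anything. The helper below is a hypothetical addition, not part of the original tool: it uses only the standard library's urllib.parse and checks the shape of the URL, not whether an image actually lives there.

```python
from urllib.parse import urlparse


def is_fetchable_image_url(url: str) -> bool:
    """Return True if `url` looks like an absolute http(s) URL."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_fetchable_image_url("https://images.unsplash.com/photo-1616128417859-3a984dd35f02"))
print(is_fetchable_image_url("a photo of a monkey"))
```

A tool could call such a check at the top of `_run` and return a short explanatory string when it fails, which gives the agent a chance to correct its input.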
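We reinitialize our agent prompt (removing the now unnecessary "you cannot do math" instruction) and set the tools attribute to reflect the new tools list: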
sys_msg = """Assistant is a large language model trained by OpenAI.
Assistant is designed to be able to assist with a wide range of tasks,
from answering simple questions to providing in-depth explanations
and discussions on a wide range of topics. As a language model,
Assistant is able to generate human-like
text based on the input it receives, allowing it to engage in
natural-sounding conversations and provide responses that are
coherent and relevant to the topic at hand.
Assistant is constantly learning and improving, and its capabilities
are constantly evolving. It is able to process and understand large
amounts of text, and can use this knowledge to provide accurate and
informative responses to a wide range of questions. Additionally,
Assistant is able to generate its own text based on the input it
receives, allowing it to engage in discussions and provide
explanations and descriptions on a wide range of topics.
Overall, Assistant is a powerful system that can help with a
wide range of tasks and provide valuable insights and information
on a wide range of topics. Whether you need help with a specific
question or just want to have a conversation about a particular
topic, Assistant is here to assist.
"""
new_prompt = agent.agent.create_prompt(
    system_message=sys_msg,
    tools=tools
)
agent.agent.llm_chain.prompt = new_prompt
# update the agent tools
agent.tools = tools
Now we can go ahead and ask our agent to describe the same image as above, passing its URL into the query.
agent(f"What does this image show?\n{img_url}")
> Entering new AgentExecutor chain...
{
    "action": "Image captioner",
    "action_input": "https://images.unsplash.com/photo-1616128417859-3a984dd35f02?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2372&q=80"
}
Observation: there is a monkey that is sitting in a tree
Thought:
{
    "action": "Final Answer",
    "action_input": "There is a monkey that is sitting in a tree."
}
> Finished chain.
{'input': 'What does this image show?\nhttps://images.unsplash.com/photo-1616128417859-3a984dd35f02?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2372&q=80',
 'chat_history': [HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.03mm.', additional_kwargs={}),
  HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.07mm.', additional_kwargs={}),
  HumanMessage(content='If I have a triangle with two sides of length 51cm and 34cm, what is the length of the hypotenuse?', additional_kwargs={}),
  AIMessage(content='The length of the hypotenuse is approximately 61.29cm.', additional_kwargs={})],
 'output': 'There is a monkey that is sitting in a tree.'}
Let's try another:
img_url = "https://images.unsplash.com/photo-1502680390469-be75c86b636f?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2370&q=80"
agent(f"what is in this image?\n{img_url}")
> Entering new AgentExecutor chain...
{
    "action": "Image captioner",
    "action_input": "https://images.unsplash.com/photo-1502680390469-be75c86b636f?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2370&q=80"
}
Observation: surfer riding a wave in the ocean on a clear day
Thought:
{
    "action": "Final Answer",
    "action_input": "The image shows a surfer riding a wave in the ocean on a clear day."
}
> Finished chain.
{'input': 'what is in this image?\nhttps://images.unsplash.com/photo-1502680390469-be75c86b636f?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2370&q=80',
 'chat_history': [HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.03mm.', additional_kwargs={}),
  HumanMessage(content='can you calculate the circumference of a circle that has a radius of 7.81mm', additional_kwargs={}),
  AIMessage(content='The circumference of a circle with a radius of 7.81mm is approximately 49.07mm.', additional_kwargs={}),
  HumanMessage(content='If I have a triangle with two sides of length 51cm and 34cm, what is the length of the hypotenuse?', additional_kwargs={}),
  AIMessage(content='The length of the hypotenuse is approximately 61.29cm.', additional_kwargs={}),
  HumanMessage(content='What does this image show?\nhttps://images.unsplash.com/photo-1616128417859-3a984dd35f02?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2372&q=80', additional_kwargs={}),
  AIMessage(content='There is a monkey that is sitting in a tree.', additional_kwargs={})],
 'output': 'The image shows a surfer riding a wave in the ocean on a clear day.'}
This is another accurate description.
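We've explored how to build custom tools for LangChain agents, a capability that lets us greatly expand what is possible with large language models.
In our simple examples, we saw the typical structure of LangChain tools before moving on to adding expert models as tools, with our agent acting as the controller of those models.
Naturally, there is far more we can do than what we've shown here. Tools can be used to integrate with a long list of functions and services, or to communicate with an orchestra of expert models, as demonstrated by HuggingGPT.
We can often use LangChain's default tools for running SQL queries, performing calculations, or doing vector search. However, when these default tools cannot satisfy our requirements, we now know how to build our own.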
References
[1] Y. Shen, K. Song, et al., HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (2023)