Getting started with computer use in Amazon Bedrock Agents

Solution overview

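An example computer use workflow consists of the following steps:

Create an Amazon Bedrock agent and use natural language to describe what the agent should do and how it should interact with users, for example: “You are computer use agent capable of using Firefox web browser for web search.”
Add the computer use action groups supported by Amazon Bedrock Agents to your agent using the CreateAgentActionGroup API.
Invoke the agent with a user query that requires computer use tools, for example, “What is Amazon Bedrock, can you search the web?”
The Amazon Bedrock agent uses the tool definitions at its disposal and decides to use the computer action group to take a screenshot of the environment. Using the return control capability of Amazon Bedrock Agents, the agent then responds with the tool or tools that it wants to execute. The return control capability is required for computer use with Amazon Bedrock Agents.
The workflow parses the agent response and executes the returned tool in a sandbox environment. The output is sent back to the Amazon Bedrock agent for further processing.
The Amazon Bedrock agent continues to respond with the tools at its disposal until the task is complete.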
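The invoke, execute, and return cycle described in these steps can be sketched roughly as follows. This is a minimal illustration, not the implementation from the GitHub repository: it assumes a `bedrock-agent-runtime` client created with Boto3, and `execute_tool` is a hypothetical callback that runs one tool in the sandbox and returns text. Real computer use tools typically also return screenshots, which this sketch does not handle.

```python
import uuid


def run_return_control_loop(runtime_client, agent_id, agent_alias_id,
                            user_query, execute_tool, max_turns=10):
    """Invoke the agent, run the tools it requests, and feed results back."""
    session_id = str(uuid.uuid4())
    request = {"inputText": user_query}
    for _ in range(max_turns):
        response = runtime_client.invoke_agent(
            agentId=agent_id,
            agentAliasId=agent_alias_id,
            sessionId=session_id,
            **request,
        )
        answer_parts, results, invocation_id = [], [], None
        for event in response["completion"]:
            if "chunk" in event:
                # Plain model output: collect it as part of the final answer.
                answer_parts.append(event["chunk"]["bytes"].decode("utf-8"))
            elif "returnControl" in event:
                # The agent hands control back and asks us to run tools.
                control = event["returnControl"]
                invocation_id = control["invocationId"]
                for item in control["invocationInputs"]:
                    call = item["functionInvocationInput"]
                    params = {p["name"]: p["value"]
                              for p in call.get("parameters", [])}
                    # execute_tool is a hypothetical sandbox callback.
                    output = execute_tool(call["actionGroup"],
                                          call["function"], params)
                    results.append({
                        "functionResult": {
                            "actionGroup": call["actionGroup"],
                            "function": call["function"],
                            "responseBody": {"TEXT": {"body": str(output)}},
                        }
                    })
        if invocation_id is None:
            # No tool request in this turn, so the task is complete.
            return "".join(answer_parts)
        request = {
            "sessionState": {
                "invocationId": invocation_id,
                "returnControlInvocationResults": results,
            }
        }
    raise RuntimeError("Agent did not finish within max_turns")
```

The `max_turns` cap is a design choice of this sketch: it stops a misbehaving session from looping indefinitely in the sandbox.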

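You can recreate this example in the us-west-2 AWS Region with the AWS Cloud Development Kit (AWS CDK) by following the instructions in the GitHub repository. This demo deploys a containerized application using AWS Fargate across two Availability Zones in the us-west-2 Region. The infrastructure runs within a virtual private cloud (VPC) that contains public subnets in each Availability Zone, with an internet gateway providing external connectivity. The architecture is complemented by essential supporting services, including AWS Key Management Service (AWS KMS) for security and Amazon CloudWatch for monitoring. The result is a resilient, serverless container environment that alleviates the need to manage underlying infrastructure while maintaining robust security and high availability.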

The following diagram illustrates the solution architecture.

At the core of our solution are two Fargate containers managed through Amazon Elastic Container Service (Amazon ECS), each protected by its own security group. The first is our orchestration container, which not only handles the communication between Amazon Bedrock Agents and end users, but also orchestrates the workflow that enables tool execution. The second is our environment container, which serves as a secure sandbox where the Amazon Bedrock agent can safely run its computer use tools. The environment container has limited access to the rest of the ecosystem and the internet. We utilize service discovery to connect Amazon ECS services with DNS names.

The orchestration container includes the following components:

Streamlit UI – The Streamlit UI that facilitates interaction between the end user and computer use agent
Return control loop – The workflow responsible for parsing the tools that the agent wants to execute and returning the output of these tools

The environment container includes the following components:

UI and pre-installed applications – A lightweight UI and pre-installed Linux applications like Firefox that can be used to complete the user’s tasks
Tool implementation – Code that can execute computer use tools in the environment, like “screenshot” or “double-click”
Quart (RESTful) JSON API – A JSON API, built with Quart, that the orchestration container calls to execute tools in the sandbox environment
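To make the tool implementation component more concrete, the following is a small sketch of how action names could be mapped to handler functions inside the environment container. It is an assumption for illustration only: the `ToolRegistry` class and both handlers are hypothetical, the handlers are placeholders that do not touch a real display, and the parameters are assumed to be already parsed into Python values.

```python
from typing import Any, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], str]


class ToolRegistry:
    """Maps computer use action names to handler functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, ToolHandler] = {}

    def register(self, action: str) -> Callable[[ToolHandler], ToolHandler]:
        def decorator(handler: ToolHandler) -> ToolHandler:
            self._handlers[action] = handler
            return handler
        return decorator

    def execute(self, params: Dict[str, Any]) -> str:
        action = params.get("action")
        handler = self._handlers.get(action)
        if handler is None:
            # Report unknown actions as text so the agent can recover.
            return f"error: unsupported action {action!r}"
        return handler(params)


registry = ToolRegistry()


@registry.register("screenshot")
def take_screenshot(params: Dict[str, Any]) -> str:
    # Placeholder: a real handler would capture the virtual display.
    return "screenshot captured"


@registry.register("double_click")
def double_click(params: Dict[str, Any]) -> str:
    # Placeholder: a real handler would drive the mouse in the sandbox.
    x, y = params["coordinate"]
    return f"double-clicked at ({x}, {y})"
```

A JSON API endpoint in the environment container could then pass each request body to `registry.execute` and return the resulting text to the orchestration container.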

The following diagram illustrates these components.

Prerequisites

AWS Command Line Interface (AWS CLI), installed and configured with credentials by following the AWS instructions.
Python 3.11 or later.
Node.js 14.15.0 or later.
AWS CDK CLI, installed by following the AWS instructions.
Model access enabled for Anthropic’s Claude 3.5 Sonnet V2 and for Anthropic’s Claude 3.7 Sonnet.
Boto3 version 1.37.10 or later.

Create an Amazon Bedrock agent with computer use

You can use the following code sample to create a simple Amazon Bedrock agent with computer, bash, and text editor action groups. It is crucial to provide a compatible action group signature when using Anthropic’s Claude 3.5 Sonnet V2 or Anthropic’s Claude 3.7 Sonnet, as shown in the following table.

Model	Action group signatures
Anthropic’s Claude 3.5 Sonnet V2	computer_20241022, text_editor_20241022, bash_20241022
Anthropic’s Claude 3.7 Sonnet	computer_20250124, text_editor_20250124, bash_20250124
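One way to avoid mismatched signatures is to look them up from the model ID instead of typing them by hand. The helper below is a hypothetical sketch that encodes the table above. The model ID fragments used as keys are assumptions: the Claude 3.7 Sonnet fragment matches the model ID in the code sample that follows, and the Claude 3.5 Sonnet V2 fragment is assumed from its public model ID.

```python
# Signature types per model family, taken from the table above.
SIGNATURE_TYPES = {
    "claude-3-5-sonnet-20241022": {
        "ANTHROPIC.Computer": "computer_20241022",
        "ANTHROPIC.TextEditor": "text_editor_20241022",
        "ANTHROPIC.Bash": "bash_20241022",
    },
    "claude-3-7-sonnet": {
        "ANTHROPIC.Computer": "computer_20250124",
        "ANTHROPIC.TextEditor": "text_editor_20250124",
        "ANTHROPIC.Bash": "bash_20250124",
    },
}


def signature_type(foundation_model: str, parent_signature: str) -> str:
    """Return the 'type' value to use for a model and parent signature."""
    for model_fragment, types in SIGNATURE_TYPES.items():
        if model_fragment in foundation_model:
            return types[parent_signature]
    raise ValueError(
        f"No computer use signature known for {foundation_model!r}"
    )
```

For example, `signature_type("us.anthropic.claude-3-7-sonnet-20250219-v1:0", "ANTHROPIC.Computer")` yields the value to place in the `type` field of `parentActionGroupSignatureParams`.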

import boto3
import time

# Step 1: Create the bedrock agent client

bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")

# Step 2: Create an agent

# Placeholder: replace with the ARN of your Amazon Bedrock agent execution role
agent_role_arn = "arn:aws:iam::123456789012:role/YourAgentExecutionRole"

create_agent_response = bedrock_agent.create_agent(
    agentResourceRoleArn=agent_role_arn,  # Amazon Bedrock agent execution role
    agentName="computeruse",
    description=(
        "Example agent for computer use. "
        "This agent should only operate on "
        "sandbox environments with limited privileges."
    ),
    foundationModel="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    instruction=(
        "You are computer use agent capable of using Firefox "
        "web browser for web search."
    ),
)

time.sleep(30) # wait for agent to be created

# Step 3.1: Create and attach computer action group

bedrock_agent.create_agent_action_group(
    actionGroupName="ComputerActionGroup",
    actionGroupState="ENABLED",
    agentId=create_agent_response["agent"]["agentId"],
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.Computer",
    parentActionGroupSignatureParams={
        "type": "computer_20250124",
        "display_height_px": "768",
        "display_width_px": "1024",
        "display_number": "1",
    },
)

# Step 3.2: Create and attach bash action group

bedrock_agent.create_agent_action_group(
    actionGroupName="BashActionGroup",
    actionGroupState="ENABLED",
    agentId=create_agent_response["agent"]["agentId"],
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.Bash",
    parentActionGroupSignatureParams={
        "type": "bash_20250124",
    },
)

# Step 3.3: Create and attach text editor action group

bedrock_agent.create_agent_action_group(
    actionGroupName="TextEditorActionGroup",
    actionGroupState="ENABLED",
    agentId=create_agent_response["agent"]["agentId"],
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.TextEditor",
    parentActionGroupSignatureParams={
        "type": "text_editor_20250124",
    },
)

# Step 3.4: Create a custom weather action group that uses return of control

bedrock_agent.create_agent_action_group(
        actionGroupName="WeatherActionGroup",
        agentId=create_agent_response["agent"]["agentId"],
        agentVersion="DRAFT",
        actionGroupExecutor = {
            'customControl': 'RETURN_CONTROL',
        },
        functionSchema = {
            'functions': [
                {
                    "name": "get_current_weather",
                    "description": "Get the current weather in a given location.",
                    "parameters": {
                        "location": {
                            "type": "string",
                            "description": "The city, e.g., San Francisco",
                            "required": True,
                        },
                        "unit": {
                            "type": "string",
                            "description": (
                                "The unit to use, e.g., fahrenheit or celsius. "
                                'Defaults to "fahrenheit"'
                            ),
                            "required": False,
                        },
                    },
                    "requireConfirmation": "DISABLED",
                }
            ]
        },
)
time.sleep(10)  # wait for the action groups to be attached

# Step 4: Prepare agent

bedrock_agent.prepare_agent(agentId=create_agent_response["agent"]["agentId"])
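The fixed `time.sleep` calls in the sample above are simple but can be too short or needlessly long. As an alternative, the following hedged sketch polls the agent status through the `get_agent` API until it reaches an expected value. The helper name, the timeout defaults, and the injectable `sleep` parameter are assumptions made for illustration.

```python
import time


def wait_for_agent_status(client, agent_id, target_statuses,
                          timeout_seconds=120, poll_seconds=5,
                          sleep=time.sleep):
    """Poll get_agent until the agent reaches one of target_statuses."""
    waited = 0
    while True:
        status = client.get_agent(agentId=agent_id)["agent"]["agentStatus"]
        if status in target_statuses:
            return status
        if status == "FAILED":
            raise RuntimeError(f"Agent {agent_id} entered FAILED status")
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Agent {agent_id} still {status} after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

Under these assumptions, you might call `wait_for_agent_status(bedrock_agent, agent_id, {"NOT_PREPARED"})` after `create_agent` and `wait_for_agent_status(bedrock_agent, agent_id, {"PREPARED"})` after `prepare_agent`.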

Example use case

In this post, we demonstrate an example where we use Amazon Bedrock Agents with the computer use capability to complete a web form. In the example, the computer use agent can also switch Firefox tabs to interact with a customer relationship management (CRM) agent to get the required information to complete the form. Although this example uses a sample CRM application as the system of record, the same approach works with Salesforce, SAP, Workday, or other systems of record with the appropriate authentication frameworks in place.

In the demonstrated use case, you can observe how well the Amazon Bedrock agent performed with computer use tools. Our implementation completed the customer ID, customer name, and email fields by visually examining the Excel data. However, for the overview, it decided to select the cell and copy the data, because the information wasn’t completely visible on the screen. Finally, the CRM agent was used to get additional information on the customer.

Best practices

The following are some ways you can improve the performance for your use case:

Implement Security Groups, Network Access Control Lists (NACLs), and Amazon Route 53 Resolver DNS Firewall domain lists to control access to the sandbox environment.
Apply AWS Identity and Access Management (IAM) and the principle of least privilege to assign limited permissions to the sandbox environment.
When providing the Amazon Bedrock agent with instructions, be concise and direct. Specify simple, well-defined tasks and provide explicit instructions for each step.
Understand the computer use limitations documented by Anthropic.
Complement return of control with user confirmation to help safeguard your application from malicious prompt injections by requesting confirmation from your users before invoking a computer use tool.
Use multi-agent collaboration and computer use with Amazon Bedrock Agents to automate complex workflows.
Implement safeguards by filtering harmful multimodal content based on your responsible AI policies for your application by associating Amazon Bedrock Guardrails with your agent.
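The user confirmation practice in the list above can also be applied at the application level, inside the return control workflow. The code sample earlier in this post sets `requireConfirmation` to `DISABLED` in its function schema. The sketch below is a separate, hypothetical wrapper around a tool-execution callback. The names `with_user_confirmation`, `ask_user`, and `needs_confirmation` are assumptions, not part of any AWS API.

```python
def with_user_confirmation(execute_tool, ask_user, needs_confirmation):
    """Wrap a tool executor so selected calls need explicit user approval.

    execute_tool(action_group, function, params) runs the tool.
    ask_user(prompt) returns True when the user approves.
    needs_confirmation(action_group, function, params) selects which
    calls are sensitive enough to require approval.
    """
    def guarded(action_group, function, params):
        if needs_confirmation(action_group, function, params):
            prompt = f"Allow {function} in {action_group} with {params}?"
            if not ask_user(prompt):
                # Return text instead of raising so the agent can adapt.
                return "Tool call declined by user."
        return execute_tool(action_group, function, params)

    return guarded
```

Returning a short message rather than raising an exception lets the workflow send the refusal back to the agent as a normal tool result.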

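Considerations

The computer use feature is made available to you as a beta service as defined in the AWS Service Terms. It is subject to your agreement with AWS and the AWS Service Terms, and the applicable model EULA. Computer use poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the computer use feature to interact with the internet. To minimize risks, consider taking the following precautions:

Operate computer use functionality in a dedicated virtual machine or container with minimal privileges to minimize direct system exploits or accidents
To help prevent information theft, avoid giving the computer use API access to sensitive accounts or data
Limit the computer use API’s internet access to required domains to reduce exposure to malicious content
To enforce proper oversight, keep a human in the loop for sensitive tasks (such as making decisions that could have meaningful real-world consequences) and for anything requiring affirmative consent (such as accepting cookies, executing financial transactions, or agreeing to terms of service)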
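Domain restrictions are best enforced at the network level, for example with the DNS firewall domain lists mentioned under best practices. As a complementary application-level check, a workflow could also refuse navigation to URLs outside an allow list before running a tool. The following is a minimal sketch under that assumption. The function name and the example domains are hypothetical.

```python
from urllib.parse import urlsplit


def is_allowed_url(url, allowed_domains):
    """Return True if url is http(s) and its host is in an allowed domain."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    allowed = {domain.lower() for domain in allowed_domains}
    # Accept exact matches and true subdomains, not lookalike suffixes.
    return any(host == domain or host.endswith("." + domain)
               for domain in allowed)
```

Matching on a leading dot keeps a host such as `aws.amazon.com.evil.example` from passing as a subdomain of `aws.amazon.com`.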

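Any content that you enable Anthropic’s Claude to see or access can potentially override instructions or cause the model to make mistakes or perform unintended actions. Taking proper precautions, such as isolating Anthropic’s Claude from sensitive surfaces, is essential, including to avoid risks related to prompt injection. Before enabling or requesting the permissions necessary to enable computer use features in your own products, inform end users of any relevant risks, and obtain their consent as appropriate.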

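Clean up

When you are done using this solution, make sure to clean up all the resources. Follow the instructions in the provided GitHub repository.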

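Conclusion

Organizations across industries face significant challenges with cross-application workflows that traditionally require manual data entry or complex custom integrations. The integration of Anthropic’s computer use capability with Amazon Bedrock Agents represents a transformative approach to these challenges.

By using Amazon Bedrock Agents as the orchestration layer, organizations can alleviate the need for custom API development for each application, benefit from the comprehensive logging and tracing capabilities essential for enterprise deployment, and implement automation solutions quickly.

As you begin exploring computer use with Amazon Bedrock Agents, consider workflows in your organization that could benefit from this approach. From invoice processing to customer onboarding, and from HR documentation to compliance reporting, the potential applications are vast and transformative.

We’re excited to see how you will use Amazon Bedrock Agents with the computer use capability to securely streamline operations and reimagine business processes through AI-driven automation.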

Resources

To learn more, refer to the following resources:

Article URL

https://architect.pub/getting-started-computer-use-amazon-bedrock-agents


Published

Tuesday, September 23, 2025 - 23:03

Last modified

Wednesday, September 24, 2025 - 08:43
