Browser Use: Enable AI to control your browser

Quick start

With pip (Python>=3.11):

pip install browser-use

Install the browser:

playwright install chromium --with-deps --no-shell

Spin up your agent:

import asyncio
from dotenv import load_dotenv
load_dotenv()  # read API keys from .env before browser_use is imported
from browser_use import Agent
from browser_use.llm import ChatOpenAI

async def main():
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="o4-mini", temperature=1.0),
    )
    await agent.run()

asyncio.run(main())

Add your API keys for the provider you want to use to your .env file.

OPENAI_API_KEY=
ANTHROPIC_API_KEY=
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_KEY=
GOOGLE_API_KEY=
DEEPSEEK_API_KEY=
GROK_API_KEY=
NOVITA_API_KEY=

For other settings, models, and more, check out the documentation 📕.
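
Before running an agent, it can help to confirm which provider keys are actually set. The sketch below is not part of browser-use: `configured_providers` is a hypothetical helper that checks the variable names from the .env listing above in any mapping, such as `os.environ` after `load_dotenv()` has run.

```python
import os

# Variable names copied from the .env listing above.
PROVIDER_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
    "GROK_API_KEY",
    "NOVITA_API_KEY",
]


def configured_providers(env=None):
    """Return the names from PROVIDER_KEYS that have a non-empty value in env."""
    # Hypothetical helper for illustration; defaults to the process environment.
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


if __name__ == "__main__":
    print("Keys set:", configured_providers() or "none")
```

An empty result usually means the .env file was not found or the values were left blank.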

Test with UI

You can test browser-use using its Web UI or Desktop App.

Test with an interactive CLI

You can also use the browser-use interactive CLI (similar to Claude Code):

pip install "browser-use[cli]"
browser-use

MCP Integration

Browser-use supports the Model Context Protocol (MCP), enabling integration with Claude Desktop and other MCP-compatible clients.

Use as MCP Server with Claude Desktop

Add browser-use to your Claude Desktop configuration:

{
  "mcpServers": {
    "browser-use": {
      "command": "uvx",
      "args": ["browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

This gives Claude Desktop access to browser automation tools for web scraping, form filling, and more.
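
If Claude Desktop already has other MCP servers configured, the browser-use entry has to be merged into the existing configuration rather than pasted over it. A minimal sketch of that merge, using only the standard library: `add_browser_use_server` is a hypothetical helper, and reading or writing the configuration file is left out because its location depends on your platform.

```python
import json


def add_browser_use_server(config, openai_api_key):
    """Return a copy of a Claude Desktop config dict with the browser-use entry added."""
    # Round-trip through JSON to deep-copy, so the caller's dict is not modified.
    merged = json.loads(json.dumps(config))
    servers = merged.setdefault("mcpServers", {})
    # Same command, args, and env layout as the configuration shown above.
    servers["browser-use"] = {
        "command": "uvx",
        "args": ["browser-use", "--mcp"],
        "env": {"OPENAI_API_KEY": openai_api_key},
    }
    return merged


if __name__ == "__main__":
    existing = {
        "mcpServers": {
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/documents"],
            }
        }
    }
    print(json.dumps(add_browser_use_server(existing, "sk-..."), indent=2))
```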

Connect External MCP Servers to Browser-Use Agent

Browser-use agents can connect to multiple external MCP servers to extend their capabilities:

import asyncio
from browser_use import Agent, Controller
from browser_use.mcp.client import MCPClient
from browser_use.llm import ChatOpenAI

async def main():
    # Initialize controller
    controller = Controller()
    
    # Connect to multiple MCP servers
    filesystem_client = MCPClient(
        server_name="filesystem",
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/documents"]
    )
    
    github_client = MCPClient(
        server_name="github", 
        command="npx",
        args=["-y", "@modelcontextprotocol/server-github"],
        env={"GITHUB_TOKEN": "your-github-token"}
    )
    
    # Connect and register tools from both servers
    await filesystem_client.connect()
    await filesystem_client.register_to_controller(controller)
    
    await github_client.connect()
    await github_client.register_to_controller(controller)
    
    # Create agent with MCP-enabled controller
    agent = Agent(
        task="Find the latest report.pdf in my documents and create a GitHub issue about it",
        llm=ChatOpenAI(model="gpt-4o"),
        controller=controller  # Controller has tools from both MCP servers
    )
    
    # Run the agent, and disconnect from both servers even if the run fails
    try:
        await agent.run()
    finally:
        await filesystem_client.disconnect()
        await github_client.disconnect()

asyncio.run(main())

See the MCP documentation for more details.

Vision

Tell your computer what to do, and it gets it done.

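Roadmap

Agent

Improve agent memory to handle more than 100 steps
Enhance planning capabilities (load website-specific context)
Reduce token consumption (system prompt, DOM state)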

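DOM Extraction

Enable detection of all possible UI elements
Improve the state representation of UI elements so that all LLMs can understand what is on the page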

Workflows

Let users record a workflow that can be rerun, with browser-use as a fallback
Make rerunning workflows work even if pages change
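
The record-and-rerun idea can be pictured as a replay loop with a fallback. This is only a sketch of the concept, not a browser-use API: `rerun_workflow`, `replay_step`, and `agent_fallback` are hypothetical names, and steps are plain strings here to keep the example self-contained.

```python
def rerun_workflow(steps, replay_step, agent_fallback):
    """Replay recorded steps; hand a step to the agent fallback if replay fails.

    Returns a list of (step, how) pairs, where how is "replayed" or "fallback".
    """
    log = []
    for step in steps:
        try:
            replay_step(step)  # fast path: repeat the recorded action as-is
            log.append((step, "replayed"))
        except Exception:
            agent_fallback(step)  # slow path: let an agent work the step out
            log.append((step, "fallback"))
    return log


if __name__ == "__main__":
    def replay(step):
        # Pretend the page changed and this recorded selector no longer exists.
        if step == "click #submit":
            raise LookupError("element not found")

    def fallback(step):
        print("agent handles:", step)

    for step, how in rerun_workflow(["open /form", "click #submit"], replay, fallback):
        print(how, step)
```

The point of the design is that unchanged pages cost no LLM calls, and only the steps that break are handed to the agent.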

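User Experience

Create templates for tutorial execution, job applications, QA testing, social media, and so on, which users can simply copy and paste.
Improve the documentation
Make it faster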

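Parallelization

Human work is sequential. The real power of a browser agent emerges when similar tasks can run in parallel. For example, if you want to find contact information for 100 companies, the lookups can all run in parallel and report back to a main agent, which processes the results and starts the next round of parallel subtasks.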
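
The fan-out pattern described here maps naturally onto `asyncio.gather`. The sketch below shows only the pattern: `find_contact` is a hypothetical stand-in that sleeps instead of browsing, the company names and addresses are made up, and `max_concurrent` is an assumed knob for capping how many subtasks run at once.

```python
import asyncio


async def find_contact(company):
    """Stand-in for one sub-agent run; a real version would drive a browser agent."""
    await asyncio.sleep(0.01)  # simulates time spent browsing
    return {"company": company, "contact": f"info@{company.lower()}.example"}


async def run_parallel(companies, max_concurrent=5):
    """Run one subtask per company, with at most max_concurrent in flight."""
    limit = asyncio.Semaphore(max_concurrent)

    async def bounded(company):
        async with limit:
            return await find_contact(company)

    # gather returns results in the same order as the input list
    return await asyncio.gather(*(bounded(c) for c in companies))


if __name__ == "__main__":
    results = asyncio.run(run_parallel(["Acme", "Globex", "Initech"]))
    for row in results:
        print(row["company"], row["contact"])
```

A main agent would take the returned list, process it, and call `run_parallel` again for the next batch of subtasks.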

Article URL

https://architect.pub


Published

Thursday, July 10, 2025 - 13:57

Last modified

Thursday, July 10, 2025 - 13:57
