🌐 Browser-use is the easiest way to connect your AI agents with the browser.

💡 See what others are building and share your projects in our Discord! Want Swag? Check out our Merch store.

🌤️ Skip the setup - try our hosted version for instant browser automation! Try the cloud ☁︎.

Quick start

With pip (Python>=3.11):

pip install browser-use

Install the browser:

playwright install chromium --with-deps --no-shell

Spin up your agent:

import asyncio
from dotenv import load_dotenv
load_dotenv()
from browser_use import Agent
from browser_use.llm import ChatOpenAI

async def main():
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="o4-mini", temperature=1.0),
    )
    await agent.run()

asyncio.run(main())

Add the API keys for the providers you want to use to your .env file:

OPENAI_API_KEY=
ANTHROPIC_API_KEY=
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_KEY=
GOOGLE_API_KEY=
DEEPSEEK_API_KEY=
GROK_API_KEY=
NOVITA_API_KEY=
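As a quick sanity check before running the agent, a few lines of standard-library Python can report which of these keys are actually set (for example, after load_dotenv() has run). This is a minimal sketch: the helper name `configured_providers` is hypothetical and not part of browser-use; only the key names come from the template above.

```python
import os
from collections.abc import Mapping

# Key names copied from the .env template above.
PROVIDER_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
    "GROK_API_KEY",
    "NOVITA_API_KEY",
]


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the provider keys that are present and non-empty in `env`.

    Hypothetical helper for illustration; not part of browser-use.
    Blank entries such as `OPENAI_API_KEY=` count as not configured.
    """
    return [key for key in PROVIDER_KEYS if env.get(key, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured:", ", ".join(found) if found else "none")
```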

For other settings, models, and more, check out the documentation 📕.

Test with UI

You can test browser-use using its Web UI or Desktop App.

Test with an interactive CLI

You can also use the interactive browser-use CLI (similar to Claude Code):

pip install "browser-use[cli]"
browser-use

MCP Integration

Browser-use supports the Model Context Protocol (MCP), enabling integration with Claude Desktop and other MCP-compatible clients.

Use as MCP Server with Claude Desktop

Add browser-use to your Claude Desktop configuration:

{
  "mcpServers": {
    "browser-use": {
      "command": "uvx",
      "args": ["browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
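If your configuration file already lists other MCP servers, the browser-use entry should be merged into the existing "mcpServers" object rather than overwriting the whole file. The sketch below does that with the standard library only; the function name `add_browser_use_server` is hypothetical, and you pass in the path of your Claude Desktop config file yourself, since its location depends on your OS and install.

```python
import json
from pathlib import Path


def add_browser_use_server(config_path: Path, openai_api_key: str) -> dict:
    """Merge the browser-use MCP entry into a Claude Desktop config file.

    Hypothetical helper for illustration: servers already listed under
    "mcpServers" are preserved, and a missing or empty file is treated
    as an empty config.
    """
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as the JSON snippet above.
    servers["browser-use"] = {
        "command": "uvx",
        "args": ["browser-use", "--mcp"],
        "env": {"OPENAI_API_KEY": openai_api_key},
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Reading, merging, and rewriting the file (instead of writing a fresh one) keeps any servers you configured earlier intact.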

This gives Claude Desktop access to browser automation tools for web scraping, form filling, and more.

Connect External MCP Servers to Browser-Use Agent

Browser-use agents can connect to multiple external MCP servers to extend their capabilities:

import asyncio
from browser_use import Agent, Controller
from browser_use.mcp.client import MCPClient
from browser_use.llm import ChatOpenAI

async def main():
    # Initialize controller
    controller = Controller()
    
    # Connect to multiple MCP servers
    filesystem_client = MCPClient(
        server_name="filesystem",
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/documents"]
    )
    
    github_client = MCPClient(
        server_name="github", 
        command="npx",
        args=["-y", "@modelcontextprotocol/server-github"],
        env={"GITHUB_TOKEN": "your-github-token"}
    )
    
    # Connect and register tools from both servers
    await filesystem_client.connect()
    await filesystem_client.register_to_controller(controller)
    
    await github_client.connect()
    await github_client.register_to_controller(controller)
    
    # Create agent with MCP-enabled controller
    agent = Agent(
        task="Find the latest report.pdf in my documents and create a GitHub issue about it",
        llm=ChatOpenAI(model="gpt-4o"),
        controller=controller  # Controller has tools from both MCP servers
    )
    
    try:
        # Run the agent
        await agent.run()
    finally:
        # Cleanup: disconnect from both MCP servers even if the run fails
        await filesystem_client.disconnect()
        await github_client.disconnect()

asyncio.run(main())

See the MCP documentation for more details.



Vision


  • Tell your computer what to do, and it gets it done.

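Roadmap


Agent

  • Improve agent memory to handle 100+ steps
  • Enhance planning capabilities (load website-specific context)
  • Reduce token consumption (system prompt, DOM state)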

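DOM Extraction

  • Enable detection of all possible UI elements
  • Improve the state representation of UI elements so that any LLM can understand what's on the page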
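One common way to make page state legible to any LLM is to flatten interactive elements into short, numbered lines that the model can refer to by index. The sketch below illustrates that idea only; `UIElement`, `serialize_elements`, and the attribute allow-list are hypothetical and do not describe browser-use's actual DOM serializer.

```python
from dataclasses import dataclass, field


@dataclass
class UIElement:
    """Hypothetical, simplified view of one interactive element on a page."""
    tag: str
    text: str = ""
    attributes: dict[str, str] = field(default_factory=dict)


def serialize_elements(
    elements: list[UIElement],
    keep_attrs: tuple[str, ...] = ("type", "placeholder", "aria-label"),
) -> str:
    """Render each element as one compact, indexed line.

    Only attributes in `keep_attrs` are kept, to save tokens while
    preserving the hints an LLM needs to pick the right element.
    """
    lines = []
    for index, el in enumerate(elements):
        attrs = " ".join(
            f'{key}="{value}"' for key, value in el.attributes.items() if key in keep_attrs
        )
        opening = f"<{el.tag} {attrs}>" if attrs else f"<{el.tag}>"
        lines.append(f"[{index}]{opening}{el.text}</{el.tag}>")
    return "\n".join(lines)
```

The numeric index gives the model a short, unambiguous handle ("click element 1") instead of a long CSS selector.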

Workflows

  • Let users record a workflow, which we can then rerun using browser-use as a fallback
  • Make workflow reruns work even if the pages change
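The record-and-replay idea can be pictured as running the recorded steps deterministically and handing a step to an agent only when it fails, for example because the page changed. This is a minimal, library-agnostic sketch: `Step`, `replay_with_fallback`, and the `fallback` callable are hypothetical stand-ins, not browser-use APIs.

```python
from typing import Callable

# A recorded step: a zero-argument action that returns a short result string.
Step = Callable[[], str]


def replay_with_fallback(
    steps: list[tuple[str, Step]],
    fallback: Callable[[str], str],
) -> list[str]:
    """Run recorded steps in order; if one fails, delegate it to `fallback`.

    `fallback` stands in for a hypothetical agent call that receives the
    step's description and tries to accomplish it on the live page.
    """
    results = []
    for description, step in steps:
        try:
            results.append(step())
        except Exception:
            # The recorded action no longer works (e.g. a selector changed).
            results.append(fallback(description))
    return results
```

Keeping a human-readable description next to each recorded action is what lets the fallback agent know what the failed step was supposed to achieve.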

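User Experience

  • Create various templates for tutorial execution, job applications, QA testing, social media, etc., which users can simply copy and paste.
  • Improve the docs
  • Make it faster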

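Parallelization

  • Human work is sequential. The real power of a browser agent comes into reality if we can parallelize similar tasks. For example, if you want to find the contact information for 100 companies, this can all be done in parallel and reported back to a main agent, which processes the results and kicks off parallel subtasks again.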
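One round of the fan-out/fan-in pattern described above maps naturally onto asyncio: a main coroutine launches one subtask per company, caps how many run at once, and gathers the results for further processing. A minimal sketch: `find_contact` is a hypothetical stub standing in for a browser sub-agent, and the default limit of 10 concurrent subtasks is an arbitrary assumption.

```python
import asyncio


async def find_contact(company: str) -> dict[str, str]:
    """Hypothetical sub-agent; a real one would drive its own browser session."""
    await asyncio.sleep(0)  # placeholder for the actual browser work
    return {"company": company, "contact": f"info@{company.lower()}.example"}


async def main_agent(companies: list[str], max_parallel: int = 10) -> list[dict[str, str]]:
    """Fan out one subtask per company and gather the results in input order."""
    # Cap concurrency so 100 companies don't mean 100 browsers at once.
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(company: str) -> dict[str, str]:
        async with semaphore:
            return await find_contact(company)

    return list(await asyncio.gather(*(bounded(c) for c in companies)))


if __name__ == "__main__":
    results = asyncio.run(main_agent([f"Company{i}" for i in range(100)]))
    # The main agent would process `results` here and could launch another round.
    print(len(results))
```

The semaphore keeps resource usage bounded, and asyncio.gather returns results in the same order as the input list, which makes it easy for the main agent to match each result to its company.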
Last modified: Thursday, July 10, 2025 - 13:57