🌐 Browser-use is the easiest way to connect your AI agents with the browser.
💡 See what others are building and share your projects in our Discord! Want Swag? Check out our Merch store.
🌤️ Skip the setup - try our hosted version for instant browser automation! Try the cloud ☁︎.
With pip (Python>=3.11):
pip install browser-use
Install the browser:
playwright install chromium --with-deps --no-shell
Spin up your agent:
import asyncio
from dotenv import load_dotenv

# Load API keys from the .env file before the agent is created
load_dotenv()

from browser_use import Agent
from browser_use.llm import ChatOpenAI

async def main():
    # The agent drives the browser to complete the task described in plain language
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="o4-mini", temperature=1.0),
    )
    await agent.run()

asyncio.run(main())
Add the API keys for the provider you want to use to your .env file.
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_KEY=
GOOGLE_API_KEY=
DEEPSEEK_API_KEY=
GROK_API_KEY=
NOVITA_API_KEY=
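Before starting an agent, it can help to confirm which of these keys actually have a value. The following is a minimal sketch using only the standard library; `configured_keys` is a hypothetical helper written for this illustration, not part of browser-use, and the key names are copied from the template above.

```python
import os

# Key names copied from the .env template above.
PROVIDER_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
    "GROK_API_KEY",
    "NOVITA_API_KEY",
]


def configured_keys(env=None):
    """Return the template keys that have a non-empty value in `env`.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


if __name__ == "__main__":
    # If the keys live in a .env file, call load_dotenv() before this check.
    found = configured_keys()
    print("Configured:", ", ".join(found) if found else "none")
```

A key left blank in the template (for example `GOOGLE_API_KEY=`) is treated as not configured.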
For other settings, models, and more, check out the documentation 📕.
You can test browser-use using its Web UI or Desktop App.
You can also use our browser-use interactive CLI (similar to Claude Code):
pip install "browser-use[cli]"
browser-use
Browser-use supports the Model Context Protocol (MCP), enabling integration with Claude Desktop and other MCP-compatible clients.
Add browser-use to your Claude Desktop configuration:
{
  "mcpServers": {
    "browser-use": {
      "command": "uvx",
      "args": ["browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
This gives Claude Desktop access to browser automation tools for web scraping, form filling, and more.
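If the configuration file already lists other MCP servers, the browser-use entry should be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dictionary; `add_mcp_server` is a hypothetical helper, and reading or writing the actual file is left out because its location depends on the platform.

```python
import json


def add_mcp_server(config, name, command, args, env=None):
    """Return a copy of `config` with one entry added under "mcpServers".

    Hypothetical helper for illustration; the input dict is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    entry = {"command": command, "args": list(args)}
    if env:
        entry["env"] = dict(env)
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# An assumed existing configuration with one unrelated server.
existing = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/documents"],
        }
    }
}

updated = add_mcp_server(
    existing,
    "browser-use",
    "uvx",
    ["browser-use", "--mcp"],
    env={"OPENAI_API_KEY": "sk-..."},
)

if __name__ == "__main__":
    print(json.dumps(updated, indent=2))
```

The values for the browser-use entry are the ones from the configuration shown above; only the merge logic is new.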
Browser-use agents can connect to multiple external MCP servers to extend their capabilities:
import asyncio
from browser_use import Agent, Controller
from browser_use.mcp.client import MCPClient
from browser_use.llm import ChatOpenAI

async def main():
    # Initialize controller
    controller = Controller()

    # Connect to multiple MCP servers
    filesystem_client = MCPClient(
        server_name="filesystem",
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/documents"]
    )
    github_client = MCPClient(
        server_name="github",
        command="npx",
        args=["-y", "@modelcontextprotocol/server-github"],
        env={"GITHUB_TOKEN": "your-github-token"}
    )

    # Connect and register tools from both servers
    await filesystem_client.connect()
    await filesystem_client.register_to_controller(controller)
    await github_client.connect()
    await github_client.register_to_controller(controller)

    # Create agent with MCP-enabled controller
    agent = Agent(
        task="Find the latest report.pdf in my documents and create a GitHub issue about it",
        llm=ChatOpenAI(model="gpt-4o"),
        controller=controller  # Controller has tools from both MCP servers
    )

    # Run the agent
    await agent.run()

    # Cleanup
    await filesystem_client.disconnect()
    await github_client.disconnect()

asyncio.run(main())
See the MCP documentation for more details.
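In the example above, the cleanup calls run only if `agent.run()` returns normally. One possible way to make sure every client is disconnected even when the agent raises is the standard library's `contextlib.AsyncExitStack`. The sketch below uses a hypothetical `FakeClient` in place of `MCPClient` so it runs without browser-use installed; it assumes only the `connect()` and `disconnect()` methods shown above.

```python
import asyncio
from contextlib import AsyncExitStack


class FakeClient:
    """Hypothetical stand-in exposing the connect()/disconnect() pair used above."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    async def connect(self):
        self.log.append(f"connect {self.name}")

    async def disconnect(self):
        self.log.append(f"disconnect {self.name}")


async def run_with_cleanup(clients, work):
    """Connect every client, await `work()`, and always disconnect afterwards."""
    async with AsyncExitStack() as stack:
        for client in clients:
            await client.connect()
            # Registered callbacks run in reverse order, even if `work` raises.
            stack.push_async_callback(client.disconnect)
        return await work()


async def failing_work():
    # Stands in for an agent run that fails partway through.
    raise RuntimeError("agent failed")


def demo():
    log = []
    clients = [FakeClient("filesystem", log), FakeClient("github", log)]
    try:
        asyncio.run(run_with_cleanup(clients, failing_work))
    except RuntimeError:
        log.append("error propagated")
    return log


if __name__ == "__main__":
    print(demo())
```

With real clients, `work` would be a coroutine function that registers the tools, builds the agent, and awaits `agent.run()`.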
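Vision
Tell your computer what to do, and it gets it done.
Roadmap
Agent
- Improve agent memory to handle 100+ steps
- Enhance planning capabilities (load website-specific context)
- Reduce token consumption (system prompt, DOM state)
DOM Extraction
- Enable detection of all possible UI elements
- Improve the state representation of UI elements so that all LLMs can understand what is on the page
Workflows
- Let users record a workflow that we can rerun, with browser-use as a fallback
- Make rerunning workflows work even if pages change
User Experience
- Create various templates for tutorial execution, job applications, QA testing, social media, etc. that users can simply copy and paste
- Improve the docs
- Make it faster
Parallelization
- Human work is sequential. The real power of a browser agent comes into play when similar tasks can be run in parallel. For example, if you want to find contact information for 100 companies, all of the lookups can run in parallel and report back to a main agent, which processes the results and kicks off parallel subtasks again.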
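The fan-out pattern in the parallelization item can be sketched with plain `asyncio`. This is an illustration of the idea only, not a browser-use feature: `find_contact` is a hypothetical sub-task that returns placeholder data, where a real version might await an agent run, and the concurrency limit of 5 is an arbitrary assumption.

```python
import asyncio


async def find_contact(company):
    """Hypothetical sub-task; a real version might await an agent run here."""
    await asyncio.sleep(0)  # placeholder for the actual browser work
    return {"company": company, "contact": f"info@{company.lower()}.example"}


async def run_parallel(companies, max_concurrent=5):
    """Run one sub-task per company, at most `max_concurrent` at a time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(company):
        async with limit:
            return await find_contact(company)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(c) for c in companies))


def main_agent(companies):
    """Collect the sub-task results in one place, as a main agent would."""
    results = asyncio.run(run_parallel(companies))
    return {r["company"]: r["contact"] for r in results}


if __name__ == "__main__":
    print(main_agent(["Acme", "Globex", "Initech"]))
```

The semaphore keeps the number of simultaneous sub-tasks bounded, which matters when each one would open its own browser session.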