Browser Agents
Browser agents are AI agents that navigate and interact with web pages. Common approaches include Browser-Use, Playwright-based agents, and the BrowserBase pattern.
Setup walkthrough
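- pip install browser-use (Browser-Use is an open-source browser agent framework).
- Install Ollama, then run ollama pull qwen2.5vl:7b (~5 GB; a vision-language model for seeing web pages).
- Python script:

    import asyncio

    from browser_use import Agent
    # Assumption: ChatOllama comes from the langchain-ollama package;
    # newer browser-use releases export their own ChatOllama instead.
    from langchain_ollama import ChatOllama

    async def main():
        agent = Agent(
            task="Go to wikipedia.org, search for 'artificial intelligence', and extract the first paragraph of the article.",
            llm=ChatOllama(model="qwen2.5vl:7b"),
        )
        # Runs the agent loop until the task is done; the result holds the run history.
        result = await agent.run()
        print(result)

    asyncio.run(main())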
- The agent opens a browser (Chromium via Playwright), navigates to Wikipedia, types the search, clicks the result, extracts text, and returns it. Expect the first run to take 20-60 seconds.
- For complex multi-step tasks: "Log into my email, find the latest invoice from Amazon, download it, and save to ~/invoices/." The agent handles login, navigation, dropdowns, and file downloads.
- The agent sees screenshots (VLM) + reads DOM elements (accessibility tree). The VLM decides "what to click" by analyzing the screenshot.
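The see → decide → act cycle described above can be sketched as a plain loop. This is a simplified illustration, not Browser-Use's actual implementation: `Observation`, `Action`, `run_agent`, and the `fake_*` stand-ins are all hypothetical names, and the "VLM" here is a scripted function so the example runs without a browser or a model.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    screenshot: bytes      # what the VLM would "see"
    elements: list[str]    # simplified stand-in for the accessibility tree


@dataclass
class Action:
    kind: str              # "click", "type", or "done"
    target: str = ""


def run_agent(observe, decide, act, max_steps=10):
    """Generic see -> decide -> act loop; returns the actions chosen."""
    history = []
    for _ in range(max_steps):
        obs = observe()                  # see: capture screenshot + elements
        action = decide(obs, history)    # decide: the VLM picks the next action
        history.append(action)
        if action.kind == "done":
            break
        act(action)                      # act: perform it in the browser
    return history


# Toy stand-ins so the loop runs without a browser or a model.
def fake_observe():
    return Observation(screenshot=b"", elements=["search box", "search button"])


def fake_decide(obs, history):
    script = [Action("type", "search box"), Action("click", "search button"), Action("done")]
    return script[len(history)]


performed = []
history = run_agent(fake_observe, fake_decide, performed.append)
print([a.kind for a in history])  # ['type', 'click', 'done']
```

Each pass through the loop costs one VLM call, which is why per-step latency dominates total task time. The `max_steps` cap is a design choice in this sketch to stop a confused agent from looping forever.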
The cheap setup
Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs Browser-Use with Qwen2-VL 7B at ~20-40 seconds per agent step (see → decide → act). A 5-step task (search, click, scroll, click, extract) takes 2-4 minutes. For automating daily web tasks (form filling, data extraction, monitoring), a roughly $400 build handles 10-30 tasks/day. Pair with Ryzen 5 5600 + 32 GB DDR4 + 512 GB NVMe. Total: ~$390-440. The bottlenecks are VLM inference speed (5-10 seconds per screenshot) and reasoning quality.
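The timing figures above follow from simple multiplication. A minimal sketch, where `task_seconds` is a hypothetical helper: it counts only per-step latency, so page loads and retries (assumed here to account for the gap up to the quoted 2-4 minutes) are not modeled.

```python
def task_seconds(steps, seconds_per_step):
    """Total time for a task, counting only per-step agent latency."""
    return steps * seconds_per_step


# 5-step task at 20-40 seconds per step
low = task_seconds(5, 20)
high = task_seconds(5, 40)
print(f"one task: {low / 60:.1f} to {high / 60:.1f} minutes")  # one task: 1.7 to 3.3 minutes

# Worst case for a day: 30 tasks at the slow end
daily = 30 * high
print(f"30 tasks: {daily / 60:.0f} minutes of GPU time")  # 30 tasks: 100 minutes of GPU time
```

Under these assumptions even the upper end of 30 tasks/day keeps the GPU busy for well under two hours.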
The serious setup
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Browser-Use with Qwen2-VL 72B at ~30-60 seconds per agent step. The 72B handles complex UIs (enterprise SaaS dashboards, multi-step forms, CAPTCHA workarounds) that confuse the 7B. For production browser automation (web scraping at scale, automated testing), Qwen2-VL 7B on an RTX 4090 ($2,000, see /hardware/rtx-4090) achieves 10-15 agent steps per minute. Total: ~$1,500-2,500. Browser agents are a VLM throughput problem: faster screenshot analysis means a faster agent.
Common beginner mistake
The mistake: Running a browser agent on your personal Chrome profile (with saved passwords, cookies, banking sessions), which gives the AI access to your entire digital life.
Why it fails: The agent can click anything. It can navigate to your email → forward sensitive messages → delete evidence. It can access your bank → initiate transfers. Even if the prompt seems benign ("check my Amazon orders"), the agent might misclick into account settings, change your password, or order 100 copies of a book.
The fix: Always use a dedicated browser profile. Create a separate Chrome/Chromium profile with only the logins the agent needs. Use incognito mode + manual login per session for sensitive tasks. Never give a browser agent access to your primary browser with saved passwords and active banking/email sessions. The agent is a program executing LLM decisions; it has no judgment about what's safe to click. Sandbox it.
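One way to set up a dedicated profile is to create a fresh, empty directory and point Chromium at it. The sketch below is an illustration under stated assumptions: `make_agent_profile` and `chromium_args` are hypothetical helper names, and how the directory is handed to your agent framework depends on that framework. `--user-data-dir` and `--no-first-run` are standard Chromium command-line switches, and Playwright's `launch_persistent_context` also accepts a user data directory.

```python
import os
import tempfile


def make_agent_profile(base_dir=None):
    """Create a fresh, empty profile directory for the agent.

    tempfile.mkdtemp creates the directory accessible only to the
    current user, and it starts with no cookies or saved passwords.
    """
    return tempfile.mkdtemp(prefix="agent-profile-", dir=base_dir)


def chromium_args(profile_dir):
    """Command-line switches that make Chromium use the dedicated profile."""
    return [f"--user-data-dir={profile_dir}", "--no-first-run"]


profile_dir = make_agent_profile()
args = chromium_args(profile_dir)
print(args)
```

Because the directory starts empty, the agent can only reach accounts you log into inside that profile. Deleting the directory afterwards removes any session cookies it accumulated.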
Reality check
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
Common mistakes
- Buying for spec-sheet VRAM without modeling KV cache + activation overhead
- Underestimating quantization quality loss below Q4
- Skipping flash-attention support (real perf gap on long context)
- Ignoring sustained-load thermals (laptops thermal-throttle within 30 min)
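The first mistake, not modeling the KV cache, can be checked with a rough estimate before buying. The sketch below uses the standard formula for a transformer KV cache (keys plus values, per layer, per KV head). The model dimensions are illustrative values for a 7B-class model with grouped-query attention, not figures from this guide, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size: the factor 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


GIB = 1024 ** 3

# Illustrative 7B-class config (assumed): 28 layers, 4 KV heads, head_dim 128,
# FP16 cache (2 bytes per value), 32k-token context.
cache = kv_cache_bytes(layers=28, kv_heads=4, head_dim=128, context_tokens=32768)
print(f"{cache / GIB:.2f} GiB")  # 1.75 GiB
```

That amount comes on top of the model weights and activation memory, so a model whose weights "just fit" on the spec sheet may not fit once a long context is in use.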
Before you buy
Verify your specific hardware can handle browser agents before committing money.
Agent workflows run multiple tool calls in sequence — sustained tok/s matters more than peak. The guides below frame the buyer decision.
- best GPU for AI agents — covers sustained-throughput vs peak, multi-tool-call latency, agent loop economics.
- best GPU for Qwen
- best GPU for Llama