
Browser Agents

AI agents that navigate and interact with web browsers: Browser-Use, Playwright-based agents, and the BrowserBase pattern.

Setup walkthrough

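  1. pip install browser-use (Browser-Use, an open-source browser agent framework). If your version drives Chromium through Playwright, also run playwright install chromium.
  2. Install Ollama, then run ollama pull qwen2.5-vl:7b (~5 GB; a vision-language model that can read screenshots of web pages).
  3. Save and run this Python script: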
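import asyncio

from browser_use import Agent
# Assumption: older Browser-Use releases accept LangChain chat models like this one;
# newer releases ship their own ChatOllama wrapper instead. Check the docs for your version.
from langchain_ollama import ChatOllama

async def main():
    agent = Agent(
        task="Go to wikipedia.org, search for 'artificial intelligence', and extract the first paragraph of the article.",
        llm=ChatOllama(model="qwen2.5-vl:7b"),  # must match the tag you pulled with Ollama
    )
    history = await agent.run()     # runs the see -> decide -> act loop until the task finishes
    print(history.final_result())   # the text the agent returned at the end

if __name__ == "__main__":
    asyncio.run(main())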
  4. The agent opens a browser (Chromium via Playwright), navigates to Wikipedia, types the search, clicks the result, extracts the text, and returns it. The first run takes 20-60 seconds.
  5. For complex multi-step tasks, such as "Log into my email, find the latest invoice from Amazon, download it, and save it to ~/invoices/", the agent handles login, navigation, dropdowns, and file downloads.
  6. The agent combines screenshots (read by the VLM) with DOM elements from the accessibility tree. The VLM decides what to click by analyzing the screenshot.
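The see → decide → act cycle described above can be sketched as a plain loop. This is a minimal illustration, not Browser-Use's internals: `FakePage`, `Observation`, `Action`, and `scripted_decide` are hypothetical names, and the scripted decision function stands in for the VLM call a real agent would make on each screenshot.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observation:
    screenshot: bytes      # what a VLM would look at (empty in this sketch)
    elements: list[str]    # indexed, simplified accessibility-tree labels


@dataclass
class Action:
    kind: str              # "click", "type", or "done"
    index: int = -1        # element to act on (ignored for "done")
    text: str = ""         # text to type, or the final answer for "done"


class FakePage:
    """Hypothetical stand-in for a real browser page; records actions instead of clicking."""

    def __init__(self) -> None:
        self.elements = ["search box", "search button", "first result"]
        self.log: list[str] = []

    def observe(self) -> Observation:
        return Observation(screenshot=b"", elements=list(self.elements))

    def apply(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{self.elements[action.index]}")


Decider = Callable[[str, Observation, list], Action]


def run_agent(page: FakePage, decide: Decider, task: str, max_steps: int = 10) -> Optional[str]:
    """Loop see -> decide -> act until the decider says "done" or the step budget runs out."""
    history: list[Action] = []
    for _ in range(max_steps):
        obs = page.observe()                  # see
        action = decide(task, obs, history)   # decide (a VLM call in a real agent)
        if action.kind == "done":
            return action.text
        page.apply(action)                    # act
        history.append(action)
    return None  # gave up after max_steps


def scripted_decide(task: str, obs: Observation, history: list) -> Action:
    """Hypothetical replacement for the VLM: replays a fixed plan, then finishes."""
    plan = [Action("type", 0, "artificial intelligence"), Action("click", 1), Action("click", 2)]
    if len(history) < len(plan):
        return plan[len(history)]
    return Action("done", text="<extracted first paragraph>")


page = FakePage()
answer = run_agent(page, scripted_decide, task="Search Wikipedia for 'artificial intelligence'")
print(page.log)
print(answer)
```

The `max_steps` budget matters in practice: a confused model can loop on the same element, and a hard cap turns that into a clean failure instead of a runaway session.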

The cheap setup

Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs Browser-Use with Qwen2-VL 7B at ~20-40 seconds per agent step (see → decide → act), so a 5-step task (search, click, scroll, click, extract) takes roughly 2-4 minutes. Pair it with a Ryzen 5 5600, 32 GB DDR4, and a 512 GB NVMe drive for a total of ~$390-440. That ~$400 build handles 10-30 automated daily web tasks (form filling, data extraction, monitoring). The bottleneck is VLM inference speed (5-10 seconds per screenshot) and reasoning quality.
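The timing figures above follow from simple multiplication: task time is steps times seconds per step, and daily GPU time is tasks times task time. A minimal sketch, assuming every step costs the same (real steps vary with page complexity); the helper names are illustrative.

```python
def task_seconds(steps: int, seconds_per_step: float) -> float:
    """Wall-clock time for one task, assuming each step takes the same time."""
    return steps * seconds_per_step


def daily_gpu_minutes(tasks_per_day: int, steps: int, seconds_per_step: float) -> float:
    """Total minutes per day the GPU spends running agent tasks."""
    return tasks_per_day * task_seconds(steps, seconds_per_step) / 60


# Budget-tier figures from the text: 5 steps at 20-40 s per step.
fast = task_seconds(5, 20)   # 100 s, about 1.7 minutes
slow = task_seconds(5, 40)   # 200 s, about 3.3 minutes
print(f"one task: {fast / 60:.1f}-{slow / 60:.1f} min")

# Upper end of the daily range: 30 tasks at the slow per-step time.
print(f"30 tasks/day: {daily_gpu_minutes(30, 5, 40):.0f} GPU-minutes")
```

Even the busiest day in the stated range comes to about 100 GPU-minutes, so per-step latency, not daily capacity, is what you notice on this tier.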

The serious setup

Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Browser-Use with Qwen2-VL 72B at ~30-60 seconds per agent step. Note that a 72B model needs roughly 40 GB or more even at 4-bit quantization, so on a single 24 GB card part of it must be offloaded to system RAM. The 72B handles complex UIs (enterprise SaaS dashboards, multi-step forms, CAPTCHA workarounds) that confuse the 7B. For production browser automation (web scraping at scale, automated testing), Qwen2-VL 7B on an RTX 4090 ($2,000, see /hardware/rtx-4090) achieves 10-15 agent steps per minute. Total: ~$1,500-2,500. Browser agents are a VLM throughput problem: faster screenshot analysis means a faster agent.
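A quick way to sanity-check whether a model fits on a card is to estimate weight memory as parameters (in billions) times bytes per weight, which gives gigabytes. This sketch assumes about 4.5 bits per weight for a typical 4-bit quantization, nominal 7B/72B parameter counts, and a hypothetical fixed 2 GB of headroom for the vision encoder, KV cache, and runtime; all three are assumptions, so check the actual file size of the quant you download.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1e9 params * bytes per param)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float, vram_gb: float,
                 headroom_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed headroom fit entirely in VRAM."""
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


for name, params, vram in [("7B on RTX 3060 12 GB", 7, 12),
                           ("7B on RTX 4090 24 GB", 7, 24),
                           ("72B on RTX 3090 24 GB", 72, 24)]:
    gb = weight_gb(params, 4.5)
    print(f"{name}: ~{gb:.1f} GB of weights, fits fully: {fits_in_vram(params, 4.5, vram)}")
```

The 72B line is the case to watch: at roughly 40 GB of weights it cannot sit entirely on one 24 GB card.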

Common beginner mistake

The mistake: Running a browser agent on your personal Chrome profile (with saved passwords, cookies, and banking sessions), which gives the AI access to your entire digital life.

Why it fails: The agent can click anything. It can open your email, forward sensitive messages, and delete them. It can open your bank and initiate transfers. Even if the prompt seems benign ("check my Amazon orders"), the agent might misclick into account settings, change your password, or order 100 copies of a book.

The fix: Always use a dedicated browser profile. Create a separate Chrome/Chromium profile with only the logins the agent needs, and for sensitive tasks use incognito mode with a manual login per session. Never give a browser agent access to your primary browser with saved passwords and active banking or email sessions. The agent is a program executing LLM decisions; it has no judgment about what is safe to click. Sandbox it.
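The dedicated-profile fix can be sketched with Playwright directly (not through Browser-Use, which has its own browser configuration options; check its docs for the equivalent setting). A persistent context launched on a fresh user-data directory starts with no saved passwords or cookies. The `PRIMARY_PROFILE_HINTS` substrings are assumptions about where Chrome and Chromium keep their default profiles on Linux and macOS, and the path check is only a heuristic guard.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: substrings that appear in default Chrome/Chromium profile paths
# on Linux (~/.config/google-chrome, ~/.config/chromium) and macOS
# (~/Library/Application Support/Google/Chrome). Adjust for your OS.
PRIMARY_PROFILE_HINTS = ("google-chrome", "Google/Chrome", "chromium")


def make_agent_profile(root: Optional[Path] = None) -> Path:
    """Create a fresh, empty user-data directory reserved for the agent."""
    base = root if root is not None else Path(tempfile.gettempdir())
    return Path(tempfile.mkdtemp(prefix="agent-profile-", dir=base))


def looks_like_primary_profile(path: Path) -> bool:
    """Heuristic guard: True if the path resembles a default browser profile."""
    resolved = str(path.expanduser().resolve())
    return any(hint in resolved for hint in PRIMARY_PROFILE_HINTS)


async def open_in_sandbox(url: str, profile: Path) -> str:
    """Open url in Chromium using only the dedicated profile; return the page title."""
    if looks_like_primary_profile(profile):
        raise ValueError(f"refusing to drive a primary browser profile: {profile}")
    # Imported here so the helpers above work even without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        context = await p.chromium.launch_persistent_context(str(profile), headless=False)
        page = context.pages[0] if context.pages else await context.new_page()
        await page.goto(url)
        title = await page.title()
        await context.close()
        return title

# Usage (needs `pip install playwright` and `playwright install chromium`):
#     import asyncio
#     asyncio.run(open_in_sandbox("https://www.wikipedia.org", make_agent_profile()))
```

Log the agent into only the sites it needs inside this profile, and delete the directory when the job is done if you want nothing to persist.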


Reality check

Local AI workloads have real hardware constraints that vary by task type. The VRAM ceiling decides which model fits; memory bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
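The bandwidth point can be made concrete with a common rule of thumb: generating each token reads roughly all of the model weights once, so decode speed is capped near memory bandwidth divided by weight size. A rough sketch using the cards' published memory bandwidths and an assumed ~4 GB for a 4-bit 7B model; real throughput lands below this ceiling, and screenshot prefill is compute-bound, so it is not captured here.

```python
# Published memory bandwidth in GB/s.
BANDWIDTH_GB_S = {"RTX 3060 12 GB": 360, "RTX 3090": 936, "RTX 4090": 1008}


def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode tokens/s if every token streams all weights from VRAM once."""
    return bandwidth_gb_s / weights_gb


WEIGHTS_GB = 4.0  # assumption: a 7B model at ~4-bit quantization

for card, bw in BANDWIDTH_GB_S.items():
    print(f"{card}: <= {decode_ceiling_tok_s(bw, WEIGHTS_GB):.0f} tok/s")
```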

Common mistakes

  • Buying on spec-sheet VRAM without budgeting for KV cache and activation overhead
  • Underestimating quantization quality loss below Q4
  • Skipping flash-attention support (a real performance gap on long contexts)
  • Ignoring sustained-load thermals (laptops can thermal-throttle within 30 minutes)
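The KV cache overhead in the first mistake above is easy to quantify. For a transformer with grouped-query attention, the cache stores one key and one value vector per layer, per KV head, per token. This sketch uses an illustrative config of 28 layers, 4 KV heads, and a head dimension of 128 (assumed values; read the real ones from your model's config.json) with an fp16 cache.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem


# Illustrative (assumed) config; check your model's config.json for real values.
LAYERS, KV_HEADS, HEAD_DIM = 28, 4, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, 1)
print(f"per token: {per_token / 1024:.0f} KiB")
for ctx in (8_192, 32_768):
    total = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx)
    print(f"{ctx} tokens: {total / 2**30:.2f} GiB")
```

Add this to the weight estimate at the context length you actually run; vision-language models also spend context on image tokens for every screenshot.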


Hardware buying guidance for Browser Agents

Agent workflows run many tool calls in sequence, so sustained tok/s matters more than peak tok/s.
