RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Agents & agentic AI

Browser Agent

A browser agent is an AI-driven program that controls a web browser to automate tasks like form filling, data extraction, or navigation. It uses a local or remote LLM to interpret instructions, generate actions (e.g., click, type, scroll), and process page content. Operators encounter browser agents when pairing browser automation libraries like Playwright or Puppeteer with a local LLM served by Ollama or vLLM. The agent typically captures screenshots or DOM snapshots, sends them to the LLM for reasoning, and executes the returned action; screenshots require a vision-capable model. Latency depends on model size and hardware: a 7B Q4 model on an RTX 4090 yields roughly 2-5 seconds per action, while a 70B model may take 10-30 seconds.
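As a rough illustration of what those per-action figures mean for a whole task, total inference time scales with the number of actions. The sketch below is back-of-the-envelope arithmetic only: the function name and the 10-step task are assumptions made for the example, and page loads and network time are ignored.

```python
def estimate_task_seconds(steps, per_action_range):
    """Return (best, worst) total inference time for a task.

    steps: number of actions the agent is expected to take (assumed).
    per_action_range: (low, high) seconds per action.
    """
    low, high = per_action_range
    return steps * low, steps * high


# Per-action ranges quoted in the text above.
SEVEN_B_Q4 = (2, 5)
SEVENTY_B = (10, 30)

print(estimate_task_seconds(10, SEVEN_B_Q4))  # (20, 50)
print(estimate_task_seconds(10, SEVENTY_B))   # (100, 300)
```

A hypothetical 10-step task therefore lands somewhere between 20 and 50 seconds on the 7B figures and between 100 and 300 seconds on the 70B figures.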

Deeper dive

Browser agents extend LLM capabilities to interact with web interfaces. The workflow:
  1) The agent loads a target URL.
  2) It captures the current page state (screenshot or HTML DOM).
  3) It sends that state with a task prompt to the LLM.
  4) The LLM outputs a structured action (e.g., click(button#submit)).
  5) The agent executes the action and repeats from step 2.
Key challenges: visual grounding (matching LLM output to page elements), context window limits (the DOM of a long page may not fit in a 4K-32K token window), and latency (each action requires a full inference pass). Operators often use smaller quantized models (e.g., Qwen2.5 7B Q4) for speed, or larger models (e.g., Llama 3.1 70B) for complex reasoning. Tools like Browser-Use and LangChain connect to local LLM backends, while Playwright handles the browser control itself.
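The workflow above can be sketched as a small observe, decide, execute loop. This is a minimal sketch, not how any particular framework implements it: the `run_agent` function, the `"done"` stop signal, and the character budget are all assumptions, and the browser and model are replaced by stand-in callables so the loop runs on its own.

```python
from typing import Callable

MAX_CHARS = 8000  # assumed budget; a crude stand-in for a token limit


def run_agent(task: str,
              observe: Callable[[], str],
              decide: Callable[[str], str],
              execute: Callable[[str], None],
              max_steps: int = 10) -> list[str]:
    """Loop until the model answers 'done' or max_steps is reached."""
    history: list[str] = []
    for _ in range(max_steps):
        page_state = observe()[:MAX_CHARS]   # step 2, truncated to fit the context
        prompt = f"Task: {task}\nPage:\n{page_state}\nNext action:"
        action = decide(prompt).strip()      # steps 3-4: one full inference pass
        if action == "done":                 # assumed stop signal
            break
        execute(action)                      # step 5
        history.append(action)
    return history


# Stand-ins so the sketch runs without a browser or a model.
scripted = iter(["click(button#submit)", "done"])
actions = run_agent(
    task="submit the form",
    observe=lambda: "<button id='submit'>Send</button>",
    decide=lambda prompt: next(scripted),
    execute=lambda action: None,
)
print(actions)  # ['click(button#submit)']
```

Passing the three callables in keeps the loop independent of the backend: in a real setup `observe` and `execute` would wrap a browser automation library and `decide` would call the LLM server. The `max_steps` cap matters because every iteration costs one inference pass.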

Practical example

An operator runs a browser agent to automate logging into a web app. The agent uses Playwright with an Ollama-served Qwen2.5 7B Q4 model on an RTX 3090. The agent navigates to the login page and captures a DOM snapshot (Qwen2.5 7B is a text-only model, so it cannot read screenshots), and the LLM outputs type('#username', 'admin'), then type('#password', 'pass123'), then click('#login-btn'). Each action takes ~3 seconds. If the page has a CAPTCHA, the agent will likely fail, because a text-only LLM cannot read an image challenge; that would require a vision model.
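One way to turn action strings like the ones in this example into browser calls is to parse them and dispatch against an allowlist. The sketch below is an assumption about how such glue code might look, not the method of any specific framework. `parse_action`, `dispatch`, and `FakePage` are hypothetical names; the `fill` and `click` methods mirror Playwright's `Page.fill(selector, value)` and `Page.click(selector)`, so a real Playwright page could be passed in place of the fake.

```python
import ast
import re

ACTION_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)


def parse_action(text):
    """Split "name('a', 'b')" into ("name", ("a", "b")); raise ValueError if malformed."""
    match = ACTION_RE.match(text)
    if not match:
        raise ValueError(f"not an action: {text!r}")
    name, raw_args = match.groups()
    if not raw_args.strip():
        return name, ()
    try:
        # literal_eval accepts only literals, so model output is never executed.
        args = ast.literal_eval(f"({raw_args},)")
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"bad arguments in {text!r}") from exc
    return name, args


def dispatch(page, text):
    """Run one action against a page object, allowing only known action names."""
    name, args = parse_action(text)
    if name == "type":
        page.fill(*args)    # mirrors Playwright's Page.fill(selector, value)
    elif name == "click":
        page.click(*args)   # mirrors Playwright's Page.click(selector)
    else:
        raise ValueError(f"unsupported action: {name}")


class FakePage:
    """Hypothetical stand-in that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def fill(self, selector, value):
        self.calls.append(("fill", selector, value))

    def click(self, selector):
        self.calls.append(("click", selector))


page = FakePage()
for step in ["type('#username', 'admin')",
             "type('#password', 'pass123')",
             "click('#login-btn')"]:
    dispatch(page, step)
print(page.calls)
```

Rejecting anything outside the allowlist is a deliberate choice: model output is untrusted input, and a malformed or unexpected action is safer to surface as an error than to execute.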

Workflow example

In a typical setup, an operator installs browser-use and ollama, pulls qwen2.5:7b, and runs python agent.py --task "book a flight on kayak.com". The script launches a Chromium window, iteratively captures the page state, sends it to Ollama's API, and executes the returned actions. The operator monitors the tokens/sec that Ollama reports and raises the context length (e.g., Ollama's num_ctx option set to 8192) if the page is large. If the agent stalls, the operator may switch to a larger model or reduce task complexity.
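For the context-length and throughput part of this workflow, a minimal sketch of talking to Ollama directly might look like the following. It assumes Ollama is serving on its default local port and uses the `/api/generate` endpoint with the `num_ctx` option and the `eval_count` and `eval_duration` response fields. The helper names are hypothetical, and the sample response values are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt, model="qwen2.5:7b", num_ctx=8192):
    """Build a non-streaming /api/generate payload with an explicit context length."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def tokens_per_second(response):
    """Compute generation speed from eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(prompt, **kwargs):
    """Send the request to a running Ollama server and return the parsed JSON."""
    data = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Made-up response fields for illustration: 120 tokens generated in 3 seconds.
sample = {"eval_count": 120, "eval_duration": 3_000_000_000}
print(tokens_per_second(sample))  # 40.0
```

With a server running, `tokens_per_second(generate("Say hello"))` would report the measured rate for a real request. A larger `num_ctx` leaves room for bigger pages but uses more memory.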

Reviewed by Fredoline Eruo. See our editorial policy.
