RUNLOCALAI v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Computer vision

YOLO

YOLO (You Only Look Once) is a family of real-time object detection models that process an entire image in a single forward pass, directly predicting bounding boxes and class probabilities. Unlike two-stage detectors (e.g., the R-CNN family), which first propose regions and then classify each one, the original YOLO divides the image into a grid and predicts objects per cell. Skipping the separate proposal stage makes it fast enough for video and edge deployment. Operators encounter YOLO when they need low-latency detection on local hardware: YOLOv8, for instance, runs at 30+ FPS on an RTX 3060.
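The grid idea can be sketched in a few lines. This is a minimal illustration, not code from any YOLO release: it assumes a 7×7 grid (the value used in the original YOLO paper) and a hypothetical helper name, responsible_cell, that maps an object's center to the grid cell that would be responsible for predicting it.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=7):
    """Return the (row, col) of the grid cell containing an object's center.

    cx, cy are the center coordinates in pixels. grid=7 is an assumption
    taken from the original YOLO paper; other grid sizes work the same way.
    """
    # Normalize the center to [0, 1], scale to grid units, and floor.
    col = int(cx / img_w * grid)
    row = int(cy / img_h * grid)
    # Clamp so a center exactly on the right/bottom edge stays in range.
    return min(row, grid - 1), min(col, grid - 1)


if __name__ == "__main__":
    # An object centered in a 640x480 image falls in the middle cell.
    print(responsible_cell(320, 240, 640, 480))
```

Only the cell that contains the center predicts that object, which is why a single pass over the grid is enough.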

Deeper dive

YOLO was introduced by Joseph Redmon and colleagues in 2015 and has since evolved through many versions (v1 through v10 and later), with later releases maintained by different research groups and companies. The core idea is to treat detection as a regression problem: the original model outputs a fixed-size tensor containing bounding box coordinates, confidence scores, and class probabilities for each grid cell. Modern variants (e.g., Ultralytics YOLOv8) use a CSPDarknet-style backbone, a PAN-FPN neck, and a decoupled head with separate branches for classification and box regression. They come in several sizes (nano, small, medium, large, xlarge) that trade speed for accuracy. Operators often run YOLO models in FP16 or quantize them to INT8 for further speedups on consumer GPUs. YOLO is also used as the detector in tracking pipelines (e.g., with BoT-SORT) and can be exported to ONNX or TensorRT for optimized inference.
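To make the "fixed-size tensor" concrete, the sketch below computes output shapes under stated assumptions. The YOLOv1 values (S=7 grid, B=2 boxes per cell, C=20 classes) come from the original paper, not from this page. The YOLOv8 layout (strides 8, 16, 32; 4 box values plus one score per class) reflects how the COCO-trained detection model is commonly exported; the function names are hypothetical.

```python
def yolov1_output_shape(S=7, B=2, C=20):
    """Each of the S*S cells predicts B boxes (x, y, w, h, confidence)
    plus C class probabilities shared by the cell."""
    return (S, S, B * 5 + C)


def yolov8_output_shape(imgsz=640, num_classes=80, strides=(8, 16, 32)):
    """Assumed export layout: (batch, 4 + num_classes, num_predictions).

    One prediction is made per location on each of the three feature maps,
    whose side length is imgsz divided by the stride.
    """
    num_predictions = sum((imgsz // s) ** 2 for s in strides)
    return (1, 4 + num_classes, num_predictions)


if __name__ == "__main__":
    print(yolov1_output_shape())   # 7 x 7 grid, 30 values per cell
    print(yolov8_output_shape())   # 80*80 + 40*40 + 20*20 predictions
```

In both cases the shape depends only on the input size and class count, never on how many objects are in the image, which is what lets the network run as one fixed forward pass.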

Practical example

An operator running YOLOv8m on an RTX 3060 (12 GB VRAM) can process 640×640 images at ~50 FPS using the PyTorch model. After export to TensorRT with FP16, the same model reaches ~80 FPS. The weights file is about 50 MB in FP16, and inference uses ~2 GB of VRAM. For a 4K video stream, the operator would typically downscale each frame to the 640-pixel model input size to maintain real-time performance.
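The arithmetic behind these figures can be checked with a short sketch. Everything here is illustrative: the helper names are hypothetical, the 25.9 million parameter count is an assumption taken from Ultralytics' published YOLOv8m figures rather than from this page, and the resize assumes a square letterbox (aspect-preserving scale plus padding), whereas some runtimes pad only to a multiple of the stride.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def weights_size_mb(num_params, bytes_per_param):
    """Approximate weights size in MB (FP16 = 2 bytes, INT8 = 1 byte)."""
    return num_params * bytes_per_param / 1e6


def letterbox_shape(src_w, src_h, target=640):
    """Scale a frame to fit a target x target square, keeping aspect ratio.

    Returns (new_w, new_h, pad_w, pad_h), where the pads are the total
    number of filler pixels needed to reach the square input.
    """
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    return new_w, new_h, target - new_w, target - new_h


if __name__ == "__main__":
    print(frame_budget_ms(50), frame_budget_ms(80))   # PyTorch vs TensorRT budget
    print(weights_size_mb(25_900_000, 2))             # assumed YOLOv8m param count
    print(letterbox_shape(3840, 2160))                # 4K frame to 640 input
```

Under these assumptions a 4K frame shrinks by a factor of six on each side, so small or distant objects can drop below the size the detector resolves; that is the accuracy cost of the speedup.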

Workflow example

In a typical detection workflow, the operator runs yolo predict model=yolov8m.pt source=video.mp4 using the Ultralytics CLI. The model loads into VRAM, processes each frame, and outputs annotated frames with bounding boxes. To use ONNX Runtime instead, the operator first exports the model with yolo export model=yolov8m.pt format=onnx and then runs inference on the exported file with onnxruntime. For real-time webcam detection, yolo predict model=yolov8n.pt source=0 runs at 30+ FPS on a laptop GPU.
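When running an exported model directly through a runtime such as onnxruntime, the raw output usually still has to be decoded and de-duplicated, a step the Ultralytics CLI performs internally. The sketch below shows one way that step might look in plain Python. It assumes each prediction row has the form [cx, cy, w, h, score_0, ..., score_n] (an exported tensor laid out channel-first would need transposing first), and the threshold values are illustrative, not the CLI's defaults. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def decode(rows, conf_thres=0.25):
    """Turn raw rows into (box_xyxy, confidence, class_id) detections."""
    detections = []
    for row in rows:
        cx, cy, w, h = row[:4]
        scores = row[4:]
        cls = max(range(len(scores)), key=lambda i: scores[i])
        if scores[cls] < conf_thres:
            continue  # drop low-confidence predictions
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        detections.append((box, scores[cls], cls))
    return detections


def nms(detections, iou_thres=0.5):
    """Greedy per-class non-maximum suppression, highest confidence first."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box unless it overlaps a kept box of the same class.
        if all(det[2] != k[2] or iou(det[0], k[0]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    raw = [
        [100, 100, 50, 50, 0.9, 0.1],   # strong class-0 detection
        [102, 101, 50, 50, 0.8, 0.1],   # near-duplicate of the first
        [300, 300, 40, 40, 0.05, 0.7],  # separate class-1 detection
        [10, 10, 5, 5, 0.1, 0.2],       # below the confidence threshold
    ]
    for box, conf, cls in nms(decode(raw)):
        print(cls, conf, box)
```

Boxes produced this way are in the coordinates of the resized model input, so they would still need to be scaled back to the original frame before drawing.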

Reviewed by Fredoline Eruo. See our editorial policy.
