YOLO
YOLO (You Only Look Once) is a family of real-time object detection models that process an entire image in a single forward pass, directly predicting bounding boxes and class probabilities. Unlike older two-stage detectors (e.g., R-CNN), which first propose regions and then classify them, YOLO divides the image into a grid and predicts objects per cell. This single-pass design makes it fast enough for video and edge deployment. Operators encounter YOLO when they need low-latency detection on local hardware; YOLOv8, for instance, runs at 30+ FPS on an RTX 3060.
Deeper dive
YOLO was introduced by Joseph Redmon and colleagues in 2015 and has since evolved through many versions (v1 through v10), maintained by different research groups and companies. The core idea: treat detection as a regression problem. In the original design, the model outputs a fixed-size tensor containing bounding box coordinates, confidence scores, and class probabilities for each grid cell. Modern YOLO variants (e.g., Ultralytics YOLOv8) use a CSPDarknet backbone, a PAN-FPN neck, and a decoupled head for classification and regression. They come in several model sizes (nano, small, medium, large, xlarge) that trade speed against accuracy. Operators often convert YOLO models to FP16 or quantize them to INT8 for further speedups on consumer GPUs. YOLO is also used in tracking pipelines (e.g., with BoT-SORT) and can be exported to ONNX or TensorRT for optimized inference.
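To make the per-cell output tensor concrete, here is a minimal sketch of YOLOv1-style grid encoding. The values S=7, B=2, and C=20 are assumptions taken from the original paper's PASCAL VOC setup, and the function names are hypothetical; modern variants such as YOLOv8 structure their outputs differently.

```python
# Sketch of YOLOv1-style grid encoding. S, B, and C are assumed values from
# the original paper's PASCAL VOC configuration, not from the text above.
S = 7    # grid cells per side
B = 2    # boxes predicted per cell
C = 20   # number of object classes


def output_shape(s=S, b=B, c=C):
    """Each cell holds b boxes of (x, y, w, h, confidence) plus c class scores."""
    return (s, s, b * 5 + c)


def responsible_cell(cx, cy, img_w, img_h, s=S):
    """Return (row, col) of the grid cell that contains the box center (in pixels)."""
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


def cell_relative_offsets(cx, cy, img_w, img_h, s=S):
    """Center offsets inside the responsible cell, measured in cell units."""
    row, col = responsible_cell(cx, cy, img_w, img_h, s)
    return cx / img_w * s - col, cy / img_h * s - row
```

With these assumed values, output_shape() gives (7, 7, 30), and an object centered at pixel (320, 100) in a 640×480 image falls in row 1, column 3 of the grid.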
Practical example
An operator running YOLOv8m on an RTX 3060 (12 GB VRAM) can process 640×640 images at roughly 50 FPS using the PyTorch model. After exporting to TensorRT with FP16, the same model reaches roughly 80 FPS. The model file is about 50 MB (FP16), and inference uses about 2 GB of VRAM. For a 4K video stream, the operator might downscale frames to the model's 640×640 input size to maintain real-time performance.
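The arithmetic behind these figures can be sketched in a few lines. The helper names below are hypothetical, and the aspect-preserving "letterbox" resize is one common way to fit a frame into a square input, assumed here for illustration rather than taken from the text.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def keeps_up(model_fps, stream_fps):
    """True if the model processes frames at least as fast as they arrive."""
    return model_fps >= stream_fps


def letterbox_size(src_w, src_h, target=640):
    """Fit a frame inside target x target while preserving aspect ratio.

    Returns (new_w, new_h, pad_x, pad_y), where pad_x and pad_y are the
    total padding in pixels needed to fill out the square.
    """
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    return new_w, new_h, target - new_w, target - new_h


print(frame_budget_ms(50))         # 20.0 ms per frame at ~50 FPS
print(frame_budget_ms(80))         # 12.5 ms per frame at ~80 FPS
print(letterbox_size(3840, 2160))  # (640, 360, 0, 280)
```

Under these assumptions, a 3840×2160 frame shrinks by a factor of six to 640×360, and the remaining 280 rows are padding, so the model sees far fewer pixels than the source stream contains.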
Workflow example
In a typical detection workflow, the operator runs yolo predict model=yolov8m.pt source=video.mp4 using the Ultralytics CLI. The model loads into VRAM, processes each frame, and outputs annotated frames with bounding boxes. To run inference with ONNX Runtime instead, the operator would first export the model via yolo export model=yolov8m.pt format=onnx and then load the resulting file with onnxruntime. For real-time webcam detection, yolo predict model=yolov8n.pt source=0 can run at 30+ FPS on a laptop GPU, depending on the hardware.
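When these steps are scripted, the commands above can be assembled as argument lists and run with the standard library. This is a minimal sketch: the helper names yolo_command and run_if_available are hypothetical, and it assumes the Ultralytics yolo executable is installed and on PATH.

```python
import shutil
import subprocess


def yolo_command(task, **options):
    """Build an Ultralytics CLI call as an argument list (key=value options)."""
    return ["yolo", task] + [f"{key}={value}" for key, value in options.items()]


def run_if_available(cmd):
    """Run cmd only if its executable is on PATH; return the exit code or None."""
    if shutil.which(cmd[0]) is None:
        return None  # yolo CLI not installed in this environment
    return subprocess.run(cmd, check=False).returncode


# The three commands from the workflow above, built but not yet executed.
predict_cmd = yolo_command("predict", model="yolov8m.pt", source="video.mp4")
export_cmd = yolo_command("export", model="yolov8m.pt", format="onnx")
webcam_cmd = yolo_command("predict", model="yolov8n.pt", source=0)
```

Calling run_if_available(export_cmd) followed by run_if_available(predict_cmd) would then execute the export and prediction steps in order, returning None on machines where the CLI is missing.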
Reviewed by Fredoline Eruo.