Instance Segmentation

Instance segmentation is a computer vision task that assigns a pixel-level mask to each distinct object instance in an image and classifies each instance. Unlike semantic segmentation, which gives every pixel of a class the same label (e.g., all cars are 'car'), instance segmentation separates overlapping or adjacent objects of the same class into individual masks. Operators encounter this in models like YOLOv8-seg or SAM (Segment Anything Model). The output is a list of masks, each with a class label and confidence score; SAM is the exception, since it returns class-agnostic masks with a predicted quality score and no class label. VRAM matters because high-resolution images and many instances require more memory for mask decoding.
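
To make the difference from semantic segmentation concrete, here is a minimal sketch using NumPy arrays. The toy scene, the scores, and the dictionary layout are illustrative assumptions, not the output format of any particular model.

```python
import numpy as np

# Toy 4x6 scene: two cars side by side (class id 1) on background (class id 0).
# Semantic segmentation stores one class id per pixel, so the cars merge into one region.
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation stores one boolean mask per object, plus a label and a score.
car_a = np.zeros_like(semantic, dtype=bool)
car_a[1:3, 1:3] = True
car_b = np.zeros_like(semantic, dtype=bool)
car_b[1:3, 3:5] = True

instances = [
    {"label": "car", "score": 0.91, "mask": car_a},  # hypothetical scores
    {"label": "car", "score": 0.87, "mask": car_b},
]

# The instance list can count the cars; the semantic map alone cannot.
num_cars = sum(1 for inst in instances if inst["label"] == "car")

# Merging the instance masks recovers the semantic 'car' region.
union = np.logical_or.reduce([inst["mask"] for inst in instances])
```

Here the two instance masks together cover exactly the pixels that the semantic map labels as class 1, but only the instance representation keeps the two cars apart.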

Deeper dive

Instance segmentation combines object detection and semantic segmentation. In the classic two-stage design, a model first detects bounding boxes and class labels for each object. Then, within each box, a segmentation head predicts a binary mask for that instance. Common architectures include Mask R-CNN (two-stage: region proposal + mask head) and YOLOv8-seg (single-stage, faster). SAM uses a prompt-based approach: given a point or box, it segments the corresponding object. For operators, inference speed varies: YOLOv8-seg runs at ~30 FPS on an RTX 3060 for 640x640 images, depending on the model variant, while SAM (ViT-H) needs ~1-2 seconds per image on the same GPU. Lower-precision inference (FP16 half precision or INT8 quantization) reduces VRAM usage and speeds up inference, but may slightly reduce mask accuracy.
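
The "mask within each box" step can be sketched as follows: a small per-box probability grid is resized to the box, thresholded, and pasted into a full-size mask. This is a simplified illustration, not Mask R-CNN's actual implementation; the function name `paste_mask`, the nearest-neighbour resize, and the 0.5 threshold are assumptions, and the box is assumed to have integer coordinates inside the image.

```python
import numpy as np

def paste_mask(box_probs, box, image_shape, threshold=0.5):
    """Resize a per-box probability mask to its box and paste it into a full-size mask."""
    x1, y1, x2, y2 = box  # integer pixel coordinates, x2/y2 exclusive
    box_h, box_w = y2 - y1, x2 - x1
    src_h, src_w = box_probs.shape
    # Nearest-neighbour resize: map each target pixel back to a source cell.
    rows = np.arange(box_h) * src_h // box_h
    cols = np.arange(box_w) * src_w // box_w
    resized = box_probs[rows[:, None], cols[None, :]]
    # Everything outside the box stays background.
    full = np.zeros(image_shape, dtype=bool)
    full[y1:y2, x1:x2] = resized >= threshold
    return full

# Hypothetical 2x2 mask-head output for one detected box in an 8x8 image.
probs = np.array([[0.9, 0.2],
                  [0.8, 0.7]])
mask = paste_mask(probs, box=(2, 1, 6, 5), image_shape=(8, 8))
```

Real mask heads predict a larger fixed-size grid and use smoother interpolation, but the flow is the same: box first, then a mask that only covers pixels inside that box.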

Practical example

On an RTX 3060 (12 GB VRAM), running YOLOv8n-seg (nano) on a 640x640 image uses ~1.5 GB VRAM and processes ~100 images per second. Running SAM (ViT-B) on the same image uses ~3 GB and takes ~0.5 seconds per image. For a 4K image, SAM may need 8+ GB and take several seconds. Operators often resize inputs to balance accuracy and speed.
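
A rough back-of-the-envelope sketch shows why resolution matters so much. It counts only the memory needed to hold the output masks, not the model weights or activations, so it is a lower bound and not a prediction of total VRAM. The helper names and the figure of 50 instances are assumptions for illustration.

```python
def mask_bytes(num_instances, height, width, bytes_per_value=4):
    """Bytes needed to hold one full-resolution mask per instance (float32 by default)."""
    return num_instances * height * width * bytes_per_value

def fit_longest_side(height, width, target=640):
    """New (height, width) after scaling so the longest side equals target."""
    scale = target / max(height, width)
    return round(height * scale), round(width * scale)

# 50 float32 masks at 4K (3840x2160) versus at 640x640.
at_4k = mask_bytes(50, 2160, 3840)
at_640 = mask_bytes(50, 640, 640)
ratio = at_4k / at_640

# Size of a 4K frame after resizing its longest side to 640 pixels.
resized_hw = fit_longest_side(2160, 3840)
```

Under these assumptions the 4K masks take about 1.66 GB against about 82 MB at 640x640, a factor of roughly 20, which is why resizing the input is usually the first knob to turn.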

Workflow example

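In a Python script using Ultralytics YOLO: from ultralytics import YOLO; model = YOLO('yolov8n-seg.pt'); results = model('image.jpg') returns a list of Results objects, one per input image. The masks for the first image are in results[0].masks, and results[0].masks.data is a tensor holding one binary mask per detected instance (results[0].masks is None when nothing is detected). To visualize, use results[0].plot(), which returns the annotated image as an array. For SAM, use import numpy as np; from segment_anything import sam_model_registry, SamPredictor; predictor = SamPredictor(sam_model_registry['vit_b'](checkpoint='sam_vit_b_01ec64.pth')); predictor.set_image(image), where image is an RGB uint8 array of shape (H, W, 3); masks, scores, _ = predictor.predict(point_coords=np.array([[500, 375]]), point_labels=np.array([1])). The point coordinates are (x, y) pixels, and a label of 1 marks a foreground point.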
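
By default SAM's predict call returns several candidate masks for a single prompt, with a quality score for each as its second return value, so a common next step is to keep the best one and derive a box and area from it. The sketch below uses synthetic NumPy arrays as a stand-in for the predictor output so that it runs without model weights; the helper names `best_mask` and `mask_to_box` are hypothetical.

```python
import numpy as np

def best_mask(masks, scores):
    """Return the highest-scoring mask and its score."""
    idx = int(np.argmax(scores))
    return masks[idx], float(scores[idx])

def mask_to_box(mask):
    """Tight (x1, y1, x2, y2) box around a non-empty boolean mask, x2/y2 exclusive."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# Stand-in for the predictor output: three candidate masks of shape (H, W) and their scores.
masks = np.zeros((3, 6, 8), dtype=bool)
masks[0, 2:4, 3:5] = True   # small part
masks[1, 1:5, 2:7] = True   # whole object
masks[2, 0:6, 0:8] = True   # everything
scores = np.array([0.71, 0.93, 0.55])  # hypothetical quality scores

mask, score = best_mask(masks, scores)
box = mask_to_box(mask)
area = int(mask.sum())
```

The same helpers work on YOLO masks once they are converted to boolean NumPy arrays, since both models ultimately yield one (H, W) mask per instance.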

Reviewed by Fredoline Eruo. See our editorial policy.
