RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.co · Independently operated
Advanced Multi-Modal Systems

03. Frame Sampling Strategies

Chapter 3 of 24 · 20 min
KEY INSIGHT

Frame sampling is a design choice with downstream consequences. Uniform sampling under-samples the fast segments of variable-pace videos. Scene-aware methods suit edited content such as movies, where the picture changes abruptly at cuts. Adaptive importance sampling handles heterogeneous content but costs an extra scoring pass.

How you sample frames from video dramatically affects what your model sees. The right strategy depends on your task, video length, and computational budget.

Uniform Sampling

The simplest approach: grab frames at fixed intervals. This preserves temporal coverage but may miss fast action.

import av
import numpy as np

def uniform_sample(video_path, fps_target=1):
    """Keep roughly fps_target frames per second of video."""
    with av.open(video_path) as container:
        video_stream = container.streams.video[0]
        video_fps = float(video_stream.average_rate)

        # Keep every Nth decoded frame. Clamp to 1 so that fps_target > video_fps
        # cannot produce a zero interval (and a modulo-by-zero error below).
        frame_interval = max(1, round(video_fps / fps_target))

        frames = []
        for i, frame in enumerate(container.decode(video=0)):
            if i % frame_interval == 0:
                frames.append(frame.to_ndarray(format="rgb24"))

    return np.stack(frames)  # (T, H, W, 3)
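
When the budget is a fixed number of frames rather than a target FPS (for example, the 32 uniform frames used in this chapter's exercise), the same idea can be expressed as index selection. The sketch below is one possible approach, not the course's own helper: `uniform_indices` is a hypothetical name, and it assumes the total frame count is already known. It picks the centre of each equal-length segment so that samples are not biased toward the start of the video.

```python
def uniform_indices(total_frames, num_samples):
    """Return num_samples frame indices spread evenly over total_frames.

    Hypothetical helper for illustration. Each index is the centre of one
    of num_samples equal-length segments of the video.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more samples than there are frames
    num_samples = min(num_samples, total_frames)
    segment = total_frames / num_samples
    return [int((i + 0.5) * segment) for i in range(num_samples)]


if __name__ == "__main__":
    print(uniform_indices(10, 4))
```

The resulting indices can then be used to decide which decoded frames to keep, in place of the modulo test on `frame_interval`.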

Scene-Aware Sampling

Videos contain shots—continuous sequences from a single camera. Uniform sampling may oversample slow scenes and undersample fast cuts.

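def scene_detect_sample(video_path, frames_per_scene=2):
    """Sample based on scene cuts using histogram comparison."""
    import cv2
    import numpy as np

    cap = cv2.VideoCapture(video_path)
    frames = []
    prev_hist = None
    taken_in_scene = 0  # frames kept so far from the current scene

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Intensity histogram of the grayscale frame (OpenCV decodes to BGR)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()

        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical histograms, 1 = no overlap
            distance = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)

            # Scene cut detected (large distance): start counting a new scene
            if distance > 0.4:
                taken_in_scene = 0

        # Keep the first frames_per_scene frames of every scene, including the
        # opening scene. Convert to RGB to match uniform_sample's output.
        if taken_in_scene < frames_per_scene:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            taken_in_scene += 1

        prev_hist = hist

    cap.release()
    return np.array(frames)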
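
To make the cut test concrete without a video file, here is a minimal pure-Python sketch of the histogram comparison. It assumes the form of the Bhattacharyya distance given in OpenCV's `compareHist` documentation, which reduces to `sqrt(1 - sum(sqrt(p * q)))` once both histograms are normalized to sum to 1. The names `bhattacharyya_distance` and `find_cuts` and the toy 4-bin histograms are illustrative assumptions, not part of any library.

```python
import math


def bhattacharyya_distance(hist_a, hist_b):
    """Distance in [0, 1]: 0 for identical histograms, 1 for no overlap."""
    sum_a, sum_b = sum(hist_a), sum(hist_b)
    if sum_a == 0 or sum_b == 0:
        raise ValueError("histograms must contain at least one count")
    # Bhattacharyya coefficient of the two normalized histograms
    coefficient = sum(
        math.sqrt((a / sum_a) * (b / sum_b)) for a, b in zip(hist_a, hist_b)
    )
    # Clamp to guard against tiny negative values from float rounding
    return math.sqrt(max(0.0, 1.0 - coefficient))


def find_cuts(histograms, threshold=0.4):
    """Return indices of frames whose histogram jumps away from the previous one."""
    return [
        i
        for i in range(1, len(histograms))
        if bhattacharyya_distance(histograms[i - 1], histograms[i]) > threshold
    ]


if __name__ == "__main__":
    dark = [8, 2, 0, 0]          # toy 4-bin histogram of a dark shot
    dark_shifted = [7, 3, 0, 0]  # same shot, slight lighting change
    bright = [0, 0, 3, 7]        # a different, bright shot
    print(find_cuts([dark, dark_shifted, bright, bright]))
```

Because the measure is a distance, a cut corresponds to a large value. The 0.4 threshold is carried over from the function above and usually needs tuning per dataset.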

Adaptive Sampling with Importance Weighting

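Some frames matter more than others. Action-heavy moments deserve more frames. Dense sampling followed by importance scoring with a frozen encoder addresses this: decode more frames than you need, score each one, and keep only the highest-scoring frames.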

import torch

def importance_weighted_sample(frames, model, max_frames=16):
    """Use a lightweight model to score frame importance."""
    # Extract features with frozen encoder; frames is a (T, C, H, W) tensor
    with torch.no_grad():
        features = model.forward_features(frames)

    # Score each frame by the variance of its flattened features. This is a
    # heuristic proxy for visual complexity, not a direct measure of motion.
    frame_importance = features.flatten(start_dim=1).var(dim=1)

    # Select top-k frames, then restore temporal order
    _, top_indices = torch.topk(frame_importance, min(max_frames, len(frames)))
    top_indices = sorted(top_indices.tolist())

    return frames[top_indices]
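
If no encoder is available, a cheaper stand-in for the scoring step is plain frame differencing. The sketch below swaps the encoder-feature score for the mean absolute pixel change from the previous frame, then reuses the same top-k selection with temporal order restored. Both function names are hypothetical, and frames are represented as flat lists of pixel intensities purely to keep the example dependency-free.

```python
import heapq


def motion_scores(frames):
    """Mean absolute pixel difference to the previous frame (first frame scores 0)."""
    if not frames:
        return []
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(c - p) for p, c in zip(prev, cur)) / len(cur)
        scores.append(diff)
    return scores


def top_k_in_temporal_order(scores, k):
    """Indices of the k highest scores, sorted back into temporal order."""
    k = max(0, min(k, len(scores)))
    top = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return sorted(top)


if __name__ == "__main__":
    # Five tiny 4-pixel "frames": static, static, jump, static, bigger jump
    frames = [[0] * 4, [0] * 4, [10] * 4, [10] * 4, [50] * 4]
    scores = motion_scores(frames)
    print(scores, top_k_in_temporal_order(scores, 2))
```

Frame differencing reacts to camera motion and lighting changes as well as to action, so treat it as a rough baseline rather than a replacement for feature-based scoring.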
EXERCISE

Profile the memory usage of loading a 10-minute video at 30 FPS (18,000 frames) versus sampling 32 uniform frames. Calculate the reduction factor and identify where memory savings come from.
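
As a starting point for the estimate, the size of a stack of decoded frames can be computed from its shape. This is a back-of-the-envelope sketch only: `frame_array_bytes` is a hypothetical helper, the 1080p resolution in the usage line is an assumption (the exercise does not specify one), and it counts raw `uint8` RGB pixels, not decoder buffers or Python overhead that a real profiler would also show.

```python
def frame_array_bytes(num_frames, height, width, channels=3, bytes_per_value=1):
    """Bytes needed to hold num_frames decoded frames as a dense array.

    bytes_per_value=1 corresponds to uint8 pixels; use 4 for float32.
    """
    return num_frames * height * width * channels * bytes_per_value


if __name__ == "__main__":
    # Assumed resolution: 1920x1080 RGB, uint8
    sampled = frame_array_bytes(32, 1080, 1920)
    print(f"32 sampled frames: {sampled / 1e6:.1f} MB")
```

Compare this estimate against a measured figure when you profile; the gap between the two is part of what the exercise asks you to explain.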
