Neural network architectures

Residual Network (ResNet)

A Residual Network (ResNet) is a neural network architecture built around skip connections (also called shortcut connections). These bypass one or more layers and add the layers' input to their output, which gives gradients a direct path back through the network during training. This design mitigates the vanishing gradient and degradation problems of very deep networks, making it practical to train models with hundreds of layers (over a thousand in the original CIFAR-10 experiments). In local AI, ResNet variants such as ResNet-50 and ResNet-101 are commonly used as backbones for image classification, object detection, and feature extraction. Operators encounter ResNet when loading pretrained models for computer vision tasks, where network depth and input resolution largely determine inference speed and VRAM usage.
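The gradient-flow claim can be illustrated with a deliberately tiny model. The sketch below is a toy, not a ResNet: each layer is a single scalar multiply by a small weight w (an assumption chosen for illustration), so the chain rule reduces to a product of local derivatives. A plain stack's gradient behaves like w**depth, while each residual layer's local derivative is 1 + w, because the skip path contributes a constant 1.

```python
# Toy model (illustrative assumption): every layer is a scalar linear map with weight w.
#   plain layer:    h_next = w * h        -> local derivative w
#   residual layer: h_next = h + w * h    -> local derivative 1 + w (the 1 is the skip path)

def plain_gradient(w: float, depth: int) -> float:
    """d(output)/d(input) through `depth` plain layers, via the chain rule."""
    grad = 1.0
    for _ in range(depth):
        grad *= w            # each plain layer multiplies the gradient by w
    return grad

def residual_gradient(w: float, depth: int) -> float:
    """d(output)/d(input) through `depth` residual layers."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + w      # identity shortcut keeps a constant 1 in every factor
    return grad

if __name__ == "__main__":
    for depth in (10, 50, 100):
        print(depth, plain_gradient(0.01, depth), residual_gradient(0.01, depth))
```

With w = 0.01 and 50 layers, the plain product is about 1e-100 while the residual one stays near 1.6. Real layers have matrix Jacobians and nonlinearities, so the toy only shows the direction of the effect.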

Deeper dive

ResNet was introduced by He et al. in 2015 and won the ILSVRC (ImageNet) 2015 classification challenge. The core idea is that, rather than having a stack of layers learn the desired mapping H(x) directly (where x is the stack's input), the stack learns the residual F(x) = H(x) - x. The skip connection adds x back to the stack's output, so the block computes F(x) + x = H(x); if an identity mapping is close to optimal, the layers only need to drive F(x) toward zero. This allows very deep networks (e.g., ResNet-152) to train without the accuracy degradation seen in equally deep plain networks. Common variants are ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, where the number counts weighted layers (convolutional layers plus the final fully connected layer). ResNet-50 is widely used as a backbone in object detectors such as Faster R-CNN, Mask R-CNN, and RetinaNet. For local AI, operators loading a ResNet model in PyTorch or ONNX format will see parameter counts from ~11.7M (ResNet-18) to ~60.2M (ResNet-152). Inference latency scales with depth and input resolution. At batch size 1 and 224x224 input, total VRAM use is typically 1-5 GB, most of it framework and CUDA context overhead: the FP32 weights themselves take only about 45 MB (ResNet-18) to 230 MB (ResNet-152).
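A minimal sketch of the residual mapping, assuming NumPy and a simplified block: dense weight matrices stand in for the 3x3 convolutions and batch normalization is omitted, so this is not a drop-in ResNet layer. It shows why extra blocks need not hurt: if the residual branch outputs zero, the block passes its (non-negative) input through unchanged.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Compute relu(x + F(x)) with F(x) = w2 @ relu(w1 @ x).

    Simplified: dense weight matrices replace the 3x3 convolutions
    and batch normalization is left out.
    """
    f = w2 @ relu(w1 @ x)   # residual branch F(x) = H(x) - x
    return relu(x + f)      # skip connection adds x back, so this is relu(H(x))

def residual_stack(x: np.ndarray, weights) -> np.ndarray:
    """Apply a sequence of residual blocks, one (w1, w2) pair per block."""
    for w1, w2 in weights:
        x = residual_block(x, w1, w2)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    x = relu(rng.standard_normal(d))        # non-negative, like the output of an earlier ReLU
    zero = np.zeros((d, d))
    deep = residual_stack(x, [(zero, zero)] * 100)
    print(np.allclose(deep, x))             # True: 100 blocks with F(x) = 0 pass x through unchanged
```

This is the identity-shortcut argument in miniature: a block can fall back to the identity by learning weights that drive F(x) to zero, instead of having to reproduce an identity mapping through stacked nonlinear layers.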

Practical example

An operator running image classification on an RTX 3060 (12 GB VRAM) can load ResNet-50 (25.6M parameters, ~98 MB of FP32 weights) and reach roughly 200-300 images/sec at batch size 1. Switching to ResNet-152 (60.2M parameters, ~230 MB) roughly halves throughput, to about 100-150 images/sec. For object detection, a Faster R-CNN with a ResNet-50 backbone uses about 1.5 GB of VRAM at 800x1333 input, leaving most of the card's 12 GB free for batched processing.
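The parameter counts, layer counts, and weight sizes quoted above can be reproduced from the stage layout alone. This sketch assumes the torchvision-style ImageNet architecture: bias-free convolutions each followed by BatchNorm, 1x1 projection shortcuts wherever the channel count changes, and a 1000-class fully connected head. Under those assumptions the totals come out to 11,689,512 (ResNet-18), 25,557,032 (ResNet-50), and 60,192,808 (ResNet-152) parameters.

```python
# Blocks per stage for the standard ImageNet ResNets (torchvision-style layout assumed).
CONFIGS = {
    "resnet18": ("basic", [2, 2, 2, 2]),
    "resnet34": ("basic", [3, 4, 6, 3]),
    "resnet50": ("bottleneck", [3, 4, 6, 3]),
    "resnet101": ("bottleneck", [3, 4, 23, 3]),
    "resnet152": ("bottleneck", [3, 8, 36, 3]),
}

def conv(k: int, c_in: int, c_out: int) -> int:
    """Weights in a bias-free k x k convolution."""
    return k * k * c_in * c_out

def bn(c: int) -> int:
    """Learnable BatchNorm parameters (scale and shift); running statistics are not counted."""
    return 2 * c

def count_params(name: str, num_classes: int = 1000) -> int:
    kind, blocks = CONFIGS[name]
    expansion = 1 if kind == "basic" else 4
    total = conv(7, 3, 64) + bn(64)               # stem: 7x7 conv + BN
    c_in = 64
    for planes, n_blocks in zip((64, 128, 256, 512), blocks):
        c_out = planes * expansion
        for _ in range(n_blocks):
            if kind == "basic":                   # two 3x3 convs
                total += conv(3, c_in, planes) + bn(planes)
                total += conv(3, planes, planes) + bn(planes)
            else:                                 # 1x1 reduce, 3x3, 1x1 expand
                total += conv(1, c_in, planes) + bn(planes)
                total += conv(3, planes, planes) + bn(planes)
                total += conv(1, planes, c_out) + bn(c_out)
            if c_in != c_out:                     # 1x1 projection shortcut where channels change
                total += conv(1, c_in, c_out) + bn(c_out)
            c_in = c_out
    return total + c_in * num_classes + num_classes  # final fully connected layer (weights + bias)

def depth(name: str) -> int:
    """Weighted layers in the name: stem conv + block convs + fc (shortcut convs excluded)."""
    kind, blocks = CONFIGS[name]
    convs_per_block = 2 if kind == "basic" else 3
    return 1 + convs_per_block * sum(blocks) + 1

def weight_mib(params: int, bytes_per_param: int = 4) -> float:
    """Size of the weights alone: 4 bytes/param for FP32, 2 for FP16."""
    return params * bytes_per_param / 2**20

if __name__ == "__main__":
    for name in CONFIGS:
        p = count_params(name)
        print(f"{name}: depth={depth(name)} params={p:,} "
              f"fp32={weight_mib(p):.1f} MiB fp16={weight_mib(p, 2):.1f} MiB")
```

The FP32 sizes are about 97.5 MiB for ResNet-50 and 229.6 MiB for ResNet-152, in line with the ~98 MB and ~230 MB figures; FP16 weights halve both. Weight size is only a lower bound on VRAM, since activations and runtime overhead come on top.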

Workflow example

In Hugging Face Transformers, a pretrained ResNet loads with from transformers import AutoImageProcessor, ResNetForImageClassification followed by model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50"); the matching AutoImageProcessor, loaded from the same checkpoint name, handles resizing and normalization of input images.

In ONNX Runtime, operators first export the ResNet to ONNX (for example with torch.onnx.export) and then run inference through ort.InferenceSession.

llama.cpp does not directly support ResNet; operators use separate tools such as OpenCV's DNN module or PyTorch for vision tasks.

For MLX on Apple Silicon, converted ResNet weights can be loaded with mlx.core, but the network itself has to be defined with mlx.nn; inference then runs at roughly 50-100 images/sec on an M1 Max.
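The Transformers path hides preprocessing inside AutoImageProcessor; with ONNX Runtime the operator usually prepares the input tensor. The sketch below assumes NumPy, the standard ImageNet mean/std constants, and an exported model with one NCHW float32 input whose first output holds the class logits; check these against the model actually exported. Resizing is omitted (a real pipeline would resize with an image library first), and run_onnx is a hypothetical helper, not part of any library.

```python
import numpy as np

# Standard ImageNet normalization constants (assumed; confirm against the model's preprocessor config).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def center_crop(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Crop an HxWx3 image to size x size around its center (resizing is not done here)."""
    h, w, _ = img.shape
    if h < size or w < size:
        raise ValueError("image is smaller than the crop; resize it first")
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def to_nchw_batch(img: np.ndarray) -> np.ndarray:
    """uint8 HWC RGB image -> normalized float32 NCHW batch of one."""
    x = img.astype(np.float32) / 255.0
    x = (x - MEAN) / STD                                 # broadcasts over the channel axis
    return np.ascontiguousarray(np.transpose(x, (2, 0, 1))[np.newaxis])  # HWC -> 1xCxHxW

def softmax_topk(logits: np.ndarray, k: int = 5):
    """Return (class_index, probability) pairs for the k most likely classes."""
    z = np.exp(logits - logits.max())                    # subtract max for numerical stability
    p = z / z.sum()
    idx = np.argsort(p)[::-1][:k]
    return [(int(i), float(p[i])) for i in idx]

def run_onnx(model_path: str, batch: np.ndarray) -> np.ndarray:
    """Hypothetical helper: run one batch through an exported ResNet with ONNX Runtime."""
    import onnxruntime as ort                            # lazy import: helpers above work without it
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: batch})[0]     # first output assumed to be the logits
```

A call might look like logits = run_onnx(path, to_nchw_batch(center_crop(img))) followed by softmax_topk(logits[0]), where path points at the exported .onnx file.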

Reviewed by Fredoline Eruo. See our editorial policy.


When it doesn't work