13. Gradient Checkpointing
Chapter 13 of 24 · 20 min
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
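A small helper can make the record above repeatable. The following is a minimal sketch, assuming the packages of interest are torch and transformers; the function name describe_environment is hypothetical, and it relies only on the standard library so it runs even when a package is missing.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages=("torch", "transformers")):
    """Collect interpreter, OS, and package versions for a run log."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            # Reads the installed distribution's version without importing it.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


print(json.dumps(describe_environment(), indent=2))
```

Paste the printed JSON next to the observed output of the exercise, together with the GPU model and available memory if the run used one.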
EXERCISE: Measure Memory Impact
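import torch
from transformers import AutoModelForCausalLM
# get_memory_footprint() counts only weights and buffers, so measure peak memory over a training step
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", torch_dtype=torch.float16
).to("cuda")  # requires a CUDA GPU
model.train()  # checkpointing only takes effect in training mode
input_ids = torch.randint(0, model.config.vocab_size, (4, 512), device="cuda")

def peak_memory_gb():
    model.zero_grad(set_to_none=True)
    torch.cuda.reset_peak_memory_stats()
    model(input_ids=input_ids, labels=input_ids).loss.backward()
    return torch.cuda.max_memory_allocated() / 1e9

# Baseline without checkpointing
print(f"Without checkpointing: {peak_memory_gb():.2f} GB")
# With checkpointing
model.gradient_checkpointing_enable()
print(f"With checkpointing: {peak_memory_gb():.2f} GB")
Repeat the measurement with gpt2-medium and compare the difference in peak memory. Checkpointing reduces activation memory, not weight memory, so the savings grow with the number of layers, the batch size, and the sequence length, at the cost of recomputing activations during the backward pass.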
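One way to see why the savings grow with model size is to count stored activations under a deliberately simplified model. The sketch below assumes each transformer block produces the same number of intermediate tensors (PER_BLOCK, a hypothetical value) and that checkpointing keeps only each block's input, recomputing one block's intermediates at a time during the backward pass. The block counts (12 for gpt2, 24 for gpt2-medium) are the published layer counts; everything else is illustrative, not a measurement.

```python
def stored_activations(num_blocks: int, per_block: int, checkpointing: bool) -> int:
    """Count activation tensors held at the backward-pass peak (simplified model)."""
    if not checkpointing:
        # Every intermediate tensor of every block is kept for the backward pass.
        return num_blocks * per_block
    # Only each block's input is kept, plus the intermediates of the single
    # block currently being recomputed.
    return num_blocks + per_block


PER_BLOCK = 10  # hypothetical number of intermediate tensors per block

for name, num_blocks in [("gpt2", 12), ("gpt2-medium", 24)]:
    plain = stored_activations(num_blocks, PER_BLOCK, checkpointing=False)
    ckpt = stored_activations(num_blocks, PER_BLOCK, checkpointing=True)
    print(f"{name}: {plain} tensors without, {ckpt} with ({plain - ckpt} fewer)")
```

In this counting model the stored tensors grow as blocks times intermediates without checkpointing but only as blocks plus intermediates with it, so doubling the depth roughly doubles the absolute saving. Real measurements will differ because tensor sizes also depend on hidden size, batch size, and sequence length.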