RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
Model Compression

05. Movement Pruning

Chapter 5 of 18 · 15 min
KEY INSIGHT

Movement pruning removes weights that remain small throughout training, rather than weights that happen to be small at a single checkpoint. Standard magnitude pruning evaluates weights once, typically after training completes. Movement pruning, as presented in this chapter, tracks weight magnitudes across training and identifies weights that begin small and stay small. Such weights never contribute meaningfully to the network's learned function. The name is also used for the method of Sanh et al. (2020), which scores each weight by whether gradient updates push it toward or away from zero during fine-tuning; the magnitude-tracking score used here is a different, simpler criterion.

The movement score measures how consistently a weight stays small. Weights that spike during training and then return to small values show a dynamic contribution. Weights that remain small throughout are persistently dormant. Movement pruning removes the latter and preserves weights whose importance varies over time.

```python
import torch


class MovementPruner:
    """
    Tracks an exponential moving average (EMA) of weight magnitudes
    across training to identify consistently small weights.
    """

    def __init__(self, model, beta=0.9):
        self.model = model
        self.movement_scores = {}
        self.beta = beta  # EMA decay: higher values give the score a longer memory
        self._init_scores()

    def _weighted_modules(self):
        """Yield (name, module) pairs whose `weight` attribute is a tensor."""
        for name, module in self.model.named_modules():
            if isinstance(getattr(module, 'weight', None), torch.Tensor):
                yield name, module

    def _init_scores(self):
        """Create a zero-initialized score tensor for every weight (no hooks needed)."""
        for name, module in self._weighted_modules():
            self.movement_scores[name] = torch.zeros_like(module.weight)

    @torch.no_grad()  # scores must not become part of the autograd graph
    def update_scores(self):
        """Update running scores with current magnitudes; call after each optimizer step."""
        for name, module in self._weighted_modules():
            magnitude = module.weight.abs()
            self.movement_scores[name].mul_(self.beta).add_(magnitude, alpha=1 - self.beta)

    @torch.no_grad()
    def prune(self, sparsity):
        """Zero the `sparsity` fraction of each weight tensor with the lowest scores.

        Further training can regrow pruned weights unless the mask is reapplied.
        """
        for name, module in self._weighted_modules():
            scores = self.movement_scores[name]
            k = int(sparsity * scores.numel())  # number of entries to prune
            if k == 0:
                continue
            # kthvalue also works on very large layers, where torch.quantile
            # can reject the input as too large.
            threshold = scores.flatten().kthvalue(k).values
            mask = scores > threshold
            module.weight.mul_(mask.to(module.weight.dtype))
```

Movement pruning offers several advantages over magnitude pruning. First, it identifies weights with consistently low contribution rather than those that happen to be small at evaluation time. Second, it is robust to temporary changes in magnitude: a weight that grows briefly during training and later reverts is judged on its trajectory, not on one snapshot. Third, the movement pattern itself carries information about weight importance.

The computational overhead of movement tracking is modest. After each training step, the pruner updates an exponential moving average of each weight's magnitude, which costs one elementwise pass over the parameters. No forward or backward passes beyond those already required for training are needed; scoring happens inside the normal training loop. The memory cost is one score tensor per weight tensor, the same size as the weights themselves.

A failure mode appears when training hyperparameters interact poorly with movement scores. High learning rates make weights fluctuate more, which lowers the signal-to-noise ratio of the scores. Very low learning rates make weights move less, so scores stay close to the initial magnitudes and important weights can be misclassified as unimportant. Movement pruning works best with stable training dynamics.
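To make the scoring rule concrete, the sketch below applies the same exponential moving average update used by the pruner above to three hypothetical weight trajectories. The trajectories, their names, and the helper `ema_scores` are illustrative assumptions, not data from a real training run. All three weights finish at the same magnitude, so a magnitude criterion applied at the final checkpoint cannot tell them apart.

```python
def ema_scores(trajectory, beta=0.9):
    """Return the EMA of |w| after every step, starting from zero."""
    score = 0.0
    history = []
    for w in trajectory:
        score = beta * score + (1 - beta) * abs(w)
        history.append(score)
    return history


# Three hypothetical weights that all end training at the same magnitude.
trajectories = {
    "dormant": [0.01] * 50,                            # small at every step
    "spiked": [0.01] * 20 + [0.5] * 5 + [0.01] * 25,   # brief spike, then reverts
    "late_shrink": [0.5] * 45 + [0.01] * 5,            # large until the last few steps
}

# What magnitude pruning sees at the final checkpoint.
final_magnitude = {name: abs(t[-1]) for name, t in trajectories.items()}
# What the EMA-based score sees after the last step.
final_score = {name: ema_scores(t)[-1] for name, t in trajectories.items()}

for name in trajectories:
    print(f"{name:12s} |w|={final_magnitude[name]:.3f}  score={final_score[name]:.4f}")
```

The dormant weight receives the lowest score and the late-shrinking weight the highest. The spiked weight lands much closer to the dormant one, because with `beta = 0.9` the average is dominated by roughly the last `1 / (1 - beta) = 10` steps. A larger `beta` keeps the memory of a spike for longer, so the choice of `beta` decides how much of the training history the score actually reflects.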

EXERCISE

Implement movement pruning tracking for a transformer model during training. After convergence, compare which weights get pruned under movement versus magnitude criteria. Identify where the criteria disagree.
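One possible starting point for the comparison step is sketched below. It assumes you already hold a score tensor and the matching weight tensor for a layer; the helpers `keep_mask` and `disagreement` are hypothetical names introduced here, and the random tensors only stand in for real scores from a training run.

```python
import torch


def keep_mask(scores: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Boolean mask that prunes roughly the `sparsity` fraction with the lowest scores."""
    k = int(sparsity * scores.numel())  # number of entries to prune
    if k == 0:
        return torch.ones_like(scores, dtype=torch.bool)
    threshold = scores.flatten().kthvalue(k).values
    return scores > threshold


def disagreement(movement_scores: torch.Tensor, weight: torch.Tensor, sparsity: float) -> float:
    """Fraction of entries kept by exactly one of the two criteria."""
    movement_keep = keep_mask(movement_scores, sparsity)
    magnitude_keep = keep_mask(weight.detach().abs(), sparsity)
    return (movement_keep ^ magnitude_keep).float().mean().item()


# Toy stand-in for one layer: final weights plus a noisy score tensor.
torch.manual_seed(0)
weight = torch.randn(64, 64)
movement_scores = weight.abs() + 0.5 * torch.rand(64, 64)

rate = disagreement(movement_scores, weight, sparsity=0.5)
print(f"criteria disagree on {rate:.1%} of entries")
```

Running `disagreement` per layer and sorting layers by the result is one way to find where the two criteria diverge most.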
