RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Training & optimization

Learning Rate Schedule

A learning rate schedule adjusts the step size (learning rate) during training to improve convergence and model quality. In local AI, operators fine-tuning models with Hugging Face Transformers or Unsloth commonly use linear warmup followed by cosine decay: warmup avoids large, destabilizing updates at the start, and decay lets the weights settle into a minimum later instead of overshooting it. The schedule is defined by a peak (starting) rate, a decay function, and optional warmup steps. Choosing the right schedule matters because a fixed rate can stall or diverge, wasting GPU hours on consumer hardware.
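As a rough formalization of those three ingredients (a sketch in our own notation, not taken from any particular library): let t be the optimizer step, T the total number of steps, T_w the number of warmup steps, eta_peak the peak rate, and d a decay function that maps training progress p in [0, 1] to a multiplier that starts at 1. Two common choices of d are shown.

```latex
\eta(t) =
\begin{cases}
  \eta_{\text{peak}} \, \dfrac{t}{T_w}, & 0 \le t < T_w \\[6pt]
  \eta_{\text{peak}} \, d\!\left(\dfrac{t - T_w}{T - T_w}\right), & T_w \le t \le T
\end{cases}
\qquad
d_{\text{cos}}(p) = \tfrac{1}{2}\bigl(1 + \cos \pi p\bigr),
\quad
d_{\text{lin}}(p) = 1 - p
```

Setting T_w = 0 removes the warmup branch, so eta_peak becomes the starting rate, and choosing d(p) = 1 gives a constant rate.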

Deeper dive

The learning rate controls how far the weights move on each optimizer step (with gradient accumulation, one step covers several batches). A schedule changes this rate as training progresses. Common schedules:
  • Constant: the same rate throughout; rarely optimal.
  • Step decay: the rate drops by a fixed factor at fixed intervals.
  • Exponential decay: the rate shrinks continuously by a constant factor per step.
  • Cosine decay: the rate falls smoothly along a cosine curve toward zero or a floor value.
  • Linear warmup: the rate rises from zero to the peak rate over the first steps, after which a decay schedule takes over.

Warmup is especially important for large models, where full-size updates in the first steps can destabilize training. An operator fine-tuning Llama 3.1 8B on an RTX 4090 might use a cosine schedule with 100 warmup steps and a peak rate of 2e-5. The schedule is set in the training script, for example with Transformers' get_cosine_schedule_with_warmup.
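The schedules listed above can be sketched as plain Python functions of the optimizer step. This is a minimal, dependency-free illustration: the function names, parameter names and defaults (such as factor=0.5 and gamma=0.999) are our own assumptions, not any library's API. The last function follows the same shape as a linear-warmup-plus-cosine schedule like the one get_cosine_schedule_with_warmup builds, but it returns an absolute rate rather than a multiplier and is not that function's code.

```python
import math

# Each function maps an optimizer step (0-indexed) to a learning rate.
# Names and default values are illustrative assumptions.

def constant(step, base_lr):
    # Same rate at every step.
    return base_lr

def step_decay(step, base_lr, drop_every, factor=0.5):
    # Multiply the rate by `factor` once every `drop_every` steps.
    return base_lr * factor ** (step // drop_every)

def exponential_decay(step, base_lr, gamma=0.999):
    # Shrink the rate by a constant factor `gamma` at every step.
    return base_lr * gamma ** step

def cosine_decay(step, base_lr, total_steps, min_lr=0.0):
    # Follow half a cosine wave from base_lr down to min_lr, then stay there.
    progress = min(step / total_steps, 1.0)
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

def warmup_cosine(step, peak_lr, warmup_steps, total_steps):
    # Linear ramp from 0 to peak_lr, then cosine decay to 0 over the remaining steps.
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

if __name__ == "__main__":
    # 100 warmup steps and a 2e-5 peak as in the text; 2000 total steps is an assumption.
    for s in (0, 50, 100, 1050, 2000):
        print(s, f"{warmup_cosine(s, 2e-5, 100, 2000):.2e}")
```

Writing each schedule as a pure function of the step makes it easy to plot or unit-test before committing GPU hours to a run.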

Practical example

Consider fine-tuning a 7B model on a single RTX 3090 (24 GB VRAM) with batch size 1 and gradient accumulation 4, so each optimizer step sees 4 examples. A constant learning rate of 2e-5 may cause loss spikes after a few hundred steps. Switching to a cosine schedule with 10% warmup (e.g., 200 warmup steps out of 2000 total) typically smooths training and can reach a lower final perplexity. In the training arguments this is lr_scheduler_type='cosine', warmup_ratio=0.1.
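The arithmetic behind this example can be checked with a short script. The dataset size of 8,000 examples is a hypothetical value picked so that one epoch comes to 2,000 optimizer steps, and it divides evenly so no rounding question arises for the step count. Converting warmup_ratio to a step count by rounding total_steps × warmup_ratio up is our assumption about the usual rule; check your framework version for the exact behavior.

```python
import math

# Hypothetical dataset size chosen so the numbers match the example above.
num_examples = 8000
per_device_batch_size = 1
grad_accum_steps = 4
num_epochs = 1
warmup_ratio = 0.1
peak_lr = 2e-5

# Examples consumed per optimizer step.
effective_batch = per_device_batch_size * grad_accum_steps
# Optimizer steps (not forward passes); 8000 divides evenly by 4 here.
total_steps = (num_examples // effective_batch) * num_epochs
# Assumed rule: round the warmup fraction up to a whole number of steps.
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step):
    # Linear warmup to peak_lr, then cosine decay to zero at total_steps.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(f"effective batch: {effective_batch}, total steps: {total_steps}, warmup steps: {warmup_steps}")
for step in (0, 100, 200, 1100, 2000):
    print(step, f"{lr_at(step):.2e}")
```

The key point is that step counts refer to optimizer steps: with gradient accumulation 4, the 2,000 steps correspond to 8,000 forward/backward passes.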

Workflow example

In Hugging Face Transformers, the schedule is set via TrainingArguments when using Trainer. For example: TrainingArguments(lr_scheduler_type='cosine', warmup_steps=100, learning_rate=2e-5). In Unsloth, get_peft_model only attaches the LoRA adapters; the schedule goes in the same TrainingArguments (or TRL's SFTConfig) passed to the trainer. Operators monitor loss curves in TensorBoard; a well-chosen schedule shows a steady decrease without sudden spikes or premature plateaus. In MLX, the schedule is passed to the optimizer: optimizer = optim.AdamW(learning_rate=lr_schedule), where lr_schedule is a callable returning the rate at each step.
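The MLX pattern of passing a callable as the learning rate can be illustrated without MLX installed. The sketch below is a pure-Python stand-in: ToySGD and warmup_cosine are hypothetical names, not MLX classes or functions, and the toy problem minimizes (w - 3)^2 with plain gradient descent. Like an optimizer given a schedule, it asks the callable for a rate using its own step counter on every update. MLX also ships ready-made schedules in mlx.optimizers, such as cosine_decay and join_schedules; consult its documentation for their exact signatures.

```python
import math
from typing import Callable, Union

# A schedule is any callable mapping an optimizer step to a learning rate.
Schedule = Callable[[int], float]

def warmup_cosine(peak_lr: float, warmup_steps: int, total_steps: int) -> Schedule:
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to 0."""
    def schedule(step: int) -> float:
        if step < warmup_steps:
            return peak_lr * step / max(1, warmup_steps)
        progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
        return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    return schedule

class ToySGD:
    """Hypothetical stand-in optimizer: accepts a fixed rate or a schedule callable."""

    def __init__(self, learning_rate: Union[float, Schedule]):
        self.learning_rate = learning_rate
        self.step_count = 0

    def current_lr(self) -> float:
        # Query the schedule with the step counter, or use the fixed rate as-is.
        if callable(self.learning_rate):
            return self.learning_rate(self.step_count)
        return self.learning_rate

    def update(self, param: float, grad: float) -> float:
        # One gradient-descent step, then advance the step counter.
        new_param = param - self.current_lr() * grad
        self.step_count += 1
        return new_param

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
opt = ToySGD(learning_rate=warmup_cosine(peak_lr=0.1, warmup_steps=10, total_steps=200))
w = 0.0
for _ in range(200):
    w = opt.update(w, 2.0 * (w - 3.0))
print(f"w after 200 steps: {w:.4f}")
```

Passing a float instead, e.g. ToySGD(learning_rate=0.1), gives a constant rate, which is why one argument can accept either form.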

Reviewed by Fredoline Eruo. See our editorial policy.
