Training & optimization

Overfitting

Overfitting occurs when a model fits its training data too closely, picking up noise and irrelevant patterns at the cost of generalizing to new data. In practice, an overfitted model memorizes specific examples instead of learning the underlying rules, so it scores well on training data but poorly on validation or test data. Operators encounter overfitting when fine-tuning models on small datasets: the model may produce perfect outputs for training prompts but fail on similar unseen prompts. Techniques like early stopping, dropout, and data augmentation mitigate overfitting by constraining how much the model can memorize or by exposing it to more varied examples.

Deeper dive

Overfitting is a fundamental failure mode in supervised learning. It arises when a model has too many parameters relative to the amount and diversity of training data, or when training runs for too long. The model effectively memorizes the training set, including outliers and noise, rather than learning generalizable features. In neural networks this often shows up as near-zero training loss alongside high validation loss; the common indicator is training accuracy that keeps improving while validation accuracy plateaus or declines. Operators fine-tuning large language models on custom datasets (e.g., using LoRA on a few hundred examples) must watch for it. Practical countermeasures include holding out a validation set to monitor the divergence, dropout (randomly disabling neurons during training), early stopping (halting when validation loss stops decreasing), weight decay (a penalty on large weights, closely related to L2 regularization), and data augmentation (creating synthetic variations of existing examples). In local AI contexts, overfitting is especially risky because small datasets are common and compute budgets limit extensive hyperparameter tuning.
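To make "randomly disabling neurons during training" concrete, here is a minimal sketch of inverted dropout in plain Python. It is illustrative only, not how any particular framework implements dropout; the function name `dropout` and the list-of-floats representation are assumptions made for this example.

```python
import random


def dropout(activations, p, training=True, rng=None):
    """Inverted dropout on a list of floats (illustrative helper, not a library API).

    During training, each value is zeroed with probability p and the survivors
    are scaled by 1 / (1 - p) so the expected activation stays the same.
    At inference time (training=False) the input is returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Roughly 10% of the values are zeroed; the rest are scaled up slightly.
hidden = [1.0] * 10
print(dropout(hidden, p=0.1, rng=random.Random(0)))

# At inference time dropout is a no-op.
print(dropout(hidden, p=0.1, training=False))
```

Because a different random subset is dropped on every step, the network cannot rely on any single unit to memorize a training example, which is why dropout tends to narrow the gap between training and validation loss.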

Practical example

An operator fine-tunes Llama 3.1 8B using LoRA on 500 customer support emails. After 10 epochs, training loss drops to 0.01, but validation loss climbs from 0.5 to 1.2. The model now reproduces ideal replies to the training emails but gives nonsensical or repetitive answers to new ones. This is overfitting. The operator reduces training to 3 epochs and sets LoRA dropout to 0.1, bringing validation loss down to 0.4.
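The decision to stop at an earlier epoch can be automated with a patience rule. The sketch below is a minimal, framework-free version of early stopping; the function name `best_epoch_with_patience` and the per-epoch loss values are hypothetical, chosen only to resemble the shape of the curve described above (they are not measurements from the example).

```python
def best_epoch_with_patience(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. best_epoch is the checkpoint
    you would keep.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses for a 10-epoch run that starts overfitting
# after epoch 3.
val_losses = [0.50, 0.44, 0.40, 0.47, 0.62, 0.80, 0.95, 1.05, 1.12, 1.20]
stop_epoch, best_epoch = best_epoch_with_patience(val_losses, patience=2)
print(f"stop after epoch {stop_epoch}, keep checkpoint from epoch {best_epoch}")
```

With these illustrative numbers the rule stops after epoch 5 and keeps the epoch 3 checkpoint, saving half the compute of the full 10-epoch run.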

Workflow example

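When using Hugging Face Transformers with Trainer, set eval_strategy="epoch" (named evaluation_strategy in older releases), a matching save_strategy="epoch", and load_best_model_at_end=True, so the checkpoint with the best evaluation metric is restored when training ends. Monitor the training logs: if training loss keeps decreasing while eval loss increases, stop training, or add EarlyStoppingCallback to stop automatically. For llama.cpp fine-tuning, the available options differ between versions, so confirm in your build's --help output that flags such as --early-stop and --validation-file exist before relying on them to halt at the best checkpoint. For LoRA in MLX, use --val-batches and watch the reported validation loss; if it rises, reduce --iters or increase LoRA dropout (in mlx-lm, dropout is typically set under lora_parameters in the YAML config rather than through a --lora-dropout flag).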
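For the Trainer route, the settings might be wired together roughly as follows. This is a configuration sketch under stated assumptions, not a complete training script: it assumes `transformers` is installed along with PyTorch and accelerate, the output directory name is hypothetical, and the model and datasets are left out. The evaluation-schedule argument is looked up at runtime because it is `eval_strategy` in recent releases and `evaluation_strategy` in older ones.

```python
import inspect

from transformers import EarlyStoppingCallback, TrainingArguments

# Pick whichever name this installed version of transformers accepts.
params = inspect.signature(TrainingArguments.__init__).parameters
eval_key = "eval_strategy" if "eval_strategy" in params else "evaluation_strategy"

training_args = TrainingArguments(
    output_dir="lora-support-run",      # hypothetical directory name
    num_train_epochs=10,                # upper bound; early stopping may end sooner
    save_strategy="epoch",              # must match the evaluation schedule
    load_best_model_at_end=True,        # restore the best checkpoint when done
    metric_for_best_model="eval_loss",
    greater_is_better=False,            # lower eval loss is better
    **{eval_key: "epoch"},              # evaluate once per epoch
)

# Stop if eval_loss has not improved for 2 consecutive evaluations.
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)

# Then pass both to Trainer along with your own model and datasets, e.g.:
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=val_ds,
#                   callbacks=[early_stopping])
```

The callback relies on load_best_model_at_end=True and a metric_for_best_model being set, which is why those arguments appear together here.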

Reviewed by Fredoline Eruo. See our editorial policy.
