Epoch

An epoch is one complete pass through the entire training dataset during model training. In practice, operators fine-tuning a model (e.g., with Hugging Face Transformers or Unsloth) set the number of epochs to control how many times the model sees every example. More epochs can lower training loss, but past a certain point the model starts to overfit. Training for 1–3 epochs is common for fine-tuning small models on consumer hardware; each epoch takes minutes to hours depending on dataset size, batch size, and the GPU (VRAM limits the batch size that fits).

Deeper dive

During training, the model updates its weights after each batch of data (or after several batches when gradient accumulation is used). An epoch ends when every training example has been processed once, so one epoch contains roughly dataset size divided by batch size update steps. The number of epochs is a hyperparameter that trades off underfitting (too few epochs) against overfitting (too many). Early stopping, which halts training when validation loss stops improving, is often used instead of a fixed epoch count. For fine-tuning large language models on consumer GPUs, operators typically use 1–3 epochs because the dataset is small (e.g., thousands of examples) and the model is already pretrained. Training for many epochs on a small dataset pushes the model to memorize those examples, and it can also contribute to catastrophic forgetting, where the model loses general knowledge it had before fine-tuning.
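The loop structure described above (batches inside epochs, with early stopping on validation loss) can be sketched in plain Python. This is a toy illustration, not how any particular library implements it: it fits a one-parameter model y = w * x with mini-batch gradient descent, and the names fit, mse, patience, and min_delta are hypothetical choices made for this example.

```python
import random


def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def fit(train_data, val_data, max_epochs=50, batch_size=4,
        lr=0.01, patience=2, min_delta=1e-9, seed=0):
    """Mini-batch gradient descent with early stopping on validation loss.

    Returns the best weight found and the validation loss after each epoch.
    """
    rng = random.Random(seed)
    w = 0.0
    best_val, best_w, bad_epochs = float("inf"), w, 0
    history = []

    for epoch in range(max_epochs):
        # One epoch = one full pass over the (shuffled) training set.
        order = list(train_data)
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the batch MSE with respect to w.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad  # weight update after each batch

        val_loss = mse(w, val_data)
        history.append(val_loss)

        if val_loss < best_val - min_delta:
            # Validation loss improved: remember this weight.
            best_val, best_w, bad_epochs = val_loss, w, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # early stopping: no improvement for `patience` epochs

    return best_w, history


# Toy data generated from y = 3x (no noise), split into train and validation.
train_data = [(float(x), 3.0 * x) for x in range(1, 9)]
val_data = [(9.0, 27.0), (10.0, 30.0)]

best_w, history = fit(train_data, val_data)
print(f"best w = {best_w:.4f}, epochs run = {len(history)}")
```

Here max_epochs acts only as an upper bound; the loop usually exits earlier, once validation loss has failed to improve for patience consecutive epochs.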

Practical example

Fine-tuning Llama 3.1 8B on a custom dataset of 5,000 examples with a batch size of 4 on an RTX 4090 (24 GB VRAM) might take ~2 hours per epoch. This assumes a parameter-efficient method such as QLoRA, since full fine-tuning of an 8B model does not fit in 24 GB. Setting num_train_epochs=3 in the training script means the model sees each example three times. If validation loss starts rising after epoch 2, the operator would stop there and keep the epoch-2 checkpoint to avoid overfitting.
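The arithmetic behind this example is worth spelling out. The sketch below takes the figures from the text (5,000 examples, batch size 4, 3 epochs, roughly 2 hours per epoch) and derives the step counts; it assumes no gradient accumulation and that a final partial batch still counts as a step. The helper name steps_per_epoch is made up for this illustration.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


num_examples = 5_000
batch_size = 4
epochs = 3
hours_per_epoch = 2.0  # rough figure from the example above, not a benchmark

steps = steps_per_epoch(num_examples, batch_size)    # 1250 steps per epoch
total_steps = steps * epochs                         # 3750 steps in total
seconds_per_step = hours_per_epoch * 3600 / steps    # 5.76 s per step implied
total_hours = hours_per_epoch * epochs               # 6.0 hours for the full run

print(steps, total_steps, round(seconds_per_step, 2), total_hours)
```

Doubling the batch size halves the number of steps per epoch, but each step then processes twice as many examples, so the wall-clock time per epoch does not necessarily drop by half.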

Workflow example

In Hugging Face Transformers, you set TrainingArguments(num_train_epochs=3). In Unsloth's notebooks, you configure max_steps=-1 and num_train_epochs=2; leaving max_steps at -1 lets the epoch count decide how long training runs, whereas a positive max_steps would override it. In llama.cpp's training tool (if used), you specify --epochs 2. The training loop logs progress per step and per epoch; operators watch the training and validation loss curves to decide whether more epochs help.
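How max_steps and num_train_epochs interact is a common source of confusion. In Hugging Face Transformers, a positive max_steps overrides num_train_epochs, while the default of -1 leaves the epoch count in charge. The sketch below models that rule in plain Python; planned_training is a hypothetical helper, not part of any library, and it ignores gradient accumulation and multi-GPU setups.

```python
import math


def planned_training(num_examples, batch_size, num_train_epochs=3.0, max_steps=-1):
    """Return (total_steps, epochs_covered) under a simplified model.

    A positive max_steps takes precedence over num_train_epochs;
    max_steps=-1 means "train for num_train_epochs full passes".
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    if max_steps > 0:
        total_steps = max_steps
    else:
        total_steps = math.ceil(steps_per_epoch * num_train_epochs)
    return total_steps, total_steps / steps_per_epoch


# Epoch-driven run: 2 full passes over 5,000 examples at batch size 4.
full_steps, full_epochs = planned_training(5_000, 4, num_train_epochs=2, max_steps=-1)

# Step-capped run: the value 60 is an arbitrary illustration of a short smoke test.
capped_steps, capped_epochs = planned_training(5_000, 4, num_train_epochs=2, max_steps=60)

print(full_steps, full_epochs)      # 2500 steps, 2.0 epochs
print(capped_steps, capped_epochs)  # 60 steps, a small fraction of one epoch
```

A step-capped run like the second one sees only 240 of the 5,000 examples, which is useful for checking that a pipeline works but does not amount to even one epoch.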

Reviewed by Fredoline Eruo. See our editorial policy.
