RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated

Data Augmentation

Data augmentation is the technique of generating modified copies of existing training data to increase dataset size and diversity without collecting new samples. Operators encounter it when fine-tuning models locally: common augmentations include cropping, rotating, or adding noise to images, and synonym replacement or back-translation for text. Augmentation helps models generalize and reduces overfitting, especially when the original dataset is small. The operator chooses augmentations that preserve label meaning: rotating a cat photo 10° still shows a cat, but rotating a handwritten 6 by 180° turns it into a 9. Augmentation is usually applied on-the-fly during training rather than stored on disk, although pre-generating augmented copies offline is also possible.

Deeper dive

Data augmentation works by applying random transformations to each training sample as it is loaded, so the model rarely sees exactly the same input twice. For images, typical augmentations include random horizontal flips, slight rotations, color jitter, and random cropping. For text, augmentations such as random word deletion, synonym replacement, or back-translation (translating to another language and back) create paraphrases. The key constraint is label invariance: the transformation must not change the ground-truth label. Augmentation tends to help most on small datasets, on the order of a few thousand examples or fewer. In local fine-tuning with PyTorch and the Hugging Face libraries, augmentation is often implemented via a custom dataset class that applies transforms in the __getitem__ method. Libraries like torchvision for images and nlpaug for text provide ready-made augmentations. The operator must balance augmentation strength: overly aggressive augmentation creates unrealistic samples and can degrade performance.
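
The __getitem__ pattern described above can be sketched in plain Python. This is a minimal illustration, not a library API: random_word_deletion and AugmentedTextDataset are hypothetical names, and the sample sentences are made up. Note that word deletion is not always label-safe (dropping "not" can flip a sentiment label), so the deletion probability should stay low.

```python
import random


def random_word_deletion(text, p=0.1, rng=random):
    """Drop each word with probability p, always keeping at least one word."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() >= p]
    if not kept:
        # Everything was deleted: fall back to one randomly chosen word.
        kept = [rng.choice(words)]
    return " ".join(kept)


class AugmentedTextDataset:
    """Hypothetical map-style dataset that augments inside __getitem__."""

    def __init__(self, texts, labels, augment=None):
        self.texts = texts
        self.labels = labels
        self.augment = augment  # None disables augmentation (validation)

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        if self.augment is not None:
            # Runs on every access, so each epoch can see a new variant.
            text = self.augment(text)
        return {"text": text, "label": self.labels[idx]}


# Made-up examples for illustration only.
texts = [
    "the cat sat quietly on the warm windowsill",
    "the dog barked loudly at the passing truck",
]
labels = [1, 0]

train_ds = AugmentedTextDataset(
    texts, labels, augment=lambda t: random_word_deletion(t, p=0.2)
)
val_ds = AugmentedTextDataset(texts, labels)  # no augmentation
```

Because the augmentation runs at access time, nothing extra is written to disk, and the validation dataset simply omits the augment function.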

Practical example

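An operator fine-tunes a vision model (e.g., ResNet-50) on a custom dataset of 500 cat photos. Without augmentation, the model overfits and fails on photos taken from unfamiliar angles. Using torchvision.transforms, they add a random horizontal flip, rotation of ±10°, and color jitter. Each epoch the model sees a different variation of every photo, so over many epochs it trains on thousands of distinct images. The operator monitors both training and validation loss: if training loss keeps falling while validation loss rises, the model is still overfitting and stronger augmentation may help; if training loss itself stalls at a high value, the augmentation is likely too aggressive and should be reduced.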

Workflow example

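In a Hugging Face training script, the operator defines a train_transforms pipeline composed of torchvision's RandomResizedCrop(224), RandomHorizontalFlip(), and ColorJitter(). Because the set_transform method of a Hugging Face Datasets dataset expects a function that receives a batch and returns a batch, they wrap the pipeline in a small function that applies it to each image and pass that function to set_transform. During training, each batch is augmented on-the-fly, increasing effective dataset size without extra storage. For validation, the operator uses a separate pipeline without random transforms (for example, a resize and center crop) so that evaluation stays deterministic.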
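
The batch-level function that set_transform receives can be sketched without any dependencies. This assumes the Hugging Face Datasets convention that the function takes a dict of column lists and returns a dict; make_batch_transform, hflip, the "image" and "pixel_values" column names, and the tiny list-based image are all hypothetical stand-ins for a real torchvision pipeline and dataset.

```python
def make_batch_transform(per_image_fn, image_key="image", output_key="pixel_values"):
    """Wrap a per-image transform so it can process a batch dict of lists."""

    def apply(batch):
        out = dict(batch)  # shallow copy; the input batch is left untouched
        out[output_key] = [per_image_fn(img) for img in batch[image_key]]
        return out

    return apply


def hflip(image):
    """Deterministic stand-in for a per-image pipeline: mirror each row."""
    return [row[::-1] for row in image]


train_fn = make_batch_transform(hflip)

# A fake batch with one 2x3 "image" stored as nested lists.
batch = {"image": [[[1, 2, 3], [4, 5, 6]]], "label": [0]}
out = train_fn(batch)

# Assumed usage with real Hugging Face Datasets objects (not run here):
#   train_dataset.set_transform(make_batch_transform(train_transforms))
#   val_dataset.set_transform(make_batch_transform(val_transforms))
```

Building the training and validation functions from the same wrapper keeps the two pipelines separate while sharing the batch-handling logic.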

Reviewed by Fredoline Eruo. See our editorial policy.
