Data Labeling

Data labeling is the process of annotating raw data (text, images, audio) with tags or categories that teach a model what to predict. In local AI, operators rarely label data themselves. Instead, they download pre-labeled datasets from Hugging Face or use the outputs of a larger model (e.g., Llama 3.1 70B) as labels to fine-tune a smaller one. The quality and consistency of labels directly determine whether a fine-tuned model generalizes or memorizes noise.

Deeper dive

Data labeling is often the bottleneck for supervised fine-tuning. For text, labels might be sentiment tags, instruction-response pairs, or named entities. For images, they are typically bounding boxes or segmentation masks. Operators running local fine-tunes (e.g., with Unsloth or Axolotl) typically use existing labeled datasets like OpenAssistant or Dolly rather than labeling from scratch. When custom labeling is needed, common options are a tool like Label Studio or a script that uses a local LLM to generate labels. Label quality matters: inconsistent labels teach the model the wrong patterns, and in a small dataset (a few hundred examples) each wrong label makes up a larger share of the training signal. Operators should measure label agreement between annotators (inter-annotator agreement) or use a held-out set to check whether the fine-tune actually improves on the target task.
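One common way to quantify inter-annotator agreement is Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. The sketch below is a minimal pure-Python version for two annotators; the function name and the sentiment labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical sentiment labels from two annotators on six items.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A kappa near 1 means the annotators agree far more often than chance would predict, while a value near 0 suggests the labeling guidelines are too ambiguous to produce consistent labels.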

Practical example

An operator wants to fine-tune Llama 3.2 3B to answer questions about their internal docs. They export 500 question-answer pairs from a chat log, then manually check each pair for correctness. They upload the labeled JSONL file to Hugging Face, then start training on an RTX 4090 with a command along the lines of unsloth train --model llama3.2-3b --dataset my-qa-dataset (illustrative; the exact invocation depends on the Unsloth version). If 10% of the labels are wrong (about 50 of the 500 pairs), the fine-tuned model may confidently reproduce those wrong answers.
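Before the manual review, a short script can reject malformed records and set aside a held-out split. The following is a sketch under stated assumptions: the field names question and answer, the sample lines, and both helper functions are hypothetical, and a real export may use a different schema.

```python
import json
import random

def load_pairs(lines):
    """Parse JSONL lines; return (valid pairs, line numbers of rejected records)."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines silently
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(record, dict):
            bad.append(number)
            continue
        # Assumed schema: each record has "question" and "answer" strings.
        question = str(record.get("question", "")).strip()
        answer = str(record.get("answer", "")).strip()
        if question and answer:
            good.append({"question": question, "answer": answer})
        else:
            bad.append(number)
    return good, bad

def split_holdout(pairs, holdout_fraction=0.1, seed=0):
    """Shuffle deterministically and reserve a fraction for evaluation."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    cut = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[cut:], shuffled[:cut]

# Hypothetical sample lines standing in for the exported chat log.
sample_lines = [
    '{"question": "Where is the VPN guide?", "answer": "In the IT wiki."}',
    '{"question": "Who approves expenses?", "answer": ""}',
    'not json',
    '{"question": "What is the backup schedule?", "answer": "Nightly at 02:00."}',
]
pairs, rejected = load_pairs(sample_lines)
train, heldout = split_holdout(pairs, holdout_fraction=0.5)
print(len(pairs), "valid pairs; rejected lines:", rejected)
```

This only catches structural problems such as empty answers or broken JSON. Whether an answer is actually correct still has to be checked by a person or against the source docs.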

Workflow example

In a typical fine-tuning workflow with Axolotl, the operator prepares a dataset in Alpaca format (instruction, input, output). They run python -m axolotl.cli.train config.yml, where config.yml points to a Hugging Face dataset ID or a local JSONL file. The training loop reads each labeled example, computes the loss between the model's predicted tokens and the labeled output, and updates the weights. After training, they evaluate on a held-out labeled test set to measure accuracy or a BLEU score.
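As a rough illustration of the two ends of this workflow, the sketch below builds one Alpaca-format JSONL line and scores predictions against held-out labels. The helper names and sample strings are hypothetical, and exact-match accuracy is used here as a simple stand-in for the evaluation metric; it is stricter than BLEU and is not taken from Axolotl itself.

```python
import json

def to_alpaca(question, answer, context=""):
    """Map a question-answer pair onto the Alpaca fields."""
    return {"instruction": question, "input": context, "output": answer}

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal the labeled answer after normalizing."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# One training record, serialized as a single JSONL line.
record = to_alpaca("Where is the VPN guide?", "In the IT wiki.")
jsonl_line = json.dumps(record)

# Hypothetical model outputs compared with held-out labels.
references = ["In the IT wiki.", "Nightly at 02:00."]
predictions = ["in the  IT wiki.", "Every Sunday."]
accuracy = exact_match_accuracy(predictions, references)
print(jsonl_line)
print(f"exact-match accuracy = {accuracy:.2f}")
```

Exact match suits short factual answers. For longer free-form answers, an overlap metric such as BLEU or a manual review of a sample is usually more informative.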

Reviewed by Fredoline Eruo. See our editorial policy.
