Annotation
Annotation is the process of adding labels, tags, or metadata to raw data (text, images, audio) to create a training dataset for supervised learning. In local AI, operators encounter annotation when fine-tuning a model on custom data: each example must be paired with a correct output. For text, this means writing prompt-response pairs; for images, drawing bounding boxes or classifying objects. The quality and consistency of annotations directly determine model performance—noisy or sparse labels produce unreliable fine-tuned models.
Practical example
An operator fine-tuning Llama 3.1 8B to answer customer support queries needs a dataset of ~500 annotated examples. Each example is a JSON object with a "prompt" field (e.g., "How do I reset my password?") and a "completion" field (e.g., "Go to Settings > Account > Reset Password."). If the annotations are inconsistent—sometimes using "completion", other times "response"—the training script will fail or learn incorrectly.
Workflow example
In Hugging Face Transformers, annotation is done before training. Operators prepare a CSV or JSONL file with columns like "instruction" and "output". They then load it with datasets.load_dataset('json', data_files='annotations.jsonl') and pass it to the Trainer. In LM Studio, the fine-tuning UI expects a dataset in OpenAI chat format, where each message has a "role" and "content"—annotations define the assistant's expected reply.
Reviewed by Fredoline Eruo. See our editorial policy.