Ground Truth

Ground truth is the correct, real-world answer or label that a model is trained to predict or evaluated against. In supervised learning, ground truth data is the human-verified reference, for example an image labeled "cat" or a text transcript of an audio clip. Operators encounter ground truth when building or evaluating datasets: it is the reference that defines whether a model's output is right or wrong. Without ground truth, you can't measure accuracy, precision, or recall. In local AI workflows, ground truth often comes from curated datasets like MMLU (a correct answer choice per question) or HumanEval (unit tests that a generated solution must pass), which operators use to compare model performance on their own hardware.
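To make the link between ground truth and metrics concrete, here is a minimal sketch in plain Python. The function name `classification_metrics` and the toy labels are illustrative assumptions, not part of any particular library; in practice you would likely reach for an established metrics package.

```python
def classification_metrics(ground_truth, predictions, positive_label):
    """Compare predictions to ground-truth labels for one positive class."""
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must be the same length")
    if not ground_truth:
        raise ValueError("need at least one labeled example")

    pairs = list(zip(ground_truth, predictions))
    # Every count below is defined relative to the ground-truth label.
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    correct = sum(1 for t, p in pairs if t == p)

    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy data (hypothetical): human-verified labels vs. model outputs.
labels = ["cat", "dog", "cat", "cat", "dog"]
outputs = ["cat", "cat", "cat", "dog", "dog"]
metrics = classification_metrics(labels, outputs, positive_label="cat")
print(metrics)
```

Every quantity is computed against the ground-truth list, so a mislabeled reference shifts the metrics even when the model's behavior is unchanged.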

When evaluating a local model like Llama 3.1 8B on the MMLU benchmark, the ground truth is the correct multiple-choice answer for each question (e.g., "A"). The model's prediction is compared to this ground truth to compute accuracy. If you're fine-tuning a model on a custom dataset, ground truth labels are the expected outputs, such as the correct SQL query for a natural language question. If those labels are wrong, the fine-tuned model learns the wrong patterns.
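For the SQL fine-tuning case, one cheap sanity check is to confirm that each ground-truth query at least compiles against the target schema before training. The sketch below uses Python's standard `sqlite3` module; the schema, the record layout (`question` and `sql` keys), and the function name `find_invalid_sql_labels` are hypothetical, chosen only for illustration.

```python
import sqlite3


def find_invalid_sql_labels(records, schema_sql):
    """Return (index, error message) for labels SQLite cannot compile."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    problems = []
    for i, record in enumerate(records):
        try:
            # EXPLAIN compiles the statement without running it, so a label
            # with side effects cannot change the scratch database.
            conn.execute("EXPLAIN " + record["sql"])
        except (sqlite3.Error, sqlite3.Warning) as exc:
            problems.append((i, str(exc)))
    conn.close()
    return problems


# Hypothetical schema and labeled examples.
schema = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);"
records = [
    {"question": "How many orders are there?", "sql": "SELECT COUNT(*) FROM orders"},
    {"question": "What is total revenue?", "sql": "SELECT SUM(totl) FROM orders"},
    {"question": "List all customers.", "sql": "SELECT customer FROM order"},
]
invalid = find_invalid_sql_labels(records, schema)
for index, message in invalid:
    print(index, message)
```

A label that passes this check is only known to compile; whether it actually answers the question still needs human review.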

In LM Studio, you can load a model and expose it through the local server, then point an evaluation script or harness at it to run a benchmark like MMLU. The harness compares each model output against the ground truth answers stored in the benchmark file. In Hugging Face Transformers workflows, you'd load the dataset with the datasets library (e.g., load_dataset('cais/mmlu', 'all')) and compute accuracy by comparing predictions to the 'answer' column, which stores the index of the correct choice. That column is the ground truth. If you're building a dataset for fine-tuning, you manually verify each label to ensure ground truth quality before training.
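The comparison step can be sketched roughly as follows. This assumes the Hugging Face `datasets` package and the Hub dataset ID `cais/mmlu`, whose rows carry `question`, `choices`, and an integer `answer` index; `predict` is a hypothetical callable standing in for however you query your local model, and the letter extraction is deliberately naive.

```python
import re

CHOICE_LETTERS = "ABCD"


def extract_choice(model_output):
    """Return the first standalone A-D letter in the output, or None."""
    match = re.search(r"\b([ABCD])\b", model_output)
    return match.group(1) if match else None


def score_against_ground_truth(rows, predict):
    """Accuracy of predict(question, choices) against each row's 'answer'."""
    correct = 0
    total = 0
    for row in rows:
        truth = CHOICE_LETTERS[row["answer"]]  # ground-truth index -> letter
        guess = extract_choice(predict(row["question"], row["choices"]))
        correct += int(guess == truth)
        total += 1
    return correct / total if total else 0.0


def load_mmlu_test_split():
    """Assumes `pip install datasets` and network access to the Hub."""
    from datasets import load_dataset
    return load_dataset("cais/mmlu", "all", split="test")


# Offline demonstration with two hand-written rows in the same shape.
sample_rows = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "6"], "answer": 1},
    {"question": "Capital of France?",
     "choices": ["Rome", "Berlin", "Paris", "Madrid"], "answer": 2},
]


def always_b(question, choices):
    # Stand-in for a real model call; always answers "B".
    return "The answer is B."


sample_accuracy = score_against_ground_truth(sample_rows, always_b)
print(sample_accuracy)
```

To score a real model, pass `load_mmlu_test_split()` as `rows` and replace `always_b` with a function that prompts your model. Keeping the loader separate lets the scoring logic be tested without downloading anything.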

Reviewed by Fredoline Eruo. See our editorial policy.
