RUNLOCALAIv38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
Large language models

Pre-training

Pre-training is the initial phase in which a large language model learns from a vast, diverse text corpus (e.g., web pages, books) by predicting the next token in a sequence. This builds broad linguistic knowledge and much of the model's reasoning ability. The resulting base model is not yet instruction-tuned; operators typically download it as raw weight files (e.g., Llama 3.1 8B base) before applying fine-tuning or quantization.

Deeper dive

During pre-training, the model processes hundreds of billions to trillions of tokens using a self-supervised objective such as next-token prediction: the training text supplies its own labels, so no human annotation is needed. This requires massive compute clusters (thousands of GPUs) running for weeks or months. The output is a set of weights that encode statistical patterns of language. Operators rarely run pre-training themselves because of the cost; instead, they use pre-trained models from organizations like Meta or Mistral. Pre-training differs from fine-tuning: the former builds general knowledge, while the latter adapts the model to specific tasks or formats. The scale of pre-training directly affects model quality: larger models (e.g., 70B parameters) need more data and compute to train well.
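To make the next-token objective concrete, here is a minimal sketch in plain Python. It assumes the model has already produced one vector of logits per input position; the function names (softmax, next_token_loss) are illustrative and not taken from any particular training framework.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(token_ids, logits_per_position):
    """Average cross-entropy of predicting token t+1 from position t.

    token_ids: the training sequence as integer token ids.
    logits_per_position: one list of vocabulary scores for each position
        except the last, since the last token has nothing to predict.
    """
    targets = token_ids[1:]  # labels are the input shifted left by one
    if len(logits_per_position) != len(targets):
        raise ValueError("need one logit vector per predicted token")
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# An untrained model that scores every token equally over a 4-token
# vocabulary has a loss of ln(4); training pushes this number down.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = next_token_loss([0, 1, 2], uniform)
```

The labels come from the text itself, which is why this objective scales to web-sized corpora without annotation.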

Practical example

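Meta's Llama 3.1 models were pre-trained on roughly 15 trillion tokens; the 16,000 H100 GPU figure Meta cites applies to the largest (405B) model, and the 8B model needed a much smaller share of that compute. An operator downloading the base model (e.g., from Hugging Face) gets the raw weights, with no chat template and no instruction following. Running ollama pull llama3.1:8b actually pulls an instruction-tuned version; the base model, where published, sits under a separate tag (Ollama's library generally labels pre-trained variants text rather than base, so check the model's tag list for the exact name). Pre-training cost is estimated at millions of dollars, so operators rely on shared pre-trained weights.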
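For a rough sense of where such cost figures come from, the sketch below applies the common rule of thumb that training compute is about 6 × parameters × tokens. Every input other than the parameter and token counts is an assumption chosen for illustration: the round 1e15 FLOP/s peak per GPU, the 40% utilization, and the $2 per GPU-hour price are not figures from Meta or from this site.

```python
def pretraining_estimate(params, tokens, peak_flops_per_gpu,
                         utilization, usd_per_gpu_hour):
    """Back-of-envelope compute and cost for one pre-training run.

    Uses the rule of thumb FLOPs ~= 6 * params * tokens.
    Returns (total_flops, gpu_hours, cost_usd).
    """
    total_flops = 6 * params * tokens
    # Sustained throughput is peak throughput scaled by utilization.
    gpu_seconds = total_flops / (peak_flops_per_gpu * utilization)
    gpu_hours = gpu_seconds / 3600
    return total_flops, gpu_hours, gpu_hours * usd_per_gpu_hour

# 8B parameters, 15T tokens; the remaining inputs are assumed values.
flops, hours, cost = pretraining_estimate(
    params=8e9,
    tokens=15e12,
    peak_flops_per_gpu=1e15,  # assumed round figure for a modern GPU
    utilization=0.4,          # assumed fraction of peak actually sustained
    usd_per_gpu_hour=2.0,     # assumed rental price
)
```

Under these assumptions the estimate is about 500,000 GPU-hours and roughly $1 million for a single clean run. Real budgets also cover failed runs, experiments, and data preparation, so treat this as an order-of-magnitude check only.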

Workflow example

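When using Hugging Face Transformers, operators load a pre-trained model via AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.1-8B') (plain AutoModel omits the language-modeling head needed for generation). This downloads the pre-trained weights; the repository is gated, so the license must be accepted on Hugging Face first. The base model only continues text. To use it for chat, operators load the Instruct variant, which ships with a chat template, or fine-tune the base weights themselves. In LM Studio, the model name in the hub typically indicates whether it's 'base' or 'instruct'. Pre-training is the step that created those base weights; operators skip it and start from the published checkpoint.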
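A minimal sketch of that workflow with Hugging Face Transformers follows. It uses AutoModelForCausalLM, the class that includes the language-modeling head required by generate. The helper names pick_checkpoint and complete are hypothetical, and the code assumes transformers and torch are installed, that access to the gated meta-llama repositories has been granted, and that the machine has enough memory for an 8B model.

```python
BASE_ID = "meta-llama/Llama-3.1-8B"               # base checkpoint named above
INSTRUCT_ID = "meta-llama/Llama-3.1-8B-Instruct"  # instruction-tuned sibling

def pick_checkpoint(want_chat: bool) -> str:
    """Choose the repo id: Instruct for chat, base for raw text completion."""
    return INSTRUCT_ID if want_chat else BASE_ID

def complete(prompt: str, model_id: str = BASE_ID,
             max_new_tokens: int = 40) -> str:
    """Continue `prompt` with a pre-trained checkpoint (downloads weights)."""
    # Imported here so the module loads even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example call (downloads roughly 16 GB of weights on first use):
# print(complete("The capital of France is"))
```

A base checkpoint simply continues the prompt, so phrasing the input as the start of a document tends to work better than phrasing it as a question or instruction.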

Reviewed by Fredoline Eruo. See our editorial policy.
