Large language models

Pre-training

Pre-training is the initial phase in which a large language model learns from a vast, diverse corpus of text data (e.g., web pages, books) by predicting the next token in each sequence. This builds broad linguistic knowledge and the foundations of reasoning ability. The resulting base model is not yet instruction-tuned; operators typically download it as raw weight files (e.g., Llama 3.1 8B base) before applying fine-tuning or quantization.

Deeper dive

During pre-training, the model processes hundreds of billions to trillions of tokens using a self-supervised objective, typically next-token prediction. This requires massive compute clusters (thousands of GPUs) running for weeks or months. The output is a set of weights that encode statistical patterns of language. Operators rarely run pre-training themselves because of the cost; instead, they use pre-trained models from organizations such as Meta or Mistral. Pre-training differs from fine-tuning: the former builds general knowledge, the latter adapts the model to specific tasks or formats. The scale of pre-training directly affects model quality: larger models (e.g., 70B parameters) need more data and compute to train well.
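The next-token objective described above can be illustrated with a minimal sketch in plain Python. It is not how any production trainer is written; the function name next_token_loss and the toy inputs are hypothetical, and real systems compute the same quantity on GPU tensors over batches of long sequences.

```python
import math

def next_token_loss(logits, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    logits[t] is a list of unnormalised scores over the vocabulary,
    produced after the model has read token_ids[: t + 1].
    """
    total = 0.0
    count = 0
    # Position t predicts token t+1, so the final position has no target.
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        target = token_ids[t + 1]
        # Log of the softmax normaliser, with max subtraction for stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability assigned to the token that actually came next.
        total += log_z - scores[target]
        count += 1
    return total / count

# Toy example: a 4-token vocabulary and a model that guesses uniformly.
tokens = [0, 1, 2]
uniform_logits = [[0.0] * 4 for _ in tokens]
print(next_token_loss(uniform_logits, tokens))  # about 1.386, i.e. log(4)
```

Pre-training amounts to adjusting the weights so that this loss falls across the whole corpus. No labels are needed beyond the text itself, which is why the objective is called self-supervised.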

Practical example

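Meta pre-trained the Llama 3.1 family on ~15 trillion tokens. The cluster of more than 16,000 H100 GPUs that Meta described was used for the largest (405B) model; the 8B model needed only a fraction of that compute. An operator downloading the base model (e.g., from Hugging Face) gets the raw weights: no chat template, no instruction following. Running ollama pull llama3.1:8b pulls an instruction-tuned version, not the base model. Base variants, where Ollama publishes them, sit under separate tags (typically labeled 'text' rather than 'base'), so check the model's tag list for the exact name. Even for an 8B model, pre-training is estimated to cost on the order of millions of dollars, so operators rely on shared pre-trained weights.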
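To see where a cost figure of that order comes from, here is a rough back-of-the-envelope sketch. It uses the common rule of thumb that training a dense transformer takes about 6 floating-point operations per parameter per token. The per-GPU throughput, utilization, and hourly price below are illustrative assumptions, not figures reported by Meta, and the function name pretraining_estimate is hypothetical.

```python
def pretraining_estimate(params, tokens,
                         peak_flops_per_gpu=1e15,  # assumed round number per GPU
                         utilization=0.4,          # assumed fraction of peak achieved
                         usd_per_gpu_hour=2.0):    # assumed rental price
    """Return (total FLOPs, GPU-hours, USD) for one pre-training run."""
    # Rule of thumb: ~6 FLOPs per parameter per training token.
    total_flops = 6 * params * tokens
    gpu_seconds = total_flops / (peak_flops_per_gpu * utilization)
    gpu_hours = gpu_seconds / 3600
    return total_flops, gpu_hours, gpu_hours * usd_per_gpu_hour

# 8 billion parameters, 15 trillion tokens (the figures from the text).
flops, hours, cost = pretraining_estimate(8e9, 15e12)
print(f"{flops:.2e} FLOPs, {hours:,.0f} GPU-hours, ${cost:,.0f}")
```

Under these assumptions a single clean run comes to roughly half a million GPU-hours. Real budgets are higher, because they also cover failed runs, ablations, data processing, and lower sustained utilization.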

Workflow example

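When using Hugging Face Transformers, operators load a pre-trained model via AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.1-8B'); plain AutoModel loads the network without the language-modeling head needed to generate text. This downloads the pre-trained weights (the repository is gated, so the license must be accepted on Hugging Face first). A base model only continues text. To use it for chat, operators must fine-tune it or switch to the instruction-tuned variant (meta-llama/Llama-3.1-8B-Instruct), whose tokenizer ships with a chat template. In LM Studio, selecting a model from the hub shows whether it's 'base' or 'instruct'. Pre-training is the step that created those base weights; operators skip it and start from the published checkpoint.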
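The Transformers workflow might look like the following sketch. It assumes the transformers and torch packages are installed, that access to the gated meta-llama repositories has been granted, and that the machine has enough memory for an 8B model. The helper names choose_repo and complete are hypothetical.

```python
BASE_REPO = "meta-llama/Llama-3.1-8B"               # base checkpoint: continues text
INSTRUCT_REPO = "meta-llama/Llama-3.1-8B-Instruct"  # instruction-tuned sibling

def choose_repo(want_chat: bool) -> str:
    """Pick the checkpoint that matches the intended use."""
    return INSTRUCT_REPO if want_chat else BASE_REPO

def complete(prompt: str, repo_id: str = BASE_REPO, max_new_tokens: int = 40) -> str:
    """Download (or reuse cached) weights and continue the prompt."""
    # Imported lazily so choose_repo works without the library installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # AutoModelForCausalLM includes the language-modeling head used by generate().
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling complete("The capital of France is") on the base checkpoint should yield a plain continuation of the sentence rather than a conversational reply, which is the practical difference between a base and an instruct model.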

Reviewed by Fredoline Eruo. See our editorial policy.
