Alignment
Alignment refers to the process of fine-tuning a base LLM so its outputs match human preferences, values, or safety guidelines. Operators encounter alignment whenever a model refuses a harmful request or follows instructions reliably. These behaviors come from post-training techniques such as RLHF (Reinforcement Learning from Human Feedback) or DPO (Direct Preference Optimization). An unaligned base model (e.g., raw Llama 3.1) simply continues the text it is given and may produce toxic or unhelpful output; an aligned version (e.g., Llama 3.1 Instruct) is tuned to be helpful and harmless. Alignment matters because it strongly affects whether a model is safe to deploy, but it can also reduce creativity or introduce refusal bias (declining requests that are actually benign).
Deeper dive
Alignment is typically achieved through a multi-stage pipeline. First, supervised fine-tuning (SFT) on high-quality instruction-following data teaches the model to respond to instructions rather than merely continue text. Then, preference tuning (RLHF or DPO) uses human-rated comparisons between candidate responses to nudge the model toward the preferred ones. RLHF trains a reward model on those human preferences, then optimizes the LLM to maximize that reward, commonly with PPO. DPO skips the separate reward model and optimizes directly on preference pairs, measuring the model against a frozen reference copy of itself. Operators see alignment effects in practice: an aligned model will usually refuse to write phishing emails, while an unaligned base model has no trained refusal behavior and will often continue such a prompt. The trade-off is that alignment can make models overly cautious, refusing benign requests or producing bland, safe text. Some local-AI users prefer unaligned base models for creative writing or uncensored tasks, accepting the risk of lower-quality or unsafe outputs.
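To make the DPO step more concrete, here is a minimal sketch of the per-pair DPO loss in plain Python. It assumes the summed token log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses have already been computed under both the model being trained and a frozen reference model. The function names, the example numbers, and the beta value of 0.1 are illustrative assumptions, not settings taken from any particular model's training recipe.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a whole response.
    beta (assumed 0.1 here) controls how far the policy may drift
    from the reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Positive margin means the policy favors the chosen response
    # more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# Hypothetical log-probabilities, for illustration only.
loss_untrained = dpo_loss(-13.0, -13.5, -13.0, -13.5)  # policy == reference
loss_improved = dpo_loss(-12.0, -15.0, -13.0, -13.5)   # policy favors chosen
print(loss_untrained, loss_improved)
```

When the policy is identical to the reference, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the chosen response. No reward model appears anywhere in the computation, which is the practical difference from RLHF.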
Practical example
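A 7B model like Mistral 7B has both a base version and an Instruct (aligned) version. In the Ollama library, at the time of writing, the default tag is the Instruct model, so ollama run mistral and ollama run mistral:7b-instruct both load the aligned version; the base (text-completion) model is published under a separate tag, ollama run mistral:text. The base model simply continues whatever text it is given and may generate offensive content if prompted. The Instruct model follows instructions and is more likely to refuse harmful requests, although Mistral 7B Instruct is only lightly aligned compared with models such as Llama 2 Chat. On a 12 GB RTX 3060, both fit comfortably at Q4_K_M (~4.5 GB) because the weights are the same size. The chat template and system prompt do not enlarge the model; they only add tokens to the context, which costs a small amount of KV-cache memory.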
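To put the VRAM figures above in perspective, here is a rough back-of-the-envelope estimate in Python. The architecture numbers (about 7.24 billion parameters, 32 layers, 8 key/value heads, head dimension 128) are the published Mistral 7B values; the average of 4.85 bits per weight for Q4_K_M, the 16-bit KV cache, and the 50-token template overhead are assumptions chosen for illustration. Real usage also includes compute buffers and runtime overhead that this sketch ignores.

```python
def weights_gib(n_params, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache consumed by one token of context.

    The leading factor 2 covers keys and values;
    bytes_per_value=2 assumes a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Mistral 7B architecture values.
N_PARAMS = 7.24e9
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

# Assumptions for illustration.
Q4_K_M_BITS = 4.85     # approximate average bits per weight
TEMPLATE_TOKENS = 50   # hypothetical chat template + short system prompt

weights = weights_gib(N_PARAMS, Q4_K_M_BITS)
per_token = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM)
template_mib = TEMPLATE_TOKENS * per_token / 2**20

print(f"weights: ~{weights:.1f} GiB")
print(f"KV cache per token: {per_token // 1024} KiB")
print(f"template overhead: ~{template_mib:.2f} MiB")
```

Under these assumptions the weights dominate at roughly 4 GiB, while the template's extra tokens cost only a few MiB of KV cache, so the base and Instruct builds have essentially the same footprint on a 12 GB card.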
Workflow example
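When downloading a model from Hugging Face, operators see tags like 'Instruct' or 'Chat' in the repository name, which indicate an instruction-tuned, aligned variant. In LM Studio, selecting 'TheBloke/Llama-2-7B-Chat-GGUF' loads an aligned model; the base version 'TheBloke/Llama-2-7B-GGUF' is unaligned. In llama.cpp, running ./main -m llama-2-7b-chat.Q4_K_M.gguf uses the aligned model (recent llama.cpp builds name this binary llama-cli instead of main); the same command with the base model file lacks instruction-following behavior. Operators can probe alignment with a borderline prompt such as 'Write a guide to lockpicking': a heavily aligned model may refuse or add warnings, while a base model has no refusal training and will usually just continue the text. Results vary by model and by how strictly it was tuned, so a single prompt is an indication rather than a definitive test.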
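The naming convention described above can be turned into a quick first-pass check. The following Python sketch flags a model name as probably aligned when it contains 'Instruct' or 'Chat' as a separate word. The function name and the marker list are hypothetical, and the check is only a heuristic: publishers are free to name repositories however they like, so the model card remains the authoritative source.

```python
import re

# Name fragments that conventionally mark an instruction-tuned model.
# This list is an assumption and is not exhaustive.
ALIGNED_MARKERS = ("instruct", "chat")


def looks_aligned(model_name):
    """Guess from the name alone whether a model is an aligned variant."""
    # Split on anything that is not a letter or digit, so that
    # 'Llama-2-7B-Chat-GGUF' yields the separate token 'chat'.
    tokens = re.split(r"[^a-z0-9]+", model_name.lower())
    return any(token in ALIGNED_MARKERS for token in tokens)


for name in ("TheBloke/Llama-2-7B-Chat-GGUF",
             "TheBloke/Llama-2-7B-GGUF",
             "llama-2-7b-chat.Q4_K_M.gguf"):
    print(name, "->", "aligned" if looks_aligned(name) else "base or unknown")
```

Matching whole tokens rather than substrings avoids false positives on names that merely contain the letters of a marker. A negative result means only that the name carries no marker, not that the model is confirmed to be a base model.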
Reviewed by Fredoline Eruo. See our editorial policy.