18. Emerging Model Families
Chapter 18 of 20 · 15 min
The model landscape evolves rapidly. Understanding major families and their characteristics helps you stay current and evaluate new releases.
Major model families (as of early 2026):
Llama (Meta):
- Llama 3.1: 8B, 70B, 405B
- Strengths: Well-documented, many derivatives, strong community
- Tokenizer: tiktoken-based BPE with a roughly 128K-token vocabulary; context window of 128K tokens
- Quantization ecosystem: Mature, many formats available
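To make the size and quantization points concrete, here is a rough back-of-the-envelope sketch of weight storage for the three Llama 3.1 sizes. It assumes nominal parameter counts and idealized bit widths; real quantization formats add overhead for scales and metadata, and the KV cache and activations are ignored. The function name `weight_memory_gb` is illustrative only.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bytes.
# Ignores KV cache, activations, and per-format overhead (scales, zero points).

LLAMA_31_SIZES_B = {"8B": 8, "70B": 70, "405B": 405}  # nominal billions of parameters
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}  # idealized, no overhead


def weight_memory_gb(params_billions: float, bits: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits / 8 / 1e9


for name, params in LLAMA_31_SIZES_B.items():
    row = ", ".join(
        f"{fmt}: {weight_memory_gb(params, bits):.1f} GB"
        for fmt, bits in BITS_PER_WEIGHT.items()
    )
    print(f"Llama 3.1 {name} -> {row}")
```

Treat the numbers as a lower bound when deciding whether a model fits on a given GPU.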
Mistral (Mistral AI):
- Mixtral 8x7B, Mistral Large, Mistral Small
- Strengths: Efficient MoE variants, strong reasoning
- Architecture: Sliding window attention and grouped-query attention (both used in Mistral 7B)
- Tradeoffs: Smaller context than Llama in some variants
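Sliding window attention is easier to picture as a mask. The sketch below is a minimal illustration in plain Python, not Mistral's implementation: each query position may attend only to itself and the previous `window - 1` positions. The tiny `seq_len` and `window` values are chosen for readability; production models use windows in the thousands of tokens.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        # Causal (j <= i) and within the last `window` positions (i - j < window).
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    # "x" marks an allowed position, "." a masked one.
    print("".join("x" if allowed else "." for allowed in row))
```

Because each row has at most `window` allowed positions, attention cost per token stops growing once the sequence is longer than the window.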
Phi (Microsoft):
- Phi-3-mini (3.8B), medium (14B)
- Strengths: Exceptional quality per parameter
- Training: Heavy emphasis on "textbook quality" data
- Tradeoffs: Smaller parameter count limits some capabilities
DeepSeek:
- DeepSeek-V2, DeepSeek-V3
- Strengths: MoE architecture with strong efficiency
- DeepSeek-Coder: Specialized code models
- Tradeoffs: Less ecosystem support than Llama
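The efficiency of MoE models comes from running only a few experts per token. The following is a generic top-k routing sketch, not DeepSeek's or Mixtral's actual design (real routers are learned layers and may add shared experts or load balancing). The expert functions and router scores are made-up stand-ins.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs for one token."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Toy experts: scalar functions standing in for feed-forward blocks.
experts = [lambda x, s=s: s * x for s in range(1, 9)]  # 8 hypothetical experts
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]    # made-up router scores

selected = route_top_k(logits, k=2)
print("selected experts:", [i for i, _ in selected])
print("layer output:", moe_layer(1.0, experts, logits))
```

With `k=2` of 8 experts, only a quarter of the expert parameters do work for any given token, even though all of them must be held in memory.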
Gemma (Google):
- Gemma 2 2B, 9B, 27B
- Strengths: High quality, open weights (with terms)
- Architecture: Interleaved local (sliding window) and global attention layers, grouped-query attention, logit soft-capping
- Tradeoffs: License restrictions limit some use cases
Qwen (Alibaba):
- Qwen 2.5 series, Code models
- Strengths: Strong multilingual, many sizes
- Tradeoffs: Some documentation and community resources are Chinese-first
Emerging patterns:
- Mixture of Experts adoption: More models using MoE for efficiency
- Longer context: 128K becoming standard, 256K+ emerging
- Smaller but stronger: Phi-3-mini (3.8B) reports benchmark scores competitive with much larger models such as Mixtral 8x7B
- Specialization: Code models, math models, multilingual variants
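Longer context has a memory cost that is easy to estimate. The sketch below computes KV-cache size for a single sequence under an assumed configuration (32 layers, head dimension 128, 16-bit values) that resembles an 8B-class model. It compares 8 key/value heads (grouped-query attention) with 32 (full multi-head attention). All constants are illustrative assumptions; check a model's published config before relying on them.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Keys and values (the factor 2) for every layer, KV head, and position."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


# Assumed configuration resembling an 8B-class model.
LAYERS, HEAD_DIM = 32, 128
GQA_KV_HEADS, MHA_KV_HEADS = 8, 32

for context in (8_192, 131_072, 262_144):
    gqa = kv_cache_bytes(LAYERS, GQA_KV_HEADS, HEAD_DIM, context) / 2**30
    mha = kv_cache_bytes(LAYERS, MHA_KV_HEADS, HEAD_DIM, context) / 2**30
    print(f"{context:>7} tokens: GQA {gqa:5.1f} GiB, full multi-head {mha:5.1f} GiB")
```

Under these assumptions a single 128K-token sequence needs about 16 GiB of cache with grouped-query attention and four times that without it, which is one reason long-context models favor fewer key/value heads.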
How to evaluate new models:
# Evaluation checklist for new releases (fill in one copy per model)
new_model_checklist = {
    "weights_available": True,   # False if the model is API-only
    "license": "...",            # Check commercial restrictions
    "context_length": 0,         # Maximum context window, in tokens
    "architecture": "dense/moe/hybrid",
    "training_data_cutoff": "...",
    "benchmark_scores": {        # None until you have a verified number
        "mmlu": None,
        "humaneval": None,
        "gsm8k": None,
    },
    "community_adoption": {
        "huggingface_downloads": 0,
        "github_stars": 0,
    },
}
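Once a checklist is filled in, a small helper can flag the gaps. This is one possible sketch, not a standard tool: the function name `screen_model`, the default threshold, and the example values are all hypothetical placeholders rather than real measurements.

```python
def screen_model(checklist, min_context=8192,
                 required_benchmarks=("mmlu", "humaneval", "gsm8k")):
    """Return a list of human-readable concerns; an empty list means none were flagged."""
    concerns = []
    if not checklist.get("weights_available"):
        concerns.append("API-only: no local deployment or fine-tuning")
    if checklist.get("license") in (None, "", "..."):
        concerns.append("license not yet reviewed")
    if checklist.get("context_length", 0) < min_context:
        concerns.append(f"context below {min_context} tokens")
    scores = checklist.get("benchmark_scores", {})
    for name in required_benchmarks:
        if scores.get(name) is None:
            concerns.append(f"missing benchmark: {name}")
    return concerns


# Hypothetical entry with placeholder values, not real measurements.
candidate = {
    "weights_available": True,
    "license": "apache-2.0",
    "context_length": 32768,
    "architecture": "moe",
    "benchmark_scores": {"mmlu": 0.70, "humaneval": None, "gsm8k": 0.80},
}
print(screen_model(candidate))
```

Adjust `min_context` and `required_benchmarks` to match your own workload; a code assistant and a multilingual chatbot should not share the same thresholds.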
EXERCISE
Identify a model family not covered here. Research its architectural choices and recent benchmark results, then present a one-page summary of its strengths and tradeoffs.