COURSE · FND · B004
Understanding AI Models
Read a model card and know exactly what it means. Choose the right model for any task.
PREREQUISITES
- B001
- B003
Why this course exists
You have seen model cards with "70B parameters" and "Q4_K_M quantization" and wondered what those numbers actually mean for your use case. This course builds the technical vocabulary to read any model card confidently, understand what quantization choices do to model quality, and select models based on benchmark data rather than marketing claims. By the end, you will be able to look at a model card, understand its architecture tradeoffs, estimate VRAM requirements, and make informed decisions about which model to run locally.
What you will know after
- Parse a model card and understand what every specification means
- Calculate VRAM requirements for any model at any quantization level
- Evaluate benchmark scores with knowledge of what they actually measure
- Match model characteristics to task requirements
- Compare models on apples-to-apples metrics you run yourself
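As a rough illustration of the VRAM calculation named above, the memory needed for model weights alone is approximately the parameter count times the bits stored per weight, divided by 8. The sketch below is a minimal estimate under stated assumptions: the function name and the bits-per-weight table are illustrative, the Q4_K_M figure is an approximate average rather than an exact value, and the result excludes KV cache, activations, and runtime overhead.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed, approximate bits per weight. FP16 stores 16 bits per weight;
# Q8_0 stores 8-bit values plus a per-block scale (about 8.5 bits);
# the Q4_K_M value is a rough average and varies by model.
APPROX_BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}

for name, bits in APPROX_BITS_PER_WEIGHT.items():
    gib = weight_memory_gib(70e9, bits)
    print(f"70B parameters at {name}: roughly {gib:.0f} GiB for weights")
```

Treat the output as a lower bound: a model whose weights barely fit will usually not run well once context and overhead are added.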
CHAPTERS
- 01 What the Numbers Mean: Model card specs tell you hardware requirements, not quality or suitability for your task. (15 min)
- 02 Parameter Count Guide: Parameter count predicts hardware requirements better than it predicts capability. (10 min)
- 03 Dense vs Mixture of Experts: MoE models trade memory for compute. Total parameters are large and must all fit in memory, but the active parameters per token are manageable. (15 min)
- 04 Quantization Explained: Quantization trades VRAM for quality: the more aggressive the compression, the more capability you lose, but optimized methods minimize this loss. (15 min)
- 05 Q4_K_M vs Q8_0 vs Q2_K: Q4_K_M hits the sweet spot for most users: significant VRAM savings with only 5% quality loss compared to FP16. (15 min)
- 06 KV Cache and VRAM: KV cache can consume 30-50% of inference VRAM at long contexts; model weights are not the only consideration. (20 min)
- 07 Context Length Tradeoffs: Listed context length is not the same as effective retrieval length; test with needle-in-haystack benchmarks to verify. (15 min)
- 08 MMLU Benchmark Explained: MMLU measures breadth of knowledge and simple reasoning, not capability on the tasks most users care about. (15 min)
- 09 HumanEval for Code: HumanEval measures algorithm problem-solving in Python, not production code quality; use it as a proxy, not the whole picture. (20 min)
- 10 GSM8K for Math: GSM8K isolates multi-step reasoning without external knowledge; weak performance indicates reasoning chain failures. (20 min)
- 11 Chatbot Arena Elo: Arena measures human preference on diverse real queries; its main strength is capturing what actual users value, not what proxies measure. (15 min)
- 12 Running Your Own Benchmarks: Self-run benchmarks eliminate contamination risk but require representative test sets and consistent scoring methodology. (20 min)
- 13 Model Selection for Chat: For chat, latency and instruction following matter as much as raw capability; test response timing and conversation coherence, not just single-turn quality. (15 min)
- 14 Model Selection for Code: For code, benchmark on your actual codebase and dependencies, not just HumanEval; syntax accuracy does not guarantee API correctness. (15 min)
- 15 Model Selection for Reasoning: Reasoning performance is not captured by knowledge benchmarks; test with multi-step problems that require maintaining intermediate state. (15 min)
- 16 Instruct vs Base Models: Base models give you maximum flexibility for fine-tuning; instruct models give you ready-to-use instruction following out of the box. (20 min)
- 17 Tokenizer Impact on Quality: Tokenizer choice affects effective context length for non-English text and inference speed across all text; test with your actual content. (20 min)
- 18 Emerging Model Families: New models appear monthly; focus on architectural characteristics and benchmark patterns rather than specific model rankings that change quickly. (15 min)
- 19 Open vs Closed Weights: Open weights give you control and privacy; closed models give you convenience and often top quality. The choice depends on your specific constraints. (15 min)
- 20 Model Comparison Project: Model selection is a structured decision: not just picking the highest benchmark, but matching capabilities to specific requirements within constraints. (25 min)
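To make the KV cache point from chapter 06 concrete, here is a minimal sketch of the standard sizing arithmetic: one key and one value vector are stored per layer, per KV head, per token. The function name and the example configuration (80 layers, 8 KV heads, head dimension 128, 32,768-token context, FP16 cache) are illustrative assumptions, not the specs of any particular model card.

```python
def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_element: int = 2,  # 2 bytes assumes an FP16 cache
    batch_size: int = 1,
) -> float:
    """Approximate KV cache size in GiB for a transformer decoder."""
    # Factor of 2 covers the separate key and value tensors.
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim
        * context_tokens * bytes_per_element * batch_size
    )
    return total_bytes / 2**30


# Hypothetical configuration, chosen only for illustration.
print(kv_cache_gib(n_layers=80, n_kv_heads=8, head_dim=128, context_tokens=32768))
# prints 10.0
```

Because the cache grows linearly with context length, halving the context in this example halves the cache to 5 GiB, while the weight memory stays fixed.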