01. What the Numbers Mean

Chapter 1 of 20 · 15 min

When you browse a model on Hugging Face or a model's homepage, you encounter a set of specifications that look technical but often lack explanation. Understanding these numbers lets you make informed choices instead of guessing.

The standard model card spec:

Parameters: 7B
Architecture: transformer
Quantization: Q4_K_M
Context length: 4096
VRAM: ~4.9GB

Parameters is the count of individual weights in the model. A 7B model has 7 billion floating-point numbers. Architecture tells you the basic design-nearly all modern language models use variations of the transformer architecture from 2017.

The VRAM specification assumes a specific quantization level. The same model at Q8_0 requires roughly twice the VRAM as at Q4_K_M. Context length tells you the maximum input plus output tokens the model can handle in a single forward pass.

What these numbers cannot tell you:

  • The quality of training data
  • How the model was aligned (fine-tuned method, data quality)
  • Whether the tokenizer matches your use case
  • Actual inference speed on your hardware

These gaps are why benchmarks exist. But even benchmarks have limitations you need to understand.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Find three models on Hugging Face with 7B parameters but different quantizations. Record the listed VRAM for each and verify the relative ordering matches what you expect from quantization math.