Normalization
Normalization is a data preprocessing step that rescales input values to a fixed range (e.g., [0, 1] or [-1, 1]) or standardizes them to zero mean and unit variance. In local AI, it is applied to numeric model inputs such as image pixels and audio samples before inference so that they match the distribution the model was trained on. Without it, activations can saturate or outputs can become unstable. Text is handled differently: token IDs are integers and are not rescaled; instead, LLMs normalize hidden states inside the transformer with layer normalization or a variant of it. For image models, pixel values are typically divided by 255 and often standardized per channel. Operators must ensure their preprocessing pipeline matches the model’s expected normalization, because a mismatch degrades accuracy silently.
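The two rescaling schemes described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation; the function names min_max_scale and standardize and the sample values are assumptions made for the example.

```python
import numpy as np

def min_max_scale(x, lo=0.0, hi=1.0):
    """Rescale x linearly so its minimum maps to lo and its maximum to hi."""
    x = np.asarray(x, dtype=np.float64)
    span = x.max() - x.min()
    if span == 0:
        # Constant input has no spread to rescale, so return the lower bound.
        return np.full_like(x, lo)
    return lo + (x - x.min()) * (hi - lo) / span

def standardize(x, eps=1e-8):
    """Shift and scale x to approximately zero mean and unit variance."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / (x.std() + eps)

# Hypothetical sample values, e.g. four pixel intensities.
samples = np.array([0.0, 64.0, 128.0, 255.0])
unit_range = min_max_scale(samples)            # values in [0, 1]
signed_range = min_max_scale(samples, -1, 1)   # values in [-1, 1]
z_scores = standardize(samples)                # mean near 0, std near 1
```

Note that a deployed model normally expects the fixed constants used at training time (for example, dividing by 255), not statistics recomputed from each individual input.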
Deeper dive
Normalization techniques vary by model type. Batch normalization (used in many CNNs) normalizes across the batch dimension during training; at inference it uses fixed running statistics, so it can be folded into the weights of the preceding layer. Layer normalization (used in transformers) normalizes across the features of each token, which is why LLMs don't need batch statistics at runtime. RMSNorm is a simpler variant, used in Llama and Mistral models, that skips the mean subtraction and rescales by the root mean square only. For text, token IDs are first mapped to embeddings; in the pre-norm layout used by most current LLMs, normalization is then applied before each attention and feed-forward block. Operators rarely write this code manually, since Hugging Face Transformers and llama.cpp handle it internally, but understanding normalization helps when converting models between frameworks (e.g., ONNX export may fuse normalization into adjacent layers) or when debugging output drift after quantization.
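To make the difference between layer normalization and RMSNorm concrete, here is a small NumPy sketch of both, applied per token across the feature axis. The shapes, the eps value, and the random hidden states are assumptions for illustration; real frameworks implement these as fused, optimized kernels.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each token (last axis) to zero mean and unit variance, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-5):
    """Rescale each token by its root mean square; no mean subtraction, no shift."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return gamma * x / rms

# Hypothetical hidden states: 4 tokens with 8 features each.
rng = np.random.default_rng(0)
hidden = rng.normal(loc=3.0, scale=2.0, size=(4, 8))
gamma = np.ones(8)   # learned scale (identity here)
beta = np.zeros(8)   # learned shift (zero here)

ln_out = layer_norm(hidden, gamma, beta)
rms_out = rms_norm(hidden, gamma)
```

Both functions use only the statistics of the token they are normalizing, so the result for one token does not depend on the others in the batch.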
Practical example
When running an image classification model like ResNet-50 locally via ONNX Runtime, the input tensor must be normalized. For ImageNet-trained weights this typically means scaling pixels to [0, 1], then subtracting mean=[0.485, 0.456, 0.406] and dividing by std=[0.229, 0.224, 0.225] per RGB channel. If an operator feeds raw pixel values (0–255) without normalization, the model still runs without error, but its predictions become unreliable. For LLMs, llama.cpp applies the model's normalization layers (RMSNorm for Llama-family models) automatically, so no operator action is needed; when you convert a PyTorch model to GGUF, the learned normalization weights are stored in the file alongside the other tensors.
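The ImageNet preprocessing described above might look like the following NumPy sketch. It assumes the image has already been decoded to RGB and resized to 224x224, and that the model expects a float32 tensor in (batch, channel, height, width) layout; both are common for ResNet-50 exports but should be checked against the specific model. The function name preprocess_imagenet and the gray test image are hypothetical.

```python
import numpy as np

# Per-channel ImageNet statistics (RGB order), as quoted in the text.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_imagenet(image_hwc_uint8):
    """Convert an (H, W, 3) uint8 RGB image into a (1, 3, H, W) float32 tensor."""
    x = image_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD           # standardize per channel
    x = np.transpose(x, (2, 0, 1))                   # HWC -> CHW (assumed layout)
    return x[np.newaxis, ...]                        # add the batch dimension

# Hypothetical input: a uniform mid-gray image standing in for a real photo.
image = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = preprocess_imagenet(image)
```

The resulting array can then be passed to the inference session under whatever input name the exported model declares.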
Workflow example
In a local AI workflow using Hugging Face Transformers, the image processor handles pixel normalization, while the tokenizer only converts text to token IDs. For example, AutoImageProcessor.from_pretrained('google/vit-base-patch16-224') returns a processor that resizes and normalizes images. The operator only calls processor(images=img, return_tensors='pt'). In llama.cpp, normalization is part of the model graph, so there is no separate step. If you write a custom inference script in Python, you must replicate the normalization manually: e.g., pixel_values = (pixel_values / 255.0 - mean) / std, using the mean and std that the model's processor configuration specifies.
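One way to keep a custom script in step with the model is to drive the manual formula from configuration values rather than hard-coded constants. The sketch below assumes a dictionary whose keys are modeled on those commonly found in a Hugging Face preprocessor_config.json (do_rescale, rescale_factor, do_normalize, image_mean, image_std); the values shown are illustrative only and should be read from the actual model's configuration. The helper name normalize_from_config is hypothetical.

```python
import numpy as np

# Illustrative values only; load the real ones from the model's own config file.
processor_config = {
    "do_rescale": True,
    "rescale_factor": 1 / 255,
    "do_normalize": True,
    "image_mean": [0.5, 0.5, 0.5],
    "image_std": [0.5, 0.5, 0.5],
}

def normalize_from_config(pixels_hwc, config):
    """Apply rescaling and per-channel standardization as described by config."""
    x = np.asarray(pixels_hwc, dtype=np.float32)
    if config.get("do_rescale", True):
        x = x * np.float32(config.get("rescale_factor", 1 / 255))
    if config.get("do_normalize", True):
        mean = np.asarray(config["image_mean"], dtype=np.float32)
        std = np.asarray(config["image_std"], dtype=np.float32)
        x = (x - mean) / std
    return x

# Hypothetical input: a single RGB pixel covering the low, middle, and high end.
pixels = np.array([[[0, 128, 255]]], dtype=np.uint8)
normalized = normalize_from_config(pixels, processor_config)
```

With these illustrative settings the raw range 0 to 255 maps to roughly -1 to 1, which differs from the ImageNet statistics quoted earlier; this is the kind of mismatch that goes unnoticed when constants are copied between models.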
Reviewed by Fredoline Eruo. See our editorial policy.