11. Loss Functions

Chapter 11 of 18 · 20 min

A loss function defines what the model is actually optimized for, so it encodes your assumptions about which errors matter. With the wrong loss, the model converges on the wrong objective no matter how well the training run itself goes.

Classification Losses

import torch
import torch.nn.functional as F

def classification_loss(outputs, targets, config):
    # Standard cross-entropy
    if config.loss == "ce":
        return F.cross_entropy(outputs, targets)
    
    # Label smoothing for better calibration
    if config.loss == "label_smoothing":
        return F.cross_entropy(outputs, targets, label_smoothing=0.1)
    
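    # Focal loss for class imbalance (focusing exponent fixed at 2)
    if config.loss == "focal":
        # Per-sample cross-entropy, i.e. -log(p_t)
        ce_loss = F.cross_entropy(outputs, targets, reduction='none')
        # Recover p_t, the predicted probability of the true class
        pt = torch.exp(-ce_loss)
        # Down-weight easy examples, where p_t is close to 1
        focal_loss = (1 - pt) ** 2 * ce_loss
        return focal_loss.mean()

    # Fail loudly instead of silently returning None
    raise ValueError(f"Unknown classification loss: {config.loss}")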
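
The focal branch above fixes the exponent at 2. As a minimal sketch, assuming you want that exponent configurable, the hypothetical helper `focal_loss` below exposes it as `gamma` (a name not used in the chapter's code) and checks two properties: with `gamma=0` the loss reduces to plain cross-entropy, and with `gamma=2` it can never exceed it.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss sketch; `gamma` is an assumed name for the focusing exponent."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t) per sample
    pt = torch.exp(-ce)                                      # probability of the true class
    return ((1 - pt) ** gamma * ce).mean()

torch.manual_seed(0)
logits = torch.randn(8, 3)             # 8 samples, 3 classes (synthetic data)
targets = torch.randint(0, 3, (8,))

ce_value = F.cross_entropy(logits, targets)
focal_g0 = focal_loss(logits, targets, gamma=0.0)  # modulating factor is 1 everywhere
focal_g2 = focal_loss(logits, targets, gamma=2.0)  # easy examples are down-weighted

print(ce_value.item(), focal_g0.item(), focal_g2.item())
```

Because the factor `(1 - pt) ** gamma` lies between 0 and 1, focal loss values are not directly comparable to cross-entropy values when you monitor training curves.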

Regression Losses

def regression_loss(outputs, targets, config):
    # L1 (MAE) - resistant to outliers
    if config.loss == "l1":
        return F.l1_loss(outputs, targets)
    
    # L2 (MSE) - penalizes large errors more
    if config.loss == "mse":
        return F.mse_loss(outputs, targets)
    
    # Huber - L2 (quadratic) near zero, L1 (linear) for large errors
    # With beta=1.0, smooth_l1_loss matches the Huber loss with delta=1.0
    if config.loss == "huber":
        return F.smooth_l1_loss(outputs, targets, beta=1.0)
    
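    # Quantile (pinball) loss for uncertainty estimation
    # Assumes outputs has shape (batch, 3), one column per quantile,
    # and targets has shape (batch,)
    if config.loss == "quantile":
        quantiles = [0.1, 0.5, 0.9]
        losses = []
        for i, q in enumerate(quantiles):
            errors = targets - outputs[:, i]
            # Pinball loss: q * e when e >= 0, (q - 1) * e otherwise;
            # reduce to a scalar so the function returns a single loss value
            losses.append(torch.max((q - 1) * errors, q * errors).mean())
        return sum(losses) / len(quantiles)

    # Fail loudly instead of silently returning None
    raise ValueError(f"Unknown regression loss: {config.loss}")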
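
One way to see what each regression loss rewards is to ask which constant prediction minimizes it on a small dataset. The sketch below is an illustration on made-up numbers, not part of the chapter's training code: it scans a grid of constants and picks the best one per loss. The helpers `pinball` and `best_constant` are hypothetical names introduced here.

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([1.0, 2.0, 3.0, 4.0, 100.0])   # 100.0 plays the role of an outlier
candidates = torch.linspace(0.0, 100.0, 1001)          # constant predictions, step 0.1

def pinball(pred, target, q):
    """Quantile (pinball) loss for a single quantile level q."""
    errors = target - pred
    return torch.max((q - 1) * errors, q * errors).mean()

def best_constant(loss_fn):
    """Return the grid value that minimizes loss_fn when predicted for every sample."""
    losses = torch.stack([loss_fn(c.expand_as(targets), targets) for c in candidates])
    return candidates[losses.argmin()].item()

best_mse = best_constant(F.mse_loss)
best_l1 = best_constant(F.l1_loss)
best_huber = best_constant(lambda p, t: F.smooth_l1_loss(p, t, beta=1.0))
best_q90 = best_constant(lambda p, t: pinball(p, t, 0.9))

print(f"mse={best_mse:.1f} l1={best_l1:.1f} huber={best_huber:.1f} q90={best_q90:.1f}")
```

On this data the MSE minimizer lands at the mean (22), pulled far from the bulk of the points by the single outlier, while L1 and Huber both settle at the median (3). The 0.9-quantile pinball loss picks the largest value, which is what an upper quantile of five points should be.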

Multi-Task Losses

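Combining losses requires careful weighting, because a task with a larger loss scale otherwise dominates the gradient. One option is to learn the weights: the module below keeps a learnable log-variance per task, scales each task's squared error by the corresponding precision, and adds the log-variance itself as a penalty so the model cannot ignore a task by inflating its variance: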

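import torch.nn as nn

class MultiTaskLoss(nn.Module):
    def __init__(self, tasks, init_weights=None):
        super().__init__()
        self.tasks = tasks
        if init_weights is None:
            init_weights = {t: 1.0 for t in tasks}
        # One learnable log-variance per task; the values in init_weights
        # are used directly as the initial log-variances
        self.log_vars = nn.Parameter(
            torch.tensor([init_weights[t] for t in tasks], dtype=torch.float32)
        )

    def forward(self, outputs, targets):
        total_loss = 0
        for i, task in enumerate(self.tasks):
            # Precision = 1 / variance, so a noisier task gets a smaller weight
            precision = torch.exp(-self.log_vars[i])
            # Reduce the squared error to a scalar before weighting
            mse = ((outputs[task] - targets[task]) ** 2).mean()
            # The added log-variance penalizes inflating the variance
            total_loss = total_loss + precision * mse + self.log_vars[i]
        return total_loss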
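
A point that is easy to miss with learned loss weights is that the loss module has parameters of its own, which must be handed to the optimizer alongside the model's. The sketch below shows one training step under assumed names: `UncertaintyWeightedLoss` is a hypothetical stand-in for the chapter's `MultiTaskLoss` (reduced to a scalar and initialized at log-variance 0), and the tasks `task_a` and `task_b`, the layer sizes, and the data are all made up.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Hypothetical stand-in for MultiTaskLoss that returns a scalar loss."""
    def __init__(self, tasks):
        super().__init__()
        self.tasks = list(tasks)
        # Log-variance 0 corresponds to an initial weight of exp(0) = 1 per task
        self.log_vars = nn.Parameter(torch.zeros(len(self.tasks)))

    def forward(self, outputs, targets):
        total = 0.0
        for i, task in enumerate(self.tasks):
            mse = ((outputs[task] - targets[task]) ** 2).mean()
            total = total + torch.exp(-self.log_vars[i]) * mse + self.log_vars[i]
        return total

torch.manual_seed(0)
backbone = nn.Linear(4, 8)
heads = nn.ModuleDict({"task_a": nn.Linear(8, 1), "task_b": nn.Linear(8, 3)})
criterion = UncertaintyWeightedLoss(["task_a", "task_b"])

# The loss weights are trained too, so include criterion.parameters()
params = list(backbone.parameters()) + list(heads.parameters()) + list(criterion.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

x = torch.randn(16, 4)
targets = {"task_a": torch.randn(16, 1), "task_b": torch.randn(16, 3)}

features = torch.relu(backbone(x))
outputs = {name: head(features) for name, head in heads.items()}
loss = criterion(outputs, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(loss.item(), criterion.log_vars.detach())
```

If `criterion.parameters()` is left out of the optimizer, the log-variances stay at their initial values and the module behaves like a fixed-weight sum.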

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Run training with both L1 and L2 losses on your regression dataset. Compare the prediction distributions. Do they differ? Which is more appropriate for your downstream task?
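
If you want a baseline before touching your own data, the sketch below runs the same comparison on synthetic data. Everything in it is an assumption made for illustration: the linear ground truth (slope 2, intercept 1), the 10% of targets shifted by 50, the hypothetical helper `fit`, and the Adam settings. Swap in your dataset and model to do the actual exercise.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n = 200
x = torch.randn(n, 1)
y = 2.0 * x + 1.0 + 0.1 * torch.randn(n, 1)   # true slope 2, true intercept 1
y[:20] += 50.0                                 # contaminate 10% of the targets

def fit(loss_fn, steps=2000, lr=0.05):
    """Train a one-feature linear model with full-batch Adam on the given loss."""
    model = torch.nn.Linear(1, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    return model

model_l1 = fit(F.l1_loss)
model_l2 = fit(F.mse_loss)

bias_l1 = model_l1.bias.item()
bias_l2 = model_l2.bias.item()
print(f"L1 intercept: {bias_l1:.2f}")
print(f"L2 intercept: {bias_l2:.2f}")
```

Comparing the fitted intercepts against the true value of 1 shows how strongly each loss reacts to the contaminated targets. Plotting a histogram of each model's predictions is a natural next step for the comparison the exercise asks for.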