Loss Functions — Custom Training Pipelines (Chapter 11)

A loss function defines the objective the model optimizes, so it encodes your assumptions about what the model should learn. With the wrong loss, even a flawless training run produces a model that optimizes the wrong objective.

Classification Losses

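import torch
import torch.nn.functional as F

def classification_loss(outputs, targets, config):
    # outputs: raw logits of shape (N, C); targets: class indices of shape (N,)

    # Standard cross-entropy
    if config.loss == "ce":
        return F.cross_entropy(outputs, targets)

    # Label smoothing for better calibration: mixes the one-hot target
    # with a uniform distribution over classes (requires PyTorch >= 1.10)
    if config.loss == "label_smoothing":
        return F.cross_entropy(outputs, targets, label_smoothing=0.1)

    # Focal loss for class imbalance: scales CE by (1 - p_t)^gamma with gamma = 2,
    # which down-weights examples the model already classifies confidently
    if config.loss == "focal":
        ce_loss = F.cross_entropy(outputs, targets, reduction="none")
        pt = torch.exp(-ce_loss)  # probability assigned to the true class
        focal_loss = (1 - pt) ** 2 * ce_loss
        return focal_loss.mean()

    raise ValueError(f"Unknown classification loss: {config.loss!r}")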
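As a sanity check on the two less obvious branches, the sketch below recomputes label smoothing by hand and confirms that focal loss with gamma = 0 reduces to plain cross-entropy. It assumes PyTorch >= 1.10; `focal_loss` and its `gamma` parameter are a hypothetical helper written for this check, not a PyTorch API.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 5)            # (N, C) raw scores
targets = torch.randint(0, 5, (8,))   # (N,) class indices

# Label smoothing by hand: target distribution = (1 - eps) * one_hot + eps / C
eps, num_classes = 0.1, logits.size(1)
log_probs = F.log_softmax(logits, dim=1)
smoothed = torch.full_like(log_probs, eps / num_classes)
smoothed.scatter_(1, targets.unsqueeze(1), 1 - eps + eps / num_classes)
manual_ls = -(smoothed * log_probs).sum(dim=1).mean()
builtin_ls = F.cross_entropy(logits, targets, label_smoothing=eps)

def focal_loss(logits, targets, gamma=2.0):
    """Hypothetical helper: CE scaled by (1 - p_t)^gamma, averaged over the batch."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)
    return ((1 - pt) ** gamma * ce).mean()

ce = F.cross_entropy(logits, targets)
print(torch.allclose(manual_ls, builtin_ls))                             # True
print(torch.allclose(focal_loss(logits, targets, gamma=0.0), ce))        # True
print((focal_loss(logits, targets) < ce).item())                         # True
```

Because (1 - p_t)^gamma is below 1 whenever p_t > 0, focal loss is always smaller than cross-entropy on the same batch; the reduction is largest for confidently correct examples, which is what shifts attention toward the hard (often minority-class) ones.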

Regression Losses

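def regression_loss(outputs, targets, config):
    # L1 (MAE) - less sensitive to outliers; best constant prediction is the median
    if config.loss == "l1":
        return F.l1_loss(outputs, targets)

    # L2 (MSE) - penalizes large errors quadratically; best constant prediction is the mean
    if config.loss == "mse":
        return F.mse_loss(outputs, targets)

    # Huber (smooth L1) - quadratic (L2-like) for |error| < beta, linear (L1-like) beyond it
    if config.loss == "huber":
        return F.smooth_l1_loss(outputs, targets, beta=1.0)

    # Quantile (pinball) loss for prediction intervals.
    # Assumes outputs has shape (N, 3), one column per quantile, and targets has shape (N,).
    if config.loss == "quantile":
        quantiles = [0.1, 0.5, 0.9]
        losses = []
        for i, q in enumerate(quantiles):
            errors = targets - outputs[:, i]
            # Under-prediction (errors > 0) costs q per unit, over-prediction costs (1 - q)
            losses.append(torch.max((q - 1) * errors, q * errors).mean())
        return sum(losses) / len(quantiles)

    raise ValueError(f"Unknown regression loss: {config.loss!r}")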
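The comments above can be checked numerically. The sketch below, assuming only PyTorch, evaluates each loss over a grid of constant predictions for a sample containing one large outlier: the MSE minimizer lands on the mean (pulled toward the outlier), the L1 minimizer on the median, and the q = 0.9 pinball minimizer on the 0.9 quantile. It also confirms that `F.smooth_l1_loss` with `beta=1.0` is quadratic for small errors and linear for large ones. The data and grid are arbitrary illustration choices.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# 200 standard-normal targets plus one large outlier (illustrative data)
y = torch.cat([torch.randn(200, dtype=torch.float64),
               torch.tensor([50.0], dtype=torch.float64)])

# Candidate constant predictions c on a grid with step 0.01
c = torch.linspace(-3, 3, 601, dtype=torch.float64)
err = y[None, :] - c[:, None]  # shape (601, 201): error of each constant on each target

mse = (err ** 2).mean(dim=1)
mae = err.abs().mean(dim=1)
q = 0.9
pinball = torch.maximum((q - 1) * err, q * err).mean(dim=1)

best_mse, best_mae, best_q = c[mse.argmin()], c[mae.argmin()], c[pinball.argmin()]
print(f"MSE minimizer {best_mse:.2f} vs mean {y.mean():.2f}")
print(f"L1 minimizer {best_mae:.2f} vs median {y.median():.2f}")
print(f"q=0.9 minimizer {best_q:.2f} vs 0.9-quantile {torch.quantile(y, q):.2f}")

# Smooth L1 with beta=1: 0.5 * e^2 for |e| < 1, |e| - 0.5 otherwise
e = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])
smooth = F.smooth_l1_loss(e, torch.zeros_like(e), beta=1.0, reduction="none")
piecewise = torch.where(e.abs() < 1.0, 0.5 * e ** 2, e.abs() - 0.5)
print(torch.allclose(smooth, piecewise))  # True
```

The single outlier shifts the MSE optimum by roughly 50 / 201 ≈ 0.25 while barely moving the median, which is the practical meaning of "resistant to outliers".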

Multi-Task Losses

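Combining losses requires careful weighting: summed naively, the task with the largest loss scale dominates the gradient. The class below learns the weights instead of fixing them by hand. Each task gets a learnable log-variance s_i, its squared-error loss is scaled by exp(-s_i), and the added s_i term stops the optimizer from shrinking every weight toward zero. This is a common simplified form of the uncertainty weighting proposed by Kendall et al. (2018):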

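import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    def __init__(self, tasks, init_log_vars=None):
        super().__init__()
        self.tasks = list(tasks)
        if init_log_vars is None:
            # log-variance 0 -> initial weight exp(0) = 1 for every task
            init_log_vars = {t: 0.0 for t in self.tasks}
        # One learnable log-variance s_i per task; include these in the optimizer's params
        self.log_vars = nn.Parameter(
            torch.tensor([float(init_log_vars[t]) for t in self.tasks])
        )

    def forward(self, outputs, targets):
        total_loss = 0.0
        for i, task in enumerate(self.tasks):
            precision = torch.exp(-self.log_vars[i])
            task_loss = ((outputs[task] - targets[task]) ** 2).mean()  # per-task MSE (scalar)
            # Weighted task loss plus s_i, which penalizes inflating the variance
            total_loss = total_loss + precision * task_loss + self.log_vars[i]
        return total_loss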
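To see why the s_i term balances tasks, set the derivative of exp(-s)·L + s with respect to s to zero: -exp(-s)·L + 1 = 0, so s = log L and the effective weight exp(-s) settles at 1/L. The sketch below checks this with plain SGD on the log-variances alone, holding two hypothetical per-task losses fixed at 0.5 and 5.0. In real training the task losses change as the model learns, so this only isolates the weighting term.

```python
import torch

# Hypothetical fixed per-task losses; task 1 has a 10x larger loss than task 0
task_losses = torch.tensor([0.5, 5.0])
log_vars = torch.zeros(2, requires_grad=True)  # s_i, initial weights exp(0) = 1
optimizer = torch.optim.SGD([log_vars], lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    # Same weighting as MultiTaskLoss: exp(-s_i) * L_i + s_i, summed over tasks
    total = (torch.exp(-log_vars) * task_losses + log_vars).sum()
    total.backward()
    optimizer.step()

weights = torch.exp(-log_vars.detach())
print(weights)  # close to 1 / task_losses, i.e. about [2.0, 0.2]
```

The task with the larger loss ends up with the smaller weight, so each weighted term exp(-s_i)·L_i converges to 1 and neither task dominates.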

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
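One way to capture that record is a short script that prints the versions and device alongside the output of the smallest example. This is a minimal sketch assuming PyTorch; the seeded cross-entropy call stands in for whichever example you run, `record` is an arbitrary name, and the data path is left for you to add if your exercise reads from disk.

```python
import platform

import torch
import torch.nn.functional as F

# Environment details worth recording next to the result
record = {
    "python": platform.python_version(),
    "torch": torch.__version__,
    "device": "cuda" if torch.cuda.is_available() else "cpu",
}

# Smallest example: cross-entropy on a fixed random batch (seeded for repeatability)
torch.manual_seed(0)
device = torch.device(record["device"])
logits = torch.randn(4, 3).to(device)  # generated on CPU so values match across backends
targets = torch.tensor([0, 2, 1, 0], device=device)
record["ce_loss"] = F.cross_entropy(logits, targets).item()

print(record)
```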