Hyperparameter Search — Custom Training Pipelines (Chapter 13)

Hyperparameter tuning is search, not guesswork. Systematic search beats intuition for any non-trivial problem.

Grid vs. Random Search

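With a fixed trial budget, random search usually beats grid search when only a few hyperparameters strongly affect the result. A grid spends most of its trials re-testing the same values of the important parameters while varying the unimportant ones: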

import itertools
import random

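def grid_search(param_grid, n_trials=None):
    """Grid search: enumerates every combination in order.

    Wastes trials on insensitive dimensions. If n_trials is given,
    only the first n_trials combinations are yielded.
    """
    keys, values = zip(*param_grid.items())
    combinations = itertools.product(*values)
    # islice with None as the stop value yields the full grid.
    for combination in itertools.islice(combinations, n_trials):
        yield dict(zip(keys, combination))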

def random_search(param_grid, n_trials):
    """Random search - more efficient for high-dim spaces."""
    for _ in range(n_trials):
        config = {k: random.choice(v) for k, v in param_grid.items()}
        yield config
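
One way to see the claim above is to count how many distinct values of a single hyperparameter each strategy tests. The sketch below is illustrative rather than part of the chapter's pipeline: the helper distinct_values_tried, the dropout parameter, and the ranges are assumptions, and the random variant samples from continuous ranges instead of the fixed lists used by random_search.

```python
import itertools
import random

def distinct_values_tried(configs, key):
    """Count how many different values of one hyperparameter were evaluated."""
    return len({config[key] for config in configs})

# Grid: 3 x 3 = 9 trials, but only 3 learning rates are ever tested.
grid = {"lr": [1e-4, 1e-3, 1e-2], "dropout": [0.0, 0.25, 0.5]}
grid_configs = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]

# Random: 9 trials drawn from the same ranges, each with a fresh learning rate.
rng = random.Random(0)
random_configs = [
    {"lr": 10 ** rng.uniform(-4, -2), "dropout": rng.uniform(0.0, 0.5)}
    for _ in range(9)
]

print(distinct_values_tried(grid_configs, "lr"))    # 3
print(distinct_values_tried(random_configs, "lr"))  # 9
```

If lr turns out to be the only parameter that matters, the grid has effectively run three experiments and the random search nine, for the same cost.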

Optuna for Bayesian Optimization

import optuna

def objective(trial):
    config = {
        'lr': trial.suggest_float('lr', 1e-5, 1e-2, log=True),
        'batch_size': trial.suggest_categorical('batch_size', [16, 32, 64, 128]),
        'weight_decay': trial.suggest_float('weight_decay', 1e-6, 1e-2, log=True),
        'num_layers': trial.suggest_int('num_layers', 2, 8),
        'hidden_dim': trial.suggest_categorical('hidden_dim', [256, 512, 1024]),
    }
    
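    # train_model, evaluate, and val_loader are assumed to be defined
    # elsewhere in the training pipeline.
    model = train_model(config)
    val_loss = evaluate(model, val_loader)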
    
    return val_loss

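# With no sampler argument, Optuna uses its TPE sampler.
study = optuna.create_study(direction='minimize')
# Stops after 50 trials or 1 hour, whichever comes first.
study.optimize(objective, n_trials=50, timeout=3600)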
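
The log=True flag on lr and weight_decay asks Optuna to sample those values in the log domain. The plain-Python sketch below shows why that matters for a range spanning several orders of magnitude. It is not Optuna's implementation: sample_log_uniform and decade_fractions are hypothetical helpers, and simple random draws stand in for the sampler.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def decade_fractions(samples):
    """Fraction of samples in [1e-5, 1e-4), [1e-4, 1e-3), and [1e-3, 1e-2]."""
    counts = [0, 0, 0]
    for value in samples:
        # Clamp so rounding at the range edges cannot index out of bounds.
        index = min(2, max(0, math.floor(math.log10(value)) + 5))
        counts[index] += 1
    return [count / len(samples) for count in counts]

rng = random.Random(0)
log_samples = [sample_log_uniform(rng, 1e-5, 1e-2) for _ in range(30_000)]
linear_samples = [rng.uniform(1e-5, 1e-2) for _ in range(30_000)]

print(decade_fractions(log_samples))     # roughly a third in each decade
print(decade_fractions(linear_samples))  # about 90% in the top decade
```

Sampling lr uniformly on a linear scale would almost never try values near 1e-5 or 1e-4, so scale-like parameters are normally searched in log space.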

Asynchronous Hyperparameter Tuning

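# Ray Tune for distributed tuning
import pytorch_lightning as pl
from ray import tune
# The import path of this callback has moved between Ray releases;
# check the documentation for your installed version.
from ray.tune.integration.pytorch_lightning import TuneReportCheckpointCallback

def train_with_tune(config):
    # build_model and data_module are assumed to be defined elsewhere.
    model = build_model(config)
    trainer = pl.Trainer(
        max_epochs=10,
        callbacks=[
            # Report the logged 'val_loss' metric to the scheduler after each
            # validation run. tune.report() is a function, not a callback,
            # so it cannot be passed in this list directly.
            TuneReportCheckpointCallback(metrics={'val_loss': 'val_loss'}, on='validation_end'),
        ],
    )
    trainer.fit(model, datamodule=data_module)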
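
Reporting a metric during training lets a scheduler stop weak trials early. Ray Tune ships schedulers for this, such as ASHAScheduler. The sketch below shows the underlying idea in its simpler synchronous form, successive halving, using only the standard library. It is not Ray's implementation, and toy_val_loss is a hypothetical stand-in for training and validating a real model.

```python
import math
import random

def toy_val_loss(config, epochs):
    """Hypothetical loss: improves with more epochs, best when lr is near 1e-3."""
    distance = abs(math.log10(config["lr"]) + 3)
    return distance + 1.0 / epochs

def successive_halving(configs, min_epochs=1, eta=2):
    """Keep the best 1/eta of the trials at each rung; multiply the budget by eta."""
    survivors = list(configs)
    epochs = min_epochs
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: toy_val_loss(c, epochs))
        survivors = ranked[: max(1, len(ranked) // eta)]
        epochs *= eta
    return survivors[0]

rng = random.Random(0)
configs = [{"lr": 10 ** rng.uniform(-5, -2)} for _ in range(16)]
best = successive_halving(configs)
print(best)
```

With 16 starting configurations and eta=2, the rungs keep 8, 4, 2, and then 1 trial, so most of the compute goes to the configurations that looked promising early. The asynchronous variant makes the same promotion decisions without waiting for every trial in a rung to finish.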

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.