13. Hyperparameter Search
Hyperparameter tuning is search, not guesswork. Systematic search beats intuition for any non-trivial problem.
Grid vs. Random Search
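Given the same trial budget, random search tends to outperform grid search when a few hyperparameters matter much more than the rest. A grid keeps re-testing the same handful of values for each important hyperparameter, whereas random sampling tries more distinct values of the ones that matter: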
import itertools
import random

def grid_search(param_grid, n_trials=None):
    """Grid search: exhaustive, wastes trials on insensitive dimensions."""
    keys, values = zip(*param_grid.items())
    combinations = itertools.product(*values)
    # Stop after n_trials configurations; None walks the full grid.
    for combination in itertools.islice(combinations, n_trials):
        yield dict(zip(keys, combination))

def random_search(param_grid, n_trials):
    """Random search: more efficient for high-dimensional spaces."""
    for _ in range(n_trials):
        # Draw each hyperparameter independently from its candidate list.
        yield {k: random.choice(v) for k, v in param_grid.items()}
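To make the claim concrete, the following sketch counts how many distinct values of a single hyperparameter each strategy tries under the same budget. It assumes two continuous hyperparameters scaled to [0, 1]; the helper names (`grid_points`, `random_points`, `distinct_values`) are illustrative and not part of any library.

```python
import itertools
import random

def grid_points(values_per_dim, n_dims):
    """All combinations of the same candidate values on every dimension."""
    return list(itertools.product(values_per_dim, repeat=n_dims))

def random_points(n_trials, n_dims, rng):
    """Independent uniform samples in the unit hypercube."""
    return [tuple(rng.random() for _ in range(n_dims)) for _ in range(n_trials)]

def distinct_values(points, dim):
    """Number of different values tried along one dimension."""
    return len({p[dim] for p in points})

rng = random.Random(0)
grid = grid_points([0.0, 0.5, 1.0], n_dims=2)       # 3 x 3 = 9 trials
rand = random_points(len(grid), n_dims=2, rng=rng)  # same budget of 9 trials

grid_distinct = distinct_values(grid, dim=0)
rand_distinct = distinct_values(rand, dim=0)
print(grid_distinct, rand_distinct)
```

With nine trials each, the grid tries only three distinct values of the first hyperparameter, while random sampling tries nine. If that hyperparameter is the one that drives performance, the random budget explores it three times as finely.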
Optuna for Bayesian Optimization
import optuna

# train_model, evaluate, and val_loader are assumed to be defined elsewhere in your project.
def objective(trial):
    config = {
        'lr': trial.suggest_float('lr', 1e-5, 1e-2, log=True),
        'batch_size': trial.suggest_categorical('batch_size', [16, 32, 64, 128]),
        'weight_decay': trial.suggest_float('weight_decay', 1e-6, 1e-2, log=True),
        'num_layers': trial.suggest_int('num_layers', 2, 8),
        'hidden_dim': trial.suggest_categorical('hidden_dim', [256, 512, 1024]),
    }
    model = train_model(config)
    val_loss = evaluate(model, val_loader)
    return val_loss  # the study minimizes this value

study = optuna.create_study(direction='minimize')
# Stop after 50 trials or 1 hour, whichever comes first.
study.optimize(objective, n_trials=50, timeout=3600)
print(study.best_params, study.best_value)
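The `log=True` arguments on `lr` and `weight_decay` ask Optuna to sample those ranges in the log domain. The sketch below is a standard-library illustration of why that matters for ranges spanning several orders of magnitude. It is not Optuna's sampler, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
n = 10_000
low, high = 1e-5, 1e-2  # same range as 'lr' above: three decades

log_samples = [sample_log_uniform(low, high, rng) for _ in range(n)]
lin_samples = [rng.uniform(low, high) for _ in range(n)]

# Fraction of samples that land in the lowest decade, [1e-5, 1e-4).
log_frac = sum(x < 1e-4 for x in log_samples) / n
lin_frac = sum(x < 1e-4 for x in lin_samples) / n
print(f"log-scale: {log_frac:.3f}, linear: {lin_frac:.3f}")
```

Log-scale sampling gives each decade roughly an equal share of trials (about a third each here), whereas linear sampling puts under 1% of trials in the lowest decade and almost never explores small learning rates.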
Asynchronous Hyperparameter Tuning
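# Ray Tune for distributed tuning
import pytorch_lightning as pl
from ray import tune
# The callback name and import path vary across Ray versions; newer releases
# provide TuneReportCheckpointCallback in this module instead.
from ray.tune.integration.pytorch_lightning import TuneReportCallback

# build_model and data_module are assumed to be defined elsewhere.
def train_with_tune(config):
    model = build_model(config)
    trainer = pl.Trainer(
        max_epochs=10,
        callbacks=[
            # Report the logged "val_loss" metric to the scheduler after each validation run
            TuneReportCallback({"val_loss": "val_loss"}, on="validation_end"),
        ],
    )
    trainer.fit(model, datamodule=data_module)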
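The training function above reports `val_loss` so that a scheduler can stop weak trials early and spend the budget on promising ones. As a rough illustration of that idea, here is a synchronous successive-halving loop in plain Python. It is a simplified sketch, not Ray Tune's implementation: Ray Tune's schedulers, such as `ASHAScheduler`, make these decisions asynchronously. `successive_halving`, `toy_val_loss`, and the candidate values are all hypothetical.

```python
import random

def successive_halving(configs, evaluate_at, min_budget=1, eta=2):
    """Keep the best 1/eta of the configs at each rung, multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving config at the current budget (lower is better).
        scored = sorted(survivors, key=lambda c: evaluate_at(c, budget))
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]

def toy_val_loss(config, budget):
    """Hypothetical loss: lowest at lr = 1e-3, shrinking as the budget grows."""
    return abs(config["lr"] - 1e-3) + 1.0 / budget

rng = random.Random(0)
candidates = [{"lr": 10 ** rng.uniform(-5, -2)} for _ in range(8)]
best = successive_halving(candidates, toy_val_loss)
print(best)
```

With eight candidates and `eta=2`, the loop runs three rungs (8, then 4, then 2 configurations), so only the strongest candidates ever receive the larger training budgets.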
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Set up Optuna with 20 trials on your current model. Log the best configuration and compare it to your manually tuned baseline.