02. Experiment Tracking
Experiment tracking captures the context of machine learning development. Without it, you're flying blind: you cannot compare runs, reproduce successes, or diagnose failures. Every training run is an experiment, and experiments need logs.
The fundamental unit is the run: a single execution of training code that produces metrics, artifacts, and metadata. A run captures what you trained (parameters), how well it trained (metrics), what it produced (model artifacts), and the context (data version, environment). Later, you can query runs to find the best-performing model for a given scenario.
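To make the shape of a run concrete, here is a minimal sketch in plain Python with no tracking library involved. The `Run` class, the `best_run` helper, and all field names and values are hypothetical and exist only for illustration; a real tracking tool stores the same four kinds of information under its own schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Run:
    """One execution of training code (hypothetical schema, for illustration)."""
    run_id: str
    params: Dict[str, float]                                # what you trained
    metrics: Dict[str, float]                               # how well it trained
    artifacts: List[str] = field(default_factory=list)      # what it produced
    context: Dict[str, str] = field(default_factory=dict)   # data version, environment


def best_run(runs: List[Run], metric: str,
             data_version: Optional[str] = None,
             higher_is_better: bool = True) -> Optional[Run]:
    """Return the best run by `metric`, optionally restricted to one data version."""
    candidates = [
        r for r in runs
        if metric in r.metrics
        and (data_version is None or r.context.get("data_version") == data_version)
    ]
    if not candidates:
        return None
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r.metrics[metric])


# Made-up runs; the numbers are placeholders, not real results.
runs = [
    Run("run-a", {"lr": 0.1}, {"val_accuracy": 0.81}, ["model_a.pkl"], {"data_version": "v1"}),
    Run("run-b", {"lr": 0.01}, {"val_accuracy": 0.86}, ["model_b.pkl"], {"data_version": "v1"}),
    Run("run-c", {"lr": 0.01}, {"val_accuracy": 0.90}, ["model_c.pkl"], {"data_version": "v2"}),
]

print(best_run(runs, "val_accuracy").run_id)                     # run-c
print(best_run(runs, "val_accuracy", data_version="v1").run_id)  # run-b
```

The second query is the "given scenario" case: restricting by context (here, data version) can change which run counts as best, which is why context belongs in the run record alongside parameters and metrics.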
Metrics are the backbone of comparison. Track loss curves, accuracy curves, and custom business metrics. The trap is tracking too many metrics without knowing which one drives decisions. Define your primary metric before training; it is your optimization target. Secondary metrics are for context and debugging, not decision-making.
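Run your training code with MLflow tracking enabled. Open the MLflow UI (mlflow ui) and locate your run. Note the recorded parameters and metrics, along with the source metadata MLflow captures automatically (for example, the entry-point script and, inside a Git repository, the commit). Modify the hyperparameters and run again, then compare the two runs in the UI.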