03. MLflow Setup

Chapter 3 of 24 · 20 min

KEY INSIGHT

The tracking URI determines where data goes, and `mlflow.set_tracking_uri()` controls it. Without explicit configuration, MLflow writes to `./mlruns` locally. Explicit configuration enables sharing across processes and future migration.

Configuration happens in code or environment variables:

```python
import os

# Option 1: Environment variable (set it before MLflow resolves the URI)
os.environ["MLFLOW_TRACKING_URI"] = "sqlite:///mlflow.db"

# Option 2: Explicit in code
import mlflow

mlflow.set_tracking_uri("sqlite:///mlflow.db")
```

The SQLite backend stores experiments in a local file. This works on a single machine but does not scale beyond one user. For collaborative environments, a server-based approach (covered in the next chapter) is necessary.

Backend store configuration options:

| URI | Backend | Use Case |
|-----|---------|----------|
| `./mlruns` | Filesystem | Development, single user |
| `sqlite:///mlflow.db` | SQLite | Single machine, light load |
| `postgresql://host/db` | PostgreSQL | Multi-user, production |

MLflow also captures part of the execution environment. When you log a model, it records the inferred package requirements (`requirements.txt` and `conda.yaml`) with the model, so you can rebuild the software context later.

MLflow is a widely used open-source MLOps platform that runs well on a single machine. It provides four components: Tracking (experiment logging), Models (model packaging), Model Registry (version management), and Projects (reproducible runs). For local AI, you need at minimum the Tracking component.

Installation is straightforward:

pip install mlflow

That's the core package. Additional components for specific needs:

pip install "mlflow[extras]"  # Optional integrations; SQLAlchemy already ships with the core package
pip install "mlflow[typecheck]"  # Type validation (if your MLflow version provides this extra)

For production use, you'll want a database backend. SQLite suffices for single-user local setups; PostgreSQL for multi-user or high-volume scenarios.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
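One way to capture those details consistently is a small helper that collects them into a single record. This is a sketch using only the standard library; the function name `describe_environment` and the field names are illustrative choices, not part of MLflow.

```python
import json
import os
import platform
from importlib import metadata


def describe_environment(package="mlflow", data_path="."):
    """Collect the details the checkpoint asks for into one dictionary."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this interpreter
    return {
        "package": package,
        "package_version": version,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "data_path": os.path.abspath(data_path),
    }


record = describe_environment()
print(json.dumps(record, indent=2))
```

Saving the printed JSON next to your exercise notes gives you a baseline to compare against if the same example behaves differently later.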

EXERCISE

Install MLflow and configure it with a SQLite backend. Run three experiment variations (different hyperparameters) and verify all runs appear in the backend store. Check the SQLite file directly with `sqlite3 mlflow.db ".schema"`.
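If the `sqlite3` command-line tool is not available, the same inspection can be approximated from Python. The sketch below lists the tables in the database file using the standard-library `sqlite3` module; `list_tables` is a hypothetical helper name, and the code makes no assumption about which tables MLflow creates.

```python
import os
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the table names in a SQLite file, sorted alphabetically."""
    # closing() ensures the connection is released when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Only inspect the file if it exists: sqlite3.connect() would otherwise
# create an empty database at that path.
if os.path.exists("mlflow.db"):
    print(list_tables("mlflow.db"))
```

An empty list means the file exists but MLflow has not initialized its schema yet, which usually indicates that no run was logged against that URI.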