09. Prefect for AI

Chapter 9 of 24 · 20 min

Prefect is an alternative orchestrator that emphasizes developer experience and cloud-optional deployment. Like Airflow, it runs a pipeline as a set of tasks with dependencies between them, but it offers a more Pythonic interface and built-in observability.

Installation:

pip install prefect mlflow pandas

The core concepts are the flow (the pipeline) and tasks (the steps within it). Unlike Airflow's separate DAG definition, Prefect applies decorators directly to ordinary Python functions.

# ml_pipeline.py
from prefect import flow, task
from prefect.blocks.system import Secret
import mlflow

@task
def fetch_data(date: str) -> str:
    """Fetch training data for the given date and return its path."""
    # Placeholder: a real implementation would download or query the data here
    return f"/data/training-{date}.csv"

@task
def validate_data(path: str) -> bool:
    """Validate dataset quality. Returns True if valid."""
    import pandas as pd
    df = pd.read_csv(path)
    
    # Check completeness
    missing_pct = df.isnull().mean().max()
    if missing_pct > 0.05:
        raise ValueError(f"Data quality failed: {missing_pct:.1%} missing")
    
    return True

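@task
def train_model(date: str, validated: bool) -> str:
    """Train model and return the MLflow run ID."""
    # `validated` is not used in the body; accepting validate_data's result
    # makes the dependency between the two tasks explicit.
    mlflow.set_tracking_uri(Secret.load("mlflow-uri").get())

    with mlflow.start_run(run_name=f"train-{date}") as run:
        mlflow.log_param("training_date", date)
        # ... training code ...

    # Read the run ID from the run object returned by start_run
    return run.info.run_id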

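@task
def promote_model(run_id: str):
    """Register the trained model as the first step toward production."""
    mlflow.set_tracking_uri(Secret.load("mlflow-uri").get())
    # Assumes the training code logged the model under the artifact path "model"
    model_uri = f"runs:/{run_id}/model"
    # mlflow.register_model creates a new version under the given name
    mlflow.register_model(model_uri, "spam-classifier")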

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

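Install Prefect. Build the pipeline above around your own training script: add a @flow function that calls the tasks in order, and invoke it under an if __name__ == "__main__": guard. Run the flow locally with python ml_pipeline.py. Then start a Prefect worker (recent releases replace the older agent with workers), deploy the same flow, and run it through the worker. Compare the two experiences.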