Prefect for AI — MLOps for Local AI (Chapter 9)

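Prefect is an alternative orchestrator that emphasizes developer experience and cloud-optional deployment. It covers the same ground as Airflow, but pipelines are ordinary Python functions: in Prefect 2 and later the dependency graph is worked out as the code runs rather than declared up front, and observability is built in.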

Installation:

pip install prefect

The core concepts are the flow (the pipeline) and tasks (the steps within it). Where Airflow has you build a separate DAG definition, Prefect applies the @flow and @task decorators directly to Python functions.

# ml_pipeline.py
from prefect import flow, task
from prefect.blocks.system import Secret
import mlflow

@task
def fetch_data(date: str):
    """Fetch training data for the given date."""
    # Implementation
    return f"/data/training-{date}.csv"

@task
def validate_data(path: str) -> bool:
    """Validate dataset quality. Returns True if valid."""
    import pandas as pd
    df = pd.read_csv(path)
    
    # Check completeness
    missing_pct = df.isnull().mean().max()
    if missing_pct > 0.05:
        raise ValueError(f"Data quality failed: {missing_pct:.1%} missing")
    
    return True

@task
def train_model(date: str, validated: bool):
    """Train model and return run ID."""
    mlflow.set_tracking_uri(Secret.load("mlflow-uri").get())
    
    with mlflow.start_run(run_name=f"train-{date}") as run:
        mlflow.log_param("training_date", date)
        # ... training code ...
        run_id = run.info.run_id  # MLflow's identifier for this run

    return run_id

@task
def promote_model(run_id: str):
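    """Register the trained model as the first step of promotion."""
    # Assumes the training code logged the model under the artifact path "model".
    model_uri = f"runs:/{run_id}/model"
    # Registering creates a new model version; moving that version to
    # production is a separate step in the MLflow Model Registry.
    mlflow.register_model(model_uri, "spam-classifier")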

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
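
One way to capture those details is a small script that writes them out as JSON next to the exercise. This is a sketch using only the standard library; the field names, the list of packages to check, and the sample data path and notes are assumptions, so adjust them to whatever the exercise actually used.

```python
import json
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_run_record(data_path: str, observed_output: str, notes: str = "") -> dict:
    """Collect the details the checkpoint asks for into one dictionary."""
    packages = ("prefect", "mlflow", "pandas")  # assumed list; edit as needed
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
        "data_path": data_path,
        "observed_output": observed_output,
        "notes": notes,  # e.g. CPU/GPU backend or memory constraints
    }


record = build_run_record(
    data_path="/data/training-2024-01-01.csv",  # hypothetical path
    observed_output="(paste what the run printed)",
    notes="CPU only",
)
print(json.dumps(record, indent=2))
```

Keeping the record as JSON makes it easy to compare two runs of the same exercise when the results differ.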