RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
MLOps for Local AI

08. Airflow for AI

Chapter 8 of 24 · 15 min
KEY INSIGHT

Airflow's strength is its Python-native DAG definition: if you can write a Python function, you can create an Airflow task. The trade-off is that Airflow is designed for task orchestration, not data processing. Heavy data transformations belong in Spark or Ray; Airflow orchestrates calls to those systems.

Define a DAG for ML training:

```python
# dags/ml_pipeline.py
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def validate_dataset(date: str) -> None:
    """Placeholder: replace with real checks on the data fetched for `date`."""
    print(f"Validating dataset for {date}")


def evaluate_model() -> None:
    """Placeholder: replace with real evaluation logic."""
    print("Evaluating model")


default_args = {
    "owner": "ml-team",
    "depends_on_past": False,
    "start_date": datetime(2024, 1, 1),
}

with DAG(
    dag_id="daily-model-training",
    default_args=default_args,
    schedule_interval="0 2 * * *",  # 2 AM daily; Airflow 2.4+ prefers `schedule`
    catchup=False,
) as dag:
    # Relative script paths assume the task runs from the project directory.
    fetch_data = BashOperator(
        task_id="fetch_training_data",
        bash_command="python scripts/fetch_data.py --date {{ ds }}",
    )

    validate_data = PythonOperator(
        task_id="validate_dataset",
        python_callable=validate_dataset,
        op_kwargs={"date": "{{ ds }}"},
    )

    train_model = BashOperator(
        task_id="train_model",
        bash_command="python scripts/train.py --date {{ ds }}",
        env={"MLFLOW_TRACKING_URI": "http://mlflow:5000"},
        # `env` alone replaces the inherited environment (including PATH);
        # append_env, available in recent 2.x releases, merges instead.
        append_env=True,
    )

    # The task variable is named `evaluate` so it does not shadow the callable.
    evaluate = PythonOperator(
        task_id="evaluate_model",
        python_callable=evaluate_model,
    )

    deploy_on_success = BashOperator(
        task_id="deploy_model",
        bash_command="python scripts/deploy.py",
        trigger_rule="all_success",  # the default: all upstream tasks succeeded
    )

    fetch_data >> validate_data >> train_model >> evaluate >> deploy_on_success
```

The `{{ ds }}` syntax is Airflow's Jinja templating: at runtime it is replaced with the run's logical date (the execution date), formatted as `YYYY-MM-DD`.
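The DAG hands the rendered date to a `validate_dataset` callable without showing what such a check might do. The following is a minimal sketch, not the course's implementation: the `data/<date>/train.csv` layout, the `text` and `label` columns, and the `data_root` parameter are all assumptions made for illustration. It uses only the standard library, in keeping with the point that Airflow tasks should do light checks and leave heavy processing to other systems.

```python
import csv
from pathlib import Path

# Hypothetical layout: fetch_data.py is assumed to write data/<date>/train.csv.
DATA_ROOT = Path("data")
REQUIRED_COLUMNS = {"text", "label"}  # assumed schema; adjust to your dataset
MIN_ROWS = 1


def validate_dataset(date: str, data_root: Path = DATA_ROOT) -> int:
    """Raise if the dataset for `date` is missing, malformed, or too small.

    Any exception raised here fails the Airflow task, and with the default
    trigger rule the downstream tasks will not run.
    Returns the number of data rows on success.
    """
    path = data_root / date / "train.csv"
    if not path.exists():
        raise FileNotFoundError(f"No training data at {path}")

    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing columns: {sorted(missing)}")
        row_count = sum(1 for _ in reader)

    if row_count < MIN_ROWS:
        raise ValueError(f"Only {row_count} rows in {path}")
    return row_count
```

Because the callable signals failure by raising, no Airflow-specific code is needed inside it, which also makes it easy to unit test outside the scheduler.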

Apache Airflow is one of the most widely used workflow orchestrators and is common in ML pipelines. It defines workflows as Python code, executes tasks on schedules or external triggers, and provides a UI for monitoring runs.
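To make "workflows as Python code" concrete, the sketch below models the chapter's five task IDs as a dependency graph and derives a valid run order. It is a plain-Python illustration using the standard library's `graphlib` (Python 3.9+), not Airflow's scheduler; the `dependencies` dict and `execution_order` helper are names introduced here for illustration only.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of tasks it depends on (its upstream tasks),
# mirroring the chain defined with `>>` in the DAG file.
dependencies = {
    "fetch_training_data": set(),
    "validate_dataset": {"fetch_training_data"},
    "train_model": {"validate_dataset"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}


def execution_order(deps: dict) -> list:
    """Return one ordering in which every task runs after its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
# ['fetch_training_data', 'validate_dataset', 'train_model', 'evaluate_model', 'deploy_model']
```

A dependency cycle would make `static_order()` raise `graphlib.CycleError`, which is the same constraint that makes a workflow a directed acyclic graph.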

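Installation for local development (the commands below use the Airflow 2.x CLI, so the install is pinned below 3.0):

pip install "apache-airflow<3"

# Initialize database (on Airflow 2.7+, `airflow db migrate` replaces the deprecated `db init`)
airflow db init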

# Create admin user
airflow users create \
    --username admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com

# Start webserver (background)
airflow webserver --port 8080 &

# Start scheduler (background)
airflow scheduler &
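Once both processes are running, it can help to confirm they are actually up before opening the UI. The sketch below queries the Airflow 2.x webserver's `/health` endpoint and reports any required component that is not healthy. The URL assumes the port used above, and the `REQUIRED` tuple and helper names are illustrative choices, not part of Airflow.

```python
import json
from urllib.request import urlopen

HEALTH_URL = "http://localhost:8080/health"  # assumes the port used above
REQUIRED = ("metadatabase", "scheduler")  # the components this chapter starts


def unhealthy_components(payload: dict, required=REQUIRED) -> list:
    """Return the required components whose status is missing or not 'healthy'."""
    return [
        name
        for name in required
        if (payload.get(name) or {}).get("status") != "healthy"
    ]


def check_airflow(url: str = HEALTH_URL, timeout: float = 5.0) -> list:
    """Fetch the health payload and return the list of unhealthy components."""
    with urlopen(url, timeout=timeout) as resp:
        return unhealthy_components(json.load(resp))


# Usage, with the webserver running:
#   problems = check_airflow()
#   print("all healthy" if not problems else f"unhealthy: {problems}")
```

An unhealthy scheduler here usually means the `airflow scheduler` process has not started or has stopped sending heartbeats.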

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
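One way to capture the package version and runtime for the checkpoint is a small script whose output is saved next to the exercise notes. This is a sketch using only the standard library; the function names and the choice of fields are assumptions, and data paths or observed task output still need to be recorded by hand.

```python
import json
import platform
import sys
from importlib import metadata


def package_version(name: str):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_record(packages=("apache-airflow",)) -> dict:
    """Collect interpreter, platform, and package versions for reproducibility."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": {name: package_version(name) for name in packages},
    }


print(json.dumps(environment_record(), indent=2))
```

Passing more names, for example `environment_record(("apache-airflow", "mlflow"))`, extends the record to other packages the pipeline depends on.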

EXERCISE

Install Airflow locally. Create the DAG above. Run it with `airflow dags test daily-model-training 2024-01-01`, which executes a single DAG run for that date in one local process. Verify that the tasks execute and check the UI for their status.
