Project Structure for AI — Python for AI — Zero to Useful (Chapter 30)

Well-organized projects survive maintenance. Months later you will return to the project, having forgotten the details, and the layout should tell you what lives where.

A practical structure for AI utilities:

ai_data_pipeline/
├── pyproject.toml
├── src/
│   └── pipeline/
│       ├── __init__.py
│       ├── ingest.py        # Data loading
│       ├── transform.py     # Preprocessing
│       ├── embed.py         # Embedding generation
│       └── store.py         # Vector storage
├── scripts/
│   └── batch_ingest.py      # Entry points
├── tests/
│   ├── test_transform.py
│   └── test_embed.py
├── data/
│   ├── raw/
│   └── processed/
├── output/
│   └── logs/
├── config/
│   ├── development.yaml
│   └── production.yaml
└── .env.example
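
If you want to start from this layout without creating each folder by hand, a short script can lay it down. The following is a minimal sketch using only pathlib; the function name scaffold and the two lists are illustrative assumptions, and the files it creates are empty placeholders you still need to fill in.

```python
from pathlib import Path

# Directories and files copied from the tree above.
DIRECTORIES = [
    "src/pipeline",
    "scripts",
    "tests",
    "data/raw",
    "data/processed",
    "output/logs",
    "config",
]

FILES = [
    "pyproject.toml",
    "src/pipeline/__init__.py",
    "src/pipeline/ingest.py",
    "src/pipeline/transform.py",
    "src/pipeline/embed.py",
    "src/pipeline/store.py",
    "scripts/batch_ingest.py",
    "tests/test_transform.py",
    "tests/test_embed.py",
    "config/development.yaml",
    "config/production.yaml",
    ".env.example",
]


def scaffold(root: Path) -> None:
    """Create the directory tree and empty placeholder files under root.

    Safe to run twice: existing directories and files are left alone.
    """
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        (root / file).touch(exist_ok=True)
```

Calling scaffold(Path("ai_data_pipeline")) from an empty working directory produces the tree shown above. Because touch does not truncate, rerunning it on an existing project will not overwrite your code.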

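The src/ layout is now a widely used convention. With it, you install the package in editable mode (pip install -e .), and import pipeline then works during development without copying files or editing sys.path. It also means the package is importable only once it is installed, which keeps scripts and tests from accidentally importing it straight from the working directory.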
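
One way to check that the editable install took effect is to ask Python where it resolves the package from. The sketch below uses the standard library's importlib.util.find_spec; the helper name module_origin is an assumption made for illustration.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from.

    Returns None when the module cannot be found, or when it has no
    single origin file (for example, a namespace package).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


if __name__ == "__main__":
    # After `pip install -e .` this is expected to point inside src/pipeline/.
    print(module_origin("pipeline"))
```

If this prints None, the package is not installed in the active environment. That usually means the install step was skipped or a different virtual environment is active.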

Config files live separately from code. YAML is common:

# config/production.yaml
model:
  provider: openai
  model_name: gpt-4
  temperature: 0.3

vector_store:
  index_name: production_index
  dimension: 1536

processing:
  batch_size: 100
  max_retries: 3
  retry_delay: 2
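
To use a file like this from code, one option is to parse it once at startup and turn it into typed objects, so a misspelled key fails immediately rather than deep inside a batch run. The sketch below assumes the third-party PyYAML package (pip install pyyaml) for parsing. The dataclass names and the build_config and load_config helpers are hypothetical; they are not part of any library.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass(frozen=True)
class ModelConfig:
    provider: str
    model_name: str
    temperature: float


@dataclass(frozen=True)
class VectorStoreConfig:
    index_name: str
    dimension: int


@dataclass(frozen=True)
class ProcessingConfig:
    batch_size: int
    max_retries: int
    retry_delay: float


@dataclass(frozen=True)
class PipelineConfig:
    model: ModelConfig
    vector_store: VectorStoreConfig
    processing: ProcessingConfig


def build_config(raw: Dict[str, Any]) -> PipelineConfig:
    """Convert the parsed YAML mapping into typed config objects.

    A missing section raises KeyError; a missing or unexpected key inside
    a section raises TypeError. Dataclasses do not check value types.
    """
    return PipelineConfig(
        model=ModelConfig(**raw["model"]),
        vector_store=VectorStoreConfig(**raw["vector_store"]),
        processing=ProcessingConfig(**raw["processing"]),
    )


def load_config(environment: str, config_dir: Path = Path("config")) -> PipelineConfig:
    """Load config/<environment>.yaml, e.g. load_config("production")."""
    import yaml  # PyYAML, imported here so build_config works without it

    path = config_dir / f"{environment}.yaml"
    with path.open(encoding="utf-8") as handle:
        return build_config(yaml.safe_load(handle))
```

With the production file above, load_config("production").processing.batch_size evaluates to 100. The frozen dataclasses keep the settings read-only after loading, which is a design choice here rather than a requirement.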

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
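
The record described above can be captured with a few standard-library calls. This is a sketch, not a prescribed format: the function names package_versions and run_record and the field names in the dictionary are assumptions made for illustration.

```python
import json
import platform
from importlib import metadata
from pathlib import Path
from typing import Dict, Iterable


def package_versions(names: Iterable[str]) -> Dict[str, str]:
    """Look up installed versions; packages that are absent are marked as such."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def run_record(
    packages: Iterable[str],
    data_path: Path,
    observed_output: str,
    notes: str = "",
) -> Dict[str, object]:
    """Collect the details worth writing down next to an exercise."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": package_versions(packages),
        "data_path": str(data_path),
        "observed_output": observed_output,
        "notes": notes,  # e.g. model size, vector count, CPU/GPU backend
    }


if __name__ == "__main__":
    record = run_record(["pyyaml"], Path("data/raw"), "config loaded")
    print(json.dumps(record, indent=2))
```

Saving the printed JSON under output/logs/ keeps the record alongside the run it describes.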