30. Project Structure for AI
Well-organized projects survive maintenance. You—not the code—will return to a project months later and need to understand what's where.
A practical structure for AI utilities:
ai_data_pipeline/
├── pyproject.toml
├── src/
│ └── pipeline/
│ ├── __init__.py
│ ├── ingest.py # Data loading
│ ├── transform.py # Preprocessing
│ ├── embed.py # Embedding generation
│ └── store.py # Vector storage
├── scripts/
│ └── batch_ingest.py # Entry points
├── tests/
│ ├── test_transform.py
│ └── test_embed.py
├── data/
│ ├── raw/
│ └── processed/
├── output/
│ └── logs/
├── config/
│ ├── development.yaml
│ └── production.yaml
└── .env.example
The src/ layout is now standard—using it means you pip install -e . the package (editable install) and import pipeline works even in development. It also prevents accidental relative imports.
Config files live separately from code. YAML is common:
# config/production.yaml
model:
provider: openai
model_name: gpt-4
temperature: 0.3
vector_store:
index_name: production_index
dimension: 1536
processing:
batch_size: 100
max_retries: 3
retry_delay: 2
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Create this directory structure for a fictional "sentiment_analyzer" project. Include a minimal src/sentiment_analyzer/__init__.py that defines a SentimentAnalyzer class. Create a scripts/analyze.py entry point that imports from the package. Set up pyproject.toml with editable install.