20. Classes for AI Tools
Building AI tools means fighting messiness. Configuration scattered across dicts, functions that share global state, and hardcoded paths all turn into debugging nightmares. Classes give you a structure to contain this complexity.
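A practical pattern for AI tools is to pass configuration objects as parameters: each group of related settings lives in a small dataclass, and the class that does the work receives those objects in its constructor.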
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ModelConfig:
    model_name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 1000
    api_key_env: str = "OPENAI_API_KEY"  # name of the environment variable holding the key


@dataclass
class IngestionConfig:
    input_dir: Path = Path("data/raw")
    output_dir: Path = Path("data/processed")
    batch_size: int = 32
    # Mutable defaults such as lists must go through default_factory
    file_patterns: list[str] = field(default_factory=lambda: ["*.json", "*.csv"])


class DocumentPipeline:
    def __init__(self, model_config: ModelConfig, ingestion_config: IngestionConfig):
        self.model = model_config
        self.ingestion = ingestion_config

    def ingest(self):
        """Collect every file in input_dir that matches a configured pattern."""
        files = []
        for pattern in self.ingestion.file_patterns:
            files.extend(self.ingestion.input_dir.glob(pattern))
        return files

    def process(self, documents):
        """Placeholder for actual processing."""
        # Whitespace word count is only a rough stand-in for a real token count
        return [{"text": doc, "tokens": len(doc.split())} for doc in documents]


# Usage
pipeline = DocumentPipeline(
    model_config=ModelConfig(model_name="gpt-4", temperature=0.3),
    ingestion_config=IngestionConfig(batch_size=64),
)
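To see the pipeline run end to end, the sketch below points a trimmed-down copy of the classes at a temporary directory. The sample file names, their contents, and the run_demo helper are all made up for illustration. Note that ingest() returns Path objects while process() expects strings, so the demo reads each file's text in between; that bridging step is an assumption about how the two methods are meant to connect.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class IngestionConfig:
    input_dir: Path = Path("data/raw")
    file_patterns: list[str] = field(default_factory=lambda: ["*.json", "*.csv"])


class DocumentPipeline:
    def __init__(self, ingestion_config: IngestionConfig):
        self.ingestion = ingestion_config

    def ingest(self) -> list[Path]:
        files: list[Path] = []
        for pattern in self.ingestion.file_patterns:
            files.extend(self.ingestion.input_dir.glob(pattern))
        return sorted(files)  # sorted so the order is deterministic

    def process(self, documents: list[str]) -> list[dict]:
        return [{"text": doc, "tokens": len(doc.split())} for doc in documents]


def run_demo() -> list[dict]:
    """Hypothetical helper: build sample files, then ingest and process them."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp)
        (raw / "a.json").write_text('{"note": "hello world"}', encoding="utf-8")
        (raw / "b.csv").write_text("id,text\n1,three short words\n", encoding="utf-8")
        (raw / "ignored.txt").write_text("not matched by any pattern", encoding="utf-8")

        pipeline = DocumentPipeline(IngestionConfig(input_dir=raw))
        paths = pipeline.ingest()
        # Bridge step (assumed): turn each Path into the text that process() expects
        texts = [p.read_text(encoding="utf-8") for p in paths]
        return pipeline.process(texts)


results = run_demo()
for row in results:
    print(row["tokens"], repr(row["text"]))
```

Because the directory is passed in through IngestionConfig, the same pipeline class works against a temporary directory here and against data/raw in normal use, with no code changes.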
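Dataclasses (@dataclass) eliminate boilerplate __init__ methods and generate __repr__ and __eq__ automatically, so configs print readably and compare by value. The field(default_factory=...) pattern handles mutable defaults safely: each instance gets its own list instead of sharing one, and @dataclass rejects a bare list, dict, or set default with a ValueError.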
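The behaviors described above can be checked in a few lines. This is a small sketch using a cut-down IngestionConfig; BadConfig is a hypothetical class that exists only to show what happens when a list is used directly as a default.

```python
from dataclasses import dataclass, field


@dataclass
class IngestionConfig:
    batch_size: int = 32
    file_patterns: list[str] = field(default_factory=lambda: ["*.json", "*.csv"])


a = IngestionConfig()
b = IngestionConfig()

# Generated __repr__ and __eq__: readable output, comparison by field values
repr_text = repr(a)
equal_by_value = a == b

# default_factory runs once per instance, so mutating a's list leaves b untouched
a.file_patterns.append("*.txt")
lists_independent = b.file_patterns == ["*.json", "*.csv"]

# A bare list default is rejected when the class is defined
try:
    @dataclass
    class BadConfig:  # hypothetical, for demonstration only
        file_patterns: list[str] = ["*.json"]
    rejected = False
except ValueError:
    rejected = True

print(repr_text)
print(equal_by_value, lists_independent, rejected)
```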
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Add an APIConfig dataclass containing base_url, timeout, and retry_count alongside ModelConfig. Create a combined ToolConfig that holds both. Instantiate it with custom values and print it to see the generated repr.
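One possible shape for this exercise is sketched below; treat it as a starting point rather than the expected answer. The default values, including the example.com URL, are placeholders, and the field names model and api on ToolConfig are assumptions. The nested configs use default_factory because recent Python versions reject a dataclass instance as a plain default.

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    model_name: str = "gpt-4"
    temperature: float = 0.7


@dataclass
class APIConfig:
    base_url: str = "https://api.example.com/v1"  # placeholder URL, not a real endpoint
    timeout: float = 30.0  # seconds (assumed unit)
    retry_count: int = 3


@dataclass
class ToolConfig:
    # default_factory builds a fresh nested config for every ToolConfig instance
    model: ModelConfig = field(default_factory=ModelConfig)
    api: APIConfig = field(default_factory=APIConfig)


tool_config = ToolConfig(
    model=ModelConfig(temperature=0.2),
    api=APIConfig(timeout=10.0, retry_count=5),
)
print(tool_config)
```

Printing the combined object shows the nested reprs in one line, which is handy for logging the exact settings a run used.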