20. Classes for AI Tools

Chapter 20 of 36 · 15 min

Building AI tools means fighting messiness. Configuration scattered across dicts, functions that share global state, and hardcoded paths all turn into debugging nightmares. Classes give you a structure to contain this complexity.

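A practical pattern for AI tools is to pass configuration objects as parameters: group related settings into small dataclasses, then hand those objects to the class that does the work.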

from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class ModelConfig:
    model_name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 1000
    api_key_env: str = "OPENAI_API_KEY"
    
@dataclass
class IngestionConfig:
    input_dir: Path = Path("data/raw")
    output_dir: Path = Path("data/processed")
    batch_size: int = 32
    file_patterns: list[str] = field(default_factory=lambda: ["*.json", "*.csv"])

class DocumentPipeline:
    def __init__(self, model_config: ModelConfig, ingestion_config: IngestionConfig):
        self.model = model_config
        self.ingestion = ingestion_config
    
    def ingest(self):
        files = []
        for pattern in self.ingestion.file_patterns:
            files.extend(self.ingestion.input_dir.glob(pattern))
        return files
    
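    def process(self, documents):
        """Placeholder for actual processing.

        Expects text strings (not the Path objects ingest() returns) and
        counts whitespace-separated words, not model tokens.
        """
        return [{"text": doc, "tokens": len(doc.split())} for doc in documents]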

# Usage: the result is a pipeline, so name it accordingly
pipeline = DocumentPipeline(
    model_config=ModelConfig(model_name="gpt-4", temperature=0.3),
    ingestion_config=IngestionConfig(batch_size=64),
)
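
The usage above builds the pipeline but never calls it, and ingest() returns file paths while process() expects text. A minimal sketch of connecting the two, assuming the inputs are UTF-8 text files and using a temporary directory as a stand-in for data/raw. The trimmed-down classes and the read_documents helper are hypothetical and only illustrate the flow:

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class IngestionConfig:
    # Trimmed copy of the chapter's config, kept small for this sketch.
    input_dir: Path = Path("data/raw")
    file_patterns: list[str] = field(default_factory=lambda: ["*.json", "*.csv"])


class DocumentPipeline:
    def __init__(self, ingestion_config: IngestionConfig):
        self.ingestion = ingestion_config

    def ingest(self) -> list[Path]:
        files: list[Path] = []
        for pattern in self.ingestion.file_patterns:
            files.extend(self.ingestion.input_dir.glob(pattern))
        return sorted(files)  # sorted so the order is stable across runs

    def process(self, documents: list[str]) -> list[dict]:
        return [{"text": doc, "tokens": len(doc.split())} for doc in documents]


def read_documents(paths: list[Path]) -> list[str]:
    """Hypothetical helper: load each file as UTF-8 text."""
    return [path.read_text(encoding="utf-8") for path in paths]


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp)
    (raw / "a.json").write_text('{"note": "hello world"}', encoding="utf-8")
    (raw / "b.csv").write_text("id,text\n1,first row\n", encoding="utf-8")
    (raw / "ignored.txt").write_text("not matched", encoding="utf-8")

    pipeline = DocumentPipeline(IngestionConfig(input_dir=raw))
    paths = pipeline.ingest()
    results = pipeline.process(read_documents(paths))

file_names = [path.name for path in paths]
token_counts = [item["tokens"] for item in results]
print(file_names)    # only files matching the configured patterns
print(token_counts)  # whitespace-separated word counts per file
```

Because the input directory arrives through the config object, the same pipeline class runs against a throwaway directory here and against real data elsewhere without any code changes.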

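Dataclasses (@dataclass) eliminate boilerplate __init__ methods. They also generate __repr__ and __eq__, so config objects print readably and compare by value. The field(default_factory=...) pattern gives every instance its own fresh list; writing a bare mutable default such as a list, dict, or set makes @dataclass raise a ValueError when the class is defined.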
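
The generated methods and the default_factory behavior are easy to check directly. A small sketch, assuming a cut-down ModelConfig; the stop_sequences field and the BadConfig class are hypothetical and exist only to demonstrate the point:

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    model_name: str = "gpt-4"
    temperature: float = 0.7
    # Hypothetical field, added only to show a mutable default.
    stop_sequences: list[str] = field(default_factory=list)


a = ModelConfig(temperature=0.3)
b = ModelConfig(temperature=0.3)

text = repr(a)   # generated __repr__ lists every field
same = a == b    # generated __eq__ compares field by field
print(text)
print(same)

# default_factory runs once per instance, so the lists are independent.
a.stop_sequences.append("END")
print(b.stop_sequences)

# A bare mutable default is rejected when the class is defined.
error_message = ""
try:
    @dataclass
    class BadConfig:
        file_patterns: list = []
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```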

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
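
One way to capture those details is to print them as a small record next to the exercise output. This is a sketch using only the standard library; the package list, the data path, and the record layout are assumptions to adapt to your own workspace:

```python
import json
import platform
import sys
from importlib import metadata
from pathlib import Path


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


# Assumption: replace this list with the packages your exercise imports.
packages_to_record = ["pip"]

record = {
    "python": platform.python_version(),
    "platform": platform.platform(),
    "executable": sys.executable,
    "data_path": str(Path("data/raw").resolve()),  # assumed data location
    "packages": {name: package_version(name) for name in packages_to_record},
    "observed_output": "",  # paste the output you saw here
}

print(json.dumps(record, indent=2))
```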

EXERCISE

Add an APIConfig dataclass alongside ModelConfig, with base_url, timeout, and retry_count fields. Create a combined ToolConfig that holds one of each. Instantiate it with custom values and print it to inspect the generated __repr__.