26. Performance Profiling
Slow AI pipelines cost time and money. Before optimizing, you need data: where is time actually being spent? Python's profiling tools give you this.
The cProfile module for whole-program analysis:
import cProfile
import pstats
from pstats import SortKey
def simulate_inference():
"""Simulate a slow inference pipeline."""
import time
import random
data = list(range(10000))
# Step 1: Load (simulated)
time.sleep(0.1)
loaded = [x * 2 for x in data]
# Step 2: Preprocess (simulated heavy computation)
time.sleep(0.3)
preprocessed = [x ** 0.5 for x in loaded]
# Step 3: Batch inference simulation
time.sleep(0.5)
results = [random.random() for _ in preprocessed]
return results
# Run profiler
profiler = cProfile.Profile()
profiler.enable()
results = simulate_inference()
profiler.disable()
stats = pstats.Stats(profiler)
stats.strip_dirs() # Remove path info
stats.sort_stats(SortKey.CUMULATIVE) # Sort by cumulative time
stats.print_stats(20) # Top 20 functions
The SortKey.CUMULATIVE sorts by total time spent in a function including subcalls. SortKey.TIME shows only time in that function, excluding what it calls.
For line-by-line profiling, use line_profiler (install with pip install line_profiler):
# %load_ext line_profiler
def slow_function(data):
result = []
for item in data:
# Some computation
processed = item ** 2 + sum(range(100))
result.append(processed)
return result
# %lprun -f slow_function slow_function(list(range(10000)))
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Write a script that: (1) creates a list of 50,000 random strings, (2) filters for strings containing "ai" or "ml", (3) sorts the filtered results. Profile it. Identify which step takes the most time. Try a different implementation and verify the improvement.