Performance Profiling — Python for AI — Zero to Useful (Chapter 26)

Slow AI pipelines cost time and money. Before optimizing, you need data: where is time actually being spent? Python's profiling tools give you this.

Start with the standard library's cProfile module, which records every function call made during a run and so suits whole-program analysis:

import cProfile
import pstats
import random
import time
from pstats import SortKey

def simulate_inference():
    """Simulate a slow inference pipeline."""
    data = list(range(10000))
    
    # Step 1: Load (simulated)
    time.sleep(0.1)
    loaded = [x * 2 for x in data]
    
    # Step 2: Preprocess (simulated heavy computation)
    time.sleep(0.3)
    preprocessed = [x ** 0.5 for x in loaded]
    
    # Step 3: Batch inference simulation
    time.sleep(0.5)
    results = [random.random() for _ in preprocessed]
    
    return results

# Run profiler
profiler = cProfile.Profile()
profiler.enable()

results = simulate_inference()

profiler.disable()
stats = pstats.Stats(profiler)
stats.strip_dirs()  # Remove path info
stats.sort_stats(SortKey.CUMULATIVE)  # Sort by cumulative time
stats.print_stats(20)  # Top 20 functions

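SortKey.CUMULATIVE sorts by cumulative time (the cumtime column): the total time spent in a function, including everything it calls. SortKey.TIME sorts by internal time (the tottime column): the time spent in the function's own code, excluding its subcalls. Cumulative order points you to the slow stage of a pipeline; internal-time order points you to the specific function doing the work.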
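To make the difference between the two sort keys concrete, here is a minimal sketch that profiles the same call twice and captures each report as a string. The names leaf_work, orchestrate, and profile_report are hypothetical helpers introduced for illustration; they are not part of cProfile or pstats.

```python
import cProfile
import io
import pstats
import time
from pstats import SortKey


def leaf_work():
    """Hypothetical leaf function: the sleep happens inside this call."""
    time.sleep(0.05)


def orchestrate():
    """Hypothetical caller: does almost nothing itself, only calls leaf_work."""
    for _ in range(3):
        leaf_work()


def profile_report(func, sort_key, limit=5):
    """Profile func() and return the pstats report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # Send the report to an in-memory buffer instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return buffer.getvalue()


cumulative_report = profile_report(orchestrate, SortKey.CUMULATIVE)
internal_report = profile_report(orchestrate, SortKey.TIME)

print(cumulative_report)
print(internal_report)
```

In the cumulative report, orchestrate should appear near the top because its cumtime includes every leaf_work call. In the internal-time report it should drop down the list, since almost none of the time is spent in its own code.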

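For line-by-line profiling, use the third-party line_profiler package (install with pip install line_profiler). The commented lines below are IPython/Jupyter magics; uncomment them when running in a notebook: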

# %load_ext line_profiler
def slow_function(data):
    result = []
    for item in data:
        # Some computation
        processed = item ** 2 + sum(range(100))
        result.append(processed)
    return result

# %lprun -f slow_function slow_function(list(range(10000)))
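Outside a notebook, the same measurement can be taken from a plain script. The following is a minimal sketch, assuming line_profiler is installed and using its LineProfiler class; the ImportError fallback is an addition for illustration so that the script still runs, without profiling, when the package is missing.

```python
def slow_function(data):
    result = []
    for item in data:
        # sum(range(100)) is recomputed on every iteration
        processed = item ** 2 + sum(range(100))
        result.append(processed)
    return result


try:
    from line_profiler import LineProfiler
except ImportError:
    # Assumption: fall back to a plain run if the package is not installed.
    LineProfiler = None

data = list(range(10000))

if LineProfiler is None:
    output = slow_function(data)
    print("line_profiler is not installed; ran without profiling")
else:
    profiler = LineProfiler()
    profiled = profiler(slow_function)  # wrap the function to record per-line timings
    output = profiled(data)
    profiler.print_stats()              # per-line hits, time, and % time
```

In the printed report, the line that computes processed will likely account for most of the time. Because sum(range(100)) does not depend on item, computing it once before the loop is the obvious first optimization to try, followed by a second profiling run to confirm the gain.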

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.