RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.co · Independently operated
Python for AI — Zero to Useful

26. Performance Profiling

Chapter 26 of 36 · 15 min
KEY INSIGHT

Profile before optimizing. Intuition about what's slow is often wrong: real bottlenecks frequently sit in places you'd never suspect, such as data loading, string formatting, or logging. Use `time.perf_counter()` for quick measurements and `cProfile` for systematic analysis.

Slow AI pipelines cost time and money. Before optimizing, you need data: where is time actually being spent? Python's profiling tools give you this.
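For quick measurements, `time.perf_counter()` can be wrapped in a small context manager. The sketch below is illustrative only: the `timed` helper, the `timings` dict, and the two toy steps are hypothetical names chosen for this example, not part of any library.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}  # hypothetical container for per-step timings
data = list(range(100_000))

with timed("square", timings):
    squared = [x * x for x in data]

with timed("format", timings):
    lines = [f"value={x}" for x in squared]

# Print the slowest step first.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<8} {seconds * 1000:.2f} ms")
```

A single run like this is noisy, so treat the numbers as a rough guide and repeat the measurement before drawing conclusions.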

Use the cProfile module for whole-program analysis:

import cProfile
import pstats
import random
import time
from pstats import SortKey

def simulate_inference():
    """Simulate a slow inference pipeline."""
    
    data = list(range(10000))
    
    # Step 1: Load (simulated)
    time.sleep(0.1)
    loaded = [x * 2 for x in data]
    
    # Step 2: Preprocess (simulated heavy computation)
    time.sleep(0.3)
    preprocessed = [x ** 0.5 for x in loaded]
    
    # Step 3: Batch inference simulation
    time.sleep(0.5)
    results = [random.random() for _ in preprocessed]
    
    return results

# Run profiler
profiler = cProfile.Profile()
profiler.enable()

results = simulate_inference()

profiler.disable()
stats = pstats.Stats(profiler)
stats.strip_dirs()  # Remove path info
stats.sort_stats(SortKey.CUMULATIVE)  # Sort by cumulative time
stats.print_stats(20)  # Top 20 functions

SortKey.CUMULATIVE sorts by the total time spent in a function, including the functions it calls. SortKey.TIME sorts by the time spent in the function's own body, excluding what it calls.
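The difference between the two sort keys is easier to see on a toy call chain. The following sketch assumes Python 3.9 or later (for `pstats.Stats.get_stats_profile()`); the `parent` and `child` functions are hypothetical and exist only for this illustration.

```python
import cProfile
import pstats


def child():
    # Does all the real work, so its own time (tottime) is large.
    total = 0
    for i in range(500_000):
        total += i * i
    return total


def parent():
    # Does almost nothing itself; nearly all of its time is spent in child().
    return child()


profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

# get_stats_profile() exposes per-function numbers keyed by function name.
profile = pstats.Stats(profiler).get_stats_profile()
parent_stats = profile.func_profiles["parent"]
child_stats = profile.func_profiles["child"]

print(f"parent: tottime={parent_stats.tottime:.3f}s cumtime={parent_stats.cumtime:.3f}s")
print(f"child:  tottime={child_stats.tottime:.3f}s cumtime={child_stats.cumtime:.3f}s")
```

Sorting by cumulative time would rank `parent` at or near the top, while sorting by own time would rank `child` first, because `parent` only delegates.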

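For line-by-line profiling, use line_profiler (install with pip install line_profiler). The commented %load_ext and %lprun lines below are IPython/Jupyter magics; uncomment them when running in a notebook: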

# %load_ext line_profiler
def slow_function(data):
    result = []
    for item in data:
        # Some computation
        processed = item ** 2 + sum(range(100))
        result.append(processed)
    return result

# %lprun -f slow_function slow_function(list(range(10000)))
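In `slow_function` above, `sum(range(100))` is recomputed on every iteration even though its value never changes, so a line-level profile would likely point there. As a hedged sketch using only the standard library, the variant below hoists that value out of the loop and compares both versions with `timeit`; `faster_function` is a hypothetical name introduced for this comparison.

```python
import timeit


def slow_function(data):
    result = []
    for item in data:
        processed = item ** 2 + sum(range(100))  # constant recomputed each pass
        result.append(processed)
    return result


def faster_function(data):
    offset = sum(range(100))  # loop-invariant value, computed once
    return [item ** 2 + offset for item in data]


data = list(range(10_000))
same_output = slow_function(data) == faster_function(data)

# Take the best of several repeats to reduce noise from other processes.
slow_time = min(timeit.repeat(lambda: slow_function(data), number=5, repeat=3))
fast_time = min(timeit.repeat(lambda: faster_function(data), number=5, repeat=3))

print(f"same output: {same_output}")
print(f"slow:   {slow_time:.4f} s for 5 calls")
print(f"faster: {fast_time:.4f} s for 5 calls")
```

Checking that both versions return the same result before comparing timings guards against "optimizations" that silently change behavior.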

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
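One way to capture the environment details for such a record is sketched below. It assumes Python 3.8 or later for `importlib.metadata`; the `describe_environment` helper and its default package list are hypothetical choices for this example.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("line_profiler",)):
    """Collect interpreter, OS, and package versions for a run log."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence rather than failing the whole report.
            info[name] = "not installed"
    return info


env = describe_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

Paste the printed lines next to your timing results so a later run can be compared against the same setup.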

EXERCISE

Write a script that: (1) creates a list of 50,000 random strings, (2) filters for strings containing "ai" or "ml", (3) sorts the filtered results. Profile it. Identify which step takes the most time. Try a different implementation and verify the improvement.
