RUNLOCALAI v38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Python for AI — Zero to Useful

27. Optimizing Python Code

Chapter 27 of 36 · 15 min
KEY INSIGHT

Profile first. Then optimize the hot paths. Use numpy for numerical work, `join()` for strings, comprehensions instead of explicit `append()` loops, and `lru_cache` for functions called repeatedly with the same arguments. Never sacrifice correctness for speed, and never assume you know where the bottleneck is: measure.

Profiling showed you where the time goes. Now what? The core tension when optimizing Python is readability versus performance. Write for clarity first, then optimize only the hot paths that profiling identifies.
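A minimal sketch of the "profile, then optimize" workflow using the standard-library `cProfile` and `pstats` modules. The two-step pipeline is an assumption for illustration: `tokenize`, `score` and `pipeline` are hypothetical stand-ins for your own functions. Sorting by `tottime` (time spent inside each function itself, excluding callees) tends to push the real hot path toward the top of the report.

```python
import cProfile
import io
import pstats


def tokenize(texts):
    # Hypothetical pipeline step: cheap string splitting
    return [t.lower().split() for t in texts]


def score(tokens):
    # Hypothetical slow step: pure-Python arithmetic over every character
    total = 0
    for words in tokens:
        for word in words:
            for ch in word:
                total = (total * 31 + ord(ch)) % 1_000_003
    return total


def pipeline(texts):
    return score(tokenize(texts))


texts = ["The quick brown fox jumps over the lazy dog"] * 5_000

profiler = cProfile.Profile()
profiler.enable()
result = pipeline(texts)
profiler.disable()

# Write the report into a buffer, sorted by time spent in each function itself
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("tottime")
stats.print_stats(10)
print(buffer.getvalue())
```

Only the functions near the top of this report are worth optimizing; rewriting `tokenize` here would make the code harder to read for almost no gain.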

Common optimizations for AI pipelines:

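```python
import timeit

import numpy as np

# SLOW: Python loop for numerical computation
def slow_square_sum(values):
    total = 0
    for v in values:
        total += v * v
    return total

# FAST: numpy vectorized operation (the loop runs in compiled C code)
def fast_square_sum(arr: np.ndarray) -> float:
    return float(np.sum(arr ** 2))

values = list(range(100_000))
# Convert once, outside the timed code: building an array from a Python list
# is itself a per-element loop and would hide much of the gain.
# float64 also avoids integer overflow where numpy's default int is 32-bit.
arr = np.asarray(values, dtype=np.float64)

# Same answer first, or the optimization is worthless
assert slow_square_sum(values) == fast_square_sum(arr)

# Benchmark
slow_time = timeit.timeit(lambda: slow_square_sum(values), number=10)
fast_time = timeit.timeit(lambda: fast_square_sum(arr), number=10)

print(f"Slow (loop):  {slow_time:.4f}s")
print(f"Fast (numpy): {fast_time:.4f}s")
print(f"Speedup: {slow_time / fast_time:.0f}x")  # machine-dependent; often 10-100x

# SLOW: string concatenation in a loop can copy the growing string on every +=
def slow_concat(items):
    result = ""
    for i, item in enumerate(items):
        if i > 0:
            result += ", "  # separator only between items, matching join()
        result += item
    return result

# FAST: join() sizes the final string once and copies each piece once
def fast_concat(items):
    return ", ".join(items)

assert slow_concat(["a", "b", "c"]) == fast_concat(["a", "b", "c"]) == "a, b, c"
```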
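Vectorization only pays off once the data is already a numpy array, and building an array from a Python list is itself a per-element loop. A minimal sketch, assuming the data arrives as a plain Python list, that times the list-to-array conversion separately from the vectorized computation so you can see which one dominates on your machine (typically the conversion, but measure rather than assume).

```python
import timeit

import numpy as np

values = list(range(100_000))                # data arriving as a Python list
arr = np.asarray(values, dtype=np.float64)   # the same data, converted once

# Time the conversion and the vectorized math separately
convert_time = timeit.timeit(lambda: np.asarray(values, dtype=np.float64), number=10)
compute_time = timeit.timeit(lambda: np.sum(arr ** 2), number=10)

print(f"list -> array conversion:  {convert_time:.4f}s")
print(f"vectorized sum of squares: {compute_time:.4f}s")
```

If the conversion dominates, the fix is structural rather than local: load data into arrays once and keep it there through the pipeline instead of converting on every call.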

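List comprehensions are usually faster than an explicit loop that calls `append()`, because the interpreter builds the list with a dedicated bytecode instruction instead of looking up and calling `append` on every iteration. Generators (functions that `yield`, or generator expressions) produce items one at a time, so they save memory on large datasets. `functools.lru_cache` memoizes expensive function calls: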

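```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive_embedding(text: str) -> tuple[float, ...]:
    """Simulated expensive embedding computation."""
    # In reality, this calls a slow API or model.
    # Return a tuple, not a list: the cache hands every caller the same object,
    # so a mutable result could be modified by one caller and corrupt the cache.
    # (str hashing is randomized per process, so these fake values vary between runs.)
    return tuple(hash(text + str(i)) % 1000 / 1000 for i in range(10))

# Second call with the same text hits the cache
result1 = expensive_embedding("hello world")  # computed
result2 = expensive_embedding("hello world")  # returned instantly from the cache
print(result1 is result2)  # True: the very same cached object
print(expensive_embedding.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=1024, currsize=1)
```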
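To check the comprehension and generator claims for yourself, a minimal sketch comparing a loop with `append()` against a list comprehension, then comparing the memory held by a list versus a generator expression. The function names are hypothetical. The comprehension's advantage varies by Python version (CPython 3.12 changed how comprehensions are compiled), so treat any single measurement as machine- and version-specific.

```python
import sys
import timeit

n = 100_000

def squares_loop(n):
    # Explicit loop: looks up and calls result.append on every iteration
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_comprehension(n):
    return [i * i for i in range(n)]

assert squares_loop(n) == squares_comprehension(n)

loop_time = timeit.timeit(lambda: squares_loop(n), number=20)
comp_time = timeit.timeit(lambda: squares_comprehension(n), number=20)
print(f"loop + append: {loop_time:.4f}s")
print(f"comprehension: {comp_time:.4f}s")

# A list materializes every item; a generator expression yields them on demand.
# getsizeof counts only the container (not the int objects), so the list's
# real footprint is even larger than reported here.
as_list = [i * i for i in range(n)]
as_gen = (i * i for i in range(n))
print(f"list object:      {sys.getsizeof(as_list):,} bytes")
print(f"generator object: {sys.getsizeof(as_gen):,} bytes")

# sum() consumes the generator lazily, keeping only the running total
assert sum(as_gen) == sum(as_list)
```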

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
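A minimal sketch for recording the runtime details that checkpoint asks for, assuming numpy is the package under test; swap in whichever packages your exercise actually imports. `environment_report` is a hypothetical helper name.

```python
import os
import platform
import sys

import numpy as np

def environment_report() -> dict[str, str]:
    """Collect the facts worth writing down next to a benchmark result."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "logical_cpus": str(os.cpu_count()),  # may be "None" if undetectable
        "numpy": np.__version__,
    }

for key, value in environment_report().items():
    print(f"{key:>15}: {value}")
```

Pasting this output next to each timing makes it obvious when two results are not comparable (different Python version, CPU, or numpy build).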

EXERCISE

Create a function that computes a rolling average over a list (each output is the mean of the current element and the previous N-1 elements; decide how to handle the first N-1 positions, for example by emitting only full windows). Implement it: (1) with a Python loop, (2) using numpy convolution. Benchmark both with a list of 100,000 floats and a window size of 100, check that the two results match, and report the speedup.
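One possible reference solution to check yours against, assuming only full windows are emitted (so the output has `len(values) - window + 1` entries). The loop version is deliberately naive and re-sums every window; a running-sum loop would narrow the gap. The convolution uses a flat kernel of `1/window`, so each output is the sum of one window divided by its length. Function names are hypothetical.

```python
import timeit

import numpy as np

def rolling_mean_loop(values, window):
    # Full windows only: output[k] is the mean of values[k : k + window]
    out = []
    for end in range(window, len(values) + 1):
        total = 0.0
        for v in values[end - window:end]:
            total += v
        out.append(total / window)
    return out

def rolling_mean_numpy(values, window):
    # Convolving with a flat 1/window kernel averages each window in one pass;
    # mode="valid" keeps only positions where the kernel fully overlaps the data
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=np.float64), kernel, mode="valid")

rng = np.random.default_rng(0)
data = rng.random(100_000).tolist()  # 100,000 floats as a Python list
window = 100

loop_result = rolling_mean_loop(data, window)
numpy_result = rolling_mean_numpy(data, window)
assert np.allclose(loop_result, numpy_result)  # correctness before speed

loop_time = timeit.timeit(lambda: rolling_mean_loop(data, window), number=1)
numpy_time = timeit.timeit(lambda: rolling_mean_numpy(data, window), number=1)
print(f"loop:    {loop_time:.3f}s")
print(f"numpy:   {numpy_time:.4f}s")
print(f"speedup: {loop_time / numpy_time:.0f}x")
```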

← Chapter 26: Performance Profiling
Chapter 28: Virtual Environments Deep Dive →