RUNLOCALAI · v38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Python for AI — Zero to Useful

23. Text Processing with Regex

Chapter 23 of 36 · 15 min
KEY INSIGHT

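Compile regex patterns you use repeatedly: `pattern = re.compile(r'...')`, then call `pattern.search(text)` instead of `re.search(r'...', text)`. The module-level functions already cache recently used patterns, so compiling mostly saves the cache lookup on each call; the bigger wins are readability and defining each pattern in exactly one place.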
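A minimal sketch of the compile-once habit: the pattern is compiled at module level and the resulting `re.Pattern` object is reused inside a loop. The names `EPOCH_RE` and `epochs_seen` are illustrative, not part of any library.

```python
import re

# Compiled once at module level; the Pattern object is reused for every line.
EPOCH_RE = re.compile(r'Epoch (\d+)')

def epochs_seen(lines: list[str]) -> list[int]:
    """Return every epoch number found in the given log lines (illustrative helper)."""
    epochs = []
    for line in lines:
        match = EPOCH_RE.search(line)  # same result as re.search(r'Epoch (\d+)', line)
        if match:
            epochs.append(int(match.group(1)))
    return epochs

log_lines = ["Epoch 1: loss=0.9", "warmup done", "Epoch 2: loss=0.7"]
print(epochs_seen(log_lines))  # [1, 2]
print(EPOCH_RE.pattern)        # Epoch (\d+)
```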

Beyond simple matching, regex excels at structured extraction and validation. AI pipelines frequently need to pull information from logs, parse model outputs, or validate input formats.
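For validation, the key difference is `re.fullmatch()` versus `re.search()`: `search` succeeds if the pattern appears anywhere in the string, while `fullmatch` requires the whole string to match. The sketch below assumes a hypothetical `name:size` model-tag format such as `llama3:8b`; adjust the pattern to whatever format your pipeline actually accepts.

```python
import re

# Hypothetical tag format for illustration: lowercase name, a colon, then a size like "8b".
MODEL_TAG_RE = re.compile(r'[a-z0-9][a-z0-9._-]*:\d+b')

def is_valid_model_tag(tag: str) -> bool:
    """True only if the *entire* string matches the tag format."""
    return MODEL_TAG_RE.fullmatch(tag) is not None

print(is_valid_model_tag("llama3:8b"))                  # True
print(is_valid_model_tag("llama3:8b trailing"))         # False: extra text after the tag
print(bool(MODEL_TAG_RE.search("llama3:8b trailing")))  # True: search() accepts a partial match
```

Using `search()` for validation is a common bug: it would accept input with arbitrary junk before or after a valid-looking substring.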

Real-world patterns you'll encounter:

```python
import re
from dataclasses import dataclass

@dataclass
class ModelMetrics:
    epoch: int
    loss: float
    accuracy: float

# \d*\.?\d+ matches "5", "0.89" or ".5", but never a lone "." or a
# sentence-ending period, either of which would make float() raise ValueError.
METRICS_PATTERN = re.compile(r'Epoch (\d+):\s*loss=(\d*\.?\d+),\s*acc=(\d*\.?\d+)')

def parse_metrics_log(log_line: str) -> ModelMetrics | None:
    """Extract metrics from log lines like 'Epoch 3: loss=1.23, acc=0.89'."""
    match = METRICS_PATTERN.search(log_line)

    if not match:
        return None

    return ModelMetrics(
        epoch=int(match.group(1)),
        loss=float(match.group(2)),
        accuracy=float(match.group(3))
    )

# Complex example: extract confidence scores from LLM responses with varied formats.
# Patterns are compiled once and tried in order; the first one that matches wins.
CONFIDENCE_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r'confidence[:\s]+(\d*\.?\d+)',           # "confidence: 0.87"
        r'conf\s*=?\s*(\d*\.?\d+)',               # "conf = 0.87" or "conf 0.87"
        r'\((\d*\.?\d+)\s*(?:->|:)\s*\w+\)',      # "(0.87 -> positive)"
        r'\[\s*(\d*\.?\d+)\s*,',                  # "[0.87, 0.12, ...]"
    )
]

def extract_confidence(response: str) -> float | None:
    """Extract confidence scores from various LLM response formats."""
    for pattern in CONFIDENCE_PATTERNS:
        match = pattern.search(response)
        if match:
            return float(match.group(1))

    return None

# Test
log = "Epoch 5: loss=0.234, acc=0.891"
metrics = parse_metrics_log(log)
print(metrics)  # ModelMetrics(epoch=5, loss=0.234, accuracy=0.891)

response = "Analysis complete. confidence: 0.923."
print(extract_confidence(response))  # 0.923 (the sentence-ending period is not captured)
```

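In type hints, the | (pipe) operator builds a union type (PEP 604, Python 3.10+). `ModelMetrics | None` is equivalent to `Optional[ModelMetrics]`: the function returns either a `ModelMetrics` instance or `None`. On Python 3.9 and earlier, this annotation raises `TypeError` when the function is defined, unless the module starts with `from __future__ import annotations`.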

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
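A small helper for the checkpoint, assuming you only use the standard library: since `re` ships with Python, the Python version and implementation stand in for the package version. `checkpoint_record` is a hypothetical name; record whatever fields your own setup needs.

```python
import platform
import re

def checkpoint_record(example_output: str) -> dict[str, str]:
    """Collect the runtime details the checkpoint asks you to write down (hypothetical helper)."""
    return {
        "python": platform.python_version(),                 # e.g. "3.12.4"
        "implementation": platform.python_implementation(),  # e.g. "CPython"
        "platform": platform.platform(),                     # OS and architecture string
        "output": example_output,
    }

# Smallest example from this chapter: a single regex search on one log line.
match = re.search(r'Epoch (\d+)', "Epoch 5: loss=0.234, acc=0.891")
record = checkpoint_record(match.group(1) if match else "no match")
for key, value in record.items():
    print(f"{key}: {value}")
```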

EXERCISE

Write a function normalize_whitespace(text: str) -> str that:

  1. Replaces runs of spaces and tabs with a single space
  2. Removes leading/trailing whitespace from each line
  3. Collapses consecutive blank lines into one

Test it on a string with messy formatting.
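One possible solution sketch, applying three `re.sub()` passes in order; try your own version first. Stripping the whole result at the end, which also drops blank lines at the very start and end of the text, is an extra design choice beyond the three requirements.

```python
import re

def normalize_whitespace(text: str) -> str:
    # 1. Collapse runs of spaces/tabs (but not newlines) into a single space.
    text = re.sub(r'[ \t]+', ' ', text)
    # 2. Strip spaces/tabs at the start and end of every line;
    #    re.MULTILINE makes ^ and $ match at each line boundary.
    text = re.sub(r'^[ \t]+|[ \t]+$', '', text, flags=re.MULTILINE)
    # 3. Three or more newlines mean two or more blank lines: keep just one blank line.
    text = re.sub(r'\n{3,}', '\n\n', text)
    # Optional extra: drop blank lines at the very start and end.
    return text.strip()

messy = "  Hello\t\tworld  \n\n\n\n   second   line\t\n"
print(repr(normalize_whitespace(messy)))  # 'Hello world\n\nsecond line'
```

The order matters: collapsing spaces and tabs first, then stripping each line, turns whitespace-only lines into empty ones, so step 3 sees them as blank lines and collapses them too.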

← Chapter 22: Regular Expressions · Chapter 24: Data Visualization with Matplotlib →