RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Document Processing with Local AI

14. Error Handling

Chapter 14 of 18 · 20 min
KEY INSIGHT

Error handling determines system reliability. Categorize failures, implement appropriate recovery strategies, preserve context for debugging, and route failures to dead letter queues for analysis.

Document processing fails often: PDFs are password-protected, scanned images lack an extractable text layer, file paths contain special characters, and AI models return output in unexpected formats. Reliable error handling keeps one bad document from causing cascading failures.

Categorizing Errors

Not all errors are equal. Distinguish between:

  • Recoverable: Can retry or skip gracefully (corrupt PDF page, timeout)
  • Fatal: Cannot proceed (encrypted file, unsupported format)
  • Partial: Some content extracted, some lost (multi-page PDF with one bad page)
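
The three categories above can be made explicit in code so that later stages can branch on them. The following is a minimal sketch, not a fixed taxonomy: the exception classes `EncryptedFileError`, `UnsupportedFormatError`, and `PartialExtractionError` are hypothetical names a pipeline would define itself, and treating unknown errors as fatal is an assumption you may want to reverse.

```python
from enum import Enum


class ErrorCategory(Enum):
    RECOVERABLE = "recoverable"  # retry or skip gracefully
    FATAL = "fatal"              # cannot proceed with this document
    PARTIAL = "partial"          # some content extracted, some lost


# Hypothetical exception types, defined here for illustration only.
class EncryptedFileError(Exception):
    pass


class UnsupportedFormatError(Exception):
    pass


class PartialExtractionError(Exception):
    pass


def categorize(exc):
    """Map an exception instance to an ErrorCategory."""
    if isinstance(exc, PartialExtractionError):
        return ErrorCategory.PARTIAL
    if isinstance(exc, (TimeoutError, ConnectionError)):
        # Transient conditions are worth another attempt.
        return ErrorCategory.RECOVERABLE
    if isinstance(exc, (EncryptedFileError, UnsupportedFormatError)):
        return ErrorCategory.FATAL
    # Assumption: anything unrecognized is treated as fatal so it is
    # never retried blindly. A more lenient pipeline could flip this.
    return ErrorCategory.FATAL
```

A caller can then decide per category: retry on `RECOVERABLE`, keep what was extracted on `PARTIAL`, and send `FATAL` documents straight to failure handling.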

Retry Logic with Exponential Backoff

Transient failures often succeed on retry:

import functools
import time

import pymupdf  # PyMuPDF; older releases are imported as `fitz`

def retry(max_attempts=3, base_delay=1.0, backoff_factor=2.0):
    """Retry the wrapped function, multiplying the delay after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            last_exception = None
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_exception = e
                    # Sleep only if another attempt remains.
                    if attempt < max_attempts - 1:
                        time.sleep(delay)
                        delay *= backoff_factor
            # All attempts failed: re-raise the most recent error.
            raise last_exception
        return wrapper
    return decorator

@retry(max_attempts=3, base_delay=2.0, backoff_factor=2.0)
def extract_text_from_pdf(path):
    # The context manager closes the document even if a page fails.
    with pymupdf.open(path) as doc:
        return "".join(page.get_text() for page in doc)
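
The decorator above retries on every exception, including ones that will never succeed (an encrypted file stays encrypted). One possible variation, sketched below, accepts a tuple of exception types to retry and lets everything else propagate immediately. The name `retry_on` and the `retry_exceptions` and `sleep` parameters are illustrative assumptions, not part of the original decorator.

```python
import functools
import time


def retry_on(retry_exceptions, max_attempts=3, base_delay=1.0,
             backoff_factor=2.0, sleep=time.sleep):
    """Retry only when one of `retry_exceptions` is raised.

    `sleep` is injectable so the wait can be replaced in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(delay)
                    delay *= backoff_factor
        return wrapper
    return decorator
```

With `max_attempts=3`, `base_delay=2.0`, and `backoff_factor=2.0`, a function that keeps failing waits 2 seconds after the first failure and 4 seconds after the second; the third failure is raised without a further wait.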

Graceful Degradation

When full extraction fails, attempt partial extraction:

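def extract_with_fallback(path):
    # ocr_extraction is assumed to be defined elsewhere in the pipeline.
    try:
        return {"method": "full", "content": extract_text_from_pdf(path)}
    except Exception as e:
        print(f"Full extraction failed: {e}")
        try:
            # Fall back to OCR when direct text extraction fails.
            return {"method": "ocr", "content": ocr_extraction(path)}
        except Exception as e2:
            # Both methods failed: report both errors to the caller.
            return {"method": "none", "error": f"Both methods failed: {e}, {e2}"}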
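
The fallback above is all-or-nothing per method. For the partial case described earlier (a multi-page PDF with one bad page), a per-page loop can keep what succeeded and record what failed. This is a minimal sketch under stated assumptions: `pages` is any iterable and `extract_page` is a caller-supplied function. With PyMuPDF, something like `lambda page: page.get_text()` would fit, but the helper itself does not depend on any PDF library, and the `failed_pages` key is an illustrative addition to the result shape.

```python
def extract_pages_partial(pages, extract_page):
    """Extract each page independently, collecting failures instead of aborting."""
    contents = []
    failed_pages = []
    for index, page in enumerate(pages):
        try:
            contents.append(extract_page(page))
        except Exception as exc:
            # Record which page failed and why, then continue.
            failed_pages.append({"page": index, "error": str(exc)})

    if not failed_pages:
        method = "full"
    elif contents:
        method = "partial"
    else:
        method = "none"

    return {
        "method": method,
        "content": "".join(contents),
        "failed_pages": failed_pages,
    }
```

Downstream code can then treat `"partial"` results as usable but flagged, rather than discarding a whole document because of one unreadable page.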

Error Context Preservation

Include context (the file path, the failure reason, and a traceback where useful) when logging failures so they can be debugged later:

import logging

logger = logging.getLogger(__name__)

# process_document, PasswordEncryptedError, and CorruptPDFError are assumed
# to be defined elsewhere in the pipeline.
def safe_process(path):
    try:
        result = process_document(path)
        return {"status": "success", "result": result}
    except PasswordEncryptedError:
        # Expected condition: log without a traceback and skip.
        logger.warning("Password-protected document skipped: %s", path)
        return {"status": "skipped", "reason": "encrypted", "path": path}
    except CorruptPDFError:
        # exc_info=True attaches the traceback to the log record.
        logger.error("Corrupt PDF could not be processed: %s", path, exc_info=True)
        return {"status": "failed", "reason": "corrupt", "path": path}
    except Exception as e:
        logger.error("Unexpected error processing %s: %s", path, e, exc_info=True)
        return {"status": "failed", "reason": "unknown", "error": str(e), "path": path}

Dead Letter Queue Pattern

Move failed documents to a dead letter queue for later analysis. Here the queue is a directory plus a JSONL log recording what failed and why:

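import json
import shutil
from datetime import datetime
from pathlib import Path

def handle_failure(path, error_info, dead_letter_dir="/errors"):
    dead_letter = Path(dead_letter_dir)
    # Create the queue directory on first use.
    dead_letter.mkdir(parents=True, exist_ok=True)

    # Note: a file with the same name already in the queue may be overwritten.
    dest = dead_letter / Path(path).name
    shutil.move(str(path), str(dest))

    # Append one JSON object per line so the log can be read incrementally.
    error_log = dead_letter / "error_log.jsonl"
    with open(error_log, "a", encoding="utf-8") as f:
        f.write(json.dumps({
            "original_path": str(path),
            "failure_path": str(dest),
            "error": error_info,
            "timestamp": datetime.now().isoformat()
        }) + "\n")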
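
The "later analysis" step can start very simply: read the JSONL log back and count entries per error. The sketch below assumes the log format written above (one JSON object per line with an `error` field); the function name `summarize_dead_letters` and the placeholder labels for unparseable or incomplete lines are illustrative.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_dead_letters(error_log_path):
    """Count dead-letter entries per error value in a JSONL log."""
    counts = Counter()
    log_path = Path(error_log_path)
    if not log_path.exists():
        return counts  # nothing has failed yet

    with log_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                counts["<unparseable line>"] += 1
                continue
            if not isinstance(entry, dict):
                counts["<unparseable line>"] += 1
                continue
            # error_info may be a string or a structure; use its text form as the key.
            counts[str(entry.get("error", "<missing>"))] += 1
    return counts
```

Calling `summarize_dead_letters("/errors/error_log.jsonl").most_common()` lists the most frequent failure reasons first, which is usually where fixes pay off most.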

Validation Before Processing

Validate documents before processing to fail fast:

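import pymupdf  # PyMuPDF; older releases are imported as `fitz`

def validate_pdf(path):
    try:
        # The context manager closes the file on every return path.
        with pymupdf.open(path) as doc:
            if doc.page_count == 0:
                return False, "PDF has no pages"
        return True, "Valid PDF"
    except Exception as e:
        return False, str(e)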
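
Opening a file with a PDF library is itself relatively expensive, so cheaper checks can run first: does the file exist, is it non-empty, and does it contain the `%PDF-` signature near the start. The following is a minimal sketch; the function name `quick_precheck` and its messages are illustrative, and passing this check does not guarantee the PDF is readable.

```python
from pathlib import Path


def quick_precheck(path):
    """Cheap checks to run before handing a file to a PDF library."""
    p = Path(path)
    if not p.is_file():
        return False, "Not a file"
    if p.stat().st_size == 0:
        return False, "File is empty"

    # Some producers prepend bytes before the header, so look within the
    # first 1024 bytes rather than only at offset 0.
    with p.open("rb") as f:
        head = f.read(1024)
    if b"%PDF-" not in head:
        return False, "Missing %PDF- header"

    return True, "Looks like a PDF"
```

Files that fail this check can skip the library call entirely; files that pass still go through the full validation step.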
EXERCISE

Create an error handler class that categorizes exceptions, applies appropriate recovery strategies, logs failures with full context, and updates a metrics counter for each error type.
