COURSE · BLD · I019
Edge AI: Mobile and IoT
Learn Edge AI for mobile and IoT through RunLocalAI's practical lens: edge, mobile, IoT, and Raspberry Pi deployment, hardware fit, runtime settings, verification habits, and local-vs-cloud tradeoffs.
PREREQUISITES
- B004
- I016
title: Edge AI - Mobile and IoT
description: Deploy neural networks on Raspberry Pi, iOS, and Android with ONNX, TFLite, Core ML, and ML Kit. Master quantization, pruning, benchmarking, power optimization, and OTA updates for real-world edge deployments.
difficulty: intermediate
duration: 8 hours
prerequisites:
- Basic Python proficiency
- Understanding of neural network fundamentals
- Familiarity with command line interface
tags:
- edge-ai
- mobile
- iot
- onnx
- tflite
- coreml
- quantization
- raspberry-pi
- mobile-deployment
order: 4
---
CHAPTERS
- 01 Edge AI Overview (10 min): Edge devices trade raw throughput for low-latency, offline operation and minimal bandwidth use; these tradeoffs must drive deployment architecture decisions.
- 02 Raspberry Pi Setup (20 min): Thermal management and storage quality determine Raspberry Pi deployment reliability more than compute performance does.
- 03 ONNX Runtime (20 min): ONNX Runtime's execution provider abstraction makes hardware acceleration portable, but providers must be selected explicitly; otherwise inference falls back to the CPU.
- 04 Model Conversion to ONNX (20 min): ONNX export preserves the model's computation graph, but dynamic control flow and unsupported operators must be resolved before conversion, typically through tracing or scripting.
- 05 TFLite Conversion (20 min): TFLite integer quantization requires representative calibration data, and the distribution of that data directly determines the quantized model's accuracy.
- 06 Core ML for iOS (20 min): Core ML models benefit when input/output tensor layouts match what the runtime expects; for vision models, getting the layout right (for example `NHWC` versus channel-first) avoids expensive transpositions.
- 07 ML Kit for Android (20 min): ML Kit takes over thread management and hardware selection, but preprocessing pipelines and memory management remain the developer's responsibility; for example, copy bitmap data to prevent early-recycling errors.
- 08 Extreme Quantization (20 min): INT8 quantization typically shrinks an FP32 model about 4x with under 1% accuracy loss, but aggressive quantization below 4 bits requires quantization-aware training (QAT) or careful accuracy monitoring.
- 09 2-bit and 3-bit Quantization (20 min): 2-bit and 3-bit quantization require codebook-based approaches or quantization-aware training to compensate for the severe information loss from having only 4 or 8 discrete values per weight.
- 10 Model Pruning for Edge (20 min): Structured pruning produces models that map directly to efficient hardware primitives; unstructured sparsity requires specialized sparse matrix libraries that may not be available on edge devices.
- 11 Edge Benchmarking (20 min): Edge inference latency varies widely because of thermal throttling, so P95 and P99 latency matter more than mean latency for latency-sensitive applications.
- 12 Power Optimization (20 min): Batching amortizes the energy consumed per inference sample by reducing idle time between operations; a batch size of 4-8 is often the most power-efficient choice for edge deployment.
- 13 Offline Operation (20 min): Offline operation combines local model storage, prediction caching, and eventual-consistency patterns; mobile storage limits mean models usually need pruning and quantization.
- 14 Edge-Cloud Hybrid (20 min): Hybrid edge-cloud inference requires explicit decision logic around confidence thresholds, latency budgets, and priority routing, not simply "edge when possible."
- 15 Model Updates OTA (20 min): OTA model updates reduce bandwidth and enable hot-swapping of models in production, but require manifest signing, rollback mechanisms, and careful version-compatibility checking.
- 16 Edge Security (20 min): Edge security requires defense in depth: model encryption at rest, input validation before inference, secure update channels, and hardware-backed key storage, rather than any single measure.
- 17 Testing on Device (25 min): Device testing requires replicating production conditions, including memory pressure, thermal states, and network variability, not just happy-path unit tests on development machines.
- 18 Edge Deployment Project (30 min): Production edge deployments integrate inference engines with model management, security, metrics, and graceful shutdown handling; each component requires the same engineering rigor as the ML model itself.
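
As a taste of the Edge Benchmarking chapter's point about tail latency, here is a minimal sketch of how mean, P50, P95, and P99 could be computed from timed inference runs. It uses only the Python standard library. The helper names (`percentile`, `summarize`, `benchmark`) and the nearest-rank percentile method are assumptions made for illustration, not the course's own tooling, and `run_once` stands in for whatever inference call you are measuring.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at
    least pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Report the mean alongside median and tail latencies."""
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


def benchmark(run_once: Callable[[], None],
              warmup: int = 5,
              iters: int = 100) -> Dict[str, float]:
    """Time a hypothetical inference callable after a short warm-up."""
    for _ in range(warmup):
        run_once()  # warm caches before measuring
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies_ms)


if __name__ == "__main__":
    # Synthetic data: 90 normal runs at 10 ms, 10 throttled runs at 60 ms.
    synthetic = [10.0] * 90 + [60.0] * 10
    print(summarize(synthetic))
```

With the synthetic data, the mean comes out at 15 ms while P95 and P99 are both 60 ms, which is the gap a mean-only report would hide. On a real device you would pass your own inference call to `benchmark` and run it long enough for the device to reach a steady thermal state.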