RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
COURSE · BLD · I019

Edge AI: Mobile and IoT

Learn Edge AI for mobile and IoT through RunLocalAI's practical lens: edge, mobile, IoT, and Raspberry Pi deployment, hardware fit, runtime settings, verification habits, and local-vs-cloud tradeoffs.

18 chapters·10h·Builder track·By Fredoline Eruo
PREREQUISITES
  • B004
  • I016
---
title: Edge AI - Mobile and IoT
description: Deploy neural networks on Raspberry Pi, iOS, and Android with ONNX, TFLite, Core ML, and ML Kit. Master quantization, pruning, benchmarking, power optimization, and OTA updates for real-world edge deployments.
difficulty: intermediate
duration: 8 hours
prerequisites:
  - Basic Python proficiency
  - Understanding of neural network fundamentals
  - Familiarity with command line interface
tags:
  - edge-ai
  - mobile
  - iot
  - onnx
  - tflite
  - coreml
  - quantization
  - raspberry-pi
  - mobile-deployment
order: 4
---
CHAPTERS
  1. Edge AI Overview · 10 min
     Edge devices trade raw throughput for offline operation, no network round-trip latency, and reduced bandwidth use; these tradeoffs should drive deployment architecture decisions.
  2. Raspberry Pi Setup · 20 min
     Thermal management and storage quality determine Raspberry Pi deployment reliability more than compute performance.
  3. ONNX Runtime · 20 min
     ONNX Runtime's execution provider abstraction makes hardware acceleration portable, but providers must be selected explicitly to avoid defaulting to CPU.
  4. Model Conversion to ONNX · 20 min
     ONNX export preserves the model's computation graph, but dynamic control flow and unsupported operators must be handled through tracing or scripting before conversion.
  5. TFLite Conversion · 20 min
     TFLite integer quantization requires representative calibration data; the distribution of that data directly determines the quantized model's accuracy.
  6. Core ML for iOS · 20 min
     Core ML models benefit when input/output tensor layouts match device expectations; `NHWC` format for vision models avoids expensive transpositions.
  7. ML Kit for Android · 20 min
     ML Kit handles thread management and hardware selection, but preprocessing pipelines and memory management remain the developer's responsibility; copy bitmap data to prevent early-recycling errors.
  8. Extreme Quantization · 20 min
     INT8 quantization typically shrinks a model 4x relative to FP32 with <1% accuracy loss, but aggressive quantization below 4 bits requires quantization-aware training (QAT) or careful accuracy monitoring.
  9. 2-bit and 3-bit Quantization · 20 min
     2-bit and 3-bit quantization require codebook-based approaches or quantization-aware training to compensate for the severe information loss from having only 4 or 8 discrete values per weight.
  10. Model Pruning for Edge · 20 min
     Structured pruning produces models that map directly onto efficient hardware primitives; unstructured sparsity requires specialized sparse matrix libraries that may not be available on edge devices.
  11. Edge Benchmarking · 20 min
     Edge inference latency varies widely because of thermal throttling; P95 and P99 latency matter more than mean latency for latency-sensitive applications.
  12. Power Optimization · 20 min
     Batching amortizes the energy consumed per inference sample by reducing idle time between operations; a batch size of 4-8 typically optimizes power efficiency for edge deployment.
  13. Offline Operation · 20 min
     Offline operation combines local model storage, prediction caching, and eventual-consistency patterns; mobile storage limits mean models need pruning and quantization to fit.
  14. Edge-Cloud Hybrid · 20 min
     Hybrid edge-cloud inference requires decision logic built around confidence thresholds, latency budgets, and priority routing, not simply "edge when possible."
  15. Model Updates OTA · 20 min
     OTA model updates reduce bandwidth and enable hot-swapping of models in production, but require manifest signing, rollback mechanisms, and careful version compatibility checking.
  16. Edge Security · 20 min
     Edge security requires defense in depth: model encryption at rest, input validation before inference, secure update channels, and hardware-backed key storage, not any single method.
  17. Testing on Device · 25 min
     Device testing requires replicating production conditions, including memory pressure, thermal states, and network variability, not just happy-path unit tests on development machines.
  18. Edge Deployment Project · 30 min
     Production edge deployments integrate inference engines with model management, security, metrics, and graceful shutdown handling; each component requires the same engineering rigor as the ML model itself.
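
The size and precision figures in the two quantization chapters follow from simple arithmetic, sketched below in Python. The helper names and the 10-million-parameter model are hypothetical, and the estimate counts weight storage only; it ignores the scales, zero points, and codebook overhead that real quantized formats add.

```python
def quant_levels(bits: int) -> int:
    """Number of distinct values a weight can take at a given bit width."""
    return 2 ** bits


def weight_storage_mb(num_params: int, bits: int) -> float:
    """Approximate weight storage in megabytes (weights only, no metadata)."""
    return num_params * bits / 8 / 1_000_000


if __name__ == "__main__":
    params = 10_000_000  # hypothetical model size, for illustration only
    for bits in (32, 8, 4, 3, 2):
        print(f"{bits:>2}-bit: {quant_levels(bits):>10} levels, "
              f"{weight_storage_mb(params, bits):6.2f} MB")
```

Under these assumptions, moving from 32-bit to 8-bit weights gives the 4x reduction cited for INT8, while 2-bit and 3-bit weights leave only 4 and 8 representable values per weight.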
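
The benchmarking chapter's point about tail latency can be illustrated with a minimal sketch that uses only the Python standard library. The `infer` callable, the run counts, and the function names are assumptions made for illustration, not part of the course material; the percentile uses the nearest-rank definition with integer percentiles.

```python
import time


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Ceiling of pct * n / 100 using integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def benchmark(infer, runs=200, warmup=20):
    """Time a hypothetical zero-argument `infer` callable; return latencies in ms."""
    for _ in range(warmup):  # discard warm-up runs (caches, lazy initialization)
        infer()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Report mean alongside median and tail percentiles."""
    return {
        "mean": sum(latencies) / len(latencies),
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }
```

As a usage example, `summarize([10.0] * 95 + [50.0] * 5)` reports a mean of 12.0 ms and a P95 of 10.0 ms but a P99 of 50.0 ms, showing how a small share of slow (for example, throttled) runs can stay hidden in the mean.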