RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.co · Independently operated
Edge AI: Mobile and IoT

06. Core ML for iOS

Chapter 6 of 18 · 20 min
KEY INSIGHT

Core ML models benefit from input/output tensor layouts that match what the converted model expects. Feeding a vision model in its native layout (`NHWC` for TensorFlow/Keras-derived models) avoids an expensive transposition on every inference.
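To make the layout point concrete, the sketch below uses plain Python (no Core ML involved) to compute where one element sits in a flat buffer under each layout, and shows that converting between layouts has to touch every element. The function names are illustrative and not part of any Core ML API.

```python
def nhwc_offset(h, w, c, H, W, C):
    """Flat index of element (h, w, c) when channels are stored last."""
    return (h * W + w) * C + c


def nchw_offset(h, w, c, H, W, C):
    """Flat index of the same element when channels are stored first."""
    return (c * H + h) * W + w


def nhwc_to_nchw(flat, H, W, C):
    """Reorder a flat NHWC buffer (batch of 1) into NCHW order.

    Every element is read and written once, which is the extra
    per-inference work a layout mismatch introduces.
    """
    out = [0.0] * (H * W * C)
    for h in range(H):
        for w in range(W):
            for c in range(C):
                out[nchw_offset(h, w, c, H, W, C)] = flat[nhwc_offset(h, w, c, H, W, C)]
    return out


if __name__ == "__main__":
    # A 2x2 RGB image stored pixel by pixel (NHWC): R, G, B, R, G, B, ...
    pixels = [float(v) for v in range(12)]
    print(nhwc_to_nchw(pixels, H=2, W=2, C=3))
```

For a 224x224 RGB input the loop body runs 150,528 times per frame, which is why it pays to hand the model data in the layout it already expects.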

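Apple's Core ML framework provides hardware-accelerated inference on iOS devices. With current coremltools releases, models convert from TensorFlow/Keras or PyTorch; ONNX conversion was available only through a legacy converter that was removed in coremltools 6. The converted model is saved either as a single `.mlmodel` file (neural network format) or as an `.mlpackage` directory (ML program format, iOS 15+). The Neural Engine (ANE) in Apple Silicon and recent A-series chips delivers roughly 10-40 TOPS by Apple's published figures, but only for operators it supports; unsupported operators fall back to the GPU or CPU.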

Prerequisites include Xcode 14+, Python with coremltools, and a Mac: conversion itself also runs on Linux, but Xcode and running predictions through coremltools require macOS. Installation:

pip install coremltools

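Conversion from ONNX relies on the legacy converter, so it needs coremltools 5.x or earlier (with newer releases, convert from the original PyTorch or TensorFlow model instead). The pattern is:

# pip install "coremltools<6" onnx
import coremltools as ct

# Convert the ONNX file to a Core ML neural network model.
# Input and output names and shapes are read from the ONNX graph.
mlmodel = ct.converters.onnx.convert(
    model="model.onnx",
    minimum_ios_deployment_target="13",  # the legacy converter takes the target as a string
)

# Save the model (neural network format, single .mlmodel file)
mlmodel.save("model.mlmodel")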

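Keras model conversion uses the unified ct.convert entry point:

import coremltools as ct
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")

# Convert with input shape specification (NHWC, as Keras expects).
# The input name can be omitted for a TensorFlow model with a single input.
mlmodel = ct.convert(
    model,
    inputs=[ct.TensorType(shape=(1, 224, 224, 3))],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.iOS15
)

# ML programs must be saved as an .mlpackage directory, not a .mlmodel file
mlmodel.save("mobilenetv2.mlpackage")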

Swift inference code structure:

import CoreML

enum ModelError: Error {
    case tensorCreationFailed
    case outputExtractionFailed
}

// modelURL must point at a compiled model (.mlmodelc), such as the one Xcode bundles with the app.
func loadAndRunModel(modelURL: URL, imageData: Data) throws -> [Double] {
    let config = MLModelConfiguration()
    config.computeUnits = .all // or .cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine (iOS 16+)

    let model = try MLModel(contentsOf: modelURL, configuration: config)

    let inputShape: [NSNumber] = [1, 224, 224, 3]
    guard let inputTensor = try? MLMultiArray(shape: inputShape, dataType: .float32) else {
        throw ModelError.tensorCreationFailed
    }

    // Copy image data into tensor
    // ... pixel copy loop ...

    // "input" and "output" must match the feature names of the converted model.
    let inputProvider = try MLDictionaryFeatureProvider(
        dictionary: ["input": MLFeatureValue(multiArray: inputTensor)]
    )
    let outputFeatures = try model.prediction(from: inputProvider)

    guard let output = outputFeatures.featureValue(for: "output")?.multiArrayValue else {
        throw ModelError.outputExtractionFailed
    }

    return (0..<output.count).map { output[$0].doubleValue }
}
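The elided pixel copy loop depends on how the frame is stored. As one possible shape for it, the Python sketch below converts tightly packed RGBA bytes into the flat `[1, 224, 224, 3]` float layout used above, dropping the alpha channel. The scaling to [-1, 1] matches the Keras MobileNetV2 preprocessing and is an assumption for any other model; `rgba_to_nhwc_floats` is a hypothetical helper name. The same index arithmetic carries over to filling an `MLMultiArray` in Swift.

```python
def rgba_to_nhwc_floats(rgba, height, width):
    """Convert packed RGBA bytes to a flat NHWC float list (batch of 1).

    Assumes 4 bytes per pixel, rows stored top to bottom with no padding.
    Values are scaled from [0, 255] to [-1, 1] (assumed preprocessing).
    """
    expected = height * width * 4
    if len(rgba) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(rgba)}")

    out = [0.0] * (height * width * 3)
    for pixel in range(height * width):
        src = pixel * 4  # RGBA stride in the source buffer
        dst = pixel * 3  # RGB stride in the tensor
        for channel in range(3):  # the alpha byte at src + 3 is skipped
            out[dst + channel] = rgba[src + channel] / 127.5 - 1.0
    return out


if __name__ == "__main__":
    # One white pixel followed by one black pixel, both fully opaque.
    frame = bytes([255, 255, 255, 255, 0, 0, 0, 255])
    print(rgba_to_nhwc_floats(frame, height=1, width=2))
```

Camera buffers on iOS are often BGRA with padded rows, so a real loop would also need to honour the buffer's channel order and bytes-per-row value.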

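Model package inspection shows what was saved, but not where each layer will run:

# A .mlpackage is a directory; a .mlmodel is a single protobuf file, not an archive
ls -R model.mlpackage

# Per-layer compute unit placement (CPU, GPU, ANE) is not stored in the package;
# generate a performance report from the model viewer in Xcode 14+ to see it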
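For a scripted look at what a saved model contains, the sketch below walks a directory and reports each file with its size. It assumes the model was saved as an `.mlpackage` directory; `model.mlpackage` is a placeholder path and `list_package_files` is a hypothetical helper. It shows file layout and weight size only, and says nothing about which compute unit will run each layer.

```python
import os


def list_package_files(package_dir):
    """Return (relative_path, size_in_bytes) for every file under package_dir."""
    entries = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, package_dir)
            entries.append((rel, os.path.getsize(full)))
    return sorted(entries)


if __name__ == "__main__":
    # "model.mlpackage" is a placeholder; nothing is printed if it does not exist.
    for rel, size in list_package_files("model.mlpackage"):
        print(f"{size:>12,d}  {rel}")
```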

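Performance tuning requires understanding thermal state. Sustained load raises device temperature, and ANE execution throttles once thresholds are exceeded. Profile after the device has reached thermal equilibrium (about 15 minutes of moderate use) so that measurements reflect sustained rather than cold-start performance.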
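A measurement loop along these lines separates warm-up from timed runs. This is a minimal sketch in Python with a stand-in workload; `benchmark` and `fake_predict` are hypothetical names, and the run counts are arbitrary defaults rather than recommendations. The same structure applies to an on-device loop around the prediction call.

```python
import statistics
import time


def benchmark(predict, warmup_runs=10, timed_runs=50):
    """Time a zero-argument callable, ignoring the first warmup_runs calls.

    Returns (median_ms, worst_ms) over the timed runs.
    """
    for _ in range(warmup_runs):
        predict()

    samples_ms = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        predict()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.median(samples_ms), max(samples_ms)


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the model under test.
    def fake_predict():
        sum(i * i for i in range(10_000))

    median_ms, worst_ms = benchmark(fake_predict)
    print(f"median {median_ms:.3f} ms, worst {worst_ms:.3f} ms")
```

Reporting the median alongside the worst case makes drift visible: a worst case far above the median can indicate that the device state changed during the run, for example through throttling.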

EXERCISE

Convert a MobileNet v2 model to Core ML format, inspect the model's expected inputs/outputs in Xcode's ML Model Viewer, and write Swift inference code that processes a camera frame.
