05. TFLite Conversion
TensorFlow Lite is a runtime and model format optimized for mobile and embedded devices. Converting a TensorFlow SavedModel or Keras model produces a .tflite file; when quantization is enabled, the file also stores the quantization parameters (scale and zero point) of each quantized tensor.
TensorFlow 2.x SavedModel conversion uses the TFLiteConverter API:
import tensorflow as tf
# Convert directly from a SavedModel directory
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Or, for a Keras model:
# model = tf.keras.models.load_model("model.h5")
# converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optional: apply the default optimizations (post-training weight quantization)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# The MLIR-based converter is already the default in recent TF 2.x releases;
# it is set here only for explicitness
converter.experimental_new_converter = True
tflite_model = converter.convert()
# Save the converted model
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
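Post-training quantization is enabled through the converter's optimizations setting; supplying a representative dataset additionally lets the converter quantize activations:
import tensorflow as tf
# Representative dataset for calibration. Random data is a placeholder only;
# use real samples to get meaningful activation ranges.
def representative_dataset_fn():
    for _ in range(100):
        yield [tf.random.normal([1, 224, 224, 3])]
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Optimize.DEFAULT alone gives dynamic range quantization
# (INT8 weights, floating-point activations)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Adding a representative dataset and restricting ops to the INT8 builtins
# also quantizes activations (full integer quantization)
converter.representative_dataset = representative_dataset_fn
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8
]
tflite_quantized_model = converter.convert()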
Post-training integer quantization with full INT8 inference requires calibration data:
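import numpy as np
import tensorflow as tf
def representative_dataset():
    # Use real samples from the target domain
    dataset = np.load("calibration_data.npz")
    for image in dataset["images"]:
        yield [image.reshape(1, 224, 224, 3).astype(np.float32)]
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Quantization is enabled through the optimizations setting
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Fail the conversion if any op cannot be quantized to INT8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Use integer tensors at the model boundary as well
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_int8_model = converter.convert()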
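To make the role of calibration data more concrete, the following is a minimal sketch of the affine mapping used by TFLite quantization, real_value = scale * (quantized_value - zero_point). The helper names (quant_params, quantize, dequantize) and the example range of -1.0 to 3.0 are assumptions made for illustration; the real converter does more than this simple min/max derivation.

```python
def quant_params(rmin, rmax, qmin=0, qmax=255):
    """Derive (scale, zero_point) for an observed float range [rmin, rmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        # Degenerate range (all zeros): any positive scale works.
        return 1.0, qmin
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to the nearest integer level, clamped to [qmin, qmax]."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer level back to its approximate float value."""
    return scale * (q - zero_point)


# Hypothetical activation range observed on calibration samples
scale, zero_point = quant_params(-1.0, 3.0)
q = quantize(0.5, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(f"scale={scale:.6f} zero_point={zero_point} q={q} x_hat={x_hat:.6f}")
```

Values outside the calibrated range are clamped, which is one reason calibration samples should resemble production inputs. For a converted model, the interpreter reports the (scale, zero_point) pair of each tensor under the "quantization" key of its input and output details.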
Interpreting error messages requires context. Errors about invalid tensor dimensions indicate a shape mismatch between the model's built-in input shapes and the inputs supplied at runtime. When accuracy drops after quantization, double-check that the calibration data statistics match the production input distribution.
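A shape mismatch is easier to diagnose when it is checked before the input reaches the interpreter. The sketch below is one possible pre-flight check; shape_mismatch is a hypothetical helper, not part of TensorFlow Lite, and it assumes that -1 marks a dynamic dimension, as it does in the shape_signature entry of the interpreter's input details.

```python
def shape_mismatch(expected, actual):
    """Return a description of the first mismatch, or None if compatible.

    `expected` may contain -1 for a dynamic dimension.
    """
    expected, actual = list(expected), list(actual)
    if len(expected) != len(actual):
        return (f"rank mismatch: model expects {len(expected)} dims, "
                f"got {len(actual)}")
    for axis, (want, got) in enumerate(zip(expected, actual)):
        if want != -1 and want != got:
            return f"axis {axis}: model expects {want}, got {got}"
    return None


# Dynamic batch dimension: any batch size is accepted
print(shape_mismatch([-1, 224, 224, 3], [8, 224, 224, 3]))
# Channels-first data fed to a channels-last model
print(shape_mismatch([1, 224, 224, 3], [1, 3, 224, 224]))
```

With a loaded interpreter, the expected shape would come from the shape_signature field of the first input detail and the actual shape from the array about to be passed to set_tensor.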
Testing TFLite models requires the TensorFlow Lite interpreter:
import numpy as np
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Resize input tensors for different batch sizes
interpreter.resize_tensor_input(input_details[0]["index"], [1, 224, 224, 3])
interpreter.allocate_tensors()
input_data = np.random.random((1, 224, 224, 3)).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
print(f"Output shape: {output.shape}")
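Beyond a single invocation, testing usually also covers latency and agreement with the original model. The following is a minimal sketch of both measurements using only the standard library; measure_latency_ms and max_abs_diff are hypothetical helper names, the lambda is a stand-in workload so the sketch runs without a model file, and the two output lists are made-up values.

```python
import statistics
import time


def measure_latency_ms(invoke, warmup=5, runs=50):
    """Time a zero-argument callable; return (median_ms, max_ms)."""
    for _ in range(warmup):
        invoke()  # warm-up runs are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Stand-in workload (assumption): replace with the interpreter's invoke method
median_ms, worst_ms = measure_latency_ms(lambda: sum(range(1000)))
print(f"median {median_ms:.3f} ms, worst {worst_ms:.3f} ms")

# Made-up flattened outputs from the original and the converted model
diff = max_abs_diff([0.10, 0.70, 0.20], [0.11, 0.68, 0.21])
print(f"max abs diff: {diff:.3f}")
```

With the interpreter above, interpreter.invoke can be passed directly as the callable, and NumPy outputs can be flattened with ravel().tolist() before comparison. A suitable tolerance depends on the model and the quantization mode, so no threshold is fixed here.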
Convert a Keras image classification model to TFLite with dynamic range quantization, measure inference latency, and verify output correctness against the original model.