13. Image Embeddings

Chapter 13 of 18 · 15 min

KEY INSIGHT

Vision embeddings compress visual information into dense vectors that capture semantic content. Embedding dimensionality and normalization both have a significant effect on retrieval accuracy.

Image embeddings transform pixel data into fixed-length vectors where semantically similar images cluster together. The embedding model determines which aspects of similarity matter for your use case.

```python
import base64
from typing import Protocol

import numpy as np


class EmbeddingModel(Protocol):
    """Structural interface for anything that turns images into vectors.

    Note: VertexEmbeddingModel below exposes embed_images() rather than
    these two methods, so it does not satisfy this Protocol as written.
    """

    def embed(self, image_path: str) -> np.ndarray: ...

    def batch_embed(self, image_paths: list[str]) -> list[np.ndarray]: ...


class VertexEmbeddingModel:
    def __init__(self, model_name: str = "imagen-3.0-fast"):
        self.model_name = model_name
        # This class embeds images indirectly: a vision model describes
        # each image, and the description is embedded as text.

    async def embed_images(self, image_paths: list[str]) -> list[list[float]]:
        """
        Generate embeddings via multimodal API.
        Returns list of embedding vectors.
        """
        embeddings = []
        for path in image_paths:
            # Encode image as base64 text
            with open(path, "rb") as f:
                img_b64 = base64.b64encode(f.read()).decode()

            # Use vision model to generate description,
            # then embed the description as a proxy for the image.
            # AsyncVertexAI stands in for an async client that is not
            # defined in this snippet; substitute your SDK's client and
            # adapt the two calls below to its actual API.
            async with AsyncVertexAI() as client:
                desc_response = await client.messages.create(
                    model="gemini-2.0-flash-thinking",
                    messages=[{
                        "role": "user",
                        "content": [
                            {"type": "image", "source": {"type": "base64", "data": img_b64}},
                            {"type": "text", "text": "Describe this image in exactly 10 words."}
                        ]
                    }]
                )
                desc = desc_response.content[0].text

                # Embed description text
                embed_response = await client.models.embed_content(
                    model="text-embedding-005",
                    content=desc
                )
                embeddings.append(embed_response.embedding)
        return embeddings

    def compute_similarity(
        self, emb1: list[float], emb2: list[float]
    ) -> float:
        """Cosine similarity between two embeddings"""
        v1 = np.array(emb1)
        v2 = np.array(emb2)
        norm1 = np.linalg.norm(v1)
        norm2 = np.linalg.norm(v2)
        if norm1 == 0 or norm2 == 0:
            # Cosine similarity is undefined for a zero vector; return 0.0
            # instead of dividing by zero.
            return 0.0
        return float(np.dot(v1, v2) / (norm1 * norm2))

    def batch_similarity_matrix(
        self, embeddings: list[list[float]]
    ) -> np.ndarray:
        """Compute pairwise similarity matrix"""
        n = len(embeddings)
        matrix = np.zeros((n, n))
        # The matrix is symmetric, so compute the upper triangle and mirror it.
        for i in range(n):
            for j in range(i, n):
                sim = self.compute_similarity(embeddings[i], embeddings[j])
                matrix[i, j] = sim
                matrix[j, i] = sim
        return matrix
```

**Common Mistakes:**

- Embedding mismatch: vectors produced by different model versions are not comparable. Pin model versions.
- Ignoring normalization: unnormalized embeddings produce misleading similarity scores.
- Batch size limits: large images or large batches can cause timeouts. Resize images and chunk batches.
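
The normalization point can be made concrete with a small sketch. It assumes embeddings are already available as plain lists of floats. The helper names `l2_normalize` and `cosine_matrix` are illustrative and do not come from any library. Once every row has unit length, pairwise cosine similarity reduces to a single matrix product, which also avoids the nested Python loop.

```python
import numpy as np


def l2_normalize(embeddings, eps=1e-12):
    """Scale each row to unit L2 norm; eps guards against all-zero rows."""
    arr = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    return arr / np.maximum(norms, eps)


def cosine_matrix(embeddings):
    """Pairwise cosine similarity via one matrix product on unit-length rows."""
    unit = l2_normalize(embeddings)
    return unit @ unit.T


# a and b point in the same direction but differ in magnitude;
# c is orthogonal to a (1*-3 + 2*0 + 3*1 = 0).
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
c = [-3.0, 0.0, 1.0]

raw_dot = float(np.dot(a, b))    # grows with vector magnitude
sims = cosine_matrix([a, b, c])  # depends on direction only
```

Here the raw dot product of `a` and `b` is large only because `b` is long, while their cosine similarity is at its maximum because they point the same way. Normalizing once at indexing time keeps later comparisons on a common scale.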

EXERCISE

Build an image deduplication system using embeddings. Generate embeddings for a folder of images, compute the pairwise similarity matrix, and cluster duplicate candidates (pairs whose similarity exceeds a threshold).
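
One possible starting point for the clustering step is sketched below, assuming a similarity matrix has already been computed. The function name `group_duplicates`, the 0.95 threshold, and the toy matrix are all illustrative assumptions. The sketch uses union-find, so grouping is transitive: if A matches B and B matches C, all three land in one group even when A and C fall below the threshold.

```python
import numpy as np


def group_duplicates(sim_matrix, threshold=0.95):
    """Group indices connected by pairwise similarity above threshold."""
    n = sim_matrix.shape[0]
    parent = list(range(n))  # each index starts as its own group root

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair above the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if sim_matrix[i, j] > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Collect members per root and keep only groups with 2+ images.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]


# Toy matrix: images 0 and 1 are near-duplicates, as are images 2 and 3.
sim = np.array([
    [1.00, 0.98, 0.10, 0.20],
    [0.98, 1.00, 0.15, 0.25],
    [0.10, 0.15, 1.00, 0.97],
    [0.20, 0.25, 0.97, 1.00],
])
duplicate_groups = group_duplicates(sim, threshold=0.95)
```

The right threshold depends on the embedding model and the data, so it is worth inspecting a sample of candidate groups by eye before deleting anything.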