K-Nearest Neighbors (KNN)
K-Nearest Neighbors (KNN) is a classical machine learning algorithm used for classification or regression. It works by finding the 'k' closest data points (neighbors) to a new input in a feature space and making a prediction based on the majority label (classification) or average value (regression) of those neighbors. Distance metrics like Euclidean or cosine similarity determine 'closeness.' KNN is a non-parametric, lazy learner—it stores the entire training dataset and defers computation until inference, making it memory-intensive and slow for large datasets. In local AI contexts, KNN is rarely used for text generation but appears in retrieval-augmented generation (RAG) pipelines for similarity search, often replaced by approximate nearest neighbor (ANN) algorithms for speed.
Deeper dive
KNN's simplicity is both its strength and weakness. It requires no training phase—just store the data. At inference, it computes distances to all stored points, sorts them, and picks the top k. This O(n) per query becomes prohibitive for large n. Operators tuning RAG systems might encounter KNN when using libraries like scikit-learn or FAISS with exact search. However, for vector databases (e.g., Chroma, Qdrant), exact KNN is impractical beyond ~10k vectors; approximate methods (HNSW, IVF) trade recall for speed. In LLM workflows, KNN is sometimes used for few-shot example selection: given a user query, find k similar examples from a pool to prepend to the prompt. But this is typically done via embedding similarity (cosine) rather than raw KNN on tokenized text. KNN's reliance on the full dataset means VRAM usage scales linearly with dataset size—a 1M vector dataset at 768-dim float32 consumes ~3 GB, which fits on consumer GPUs but slows inference to seconds per query.
Practical example
Consider a RAG pipeline with 100,000 text chunks, each embedded as a 768-dimensional vector (float32). Using exact KNN (k=5) with cosine similarity, a single query requires computing 100,000 dot products (~0.3 GFLOPS) and sorting. On an RTX 4090, this takes ~10 ms—acceptable. But with 1 million chunks, it jumps to ~100 ms. Operators often switch to FAISS with HNSW indexing, reducing latency to <1 ms while maintaining >95% recall.
Workflow example
In a RAG workflow using LangChain and Chroma, the default retrieval is exact KNN on embeddings. When you call vectorstore.similarity_search(query, k=5), Chroma computes distances to all stored vectors. For larger collections, operators configure an ANN index: vectorstore = Chroma(embedding_function, collection_metadata={"hnsw:space": "cosine"}). In llama.cpp, KNN isn't directly used; instead, the model's context window handles few-shot examples. But if you implement custom example selection in Python, you might use sklearn.neighbors.NearestNeighbors with metric='cosine'.
Reviewed by Fredoline Eruo. See our editorial policy.