RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Classical ML algorithms

K-Means Clustering

K-Means Clustering is an unsupervised learning algorithm that partitions a dataset into K distinct, non-overlapping clusters, where K is chosen in advance. Each data point belongs to the cluster with the nearest mean (centroid). The algorithm alternates between assigning every point to its closest centroid and recomputing each centroid as the mean of the points assigned to it, repeating until the assignments stop changing. Operators encounter K-Means in vector-quantization tasks, such as compressing model weights into a small codebook of shared values, and in grouping similar embeddings for retrieval-augmented generation (RAG). Each iteration scales linearly with the number of points, which keeps it practical on large datasets, but the result depends on where the initial centroids are placed, and the algorithm works best when clusters are roughly spherical and similar in size.
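The assign-then-update loop described above is usually called Lloyd's algorithm. The sketch below is a minimal NumPy version, assuming squared Euclidean distance and plain random initialization from the data points (scikit-learn, for comparison, defaults to k-means++ seeding). The function name `kmeans` and the toy data are illustrative, not taken from any specific library.

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, max_iter: int = 100, seed: int = 0):
    """Minimal Lloyd's algorithm sketch. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from each point to each centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Two well-separated toy blobs.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans(pts, k=2)
print(labels)
```

The broadcasted distance matrix costs memory proportional to points × K × dimensions, which is fine for a sketch; real implementations compute distances in chunks and rerun with several initializations to reduce sensitivity to the starting centroids.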

Practical example

When building a RAG system, an operator might use K-Means to cluster document embeddings produced by a sentence-transformer model. For a corpus of 100,000 documents, setting K=1000 yields clusters of about 100 documents each on average. At query time, the system first finds the nearest cluster centroid (1,000 comparisons), then searches only within that cluster (roughly 100 more), so a query costs about 1,100 distance computations instead of 100,000. The price is recall: a document's true nearest neighbors can sit just across a cluster boundary, and searching a single cluster will miss them.
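The cost arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch, assuming documents are spread evenly across clusters (real K-Means clusters vary in size). The `n_probe` parameter is a hypothetical knob for searching several nearest clusters instead of one; IVF-style vector indexes expose a similar setting (FAISS calls it `nprobe`) to win back recall.

```python
def pruned_search_cost(n_docs: int, n_clusters: int, n_probe: int = 1) -> float:
    """Approximate distance computations per query for cluster-pruned search.

    Assumes an even spread of documents, so this is an average-case estimate.
    """
    avg_cluster_size = n_docs / n_clusters
    # Compare the query against every centroid, then against every document
    # in the n_probe nearest clusters.
    return n_clusters + n_probe * avg_cluster_size

full_scan = 100_000
for n_probe in (1, 10):
    cost = pruned_search_cost(100_000, 1_000, n_probe)
    print(f"n_probe={n_probe}: {cost:,.0f} comparisons, "
          f"~{full_scan / cost:.0f}x fewer than a full scan")
# n_probe=1: 1,100 comparisons, ~91x fewer than a full scan
# n_probe=10: 2,000 comparisons, ~50x fewer than a full scan
```

Probing ten clusters roughly doubles the per-query cost here but gives neighbors near a cluster boundary a much better chance of being found.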

Workflow example

In a local RAG workflow built with LangChain, an operator imports scikit-learn's KMeans (from sklearn.cluster import KMeans) to cluster embeddings generated by a local model such as all-MiniLM-L6-v2. The operator sets n_clusters=500 and fits the KMeans model on the embeddings. The resulting cluster labels are stored alongside the documents. During retrieval, the query embedding is compared to the centroids, the documents in the nearest cluster are ranked against the query, and the best matches are passed to the LLM for answer generation.
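The clustering and retrieval steps above can be sketched with scikit-learn alone; the LangChain wiring is left out. Assumptions: random unit vectors stand in for real all-MiniLM-L6-v2 embeddings so the example runs offline, the corpus is shrunk to 2,000 documents with n_clusters=50 instead of 500, and `retrieve` is a hypothetical helper, not a LangChain or scikit-learn API.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for document embeddings (assumption). A real pipeline might use
#   SentenceTransformer("all-MiniLM-L6-v2").encode(docs, normalize_embeddings=True)
# which returns 384-dimensional vectors.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(2_000, 384)).astype(np.float32)
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

# Fit K-Means; n_clusters is scaled down from 500 to suit the toy corpus.
kmeans = KMeans(n_clusters=50, n_init=1, random_state=0)
doc_labels = kmeans.fit_predict(doc_embeddings)  # one cluster id per document

def retrieve(query_embedding: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Hypothetical helper: indices of up to top_k documents from the nearest cluster."""
    # Compare the query to the centroids and pick the closest cluster.
    cluster = kmeans.predict(query_embedding.reshape(1, -1))[0]
    candidates = np.flatnonzero(doc_labels == cluster)
    # Rank only that cluster's documents; for unit vectors, dot product = cosine similarity.
    scores = doc_embeddings[candidates] @ query_embedding
    return candidates[np.argsort(-scores)[:top_k]]

# Reuse a stored document vector as a toy query; it should rank itself first.
print(retrieve(doc_embeddings[42]))
```

Normalizing the embeddings matters for this design: K-Means groups by Euclidean distance, and for unit vectors the nearest centroid in Euclidean terms is also the one with the highest cosine similarity, which is how sentence-transformer embeddings are usually compared.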

Reviewed by Fredoline Eruo. See our editorial policy.
