Learning paradigms

Unsupervised Learning

Unsupervised learning is a machine learning paradigm in which a model finds patterns in data without labeled examples. Unlike supervised learning, no correct answers are provided during training, so the model must infer structure (clusters, anomalies, or latent representations) from the input data alone. Common tasks include clustering (grouping similar data points) and dimensionality reduction (compressing data while preserving key features). For operators, unsupervised learning shows up most often in embedding models (e.g., text embeddings from BERT), which produce vector representations without explicit labels and enable downstream tasks such as semantic search or anomaly detection on local hardware.
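To make the clustering task concrete, here is a minimal k-means sketch in plain Python. The sample points, the function name kmeans, and the choice to seed the centroids with the first k points are illustrative assumptions, not part of any particular library.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length tuples.

    Initial centroids are simply the first k points, which keeps this
    sketch deterministic; real implementations use smarter seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical 2-D data with two obvious groups; no labels are supplied.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.2), (0.9, 1.1), (8.1, 7.9)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
```

The algorithm never sees a label: the two groups emerge purely from distances between points, which is the sense in which the structure is "inferred from the input data alone".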

Deeper dive

Unsupervised learning covers several families: clustering (k-means, DBSCAN), dimensionality reduction (PCA, t-SNE, UMAP), and generative modeling (autoencoders, GANs). In the context of local AI, the most relevant unsupervised technique is representation learning via self-supervised methods, where the training signal is derived from the data itself rather than from human annotation. BERT is pre-trained on unlabeled text by predicting masked tokens; CLIP is pre-trained on image-caption pairs by contrasting matching pairs against mismatched ones. The resulting embeddings can then be used for zero-shot classification or fine-tuned with minimal labels. Operators encounter unsupervised learning when using embedding models for retrieval-augmented generation (RAG) or when training custom autoencoders for anomaly detection on sensor data. The key operator consideration is a trade-off: unsupervised models need less human effort to prepare data, but because there are no ground-truth labels to score against, the learned patterns must be evaluated carefully to confirm they are meaningful.
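The masked-token idea can be sketched in a few lines: unlabeled text is turned into (input, target) pairs by hiding a token and asking the model to recover it. The helper make_masked_examples and the sample sentence are hypothetical, and for simplicity this version hides each position once in turn, whereas BERT-style pre-training masks a random subset of positions.

```python
def make_masked_examples(tokens, mask_token="[MASK]"):
    """Turn one unlabeled token sequence into (input, target) training pairs.

    Each pair hides a single token; the hidden token becomes the target.
    No human annotation is involved: the label comes from the text itself.
    """
    examples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        examples.append((masked, target))
    return examples

# Hypothetical raw sentence, split on whitespace instead of a real tokenizer.
tokens = "the model learns from raw text".split()
examples = make_masked_examples(tokens)

masked_input, target = examples[2]
print(" ".join(masked_input), "->", target)
# the model [MASK] from raw text -> learns
```

This is why such methods are called self-supervised: the objective looks like supervised prediction, but every target is generated automatically from the raw corpus.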

Practical example

An operator runs ollama pull nomic-embed-text to get a 137M parameter embedding model. This model was trained with unsupervised contrastive learning on 235M text pairs, with no human-labeled categories. When the operator runs ollama run nomic-embed-text "What is the capital of France?", the model outputs a 768-dimensional vector rather than a text answer. (This assumes an Ollama version that accepts embedding models in ollama run; on other versions the same vector is returned by the local REST API's /api/embed endpoint.) The vector can be compared to other vectors using cosine similarity to find semantically similar texts, enabling local semantic search without any labeled data.
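As a rough illustration of the comparison step, cosine similarity can be computed with the standard library alone. The three-element vectors below are made-up stand-ins for real 768-dimensional embeddings, chosen only so the arithmetic is easy to follow.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real ones would come from the embedding model.
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # same direction, different length
unrelated = [3.0, 0.0, -1.0]       # orthogonal to the query

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, unrelated))
```

Because the measure depends only on direction, a vector and a scaled copy of it score as (numerically) identical, while orthogonal vectors score 0.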

Workflow example

In a local RAG pipeline, the operator uses an embedding model (e.g., sentence-transformers/all-MiniLM-L6-v2 via Hugging Face) to convert documents into vectors. A minimal encoding step: python -c "from sentence_transformers import SentenceTransformer; model = SentenceTransformer('all-MiniLM-L6-v2'); emb = model.encode('Your text here')". These vectors are stored in a local vector database like ChromaDB. When a user queries, the same model encodes the query, since vectors produced by different models are not comparable, and the database retrieves the closest vectors. No labels are needed at any step. The operator monitors embedding quality by inspecting nearest-neighbor results for relevance.
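The retrieval step can be approximated with a brute-force nearest-neighbor search, shown here as a sketch rather than as how ChromaDB works internally (vector databases typically use approximate indexes to avoid scanning every vector). The document IDs, the toy vectors, and the top_k helper are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in store.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical in-memory "vector store": doc id -> toy 3-D embedding.
store = {
    "doc_paris": [0.9, 0.1, 0.0],
    "doc_gpu": [0.0, 0.2, 0.9],
    "doc_france": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stand-in for the encoded user query

results = top_k(query, store)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Printing the top results for a handful of known queries is also a simple way to perform the nearest-neighbor inspection described above: if the closest documents are not relevant, the embedding model or the chunking likely needs attention.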

Reviewed by Fredoline Eruo. See our editorial policy.