RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated

AI glossary

551 terms across 19 categories. 440 have full definitions today; the rest are cataloged and being written.

We focus depth on terms most relevant to running AI locally. Cloud-only and academic terms are listed for completeness but get less attention.

  • Core concepts & fields (18)
  • Large language models (56)
  • Transformer & LLM components (43)
  • Natural language processing (28)
  • Notable models & companies (18)
  • Generative AI (23)
  • Frameworks & tools (40)
  • Neural network architectures (29)
  • Hardware & infrastructure (39)
  • Training & optimization (44)
  • Computer vision (24)
  • Agents & agentic AI (18)
  • Evaluation metrics (27)
  • Learning paradigms (23)
  • Ethics, safety & society (23)
  • Specialized domains (21)
  • Data & datasets (34)
  • Classical ML algorithms (27)
  • MLOps & deployment (16)

Core concepts & fields · 18 terms · 7 defined

Artificial Intelligence (AI)
defined

Artificial Intelligence (AI) refers to systems that perform tasks typically requiring human intelligence, such as reasonin

Machine Learning (ML)
defined

Machine Learning (ML) is a field of AI where systems learn patterns from data without being explicitly programmed for ever

Deep Learning (DL)
defined

Deep learning (DL) is a subset of machine learning that uses multi-layer neural networks to learn patterns from data. In l

Neural Networks
defined

Neural networks are the computational architecture behind modern AI models. They consist of layers of interconnected nod

Artificial General Intelligence (AGI)
defined

Artificial General Intelligence (AGI) refers to a hypothetical AI system that can perform any intellectual task that a hum

Artificial Superintelligence (ASI)
defined

Artificial Superintelligence (ASI) refers to a hypothetical AI system that surpasses human intelligence across all domains

Inference (logical)
defined

Inference is the process of running a trained model on input data to generate an output — the "forward pass" that produc

Heuristics
stub
Automated Reasoning
stub
Expert Systems
stub
Narrow AI
stub
Symbolic AI
stub
Knowledge Representation
stub
Strong AI
stub
Cognitive Computing
stub
Weak AI
stub
Computational Intelligence
stub
Connectionism
stub

Large language models · 56 terms · 52 defined

Large Language Model (LLM)
defined

A Large Language Model is a neural network with billions of parameters trained on massive text corpora to predict the ne

Quantization
defined

Quantization is the process of reducing a model's numeric precision to shrink its memory footprint with minimal quality

Quantization
defined

Quantization is the process of reducing a model's numeric precision to shrink its memory footprint with minimal quality

Inference
defined

Inference is the act of running a trained model to generate predictions, as opposed to training which produces the model

Prompt
defined

A prompt is the input text you provide to a language model to generate a response. It can be a simple question, a set of

Retrieval-Augmented Generation (RAG)
defined

RAG is the pattern of retrieving relevant documents from a knowledge base and including them in the LLM's prompt so the

Hallucination
defined

Hallucination is when an LLM generates plausible-sounding but factually incorrect information — citing papers that don't

Prompt Engineering
defined

Prompt engineering is the practice of crafting model inputs to elicit better outputs without changing the model itself.

LoRA (Low-Rank Adaptation)
defined

LoRA is a parameter-efficient fine-tuning technique that adapts a large pre-trained model by training small low-rank mat

RLHF (Reinforcement Learning from Human Feedback)
defined

RLHF (Reinforcement Learning from Human Feedback) is a training method that fine-tunes a language model using human prefer

Fine-tuning
defined

Fine-tuning is continued training of a pre-trained model on a smaller, task-specific dataset. Pre-training builds genera

Embedding (Vector Embedding)
defined

An embedding is a fixed-length vector representation of text, image, or other input — typically 384-3072 dimensions — wh

Foundation Model
defined

A foundation model is a large neural network trained on broad data at scale, designed to be adapted for a wide range of

Chain-of-Thought (CoT)
defined

Chain-of-thought prompting is asking a model to show its reasoning step-by-step before giving the final answer. It drama

Latency
defined

Latency measures how fast you get a response. Two metrics matter for local LLMs: Time to First Token (TTFT), wall-clock

Vector Database
defined

A vector database stores and retrieves data as high-dimensional vectors (embeddings) rather than rows or documents. In loc

Alignment
defined

Alignment refers to the process of fine-tuning a base LLM so its outputs match human preferences, values, or safety guid

GGUF
defined

GGUF (GGML Unified Format) is the file format used by llama.cpp and its ecosystem (Ollama, KoboldCpp, LM Studio). A single f

Pre-training
defined

Pre-training is the initial phase where a large language model learns from a vast, diverse corpus of text data (e.g., web

System Prompt
defined

A system prompt is the initial instruction or context prepended to a conversation with an LLM. It sets the model's behav

Throughput
defined

Throughput measures how much work a system completes per unit time — typically tokens-per-second across all concurrent r

Instruction Tuning
defined

Instruction tuning is a supervised fine-tuning step where a base language model is trained on (instruction, response) pair

QLoRA
defined

QLoRA combines LoRA fine-tuning with 4-bit quantization of the base model. Introduced by Tim Dettmers in 2

Semantic Search
defined

Semantic search retrieves results based on meaning rather than exact keyword matches. Instead of looking for literal wor

Direct Preference Optimization (DPO)
defined

Direct Preference Optimization (DPO) is a method for fine-tuning language models to align with human preferences without u

Few-Shot Prompting
defined

Few-shot prompting is a technique where you include a small number of input-output examples in the prompt to guide the m

In-Context Learning
defined

In-context learning (ICL) is a capability of large language models where the model adapts its behavior based solely on exa

Jailbreak
defined

A jailbreak is a prompt designed to bypass the safety guardrails of an LLM, causing it to generate content it would norm

ORPO (Odds Ratio Preference Optimization)
defined

ORPO (Odds Ratio Preference Optimization) is a fine-tuning method that combines supervised fine-tuning (SFT) and preference

Prompt Injection
defined

Prompt injection is a security exploit where a crafted input overrides the system prompt or instruction set of an LLM, c

Zero-Shot Prompting
defined

Zero-shot prompting is a technique where you give a language model a task description or instruction without providing a

DoRA (Weight-Decomposed Low-Rank Adaptation)
defined

DoRA (Weight-Decomposed Low-Rank Adaptation) is a fine-tuning method that improves upon LoRA by decomposing pre-trained we

KV Cache Quantization
defined

KV cache quantization reduces the memory footprint of the key-value (KV) cache by storing its entries in lower-precision f

Speculative Decoding
defined

Speculative decoding speeds up LLM inference by using a small fast "draft" model to propose the next several tokens, the

Distillation
defined

Distillation is a training technique where a smaller 'student' model learns to mimic the behavior of a larger 'teacher'

Guardrails
defined

Guardrails are runtime constraints or filters applied to an LLM's input and output to enforce safety, compliance, or for

Parameter-Efficient Fine-Tuning (PEFT)
defined

Parameter-Efficient Fine-Tuning (PEFT) is a set of techniques that adapt a pre-trained large language model to a specific

ReAct
defined

ReAct (Reasoning + Acting) is a prompting technique that interleaves chain-of-thought reasoning with tool-use actions. In

Red Teaming
defined

Red teaming is the practice of systematically probing an LLM to find failure modes: harmful outputs, jailbreaks, halluci

Chunked Prefill
defined

Chunked prefill is an inference-engine technique that splits long-prompt processing into smaller chunks so the engine ca

Dense Retrieval
defined

Dense retrieval finds documents by computing cosine similarity or dot product between learned vector embeddings of the q
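
As a rough illustration of the scoring step described above, the following minimal sketch ranks documents by cosine similarity against a query vector. The three-dimensional vectors and document ids are made up for the example; a real system would get its embeddings from an embedding model, which is not shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings, for illustration only.
docs = {"gpu-guide": [0.9, 0.1, 0.0], "recipe": [0.0, 0.2, 0.9]}
ranking = rank_documents([1.0, 0.0, 0.0], docs)
```

If the vectors are already normalized to unit length, the dot product alone gives the same ordering.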

Reranker (Cross-Encoder)
defined

A reranker is a cross-encoder model that scores query/document pairs jointly (concatenated as input), producing a relevanc

Hybrid Retrieval
defined

Hybrid retrieval combines dense and sparse retrieval, typically by union-then-rerank or reciprocal rank fusion (RRF). The
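
Reciprocal rank fusion can be sketched in a few lines. This is a minimal version, assuming each retriever returns an ordered list of document ids and using the commonly cited constant k = 60; the document ids and the two input rankings are invented for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from a dense and a sparse retriever.
dense_results = ["a", "b", "c"]
sparse_results = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_results, sparse_results])
```

Because only ranks are used, the dense and sparse scores never need to be put on a common scale.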

Constitutional AI
defined

Constitutional AI (CAI) is a training method that aligns language model behavior using a set of written rules (a 'constitut

Grounding
defined

Grounding connects a language model's output to verifiable external sources (documents, databases, APIs) to reduce halluci

Knowledge Distillation
defined

Knowledge distillation is a technique where a smaller, faster 'student' model is trained to mimic the behavior of a larg

Proximal Policy Optimization (PPO)
defined

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm used to fine-tune large language models (LLMs) with

RLAIF (RL from AI Feedback)
defined

RLAIF (Reinforcement Learning from AI Feedback) is a technique for fine-tuning language models where an AI system, rather

BM25 (Best Matching 25)
defined

BM25 is the canonical sparse-retrieval algorithm: a TF-IDF variant that saturates term frequency (a token appearing 100 t

Adapter
stub
Pruning
stub
Sycophancy
defined

Sycophancy in LLMs refers to the model's tendency to agree with a user's stated or implied position, even when that posi

Tree of Thoughts
defined

Tree of Thoughts (ToT) is a prompting strategy that expands a single chain of reasoning into a tree of multiple reasoning

Sparse Retrieval
defined

Sparse retrieval scores documents by lexical overlap with the query — high-dimensional vectors where most entries are ze

Catastrophic Forgetting
stub
Mode Collapse
stub

Transformer & LLM components · 43 terms · 36 defined

KV Cache
defined

The KV cache stores the key and value tensors from previous attention computations so the model doesn't recompute them a

Context Window
defined

The context window is the maximum number of tokens a model can attend to at once — both prompt and previously generated

Attention Mechanism
defined

The attention mechanism is a neural network component that lets a model weigh the importance of different parts of the i

Token
defined

A token is the smallest unit of text a language model processes. Most modern models use subword tokenization, where comm

Self-Attention
defined

Self-attention computes a weighted representation of every position in a sequence by comparing each token against every

Tokenization
defined

Tokenization is the process of converting text into the numeric tokens a model can process. Modern systems use subword t

Multi-Head Attention
defined

Multi-Head Attention is a mechanism in transformer models where the input is projected into multiple parallel 'attention

Multi-Head Latent Attention (MLA)
defined

Multi-Head Latent Attention (MLA) is an attention mechanism used in DeepSeek V2/V3 that compresses the key-value (KV) cache

Prefill (Prompt Processing)
defined

Prefill is the first phase of LLM inference: the model processes the entire prompt in a single parallel pass, building u

Decode (Token Generation)
defined

Decode is the second phase of LLM inference: generating one output token at a time, autoregressively. Each decode step d

Flash Attention
defined

Flash Attention is a memory-efficient implementation of the attention mechanism that reduces memory usage from O(n²) to O(n)

Sliding Window Attention (SWA)
defined

Sliding Window Attention (SWA) is an attention pattern where each token only attends to a fixed-size window of nearby toke

Temperature (sampling)
defined

Temperature is a sampling parameter that controls the randomness of token selection during text generation. It scales th
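
A minimal sketch of temperature scaling, assuming the standard formulation in which logits are divided by the temperature before the softmax. The function name and the example logits are illustrative only.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)    # more peaked
neutral = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 2.0)     # closer to uniform
```

Lower temperatures concentrate probability on the top token; higher temperatures spread it across more candidates.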

Decoder
defined

A decoder is the component of a transformer model that generates output tokens one at a time, using the input's encoded

Encoder
defined

An encoder is a neural network component that processes input data (text, images, audio) into a dense representation: a vec

Grouped-Query Attention (GQA)
defined

Grouped-Query Attention (GQA) is a variant of multi-head attention that reduces memory and compute costs by sharing key-va

Rotary Position Embedding (RoPE)
defined

Rotary Position Embedding (RoPE) is a method for encoding token position in transformer models by rotating query and key v

Multi-Query Attention (MQA)
defined

Multi-Query Attention (MQA) is a transformer attention variant where all attention heads share a single key/value projecti

PagedAttention
defined

PagedAttention is the memory layout introduced by vLLM that stores the KV cache in fixed-size blocks (pages), like virtual

Sampling (Decoding)
defined

Sampling is the process of converting model logits into output tokens. Common strategies: greedy (temperature 0), random s

Byte Pair Encoding (BPE)
defined

Byte Pair Encoding (BPE) is a subword tokenization algorithm that splits text into a sequence of tokens by iteratively mer

Encoder-Decoder
defined

An encoder-decoder is a neural network architecture that processes an input sequence through an encoder to produce a com

Top-p (Nucleus) Sampling
defined

Top-p (nucleus) sampling is a text generation strategy that selects from the smallest set of tokens whose cumulative proba
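
The selection rule above can be sketched as follows. This is a minimal version that works on an already-normalized probability list; the function name and the example distribution are assumptions for illustration, not any particular engine's implementation.

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of highest-probability tokens whose total reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= p:
            break
    # Renormalize so the surviving tokens form a valid distribution to sample from.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
```

Unlike a fixed top-k cutoff, the number of surviving tokens varies with how confident the model is at each step.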

Temperature 0 (Greedy Sampling)
defined

Temperature 0 disables sampling entirely — the model picks the highest-logit token at every step. Equivalent to greedy d

Cross-Attention
defined

Cross-attention is a mechanism in transformer models where the query vectors come from one sequence (e.g., the decoder's

Positional Encoding
defined

Positional encoding is a technique used in transformer models to inject information about the position of tokens in a se

Softmax
stub
Top-k Sampling
defined

Top-k sampling is a text-generation strategy that restricts the model's next-token choices to the k tokens with the high
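
A minimal sketch of the top-k restriction, assuming raw logits as input: keep the k highest-scoring tokens and renormalize over just those. The function name and the example logits are illustrative.

```python
import math

def top_k_filter(logits, k):
    """Keep the k highest-logit tokens and return their renormalized probabilities."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in order)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in order}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}

candidates = top_k_filter([1.0, 3.0, 2.0, 0.5], k=2)
```

A sampler would then draw the next token from `candidates` instead of from the full vocabulary.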

Deterministic Decoding
defined

Deterministic decoding means same prompt → same output, every time. Achieved by setting temperature to 0 (always pick the

Beam Search
stub
Layer Normalization
defined

Layer normalization is a technique that stabilizes training and inference by normalizing activations across the features

Logits
defined

Logits are the raw, unnormalized scores output by the final linear layer of a transformer model, before the softmax func

Random Seed
defined

A random seed initializes the pseudo-random generator that drives sampling at temperature > 0. Same seed + same prompt +

RMSNorm
defined

RMSNorm is a simpler variant of LayerNorm that normalizes activations by their root-mean-square instead of their varianc
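
A minimal sketch of the normalization step, assuming the usual formulation: divide each activation by the root-mean-square of the vector (with a small epsilon for stability) and multiply by a learned per-feature gain. The default gain of all ones and the example input are assumptions for illustration.

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """Scale a vector so its root-mean-square is about 1, then apply a per-feature gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if gain is None:
        gain = [1.0] * len(x)  # stand-in for the learned gain parameters
    return [g * v / rms for v, g in zip(x, gain)]

normalized = rms_norm([3.0, 4.0])
```

Unlike LayerNorm, there is no mean subtraction, which removes one reduction per call.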

YaRN (Yet another RoPE eNlargement)
defined

YaRN is a context-extension method that modifies RoPE frequencies to let a model trained on, say, 8K context generalize

Greedy Decoding
stub
Residual Connection
stub
SentencePiece
stub
SwiGLU
defined

SwiGLU is a gated feed-forward activation: (W1·x ⊙ swish(W2·x)) · W3, replacing the standard MLP's GELU/ReLU in modern trans

WordPiece
stub
ALiBi (Attention with Linear Biases)
defined

ALiBi is a positional encoding scheme that biases attention scores by a linear function of token distance, instead of in

Mirostat Sampling
defined

Mirostat is a sampling algorithm that targets a fixed perplexity-like "surprise" level tau instead of a fixed top-p or t

Feed-Forward Network
stub

Natural language processing · 28 terms · 21 defined

GPT (architecture)
defined

GPT (Generative Pre-trained Transformer) is a decoder-only Transformer architecture that predicts the next token in a sequ

Natural Language Processing (NLP)
defined

Natural Language Processing (NLP) is the field of AI focused on enabling computers to understand, interpret, and generate

BERT
defined

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model that reads text in bo

Language Modeling
defined

Language modeling is the task of predicting the next token (word, subword, or character) in a sequence given the preceding

Text Generation
defined

Text generation is the process where a language model produces coherent sequences of tokens (words or subwords) in respons

Automatic Speech Recognition (ASR)
defined

Automatic Speech Recognition (ASR) converts spoken audio into text. Operators encounter ASR when running models like Whisp

Machine Translation
defined

Machine translation (MT) is the task of automatically translating text from one natural language to another using a neural

Sentiment Analysis
defined

Sentiment analysis is a text classification task where a model assigns a label (e.g., positive, negative, neutral) to a pi

Text Summarization
defined

Text summarization is a natural language processing task where a model generates a shorter version of a longer text whil

Text-to-Speech (TTS)
defined

Text-to-Speech (TTS) converts written text into spoken audio using neural models. Operators encounter TTS when running loc

Word Embedding
defined

A word embedding is a dense vector of floating-point numbers that maps a word or token to a point in a high-dimensional

Word2Vec
defined

Word2Vec is an algorithm that learns dense vector representations (embeddings) of words from large text corpora. Each word

Named Entity Recognition (NER)
defined

Named Entity Recognition (NER) is an NLP task that identifies and classifies named entities (e.g., person, organization, lo

Question Answering
defined

Question answering (QA) is a natural language processing task where a model receives a question and returns a concise answ

Text Classification
defined

Text classification is a natural language processing task where a model assigns a predefined category label to a piece o

GloVe
defined

GloVe (Global Vectors for Word Representation) is a static word embedding method that learns vector representations of wor

Speech Synthesis
defined

Speech synthesis, also known as text-to-speech (TTS), converts written text into spoken audio. In local AI, operators run

T5
defined

T5 (Text-to-Text Transfer Transformer) is a sequence-to-sequence model from Google that converts every NLP task into a tex

FastText
defined

FastText is a library for efficient learning of word representations and sentence classification, developed by Facebook

N-gram
defined

An n-gram is a contiguous sequence of n items (usually tokens or characters) from a text. In local AI, n-grams appear in t

Topic Modeling
defined

Topic modeling is an unsupervised NLP technique that discovers latent themes (topics) across a collection of documents. It

Latent Dirichlet Allocation (LDA)
stub
Part-of-Speech Tagging
stub
Stop Words
stub
Dependency Parsing
stub
Lemmatization
stub
Stemming
stub
Coreference Resolution
stub

Notable models & companies · 18 terms · 17 defined

GPT-4
defined

GPT-4 is a large multimodal language model developed by OpenAI, released in March 2023. It accepts text and image inputs

OpenAI
defined

OpenAI is the organization that developed the GPT series of large language models (GPT-3, GPT-4, GPT-4o) and the DALL-E im

Llama (Meta)
defined

Llama is a family of open-weight large language models (LLMs) developed by Meta, starting with Llama 1 in 2023 and continu

Anthropic
defined

Anthropic is an AI safety and research company that develops large language models (LLMs) under the Claude family. Operato

Claude (Anthropic)
defined

Claude is a family of large language models (LLMs) developed by Anthropic, designed for safe and helpful text generation.

NVIDIA
defined

NVIDIA designs the GPUs most operators use for local AI inference. Its consumer RTX series (e.g., RTX 4090) and workstatio

DeepSeek
defined

DeepSeek is a family of open-weight large language models developed by DeepSeek (深度求索), a Chinese AI research company. The

GPT-5
defined

GPT-5 is the hypothetical successor to OpenAI's GPT-4 model family. As of early 2025, no official GPT-5 model has been r

Gemini (Google)
defined

Gemini is a family of multimodal large language models (LLMs) developed by Google DeepMind, designed to process text, imag

Google DeepMind
defined

Google DeepMind is an AI research lab formed from the 2023 merger of Google Brain and DeepMind. It develops large langua

Hugging Face
defined

Hugging Face is a platform and company that hosts a vast repository of open-source machine learning models, datasets, an

Qwen
defined

Qwen is a family of large language models (LLMs) developed by Alibaba Cloud, ranging from 0.5B to 110B parameters. Operato

Meta AI
defined

Meta AI is the artificial intelligence research division of Meta Platforms (formerly Facebook). For local AI operators, Me

Mistral
defined

Mistral is a family of open-weight large language models (LLMs) developed by Mistral AI, known for their efficiency and st

Stability AI
defined

Stability AI is the company behind the Stable Diffusion family of image generation models, which operators run locally v

Grok (xAI)
defined

Grok is a family of large language models (LLMs) developed by xAI, led by Elon Musk. The first version, Grok-1, was releas

Phi (Microsoft)
defined

Phi is a family of small language models (SLMs) developed by Microsoft, designed to run efficiently on consumer hardware l

Command (Cohere)
stub

Generative AI · 23 terms · 14 defined

Generative AI (GenAI)
defined

Generative AI (GenAI) refers to machine learning models that produce new content (text, images, audio, code, or video) by le

Deepfake
defined

A deepfake is a synthetic media (image, video, or audio) generated or manipulated by a deep learning model, typically an a

Generative Model
defined

A generative model is a type of machine learning model that learns the underlying distribution of training data and can

ControlNet
defined

ControlNet is a neural network architecture that adds spatial conditioning to pretrained image diffusion models like Sta

Latent Diffusion
defined

Latent diffusion is a technique used in image generation models like Stable Diffusion that applies the diffusion process

Video Generation
defined

Video generation refers to the process of creating new video content from text prompts, images, or other video inputs us

Autoregressive Models
defined

Autoregressive models generate text one token at a time, where each new token depends on all previously generated tokens

Latent Space
defined

Latent space is the internal, compressed representation of data that a generative model learns during training. It is a

Voice Cloning
defined

Voice cloning is the process of generating synthetic speech that mimics a specific person's voice, including timbre, pit

Audio Generation
defined

Audio generation refers to the process of creating audio content—such as speech, music, or sound effects—using machine l

DreamBooth
defined

DreamBooth is a fine-tuning technique that personalizes a text-to-image model like Stable Diffusion to generate images o

StyleGAN
defined

StyleGAN is a generative adversarial network (GAN) architecture designed for high-resolution image synthesis, introduced b

DDPM (Denoising Diffusion Probabilistic Models)
defined

DDPM (Denoising Diffusion Probabilistic Models) is a class of generative models that learn to generate data by reversing a

Discriminative Model
stub
Music Generation
defined

Music generation refers to the use of AI models to produce audio or symbolic representations of music (e.g., MIDI, sheet

3D Generation
stub
Conditional GAN
stub
CycleGAN
stub
Flow Matching
stub
Textual Inversion
stub
DDIM (Denoising Diffusion Implicit Models)
stub
Normalizing Flows
stub
Score-Based Models
stub

Frameworks & tools · 40 terms · 38 defined

Ollama
defined

Ollama is a runtime and CLI tool for running large language models locally on consumer hardware. It wraps llama.cpp and

PyTorch
defined

PyTorch is an open-source machine learning framework developed by Meta. It provides tensor computation with GPU accelera

llama.cpp
defined

llama.cpp is a C++ inference engine for running large language models (LLMs) locally on consumer hardware. It loads quanti

vLLM
defined

vLLM is an open-source inference engine optimized for high-throughput, low-latency serving of large language models. It

Hugging Face Transformers
defined

Hugging Face Transformers is a Python library that provides pre-trained models and tools for natural language processing

LM Studio
defined

LM Studio is a desktop application that provides a graphical interface for downloading, managing, and running local larg

LangChain
defined

LangChain is a Python/TypeScript framework for building applications that chain together LLM calls, external data source

TensorFlow
defined

TensorFlow is an open-source machine learning framework developed by Google. Operators encounter it as an alternative to

scikit-learn
defined

scikit-learn is a Python library for classical machine learning (regression, classification, clustering, dimensionality r

text-generation-webui (oobabooga)
defined

text-generation-webui (often called oobabooga) is a browser-based interface for running large language models locally. It

ExLlamaV2
defined

ExLlamaV2 is a high-performance inference engine for Llama-family models, optimized for GPU execution. It achieves faste

KoboldCpp
defined

KoboldCpp is a single-file, self-contained executable that bundles llama.cpp with a web-based UI and a built-in API, des

LlamaIndex
defined

LlamaIndex is a data framework for building retrieval-augmented generation (RAG) applications. It provides tools to ingest

OpenCV
defined

OpenCV (Open Source Computer Vision Library) is a C++ library with Python bindings for real-time image and video processin

Continuous Batching
defined

Continuous batching (sometimes "iteration-level scheduling") is a serving optimization where new requests join the active

Hugging Face Text Generation Inference (TGI)
defined

Hugging Face Text Generation Inference (TGI) is a production-grade inference server for large language models, optimized f

Gradio
defined

Gradio is an open-source Python library for quickly building web-based user interfaces for machine learning models. Oper

JAX
defined

JAX is a numerical computing library from Google that combines NumPy-like array operations with automatic differentiatio

Keras
defined

Keras is a high-level neural network API, written in Python and capable of running on top of TensorFlow, JAX, or PyTorch

MLC LLM
defined

MLC LLM (Machine Learning Compilation for Large Language Models) is a framework that compiles LLMs into deployable binarie

SGLang
defined

SGLang is an open-source LLM inference engine focused on high throughput for structured generation and complex agent wor

Streamlit
defined

Streamlit is an open-source Python framework for turning data scripts into interactive web apps with minimal code. Opera

Prefix Caching
defined

Prefix caching stores the KV cache from previous requests so a new request that shares a prefix (system prompt, few-shot

Request Batching
defined

Request batching packs multiple inference requests into a single forward pass to amortize the cost of loading model weig

MPS (Metal Performance Shaders)
defined

MPS is Apple's high-level Metal-based compute library, exposed in PyTorch as the mps device backend. Calling model.to("mp

Airflow
defined

Airflow is a workflow orchestration tool for scheduling, monitoring, and managing complex data pipelines as directed acy

MLflow
defined

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experimentation, rep

Ray
defined

Ray is an open-source distributed computing framework for scaling AI workloads across multiple machines. Operators encou

Triton Inference Server
defined

Triton Inference Server is an open-source inference serving software by NVIDIA that manages multiple AI models across GP

Weights & Biases
defined

Weights & Biases (W&B) is a cloud-based MLOps platform for tracking experiments, visualizing metrics, and managing model a

spaCy
defined

spaCy is a Python library for industrial-strength natural language processing (NLP) that provides pre-trained pipelines fo

FAISS
defined

FAISS (Facebook AI Similarity Search) is a C++/Python library for fast approximate nearest-neighbor search over dense vect

GGML
defined

GGML is the C/C++ tensor library that underlies llama.cpp, whisper.cpp, and the original GGUF format. It provides quanti

Vulkan Compute
defined

Vulkan compute is the cross-vendor GPU compute API from Khronos. llama.cpp ships a Vulkan backend that runs on AMD, Inte

NLTK
defined

NLTK (Natural Language Toolkit) is a Python library for classical NLP tasks like tokenization, stemming, tagging, and pars

TensorBoard
defined

TensorBoard is a visualization toolkit from TensorFlow for inspecting model training metrics, graph structures, and weig

DirectML
defined

DirectML is Microsoft's GPU-agnostic ML acceleration API, layered on DirectX 12. It works on any Windows-supported GPU —

Expert Parallelism
defined

Expert parallelism is a parallelism strategy specific to MoE models: each GPU holds a different subset of the experts, a

Kubeflow
stub
DVC
stub

Neural network architectures · 29 terms · 24 defined

Transformer
defined

The Transformer is a neural network architecture introduced in 2017 that replaced recurrent layers with a self-attention

Diffusion Model
defined

A diffusion model is a type of generative model that learns to reverse a gradual noising process. During training, the m

Convolutional Neural Network (CNN)
defined

A Convolutional Neural Network (CNN) is a neural network architecture that uses convolutional layers to process grid-like

Generative Adversarial Network (GAN)
defined

A Generative Adversarial Network (GAN) is a machine learning architecture where two neural networks (a generator and a disc

Mixture of Experts (MoE)
defined

Mixture of Experts is a neural network architecture where multiple specialized sub-networks ("experts") exist, but only a

Multimodal AI
defined

Multimodal AI refers to models that process and generate multiple data types—typically text, images, and sometimes audio

Vision-Language Model (VLM)
defined

A Vision-Language Model (VLM) processes both images and text, enabling tasks like image captioning, visual question answer

Long Short-Term Memory (LSTM)
defined

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture designed to model sequential data while avoid

Recurrent Neural Network (RNN)
defined

A Recurrent Neural Network (RNN) is a neural network architecture designed for sequential data, where each output depends

Multi-Layer Perceptron (MLP)
defined

A Multi-Layer Perceptron (MLP) is a feedforward neural network composed of at least three layers: an input layer, one or m

Residual Network (ResNet)
defined

A Residual Network (ResNet) is a neural network architecture that introduces skip connections also called shortcut connect

Vision Transformer (ViT)
defined

A Vision Transformer (ViT) is a neural network architecture that applies the Transformer model, originally designed for te

Decoder-Only Transformer
defined

Decoder-only is the architecture of GPT, Llama, Qwen, Mistral, DeepSeek, and almost every modern open-weight LLM. The mo

Autoencoder
defined

An autoencoder is a neural network trained to reconstruct its input after passing it through a bottleneck layer. The bot

Graph Neural Network (GNN)
defined

A Graph Neural Network (GNN) is a neural network architecture designed to process data structured as graphs—nodes connecte

Perceptron
defined

A perceptron is the simplest form of a neural network: a single linear unit that takes weighted inputs, sums them, adds
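
As a rough illustration of the weighted-sum unit described above, the sketch below assumes a bias term and a hard threshold at zero (the definition shown here is cut off before naming either). The function name and the hand-picked AND-gate parameters are hypothetical, chosen only for this example.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a step function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


# Hand-picked (not learned) parameters that make the unit act like logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], AND_WEIGHTS, AND_BIAS))
```

Only the input pair (1, 1) pushes the sum above zero, so the unit fires for that case alone.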

State Space Models (Mamba)
defined

State Space Models (SSMs), notably the Mamba architecture, are a class of sequence models that process tokens in linear ti

U-Net
defined

U-Net is a convolutional neural network architecture designed for image segmentation tasks. It consists of a contracting

Variational Autoencoder (VAE)
defined

A Variational Autoencoder (VAE) is a generative neural network that learns a compressed latent representation of input dat

Dense Model
defined

A dense model activates every parameter on every forward pass — the default architecture for transformers like Llama, Qw

Neural Radiance Field (NeRF)
defined

A Neural Radiance Field (NeRF) is a neural network that represents a 3D scene as a continuous function mapping a 3D locati

MoE Routing
defined

MoE routing is the gating mechanism that decides which experts a token activates in a Mixture-of-Experts layer. Top-k ro
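
A minimal sketch of one common top-k gating variant, assuming the router produces one score per expert: take a softmax over the scores, keep the k largest, and renormalize their weights. Real models differ in the details (for example, some apply the softmax after selection), and `top_k_route` is a hypothetical helper, not any library's API.

```python
import math


def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Numerically stable softmax over all expert scores.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the k most probable experts, best first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize so the selected experts' weights sum to 1.
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


# Four experts, one token: experts 1 and 3 have the highest scores.
print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

The token's output would then be the weighted sum of the chosen experts' outputs; the other experts do no work for that token.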

Feedforward Neural Network
defined

A feedforward neural network (FFNN) is the simplest type of neural network where connections between nodes do not form cyc

Gated Recurrent Unit (GRU)
stub
Encoder-Decoder Transformer
defined

Encoder-decoder transformers (T5, BART, original "Attention Is All You Need" architecture) have two halves: an encoder rea

Siamese Network
stub
Neural ODE
stub
Spiking Neural Network
stub
Capsule Network
stub

Hardware & infrastructure · 39 terms · 39 defined

VRAM (Video RAM)
defined

VRAM is the dedicated memory on a GPU. For local AI, VRAM capacity is the single most important spec — it determines whi

GPU
defined

A GPU (Graphics Processing Unit) is a specialized processor designed for parallel computation, originally for graphics but

CUDA
defined

CUDA (Compute Unified Device Architecture) is NVIDIA's parallel-computing platform and the dominant API for GPU-accelerate

CPU Offload
defined

CPU offload is a technique where parts of a neural network model are processed by the CPU instead of the GPU, typically

Edge AI
defined

Edge AI refers to running machine learning models locally on consumer hardware (laptops, phones, GPUs) rather than sending

VRAM Bandwidth
defined

VRAM bandwidth is the rate at which the GPU's video memory can transfer data to the compute cores, measured in GB/s. For

MLX (Apple)
defined

MLX is Apple's open-source array framework optimized for Apple Silicon. The Apple equivalent of PyTorch + CUDA, with fir

TPU (Tensor Processing Unit)
defined

A Tensor Processing Unit (TPU) is a custom ASIC designed by Google specifically for accelerating machine learning workload

Distributed Training
defined

Distributed training splits the work of training a neural network across multiple GPUs or machines, using techniques lik

Edge Computing
defined

Edge computing means running AI inference on a local device (laptop, phone, embedded system) instead of sending data to a

FLOPS
defined

FLOPS (Floating Point Operations Per Second) measures how many floating-point calculations a processor can perform in one

FP16
defined

FP16 (16-bit floating point) is a number format that uses 16 bits per weight or activation, balancing precision and memory

NPU (Neural Processing Unit)
defined

A Neural Processing Unit (NPU) is a specialized hardware accelerator designed to execute neural network operations efficie

On-Device AI
defined

On-device AI refers to running machine learning models directly on local hardware (CPU, GPU, NPU) rather than sending data

GDDR7
defined

GDDR7 uses PAM3 signaling to push per-pin rates to 28–32 Gbps in first-gen products (2025), with a path to 40+ Gbps. RTX 5

Unified Memory
defined

Unified memory is a memory architecture where CPU and GPU share the same physical RAM pool, eliminating CPU↔GPU copies.

BF16 (BFloat16)
defined

BF16 (Brain Floating Point 16) is a 16-bit floating-point number format that uses 8 exponent bits and 7 mantissa bits, mat
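
To make the bit layout concrete: a BF16 value has the same sign and 8 exponent bits as an FP32 value, so it can be modeled as the top 16 bits of the FP32 encoding. The sketch below simply truncates the low 16 bits, which is an assumption made for brevity; hardware typically rounds to nearest even instead. The helper names are hypothetical.

```python
import struct


def to_bf16_bits(x):
    """Top 16 bits of the FP32 encoding: 1 sign, 8 exponent, 7 mantissa bits."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16  # truncation; real hardware usually rounds instead


def from_bf16_bits(bits16):
    """Expand a BF16 bit pattern back to a Python float by zero-filling."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]


print(hex(to_bf16_bits(1.0)))                 # 0x3f80
print(from_bf16_bits(to_bf16_bits(3.14159)))  # 3.140625
print(from_bf16_bits(to_bf16_bits(1e38)))     # still finite: FP32's exponent range is kept
```

The second line shows the cost of the short mantissa (roughly two to three decimal digits), while the third shows that very large values survive because the exponent width is unchanged.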

Data Parallelism
defined

Data parallelism is a distributed training strategy where a model is replicated across multiple devices (GPUs or nodes), a

DeepSpeed
defined

DeepSpeed is a deep learning optimization library by Microsoft that reduces memory usage and speeds up training for larg

FP8
defined

FP8 (Floating Point 8) is an 8-bit floating-point number format used in AI inference and training to reduce memory and com

HBM (High Bandwidth Memory)
defined

HBM (High Bandwidth Memory) is a 3D-stacked DRAM design that vertically layers memory dies with through-silicon vias (TSVs)

Mixed Precision
defined

Mixed precision is a technique that uses different numerical precisions (e.g., FP16 and FP32) for different parts of a mod

NVLink
defined

NVLink is NVIDIA's proprietary GPU-to-GPU interconnect, used to bind multiple data-center GPUs into a coherent memory fa

ONNX
defined

ONNX (Open Neural Network Exchange) is an open-source format for representing machine learning models, designed to enable

ROCm (AMD)
defined

ROCm (Radeon Open Compute) is AMD's open-source equivalent of NVIDIA's CUDA. It's required for any meaningful AMD GPU infe

Tensor Core
defined

Tensor Cores are specialized hardware units on NVIDIA GPUs (Volta architecture and later) that perform fused multiply-add

Tensor Parallelism
defined

Tensor parallelism splits each transformer layer's weight matrices across multiple GPUs. Card 0 holds the first half of

TensorRT
defined

TensorRT is NVIDIA's SDK for optimizing and deploying deep learning models on NVIDIA GPUs. It performs graph optimizatio

FSDP (Fully Sharded Data Parallel)
defined

FSDP (Fully Sharded Data Parallel) is a distributed training technique that shards model parameters, gradients, and optimi

INT8
defined

INT8 (8-bit integer) is a numerical format that uses 8 bits to represent integers, typically in the range [-128, 127] for si

Metal (Apple)
defined

Metal is Apple's low-level GPU programming framework and API, analogous to Vulkan on other platforms. For local AI opera

Model Parallelism
defined

Model parallelism is a technique that splits a single neural network across multiple GPUs or other accelerators, with ea

ZeRO optimizer
defined

ZeRO (Zero Redundancy Optimizer) is a memory optimization technique for distributed training of large models. It partition

cuDNN
defined

cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep learning primitives like convolution

NVSwitch
defined

NVSwitch is the crossbar that connects 8 (or, in NVL72, 72) GPUs into a single all-to-all NVLink fabric. Each GPU talks to

FP32
defined

FP32 (32-bit floating point) is a numerical format that uses 32 bits to represent each model weight, offering high precisi

INT4
defined

INT4 is a quantization format that stores each model weight using 4 bits, reducing memory usage by roughly 4× compared t

Pipeline Parallelism
defined

Pipeline parallelism (a.k.a. "layer split" in llama.cpp parlance) puts whole layers on different GPUs. Card 0 handles laye

Vulkan compute
defined

Vulkan compute is a cross-platform GPU compute API that runs inference workloads on GPUs without requiring CUDA. In loca

Training & optimization · 44 terms · 31 defined

Q4_K_M Quantization
defined

Q4_K_M is the most-downloaded GGUF quantization on Hugging Face — the default tradeoff for local inference. It mixes 6-bit

AWQ
defined

AWQ (Activation-aware Weight Quantization) is a 4-bit quantization method designed for fast inference on NVIDIA GPUs. It's

Backpropagation
defined

Backpropagation is the algorithm used to train neural networks by computing gradients of the loss function with respect

Dropout
defined

Dropout is a regularization technique used during neural network training where randomly selected neurons are ignored dr

Gradient Descent
defined

Gradient descent is an optimization algorithm that iteratively adjusts model weights to minimize a loss function. In loc

Overfitting
defined

Overfitting occurs when a model learns training data too well, including noise and irrelevant patterns, at the cost of g

Q5_K_M Quantization
defined

Q5_K_M is a mixed-precision GGUF quantization averaging ~5.7 bits per parameter. Attention and feed-forward weights use 6-

Q8_0 Quantization
defined

Q8_0 is llama.cpp's simplest 8-bit GGUF quantization: weights in INT8, one FP16 scale per 32-element block, no zero-point
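
The block scheme described above (INT8 weights, one FP16 scale per 32 values, no zero-point) can be sketched in plain Python. This is an illustration of the idea only, not llama.cpp's code or its on-disk layout; the function names are hypothetical, and the FP16 scale is emulated with the `struct` module's half-precision format.

```python
import struct

BLOCK = 32  # elements that share one scale


def fp16_round(x):
    """Round a Python float to the nearest FP16 value, as a stored scale would be."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_blocks(values):
    """Symmetric 8-bit block quantization: returns [(scale, [int8 codes]), ...]."""
    assert len(values) % BLOCK == 0, "pad the input to a multiple of 32 first"
    blocks = []
    for start in range(0, len(values), BLOCK):
        chunk = values[start:start + BLOCK]
        amax = max(abs(v) for v in chunk)
        scale = fp16_round(amax / 127.0)
        if scale == 0.0:
            codes = [0] * BLOCK  # all-zero (or vanishingly small) block
        else:
            # No zero-point: 0.0 always maps to code 0.
            codes = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, codes))
    return blocks


def dequantize_blocks(blocks):
    """Reverse the mapping: weight is approximately scale * code."""
    return [scale * code for scale, codes in blocks for code in codes]


weights = [(i - 16) / 10.0 for i in range(64)]
restored = dequantize_blocks(quantize_blocks(weights))
print(max(abs(a - b) for a, b in zip(weights, restored)))  # worst-case error
```

Under this layout each block costs 32 × 8 bits for the codes plus 16 bits for the scale, which works out to 8.5 bits per weight.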

Adam Optimizer
defined

Adam (Adaptive Moment Estimation) is an optimizer that adjusts learning rates per parameter during training. It combines m

Batch Normalization
defined

Batch normalization is a training technique that normalizes the inputs to a layer across a mini-batch of data. It comput

Hyperparameter
defined

A hyperparameter is a configuration variable set before training begins that controls the learning process, not a parame

Learning Rate
defined

Learning rate is a hyperparameter that controls how much the model's weights are adjusted during each training step. A h

Loss Function
stub
Stochastic Gradient Descent (SGD)
defined

Stochastic Gradient Descent (SGD) is an optimization algorithm used during model training to minimize the loss function. U

GPTQ
defined

GPTQ (Generative Pre-trained Transformer Quantization) is a one-shot post-training quantization method that uses approxima

Q4_0 Quantization
defined

Q4_0 is the original llama.cpp 4-bit quantization: INT4 weights with one FP16 scale per 32-element block, no zero-point,

AdamW
defined

AdamW is an optimizer algorithm used during fine-tuning or training of neural networks, including LLMs. It modifies the

Batch Size
defined

Batch size is the number of training samples processed together in one forward and backward pass. In local AI training,

Cross-Entropy Loss
stub
Hyperparameter Tuning
defined

Hyperparameter tuning is the process of selecting the configuration values that control how a model trains, such as lear

Regularization
defined

Regularization is a set of techniques used during model training to prevent overfitting—where the model memorizes traini

EXL2
defined

EXL2 is the ExLlamaV2 quantization format. NVIDIA-only, single-stream-throughput-optimized. Allows fractional bit-rates

Bias-Variance Tradeoff
defined

The bias-variance tradeoff describes the tension between a model's ability to fit training data closely (low bias) and its

Epoch
defined

An epoch is one complete pass through the entire training dataset during model training. In practice, operators fine-tun

HQQ (Half-Quadratic Quantization)
defined

HQQ (Half-Quadratic Quantization) is a calibration-free quantization method that produces 2-, 3-, 4-, and 8-bit weight qua

L1 / L2 Regularization
stub
Mean Squared Error (MSE)
stub
Q3_K_M Quantization
defined

Q3_K_M is a 3-bit GGUF K-quant averaging ~3.9 bits per parameter. It's the smallest format that still produces usable outp

Underfitting
stub
Vanishing Gradient
defined

The vanishing gradient problem occurs when gradients used to update model weights become extremely small as they are bac

Early Stopping
defined

Early stopping is a training technique that halts model training when performance on a validation set stops improving, p

Exploding Gradient
defined

An exploding gradient occurs when the gradients used to update model weights during training grow exponentially large, c

Gradient Clipping
defined

Gradient clipping is a technique used during neural network training to prevent exploding gradients. It caps the gradien
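
One widely used form of clipping rescales the whole gradient vector whenever its L2 norm exceeds a threshold, so the direction is kept and only the length is capped. The sketch below assumes that norm-based variant (value-wise clipping is another option) and uses a hypothetical `clip_by_norm` helper on a flat list of gradient values.

```python
import math


def clip_by_norm(grads, max_norm):
    """Scale grads down so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already within the limit: leave untouched
    factor = max_norm / norm
    return [g * factor for g in grads]


print(clip_by_norm([3.0, 4.0], max_norm=1.0))  # norm 5 is scaled down to norm 1
print(clip_by_norm([0.3, 0.4], max_norm=1.0))  # norm 0.5 passes through unchanged
```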

Learning Rate Schedule
defined

A learning rate schedule adjusts the step size (learning rate) during training to improve convergence and model quality. I

Weight Decay
defined

Weight decay is a regularization technique used during model training that adds a penalty proportional to the squared ma

Checkpoint
stub
Grid Search
stub
KL Divergence
stub
Mini-Batch
stub
Warmup
stub
Bayesian Optimization
stub
Momentum
stub
Q2_K Quantization
defined

Q2_K is 2-bit GGUF quantization averaging ~3.0 bits per parameter with mandatory 4-bit scales and importance metadata. It

RMSprop
stub

Computer vision · 24 terms · 20 defined

Stable Diffusion
defined

Stable Diffusion is a text-to-image model that generates images from text prompts using a diffusion process. It runs on

Object Detection
defined

Object detection is a computer vision task that identifies and localizes specific objects within an image or video frame

DALL-E
defined

DALL-E is a family of text-to-image generative models developed by OpenAI. Operators encounter it as a cloud-only API se

Image Classification
defined

Image classification is a computer vision task where a model assigns a single label from a predefined set to an input im

Midjourney
defined

Midjourney is a proprietary text-to-image AI service accessible via Discord, not a local model. Operators cannot downloa

Optical Character Recognition (OCR)
defined

Optical Character Recognition (OCR) is the process of converting images of text—scanned documents, photos, or screenshots—

YOLO
defined

YOLO (You Only Look Once) is a family of real-time object detection models that process an entire image in a single forwar

Face Recognition
defined

Face recognition is a computer vision task that identifies or verifies a person from an image or video frame by comparin

Image Segmentation
defined

Image segmentation is a computer vision task that partitions an image into multiple segments or regions, each correspond

R-CNN family (Fast/Faster/Mask)
defined

The R-CNN family is a series of object detection architectures that evolved from region-based convolutional neural netwo

Semantic Segmentation
defined

Semantic segmentation is a computer vision task that assigns a class label (e.g., 'car', 'road', 'person') to every pixel

Super-Resolution
defined

Super-resolution is a computer vision technique that takes a low-resolution image and generates a higher-resolution vers

Feature Extraction
defined

Feature extraction is the process of converting raw input data like an image into a compact set of numerical representat

Image Inpainting
defined

Image inpainting is the task of filling missing or masked regions of an image with plausible, contextually consistent co

Instance Segmentation
defined

Instance segmentation is a computer vision task that assigns a pixel-level mask to each distinct object instance in an i

SLAM
defined

SLAM (Simultaneous Localization and Mapping) is a computational problem in robotics and computer vision where a device bui

Style Transfer
defined

Style transfer is a computer vision technique that applies the visual style of one image (e.g., a painting) to the content

Depth Estimation
defined

Depth estimation is a computer vision task that predicts a depth value for each pixel in an image, producing a depth map

Edge Detection
defined

Edge detection is a computer vision technique that identifies points in an image where brightness changes sharply, formi

Pose Estimation
defined

Pose estimation is a computer vision task that identifies the positions of key body joints e.g., shoulders, elbows, wris

3D Reconstruction
stub
SSD (Single Shot Detector)
stub
Optical Flow
stub
Panoptic Segmentation
stub

Agents & agentic AI · 18 terms · 14 defined

AI Agent
defined

An AI agent is software that uses an LLM to decide what to do, takes actions, observes results, and iterates toward a go

Coding Agent
defined

A coding agent is a language model configured to write, debug, or refactor code autonomously or semi-autonomously. It ty

Function Calling / Tool Use
defined

Function calling (also called tool use) is a capability where the model emits structured JSON requesting that specific too

Tool calling
defined

Tool calling (also called function calling) is a model's structured output capability where it produces JSON-shaped tool i

MCP (Model Context Protocol)
defined

MCP is an open protocol introduced by Anthropic in late 2024 for connecting AI agents to tools and data sources in a sta

Autonomous Agent
defined

An autonomous agent is a system that uses a language model to decide and execute multi-step tasks without human interven

Browser Agent
defined

A browser agent is an AI-driven program that controls a web browser to automate tasks like form filling, data extraction

Multi-Agent System
defined

A multi-agent system (MAS) is a setup where multiple AI agents, each with distinct roles or capabilities, collaborate or c

Orchestration (agents)
defined

Orchestration in the context of agents refers to the system that manages the lifecycle, communication, and task delegati

Planning (in agents)
defined

Planning in agents refers to the process where an LLM decomposes a complex goal into a sequence of sub-steps or actions

Agent Memory (Short/Long/Episodic)
defined

Agent memory refers to the mechanisms an AI agent uses to store and recall information across interactions. Short-term m

Robotic Process Automation (RPA)
defined

Robotic Process Automation (RPA) is software that automates repetitive, rule-based tasks typically performed by humans int

Embodied AI
defined

Embodied AI refers to AI systems that interact with the physical world through a body or sensorimotor capabilities, rath

Goal-Oriented Agent
stub
Agent-Based Modeling
stub
Reactive Agent
defined

A Reactive Agent selects actions based solely on its current percepts and a fixed set of condition-action rules, without

Deliberative Agent
stub
BDI Architecture
stub

Evaluation metrics · 27 terms · 21 defined

Tokens per second
defined

Tokens per second (tok/s) is the most-cited LLM throughput metric, but it's also the most-misunderstood. It splits into tw

Accuracy
defined

Accuracy measures how often a model's predictions match the expected ground truth, typically expressed as a percentage e

Time to first token (TTFT)
defined

TTFT (time-to-first-token) is the latency between sending a prompt and receiving the first generated token. It's dominated

F1 Score
defined

The F1 score is the harmonic mean of precision and recall, giving a single metric that balances false positives and fals
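
The harmonic-mean definition above can be worked through from raw counts. The sketch assumes the usual count-based definitions of precision and recall (true positives over predicted positives, and over actual positives); `f1_score` here is a hypothetical standalone helper, not a library function.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_pos == 0:
        return 0.0  # no correct positives: precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# precision = 6/8 = 0.75, recall = 6/12 = 0.5, so F1 = 0.6
print(f1_score(true_pos=6, false_pos=2, false_neg=6))
```

Because the harmonic mean is pulled toward the smaller of the two values, the result (0.6) sits below the arithmetic mean of 0.75 and 0.5.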

Perplexity
defined

Perplexity is a metric that measures how well a language model predicts a sequence of tokens. Lower perplexity means the
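
Perplexity is conventionally computed as the exponential of the average negative log-probability the model assigned to each actual token. A minimal sketch under that standard definition, taking a list of per-token probabilities as a hypothetical input:

```python
import math


def perplexity(token_probs):
    """exp(mean negative log-likelihood) over the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# A model that gives every correct token probability 0.25 is as uncertain
# as a uniform choice among 4 options, so its perplexity is 4.
print(perplexity([0.25] * 8))
```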

Precision
defined

Precision in local AI refers to the number of bits used to represent each weight and activation in a neural network. Low

Recall
defined

Recall measures the fraction of relevant items that a retrieval or classification system successfully finds. In local AI

AUC (Area Under Curve)
defined

AUC (Area Under the Curve) measures a model's ability to rank positive examples higher than negative ones, typically using

Confusion Matrix
defined

A confusion matrix is a table that summarizes the performance of a classification model by comparing predicted labels ag

Elo Rating (LLM benchmarks)
defined

Elo rating in LLM benchmarks is a relative scoring system that ranks models based on pairwise comparison results, typica
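
For intuition, here is the classic Elo update applied to one pairwise comparison. The 400-point scale and K-factor of 32 are the conventional chess values and are assumptions here; LLM leaderboards may use different constants or a different fitting procedure altogether.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return new (rating_a, rating_b); score_a is 1 for a win, 0.5 tie, 0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


# Two equally rated models; A's answer is preferred.
print(elo_update(1000.0, 1000.0, score_a=1.0))  # (1016.0, 984.0)
```

An upset (a low-rated model beating a high-rated one) moves both ratings further than an expected result does, because the gap between actual and expected score is larger.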

Pass@k
defined

Pass@k is a metric that measures the probability that at least one of k independently generated samples from a model con
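
In practice pass@k is usually estimated from n generated samples of which c pass, using the combinatorial estimator 1 − C(n−c, k) / C(n, k) rather than by literally drawing k samples. The sketch below implements that estimator; the function name and example counts are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate P(at least one of k samples is correct) from n samples, c correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up only of failing samples;
    # it is 0 when fewer than k samples failed.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1))  # with k=1 this reduces to c/n
print(pass_at_k(n=10, c=3, k=5))
```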

ROC Curve
defined

A Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate at various cl

Throughput vs Latency
defined

Throughput is aggregate tokens generated per second across all in-flight requests; latency is wall-clock time for a sing

GSM8K
defined

GSM8K is a benchmark of 8,500 grade-school math word problems requiring 2–8 reasoning steps. Models are scored by whethe

BLEU score
defined

BLEU (Bilingual Evaluation Understudy) is an automated metric that measures how similar a machine-generated text is to one

FID (Fréchet Inception Distance)
defined

FID (Fréchet Inception Distance) is a metric that measures the quality of images generated by a model by comparing the sta

IoU (Intersection over Union)
defined

IoU (Intersection over Union) is a metric that measures the overlap between a predicted bounding box and a ground-truth bo
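
A short worked example of the overlap computation, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates (one of several conventions in use); the `iou` helper is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection area divided by union area for two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```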

R²
defined

R² (coefficient of determination) measures how well a regression model's predictions match actual outcomes, on a scale fro

mAP (mean Average Precision)
defined

mAP (mean Average Precision) is a metric that evaluates object detection models by averaging precision across recall thres

pass@1
defined

pass@1 is the probability that a model's first generated solution passes the unit tests for a coding problem, computed f

ROUGE score
stub
Sensitivity
defined

Sensitivity measures how much a model's output changes in response to small changes in its input. In local AI, sensitivi

Specificity
stub
Top-k Accuracy
stub
Inception Score
stub
METEOR
stub
Silhouette Score
stub

Learning paradigms · 23 terms · 14 defined

Reinforcement Learning (RL)
defined

Reinforcement Learning (RL) is a machine learning paradigm where an agent learns to make decisions by interacting with an

Self-Supervised Learning
defined

Self-supervised learning (SSL) is a training paradigm where a model learns representations from unlabeled data by creating

Supervised Learning
defined

Supervised learning is a training paradigm where a model learns to map inputs to outputs using labeled data — each train

Zero-Shot Learning
defined

Zero-shot learning is a capability where a model performs a task it was never explicitly trained on, using only a natura

Transfer Learning
defined

Transfer learning is a technique where a model trained on one task is reused as the starting point for a second task. In

Federated Learning
defined

Federated learning is a machine learning technique where a model is trained across multiple decentralized devices or ser

Few-Shot Learning
defined

Few-shot learning is a technique where a model performs a task after seeing only a small number of examples typically 2–

Unsupervised Learning
defined

Unsupervised learning is a machine learning paradigm where a model finds patterns in data without labeled examples. Unli

Contrastive Learning
defined

Contrastive learning is a self-supervised training method where a model learns to pull similar data points e.g., two aug

Deep Reinforcement Learning
defined

Deep Reinforcement Learning (DRL) combines deep neural networks with reinforcement learning, enabling agents to learn opti

Representation Learning
defined

Representation learning is the process by which a model automatically discovers the features or patterns in raw data tha

Continual Learning
defined

Continual learning (also called lifelong learning) is a machine learning paradigm where a model is trained on a sequence o

Meta-Learning
defined

Meta-learning, or 'learning to learn,' is a training paradigm where a model is exposed to many related tasks so it can q

Multi-Task Learning
defined

Multi-task learning (MTL) trains a single model on multiple related tasks simultaneously, sharing representations across t

Semi-Supervised Learning
stub
Active Learning
stub
Imitation Learning
stub
One-Shot Learning
stub
Curriculum Learning
stub
Lifelong Learning
stub
Online Learning
stub
Batch Learning
stub
Inverse Reinforcement Learning
stub

Ethics, safety & society · 23 terms · 19 defined

AI Safety
defined

AI safety refers to the set of practices and research aimed at ensuring that AI systems behave reliably, predictably, an

AI Alignment
defined

AI alignment refers to the challenge of ensuring that a model's outputs match the operator's intended goals and values.

AI Ethics
defined

AI ethics refers to the principles and practices that guide the responsible development and deployment of AI systems. Fo

Bias (AI/ML)
defined

Bias in AI/ML refers to systematic errors in model outputs that result from skewed training data, flawed assumptions, or

Algorithmic Bias
defined

Algorithmic bias refers to systematic and repeatable errors in a model's outputs that create unfair outcomes, such as pr

EU AI Act
defined

The EU AI Act is a regulatory framework from the European Union that classifies AI systems by risk level unacceptable, h

Explainability
defined

Explainability refers to the ability to understand and interpret why a model produces a specific output. For local AI op

Fairness (in AI)
defined

Fairness in AI refers to the absence of systematic bias in model outputs across different demographic groups. For operat

Interpretability
defined

Interpretability refers to the ability to understand and explain why a model produces a specific output. For local AI op

Privacy (in AI)
defined

Privacy in local AI refers to the operator's control over their data and model interactions, ensuring no data leaves the

AI Regulation
defined

AI regulation refers to laws, policies, and guidelines that govern the development, deployment, and use of AI systems. F

Adversarial Attack
defined

An adversarial attack is a technique where small, often imperceptible perturbations are added to an input to cause a mac

AI Governance
defined

AI Governance refers to the set of policies, processes, and technical controls that determine how a model is developed,

Adversarial Example
defined

An adversarial example is an input to a machine learning model that has been intentionally perturbed to cause a mispredi

Differential Privacy
defined

Differential Privacy is a mathematical framework that provides a formal guarantee that the output of an analysis reveals

Existential Risk (X-risk)
stub
Mechanistic Interpretability
defined

Mechanistic interpretability is the research approach of reverse-engineering neural networks into human-understandable a

Transparency (AI)
defined

Transparency in AI refers to the degree to which a model's behavior, training data, architecture, and decision-making pr

XAI (Explainable AI)
defined

Explainable AI (XAI) refers to methods that make the decisions of machine learning models understandable to humans. For lo

Accountability (AI)
defined

Accountability in AI means that the operator or organization deploying a model can be held responsible for its outputs a

Model Robustness
stub
Scalable Oversight
stub
Corrigibility
stub

Specialized domains · 21 terms · 17 defined

Computer Vision (domain)
defined

Computer vision is the field of AI that enables machines to interpret and process visual data—images, videos, or live ca

Self-Driving Cars
defined

Self-driving cars, also known as autonomous vehicles, use AI to perceive their environment and navigate without human in

AlphaFold
defined

AlphaFold is a deep learning model developed by DeepMind that predicts the 3D structure of proteins from their amino aci

AlphaGo
defined

AlphaGo is a computer program developed by DeepMind that plays the board game Go at a superhuman level. It combines deep

Autonomous Vehicles
defined

Autonomous vehicles are self-driving systems that use AI to perceive their environment, plan routes, and control vehicle

Robotics (AI)
defined

Robotics in AI refers to the integration of machine learning models into physical robots to enable perception, decision-

Healthcare AI
defined

Healthcare AI refers to machine learning models applied to medical data for tasks like diagnosis, treatment planning, dr

Recommender Systems
defined

Recommender systems are machine learning models that predict user preferences for items (movies, products, content) based

AI in Finance
defined

AI in Finance refers to the application of machine learning and deep learning models to financial tasks like fraud detec

AlphaZero
defined

AlphaZero is a reinforcement learning algorithm developed by DeepMind that learns to master board games (Go, chess, shogi)

Anomaly Detection
defined

Anomaly detection is the task of identifying data points, events, or patterns that deviate significantly from a dataset'

Fraud Detection
defined

Fraud detection is a machine learning task that identifies suspicious transactions, account activities, or user behavior

Medical Imaging (AI)
defined

Medical imaging AI refers to machine learning models trained to analyze medical scans like X-rays, CTs, MRIs, and pathol

Speech Processing
defined

Speech processing refers to the analysis, synthesis, and manipulation of human speech by AI models. Operators encounter

Algorithmic Trading
defined

Algorithmic trading uses computer programs to execute financial trades based on predefined rules, often involving statis

Drug Discovery (AI)
defined

Drug discovery with AI applies machine learning to the process of identifying and designing new pharmaceutical compounds

Game AI
defined

Game AI refers to the algorithms and systems that control non-player characters (NPCs), opponents, and procedural content

OpenAI Gym / Gymnasium
stub
Bioinformatics
stub
Predictive Maintenance
stub
Reinforcement Learning Environments
stub

Data & datasets · 34 terms · 27 defined

Training Data
defined

Training data is the dataset used to teach a model its patterns and behaviors. For LLMs, this typically means trillions

ImageNet
defined

ImageNet is a large-scale image dataset containing over 14 million labeled images across 20,000 categories, organized by

MMLU
defined

MMLU (Massive Multitask Language Understanding) is a benchmark that tests a language model's knowledge across 57 subjects,

Data Augmentation
defined

Data augmentation is the technique of generating modified copies of existing training data to increase dataset size and

Feature Engineering
defined

Feature engineering is the process of transforming raw data into input variables (features) that improve model performance

HumanEval
defined

HumanEval is a benchmark dataset of 164 hand-written programming problems, each with a function signature, docstring, an

MNIST
defined

MNIST (Modified National Institute of Standards and Technology) is a dataset of 70,000 grayscale images of handwritten dig

Synthetic Data
defined

Synthetic data is artificially generated data used to train or fine-tune AI models, created by algorithms rather than co

COCO
defined

COCO (Common Objects in Context) is a large-scale image dataset created by Microsoft for object detection, segmentation, a

Cross-Validation
defined

Cross-validation is a technique for evaluating how well a model generalizes to unseen data by partitioning the dataset i

Data Labeling
defined

Data labeling is the process of annotating raw data (text, images, audio) with tags or categories that teach a model what

Data Pipeline
defined

A data pipeline is a sequence of automated steps that ingest, transform, and load data from source to destination. In lo

ETL
defined

ETL (Extract, Transform, Load) is a data pipeline process that pulls raw data from sources (Extract), cleans or reformats it

Ground Truth
defined

Ground truth is the correct, real-world answer or label that a model is trained to predict or evaluated against. In supe

Test Data
defined

Test data is a set of examples used to evaluate a model's performance after training, distinct from the training data th

Validation Data
defined

Validation data is a subset of examples held back from training to evaluate how well a model generalizes to unseen input

Annotation
defined

Annotation is the process of adding labels, tags, or metadata to raw data (text, images, audio) to create a training datas

CIFAR-10/100
defined

CIFAR-10 and CIFAR-100 are datasets of 32x32 color images used for benchmarking image classification models. CIFAR-10 ha

Data Drift
stub
Feature Selection
defined

Feature selection is the process of identifying and retaining only the most relevant input variables (features) for a mach

Imbalanced Data
defined

Imbalanced data refers to a dataset where the number of samples per class is significantly skewed, with one or more mino

Normalization
defined

Normalization is a data preprocessing step that rescales input values to a fixed range (e.g., [0, 1] or [-1, 1]) or adjusts them
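
A minimal sketch of the fixed-range case mentioned above, assuming plain min-max rescaling. The function name `min_max_scale` and its `low`/`high` parameters are hypothetical and chosen for illustration. Mapping a constant input to the lower bound is also an assumption, not a standard rule.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Rescale numbers linearly so the smallest maps to `low` and the largest to `high`."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # Assumption: a constant column carries no range information,
        # so every value is mapped to the lower bound.
        return [low for _ in values]
    span = largest - smallest
    return [low + (v - smallest) * (high - low) / span for v in values]


unit_range = min_max_scale([10, 20, 30])            # target range [0, 1]
signed_range = min_max_scale([10, 20, 30], -1, 1)   # target range [-1, 1]
print(unit_range, signed_range)
```

In practice the minimum and maximum are usually computed on the training split only and then reused for validation and test data, so that no information leaks from held-out examples.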

One-Hot Encoding
defined

One-hot encoding converts categorical data (e.g., token IDs) into binary vectors where only one element is 'hot' (1) and all
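
As a rough illustration of the idea, the sketch below turns integer category IDs into one-hot vectors using only the standard library. The helper name `one_hot` and the vocabulary size of 4 are assumptions made for this example.

```python
def one_hot(index: int, num_classes: int) -> list[int]:
    """Return a vector of length `num_classes` with a single 1 at position `index`."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} is outside [0, {num_classes})")
    vector = [0] * num_classes
    vector[index] = 1
    return vector


# Encode three hypothetical category IDs drawn from a 4-item vocabulary.
encoded = [one_hot(i, 4) for i in [2, 0, 3]]
for row in encoded:
    print(row)
```

Each vector is as long as the vocabulary, which is why large vocabularies are usually handled with an embedding lookup on the integer ID instead of materializing the full vector.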

Concept Drift
defined

Concept drift is a change in the statistical properties of a target variable over time, causing a trained model to becom

Feature Scaling
defined

Feature scaling adjusts the range of numeric input values so that each feature contributes equally to a model's training

GLUE benchmark
defined

The GLUE (General Language Understanding Evaluation) benchmark is a collection of nine natural language understanding task

K-Fold Cross-Validation
defined

K-Fold Cross-Validation is a technique for evaluating a model's performance by splitting the dataset into K equal-sized
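
The splitting step can be sketched as follows, using only the standard library. The function name `k_fold_indices` is hypothetical. The sketch assumes no shuffling, and when the dataset size is not divisible by K it lets fold sizes differ by at most one, which is a common convention and not something stated in the definition above.

```python
def k_fold_indices(n_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Return K (train_indices, validation_indices) pairs covering range(n_samples)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    splits = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


for train_idx, val_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "validation:", val_idx)
```

Each sample lands in exactly one validation fold, so training and scoring a model once per split and averaging the K scores uses every example for evaluation exactly once.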

Standardization
defined

Standardization in local AI refers to the process of converting raw data into a consistent format that models can proces

Holdout Set
stub
Label Encoding
stub
Outlier Detection
stub
SuperGLUE benchmark
stub
SMOTE
stub
Stratified Sampling
stub

Classical ML algorithms · 27 terms · 18 defined

XGBoost
defined

XGBoost (Extreme Gradient Boosting) is a gradient-boosted decision tree (GBDT) library optimized for structured/tabular data

Random Forest
defined

Random Forest is an ensemble machine learning method that builds multiple decision trees during training and outputs the

Decision Tree
defined

A decision tree is a supervised learning model that splits data into branches based on feature values, forming a tree-li

Gradient Boosting
defined

Gradient boosting is an ensemble machine learning technique that builds a strong predictive model by sequentially adding

K-Means Clustering
defined

K-Means Clustering is an unsupervised learning algorithm that partitions a dataset into K distinct, non-overlapping clus

LightGBM
defined

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed for efficiency and sp

Principal Component Analysis (PCA)
defined

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms a high-dimensional dataset into

CatBoost
defined

CatBoost is a gradient boosting library developed by Yandex that handles categorical features automatically without manu

K-Nearest Neighbors (KNN)
defined

K-Nearest Neighbors (KNN) is a classical machine learning algorithm used for classification or regression. It works by fin

Linear Regression
defined

Linear regression is a statistical method that models the relationship between an input variable (feature) and an output v

Logistic Regression
defined

Logistic regression is a statistical model used for binary classification tasks, predicting the probability that an inpu

Support Vector Machine (SVM)
defined

A Support Vector Machine (SVM) is a supervised learning model that finds a hyperplane (a decision boundary) to separate data

t-SNE
defined

t-SNE (t-distributed Stochastic Neighbor Embedding) is a dimensionality reduction technique used to visualize high-dimensi

Q-Learning
defined

Q-Learning is a model-free reinforcement learning algorithm that learns an optimal action-selection policy by iterativel

UMAP
defined

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique used to visualize high-dimens

DBSCAN
defined

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm that groups d

Hidden Markov Model (HMM)
stub
Markov Decision Process (MDP)
defined

A Markov Decision Process (MDP) is a mathematical framework for modeling decision-making in environments where outcomes ar

Monte Carlo Methods
defined

Monte Carlo methods are a class of algorithms that use repeated random sampling to approximate numerical results. In loc

Naive Bayes
stub
Genetic Algorithm
stub
Hierarchical Clustering
stub
Conditional Random Field (CRF)
stub
Linear Discriminant Analysis (LDA)
stub
Simulated Annealing
stub
SARSA
stub
Particle Swarm Optimization
stub

MLOps & deployment · 16 terms · 11 defined

MLOps
defined

MLOps (Machine Learning Operations) is the practice of managing the lifecycle of machine learning models from development

LLMOps
defined

LLMOps (Large Language Model Operations) is the set of practices for deploying, monitoring, and maintaining LLMs in produc

Model Deployment
defined

Model deployment is the process of making a trained AI model available for inference in a production environment. For lo

A/B Testing
defined

A/B Testing in ML compares two model variants, a control (current production model) and a treatment (candidate model), by

Inference API
defined

An inference API is a programmatic interface that accepts input data (like a prompt) and returns a model's output (like gen

Model Monitoring
defined

Model Monitoring continuously tracks the health and performance of deployed ML models by measuring: (1) prediction quality

Model Serving
defined

Model serving is the process of making a trained AI model available for inference via an API or local runtime. For opera

Real-Time Inference
defined

Real-time inference means the model processes input and returns output fast enough to feel instantaneous to a human user

Batch Inference
stub
CI/CD for ML
stub
Edge Deployment
stub
Feature Store
stub
Model Versioning
defined

Model Versioning tracks the evolution of ML models over time by assigning unique identifiers to each trained artifact an

Canary Deployment
stub
Model Registry
defined

A Model Registry is a centralized catalog that stores and versions trained models along with their metadata — training d

Shadow Deployment
defined

Shadow Deployment (also called dark launch or shadow mode) runs a candidate model in production alongside the current mode

Missing a term?

The glossary grows when we find gaps.

If you searched for an AI term and we don't have a definition, email support through Contact with the term. We prioritize terms that are practical for running AI locally over purely academic ones, but we'll consider any reasonable suggestion.