AI glossary

551 terms across 19 categories. 440 have full definitions today; the rest are cataloged and being written.

We focus depth on terms most relevant to running AI locally. Cloud-only and academic terms are listed for completeness but get less attention.

Core concepts & fields · 18 terms · 7 defined

Artificial Intelligence (AI)

defined

Artificial Intelligence (AI) refers to systems that perform tasks typically requiring human intelligence, such as reasonin

Machine Learning (ML)

defined

Machine Learning (ML) is a field of AI where systems learn patterns from data without being explicitly programmed for ever

Deep Learning (DL)

defined

Deep learning (DL) is a subset of machine learning that uses multi-layer neural networks to learn patterns from data. In l

Neural Networks

defined

Neural networks are the computational architecture behind modern AI models. They consist of layers of interconnected nod

Artificial General Intelligence (AGI)

defined

Artificial General Intelligence (AGI) refers to a hypothetical AI system that can perform any intellectual task that a hum

Artificial Superintelligence (ASI)

defined

Artificial Superintelligence (ASI) refers to a hypothetical AI system that surpasses human intelligence across all domains

Inference (logical)

defined

Inference is the process of running a trained model on input data to generate an output — the "forward pass" that produc

Knowledge Representation

stub

Computational Intelligence

stub

Connectionism

stub

Large language models · 56 terms · 52 defined

Large Language Model (LLM)

defined

A Large Language Model is a neural network with billions of parameters trained on massive text corpora to predict the ne

Quantization

defined

Quantization is the process of reducing a model's numeric precision to shrink its memory footprint with minimal quality

Inference

defined

Inference is the act of running a trained model to generate predictions, as opposed to training which produces the model

Prompt

defined

A prompt is the input text you provide to a language model to generate a response. It can be a simple question, a set of

Retrieval-Augmented Generation (RAG)

defined

RAG is the pattern of retrieving relevant documents from a knowledge base and including them in the LLM's prompt so the

Hallucination

defined

Hallucination is when an LLM generates plausible-sounding but factually incorrect information — citing papers that don't

Prompt Engineering

defined

Prompt engineering is the practice of crafting model inputs to elicit better outputs without changing the model itself.

LoRA (Low-Rank Adaptation)

defined

LoRA is a parameter-efficient fine-tuning technique that adapts a large pre-trained model by training small low-rank mat

RLHF (Reinforcement Learning from Human Feedback)

defined

RLHF (Reinforcement Learning from Human Feedback) is a training method that fine-tunes a language model using human prefer

Fine-tuning

defined

Fine-tuning is continued training of a pre-trained model on a smaller, task-specific dataset. Pre-training builds genera

Embedding (Vector Embedding)

defined

An embedding is a fixed-length vector representation of text, image, or other input — typically 384-3072 dimensions — wh

Foundation Model

defined

A foundation model is a large neural network trained on broad data at scale, designed to be adapted for a wide range of

Chain-of-Thought (CoT)

defined

Chain-of-thought prompting is asking a model to show its reasoning step-by-step before giving the final answer. It drama

Latency

defined

Latency measures how fast you get a response. Two metrics matter for local LLMs: Time to First Token (TTFT), wall-clock

Vector Database

defined

A vector database stores and retrieves data as high-dimensional vectors (embeddings) rather than rows or documents. In loc

Alignment

defined

Alignment refers to the process of fine-tuning a base LLM so its outputs match human preferences, values, or safety guid

GGUF

defined

GGUF (GGML Unified Format) is the file format used by llama.cpp and its ecosystem (Ollama, KoboldCpp, LM Studio). A single f

Pre-training

defined

Pre-training is the initial phase where a large language model learns from a vast, diverse corpus of text data (e.g., web

System Prompt

defined

A system prompt is the initial instruction or context prepended to a conversation with an LLM. It sets the model's behav

Throughput

defined

Throughput measures how much work a system completes per unit time — typically tokens-per-second across all concurrent r

Instruction Tuning

defined

Instruction tuning is a supervised fine-tuning step where a base language model is trained on (instruction, response) pair

QLoRA

defined

QLoRA combines LoRA fine-tuning with 4-bit quantization of the base model. Introduced by Tim Dettmers in 2

Semantic Search

defined

Semantic search retrieves results based on meaning rather than exact keyword matches. Instead of looking for literal wor

Direct Preference Optimization (DPO)

defined

Direct Preference Optimization (DPO) is a method for fine-tuning language models to align with human preferences without u

Few-Shot Prompting

defined

Few-shot prompting is a technique where you include a small number of input-output examples in the prompt to guide the m

In-Context Learning

defined

In-context learning (ICL) is a capability of large language models where the model adapts its behavior based solely on exa

Jailbreak

defined

A jailbreak is a prompt designed to bypass the safety guardrails of an LLM, causing it to generate content it would norm

ORPO (Odds Ratio Preference Optimization)

defined

ORPO (Odds Ratio Preference Optimization) is a fine-tuning method that combines supervised fine-tuning (SFT) and preference

Prompt Injection

defined

Prompt injection is a security exploit where a crafted input overrides the system prompt or instruction set of an LLM, c

Zero-Shot Prompting

defined

Zero-shot prompting is a technique where you give a language model a task description or instruction without providing a

DoRA (Weight-Decomposed Low-Rank Adaptation)

defined

DoRA (Weight-Decomposed Low-Rank Adaptation) is a fine-tuning method that improves upon LoRA by decomposing pre-trained we

KV Cache Quantization

defined

KV cache quantization reduces the memory footprint of the key-value (KV) cache by storing its entries in lower-precision f

Speculative Decoding

defined

Speculative decoding speeds up LLM inference by using a small fast "draft" model to propose the next several tokens, the

Distillation

defined

Distillation is a training technique where a smaller 'student' model learns to mimic the behavior of a larger 'teacher'

Guardrails

defined

Guardrails are runtime constraints or filters applied to an LLM's input and output to enforce safety, compliance, or for

Parameter-Efficient Fine-Tuning (PEFT)

defined

Parameter-Efficient Fine-Tuning PEFT is a set of techniques that adapt a pre-trained large language model to a specific

ReAct

defined

ReAct (Reasoning + Acting) is a prompting technique that interleaves chain-of-thought reasoning with tool-use actions. In

Red Teaming

defined

Red teaming is the practice of systematically probing an LLM to find failure modes: harmful outputs, jailbreaks, halluci

Chunked Prefill

defined

Chunked prefill is an inference-engine technique that splits long-prompt processing into smaller chunks so the engine ca

Dense Retrieval

defined

Dense retrieval finds documents by computing cosine similarity or dot product between learned vector embeddings of the q

Reranker (Cross-Encoder)

defined

A reranker is a cross-encoder model that scores query/document pairs jointly (concatenated as input), producing a relevanc

Hybrid Retrieval

defined

Hybrid retrieval combines dense and sparse retrieval, typically by union-then-rerank or reciprocal rank fusion (RRF). The
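
To make the reciprocal rank fusion mentioned in the entry above concrete, here is a minimal sketch. The function name, the document IDs, and the constant k=60 (a commonly used default) are illustrative assumptions, not taken from this glossary.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense and a sparse retriever.
dense_results = ["doc_a", "doc_b", "doc_c"]
sparse_results = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_results, sparse_results])
```

Documents that rank well in both lists (doc_b and doc_a here) rise to the top, while documents found by only one retriever still stay in the fused list.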

Constitutional AI

defined

Constitutional AI (CAI) is a training method that aligns language model behavior using a set of written rules (a 'constitut

Grounding

defined

Grounding connects a language model's output to verifiable external sources (documents, databases, APIs) to reduce halluci

Knowledge Distillation

defined

Knowledge distillation is a technique where a smaller, faster 'student' model is trained to mimic the behavior of a larg

Proximal Policy Optimization (PPO)

defined

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm used to fine-tune large language models (LLMs) with

RLAIF (RL from AI Feedback)

defined

RLAIF (Reinforcement Learning from AI Feedback) is a technique for fine-tuning language models where an AI system, rather

BM25 (Best Matching 25)

defined

BM25 is the canonical sparse-retrieval algorithm: a TF-IDF variant that saturates term frequency (a token appearing 100 t

Sycophancy

defined

Sycophancy in LLMs refers to the model's tendency to agree with a user's stated or implied position, even when that posi

Tree of Thoughts

defined

Tree of Thoughts (ToT) is a prompting strategy that expands a single chain of reasoning into a tree of multiple reasoning

Sparse Retrieval

defined

Sparse retrieval scores documents by lexical overlap with the query — high-dimensional vectors where most entries are ze

Catastrophic Forgetting

stub

Mode Collapse

stub

Transformer & LLM components · 43 terms · 36 defined

KV Cache

defined

The KV cache stores the key and value tensors from previous attention computations so the model doesn't recompute them a

Context Window

defined

The context window is the maximum number of tokens a model can attend to at once — both prompt and previously generated

Attention Mechanism

defined

The attention mechanism is a neural network component that lets a model weigh the importance of different parts of the i

Token

defined

A token is the smallest unit of text a language model processes. Most modern models use subword tokenization, where comm

Self-Attention

defined

Self-attention computes a weighted representation of every position in a sequence by comparing each token against every

Tokenization

defined

Tokenization is the process of converting text into the numeric tokens a model can process. Modern systems use subword t

Multi-Head Attention

defined

Multi-Head Attention is a mechanism in transformer models where the input is projected into multiple parallel 'attention

Multi-Head Latent Attention (MLA)

defined

Multi-Head Latent Attention (MLA) is an attention mechanism used in DeepSeek V2/V3 that compresses the key-value (KV) cache

Prefill (Prompt Processing)

defined

Prefill is the first phase of LLM inference: the model processes the entire prompt in a single parallel pass, building u

Decode (Token Generation)

defined

Decode is the second phase of LLM inference: generating one output token at a time, autoregressively. Each decode step d

Flash Attention

defined

Flash Attention is a memory-efficient implementation of the attention mechanism that reduces memory usage from O(n²) to O(n)

Sliding Window Attention (SWA)

defined

Sliding Window Attention (SWA) is an attention pattern where each token only attends to a fixed-size window of nearby toke

Temperature (sampling)

defined

Temperature is a sampling parameter that controls the randomness of token selection during text generation. It scales th
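
As a rough illustration of how temperature changes the token distribution, the sketch below divides logits by the temperature before a softmax. The function name and the example logits are assumptions made for illustration; real inference engines apply the same idea to full vocabulary-sized logit vectors.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by temperature.

    Lower temperature sharpens the distribution; higher temperature flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


example_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
sharp = softmax_with_temperature(example_logits, temperature=0.5)
flat = softmax_with_temperature(example_logits, temperature=2.0)
```

With these inputs the top token gets a larger share of the probability mass at temperature 0.5 than at 2.0, which is why low temperatures read as more deterministic.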

Decoder

defined

A decoder is the component of a transformer model that generates output tokens one at a time, using the input's encoded

Encoder

defined

An encoder is a neural network component that processes input data (text, images, audio) into a dense representation, a vec

Grouped-Query Attention (GQA)

defined

Grouped-Query Attention (GQA) is a variant of multi-head attention that reduces memory and compute costs by sharing key-va

Rotary Position Embedding (RoPE)

defined

Rotary Position Embedding (RoPE) is a method for encoding token position in transformer models by rotating query and key v

Multi-Query Attention (MQA)

defined

Multi-Query Attention (MQA) is a transformer attention variant where all attention heads share a single key/value projecti

PagedAttention

defined

PagedAttention is the memory layout introduced by vLLM that stores the KV cache in fixed-size blocks (pages, like virtual

Sampling (Decoding)

defined

Sampling is the process of converting model logits into output tokens. Common strategies: greedy (temperature 0), random s

Byte Pair Encoding (BPE)

defined

Byte Pair Encoding (BPE) is a subword tokenization algorithm that splits text into a sequence of tokens by iteratively mer

Encoder-Decoder

defined

An encoder-decoder is a neural network architecture that processes an input sequence through an encoder to produce a com

Top-p (Nucleus) Sampling

defined

Top-p (nucleus) sampling is a text generation strategy that selects from the smallest set of tokens whose cumulative proba
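
The nucleus selection described in the entry above can be sketched as follows. This assumes the input is already a probability distribution; the function name and example values are illustrative only.

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p.

    Returns a dict mapping token index to renormalized probability.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept = []
    cumulative = 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalize so the kept probabilities sum to 1 before sampling.
    return {index: prob / total for index, prob in kept}


# Hypothetical next-token distribution over a four-token vocabulary.
nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
```

Here the three most likely tokens are kept and the least likely one is dropped; a sampler would then draw from the renormalized values.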

Temperature 0 (Greedy Sampling)

defined

Temperature 0 disables sampling entirely — the model picks the highest-logit token at every step. Equivalent to greedy d

Cross-Attention

defined

Cross-attention is a mechanism in transformer models where the query vectors come from one sequence (e.g., the decoder's

Positional Encoding

defined

Positional encoding is a technique used in transformer models to inject information about the position of tokens in a se

Top-k Sampling

defined

Top-k sampling is a text-generation strategy that restricts the model's next-token choices to the k tokens with the high

Deterministic Decoding

defined

Deterministic decoding means same prompt → same output, every time. Achieved by setting temperature to 0 (always pick the

Layer Normalization

defined

Layer normalization is a technique that stabilizes training and inference by normalizing activations across the features

Logits

defined

Logits are the raw, unnormalized scores output by the final linear layer of a transformer model, before the softmax func

Random Seed

defined

A random seed initializes the pseudo-random generator that drives sampling at temperature > 0. Same seed + same prompt +

RMSNorm

defined

RMSNorm is a simpler variant of LayerNorm that normalizes activations by their root-mean-square instead of their varianc
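
A minimal sketch of the root-mean-square normalization named in the entry above, written in plain Python for readability. The `gain` and `eps` parameters and the function name are assumptions for illustration; production implementations operate on tensors and learn the gain during training.

```python
import math


def rms_norm(values, gain=None, eps=1e-6):
    """Scale a vector so its root-mean-square is approximately 1.

    Unlike LayerNorm, no mean is subtracted; an optional per-feature gain is applied.
    """
    mean_square = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(mean_square + eps)
    if gain is None:
        gain = [1.0] * len(values)
    return [v * scale * g for v, g in zip(values, gain)]


normalized = rms_norm([3.0, 4.0])  # hypothetical two-feature activation
```

Skipping the mean subtraction is what makes this cheaper than LayerNorm: only one reduction over the features is needed.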

YaRN (Yet another RoPE eNlargement)

defined

YaRN is a context-extension method that modifies RoPE frequencies to let a model trained on, say, 8K context generalize

SwiGLU

defined

SwiGLU is a gated feed-forward activation: (W1·x ⊙ swish(W2·x)) · W3, replacing the standard MLP's GELU/ReLU in modern trans

WordPiece

stub

ALiBi (Attention with Linear Biases)

defined

ALiBi is a positional encoding scheme that biases attention scores by a linear function of token distance, instead of in

Mirostat Sampling

defined

Mirostat is a sampling algorithm that targets a fixed perplexity-like "surprise" level tau instead of a fixed top-p or t

Feed-Forward Network

stub

Natural language processing · 28 terms · 21 defined

GPT (architecture)

defined

GPT (Generative Pre-trained Transformer) is a decoder-only Transformer architecture that predicts the next token in a sequ

Natural Language Processing (NLP)

defined

Natural Language Processing (NLP) is the field of AI focused on enabling computers to understand, interpret, and generate

BERT

defined

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model that reads text in bo

Language Modeling

defined

Language modeling is the task of predicting the next token (word, subword, or character) in a sequence given the preceding

Text Generation

defined

Text generation is the process where a language model produces coherent sequences of tokens (words or subwords) in respons

Automatic Speech Recognition (ASR)

defined

Automatic Speech Recognition (ASR) converts spoken audio into text. Operators encounter ASR when running models like Whisp

Machine Translation

defined

Machine translation (MT) is the task of automatically translating text from one natural language to another using a neural

Sentiment Analysis

defined

Sentiment analysis is a text classification task where a model assigns a label (e.g., positive, negative, neutral) to a pi

Text Summarization

defined

Text summarization is a natural language processing task where a model generates a shorter version of a longer text whil

Text-to-Speech (TTS)

defined

Text-to-Speech (TTS) converts written text into spoken audio using neural models. Operators encounter TTS when running loc

Word Embedding

defined

A word embedding is a dense vector of floating-point numbers that maps a word or token to a point in a high-dimensional

Word2Vec

defined

Word2Vec is an algorithm that learns dense vector representations (embeddings) of words from large text corpora. Each word

Named Entity Recognition (NER)

defined

Named Entity Recognition (NER) is an NLP task that identifies and classifies named entities (e.g., person, organization, lo

Question Answering

defined

Question answering (QA) is a natural language processing task where a model receives a question and returns a concise answ

Text Classification

defined

Text classification is a natural language processing task where a model assigns a predefined category label to a piece o

GloVe

defined

GloVe (Global Vectors for Word Representation) is a static word embedding method that learns vector representations of wor

Speech Synthesis

defined

Speech synthesis, also known as text-to-speech (TTS), converts written text into spoken audio. In local AI, operators run

T5

defined

T5 (Text-to-Text Transfer Transformer) is a sequence-to-sequence model from Google that converts every NLP task into a tex

FastText

defined

FastText is a library for efficient learning of word representations and sentence classification, developed by Facebook

N-gram

defined

An n-gram is a contiguous sequence of n items (usually tokens or characters) from a text. In local AI, n-grams appear in t

Topic Modeling

defined

Topic modeling is an unsupervised NLP technique that discovers latent themes (topics) across a collection of documents. It

Latent Dirichlet Allocation (LDA)

stub

Part-of-Speech Tagging

stub

Coreference Resolution

stub

Notable models & companies · 18 terms · 17 defined

GPT-4

defined

GPT-4 is a large multimodal language model developed by OpenAI, released in March 2023. It accepts text and image inputs

OpenAI

defined

OpenAI is the organization that developed the GPT series of large language models (GPT-3, GPT-4, GPT-4o) and the DALL-E im

Llama (Meta)

defined

Llama is a family of open-weight large language models (LLMs) developed by Meta, starting with Llama 1 in 2023 and continu

Anthropic

defined

Anthropic is an AI safety and research company that develops large language models (LLMs) under the Claude family. Operato

Claude (Anthropic)

defined

Claude is a family of large language models (LLMs) developed by Anthropic, designed for safe and helpful text generation.

NVIDIA

defined

NVIDIA designs the GPUs most operators use for local AI inference. Its consumer RTX series (e.g., RTX 4090) and workstatio

DeepSeek

defined

DeepSeek is a family of open-weight large language models developed by DeepSeek (深度求索), a Chinese AI research company. The

GPT-5

defined

GPT-5 is the hypothetical successor to OpenAI's GPT-4 model family. As of early 2025, no official GPT-5 model has been r

Gemini (Google)

defined

Gemini is a family of multimodal large language models (LLMs) developed by Google DeepMind, designed to process text, imag

Google DeepMind

defined

Google DeepMind is an AI research lab formed from the 2023 merger of Google Brain and DeepMind. It develops large langua

Hugging Face

defined

Hugging Face is a platform and company that hosts a vast repository of open-source machine learning models, datasets, an

Qwen

defined

Qwen is a family of large language models (LLMs) developed by Alibaba Cloud, ranging from 0.5B to 110B parameters. Operato

Meta AI

defined

Meta AI is the artificial intelligence research division of Meta Platforms (formerly Facebook). For local AI operators, Me

Mistral

defined

Mistral is a family of open-weight large language models (LLMs) developed by Mistral AI, known for their efficiency and st

Stability AI

defined

Stability AI is the company behind the Stable Diffusion family of image generation models, which operators run locally v

Grok (xAI)

defined

Grok is a family of large language models (LLMs) developed by xAI, led by Elon Musk. The first version, Grok-1, was releas

Phi (Microsoft)

defined

Phi is a family of small language models (SLMs) developed by Microsoft, designed to run efficiently on consumer hardware l

Command (Cohere)

stub

Generative AI · 23 terms · 14 defined

Generative AI (GenAI)

defined

Generative AI (GenAI) refers to machine learning models that produce new content (text, images, audio, code, or video) by le

Deepfake

defined

A deepfake is a synthetic media (image, video, or audio) generated or manipulated by a deep learning model, typically an a

Generative Model

defined

A generative model is a type of machine learning model that learns the underlying distribution of training data and can

ControlNet

defined

ControlNet is a neural network architecture that adds spatial conditioning to pretrained image diffusion models like Sta

Latent Diffusion

defined

Latent diffusion is a technique used in image generation models like Stable Diffusion that applies the diffusion process

Video Generation

defined

Video generation refers to the process of creating new video content from text prompts, images, or other video inputs us

Autoregressive Models

defined

Autoregressive models generate text one token at a time, where each new token depends on all previously generated tokens

Latent Space

defined

Latent space is the internal, compressed representation of data that a generative model learns during training. It is a

Voice Cloning

defined

Voice cloning is the process of generating synthetic speech that mimics a specific person's voice, including timbre, pit

Audio Generation

defined

Audio generation refers to the process of creating audio content—such as speech, music, or sound effects—using machine l

DreamBooth

defined

DreamBooth is a fine-tuning technique that personalizes a text-to-image model like Stable Diffusion to generate images o

StyleGAN

defined

StyleGAN is a generative adversarial network (GAN) architecture designed for high-resolution image synthesis, introduced b

DDPM (Denoising Diffusion Probabilistic Models)

defined

DDPM (Denoising Diffusion Probabilistic Models) is a class of generative models that learn to generate data by reversing a

Music Generation

defined

Music generation refers to the use of AI models to produce audio or symbolic representations of music (e.g., MIDI, sheet

DDIM (Denoising Diffusion Implicit Models)

stub

Frameworks & tools · 40 terms · 38 defined

Ollama

defined

Ollama is a runtime and CLI tool for running large language models locally on consumer hardware. It wraps llama.cpp and

PyTorch

defined

PyTorch is an open-source machine learning framework developed by Meta. It provides tensor computation with GPU accelera

llama.cpp

defined

llama.cpp is a C++ inference engine for running large language models (LLMs) locally on consumer hardware. It loads quanti

vLLM

defined

vLLM is an open-source inference engine optimized for high-throughput, low-latency serving of large language models. It

Hugging Face Transformers

defined

Hugging Face Transformers is a Python library that provides pre-trained models and tools for natural language processing

LM Studio

defined

LM Studio is a desktop application that provides a graphical interface for downloading, managing, and running local larg

LangChain

defined

LangChain is a Python/TypeScript framework for building applications that chain together LLM calls, external data source

TensorFlow

defined

TensorFlow is an open-source machine learning framework developed by Google. Operators encounter it as an alternative to

scikit-learn

defined

scikit-learn is a Python library for classical machine learning (regression, classification, clustering, dimensionality r

text-generation-webui (oobabooga)

defined

text-generation-webui (often called oobabooga) is a browser-based interface for running large language models locally. It

ExLlamaV2

defined

ExLlamaV2 is a high-performance inference engine for Llama-family models, optimized for GPU execution. It achieves faste

KoboldCpp

defined

KoboldCpp is a single-file, self-contained executable that bundles llama.cpp with a web-based UI and a built-in API, des

LlamaIndex

defined

LlamaIndex is a data framework for building retrieval-augmented generation (RAG) applications. It provides tools to ingest

OpenCV

defined

OpenCV (Open Source Computer Vision Library) is a C++ library with Python bindings for real-time image and video processin

Continuous Batching

defined

Continuous batching (sometimes "iteration-level scheduling") is a serving optimization where new requests join the active

Hugging Face Text Generation Inference (TGI)

defined

Hugging Face Text Generation Inference (TGI) is a production-grade inference server for large language models, optimized f

Gradio

defined

Gradio is an open-source Python library for quickly building web-based user interfaces for machine learning models. Oper

JAX

defined

JAX is a numerical computing library from Google that combines NumPy-like array operations with automatic differentiatio

Keras

defined

Keras is a high-level neural network API, written in Python and capable of running on top of TensorFlow, JAX, or PyTorch

MLC LLM

defined

MLC LLM (Machine Learning Compilation for Large Language Models) is a framework that compiles LLMs into deployable binarie

SGLang

defined

SGLang is an open-source LLM inference engine focused on high throughput for structured generation and complex agent wor

Streamlit

defined

Streamlit is an open-source Python framework for turning data scripts into interactive web apps with minimal code. Opera

Prefix Caching

defined

Prefix caching stores the KV cache from previous requests so a new request that shares a prefix (system prompt, few-shot

Request Batching

defined

Request batching packs multiple inference requests into a single forward pass to amortize the cost of loading model weig

MPS (Metal Performance Shaders)

defined

MPS is Apple's high-level Metal-based compute library, exposed in PyTorch as the mps device backend. Calling model.to("mp

Airflow

defined

Airflow is a workflow orchestration tool for scheduling, monitoring, and managing complex data pipelines as directed acy

MLflow

defined

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experimentation, rep

Ray

defined

Ray is an open-source distributed computing framework for scaling AI workloads across multiple machines. Operators encou

Triton Inference Server

defined

Triton Inference Server is an open-source inference serving software by NVIDIA that manages multiple AI models across GP

Weights & Biases

defined

Weights & Biases (W&B) is a cloud-based MLOps platform for tracking experiments, visualizing metrics, and managing model a

spaCy

defined

spaCy is a Python library for industrial-strength natural language processing (NLP) that provides pre-trained pipelines fo

FAISS

defined

FAISS (Facebook AI Similarity Search) is a C++/Python library for fast approximate nearest-neighbor search over dense vect

GGML

defined

GGML is the C/C++ tensor library that underlies llama.cpp, whisper.cpp, and the original GGUF format. It provides quanti

Vulkan Compute

defined

Vulkan compute is the cross-vendor GPU compute API from Khronos. llama.cpp ships a Vulkan backend that runs on AMD, Inte

NLTK

defined

NLTK (Natural Language Toolkit) is a Python library for classical NLP tasks like tokenization, stemming, tagging, and pars

TensorBoard

defined

TensorBoard is a visualization toolkit from TensorFlow for inspecting model training metrics, graph structures, and weig

DirectML

defined

DirectML is Microsoft's GPU-agnostic ML acceleration API, layered on DirectX 12. It works on any Windows-supported GPU —

Expert Parallelism

defined

Expert parallelism is a parallelism strategy specific to MoE models: each GPU holds a different subset of the experts, a

Neural network architectures · 29 terms · 24 defined

Transformer

defined

The Transformer is a neural network architecture introduced in 2017 that replaced recurrent layers with a self-attention

Diffusion Model

defined

A diffusion model is a type of generative model that learns to reverse a gradual noising process. During training, the m

Convolutional Neural Network (CNN)

defined

A Convolutional Neural Network (CNN) is a neural network architecture that uses convolutional layers to process grid-like

Generative Adversarial Network (GAN)

defined

A Generative Adversarial Network (GAN) is a machine learning architecture where two neural networks, a generator and a disc

Mixture of Experts (MoE)

defined

Mixture of Experts is a neural network architecture where multiple specialized sub-networks ("experts") exist, but only a

Multimodal AI

defined

Multimodal AI refers to models that process and generate multiple data types—typically text, images, and sometimes audio

Vision-Language Model (VLM)

defined

A Vision-Language Model (VLM) processes both images and text, enabling tasks like image captioning, visual question answer

Long Short-Term Memory (LSTM)

defined

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture designed to model sequential data while avoid

Recurrent Neural Network (RNN)

defined

A Recurrent Neural Network (RNN) is a neural network architecture designed for sequential data, where each output depends

Multi-Layer Perceptron (MLP)

defined

A Multi-Layer Perceptron (MLP) is a feedforward neural network composed of at least three layers: an input layer, one or m

Residual Network (ResNet)

defined

A Residual Network (ResNet) is a neural network architecture that introduces skip connections (also called shortcut connect

Vision Transformer (ViT)

defined

A Vision Transformer (ViT) is a neural network architecture that applies the Transformer model, originally designed for te

Decoder-Only Transformer

defined

Decoder-only is the architecture of GPT, Llama, Qwen, Mistral, DeepSeek, and almost every modern open-weight LLM. The mo

Autoencoder

defined

An autoencoder is a neural network trained to reconstruct its input after passing it through a bottleneck layer. The bot

Graph Neural Network (GNN)

defined

A Graph Neural Network (GNN) is a neural network architecture designed to process data structured as graphs: nodes connecte

Perceptron

defined

A perceptron is the simplest form of a neural network: a single linear unit that takes weighted inputs, sums them, adds

State Space Models (Mamba)

defined

State Space Models (SSMs), notably the Mamba architecture, are a class of sequence models that process tokens in linear ti

U-Net

defined

U-Net is a convolutional neural network architecture designed for image segmentation tasks. It consists of a contracting

Variational Autoencoder (VAE)

defined

A Variational Autoencoder (VAE) is a generative neural network that learns a compressed latent representation of input dat

Dense Model

defined

A dense model activates every parameter on every forward pass — the default architecture for transformers like Llama, Qw

Neural Radiance Field (NeRF)

defined

A Neural Radiance Field (NeRF) is a neural network that represents a 3D scene as a continuous function mapping a 3D locati

MoE Routing

defined

MoE routing is the gating mechanism that decides which experts a token activates in a Mixture-of-Experts layer. Top-k ro

Feedforward Neural Network

defined

A feedforward neural network (FFNN) is the simplest type of neural network where connections between nodes do not form cyc

Gated Recurrent Unit (GRU)

stub

Encoder-Decoder Transformer

defined

Encoder-decoder transformers (T5, BART, the original "Attention Is All You Need" architecture) have two halves: an encoder rea

Spiking Neural Network

stub

Capsule Network

stub

Hardware & infrastructure · 39 terms · 39 defined

VRAM (Video RAM)

defined

VRAM is the dedicated memory on a GPU. For local AI, VRAM capacity is the single most important spec — it determines whi

GPU

defined

A GPU (Graphics Processing Unit) is a specialized processor designed for parallel computation, originally for graphics but

CUDA

defined

CUDA (Compute Unified Device Architecture) is NVIDIA's parallel-computing platform and the dominant API for GPU-accelerate

CPU Offload

defined

CPU offload is a technique where parts of a neural network model are processed by the CPU instead of the GPU, typically

Edge AI

defined

Edge AI refers to running machine learning models locally on consumer hardware (laptops, phones, GPUs) rather than sending

VRAM Bandwidth

defined

VRAM bandwidth is the rate at which the GPU's video memory can transfer data to the compute cores, measured in GB/s. For

MLX (Apple)

defined

MLX is Apple's open-source array framework optimized for Apple Silicon. The Apple equivalent of PyTorch + CUDA, with fir

TPU (Tensor Processing Unit)

defined

A Tensor Processing Unit (TPU) is a custom ASIC designed by Google specifically for accelerating machine learning workload

Distributed Training

defined

Distributed training splits the work of training a neural network across multiple GPUs or machines, using techniques lik

Edge Computing

defined

Edge computing means running AI inference on a local device (laptop, phone, embedded system) instead of sending data to a

FLOPS

defined

FLOPS (Floating Point Operations Per Second) measures how many floating-point calculations a processor can perform in one

FP16

defined

FP16 (16-bit floating point) is a number format that uses 16 bits per weight or activation, balancing precision and memory

NPU (Neural Processing Unit)

defined

A Neural Processing Unit (NPU) is a specialized hardware accelerator designed to execute neural network operations efficie

On-Device AI

defined

On-device AI refers to running machine learning models directly on local hardware (CPU, GPU, NPU) rather than sending data

GDDR7

defined

GDDR7 uses PAM3 signaling to push per-pin rates to 28–32 Gbps in first-gen products (2025), with a path to 40+ Gbps. RTX 5

Unified Memory

defined

Unified memory is a memory architecture where CPU and GPU share the same physical RAM pool, eliminating CPU↔GPU copies.

BF16 (BFloat16)

defined

BF16 (Brain Floating Point 16) is a 16-bit floating-point number format that uses 8 exponent bits and 7 mantissa bits, mat
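
As a rough illustration of the bit layout described above, the following Python sketch converts a value to BF16 by keeping only the top 16 bits of its FP32 encoding (1 sign bit, 8 exponent bits, 7 mantissa bits). The function name `to_bf16_truncate` is hypothetical, and plain truncation is an assumption made for simplicity; real hardware and libraries typically round to nearest instead.

```python
import struct

def to_bf16_truncate(x: float) -> float:
    """Approximate BF16 by truncating an FP32 value to its top 16 bits.

    Assumption: truncation (round toward zero), not round-to-nearest.
    """
    # Reinterpret the FP32 encoding of x as a 32-bit unsigned integer.
    fp32_bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Keep sign (1 bit), exponent (8 bits), and the top 7 mantissa bits.
    bf16_bits = fp32_bits >> 16
    # Pad the dropped 16 mantissa bits with zeros and decode back to a float.
    return struct.unpack(">f", struct.pack(">I", bf16_bits << 16))[0]

if __name__ == "__main__":
    for value in (1.0, -2.5, 3.14159):
        print(value, "->", to_bf16_truncate(value))
```

Because the exponent field is as wide as FP32's, the representable range is unchanged; only the mantissa precision drops.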

Data Parallelism

defined

Data parallelism is a distributed training strategy where a model is replicated across multiple devices (GPUs or nodes), a

DeepSpeed

defined

DeepSpeed is a deep learning optimization library by Microsoft that reduces memory usage and speeds up training for larg

FP8

defined

FP8 (Floating Point 8) is an 8-bit floating-point number format used in AI inference and training to reduce memory and com

HBM (High Bandwidth Memory)

defined

HBM (High Bandwidth Memory) is a 3D-stacked DRAM design that vertically layers memory dies with through-silicon vias (TSVs)

Mixed Precision

defined

Mixed precision is a technique that uses different numerical precisions (e.g., FP16 and FP32) for different parts of a mod

NVLink

defined

NVLink is NVIDIA's proprietary GPU-to-GPU interconnect, used to bind multiple data-center GPUs into a coherent memory fa

ONNX

defined

ONNX (Open Neural Network Exchange) is an open-source format for representing machine learning models, designed to enable

ROCm (AMD)

defined

ROCm (Radeon Open Compute) is AMD's open-source equivalent of NVIDIA's CUDA. It's required for any meaningful AMD GPU infe

Tensor Core

defined

Tensor Cores are specialized hardware units on NVIDIA GPUs (Volta architecture and later) that perform fused multiply-add

Tensor Parallelism

defined

Tensor parallelism splits each transformer layer's weight matrices across multiple GPUs. Card 0 holds the first half of

TensorRT

defined

TensorRT is NVIDIA's SDK for optimizing and deploying deep learning models on NVIDIA GPUs. It performs graph optimizatio

FSDP (Fully Sharded Data Parallel)

defined

FSDP (Fully Sharded Data Parallel) is a distributed training technique that shards model parameters, gradients, and optimi

INT8

defined

INT8 (8-bit integer) is a numerical format that uses 8 bits to represent integers, typically in the range [-128, 127] for si

Metal (Apple)

defined

Metal is Apple's low-level GPU programming framework and API, analogous to Vulkan on other platforms. For local AI opera

Model Parallelism

defined

Model parallelism is a technique that splits a single neural network across multiple GPUs or other accelerators, with ea

ZeRO optimizer

defined

ZeRO (Zero Redundancy Optimizer) is a memory optimization technique for distributed training of large models. It partition

cuDNN

defined

cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep learning primitives like convolution

NVSwitch

defined

NVSwitch is the crossbar that connects 8 (or, in NVL72, 72) GPUs into a single all-to-all NVLink fabric. Each GPU talks to

FP32

defined

FP32 (32-bit floating point) is a numerical format that uses 32 bits to represent each model weight, offering high precisi

INT4

defined

INT4 is a quantization format that stores each model weight using 4 bits, reducing memory usage by roughly 4× compared t

Pipeline Parallelism

defined

Pipeline parallelism (a.k.a. "layer split" in llama.cpp parlance) puts whole layers on different GPUs. Card 0 handles laye

Vulkan compute

defined

Vulkan compute is a cross-platform GPU compute API that runs inference workloads on GPUs without requiring CUDA. In loca

Training & optimization · 44 terms · 31 defined

Q4_K_M Quantization

defined

Q4_K_M is the most-downloaded GGUF quantization on Hugging Face and the default tradeoff for local inference. It mixes 6-bit

AWQ

defined

AWQ (Activation-aware Weight Quantization) is a 4-bit quantization method designed for fast inference on NVIDIA GPUs. It's

Backpropagation

defined

Backpropagation is the algorithm used to train neural networks by computing gradients of the loss function with respect

Dropout

defined

Dropout is a regularization technique used during neural network training where randomly selected neurons are ignored dr

Gradient Descent

defined

Gradient descent is an optimization algorithm that iteratively adjusts model weights to minimize a loss function. In loc

Overfitting

defined

Overfitting occurs when a model learns training data too well, including noise and irrelevant patterns, at the cost of g

Q5_K_M Quantization

defined

Q5_K_M is a mixed-precision GGUF quantization averaging ~5.7 bits per parameter. Attention and feed-forward weights use 6-

Q8_0 Quantization

defined

Q8_0 is llama.cpp's simplest 8-bit GGUF quantization: weights in INT8, one FP16 scale per 32-element block, no zero-point
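
The block layout described above (INT8 weights, one FP16 scale per 32-element block, no zero-point) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not llama.cpp's actual code: the function names are hypothetical, the scale is taken as the block's largest absolute value divided by 127, and blocks are returned as Python tuples rather than packed bytes.

```python
import struct

BLOCK_SIZE = 32  # elements per block, as stated in the definition

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16-representable value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def quantize_blocks(weights):
    """Quantize weights into (fp16_scale, int8_values) blocks.

    Assumption: symmetric quantization with scale = max|w| / 127.
    """
    if len(weights) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of the block size")
    blocks = []
    for start in range(0, len(weights), BLOCK_SIZE):
        chunk = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in chunk)
        scale = to_fp16(amax / 127.0)
        if scale == 0.0:
            # All-zero (or underflowing) block: store zeros.
            quants = [0] * BLOCK_SIZE
        else:
            # Round to nearest integer and clamp to the symmetric INT8 range.
            quants = [max(-127, min(127, round(w / scale))) for w in chunk]
        blocks.append((scale, quants))
    return blocks

def dequantize_blocks(blocks):
    """Reconstruct approximate weights: w ≈ scale * q."""
    return [scale * q for scale, quants in blocks for q in quants]

if __name__ == "__main__":
    original = [i / 10 - 1.6 for i in range(BLOCK_SIZE)]
    restored = dequantize_blocks(quantize_blocks(original))
    print(max(abs(a - b) for a, b in zip(original, restored)))
```

Under this layout each block stores 32 × 8 bits of weights plus a 16-bit scale, which works out to 8.5 bits per weight.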

Adam Optimizer

defined

Adam (Adaptive Moment Estimation) is an optimizer that adjusts learning rates per parameter during training. It combines m

Batch Normalization

defined

Batch normalization is a training technique that normalizes the inputs to a layer across a mini-batch of data. It comput

Hyperparameter

defined

A hyperparameter is a configuration variable set before training begins that controls the learning process, not a parame

Learning Rate

defined

Learning rate is a hyperparameter that controls how much the model's weights are adjusted during each training step. A h

Loss Function

stub

Stochastic Gradient Descent (SGD)

defined

Stochastic Gradient Descent (SGD) is an optimization algorithm used during model training to minimize the loss function. U

GPTQ

defined

GPTQ (Generative Pre-trained Transformer Quantization) is a one-shot post-training quantization method that uses approxima

Q4_0 Quantization

defined

Q4_0 is the original llama.cpp 4-bit quantization: INT4 weights with one FP16 scale per 32-element block, no zero-point,

AdamW

defined

AdamW is an optimizer algorithm used during fine-tuning or training of neural networks, including LLMs. It modifies the

Batch Size

defined

Batch size is the number of training samples processed together in one forward and backward pass. In local AI training,

Cross-Entropy Loss

stub

Hyperparameter Tuning

defined

Hyperparameter tuning is the process of selecting the configuration values that control how a model trains, such as lear

Regularization

defined

Regularization is a set of techniques used during model training to prevent overfitting—where the model memorizes traini

EXL2

defined

EXL2 is the ExLlamaV2 quantization format. NVIDIA-only, single-stream-throughput-optimized. Allows fractional bit-rates

Bias-Variance Tradeoff

defined

The bias-variance tradeoff describes the tension between a model's ability to fit training data closely (low bias) and its

Epoch

defined

An epoch is one complete pass through the entire training dataset during model training. In practice, operators fine-tun

HQQ (Half-Quadratic Quantization)

defined

HQQ (Half-Quadratic Quantization) is a calibration-free quantization method that produces 2-, 3-, 4-, and 8-bit weight qua

L1 / L2 Regularization

stub

Mean Squared Error (MSE)

stub

Q3_K_M Quantization

defined

Q3_K_M is a 3-bit GGUF K-quant averaging ~3.9 bits per parameter. It's the smallest format that still produces usable outp

Vanishing Gradient

defined

The vanishing gradient problem occurs when gradients used to update model weights become extremely small as they are bac

Early Stopping

defined

Early stopping is a training technique that halts model training when performance on a validation set stops improving, p

Exploding Gradient

defined

An exploding gradient occurs when the gradients used to update model weights during training grow exponentially large, c

Gradient Clipping

defined

Gradient clipping is a technique used during neural network training to prevent exploding gradients. It caps the gradien

Learning Rate Schedule

defined

A learning rate schedule adjusts the step size (learning rate) during training to improve convergence and model quality. I

Weight Decay

defined

Weight decay is a regularization technique used during model training that adds a penalty proportional to the squared ma

Bayesian Optimization

stub

Q2_K Quantization

defined

Q2_K is 2-bit GGUF quantization averaging ~3.0 bits per parameter with mandatory 4-bit scales and importance metadata. It

RMSprop

stub

Computer vision · 24 terms · 20 defined

Stable Diffusion

defined

Stable Diffusion is a text-to-image model that generates images from text prompts using a diffusion process. It runs on

Object Detection

defined

Object detection is a computer vision task that identifies and localizes specific objects within an image or video frame

DALL-E

defined

DALL-E is a family of text-to-image generative models developed by OpenAI. Operators encounter it as a cloud-only API se

Image Classification

defined

Image classification is a computer vision task where a model assigns a single label from a predefined set to an input im

Midjourney

defined

Midjourney is a proprietary text-to-image AI service accessible via Discord, not a local model. Operators cannot downloa

Optical Character Recognition (OCR)

defined

Optical Character Recognition (OCR) is the process of converting images of text (scanned documents, photos, or screenshots)

YOLO

defined

YOLO (You Only Look Once) is a family of real-time object detection models that process an entire image in a single forwar

Face Recognition

defined

Face recognition is a computer vision task that identifies or verifies a person from an image or video frame by comparin

Image Segmentation

defined

Image segmentation is a computer vision task that partitions an image into multiple segments or regions, each correspond

R-CNN family (Fast/Faster/Mask)

defined

The R-CNN family is a series of object detection architectures that evolved from region-based convolutional neural netwo

Semantic Segmentation

defined

Semantic segmentation is a computer vision task that assigns a class label (e.g., 'car', 'road', 'person') to every pixel

Super-Resolution

defined

Super-resolution is a computer vision technique that takes a low-resolution image and generates a higher-resolution vers

Feature Extraction

defined

Feature extraction is the process of converting raw input data like an image into a compact set of numerical representat

Image Inpainting

defined

Image inpainting is the task of filling missing or masked regions of an image with plausible, contextually consistent co

Instance Segmentation

defined

Instance segmentation is a computer vision task that assigns a pixel-level mask to each distinct object instance in an i

SLAM

defined

SLAM (Simultaneous Localization and Mapping) is a computational problem in robotics and computer vision where a device bui

Style Transfer

defined

Style transfer is a computer vision technique that applies the visual style of one image (e.g., a painting) to the content

Depth Estimation

defined

Depth estimation is a computer vision task that predicts a depth value for each pixel in an image, producing a depth map

Edge Detection

defined

Edge detection is a computer vision technique that identifies points in an image where brightness changes sharply, formi

Pose Estimation

defined

Pose estimation is a computer vision task that identifies the positions of key body joints (e.g., shoulders, elbows, wris

3D Reconstruction

stub

SSD (Single Shot Detector)

stub

Optical Flow

stub

Panoptic Segmentation

stub

Agents & agentic AI · 18 terms · 14 defined

AI Agent

defined

An AI agent is software that uses an LLM to decide what to do, takes actions, observes results, and iterates toward a go

Coding Agent

defined

A coding agent is a language model configured to write, debug, or refactor code autonomously or semi-autonomously. It ty

Function Calling / Tool Use

defined

Function calling (also called tool use) is a capability where the model emits structured JSON requesting that specific too

Tool calling

defined

Tool calling (also called function calling) is a model's structured output capability where it produces JSON-shaped tool i

MCP (Model Context Protocol)

defined

MCP is an open protocol introduced by Anthropic in late 2024 for connecting AI agents to tools and data sources in a sta

Autonomous Agent

defined

An autonomous agent is a system that uses a language model to decide and execute multi-step tasks without human interven

Browser Agent

defined

A browser agent is an AI-driven program that controls a web browser to automate tasks like form filling, data extraction

Multi-Agent System

defined

A multi-agent system (MAS) is a setup where multiple AI agents, each with distinct roles or capabilities, collaborate or c

Orchestration (agents)

defined

Orchestration in the context of agents refers to the system that manages the lifecycle, communication, and task delegati

Planning (in agents)

defined

Planning in agents refers to the process where an LLM decomposes a complex goal into a sequence of sub-steps or actions

Agent Memory (Short/Long/Episodic)

defined

Agent memory refers to the mechanisms an AI agent uses to store and recall information across interactions. Short-term m

Robotic Process Automation (RPA)

defined

Robotic Process Automation (RPA) is software that automates repetitive, rule-based tasks typically performed by humans int

Embodied AI

defined

Embodied AI refers to AI systems that interact with the physical world through a body or sensorimotor capabilities, rath

Reactive Agent

defined

A reactive agent selects actions based solely on its current percepts and a fixed set of condition-action rules, without

Evaluation metrics · 27 terms · 21 defined

Tokens per second

defined

Tokens per second (tok/s) is the most-cited LLM throughput metric, but it's also the most misunderstood. It splits into tw

Accuracy

defined

Accuracy measures how often a model's predictions match the expected ground truth, typically expressed as a percentage e

Time to first token (TTFT)

defined

TTFT (time-to-first-token) is the latency between sending a prompt and receiving the first generated token. It's dominated

F1 Score

defined

The F1 score is the harmonic mean of precision and recall, giving a single metric that balances false positives and fals
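
To make the harmonic-mean relationship concrete, here is a minimal Python sketch that computes F1 from raw counts. The function name `f1_score` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = harmonic mean of precision and recall, from raw counts."""
    predicted = true_pos + false_pos   # everything the model flagged
    actual = true_pos + false_neg      # everything that was truly positive
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0  # assumption: define F1 as 0 when both are 0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # 9 true positives, 1 false positive, 9 false negatives:
    # precision is high (0.9) but recall is low (0.5).
    print(f1_score(9, 1, 9))
```

The harmonic mean sits closer to the smaller of the two values, so a model cannot score well on F1 by maximizing only precision or only recall.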

Perplexity

defined

Perplexity is a metric that measures how well a language model predicts a sequence of tokens. Lower perplexity means the

Precision

defined

Precision in local AI refers to the number of bits used to represent each weight and activation in a neural network. Low

Recall

defined

Recall measures the fraction of relevant items that a retrieval or classification system successfully finds. In local AI

AUC (Area Under Curve)

defined

AUC (Area Under the Curve) measures a model's ability to rank positive examples higher than negative ones, typically using

Confusion Matrix

defined

A confusion matrix is a table that summarizes the performance of a classification model by comparing predicted labels ag

Elo Rating (LLM benchmarks)

defined

Elo rating in LLM benchmarks is a relative scoring system that ranks models based on pairwise comparison results, typica

Pass@k

defined

Pass@k is a metric that measures the probability that at least one of k independently generated samples from a model con
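
One common way to estimate this probability is to draw n samples per problem, count the c that pass, and compute 1 − C(n−c, k) / C(n, k). The Python sketch below assumes that estimator; the function name `pass_at_k` is hypothetical, and the excerpt above does not confirm that the glossary's full entry uses the same formula.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the chance that a
    random size-k subset of the n samples contains no correct sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

if __name__ == "__main__":
    # 10 samples, 3 correct: compare k = 1 and k = 5.
    print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 5))
```

With k = 1 the estimate reduces to c / n, the plain fraction of samples that pass.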

ROC Curve

defined

A Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate at various cl

Throughput vs Latency

defined

Throughput is aggregate tokens generated per second across all in-flight requests; latency is wall-clock time for a sing

GSM8K

defined

GSM8K is a benchmark of 8,500 grade-school math word problems requiring 2–8 reasoning steps. Models are scored by whethe

BLEU score

defined

BLEU (Bilingual Evaluation Understudy) is an automated metric that measures how similar a machine-generated text is to one

FID (Fréchet Inception Distance)

defined

FID (Fréchet Inception Distance) is a metric that measures the quality of images generated by a model by comparing the sta

IoU (Intersection over Union)

defined

IoU (Intersection over Union) is a metric that measures the overlap between a predicted bounding box and a ground-truth bo
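
The overlap computation can be sketched in a few lines of Python. The box format `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions chosen for illustration; detection libraries differ in their box conventions.

```python
def iou(box_a, box_b) -> float:
    """Intersection over Union for two axis-aligned boxes.

    Assumption: each box is (x_min, y_min, x_max, y_max).
    """
    # Corners of the intersection rectangle (may be empty).
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(iou(predicted, ground_truth))  # overlap 1, union 7
```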

R²

defined

R² (coefficient of determination) measures how well a regression model's predictions match actual outcomes, on a scale fro

mAP (mean Average Precision)

defined

mAP (mean Average Precision) is a metric that evaluates object detection models by averaging precision across recall thres

pass@1

defined

pass@1 is the probability that a model's first generated solution passes the unit tests for a coding problem, computed f

Sensitivity

defined

Sensitivity measures how much a model's output changes in response to small changes in its input. In local AI, sensitivi

Learning paradigms · 23 terms · 14 defined

Reinforcement Learning (RL)

defined

Reinforcement Learning (RL) is a machine learning paradigm where an agent learns to make decisions by interacting with an

Self-Supervised Learning

defined

Self-supervised learning (SSL) is a training paradigm where a model learns representations from unlabeled data by creating

Supervised Learning

defined

Supervised learning is a training paradigm where a model learns to map inputs to outputs using labeled data — each train

Zero-Shot Learning

defined

Zero-shot learning is a capability where a model performs a task it was never explicitly trained on, using only a natura

Transfer Learning

defined

Transfer learning is a technique where a model trained on one task is reused as the starting point for a second task. In

Federated Learning

defined

Federated learning is a machine learning technique where a model is trained across multiple decentralized devices or ser

Few-Shot Learning

defined

Few-shot learning is a technique where a model performs a task after seeing only a small number of examples (typically 2–

Unsupervised Learning

defined

Unsupervised learning is a machine learning paradigm where a model finds patterns in data without labeled examples. Unli

Contrastive Learning

defined

Contrastive learning is a self-supervised training method where a model learns to pull similar data points (e.g., two aug

Deep Reinforcement Learning

defined

Deep Reinforcement Learning (DRL) combines deep neural networks with reinforcement learning, enabling agents to learn opti

Representation Learning

defined

Representation learning is the process by which a model automatically discovers the features or patterns in raw data tha

Continual Learning

defined

Continual learning (also called lifelong learning) is a machine learning paradigm where a model is trained on a sequence o

Meta-Learning

defined

Meta-learning, or 'learning to learn,' is a training paradigm where a model is exposed to many related tasks so it can q

Multi-Task Learning

defined

Multi-task learning (MTL) trains a single model on multiple related tasks simultaneously, sharing representations across t

Semi-Supervised Learning

stub

Inverse Reinforcement Learning

stub

Ethics, safety & society · 23 terms · 19 defined

AI Safety

defined

AI safety refers to the set of practices and research aimed at ensuring that AI systems behave reliably, predictably, an

AI Alignment

defined

AI alignment refers to the challenge of ensuring that a model's outputs match the operator's intended goals and values.

AI Ethics

defined

AI ethics refers to the principles and practices that guide the responsible development and deployment of AI systems. Fo

Bias (AI/ML)

defined

Bias in AI/ML refers to systematic errors in model outputs that result from skewed training data, flawed assumptions, or

Algorithmic Bias

defined

Algorithmic bias refers to systematic and repeatable errors in a model's outputs that create unfair outcomes, such as pr

EU AI Act

defined

The EU AI Act is a regulatory framework from the European Union that classifies AI systems by risk level (unacceptable, h

Explainability

defined

Explainability refers to the ability to understand and interpret why a model produces a specific output. For local AI op

Fairness (in AI)

defined

Fairness in AI refers to the absence of systematic bias in model outputs across different demographic groups. For operat

Interpretability

defined

Interpretability refers to the ability to understand and explain why a model produces a specific output. For local AI op

Privacy (in AI)

defined

Privacy in local AI refers to the operator's control over their data and model interactions, ensuring no data leaves the

AI Regulation

defined

AI regulation refers to laws, policies, and guidelines that govern the development, deployment, and use of AI systems. F

Adversarial Attack

defined

An adversarial attack is a technique where small, often imperceptible perturbations are added to an input to cause a mac

AI Governance

defined

AI Governance refers to the set of policies, processes, and technical controls that determine how a model is developed,

Adversarial Example

defined

An adversarial example is an input to a machine learning model that has been intentionally perturbed to cause a mispredi

Differential Privacy

defined

Differential Privacy is a mathematical framework that provides a formal guarantee that the output of an analysis reveals

Existential Risk (X-risk)

stub

Mechanistic Interpretability

defined

Mechanistic interpretability is the research approach of reverse-engineering neural networks into human-understandable a

Transparency (AI)

defined

Transparency in AI refers to the degree to which a model's behavior, training data, architecture, and decision-making pr

XAI (Explainable AI)

defined

Explainable AI (XAI) refers to methods that make the decisions of machine learning models understandable to humans. For lo

Accountability (AI)

defined

Accountability in AI means that the operator or organization deploying a model can be held responsible for its outputs a

Specialized domains · 21 terms · 17 defined

Computer Vision (domain)

defined

Computer vision is the field of AI that enables machines to interpret and process visual data—images, videos, or live ca

Self-Driving Cars

defined

Self-driving cars, also known as autonomous vehicles, use AI to perceive their environment and navigate without human in

AlphaFold

defined

AlphaFold is a deep learning model developed by DeepMind that predicts the 3D structure of proteins from their amino aci

AlphaGo

defined

AlphaGo is a computer program developed by DeepMind that plays the board game Go at a superhuman level. It combines deep

Autonomous Vehicles

defined

Autonomous vehicles are self-driving systems that use AI to perceive their environment, plan routes, and control vehicle

Robotics (AI)

defined

Robotics in AI refers to the integration of machine learning models into physical robots to enable perception, decision-

Healthcare AI

defined

Healthcare AI refers to machine learning models applied to medical data for tasks like diagnosis, treatment planning, dr

Recommender Systems

defined

Recommender systems are machine learning models that predict user preferences for items (movies, products, content) based

AI in Finance

defined

AI in Finance refers to the application of machine learning and deep learning models to financial tasks like fraud detec

AlphaZero

defined

AlphaZero is a reinforcement learning algorithm developed by DeepMind that learns to master board games (Go, chess, shogi)

Anomaly Detection

defined

Anomaly detection is the task of identifying data points, events, or patterns that deviate significantly from a dataset'

Fraud Detection

defined

Fraud detection is a machine learning task that identifies suspicious transactions, account activities, or user behavior

Medical Imaging (AI)

defined

Medical imaging AI refers to machine learning models trained to analyze medical scans like X-rays, CTs, MRIs, and pathol

Speech Processing

defined

Speech processing refers to the analysis, synthesis, and manipulation of human speech by AI models. Operators encounter

Algorithmic Trading

defined

Algorithmic trading uses computer programs to execute financial trades based on predefined rules, often involving statis

Drug Discovery (AI)

defined

Drug discovery with AI applies machine learning to the process of identifying and designing new pharmaceutical compounds

Game AI

defined

Game AI refers to the algorithms and systems that control non-player characters (NPCs), opponents, and procedural content

OpenAI Gym / Gymnasium

stub

Bioinformatics

stub

Predictive Maintenance

stub

Reinforcement Learning Environments

stub

Data & datasets · 34 terms · 27 defined

Training Data

defined

Training data is the dataset used to teach a model its patterns and behaviors. For LLMs, this typically means trillions

ImageNet

defined

ImageNet is a large-scale image dataset containing over 14 million labeled images across 20,000 categories, organized by

MMLU

defined

MMLU (Massive Multitask Language Understanding) is a benchmark that tests a language model's knowledge across 57 subjects,

Data Augmentation

defined

Data augmentation is the technique of generating modified copies of existing training data to increase dataset size and

Feature Engineering

defined

Feature engineering is the process of transforming raw data into input variables (features) that improve model performance

HumanEval

defined

HumanEval is a benchmark dataset of 164 hand-written programming problems, each with a function signature, docstring, an

MNIST

defined

MNIST (Modified National Institute of Standards and Technology) is a dataset of 70,000 grayscale images of handwritten dig

Synthetic Data

defined

Synthetic data is artificially generated data used to train or fine-tune AI models, created by algorithms rather than co

COCO

defined

COCO (Common Objects in Context) is a large-scale image dataset created by Microsoft for object detection, segmentation, a

Cross-Validation

defined

Cross-validation is a technique for evaluating how well a model generalizes to unseen data by partitioning the dataset i

Data Labeling

defined

Data labeling is the process of annotating raw data (text, images, audio) with tags or categories that teach a model what

Data Pipeline

defined

A data pipeline is a sequence of automated steps that ingest, transform, and load data from source to destination. In lo

ETL

defined

ETL (Extract, Transform, Load) is a data pipeline process that pulls raw data from sources (Extract), cleans or reformats it

Ground Truth

defined

Ground truth is the correct, real-world answer or label that a model is trained to predict or evaluated against. In supe

Test Data

defined

Test data is a set of examples used to evaluate a model's performance after training, distinct from the training data th

Validation Data

defined

Validation data is a subset of examples held back from training to evaluate how well a model generalizes to unseen input

Annotation

defined

Annotation is the process of adding labels, tags, or metadata to raw data (text, images, audio) to create a training datas

CIFAR-10/100

defined

CIFAR-10 and CIFAR-100 are datasets of 32x32 color images used for benchmarking image classification models. CIFAR-10 ha

Feature Selection

defined

Feature selection is the process of identifying and retaining only the most relevant input variables (features) for a mach

Imbalanced Data

defined

Imbalanced data refers to a dataset where the number of samples per class is significantly skewed, with one or more mino

Normalization

defined

Normalization is a data preprocessing step that rescales input values to a fixed range (e.g., [0, 1] or [-1, 1]) or adjusts them

One-Hot Encoding

defined

One-hot encoding converts categorical data (e.g., token IDs) into binary vectors where only one element is 'hot' (1) and all
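
A minimal Python sketch of the idea, assuming categories are already mapped to integer indices (for example, token IDs) and that the number of categories is known up front. The function name `one_hot` is hypothetical.

```python
def one_hot(index: int, num_classes: int) -> list[int]:
    """Return a vector of zeros with a single 1 at position `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("index must be in [0, num_classes)")
    vector = [0] * num_classes
    vector[index] = 1  # the single 'hot' element
    return vector

if __name__ == "__main__":
    # Encode category 2 out of 5 possible categories.
    print(one_hot(2, 5))
```

The vector length equals the number of categories, which is why large vocabularies are usually handled with embedding lookups instead of explicit one-hot vectors.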

Concept Drift

defined

Concept drift is a change in the statistical properties of a target variable over time, causing a trained model to becom

Feature Scaling

defined

Feature scaling adjusts the range of numeric input values so that each feature contributes equally to a model's training

GLUE benchmark

defined

The GLUE (General Language Understanding Evaluation) benchmark is a collection of nine natural language understanding task

K-Fold Cross-Validation

defined

K-Fold Cross-Validation is a technique for evaluating a model's performance by splitting the dataset into K equal-sized

Standardization

defined

Standardization in local AI refers to the process of converting raw data into a consistent format that models can proces

Classical ML algorithms · 27 terms · 18 defined

XGBoost

defined

XGBoost (Extreme Gradient Boosting) is a gradient-boosted decision tree (GBDT) library optimized for structured/tabular data

Random Forest

defined

Random Forest is an ensemble machine learning method that builds multiple decision trees during training and outputs the

Decision Tree

defined

A decision tree is a supervised learning model that splits data into branches based on feature values, forming a tree-li

Gradient Boosting

defined

Gradient boosting is an ensemble machine learning technique that builds a strong predictive model by sequentially adding

K-Means Clustering

defined

K-Means Clustering is an unsupervised learning algorithm that partitions a dataset into K distinct, non-overlapping clus

LightGBM

defined

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed for efficiency and sp

Principal Component Analysis (PCA)

defined

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms a high-dimensional dataset into

CatBoost

defined

CatBoost is a gradient boosting library developed by Yandex that handles categorical features automatically without manu

K-Nearest Neighbors (KNN)

defined

K-Nearest Neighbors (KNN) is a classical machine learning algorithm used for classification or regression. It works by fin

Linear Regression

defined

Linear regression is a statistical method that models the relationship between an input variable (feature) and an output v

Logistic Regression

defined

Logistic regression is a statistical model used for binary classification tasks, predicting the probability that an inpu

Support Vector Machine (SVM)

defined

A Support Vector Machine (SVM) is a supervised learning model that finds a hyperplane (a decision boundary) to separate data

t-SNE

defined

t-SNE (t-distributed Stochastic Neighbor Embedding) is a dimensionality reduction technique used to visualize high-dimensi

Q-Learning

defined

Q-Learning is a model-free reinforcement learning algorithm that learns an optimal action-selection policy by iterativel

UMAP

defined

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique used to visualize high-dimens

DBSCAN

defined

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm that groups d

Hidden Markov Model (HMM)

stub

Markov Decision Process (MDP)

defined

A Markov Decision Process (MDP) is a mathematical framework for modeling decision-making in environments where outcomes ar

Monte Carlo Methods

defined

Monte Carlo methods are a class of algorithms that use repeated random sampling to approximate numerical results. In loc

Hierarchical Clustering

stub

Conditional Random Field (CRF)

stub

Linear Discriminant Analysis (LDA)

Particle Swarm Optimization

stub

MLOps & deployment

16 terms · 11 defined

MLOps

defined

MLOps (Machine Learning Operations) is the practice of managing the lifecycle of machine learning models from development

LLMOps

defined

LLMOps (Large Language Model Operations) is the set of practices for deploying, monitoring, and maintaining LLMs in produc

Model Deployment

defined

Model deployment is the process of making a trained AI model available for inference in a production environment. For lo

A/B Testing

defined

A/B Testing in ML compares two model variants, a control (current production model) and a treatment (candidate model), by

Inference API

defined

An inference API is a programmatic interface that accepts input data (like a prompt) and returns a model's output (like gen

Model Monitoring

defined

Model Monitoring continuously tracks the health and performance of deployed ML models by measuring: (1) prediction quality

Model Serving

defined

Model serving is the process of making a trained AI model available for inference via an API or local runtime. For opera

Real-Time Inference

defined

Real-time inference means the model processes input and returns output fast enough to feel instantaneous to a human user

Model Versioning

defined

Model Versioning tracks the evolution of ML models over time by assigning unique identifiers to each trained artifact an

Model Registry

defined

A Model Registry is a centralized catalog that stores and versions trained models along with their metadata: training d

Shadow Deployment

defined

Shadow Deployment (also called dark launch or shadow mode) runs a candidate model in production alongside the current mode

Missing a term?

The glossary grows when we find gaps.

If you searched for an AI term and we don't have a definition, contact support with the term. We prioritize terms that are practical for running AI locally over purely academic ones, but we'll consider any reasonable suggestion.