Generative AI

Generative Model

A generative model is a type of machine learning model that learns the underlying distribution of its training data and can then produce new samples resembling that data. In local AI, generative models such as large language models (LLMs) and diffusion models generate text, images, or audio from prompts or latent vectors. They differ from discriminative models, which only classify or label inputs. For operators, the practical consequence is that generative models need substantial VRAM and compute at inference time, and the requirement grows with model size (e.g., 70B parameters).
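To make "learns the distribution, then samples from it" concrete, here is a minimal sketch of a toy generative model: a character-level bigram model built with the Python standard library. It is an illustration only and not how LLMs or diffusion models are implemented; the corpus, the function names (fit_bigram, sample), and the start/end markers are all hypothetical choices made for this example.

```python
import random
from collections import Counter, defaultdict

def fit_bigram(corpus):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for word in corpus:
        padded = "^" + word + "$"  # "^" marks the start, "$" marks the end
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts

def sample(counts, rng, max_len=20):
    """Generate one new string by repeatedly sampling the next character."""
    out = []
    prev = "^"
    while len(out) < max_len:
        choices = counts[prev]
        nxt = rng.choices(list(choices), weights=list(choices.values()))[0]
        if nxt == "$":  # the model chose to stop
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

# Hypothetical toy training data.
corpus = ["llama", "lambda", "alpaca", "mamba"]
model = fit_bigram(corpus)

rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample(model, rng) for _ in range(5)]
print(samples)
```

The samples are new strings that follow the character statistics of the corpus without necessarily appearing in it. A discriminative model trained on the same data could only assign labels to inputs; it would have nothing to sample from.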

Deeper dive

Generative models capture the joint probability distribution P(X, Y) or the data distribution P(X), which is what lets them create new instances. Common types include autoregressive models (e.g., GPT, Llama), which predict the next token sequentially; variational autoencoders (VAEs); generative adversarial networks (GANs); and diffusion models (e.g., Stable Diffusion). In local deployment, autoregressive LLMs dominate text generation, while diffusion models are popular for image generation. The choice of model size and quantization directly determines VRAM usage and inference speed. For example, at 4-bit quantization each parameter takes about half a byte, so a 7B-parameter model needs roughly 4 GB of VRAM and a 70B model roughly 40 GB once overhead for the KV cache and runtime buffers is included. These figures dictate hardware requirements.
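The sizing arithmetic can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a measurement: the function names are hypothetical, and the 15% overhead fraction is an assumed allowance for KV cache and runtime buffers that varies in practice with context length and runtime.

```python
def estimate_weight_gb(params_billion, bits_per_weight):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

def estimate_vram_gb(params_billion, bits_per_weight, overhead_fraction=0.15):
    """Weights plus an assumed fractional overhead (KV cache, buffers)."""
    weights = estimate_weight_gb(params_billion, bits_per_weight)
    return weights * (1 + overhead_fraction)

for size in (7, 70):
    weights = estimate_weight_gb(size, 4)
    total = estimate_vram_gb(size, 4)
    print(f"{size}B @ 4-bit: {weights:.1f} GB weights, ~{total:.1f} GB with overhead")
```

With these assumptions the weights come to 3.5 GB and 35 GB, and the totals land near the ~4 GB and ~40 GB figures quoted above. Long contexts can push the real overhead well past the assumed 15%.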

Practical example

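An operator running Llama 3.1 8B on an RTX 4090 (24 GB VRAM) can generate text at ~50 tokens/sec using Q4_K_M quantization. The same model also fits in the 12 GB of an RTX 3060 at that quantization, but it runs slower because the card has less memory bandwidth and compute; partial offloading to system RAM only comes into play if a long context or a larger model pushes usage past the available VRAM. For image generation, Stable Diffusion XL requires ~8 GB VRAM; an RTX 3060 can run it but may struggle with high-resolution outputs.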

Workflow example

With Ollama, running ollama run llama3.1:8b loads the model and starts an interactive session. The runtime allocates VRAM for the weights and the KV cache. If VRAM is insufficient, Ollama offloads some layers to system RAM, which can cut throughput from ~40 to ~5 tokens/sec. Operators check VRAM usage with nvidia-smi, or use ollama ps to see how a loaded model is split between CPU and GPU, to confirm the model fits.
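The nvidia-smi check can be scripted. The sketch below uses nvidia-smi's CSV query mode (--query-gpu with --format=csv,noheader,nounits), which reports memory in MiB. The helper names, the sample output string, and the 1024 MiB headroom are assumptions made for illustration; read_gpu_memory is defined but not called here so the example also runs on machines without an NVIDIA GPU.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_memory_csv(text):
    """Parse 'used, total' lines (MiB) into a list of (used, total) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append((used, total))
    return gpus

def read_gpu_memory():
    """Run nvidia-smi and return per-GPU (used, total) in MiB."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)

def fits(model_mib, used_mib, total_mib, headroom_mib=1024):
    """True if the model plus an assumed headroom fits in the free VRAM."""
    return model_mib + headroom_mib <= total_mib - used_mib

# Hypothetical sample output for a single 24 GB card.
sample_output = "1523, 24564\n"
for used, total in parse_memory_csv(sample_output):
    print(f"free: {total - used} MiB, 5000 MiB model fits: {fits(5000, used, total)}")
```

On a machine with an NVIDIA driver installed, replacing parse_memory_csv(sample_output) with read_gpu_memory() would report live values. The model size passed to fits would come from the operator's own estimate or from the size shown by the runtime.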

Reviewed by Fredoline Eruo. See our editorial policy.

Buyer guides

When it doesn't work