Neural network architectures

Variational Autoencoder (VAE)

A Variational Autoencoder (VAE) is a generative neural network that learns a compressed latent representation of input data (such as images) and can generate new samples from that latent space. In local AI image generation, the VAE acts as the image decoder: it converts the latent tensors produced by the diffusion model into viewable pixel images. Operators encounter VAEs in Stable Diffusion workflows, where the VAE encoder compresses images into a smaller latent space so that diffusion runs efficiently, and the VAE decoder turns the final latent back into a full-resolution image. A different VAE (e.g., the 'taesd' tiny autoencoder) can be swapped in to speed up decoding at the cost of quality.
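To make "smaller latent space" concrete, the sketch below computes the latent tensor shape for a given image size. It assumes an 8x spatial downsample and 4 latent channels, which matches the standard Stable Diffusion 1.x and SDXL VAEs; other models may use different values. The function names are illustrative, not part of any UI or library.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent for a pixel image.

    The defaults are assumptions that match the standard SD 1.x / SDXL VAEs.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4, pixel_channels=3):
    """Number of pixel values represented by one latent value."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (pixel_channels * height * width) / (c * h * w)


print(latent_shape(1024, 1024))       # (4, 128, 128)
print(compression_ratio(1024, 1024))  # 48.0
```

Under these assumptions the diffusion model works on 48 times fewer values than the pixel image contains, which is why running diffusion in latent space is so much cheaper than running it on pixels.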

Deeper dive

The VAE consists of an encoder that maps input data to a mean and variance in latent space, and a decoder that reconstructs the data from a latent vector sampled from that distribution. In Stable Diffusion, the VAE is separate from the UNet: the UNet denoises in latent space, and the VAE decodes the final latent to pixels. Operators can swap VAEs to change output quality or speed. For example, the default SDXL VAE produces sharp images, but its weights occupy roughly 340 MB of VRAM; the 'taesd' VAE weighs roughly 10 MB and decodes 2-3x faster but introduces artifacts. VAEs are also used for inpainting and image-to-image, where the encoder converts the source image (including the masked regions) into latent space. Upscaling models such as 4x-UltraSharp are not VAEs; they are separate ESRGAN-style upscalers that run on the decoded pixel image. Most checkpoints ship with a VAE built in, and both ComfyUI and Automatic1111 also let operators load a standalone VAE file and select it in place of the built-in one. VRAM impact: including working memory, a full VAE typically uses roughly 200-500 MB during decoding, and the figure grows with output resolution, which matters on 4-6 GB cards.
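The encoder's mean-and-variance output can be illustrated with a minimal, pure-Python sketch of the sampling step (often called the reparameterization trick) and the KL term that pulls the latent distribution toward a standard normal during training. It assumes the encoder emits a log-variance per dimension, a common convention; the function names and example values are hypothetical.

```python
import math
import random


def sample_latent(mean, log_var, rng=random):
    """Reparameterized sample: z = mean + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL divergence of N(mean, sigma^2) from N(0, 1), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mean, log_var))


# Hypothetical encoder output for a 2-dimensional latent.
mean = [0.5, -1.0]
log_var = [0.0, -2.0]

z = sample_latent(mean, log_var, random.Random(0))  # seeded for repeatability
print(z)
print(kl_to_standard_normal(mean, log_var))
```

Sampling this way keeps the randomness in `eps`, so gradients can flow through the mean and variance during training. At inference time in an image pipeline, only the decoder half is needed to turn a finished latent into pixels.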

Practical example

On an RTX 3060 12 GB, generating a 1024x1024 image with SDXL takes roughly 8 GB of VRAM during diffusion. The final VAE decode adds about 500 MB and takes about 1 second. Switching to the 'taesd' VAE cuts decode time to about 0.3 seconds and the added memory to about 50 MB, but the image may show slight grid-like artifacts. On a 6 GB card that is already close to full when decoding starts, the default VAE can push VRAM over the limit and trigger an out-of-memory error; taesd avoids that.
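The headroom arithmetic behind that last point can be sketched as a simple check. The decode figures (about 0.5 GB for the default VAE, about 0.05 GB for taesd) come from the example above; the 5.7 GB "already in use" value is a made-up illustration, not a measurement, and the helper name is hypothetical.

```python
def decode_fits(vram_total_gb, in_use_before_decode_gb, decode_extra_gb):
    """Return True if the decode step stays within the card's VRAM."""
    return in_use_before_decode_gb + decode_extra_gb <= vram_total_gb


# Hypothetical 6 GB card with 5.7 GB already in use when decoding starts.
CARD_GB = 6.0
IN_USE_GB = 5.7  # assumed for illustration, not measured

print(decode_fits(CARD_GB, IN_USE_GB, 0.5))   # default VAE, ~500 MB extra
print(decode_fits(CARD_GB, IN_USE_GB, 0.05))  # taesd, ~50 MB extra
```

With these numbers the default VAE exceeds the budget and taesd fits. Real usage depends on resolution, precision, and what else is resident on the GPU, so treat the check as a rough planning aid.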

Workflow example

In Automatic1111, the 'SD VAE' setting (under 'Settings > Stable Diffusion' or 'Settings > VAE', depending on the version) lets operators select a VAE file (.vae.pt, .ckpt, or .safetensors). In low-VRAM modes such as --medvram, the UI moves the VAE into VRAM only after the UNet finishes. In ComfyUI, a separate 'VAE Decode' node (class name VAEDecode) takes the latent from the KSampler and outputs an image, and a 'Load VAE' node supplies a standalone VAE. Operators often download custom VAEs (e.g., 'vae-ft-mse-840000') from Hugging Face and place them in the models/VAE folder (models/vae in ComfyUI). llama.cpp does not use VAEs; they are specific to image generation pipelines.

Reviewed by Fredoline Eruo. See our editorial policy.
