RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.co · Independently operated
Neural network architectures

Variational Autoencoder (VAE)

A Variational Autoencoder (VAE) is a generative neural network that learns a compressed latent representation of input data (such as images) and can generate new samples from that latent space. In local AI image generation, the VAE's decoder is the stage that converts latent tensors, once the diffusion model has denoised them, into viewable pixel images. Operators encounter VAEs in Stable Diffusion workflows: the VAE's encoder compresses images into a smaller latent space so that diffusion runs efficiently, and its decoder converts the final latent back to full resolution. A different decoder (e.g., 'taesd', the Tiny AutoEncoder for Stable Diffusion) can be swapped in to speed up decoding at the cost of quality.
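To make "smaller latent space" concrete, here is a minimal sketch of the size arithmetic. It assumes the 8x spatial downscale and 4 latent channels used by the Stable Diffusion 1.x and SDXL VAEs; other model families may use different values, and the function names are illustrative only.

```python
def latent_shape(width, height, downscale=8, latent_channels=4):
    """Return (channels, height, width) of the latent for a given image size.

    Assumption: 8x downscale and 4 latent channels (SD 1.x / SDXL VAEs).
    """
    if width % downscale or height % downscale:
        raise ValueError("image dimensions must be multiples of the downscale factor")
    return (latent_channels, height // downscale, width // downscale)


def compression_ratio(width, height, downscale=8, latent_channels=4, image_channels=3):
    """Ratio of pixel values in the image to values in the latent tensor."""
    c, h, w = latent_shape(width, height, downscale, latent_channels)
    return (image_channels * width * height) / (c * h * w)


print(latent_shape(1024, 1024))       # (4, 128, 128)
print(compression_ratio(1024, 1024))  # 48.0
```

Under those assumptions the diffusion model works on roughly 48x fewer values than the pixel image contains, which is why the VAE decode is needed as a separate final step.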

Deeper dive

The VAE consists of an encoder that maps input data to a mean and variance in latent space, and a decoder that reconstructs the data from a latent vector sampled from that distribution. In Stable Diffusion, the VAE is separate from the UNet: the UNet denoises in latent space, and the VAE decodes the final latent to pixels. Operators can swap VAEs to change output quality or speed. For example, the default SDXL VAE produces sharp images but its weights occupy ~340 MB of VRAM; the 'taesd' VAE occupies ~10 MB and decodes 2-3x faster but introduces artifacts. VAEs are also used for inpainting (encoding the image and its masked regions into latent space) and in upscaling workflows, although the upscaler itself is a separate model: 4x-UltraSharp, for instance, is an ESRGAN-style upscaler, not a VAE. In ComfyUI or Automatic1111, a VAE can be loaded separately from the checkpoint and selected from a dropdown. VRAM impact: a VAE typically uses 200-500 MB during decoding, which matters on 4-6 GB cards.
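The "mean and variance, then sample" step can be sketched in a few lines of plain Python. This is only an illustration of the sampling idea (often called the reparameterization trick), not the code any particular Stable Diffusion implementation uses. It assumes the encoder outputs a log-variance per dimension, which is a common but not universal parameterization, and `sample_latent` is a hypothetical name.

```python
import math
import random


def sample_latent(mean, log_var, rng=random):
    """Draw z = mean + std * eps with eps ~ N(0, 1), one value per latent dimension.

    Assumption: the encoder emits log-variance, so std = exp(0.5 * log_var).
    """
    if len(mean) != len(log_var):
        raise ValueError("mean and log_var must have the same length")
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


# Example: a 3-dimensional latent with unit variance (log_var = 0).
rng = random.Random(0)  # seeded so repeated runs give the same sample
z = sample_latent([0.5, -1.0, 2.0], [0.0, 0.0, 0.0], rng)
print(len(z))
```

When the variance is very small, the sample collapses onto the mean, so the latent becomes nearly deterministic.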

Practical example

On an RTX 3060 12 GB, generating a 1024x1024 image with SDXL takes ~8 GB of VRAM during diffusion. The final VAE decode adds ~500 MB and takes ~1 second. Switching to the 'taesd' VAE cuts decode time to ~0.3 seconds and the decode overhead to ~50 MB in total, but the image may show slight grid-like artifacts. On a 6 GB card, the default VAE's decode can push VRAM over the limit and cause an out-of-memory crash; taesd avoids that.
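The headroom reasoning above is simple addition, sketched below. It assumes peak usage is diffusion usage plus decode overhead, and that 1 GB = 1000 MB. The 12 GB figures come from the example above; the 5.7 GB diffusion figure for the 6 GB card is a hypothetical value chosen for illustration, not a measurement.

```python
def decode_fits(card_gb, diffusion_gb, decode_mb):
    """Return True if diffusion usage plus VAE decode overhead fits on the card.

    Assumption: peak VRAM is a plain sum of the two figures (1 GB = 1000 MB).
    """
    peak_gb = diffusion_gb + decode_mb / 1000
    return peak_gb <= card_gb


# RTX 3060 12 GB, figures from the example above.
print(decode_fits(12, 8.0, 500))  # True

# 6 GB card with a hypothetical 5.7 GB in use before decoding.
print(decode_fits(6, 5.7, 500))   # False: the default VAE decode overshoots
print(decode_fits(6, 5.7, 50))    # True: the taesd decode fits
```

Real usage also depends on precision, tiling, and offloading settings, so treat this as a rough budget check, not a predictor.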

Workflow example

In Automatic1111, under 'Settings > Stable Diffusion > VAE', operators can select a VAE file (e.g., a .vae.pt file). When generating, the UI loads the VAE into VRAM after the UNet finishes. In ComfyUI, a separate 'VAEDecode' node takes the latent from the KSampler and outputs an image. Operators often download custom VAEs (e.g., 'vae-ft-mse-840000') from Hugging Face and place them in the models/VAE folder. llama.cpp does not use VAEs; they are specific to image generation pipelines.
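If a downloaded VAE does not appear in the dropdown, a quick check is to list what is actually in the folder. The helper below is a hypothetical convenience script, not part of Automatic1111 or ComfyUI. The `models/VAE` path is taken from the text above and is relative to wherever the UI is installed, and the accepted suffixes are an assumption about common VAE file types.

```python
from pathlib import Path

# Assumed set of file suffixes for VAE weights; adjust for your UI if needed.
VAE_SUFFIXES = (".pt", ".ckpt", ".safetensors")


def list_vae_files(vae_dir):
    """Return sorted file names in vae_dir that look like VAE weight files."""
    folder = Path(vae_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.name.lower().endswith(VAE_SUFFIXES)
    )


# Prints an empty list if the folder does not exist at this relative path.
print(list_vae_files("models/VAE"))
```

An empty result usually means the script is running from the wrong directory or the file was saved somewhere else.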

Reviewed by Fredoline Eruo. See our editorial policy.
