Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated
fatal · Editorial · Reviewed May 2026

ComfyUI CUDA OOM — stop the workflow from eating your VRAM

ComfyUI-specific CUDA OOM: what triggers it (loaded checkpoints, IPAdapter/ControlNet overhead, missing --lowvram), how to fix it, and the ComfyUI settings that matter.

ComfyUI · NVIDIA CUDA · Stable Diffusion · Flux
By Fredoline Eruo · Last verified 2026-05-08

Diagnostic order — most likely first

#1

Multiple checkpoints loaded simultaneously

Diagnose

Workflow chains multiple models (SDXL base + refiner, or Flux + upscaler). Each checkpoint holds its full weights in VRAM. `nvidia-smi` shows VRAM jump by the sum of all model sizes on workflow start.

Fix

Place an unload node between model switches (community packs provide 'Unload Checkpoint'-style nodes; check that your install has one), or set up a checkpoint-switching node from a community pack such as rgthree or Efficiency Nodes. The key: only one checkpoint should be live at a time.
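
To sanity-check the "sum of all model sizes" diagnosis before touching the workflow, you can add up the on-disk size of every checkpoint the workflow references. This is a minimal sketch: the function name is made up for illustration, and file size is only an approximation of the weights' VRAM footprint (activations and precision casts are ignored).

```python
from pathlib import Path


def checkpoint_footprint_gb(paths):
    """Return ({file name: size in GB}, total GB) for the given checkpoint files.

    On-disk size is used as a rough stand-in for the VRAM the weights
    occupy once loaded; this is an approximation, not a measurement.
    """
    sizes = {Path(p).name: Path(p).stat().st_size / 1e9 for p in paths}
    return sizes, sum(sizes.values())
```

Pass it the checkpoint files your workflow loads. If the total is close to or above your card's VRAM, keeping them all resident is the likely trigger.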

#2

IPAdapter / ControlNet models eating VRAM alongside the main model

Diagnose

OOM fires specifically when IPAdapter or ControlNet nodes are in the workflow but doesn't happen on a plain txt2img. Each ControlNet unit can add 1-3 GB, and IPAdapter adds another 1-2 GB.

Fix

Use the 'Unload ControlNet' / 'Unload IPAdapter' nodes, where your node packs provide them, after the nodes that need them. If using multiple ControlNets, drop the ones the image can do without or switch to FP16 or smaller variants. Lowering ControlNet strength only scales the model's influence on the output; it does not change precision or free VRAM.
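
The per-unit ranges quoted in the diagnosis (1-3 GB per ControlNet, 1-2 GB per IPAdapter) are enough for a quick back-of-envelope budget. The sketch below is illustrative only: the function names and the three verdict strings are invented here, and activation memory is ignored.

```python
def addon_overhead_gb(controlnets=0, ipadapters=0):
    """Return (low, high) extra VRAM in GB from the per-unit ranges quoted above."""
    low = controlnets * 1 + ipadapters * 1
    high = controlnets * 3 + ipadapters * 2
    return low, high


def fits(card_gb, base_model_gb, controlnets=0, ipadapters=0):
    """Classify a workflow as 'likely fits', 'borderline', or 'likely OOM'."""
    low, high = addon_overhead_gb(controlnets, ipadapters)
    if base_model_gb + high <= card_gb:
        return "likely fits"
    if base_model_gb + low > card_gb:
        return "likely OOM"
    return "borderline"
```

For example, a 7 GB base model with two ControlNets and one IPAdapter adds 3 to 8 GB, which is borderline on a 12 GB card.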

#3

--lowvram flag not set (ComfyUI loads full model at startup)

Diagnose

ComfyUI launched without `--lowvram`. VRAM spikes during model loading, with CLIP loaded immediately, before generation even starts. OOM hits on model load, not during generation.

Fix

Launch ComfyUI with `python main.py --lowvram`. This tells ComfyUI to load the U-Net or DiT only when needed and offload it between runs. If `--lowvram` adds too much switching latency, try `--normalvram` instead: its offloading is slightly less aggressive, but switching is faster.

#4

Custom node memory leak (repeated node execution without cleanup)

Diagnose

OOM happens after running the workflow 3-5 times in sequence without restarting ComfyUI. VRAM usage climbs incrementally each run. A custom node isn't releasing tensors between executions.

Fix

Add a 'Free Memory' node (from various community packs) at the end of your workflow. Identify the leaking node by removing custom nodes one at a time until the leak stops. Report the node on the author's GitHub.
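
To tell a real leak from normal run-to-run jitter, note the VRAM figure from `nvidia-smi` after each run and check whether it keeps climbing. A minimal sketch follows. The function name and the 100 MB threshold are arbitrary choices for illustration, not ComfyUI behaviour.

```python
def looks_like_leak(vram_after_each_run_mb, min_growth_mb=100):
    """Return True if VRAM after every run rose by at least min_growth_mb.

    Expects one reading per completed workflow run, in order.
    """
    readings = list(vram_after_each_run_mb)
    if len(readings) < 3:
        return False  # too few runs to call it a trend
    deltas = [b - a for a, b in zip(readings, readings[1:])]
    return all(d >= min_growth_mb for d in deltas)
```

Re-run the check after disabling each custom node in turn: the node whose removal makes it return False is the likely culprit.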

#5

Model unloading not triggering between workflow runs

Diagnose

VRAM stays high after a workflow completes. Next workflow OOMs because the previous model's weights were never unloaded. ComfyUI's heuristic for 'can I unload this?' didn't fire.

Fix

Insert explicit 'Unload CLIP' and 'Unload Checkpoint' nodes at the end of each workflow. Or add a 'GC (Garbage Collect)' node. Also check ComfyUI Manager settings and, if your version exposes them, enable 'Aggressive model unloading' and 'VRAM cleanup between prompts.'
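
If you would rather trigger the unload from outside the graph, recent ComfyUI builds expose a `/free` HTTP route on the server that accepts `unload_models` and `free_memory` flags. The sketch below assumes a default local install on `127.0.0.1:8188` and that your version has this route; verify both before relying on it. The helper names are invented for illustration.

```python
import json
import urllib.request


def build_free_request(host="127.0.0.1", port=8188):
    """Build the POST request asking ComfyUI to unload models and free memory."""
    payload = json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/free",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def free_comfyui_memory(host="127.0.0.1", port=8188, timeout=10):
    """Send the request and return the HTTP status code (200 on success)."""
    with urllib.request.urlopen(build_free_request(host, port), timeout=timeout) as resp:
        return resp.status
```

Calling `free_comfyui_memory()` between workflow runs, then checking `nvidia-smi`, shows whether the weights were actually released.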

Frequently asked questions

Why does ComfyUI OOM but Automatic1111 works fine with the same model?

ComfyUI's node graph doesn't automatically unload models between nodes unless told to. Automatic1111 aggressively unloads/loads between operations. ComfyUI gives you more control — and more ways to accidentally keep everything loaded. The fix is explicit unload nodes.

Can I run ComfyUI on 8 GB VRAM?

Yes, with `--lowvram` and careful node management. Stick to SD 1.5 models (3-4 GB) for the smallest footprint, or run Flux in a heavily quantized version (NF4). Avoid running IPAdapter and ControlNet simultaneously. 12 GB is the comfort floor for Flux and multi-stage SDXL workflows.

Does ComfyUI work with multiple GPUs?

Poorly. ComfyUI doesn't natively support tensor parallelism. You can direct different models to different GPUs with custom nodes (e.g., base model on GPU 0, ControlNet on GPU 1), but it's manual and fiddly. For multi-GPU image gen, SwarmUI does a better job.

What's the minimum VRAM for ComfyUI + Flux in 2026?

Flux Dev FP16 needs ~24 GB for full-quality generation at 1024×1024. Flux Dev FP8 works on 12 GB with `--lowvram` enabled. Flux Schnell FP8 can run on 8 GB at reduced resolution. SDXL runs comfortably on 8 GB. For a combo workflow (Flux base + SDXL refiner + ControlNet), 16 GB is the practical floor — and you'll still need explicit unload nodes between stages.
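
The precision tiers above follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below assumes Flux Dev has about 12 billion parameters (a figure not stated in this guide) and treats NF4 as half a byte per weight, ignoring quantization metadata, text encoders, the VAE, and activations.

```python
# Approximate storage cost per weight for each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "nf4": 0.5}


def weight_gb(params_billions, precision):
    """Approximate weight memory in GB: billions of params x bytes per param."""
    return params_billions * BYTES_PER_PARAM[precision]


# Assumed parameter count for Flux Dev; adjust for other models.
FLUX_DEV_PARAMS_B = 12
flux_estimates = {p: weight_gb(FLUX_DEV_PARAMS_B, p) for p in BYTES_PER_PARAM}
```

Under these assumptions, the FP8 weights alone come to about 12 GB, which is why a 12 GB card only copes with offloading enabled.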

Why does ComfyUI's --lowvram flag help with OOM but slow things down?

`--lowvram` tells ComfyUI to load only the current model's weights into VRAM and aggressively offload everything else between executions. The offload/reload cycle adds 5-15 seconds per workflow run. It's a trade-off between VRAM capacity and speed. `--normalvram` is a middle ground: less aggressive offloading, faster switching, but still some VRAM management. If you're on 8-12 GB, start with `--lowvram`; on 16+ GB, try `--normalvram` first.
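
The rule of thumb in this answer can be written down as a small launcher helper. This is a sketch, not ComfyUI's own logic: the function names are hypothetical, and treating every card under 16 GB as a `--lowvram` card (including the 12-16 GB range the answer does not cover) is an assumption.

```python
def pick_vram_flag(vram_gb):
    """Map card VRAM in GB to the launch flag suggested in the answer above."""
    return "--normalvram" if vram_gb >= 16 else "--lowvram"


def launch_command(vram_gb, entrypoint="main.py"):
    """Build the argv list for starting ComfyUI with the chosen flag."""
    return ["python", entrypoint, pick_vram_flag(vram_gb)]
```

The returned list can be passed to `subprocess.run` from the ComfyUI directory. If `--normalvram` still runs out of memory on a larger card, fall back to `--lowvram`.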

Is there a way to see which node is using the most VRAM in my workflow?

Not natively in ComfyUI. Use external monitoring: `nvidia-smi -l 1` during workflow execution. Each node's execution typically causes a VRAM spike that you can correlate by watching `nvidia-smi` output at the same time the ComfyUI console prints the node's execution log. If VRAM spikes by 6 GB when a specific checkpoint-load node runs, that's your largest consumer. Community tools like 'VRAM Debug' nodes (available via ComfyUI Manager) can also log per-node VRAM usage.
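
Watching raw `nvidia-smi -l 1` output while reading the console is tedious, so a small poller that prints only the jumps can help. The sketch below uses `nvidia-smi`'s `--query-gpu=memory.used` CSV output. The function names, the 512 MiB threshold, and the choice of GPU 0 are illustrative assumptions.

```python
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_used_mib(output):
    """Parse nvidia-smi output (one integer MiB value per GPU line) into a list."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def watch(interval_s=1.0, threshold_mib=512, gpu_index=0):
    """Print a timestamped line whenever VRAM use changes by threshold_mib or more."""
    previous = None
    while True:  # stop with Ctrl+C
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        used = parse_used_mib(out)[gpu_index]
        if previous is not None and abs(used - previous) >= threshold_mib:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp}  {previous} -> {used} MiB ({used - previous:+d})")
        previous = used
        time.sleep(interval_s)
```

Run `watch()` in a second terminal and match the timestamps against the node execution lines in the ComfyUI console.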

Related troubleshooting

CUDA out of memory

Why CUDA OOM happens during local LLM inference and image gen, how to confirm the real cause, and the four real fixes (smaller quant, shorter context, gradient checkpointing, or more VRAM).

ComfyUI stuck on 'loading' / first run never completes

ComfyUI hanging on first launch is usually a custom-node conflict, model file corruption, or python env collision with A1111. Bisect via --disable-all-custom-nodes and you'll catch 80% of cases in 30 seconds.

When the fix is hardware

A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time:

  • Best GPU for local AI
  • Best laptop for local AI
  • Best Mac for local AI
