
ComfyUI CUDA OOM — stop the workflow from eating your VRAM

ComfyUI-specific CUDA OOM: what triggers it (loaded checkpoints, IPAdapter/ControlNet overhead, missing --lowvram), how to fix it, and the ComfyUI settings that matter.

ComfyUI · NVIDIA CUDA · Stable Diffusion · Flux
By Fredoline Eruo · Last verified 2026-05-08

Diagnostic order — most likely first

#1

Multiple checkpoints loaded simultaneously

Diagnose

Workflow chains multiple models (SDXL base + refiner, or Flux + upscaler). Each checkpoint holds its full weights in VRAM. `nvidia-smi` shows VRAM jumping by roughly the sum of the model sizes when the workflow starts.
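A quick way to confirm this from a terminal, assuming `nvidia-smi` is on your PATH (the `vram_used_mib` helper below is illustrative, not a ComfyUI API):

```python
# Sample VRAM before and after queuing the workflow to measure the jump.
import subprocess
import time

def vram_used_mib(gpu: int = 0) -> int:
    """Read current VRAM usage (MiB) for one GPU via nvidia-smi."""
    out = subprocess.check_output([
        "nvidia-smi", f"--id={gpu}",
        "--query-gpu=memory.used",
        "--format=csv,noheader,nounits",
    ])
    return int(out.decode().strip())

before = vram_used_mib()
input("Queue the workflow in ComfyUI, then press Enter...")
time.sleep(2)  # give model loading a moment to settle
print(f"VRAM jump: {vram_used_mib() - before} MiB")
# A jump near the sum of the checkpoint file sizes (e.g. base + refiner)
# confirms both models are resident at once.
```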

Fix

Use ComfyUI's 'Unload Checkpoint' node between model switches. Or set up a checkpoint-switching node (many community nodes offer this — RGThree, Efficiency Nodes). The key: only one checkpoint should be live at a time.

#2

IPAdapter / ControlNet models eating VRAM alongside the main model

Diagnose

OOM fires specifically when IPAdapter or ControlNet nodes are in the workflow but doesn't happen on a plain txt2img. Each ControlNet unit can add 1-3 GB, and IPAdapter adds another 1-2 GB.
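A back-of-envelope budget shows why the stack tips over; the figures below are illustrative mid-range values from the ranges above, not measurements:

```python
# Rough FP16 VRAM budget for an 8 GB card running SDXL + extras.
budget_gb = 8.0
sdxl_checkpoint = 6.5        # SDXL base weights
controlnets = 2 * 2.0        # two units at ~2 GB each (quoted range: 1-3 GB)
ipadapter = 1.5              # quoted range: 1-2 GB
activations = 1.0            # latents + attention buffers at 1024x1024

needed = sdxl_checkpoint + controlnets + ipadapter + activations
print(f"needed {needed:.1f} GB vs budget {budget_gb:.1f} GB")
# needed 13.0 GB vs budget 8.0 GB -> OOM before the first sample finishes
```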

Fix

Use the 'Unload ControlNet' / 'Unload IPAdapter' nodes after the nodes that need them. If using multiple ControlNets, drop the units you don't strictly need and prefer FP16 builds of the ControlNet models where available: precision is set by the model file you load, not by the strength slider, and halving precision roughly halves the footprint.

#3

--lowvram flag not set (ComfyUI loads full model at startup)

Diagnose

ComfyUI launched without `--lowvram`. VRAM spikes during model loading, with CLIP and the rest of the checkpoint pulled in immediately, before generation even starts. OOM hits on model load, not generation.

Fix

Launch ComfyUI with `python main.py --lowvram`. This tells ComfyUI to keep only the model it currently needs (the U-Net or DiT) in VRAM and offload everything else between runs. If `--lowvram` creates too much switching latency, try `--normalvram` instead: slightly less aggressive offloading, faster switching.
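If you launch from a script, a wrapper can pick the flag from total VRAM. This is a hypothetical helper, not something ComfyUI ships; the thresholds mirror the guidance in the FAQ below:

```python
# Hypothetical launcher: choose a ComfyUI VRAM flag from total GPU memory.
import subprocess
import torch

total_gb = torch.cuda.get_device_properties(0).total_memory / 2**30

if total_gb <= 12:
    flags = ["--lowvram"]      # effectively mandatory on 8-12 GB cards
else:
    flags = ["--normalvram"]   # try first on 16 GB and up

cmd = ["python", "main.py", *flags]
print("launching:", " ".join(cmd))
subprocess.run(cmd)
```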

#4

Custom node memory leak (repeated node execution without cleanup)

Diagnose

OOM happens after running the workflow 3-5 times in sequence without restarting ComfyUI. VRAM usage climbs incrementally each run. A custom node isn't releasing tensors between executions.
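To separate a genuine leak from normal allocator caching, log the baseline between runs from inside ComfyUI's Python process, for example in a trivial custom node (registration boilerplate omitted here). Climbing `memory_allocated`, as opposed to `memory_reserved`, means tensors are still referenced somewhere:

```python
# Measurement logic for a minimal "log VRAM" node (sketch, boilerplate omitted).
import torch

run_log: list[float] = []

def log_vram(label: str) -> None:
    alloc = torch.cuda.memory_allocated() / 2**30    # tensors still referenced
    reserved = torch.cuda.memory_reserved() / 2**30  # allocator cache (high is normal)
    run_log.append(alloc)
    print(f"[{label}] allocated={alloc:.2f} GiB reserved={reserved:.2f} GiB")
    if len(run_log) >= 3 and run_log[-1] - run_log[-3] > 0.5:
        print("allocated grew >0.5 GiB over 3 runs: likely a leak upstream")
```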

Fix

Add a 'Free Memory' node (from various community packs) at the end of your workflow. Identify the leaking node by removing custom nodes one at a time until the leak stops. Report the node on the author's GitHub.

#5

Model unloading not triggering between workflow runs

Diagnose

VRAM stays high after a workflow completes. Next workflow OOMs because the previous model's weights were never unloaded. ComfyUI's heuristic for 'can I unload this?' didn't fire.

Fix

Insert explicit 'Unload CLIP' and 'Unload Checkpoint' nodes at the end of each workflow. Or add a 'GC (Garbage Collect)' node. Also check ComfyUI Manager settings: enable 'Aggressive model unloading' and 'VRAM cleanup between prompts.'
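Under the hood, most 'Free Memory'-style nodes reduce to a few calls like these. A sketch: `unload_all_models` lives in ComfyUI's `comfy.model_management` module in current builds, but verify the name against your version:

```python
# Typical body of an end-of-workflow cleanup node (sketch).
import gc
import torch
import comfy.model_management as mm

def free_vram() -> None:
    mm.unload_all_models()    # drop ComfyUI-managed model weights from VRAM
    gc.collect()              # release dangling Python-side references
    torch.cuda.empty_cache()  # hand cached blocks back to the driver
```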

Frequently asked questions

Why does ComfyUI OOM but Automatic1111 works fine with the same model?

ComfyUI's node graph doesn't automatically unload models between nodes unless told to. Automatic1111 aggressively unloads/loads between operations. ComfyUI gives you more control — and more ways to accidentally keep everything loaded. The fix is explicit unload nodes.

Can I run ComfyUI on 8 GB VRAM?

Yes, with `--lowvram` and careful node management. Stick to SD 1.5 models (3-4 GB), or run Flux in a heavily quantized build (NF4) if you want Flux-level quality at low VRAM. Avoid IPAdapter + ControlNet simultaneously. 12 GB is the comfort floor for SDXL/Flux workflows.

Does ComfyUI work with multiple GPUs?

Poorly. ComfyUI doesn't natively support tensor parallelism. You can direct different models to different GPUs with custom nodes (e.g., base model on GPU 0, ControlNet on GPU 1), but it's manual and fiddly. For multi-GPU image generation, SwarmUI does a better job.

What's the minimum VRAM for ComfyUI + Flux in 2026?

Flux Dev FP16 needs ~24 GB for full-quality generation at 1024×1024. Flux Dev FP8 works on 12 GB with `--lowvram` enabled. Flux Schnell FP8 can run on 8 GB at reduced resolution. SDXL runs comfortably on 8 GB. For a combo workflow (Flux base + SDXL refiner + ControlNet), 16 GB is the practical floor — and you'll still need explicit unload nodes between stages.
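The 24 GB figure is mostly weight arithmetic: Flux Dev has roughly 12 billion parameters, and FP16 stores 2 bytes per parameter:

```python
params = 12e9                  # approx. Flux Dev parameter count
fp16_gib = params * 2 / 2**30  # 2 bytes/param -> ~22.4 GiB of weights alone
fp8_gib = params * 1 / 2**30   # 1 byte/param  -> ~11.2 GiB
print(f"FP16 weights: {fp16_gib:.1f} GiB, FP8 weights: {fp8_gib:.1f} GiB")
# Text encoders, the VAE, and activations come on top, which is why FP16
# lands at ~24 GB in practice and FP8 still wants 12 GB with --lowvram.
```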

Why does ComfyUI's --lowvram flag help with OOM but slow things down?

`--lowvram` tells ComfyUI to load only the current model's weights into VRAM and aggressively offload everything else between executions. The offload/reload cycle adds 5-15 seconds per workflow run. It's a trade-off between VRAM capacity and speed. `--normalvram` is a middle ground: less aggressive offloading, faster switching, but still some VRAM management. If you're on 8-12 GB, `--lowvram` is mandatory; on 16+ GB, try `--normalvram` first.

Is there a way to see which node is using the most VRAM in my workflow?

Not natively in ComfyUI. Use external monitoring: run `nvidia-smi -l 1` during workflow execution. Each node's execution typically causes a VRAM spike that you can correlate with the node names the ComfyUI console prints as they run. If VRAM spikes by 6 GB when a specific checkpoint-load node runs, that's your largest consumer. Community tools like 'VRAM Debug' nodes (available via ComfyUI Manager) can also log per-node VRAM usage.
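A timestamped sampler beats eyeballing two terminals; this sketch uses the `pynvml` bindings (install via the `nvidia-ml-py` package) and assumes GPU index 0:

```python
# Timestamped VRAM sampler: run alongside ComfyUI, then line the timestamps
# up with the node names ComfyUI's console prints as each node executes.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    while True:
        used_gib = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**30
        print(f"{time.strftime('%H:%M:%S')}  {used_gib:6.2f} GiB")
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```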

Related troubleshooting

When the fix is hardware

A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time.