Inpainting
Filling in masked regions of an image based on surrounding context and optional prompts. Essential for object removal, background replacement, and content-aware fills.
Setup walkthrough
- Install ComfyUI via Stability Matrix (one-click install).
- ComfyUI Manager → Install Models → search "flux1-fill-dev" → download (23 GB), or "stable-diffusion-2-inpainting" (5 GB, lighter).
- Load a Flux Fill workflow (from the workflow library) or SD inpainting workflow.
- In the workflow:
  - Load Image → select your photo.
  - Right-click the image → "Open in MaskEditor" → paint over the object you want to remove (red brush).
  - Prompt: "empty stone floor, seamless", or leave it blank for a context-aware fill.
  - Flux Fill: steps=20, guidance=3-5. SD inpainting: steps=25, denoise=0.85.
- Queue → first inpainted image in 8-20 seconds (SD) or 10-25 seconds (Flux Fill) on a 12+ GB GPU.
- Use case examples: remove tourists from vacation photos, delete watermarks, fill gaps in scanned documents, remove logos.
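The sampler settings from the walkthrough can be captured as a small preset table, plus a helper for submitting an exported workflow to ComfyUI's HTTP API (POST /prompt on the default port 8188). This is a hedged sketch: the dict keys and function names are illustrative assumptions, not part of ComfyUI itself, and a real exported workflow graph contains many more nodes than shown here.

```python
import json
import urllib.request

# Recommended settings from the steps above. Keys are illustrative
# labels, not model identifiers ComfyUI recognizes on its own.
INPAINT_PRESETS = {
    "flux-fill-dev":  {"steps": 20, "guidance": 4.0},   # guidance 3-5 works
    "sd2-inpainting": {"steps": 25, "denoise": 0.85},
}

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> bytes:
    """Submit an exported workflow graph (the JSON ComfyUI saves in
    API format) to a locally running instance via POST /prompt."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Patch the preset values into your workflow's KSampler node before queueing, rather than re-typing them per edit.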
The cheap setup
Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs SD 2.0 inpainting at 8-15 seconds per 1024×1024 edit. Flux Fill at FP8 (GGUF quant, ~12 GB) at 20-35 seconds per edit. For simple object removal from photos, SD inpainting on 12 GB is perfectly adequate. Pair with Ryzen 5 5600 + 32 GB DDR4 + 1TB NVMe. Total: ~$390-440. For light inpainting (blemish removal, small object deletion), even 8 GB cards work with SD 1.5 inpainting at 5-10 seconds.
The serious setup
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Flux Fill Dev FP16 at 8-15 seconds per 1024×1024 edit — the highest-quality local inpainting available. Handles complex fills (replacing entire backgrounds, removing large objects with detailed replacement). For production photo editing workflows (50-100 images/day): the speed is acceptable for interactive use. Total: ~$1,800-2,200. RTX 4090 24 GB ($1,600, see /hardware/rtx-4090) drops Flux Fill to 3-6 seconds — fast enough for real-time preview.
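A quick arithmetic check on the "50-100 images/day" claim above: even at the slow end of each card's per-edit time, a full day's batch stays in the tens of minutes.

```python
# Batch-time arithmetic for the per-edit timings quoted above.
def batch_minutes(images: int, seconds_per_edit: float) -> float:
    """Total wall-clock minutes to inpaint a batch sequentially."""
    return images * seconds_per_edit / 60

# RTX 3090, Flux Fill FP16 at the 15 s worst case:
print(batch_minutes(100, 15))  # 25.0 minutes
# RTX 4090 at the 6 s worst case:
print(batch_minutes(100, 6))   # 10.0 minutes
```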
Common beginner mistake
The mistake: Masking the object with a sharp-edged rectangle and wondering why the filled region has visible seams.
Why it fails: Inpainting models blend at mask edges. A sharp rectangular mask creates a visible boundary where the new content abruptly meets the old: the model fills within the mask, but the transition is jarring.
The fix: Use soft-edged masks. In ComfyUI's MaskEditor, use a soft brush at 20-40% hardness and feather the mask edges by 10-30 pixels, giving the model a transition zone to blend the new content with the original. Also include some surrounding context in the mask (extend 20-50 pixels beyond the object) so the model has reference pixels to match lighting and texture. Soft masks + context overlap = seamless inpainting.
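The soft-mask recipe can be sketched in plain NumPy: grow a hard binary mask outward to add context overlap, then feather the edge with repeated 3x3 mean filters. Function names here are illustrative; ComfyUI's MaskEditor does the equivalent with its soft brush and this is just a minimal standalone demonstration of the idea.

```python
import numpy as np

def _shifted_stack(m: np.ndarray) -> np.ndarray:
    # 9 views of the array shifted across a 3x3 neighborhood.
    p = np.pad(m, 1, mode="edge")
    return np.stack([p[i:i + m.shape[0], j:j + m.shape[1]]
                     for i in range(3) for j in range(3)])

def dilate(mask: np.ndarray, px: int) -> np.ndarray:
    """Expand the masked region by `px` pixels (context overlap)."""
    for _ in range(px):
        mask = _shifted_stack(mask).max(axis=0)
    return mask

def feather(mask: np.ndarray, passes: int) -> np.ndarray:
    """Soften the mask edge; each pass is one 3x3 mean filter."""
    mask = mask.astype(np.float32)
    for _ in range(passes):
        mask = _shifted_stack(mask).mean(axis=0)
    return mask

# Hard 20x20 square mask in a 64x64 image: grow it by 10 px of
# surrounding context, then feather so the edge ramps over roughly
# 10 px instead of jumping from 1 to 0.
hard = np.zeros((64, 64), dtype=np.float32)
hard[22:42, 22:42] = 1.0
soft = feather(dilate(hard, 10), 15)
```

The resulting `soft` array stays 1.0 deep inside the mask, 0.0 far outside it, and ramps smoothly across the boundary, which is exactly the transition zone the model needs.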
Recommended setup for inpainting
Browse all tools for runtimes that fit this workload.
Reality check
Image gen is compute-bound, not bandwidth-bound. VRAM sets the ceiling on resolution and LoRA training, but FP16 TFLOPS is what decides Flux throughput. The 5080's compute advantage over the 5070 Ti shows up here in ways it doesn't on LLM inference.
Common mistakes
- Buying for the VRAM ceiling without checking compute (Flux Dev FP16 doesn't fit in 16 GB anyway)
- Skipping LoRA training requirements (24 GB minimum, 32 GB comfortable for Flux)
- Underestimating ComfyUI's multi-model VRAM appetite vs A1111's single-pipeline
- Using Q4 quantized image models — quality drop is more visible than on LLMs
What breaks first
The errors most operators hit when running inpainting locally. Each links to a diagnose+fix walkthrough.
Before you buy
Verify your specific hardware can handle inpainting before committing money.