Outpainting
Extending images beyond their original borders. Useful for aspect ratio changes, scene expansion, panoramic creation.
Setup walkthrough
- Install ComfyUI via Stability Matrix.
- In ComfyUI Manager, choose Install Models and search for "flux1-fill-dev" (23 GB) or, for a lighter option, "stable-diffusion-2-inpainting" (5 GB).
- Load an outpainting workflow (search the ComfyUI workflow library for "outpaint"). The workflow:
  - Load Image → Pad Image for Outpainting (adds empty canvas in the chosen direction)
  - Connect to the Flux Fill / SD Inpaint model
  - Prompt: "extend the image naturally" or specific: "continue the beach scene to the left"
  - Queue
- First outpainted image in 10-30 seconds. The model fills the padded region while matching the original's style, lighting, and content.
- Use cases: change aspect ratio (square→16:9), expand product photos for banners, create panoramas from single shots, extend backgrounds for design.
- For character outpainting: use ControlNet + IP-Adapter alongside the fill model to maintain character consistency in the extended region.
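To make the padding step concrete, here is a minimal sketch of the geometry behind a "pad for outpainting" operation: the canvas grows by the requested borders, the original is pasted at an offset, and every pixel outside the original is marked for the model to fill. This is a conceptual illustration in plain Python, not ComfyUI's node implementation; the names `PaddedCanvas`, `pad_for_outpainting`, and `needs_fill` are hypothetical.

```python
from typing import NamedTuple


class PaddedCanvas(NamedTuple):
    width: int       # canvas width after padding
    height: int      # canvas height after padding
    offset_x: int    # x position of the original's top-left corner
    offset_y: int    # y position of the original's top-left corner
    src_width: int   # width of the original image
    src_height: int  # height of the original image


def pad_for_outpainting(width, height, left=0, top=0, right=0, bottom=0):
    """Return canvas geometry after adding empty borders (all values in pixels)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if min(left, top, right, bottom) < 0:
        raise ValueError("padding must be non-negative")
    return PaddedCanvas(
        width + left + right, height + top + bottom, left, top, width, height
    )


def needs_fill(canvas, x, y):
    """True if pixel (x, y) lies in the padded region the model must generate."""
    inside_x = canvas.offset_x <= x < canvas.offset_x + canvas.src_width
    inside_y = canvas.offset_y <= y < canvas.offset_y + canvas.src_height
    return not (inside_x and inside_y)


canvas = pad_for_outpainting(1024, 1024, left=448, right=448)
print(canvas.width, canvas.height)  # 1920 1024
```

In a real workflow the fill model receives the padded image plus a mask equivalent to `needs_fill`, so only the new borders are generated and the original pixels are kept.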
The cheap setup
Used RTX 3060 12 GB ($200-250, see /hardware/rtx-3060-12gb). Runs SD 2.0 inpainting in outpainting mode at 10-20 seconds per edge extension. For a 1024×1024 → 1920×1024 panorama (adding 448px to each side), expect 20-40 seconds total for the two edges. Flux Fill at FP8 (12 GB) takes 25-45 seconds for the same operation but produces higher quality. Pair with Ryzen 5 5600 + 32 GB DDR4 + 1TB NVMe. Total: ~$390-440. Outpainting is slightly heavier than inpainting because the filled area is often larger.
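The padding arithmetic in the panorama example can be sketched as follows. The helper names are hypothetical, and rounding the target width to a multiple of 8 is an assumption (latent-diffusion models generally expect dimensions divisible by a small power of two); adjust `multiple` to whatever your model requires.

```python
def side_padding_for_width(width, target_width):
    """Split the extra width into (left, right) padding, as evenly as possible."""
    extra = target_width - width
    if extra < 0:
        raise ValueError("target width is smaller than the source width")
    left = extra // 2
    return left, extra - left


def width_for_aspect(height, ratio_w, ratio_h, multiple=8):
    """Width closest to the requested aspect ratio, rounded to `multiple` (assumed)."""
    exact = height * ratio_w / ratio_h
    return int(round(exact / multiple)) * multiple


# The example from the text: 1024 wide to 1920 wide.
print(side_padding_for_width(1024, 1920))  # (448, 448)

# A true 16:9 canvas at 1024px height, rounded to a multiple of 8.
target = width_for_aspect(1024, 16, 9)
print(target, side_padding_for_width(1024, target))  # 1824 (400, 400)
```

Note that 1920×1024 is slightly wider than 16:9; a strict 16:9 canvas at this height would need about 400px per side rather than 448px.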
The serious setup
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Flux Fill Dev FP16 for outpainting at 10-20 seconds per edge extension. The quality difference vs. SD is significant for complex scenes (architecture, landscapes). For production graphic design workflows (daily outpainting for social media banners, print layouts), the speed is acceptable. Total: ~$1,800-2,200. RTX 4090 ($1,600) drops outpainting to 4-8 seconds, which is viable for interactive "expand canvas" in photo editors.
Common beginner mistake
The mistake: Outpainting without providing a text prompt, relying entirely on "context-aware fill," then getting blurry or nonsensical extended regions.
Why it fails: Without a prompt, the model extends the image based purely on visual patterns, so it tends to repeat nearby textures or generate generic blur. It doesn't know what should be there.
The fix: Always provide a prompt describing what belongs in the extended region. "Extend the forest scene to the left, continuing the path and adding more pine trees" works far better than an empty prompt. For better results, outpaint in smaller increments (100-200px at a time) rather than trying to fill 1000px in one go. Each small extension gives the model the previous extension as context, building a coherent scene iteratively.
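The incremental approach can be planned ahead of time. The sketch below splits a large extension into roughly equal passes, each no larger than a chosen cap; `plan_increments` is a hypothetical helper, and the 200px default simply reflects the upper end of the range suggested above.

```python
def plan_increments(total_px, max_step_px=200):
    """Split `total_px` into near-equal steps, none larger than `max_step_px`."""
    if total_px < 0 or max_step_px <= 0:
        raise ValueError("total_px must be >= 0 and max_step_px must be > 0")
    steps = -(-total_px // max_step_px)  # ceiling division
    if steps == 0:
        return []
    base, remainder = divmod(total_px, steps)
    # Spread the remainder over the first few steps so sizes differ by at most 1px.
    return [base + 1 if i < remainder else base for i in range(steps)]


print(plan_increments(448))   # [150, 149, 149]
print(plan_increments(1000))  # [200, 200, 200, 200, 200]
```

Each entry is one pad-and-generate pass: pad by that many pixels, run the fill model with a descriptive prompt, then feed the result into the next pass.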
Recommended setup for outpainting
Browse all tools for runtimes that fit this workload.
Reality check
Image gen is compute-bound, not bandwidth-bound. VRAM matters for the resolution + LoRA training stack, but FP16 TFLOPS is what decides Flux throughput. The RTX 5080's compute advantage over the RTX 5070 Ti shows here in ways it doesn't on LLM inference.
Common mistakes
- Buying for VRAM ceiling without checking compute (Flux Dev FP16 doesn't fit in 16 GB anyway)
- Skipping LoRA training requirements (24 GB minimum, 32 GB comfortable for Flux)
- Underestimating ComfyUI's multi-model VRAM appetite vs A1111's single-pipeline
- Using Q4 quantized image models — quality drop is more visible than on LLMs
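A rough way to sanity-check the VRAM points above is to estimate weight size as parameter count times bytes per parameter. The sketch assumes roughly 12 billion parameters for Flux (an assumption not stated in this guide, though it is roughly consistent with the 23 GB download and the 12 GB FP8 figure), and the bytes-per-parameter table is approximate. It counts model weights only; text encoders, the VAE, and activations need additional memory, and runtimes may offload parts of the model.

```python
# Approximate storage cost per parameter (assumed values, weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "q4": 0.5}


def weight_gb(params_billion, precision):
    """Estimated weight size in GB: billions of params x bytes per param."""
    return params_billion * BYTES_PER_PARAM[precision]


def weights_fit(params_billion, precision, vram_gb):
    """True if the weights alone fit in VRAM (ignores all other memory use)."""
    return weight_gb(params_billion, precision) <= vram_gb


FLUX_PARAMS_BILLION = 12  # assumption, see lead-in

for precision in ("fp16", "fp8"):
    for vram in (12, 16, 24):
        verdict = "fits" if weights_fit(FLUX_PARAMS_BILLION, precision, vram) else "does not fit"
        print(f"{precision} weights on {vram} GB: {verdict}")
```

Treat a "fits" result as a necessary condition rather than a guarantee, since a full workflow loads more than the diffusion model itself.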
What breaks first
The errors most operators hit when running outpainting locally. Each links to a diagnose+fix walkthrough.
Before you buy
Verify your specific hardware can handle outpainting before committing money.