Image Inpainting
Image inpainting is the task of filling missing or masked regions of an image with plausible, contextually consistent content. Operators encounter it in local AI workflows when they use diffusion models (e.g., Stable Diffusion, FLUX) to remove objects, repair damaged photos, or extend images beyond their original borders (outpainting). The model receives the original image plus a binary mask marking which pixels to regenerate, and outputs a completed image. What runs comfortably is set mainly by VRAM: a 6 GB card can run lightweight models such as SD 1.5 Inpainting at 512×512, while 12 GB or more allows higher resolutions or larger models without offloading weights to system RAM.
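The VRAM tiers above follow from a first-order estimate: the memory needed just to hold a model's weights is roughly its parameter count times the bytes per parameter, with activations, text encoders, and the VAE adding more on top. The sketch below assumes approximate, commonly cited parameter counts (about 0.86 billion for the SD 1.5 UNet, 2.6 billion for the SDXL UNet, 12 billion for the FLUX.1 transformer); the helper `weight_gb` and both tables are illustrative, not part of any tool.

```python
# Rough lower bound on VRAM for model weights only.
# Parameter counts are approximate public figures (an assumption of this sketch);
# activations, text encoders, and the VAE need additional memory on top.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0, "4bit": 0.5}

APPROX_PARAMS = {
    "SD 1.5 UNet": 0.86e9,
    "SDXL UNet": 2.6e9,
    "FLUX.1 transformer": 12e9,
}


def weight_gb(num_params: float, precision: str) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in APPROX_PARAMS.items():
        sizes = ", ".join(
            f"{prec}: {weight_gb(params, prec):.1f} GB" for prec in ("fp16", "fp8", "4bit")
        )
        print(f"{name:<20} {sizes}")
```

At 16-bit precision the FLUX.1 weights alone come to about 24 GB, which is why 12 GB cards usually run 8-bit or 4-bit FLUX checkpoints or offload part of the model, while an SD 1.5 UNet (under 2 GB at fp16) leaves ample headroom on a 6 GB card.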
Deeper dive
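Inpainting models are typically fine-tuned from text-to-image diffusion models by conditioning them on both the masked image and the mask itself. During inference, the sampler regenerates content inside the mask while the unmasked area is held to the original; some front ends also paste the original pixels back outside the mask after decoding, because the VAE round trip alters them slightly.
Common approaches include: (1) dedicated inpainting checkpoints (e.g., SD 1.5 Inpainting, SDXL Inpainting) whose UNet takes a 9-channel latent input, namely 4 channels of noisy latent, 1 channel holding the mask downsampled to latent resolution, and 4 channels of the VAE-encoded masked image; FLUX.1 Fill applies the same concatenation idea to a larger transformer; (2) ordinary text-to-image checkpoints driven by a masked-denoising pipeline that accepts a mask argument and, at each step, re-imposes the appropriately noised original latent outside the mask, which works with any model but often blends less cleanly at mask edges; (3) non-diffusion models such as LaMa (Large Mask Inpainting), a feed-forward CNN built on fast Fourier convolutions that fills a region in a single pass, much faster and well suited to object removal but unable to follow a text prompt or invent much new detail.
Operators should supply a single-channel mask in which 255 (white) marks pixels to inpaint and 0 (black) marks pixels to keep, and should work at or near the model's native resolution (512×512 for SD 1.5, 1024×1024 for SDXL), keeping width and height divisible by 8 for SD-family models; resolutions far from the native one tend to produce artifacts such as duplicated or distorted content. VRAM scales with both resolution and model size: 1024×1024 inpainting with SDXL needs roughly 8 GB at fp16, while FLUX.1 Fill is a 12-billion-parameter model whose weights alone occupy about 24 GB at fp16, so running it on a 12 GB card generally requires an 8-bit or 4-bit quantized checkpoint, CPU offload, or both.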
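Dedicated SD-family inpainting checkpoints receive their conditioning in latent space rather than as raw RGB plus mask. The sketch below illustrates that with NumPy standing in for PyTorch, under these assumptions: 4-channel VAE latents at 1/8 of the pixel resolution, and the channel order used by diffusers' SD inpainting pipeline (noisy latent, mask, masked-image latent). Random arrays replace the real noise and VAE encoder, and `prepare_mask`, `downsample_mask`, and `build_unet_input` are hypothetical helper names. The mask is max-pooled here as a conservative choice; diffusers itself resizes it with `torch.nn.functional.interpolate`.

```python
import numpy as np

# Assumed conventions (mirroring diffusers' SD inpainting pipeline):
LATENT_CHANNELS = 4  # SD 1.x / SDXL VAE latents have 4 channels
VAE_SCALE = 8        # the VAE downsamples height and width by 8


def prepare_mask(mask_u8: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Binarize an 8-bit mask: 1.0 = inpaint (white), 0.0 = keep (black)."""
    return (mask_u8 >= threshold).astype(np.float32)


def downsample_mask(mask: np.ndarray) -> np.ndarray:
    """Shrink an (H, W) mask to the (H/8, W/8) latent grid.

    Max-pooling marks a latent cell for inpainting if any pixel in its
    8x8 block is masked (a conservative choice made for this sketch).
    """
    h, w = mask.shape
    if h % VAE_SCALE or w % VAE_SCALE:
        raise ValueError("height and width must be multiples of 8")
    blocks = mask.reshape(h // VAE_SCALE, VAE_SCALE, w // VAE_SCALE, VAE_SCALE)
    return blocks.max(axis=(1, 3))


def build_unet_input(noisy_latent: np.ndarray, latent_mask: np.ndarray,
                     masked_image_latent: np.ndarray) -> np.ndarray:
    """Stack (1, 4, h, w) + (1, 1, h, w) + (1, 4, h, w) into (1, 9, h, w)."""
    return np.concatenate(
        [noisy_latent, latent_mask[None, None], masked_image_latent], axis=1
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    height, width = 512, 512
    mask_u8 = np.zeros((height, width), dtype=np.uint8)
    mask_u8[200:300, 150:350] = 255  # white box = region to regenerate

    latent_mask = downsample_mask(prepare_mask(mask_u8))  # shape (64, 64)

    # Random stand-ins for the real noisy latent and the VAE-encoded masked image.
    shape = (1, LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)
    noisy = rng.standard_normal(shape).astype(np.float32)
    masked_image_latent = rng.standard_normal(shape).astype(np.float32)

    unet_input = build_unet_input(noisy, latent_mask, masked_image_latent)
    print(unet_input.shape)  # (1, 9, 64, 64)
```

The 4 + 1 + 4 layout is why these checkpoints need their own UNet weights: the first convolution expects 9 input channels instead of the usual 4.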
Practical example
An operator wants to remove a person from a vacation photo. In Stable Diffusion WebUI (AUTOMATIC1111), they open the img2img tab's Inpaint mode, upload the image, paint a mask over the person with the brush tool, select the SD 1.5 Inpainting checkpoint, set denoising strength to 0.75, and generate. The model fills the masked area with background that matches the surroundings. On an RTX 3060 12 GB, a 512×512 inpaint takes about 3 seconds; on an M1 Mac with 8 GB of unified memory it takes about 8 seconds, largely because the M1's integrated GPU has much lower compute throughput and its 8 GB of memory is shared with the operating system. Exact times vary with sampler, step count, and software version.
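Denoising strength sets how far toward pure noise the masked region starts, and in practice it also scales how many steps run. A minimal sketch, assuming the scheduling arithmetic of diffusers' img2img and inpaint pipelines (AUTOMATIC1111 applies a similar scaling unless configured to run the full step count, though its rounding may differ): the first `steps - int(steps * strength)` scheduler steps are skipped. `effective_steps` is a hypothetical helper that reproduces only that arithmetic.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps actually run for a given strength.

    Mirrors the init_timestep / t_start computation used by diffusers'
    img2img-style pipelines (an assumption of this sketch).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


if __name__ == "__main__":
    for strength in (0.3, 0.5, 0.75, 1.0):
        print(f"strength={strength:.2f}: {effective_steps(20, strength)} of 20 steps")
```

At 20 steps and strength 0.75, only the last 15 steps run, starting from a partially noised copy of the original; lower strengths keep more of the original structure inside the mask, which is useful for touch-ups but less so for removing a whole person.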
Workflow example
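In ComfyUI, the operator loads the photo with a 'Load Image' node and paints the mask in the built-in mask editor (right-click the node and choose 'Open in MaskEditor'). The node's IMAGE and MASK outputs feed a 'VAE Encode (for Inpainting)' node, which encodes the image with the masked pixels set to neutral gray and attaches the mask to the resulting latent. A 'Load Checkpoint' node loads a dedicated inpainting checkpoint (e.g., 'sd_xl_base_1.0_inpainting.safetensors') and passes its model to a 'KSampler' node, where the operator sets steps=20, CFG=7, and denoise=1.0; full denoising is needed because 'VAE Encode (for Inpainting)' leaves no usable image information inside the mask. A 'VAE Decode' node then produces the inpainted image. For scripted batch processing outside ComfyUI, Hugging Face diffusers provides StableDiffusionInpaintPipeline: pipe = StableDiffusionInpaintPipeline.from_pretrained('stable-diffusion-v1-5/stable-diffusion-inpainting'). The checkpoint originally published as 'runwayml/stable-diffusion-inpainting' is now hosted under that ID.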
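A batch job with diffusers could look like the sketch below. It assumes diffusers, PyTorch, and Pillow are installed, a CUDA GPU is available, and each photo sits next to a mask named with a hypothetical `_mask` suffix (`photo.png` plus `photo_mask.png`); the prompt and folder names are illustrative. The heavy imports are deferred into `inpaint_folder` so the pairing helper can be used without the GPU stack.

```python
from pathlib import Path

# Community mirror of the former runwayml/stable-diffusion-inpainting repository.
MODEL_ID = "stable-diffusion-v1-5/stable-diffusion-inpainting"
MASK_SUFFIX = "_mask"  # hypothetical naming convention for mask files


def pair_images_with_masks(filenames: list[str]) -> list[tuple[str, str]]:
    """Match 'name.ext' to 'name_mask.ext'; images without a mask are skipped."""
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        stem, dot, ext = name.rpartition(".")
        if not dot or stem.endswith(MASK_SUFFIX):
            continue  # no extension, or this file is itself a mask
        mask_name = f"{stem}{MASK_SUFFIX}.{ext}"
        if mask_name in names:
            pairs.append((name, mask_name))
    return pairs


def inpaint_folder(src: Path, dst: Path, prompt: str, size: int = 512) -> None:
    """Inpaint every image/mask pair in src and write results to dst."""
    # Deferred imports: only needed when actually running the model.
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")

    dst.mkdir(parents=True, exist_ok=True)
    files = [p.name for p in src.iterdir() if p.is_file()]
    for image_name, mask_name in pair_images_with_masks(files):
        # SD 1.5 inpainting targets 512x512. Plain resizing distorts non-square
        # photos; a production script would crop or pad instead.
        image = Image.open(src / image_name).convert("RGB").resize((size, size))
        # Nearest-neighbour keeps the mask strictly black/white (white = inpaint).
        mask = Image.open(src / mask_name).convert("L").resize(
            (size, size), Image.Resampling.NEAREST
        )
        result = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
        result.save(dst / image_name)
```

Usage: `inpaint_folder(Path("inputs"), Path("outputs"), prompt="empty beach, natural lighting")`. Loading the pipeline once outside the loop is the main design point: model load time dominates for small batches, so reusing `pipe` across images keeps per-image cost close to the sampling time alone.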
Reviewed by Fredoline Eruo.