by Stability AI
The pioneering open-weight image-generation family. SDXL remains widely deployed; SD3.5 Large is its architectural successor. Massive finetune ecosystem (Pony, Illustrious, NoobAI, dozens of community models).
Start with SDXL 1.0 via ComfyUI on RTX 3060 12GB — SDXL is the most mature, best-documented, and most heavily fine-tuned open-weight image generation model. It generates 1024×1024 images in ~6 seconds on RTX 3060 12GB and has >50,000 community LoRAs/checkpoints on CivitAI. For higher quality, SD3.5 Large uses a DiT architecture (same lineage as Flux) and generates at 1024×1024 in ~12 seconds on RTX 4090 24GB. Skip SD3 Medium — it had well-documented anatomy-quality issues at release. Skip SD1.5 — its 512×512 native resolution is obsolete; SDXL generates at 1024×1024 natively and with better quality, where SD1.5 output must be upscaled to match. SDXL and SD3.5 use the Stability AI Community License — free for non-commercial and small-business use (<$1M annual revenue). Commercial use above $1M requires a Stability AI Enterprise License.
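For scripted generation outside ComfyUI, the same SDXL 1.0 setup can be driven from Python via the Hugging Face diffusers library. This is a minimal sketch, assuming diffusers and a CUDA GPU with ~12GB VRAM; the prompt and output path are illustrative, and DPM++ 2M maps to diffusers' DPMSolverMultistepScheduler.

```python
# Sketch: SDXL 1.0 text-to-image via diffusers (assumed installed),
# mirroring the recommended settings: FP16, 1024x1024, 20-step DPM++ 2M.
def generate_sdxl(prompt: str, out_path: str = "out.png") -> None:
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")
    # DPM++ 2M corresponds to DPMSolverMultistepScheduler in diffusers.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    image = pipe(
        prompt,
        width=1024,
        height=1024,
        num_inference_steps=20,  # the 20-step budget cited above
    ).images[0]
    image.save(out_path)

# Usage (needs a GPU, so not executed here):
# generate_sdxl("a lighthouse at dusk, volumetric light")
```

The function-level imports keep the sketch importable on machines without torch installed; in a real script, move them to the top of the file.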
For single-user generation: ComfyUI with SDXL 1.0 FP16 on RTX 3060 12GB — ~6 sec/image at 1024×1024 with a 20-step DPM++ 2M sampler. Automatic1111 WebUI is the alternative, with a larger extension ecosystem. For SD3.5 Large: ComfyUI with the SD3 DiT node on RTX 4090 24GB — ~12 sec/image at 1024×1024 FP16. The triple-text-encoder architecture (CLIP-L + CLIP-G + T5-XXL) needs ~15GB for the text encoders alone at FP16 — use an 8-bit (Q8) quantized T5-XXL, e.g. via bitsandbytes, to fit in 24GB. For server/batching: ComfyUI in API mode with a job queue — clients submit jobs over HTTP and the queue keeps the GPU saturated. For LoRA training: Kohya SS SDXL LoRA training on RTX 3060 12GB — ~8GB VRAM for a rank-16 LoRA, ~30 min per epoch on 1K images. For SDXL Lightning/LCM/DMD distillation: these reduce sampling to 4-8 steps for ~2 sec/image on RTX 4090, with a minor quality tradeoff. See GPU buyer guide.
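The server/batching setup above can be sketched with nothing but the standard library: ComfyUI in API mode accepts a workflow graph (exported via "Save (API Format)") POSTed to its /prompt endpoint. The node IDs, stub workflow, and client ID below are placeholders, not a real SDXL graph.

```python
# Sketch: queueing jobs against ComfyUI's HTTP API (server started with
# `python main.py --listen`). The /prompt endpoint expects a JSON body of
# the form {"prompt": <workflow graph>, "client_id": <id>}.
import json
import urllib.request

def build_prompt_payload(workflow: dict, client_id: str) -> bytes:
    """Wrap a ComfyUI workflow graph in the JSON body /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode()

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST one job; ComfyUI responds with a prompt_id to poll /history."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_prompt_payload(workflow, client_id="batch-worker-1"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Stub standing in for a real exported SDXL workflow graph.
workflow_stub = {"3": {"class_type": "KSampler", "inputs": {"steps": 20}}}
payload = json.loads(build_prompt_payload(workflow_stub, "batch-worker-1"))
```

Jobs queued this way are processed sequentially; a batch driver just loops `queue_prompt` over its prompt list and polls /history for finished images.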
Verify Stable Diffusion runs on your specific hardware before committing money.