DALL-E

DALL-E is a family of text-to-image generative models developed by OpenAI. Operators encounter it as a cloud-only API service: unlike Stable Diffusion or FLUX, it has no open-weight release and no local runtime. DALL-E 3, the most recent model in the family, generates images from natural-language prompts at 1024×1024, 1792×1024, or 1024×1792. Like DALL-E 2, it is diffusion-based, although conditioning on CLIP embeddings is documented for DALL-E 2 specifically and OpenAI has not published DALL-E 3's full architecture. For operators running local AI, DALL-E is useful as a benchmark for image quality and prompt adherence, but it cannot be self-hosted; offline or private use requires a local alternative such as Stable Diffusion or FLUX.

Deeper dive

DALL-E 1 (2021) used a discrete VAE and an autoregressive transformer to generate images from text. DALL-E 2 (2022) switched to a diffusion decoder conditioned on CLIP embeddings, producing 1024×1024 images with improved coherence. DALL-E 3 (2023) further improved prompt following by training on synthetic captions generated by an image captioner. None of the versions can be run locally; OpenAI provides access via API (pay-per-image) or through ChatGPT Plus. The architecture is not publicly documented in full detail, and no complete model weights have been released. For operators, DALL-E's closed nature means it cannot be used in offline workflows, fine-tuned, or run on local hardware. Its main impact on the local AI community is as a quality target: local models like SDXL, SD3, and FLUX aim to match or exceed DALL-E 3's output quality while remaining open and self-hostable.

Practical example

An operator with an RTX 4090 (24 GB VRAM) still cannot run DALL-E locally: the limit is the absence of released weights, not the hardware. To generate an image with DALL-E 3, they must call the OpenAI API:

curl https://api.openai.com/v1/images/generations -H "Authorization: Bearer $OPENAI_API_KEY" -H "Content-Type: application/json" -d '{"model":"dall-e-3","prompt":"a cat wearing a hat","n":1,"size":"1024x1024"}'

Each standard-quality 1024×1024 image costs about $0.04. For local generation, they would instead use Stable Diffusion 3.5 (roughly 8 GB VRAM) or FLUX.1-dev (roughly 12 GB VRAM) via ComfyUI; actual memory requirements depend on the model variant and quantization.
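The same request can be sketched in Python using only the standard library. This is a minimal illustration, not OpenAI's official client: the helper names (build_request, extract_image_urls, generate_image, estimate_cost) are hypothetical, the response shape {"data": [{"url": ...}]} is an assumption about what the endpoint returns, and the per-image price is simply the approximate figure quoted above.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"
# Approximate per-image price quoted in the text; check current pricing before relying on it.
PRICE_PER_IMAGE_USD = 0.04


def build_request(prompt: str, api_key: str, size: str = "1024x1024") -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    payload = {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_image_urls(response_body: dict) -> list[str]:
    """Collect image URLs, assuming a response shaped like {"data": [{"url": ...}]}."""
    return [item["url"] for item in response_body.get("data", []) if "url" in item]


def generate_image(prompt: str) -> list[str]:
    """Send the request; requires network access and OPENAI_API_KEY in the environment."""
    request = build_request(prompt, os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(request, timeout=120) as response:
        return extract_image_urls(json.load(response))


def estimate_cost(num_images: int, price_per_image: float = PRICE_PER_IMAGE_USD) -> float:
    """Rough batch cost in USD at a flat per-image price."""
    return num_images * price_per_image
```

Calling generate_image("a cat wearing a hat") would issue one billable request, while estimate_cost(250) gives a rough budget for a 250-image prototyping run at the quoted price. Separating request construction from sending keeps the payload inspectable without spending credits.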

Workflow example

In a typical workflow, an operator prototyping a generative AI application might start with DALL-E via the API for quick iteration, then switch to a local model for production or privacy. For example, they might use curl or a Python script with the openai library to generate initial samples. Once the prompt style is finalized, they replicate the workflow locally using diffusers or ComfyUI with a local model like SDXL. Prompts rarely transfer unchanged: the operator should compare local output against the DALL-E samples and close any gap in prompt adherence through prompt engineering or fine-tuning.
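One way to keep the switch from API to local model cheap is to hide the generator behind a small interface, so the same prompt list can be replayed against either backend. The sketch below is an assumption-laden illustration: ImageJob, run_jobs, and make_sdxl_backend are hypothetical names, the SDXL backend assumes torch and diffusers are installed with a CUDA GPU available, and the model ID is the commonly used SDXL base checkpoint on Hugging Face.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class ImageJob:
    """A single generation request, independent of which backend runs it."""
    prompt: str
    width: int = 1024
    height: int = 1024


# A backend is any callable that turns a job into an image (or an image reference).
Backend = Callable[[ImageJob], Any]


def make_sdxl_backend(model_id: str = "stabilityai/stable-diffusion-xl-base-1.0") -> Backend:
    """Load SDXL once and return a backend that runs jobs on the local GPU."""
    # Imported lazily so the rest of the module works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    def run(job: ImageJob) -> Any:
        # The pipeline returns a result object whose .images list holds PIL images.
        return pipe(prompt=job.prompt, width=job.width, height=job.height).images[0]

    return run


def run_jobs(jobs: List[ImageJob], backend: Backend) -> list:
    """Replay the same prompt list against whichever backend is passed in."""
    return [backend(job) for job in jobs]
```

During prototyping, the backend passed to run_jobs could wrap the API call; for production, make_sdxl_backend() supplies a local one, and the two result sets can be compared side by side to judge prompt adherence.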

Reviewed by Fredoline Eruo. See our editorial policy.
