Neural network architectures

Generative Adversarial Network (GAN)

A Generative Adversarial Network (GAN) is a machine learning architecture in which two neural networks, a generator and a discriminator, compete in a zero-sum game. The generator creates synthetic data (e.g., images, audio) from random noise, while the discriminator tries to distinguish real data from generated samples. Training alternates between updating the discriminator to classify more accurately and updating the generator to produce outputs the discriminator accepts as real. GANs are used for image synthesis, style transfer, and data augmentation. For local AI operators, GANs are less common than autoregressive models (for text) or diffusion models (for images), but they appear in niche tasks like super-resolution or domain adaptation.

Deeper dive

GANs were introduced by Ian Goodfellow and colleagues in 2014 and popularized neural image generation before diffusion models took over. The generator maps latent vectors (e.g., 100-dimensional Gaussian noise) to high-dimensional outputs (e.g., 256x256 images). The discriminator is a binary classifier that scores each input as real or generated. Training is notoriously unstable: mode collapse (the generator producing only a limited variety of outputs) and non-convergence are common. Variants like DCGAN (convolutional), StyleGAN (style-based), and CycleGAN (unpaired image-to-image translation) address specific use cases. For local operators, GANs are typically run via PyTorch or TensorFlow, not llama.cpp/Ollama. They require moderate VRAM (e.g., StyleGAN2 ~4 GB at 512x512) but lack the text-prompt interface of modern diffusion models. Because a GAN generates its output in a single forward pass rather than through many iterative denoising steps, GANs remain relevant for real-time applications (e.g., video frame interpolation) where inference speed matters.
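The alternating training described above can be illustrated with a deliberately tiny sketch. It is not how production GANs are written: real models use neural networks and automatic differentiation in PyTorch or TensorFlow. Here the generator is a hypothetical one-dimensional map G(z) = a*z + b, the discriminator is a logistic classifier D(x) = sigmoid(w*x + c), and the gradients are derived by hand. The generator uses the non-saturating loss -log D(G(z)), a common choice that is an assumption of this sketch rather than something the text specifies. All function names and hyperparameters are illustrative.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator_step(d, g, x_real, z, lr):
    """One gradient-descent step on -[log D(x_real) + log(1 - D(G(z)))].

    The generator parameters g = (a, b) are held fixed.
    """
    w, c = d
    a, b = g
    x_fake = a * z + b
    s_real = sigmoid(w * x_real + c)  # D's score for the real sample
    s_fake = sigmoid(w * x_fake + c)  # D's score for the generated sample
    grad_w = -(1.0 - s_real) * x_real + s_fake * x_fake
    grad_c = -(1.0 - s_real) + s_fake
    return (w - lr * grad_w, c - lr * grad_c)


def generator_step(d, g, z, lr):
    """One gradient-descent step on the non-saturating loss -log D(G(z)).

    The discriminator parameters d = (w, c) are held fixed.
    """
    w, c = d
    a, b = g
    x_fake = a * z + b
    s_fake = sigmoid(w * x_fake + c)
    grad_x = -(1.0 - s_fake) * w  # dLoss/dx_fake via the chain rule
    return (a - lr * grad_x * z, b - lr * grad_x)


def train(steps=200, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on toy 1-D data."""
    rng = random.Random(seed)
    d = (0.1, 0.0)  # discriminator (w, c), arbitrary starting point
    g = (1.0, 0.0)  # generator (a, b), arbitrary starting point
    for _ in range(steps):
        x_real = rng.gauss(3.0, 0.5)  # "real" data: samples around 3.0
        d = discriminator_step(d, g, x_real, rng.gauss(0.0, 1.0), lr)
        g = generator_step(d, g, rng.gauss(0.0, 1.0), lr)
    return d, g


d_params, g_params = train()
```

Each loop iteration performs one discriminator update followed by one generator update, which is the alternation the text describes. Nothing in this toy guarantees convergence, which mirrors the instability noted above.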

Practical example

A local operator wanting to upscale old photos could use ESRGAN (Enhanced Super-Resolution GAN). On an RTX 3060 12 GB, ESRGAN upscales a 256x256 image to 1024x1024 in ~2 seconds, consuming ~3 GB VRAM. The generator outputs an image 4x larger in each dimension (16x the pixel count) with realistic textures. The discriminator is only used during training; inference uses the generator alone.
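The size arithmetic in this example can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the helper names are hypothetical, and it assumes a 3-channel RGB image stored as 32-bit floats, which is a typical but not universal in-memory layout.

```python
def upscaled_size(height: int, width: int, scale: int = 4) -> tuple:
    """Output resolution after upscaling each dimension by `scale`."""
    return height * scale, width * scale


def image_tensor_mib(height: int, width: int,
                     channels: int = 3, bytes_per_value: int = 4) -> float:
    """Memory of one dense image tensor in MiB (assumes float32 RGB)."""
    return height * width * channels * bytes_per_value / (1024 ** 2)


in_h, in_w = 256, 256
out_h, out_w = upscaled_size(in_h, in_w, scale=4)

pixel_ratio = (out_h * out_w) / (in_h * in_w)  # growth in pixel count
input_mib = image_tensor_mib(in_h, in_w)       # low-resolution input
output_mib = image_tensor_mib(out_h, out_w)    # upscaled output
```

Under these assumptions the output tensor occupies only about 12 MiB, so the ~3 GB figure quoted above presumably reflects intermediate activations, model weights, and framework overhead rather than the final image.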

Workflow example

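To run a pre-trained GAN locally, operators typically use a Python script with PyTorch. For example, after cloning an ESRGAN repository and downloading the RRDB_ESRGAN_x4.pth weights, running its test script (e.g., python test.py) loads the generator into VRAM and writes an upscaled output. How the model file and input images are specified (command-line flags, hard-coded paths, or an input folder) varies between repositories, so check the README before running. The weights file is small by LLM standards, on the order of tens of megabytes. No Ollama or llama.cpp integration exists; GANs are standalone models whose weights are read with torch.load() and applied to the network with load_state_dict().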
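The loading step can be sketched in PyTorch as follows. To keep the example self-contained, it uses TinyUpscaler, a hypothetical one-layer stand-in that is not the real ESRGAN architecture, and it first saves its own randomly initialized weights to a temporary file in place of a downloaded .pth checkpoint. The pattern it illustrates (build the network, read the state dict, switch to eval mode, run without gradients) is the usual one, but actual repositories differ in details.

```python
import os
import tempfile

import torch
from torch import nn


class TinyUpscaler(nn.Module):
    """Hypothetical stand-in for a 4x super-resolution generator.

    NOT the real ESRGAN network; it only reproduces the input/output shapes.
    """

    def __init__(self, scale: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(3, 3 * scale * scale, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # rearranges channels into pixels

    def forward(self, x):
        return self.shuffle(self.conv(x))


def load_generator(weights_path: str, device: str = "cpu") -> nn.Module:
    """Build the network, then load saved weights into it."""
    model = TinyUpscaler()
    # weights_only=True restricts unpickling to tensors and plain containers.
    state = torch.load(weights_path, map_location=device, weights_only=True)
    model.load_state_dict(state)
    return model.to(device).eval()  # eval mode: inference, not training


# Stand-in for a downloaded checkpoint: save random weights to a temp file.
weights_path = os.path.join(tempfile.mkdtemp(), "stand_in_generator.pth")
torch.save(TinyUpscaler().state_dict(), weights_path)

generator = load_generator(weights_path)
low_res = torch.rand(1, 3, 16, 16)  # batch of one 16x16 RGB image
with torch.no_grad():               # no gradients needed at inference
    high_res = generator(low_res)
```

Only the generator is constructed and loaded here, consistent with inference not needing the discriminator. On a machine with a GPU, passing device="cuda" would place the weights in VRAM.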

Reviewed by Fredoline Eruo. See our editorial policy.
