The Adversaries
Two networks lock horns. The generator forges samples; the discriminator tells real from fake. Each tries to outdo the other, and the equilibrium is photorealism.
A generative adversarial network (GAN) trains two models simultaneously: a generator that maps random noise to fake samples, and a discriminator that scores each input as real or fake.
The discriminator is trained to maximize the score gap between real and fake. The generator is trained to minimize that gap — to make its forgeries indistinguishable. They're locked in a minimax game, and the optimum is when the discriminator can no longer tell the difference (50% accuracy).
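The two objectives can be written down in a few lines. A minimal numpy sketch, assuming `d_real` and `d_fake` are the discriminator's probabilities (in (0, 1)) for a batch of real and generated samples; the generator loss is the non-saturating variant from the original GAN paper:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator objective: maximize log D(x) + log(1 - D(G(z))),
    # written here as a loss to minimize.
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def g_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z)).
    return -np.log(d_fake).mean()

# At the equilibrium the discriminator outputs 0.5 everywhere,
# so its loss settles at 2*log(2) ≈ 1.386 -- not at zero.
half = np.full(8, 0.5)
print(d_loss(half, half))  # ≈ 1.3863
```

That nonzero floor is why GAN loss curves look nothing like ordinary training curves: "converged" means the discriminator is stuck at chance, not that any loss has gone to zero.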
Watch the dynamic on a 2D toy: real points come from a fixed distribution; fake points come from the generator. The discriminator's decision boundary contracts and morphs as the generator improves.
From 2014 to 2020, GANs produced the sharpest image-generation results in ML — StyleGAN faces, BigGAN ImageNet samples, image-to-image translation (CycleGAN, pix2pix). Diffusion models eventually overtook them on quality, but GANs remain the reference for fast, single-pass generation.
The adversarial framing reappears everywhere: domain adaptation, super-resolution (ESRGAN), audio synthesis (HiFi-GAN), and even RLHF — where a learned reward model plays the role of the discriminator that pushes a language model toward human-preferred outputs.
- Hit Train. The generator's points (orange) start as a tight blob near the origin and spread to match the real-data ring. Watch the discriminator's heatmap shift as it loses traction.
- Watch the loss strip above the canvas. G-loss (orange) and D-loss (ink) push against each other and oscillate — they don't both fall like a normal training curve. That zig-zag is the adversarial game.
- Try spiral or two clusters. Some shapes are harder — the generator may collapse to a single mode (a known GAN failure called mode collapse). Uncovered modes pulse on the canvas.
- Slide the discriminator strength knob. Too strong, and the generator gets no useful signal; too weak, and it goes nowhere. Tuning this balance is the eternal GAN-training pain.
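The whole loop, balance knob included, fits in a toy. A hypothetical 1-D sketch (not the demo's code): real data is a Gaussian at mean 4, the generator is a single learnable shift `theta`, the discriminator a logistic classifier, and `k` is the number of discriminator updates per generator update, the same knob the original GAN paper exposes. Gradients are derived by hand for this tiny model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Real data: 1-D Gaussian centered at 4.
# Generator: G(z) = z + theta (a single learnable shift).
# Discriminator: D(x) = sigmoid(w * x + b) (a logistic classifier).
theta, w, b = 0.0, 0.0, 0.0
lr, batch, k = 0.05, 64, 1  # k = discriminator steps per generator step

for step in range(3000):
    for _ in range(k):
        real = rng.normal(4.0, 1.0, batch)
        fake = rng.normal(0.0, 1.0, batch) + theta
        d_real = sigmoid(w * real + b)
        d_fake = sigmoid(w * fake + b)
        # Gradient ascent on log D(real) + log(1 - D(fake)).
        w += lr * ((1.0 - d_real) * real - d_fake * fake).mean()
        b += lr * ((1.0 - d_real) - d_fake).mean()
    fake = rng.normal(0.0, 1.0, batch) + theta
    d_fake = sigmoid(w * fake + b)
    # Non-saturating generator step: gradient ascent on log D(fake).
    theta += lr * ((1.0 - d_fake) * w).mean()

print(round(theta, 2))  # drifts toward 4, the real-data mean
```

Raising `k` or the discriminator learning rate plays the role of the strength knob: push it too far and `d_fake` saturates near zero, shrinking the generator's gradient, which is exactly the "no useful signal" failure the bullet describes.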
The "this person does not exist" website is StyleGAN samples. Trained on 70,000 Flickr face photos (the FFHQ dataset), it produces photorealistic faces of people who have never existed. Often cited as the moment GAN output reached human-level quality.
CycleGAN, pix2pix — turn satellite photos into maps, summer into winter, horses into zebras, sketches into paintings. All variations on the GAN recipe with conditioning.
HiFi-GAN and similar neural vocoders turn compact mel-spectrograms into high-fidelity waveforms. The discriminator's pressure for realism produces sharper audio than MSE-trained models, which blur toward the average.
Training a chatbot from human preferences uses a reward model that scores each output. The chatbot is trained to maximize that score: the discriminator's role, in different clothes.
- Generative Adversarial Networks (paper) · Goodfellow et al. (2014) · The original GAN paper. Started the entire field; surprisingly readable.
- A Style-Based Generator Architecture (StyleGAN) (paper) · Karras et al. (2019) · The StyleGAN paper. Photorealistic face generation, with controllable style at every layer.
- From GAN to WGAN (essay) · Lilian Weng · The math of why training GANs is hard, and the variants (Wasserstein GAN, gradient penalty) that fixed it.
- This Person Does Not Exist (demo) · Anonymous · A StyleGAN sample served on every page reload. The cultural moment when "computer-generated face" became indistinguishable from a real photo.