jthomas.site// notebook · v.4.2026
Machine Learning, Visualized · Vol. XX

The Bottleneck

An autoencoder learns to compress an image through a tiny k-dimensional bottleneck and reconstruct it on the other side. Whatever survives the squeeze is the data's essence — its latent structure, discovered without supervision.

The concept

An autoencoder is a neural network whose goal is to copy its input to its output — through a deliberately narrow middle layer.

If the middle (the code or latent vector) has only k dimensions, the network must discover a k-dimensional summary of the input that's just expressive enough to reconstruct it. With k = 2, you can plot the codes in a scatter and see the data organize itself by class without ever being told the labels.
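In code, the whole idea fits in a few lines. Here is a minimal single-layer sketch in NumPy; the dimensions, weight scales, and tanh nonlinearity are illustrative choices, not the demo's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 64, 2                                  # input dim, bottleneck dim
W_enc = rng.normal(scale=0.1, size=(k, d))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(d, k))    # decoder weights

def encode(x):
    # Squeeze x down to k numbers: the latent code.
    return np.tanh(W_enc @ x)

def decode(z):
    # Expand the code back out into a full reconstruction.
    return W_dec @ z

x = rng.normal(size=d)
z = encode(x)
x_hat = decode(z)

print(z.shape)      # (2,): everything must pass through k numbers
print(x_hat.shape)  # (64,)
```

Training would adjust `W_enc` and `W_dec` to minimize the reconstruction error between `x` and `x_hat`; the shape of the computation is the point here.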

Linear autoencoders converge to PCA. Nonlinear ones learn nonlinear manifolds — which is why autoencoders feel like "PCA, but smarter."
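That claim can be checked directly. The sketch below (synthetic data, learning rate, and step count are all invented for illustration) trains a tied-weight linear autoencoder by plain gradient descent, then compares the subspace it learns against the top-k principal subspace from an SVD:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with strong low-rank structure: 2 dominant directions in 10-D.
n, d, k = 500, 10, 2
basis = rng.normal(size=(d, k))
X = rng.normal(size=(n, k)) @ basis.T + 0.05 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# Linear autoencoder with tied weights: encode Z = X W, decode X_hat = Z W.T.
W = rng.normal(scale=0.1, size=(d, k))
lr = 0.01 / n
for _ in range(2000):
    Z = X @ W
    X_hat = Z @ W.T
    G = X_hat - X                        # gradient of MSE w.r.t. X_hat
    grad = X.T @ G @ W + G.T @ X @ W     # chain rule through both uses of W
    W -= lr * grad

# Compare the learned subspace with the top-k principal subspace from SVD.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
P_pca = Vt[:k].T @ Vt[:k]                # projector onto top-2 PCs
Q, _ = np.linalg.qr(W)
P_ae = Q @ Q.T                           # projector onto learned subspace
print(np.linalg.norm(P_pca - P_ae))      # ~0: same subspace
```

The learned columns of `W` need not equal the principal components individually, but the subspace they span matches PCA's, which is the precise sense in which a linear autoencoder "converges to PCA."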

Why ML cares

The bottleneck idea recurs everywhere in modern ML. U-Nets in image segmentation. The encoder of every machine-translation model. The compressor stage of modern neural audio codecs. Stable Diffusion's "VAE" stage, which compresses images to a small latent before the diffusion runs.

Even when not literally an autoencoder, "force the model through a narrow representation" is the trick that makes self-supervised pre-training work. The narrowness is a forcing function for abstraction.

Try this
  1. Hit Run forward. Particles flow from input → encoder → bottleneck → decoder → output. The waist is exactly k numbers wide. Bottleneck isn't a metaphor — it's a literal squeeze in the network's shape.
  2. Drag the latent dim slider from 2 to 10. The funnel waist gets fatter and the reconstruction sharpens. With k=2 the AE can keep only two numbers per image, so the reconstruction blurs toward an average shape.
  3. Pick the noisy input and switch to Denoising. Three panels: clean / noisy input / reconstruction. The AE projects the noisy version back onto the manifold of real shapes — denoising for free.
  4. Switch to Latent scatter with k=2. Each preset gets one dot. Classes separate in 2D without ever being told the labels. That clustering is the unsupervised structure the bottleneck found.
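Step 2's slider behavior can be reproduced offline. Because the optimal rank-k linear autoencoder is the truncated SVD, sweeping k over a truncated reconstruction shows the same widen-the-waist, sharpen-the-output effect (toy data standing in for the demo's shapes, a linear model standing in for its network):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "images": 200 samples in 32-D with most variance in ~6 directions.
n, d = 200, 32
basis = rng.normal(size=(d, 6))
X = rng.normal(size=(n, 6)) @ basis.T + 0.1 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# Rank-k truncated SVD: the optimal linear autoencoder for each bottleneck width.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
errors = []
for k in (2, 4, 6, 8, 10):
    X_hat = U[:, :k] * S[:k] @ Vt[:k]    # reconstruct through a k-wide waist
    errors.append(np.mean((X - X_hat) ** 2))
    print(f"k={k:2d}  reconstruction MSE={errors[-1]:.4f}")
```

The MSE falls monotonically as k grows: every extra latent dimension keeps one more direction of variance alive through the squeeze.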
A funnel: input pixels condense through narrowing layers to a tight latent of k numbers, then expand back out into a reconstruction.
Where you've seen this
Stable Diffusion's VAE stage

Stable Diffusion compresses 512×512 images into a 64×64×4 latent before running the diffusion process. That compression is an autoencoder — much of the model's efficiency comes from never working with raw pixels.
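The arithmetic behind that efficiency claim is quick to check (assuming a standard 3-channel RGB input):

```python
# Pixel count vs latent count for Stable Diffusion's VAE stage.
pixels  = 512 * 512 * 3   # RGB image: 786,432 numbers
latents = 64 * 64 * 4     # latent tensor: 16,384 numbers
print(pixels / latents)   # 48.0: the diffusion model sees ~48x fewer values
```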

Anomaly detection

Train an autoencoder on normal data; flag anything it can't reconstruct cleanly as anomalous. Used for credit-card fraud, manufacturing defects, network-intrusion detection.
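A minimal sketch of that recipe, using a rank-k PCA reconstruction as a linear stand-in for the trained autoencoder; the data, dimensions, and 1.5x threshold rule are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# "Normal" transactions live near a 2-D plane inside 8-D feature space.
n, d, k = 300, 8, 2
basis = rng.normal(size=(d, k))
normal = rng.normal(size=(n, k)) @ basis.T + 0.05 * rng.normal(size=(n, d))
mu = normal.mean(axis=0)

# Fit the bottleneck on normal data only (rank-k PCA as a linear-AE stand-in).
_, _, Vt = np.linalg.svd(normal - mu, full_matrices=False)
P = Vt[:k].T @ Vt[:k]                    # projector onto the learned subspace

def score(x):
    # Reconstruction error: how badly the bottleneck fails to rebuild x.
    x_hat = mu + (x - mu) @ P
    return np.sum((x - x_hat) ** 2)

threshold = max(score(x) for x in normal) * 1.5
anomaly = rng.normal(size=d) * 3.0       # a point far off the normal manifold
print(score(anomaly) > threshold)        # True: flagged as anomalous
```

Nothing anomalous was ever shown to the model; the off-manifold point gets flagged purely because the bottleneck was never forced to learn how to rebuild it.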

Image denoising

Add Gaussian noise to images, train an AE to map noisy → clean. The network learns "the manifold of real images" and projects everything onto it. The principle behind Photoshop's "Reduce Noise" filter and a thousand mobile camera enhancements.
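The projection intuition can be made concrete. In the sketch below a rank-k projector stands in for the trained network (a real denoising AE learns a nonlinear manifold from noisy-to-clean pairs; this linear toy just shows why projecting removes noise):

```python
import numpy as np

rng = np.random.default_rng(4)

# Clean signals lie on a low-dimensional "manifold": here, a 3-D subspace of 20-D.
n, d, k = 400, 20, 3
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal clean directions
clean = rng.normal(size=(n, k)) @ basis.T

# Corrupt the inputs, then reconstruct by projecting back onto the manifold.
P = basis @ basis.T
noisy = clean + 0.5 * rng.normal(size=clean.shape)
denoised = noisy @ P

err_noisy    = np.mean((noisy    - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print(err_noisy > err_denoised)   # True: projection discards off-manifold noise
```

The noise component lying off the manifold (here, 17 of 20 dimensions) is discarded entirely; only the sliver of noise aligned with the manifold survives.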

Audio codecs

Modern neural audio codecs (SoundStream, Encodec, Lyra) are autoencoders trained to compress speech and music to a few kilobits per second with surprisingly good quality. Used in WhatsApp calls, Discord, and the audio side of Gemini Live.

Further reading