jthomas.site// notebook · v.4.2026
An interactive encyclopedia

Machine Learning,
Visualized

A visual encyclopedia of the ideas behind modern machine learning — one entry at a time, from the math at the bottom to the models on top.

Linear algebra · the foundations
VOL. I
Matrix Transformations
A 2×2 matrix is a recipe for reshaping the plane. Two arrows tell you everything.
VOL. II
Eigenvalues & Eigenvectors
The directions a transform leaves alone. Iterate them and you've built PageRank.
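
The iteration, in miniature: a power-method sketch in numpy. The matrix is an arbitrary stand-in, not one from the entry; repeated application pulls any starting vector toward the dominant eigenvector.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])         # an arbitrary example matrix
    v = np.ones(2)                     # any starting direction will do
    for _ in range(50):
        v = A @ v                      # apply the transform...
        v /= np.linalg.norm(v)         # ...and renormalize
    print(v, v @ A @ v)                # dominant eigenvector; eigenvalue via Rayleigh quotient
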
VOL. III
Gradients & Derivatives
Tangent lines on curves, gradient arrows on surfaces. The slope every learning algorithm walks down.
VOL. IV
Principal Components
Find the grain of a data cloud. The first eigenvector of the covariance matrix points along the direction of greatest variance: the line of closest fit.
VOL. V
Singular Value Decomposition
Every linear map is a rotation, a stretch, and another rotation. SVD finds the choreography.
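
Three factors, one call. The matrix below is illustrative; any real one decomposes the same way.

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])                    # any linear map
    U, s, Vt = np.linalg.svd(A)                   # rotation, stretch, rotation
    print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: the choreography reassembles A
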
Probability · the language of uncertainty
VOL. VI
Probability Distributions
Gaussian, Bernoulli, Poisson and more. Tweak parameters; sample; watch the histogram converge.
VOL. VII
Bayes' Theorem
A prior, some evidence, a posterior. The single equation behind probabilistic reasoning.
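
The equation, run once with made-up numbers: a 1% prior and a decent but imperfect test.

    prior = 0.01                     # P(H): one person in a hundred
    likelihood = 0.90                # P(E | H): the test catches 90% of true cases
    false_positive = 0.09            # P(E | not H): and cries wolf 9% of the time

    evidence = likelihood * prior + false_positive * (1 - prior)   # P(E)
    posterior = likelihood * prior / evidence                      # P(H | E)
    print(round(posterior, 3))       # 0.092: even a positive test leaves only ~9%
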
Neural networks · the bend in the line
VOL. VIII
The Perceptron
One neuron. Two weights and a bias. It draws a single straight line — and learns where to put it.
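
The whole learning rule fits in a dozen lines. A sketch on a toy AND problem; any linearly separable data works.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
    y = np.array([0, 0, 0, 1])                       # AND: separable by one line
    w, b, lr = np.zeros(2), 0.0, 0.1

    for _ in range(20):                              # a few passes suffice
        for xi, yi in zip(X, y):
            pred = int(w @ xi + b > 0)               # step activation
            w += lr * (yi - pred) * xi               # nudge the line toward the mistake
            b += lr * (yi - pred)
    print(w, b)                                      # a line that separates AND
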
VOL. IX
A Field Guide to Activations
Sigmoid, tanh, ReLU, leaky ReLU, GELU. The bend that makes a stack of layers more than one matrix.
VOL. X
Forward Propagation
Watch the signal travel: weighted sums, bent through nonlinearities, layer by layer into a prediction.
VOL. XI
Backpropagation
The signal flows forward; the error flows back. Chain rule, applied recursively — the engine that learns.
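
One forward pass, one backward pass, by hand in numpy. Sizes and values are illustrative; the chain rule is the point.

    import numpy as np

    rng = np.random.default_rng(0)
    x, t = np.array([[0.5, -0.2]]), np.array([[1.0]])   # one sample, one target
    W1, W2 = rng.normal(size=(2, 3)), rng.normal(size=(3, 1))

    h = np.tanh(x @ W1)                    # forward: the signal travels
    yhat = h @ W2
    loss = 0.5 * ((yhat - t) ** 2).sum()

    d_yhat = yhat - t                      # backward: the error travels
    d_W2 = h.T @ d_yhat                    # chain rule, output layer
    d_h = d_yhat @ W2.T
    d_W1 = x.T @ (d_h * (1 - h ** 2))      # chain rule through tanh (tanh' = 1 - tanh²)
    W1 -= 0.1 * d_W1                       # one gradient step
    W2 -= 0.1 * d_W2
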
VOL. XII
Loss Landscapes
Training is descent through high-dimensional terrain. Bowls, ravines, saddles — and what they teach.
VOL. XIII
A Race of Optimizers
SGD, Momentum, RMSprop, Adam — same hill, four different rules for descending it.
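
The four rules side by side, in their textbook forms. Hyperparameters are the usual defaults; state is threaded explicitly.

    import numpy as np

    def sgd(w, g, lr=0.1):
        return w - lr * g

    def momentum(w, g, v, lr=0.1, beta=0.9):            # v: running velocity
        v = beta * v + g
        return w - lr * v, v

    def rmsprop(w, g, s, lr=0.01, beta=0.9, eps=1e-8):  # s: running squared gradient
        s = beta * s + (1 - beta) * g ** 2
        return w - lr * g / (np.sqrt(s) + eps), s

    def adam(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):  # t counts from 1
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)           # bias correction
        return w - lr * mhat / (np.sqrt(vhat) + eps), m, v
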
Architectures · stacking the blocks
VOL. XIV
The MLP Builder
Stack layers, pick activations, watch a multi-layer perceptron carve up the plane in real time.
VOL. XV
Convolutional Networks
Slide a small kernel over an image. Edges, textures, parts — features compose themselves.
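
The slide itself, as a naive loop. Real libraries vectorize this, but the arithmetic is identical; note that conv layers actually compute cross-correlation, as here.

    import numpy as np

    def conv2d(image, kernel):
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):                    # slide the kernel over
            for j in range(ow):                # every valid position
                out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
        return out

    sobel_x = np.array([[-1, 0, 1],            # a classic edge-detecting kernel
                        [-2, 0, 2],
                        [-1, 0, 1]])
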
VOL. XVI
An Image Classifier, end to end
Pixels in, label out. Follow the signal through conv blocks, pooling, the final softmax.
VOL. XVII
Recurrent Networks
A loop in the graph: the hidden state is the network's running thought as it reads a sequence.
VOL. XVIII
Gates & Long Memory
A vanilla RNN forgets quickly. Gates — small learned controllers — decide what to keep.
VOL. XIX
Holding it Together
Two simple tricks that make deep nets trainable: random silencing and standardized activations.
Modern deep learning · the present tense
VOL. XX
The Bottleneck
Squeeze input through a narrow channel and decode it back. The channel learns the essence.
VOL. XXI
A Continent of Latent Space
Make every region of latent space decode to something plausible — then sample it for free.
VOL. XXII
The Adversaries
Two networks lock horns: a forger and a critic. They drag each other to perfection.
VOL. XXIII
What the model looks at
A learned, soft, content-addressed lookup. The mechanism that ate machine learning.
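
The lookup in full, no trimmings: scaled dot-product attention in numpy.

    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])                  # each query scores every key
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax: soft,
        w /= w.sum(axis=-1, keepdims=True)                       # not hard, selection
        return w @ V                                             # weighted mix of values

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=s) for s in [(4, 8), (6, 8), (6, 8)])
    print(attention(Q, K, V).shape)                              # (4, 8): one mix per query
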
VOL. XXIV
The Transformer
Stack attention blocks. Skip connections, layer norm, multi-head — the architecture behind almost everything today.
VOL. XXV
A Geometry of Meaning
Words become vectors. Similar meanings cluster; analogies become arithmetic.
Reinforcement learning · trial and reward
VOL. XXVI
Markov Decision Processes
A grid world, a few states, a reward. The minimal stage on which every RL algorithm performs.
VOL. XXVII
Q-Learning
An agent stumbles around a grid; the values of each move slowly fill in. No model, no plan — just trial, error, and bookkeeping.
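
The bookkeeping is one line; the rest is the stumbling. A sketch on a five-cell corridor (layout and constants invented for illustration):

    import numpy as np

    n_states, n_actions = 5, 2            # a toy corridor; the goal sits at the far end
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.5, 0.9, 0.3
    rng = np.random.default_rng(0)

    for _ in range(300):                  # episodes
        s = 0
        for _ in range(100):              # step cap keeps early wandering short
            a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # the bookkeeping
            s = s2
            if s == n_states - 1:
                break
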
VOL. XXVIII
Policy Gradients
Learn the policy directly. The probability of each action drifts up or down with the reward it brings.
VOL. XXIX
Multi-Armed Bandits
Many slot machines, one budget. Explore the unknown lever or exploit the best one so far?
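
The simplest answer, epsilon-greedy, in a few lines. Payout rates are invented for the example.

    import numpy as np

    true_means = np.array([0.2, 0.5, 0.7])          # hidden payout rates
    counts, values = np.zeros(3), np.zeros(3)
    eps = 0.1
    rng = np.random.default_rng(0)

    for t in range(1000):
        a = rng.integers(3) if rng.random() < eps else values.argmax()  # explore or exploit
        r = float(rng.random() < true_means[a])                         # pull the lever
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]    # running average of each arm's payout
    print(values.round(2))                          # drifts toward the true means
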
Advanced & practical · the rest of the iceberg
VOL. XXX
Transfer Learning
A network trained for one task is usually halfway to the next. Freeze, fine-tune, ship.
VOL. XXXI
Hyperparameter Tuning
Grid search, random search, Bayesian optimization — three ways to comb a high-dimensional knob space.
VOL. XXXII
Bias-Variance Tradeoff
Too simple a model misses the signal; too complex, it memorizes the noise. The dial every model has.
VOL. XXXIII
Model Interpretability
SHAP and LIME open the black box just enough to ask "which feature pushed the prediction which way?"
VOL. XXXIV
Federated Learning
A model trained across many devices, each keeping its data local. Only the model updates ever reach the server.