jthomas.site// notebook · v.4.2026
An interactive encyclopedia

Machine Learning,
Visualized

A visual encyclopedia of the ideas behind modern machine learning — one entry at a time, from the math at the bottom to the models on top.

Linear algebra · the foundations
VOL. I
Matrix Transformations
A 2×2 matrix is a recipe for reshaping the plane. Two arrows tell you everything.
VOL. II
Eigenvalues & Eigenvectors
The directions a transform leaves alone. Iterate them and you've built PageRank.
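
The iteration, in miniature: a power-method sketch in numpy. The matrix is an arbitrary stand-in, not one from the entry; repeated application pulls any starting vector toward the dominant eigenvector.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])         # an arbitrary example matrix
    v = np.ones(2)                     # any starting direction will do
    for _ in range(50):
        v = A @ v                      # apply the transform...
        v /= np.linalg.norm(v)         # ...and renormalize
    print(v, v @ A @ v)                # dominant eigenvector; eigenvalue via Rayleigh quotient
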
VOL. III
Gradients & Derivatives
Tangent lines on curves, gradient arrows on surfaces. The slope every learning algorithm walks down.
VOL. IV
Principal Components
Find the grain of a data cloud. The first eigenvector of the covariance matrix points along the direction of greatest variance: the line of closest fit.
VOL. V
Singular Value Decomposition
Every linear map is a rotation, a stretch, and another rotation. SVD finds the choreography.
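
Three factors, one call. The matrix below is illustrative; any real one decomposes the same way.

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])                    # any linear map
    U, s, Vt = np.linalg.svd(A)                   # rotation, stretch, rotation
    print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: the choreography reassembles A
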
Probability · the language of uncertainty
VOL. VI
Probability Distributions
Gaussian, Bernoulli, Poisson and more. Tweak parameters; sample; watch the histogram converge.
VOL. VII
Bayes' Theorem
A prior, some evidence, a posterior. The single equation behind probabilistic reasoning.
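
The equation, run once with made-up numbers: a 1% prior and a decent but imperfect test.

    prior = 0.01                     # P(H): one person in a hundred
    likelihood = 0.90                # P(E | H): the test catches 90% of true cases
    false_positive = 0.09            # P(E | not H): and cries wolf 9% of the time

    evidence = likelihood * prior + false_positive * (1 - prior)   # P(E)
    posterior = likelihood * prior / evidence                      # P(H | E)
    print(round(posterior, 3))       # 0.092: even a positive test leaves only ~9%
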
Neural networks · the bend in the line
VOL. VIII
The Perceptron
One neuron. Two weights and a bias. It draws a single straight line — and learns where to put it.
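
The whole learning rule fits in a dozen lines. A sketch on a toy AND problem; any linearly separable data works.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
    y = np.array([0, 0, 0, 1])                       # AND: separable by one line
    w, b, lr = np.zeros(2), 0.0, 0.1

    for _ in range(20):                              # a few passes suffice
        for xi, yi in zip(X, y):
            pred = int(w @ xi + b > 0)               # step activation
            w += lr * (yi - pred) * xi               # nudge the line toward the mistake
            b += lr * (yi - pred)
    print(w, b)                                      # a line that separates AND
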
VOL. IX
A Field Guide to Activations
Sigmoid, tanh, ReLU, leaky ReLU, GELU. The bend that makes a stack of layers more than one matrix.
VOL. X
Forward Propagation
Watch the signal travel: weighted sums, bent through nonlinearities, layer by layer into a prediction.
VOL. XI
Backpropagation
The signal flows forward; the error flows back. Chain rule, applied recursively — the engine that learns.
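
One forward pass, one backward pass, by hand in numpy. Sizes and values are illustrative; the chain rule is the point.

    import numpy as np

    rng = np.random.default_rng(0)
    x, t = np.array([[0.5, -0.2]]), np.array([[1.0]])   # one sample, one target
    W1, W2 = rng.normal(size=(2, 3)), rng.normal(size=(3, 1))

    h = np.tanh(x @ W1)                    # forward: the signal travels
    yhat = h @ W2
    loss = 0.5 * ((yhat - t) ** 2).sum()

    d_yhat = yhat - t                      # backward: the error travels
    d_W2 = h.T @ d_yhat                    # chain rule, output layer
    d_h = d_yhat @ W2.T
    d_W1 = x.T @ (d_h * (1 - h ** 2))      # chain rule through tanh (tanh' = 1 - tanh²)
    W1 -= 0.1 * d_W1                       # one gradient step
    W2 -= 0.1 * d_W2
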
VOL. XII
Loss Landscapes
Training is descent through high-dimensional terrain. Bowls, ravines, saddles — and what they teach.
VOL. XIII
A Race of Optimizers
SGD, Momentum, RMSprop, Adam — same hill, four different rules for descending it.
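
The four rules side by side, in their textbook forms. Hyperparameters are the usual defaults; state is threaded explicitly.

    import numpy as np

    def sgd(w, g, lr=0.1):
        return w - lr * g

    def momentum(w, g, v, lr=0.1, beta=0.9):            # v: running velocity
        v = beta * v + g
        return w - lr * v, v

    def rmsprop(w, g, s, lr=0.01, beta=0.9, eps=1e-8):  # s: running squared gradient
        s = beta * s + (1 - beta) * g ** 2
        return w - lr * g / (np.sqrt(s) + eps), s

    def adam(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):  # t counts from 1
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)           # bias correction
        return w - lr * mhat / (np.sqrt(vhat) + eps), m, v
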
Architectures · stacking the blocks
VOL. XIV
The MLP Builder
Stack layers, pick activations, watch a multi-layer perceptron carve up the plane in real time.
VOL. XV
Convolutional Networks
Slide a small kernel over an image. Edges, textures, parts — features compose themselves.
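
The slide itself, as a naive loop. Real libraries vectorize this, but the arithmetic is identical; note that conv layers actually compute cross-correlation, as here.

    import numpy as np

    def conv2d(image, kernel):
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):                    # slide the kernel over
            for j in range(ow):                # every valid position
                out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
        return out

    sobel_x = np.array([[-1, 0, 1],            # a classic edge-detecting kernel
                        [-2, 0, 2],
                        [-1, 0, 1]])
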
VOL. XVI
An Image Classifier, end to end
Pixels in, label out. Follow the signal through conv blocks, pooling, the final softmax.
VOL. XVII
Recurrent Networks
A loop in the graph: the hidden state is the network's running thought as it reads a sequence.
VOL. XVIII
Gates & Long Memory
A vanilla RNN forgets quickly. Gates — small learned controllers — decide what to keep.
VOL. XIX
Holding it Together
Two simple tricks that make deep nets trainable: random silencing and standardized activations.
Modern deep learning · the present tense
VOL. XX
The Bottleneck
Squeeze input through a narrow channel and decode it back. The channel learns the essence.
VOL. XXI
A Continent of Latent Space
Make every region of latent space decode to something plausible — then sample it for free.
VOL. XXII
The Adversaries
Two networks lock horns: a forger and a critic. They drag each other to perfection.
VOL. XXIII
What the model looks at
A learned, soft, content-addressed lookup. The mechanism that ate machine learning.
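
The lookup in full, no trimmings: scaled dot-product attention in numpy.

    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])                  # each query scores every key
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax: soft,
        w /= w.sum(axis=-1, keepdims=True)                       # not hard, selection
        return w @ V                                             # weighted mix of values

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=s) for s in [(4, 8), (6, 8), (6, 8)])
    print(attention(Q, K, V).shape)                              # (4, 8): one mix per query
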
VOL. XXIV
The Transformer
Stack attention blocks. Skip connections, layer norm, multi-head — the architecture behind almost everything today.
VOL. XXV
A Geometry of Meaning
Words become vectors. Similar meanings cluster; analogies become arithmetic.
Reinforcement learning · trial and reward
VOL. XXVI
Markov Decision Processes
A grid world, a few states, a reward. The minimal stage on which every RL algorithm performs.
VOL. XXVII
Q-Learning
An agent stumbles around a grid; the values of each move slowly fill in. No model, no plan — just trial, error, and bookkeeping.
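
The bookkeeping is one line; the rest is the stumbling. A sketch on a five-cell corridor (layout and constants invented for illustration):

    import numpy as np

    n_states, n_actions = 5, 2            # a toy corridor; the goal sits at the far end
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.5, 0.9, 0.3
    rng = np.random.default_rng(0)

    for _ in range(300):                  # episodes
        s = 0
        for _ in range(100):              # step cap keeps early wandering short
            a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # the bookkeeping
            s = s2
            if s == n_states - 1:
                break
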
VOL. XXVIII
Policy Gradients
Learn the policy directly. The probability of each action drifts up or down with the reward it brings.
VOL. XXIX
Multi-Armed Bandits
Many slot machines, one budget. Explore the unknown lever or exploit the best one so far?
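
The simplest answer, epsilon-greedy, in a few lines. Payout rates are invented for the example.

    import numpy as np

    true_means = np.array([0.2, 0.5, 0.7])          # hidden payout rates
    counts, values = np.zeros(3), np.zeros(3)
    eps = 0.1
    rng = np.random.default_rng(0)

    for t in range(1000):
        a = rng.integers(3) if rng.random() < eps else values.argmax()  # explore or exploit
        r = float(rng.random() < true_means[a])                         # pull the lever
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]    # running average of each arm's payout
    print(values.round(2))                          # drifts toward the true means
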
Advanced & practical · the rest of the iceberg
VOL. XXX
Transfer Learning
A network trained for one task is usually halfway to the next. Freeze, fine-tune, ship.
VOL. XXXI
Hyperparameter Tuning
Grid search, random search, Bayesian optimization — three ways to comb a high-dimensional knob space.
VOL. XXXII
Bias-Variance Tradeoff
Too simple a model misses the signal; too complex, it memorizes the noise. The dial every model has.
VOL. XXXIII
Model Interpretability
SHAP and LIME open the black box just enough to ask "which feature pushed the prediction which way?"
VOL. XXXIV
Federated Learning
A model trained across many devices, each keeping its data local. Only the model updates ever reach the server.