jthomas.site// notebook · v.4.2026
Machine Learning, Visualized · Vol. X

Forward Propagation

A neural network is a pipeline. Each layer multiplies the previous activations by a weight matrix, adds a bias, and bends the result through a nonlinearity. Watch the signal travel.

The concept

A forward pass is the act of running input through a neural network to produce an output, layer by layer.

Each layer does three things: (1) multiply the previous activations by a weight matrix, (2) add a bias vector, (3) bend the result through a nonlinearity. The output of layer ℓ becomes the input of layer ℓ+1.
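
In NumPy, those three steps fit in a three-line function. A minimal sketch (the name and shapes are illustrative, not from the demo):

  import numpy as np

  def layer_forward(a_prev, W, b, f):
      z = W @ a_prev + b   # (1) multiply by the weight matrix, (2) add the bias
      return f(z)          # (3) bend through the nonlinearity f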

The math is one line: a⁽ˡ⁾ = ƒ(W⁽ˡ⁾ a⁽ˡ⁻¹⁾ + b⁽ˡ⁾). Stack three of these and you have the network on the right — input → hidden 1 → hidden 2 → output, with tanh between hidden layers and softmax at the top.
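
Here is that stack as a runnable sketch. The layer sizes and random weights are stand-ins; the demo's trained weights would take their place:

  import numpy as np

  def softmax(z):
      e = np.exp(z - z.max())   # shift by the max for numerical stability
      return e / e.sum()

  rng = np.random.default_rng(0)                  # stand-in, untrained weights
  W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # input (x1, x2) -> hidden 1
  W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # hidden 1 -> hidden 2
  W3, b3 = rng.normal(size=(3, 4)), np.zeros(3)   # hidden 2 -> 3-class output

  x  = np.array([0.5, -0.5])     # the two inputs, x1 and x2
  a1 = np.tanh(W1 @ x + b1)      # layer 1: multiply, bias, bend
  a2 = np.tanh(W2 @ a1 + b2)     # layer 2: same recipe
  y  = softmax(W3 @ a2 + b3)     # output layer: probabilities summing to 1

Change the activation or the layer sizes and the recipe is untouched; only the shapes of W and b move.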

Why ML cares

Forward propagation is what every deployed neural network does at inference time — every Gemini response, every Stable Diffusion image, every Tesla camera frame is the result of running a forward pass on a trained network. Training is forward propagation followed by backpropagation; inference is just the forward pass.

The same forward pass also defines what the network can express: by the universal approximation theorem, a stack of linear layers with bends in between can approximate any continuous function on a compact domain to arbitrary accuracy, given enough width. The depth and the bends are what make it powerful.

Try this
  1. Hit Replay flow. Watch the signal hop column by column — pre-activations (the weighted sums) light up first, then post-activations (after the bend).
  2. Drag x₁ and x₂. The network is fixed; only the inputs change. Watch how a small shift in one number ripples differently through every neuron.
  3. Try the four corner presets. Each one is a different region of the input space, and the softmax outputs change smoothly as you move between them — a learned 2D classifier in action.
Where you've seen this · 4 examples

Every model inference, ever

The "forward pass" is what every deployed neural network does at inference. Every prediction your iPhone makes, every Gemini response, every YouTube recommendation — a forward pass through trained weights.

Image classification

Convolutional networks process images by stacking these forward passes — but the matrices are organized as filters that slide over patches of the image. The basic recipe (multiply, bias, bend, repeat) is unchanged.
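
A bare-bones version of that sliding-filter recipe, for a single channel with no padding or stride (everything here is illustrative):

  import numpy as np

  def conv2d_forward(img, kernel, bias, f=np.tanh):
      H, W = img.shape
      kH, kW = kernel.shape
      out = np.empty((H - kH + 1, W - kW + 1))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              patch = img[i:i + kH, j:j + kW]            # the filter's window
              out[i, j] = (patch * kernel).sum() + bias  # multiply, add bias
      return f(out)                                      # bend, as before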

Transformer attention

Each transformer layer is a forward pass with a special structure: queries, keys, and values are all computed by linear layers, then combined with attention weights. Stack a hundred or so of these and you have something like GPT-4.
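
One attention head, reduced to its forward pass. The scaled dot-product and row-wise softmax are the standard recipe; the shapes and weight matrices here are random stand-ins:

  import numpy as np

  def attention_forward(X, Wq, Wk, Wv):
      Q, K, V = X @ Wq, X @ Wk, X @ Wv          # three ordinary linear layers
      scores = Q @ K.T / np.sqrt(Q.shape[-1])   # similarity of every token pair
      w = np.exp(scores - scores.max(axis=-1, keepdims=True))
      w = w / w.sum(axis=-1, keepdims=True)     # row-wise softmax
      return w @ V                              # mix values by attention weight

  rng = np.random.default_rng(0)
  X = rng.normal(size=(5, 8))                   # 5 tokens, 8-dim embeddings
  Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
  out = attention_forward(X, Wq, Wk, Wv)        # shape (5, 8)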

Tabular ML

Multi-layer perceptrons remain the workhorse for tabular data — credit risk, ad-CTR prediction, propensity models. Each prediction is a small forward pass through a few densely connected layers.
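
Batched over a whole table, the same pass is one matrix multiply per layer. A sketch with made-up shapes and a sigmoid head, as a CTR-style model might use:

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(1024, 16))    # 1024 rows of a table, 16 features each
  W1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
  W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)

  H = np.tanh(X @ W1 + b1)              # every row through layer 1 at once
  p = 1 / (1 + np.exp(-(H @ W2 + b2)))  # sigmoid head: a probability per row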

Further reading