Forward Propagation
A neural network is a pipeline. Each layer multiplies the previous activations by a weight matrix, adds a bias, and bends the result through a nonlinearity. Watch the signal travel.
A forward pass is the act of running input through a neural network to produce an output, layer by layer.
Each layer does three things: (1) multiply the previous activations by a weight matrix, (2) add a bias vector, (3) bend the result through a nonlinearity. The output of layer ℓ becomes the input of layer ℓ+1.
The math is one line: a⁽ˡ⁾ = ƒ(W⁽ˡ⁾ a⁽ˡ⁻¹⁾ + b⁽ˡ⁾). Stack three of these and you have the network on the right — input → hidden 1 → hidden 2 → output, with tanh between hidden layers and softmax at the top.
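If the one-line equation feels abstract, here is a minimal NumPy sketch of that stack: two tanh hidden layers and a softmax output, matching the shape of the network described above. The layer sizes and random weights are placeholders for illustration, not the trained weights behind the canvas.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x, params):
    """One forward pass: multiply, add bias, bend, repeat."""
    W1, b1, W2, b2, W3, b3 = params
    z1 = W1 @ x + b1         # pre-activation of hidden layer 1 (weighted sum)
    a1 = np.tanh(z1)         # post-activation (after the bend)
    z2 = W2 @ a1 + b2        # pre-activation of hidden layer 2
    a2 = np.tanh(z2)
    z3 = W3 @ a2 + b3        # pre-activation of the output layer
    return softmax(z3)       # class probabilities

# Placeholder sizes: 2 inputs -> 4 hidden -> 4 hidden -> 2 outputs
rng = np.random.default_rng(0)
params = (rng.normal(size=(4, 2)), np.zeros(4),
          rng.normal(size=(4, 4)), np.zeros(4),
          rng.normal(size=(2, 4)), np.zeros(2))

print(forward(np.array([0.5, -1.0]), params))  # probabilities summing to 1
```

Calling `forward` on any 2-element input returns a probability vector, which is exactly what the softmax column at the top of the diagram shows.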
Forward propagation is what every deployed neural network does at inference time — every Gemini response, every Stable Diffusion image, every Tesla camera frame is the result of running a forward pass on a trained network. Training is forward propagation followed by backpropagation; inference is just forward.
The same forward pass also defines what the network can express: a stack of linear layers with bends in between can approximate any continuous function on a compact domain, given enough neurons (the universal approximation theorem). The bends are what make this possible; without them, any stack of linear layers collapses into a single linear map, no matter how deep.
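One quick way to see the collapse: drop the nonlinearity and two consecutive matrix multiplications become a single one. A small NumPy check, with sizes chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
x = rng.normal(size=2)

# Two "layers" with no nonlinearity between them...
two_linear_layers = W2 @ (W1 @ x)
# ...compute exactly one linear layer with weights W2 @ W1.
one_linear_layer = (W2 @ W1) @ x

print(np.allclose(two_linear_layers, one_linear_layer))  # True
```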
- Hit Replay flow. Watch the signal hop column by column: pre-activations (the weighted sums) light up first, then post-activations (after the bend).
- Drag x₁ and x₂. The network is fixed; only the inputs change. Watch how a small shift in one number ripples differently through every neuron.
- Try the four corner presets. Each one is a different region of the input space, and the softmax outputs change smoothly as you move between them — a learned 2D classifier in action.
The "forward pass" is what every deployed neural network does at inference. Every prediction your iPhone makes, every Gemini response, every YouTube recommendation — a forward pass through trained weights.
Convolutional networks process images by stacking these forward passes — but the matrices are organized as filters that slide over patches of the image. The basic recipe (multiply, bias, bend, repeat) is unchanged.
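As a rough sketch of how the recipe carries over, here is one filter sliding over a single-channel image with stride 1 and no padding; the filter values and image are made up for illustration.

```python
import numpy as np

def conv2d(image, kernel, bias=0.0):
    """Slide one filter over a single-channel image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel) + bias   # multiply + bias
    return np.maximum(out, 0.0)                          # bend (ReLU)

image = np.random.default_rng(2).normal(size=(5, 5))
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])   # a made-up vertical-edge filter
print(conv2d(image, kernel).shape)   # (3, 3) feature map
```

Real convolutional layers run many such filters over many channels at once, but each output value is still a weighted sum plus a bias, passed through a bend.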
Each transformer layer is a forward pass with a special structure: queries, keys, and values are all computed by linear layers, then combined with attention weights. Stack on the order of a hundred of these and you have a model like GPT-4.
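A stripped-down sketch of that structure, assuming a single attention head with no masking and random weights standing in for learned ones:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """One attention head: three linear layers, then a weighted average."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # similarity of every token pair
    weights = softmax(scores, axis=-1)          # attention weights
    return weights @ V                          # each output mixes all values

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 8))                     # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 8)
```

A real transformer layer adds multiple heads, residual connections, layer normalization, and a feed-forward block on top of this, but it is all still one forward pass.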
Multi-layer perceptrons remain the workhorse for tabular data — credit risk, ad-CTR prediction, propensity models. Each prediction is a small forward pass through a few densely connected layers.
- But what is a neural network? (video) · Grant Sanderson, 3Blue1Brown · The series that uses the same column-of-neurons diagram you see on this page; episodes 1 and 2 cover forward propagation in detail.
- Neural Networks and Deep Learning, Ch. 1 (free book) · Michael Nielsen · A genuinely beautiful exposition that derives the forward pass equation from MNIST classification and ends with working Python code.
- TensorFlow Playground (interactive) · Smilkov et al. · A more elaborate version of the canvas above, with multiple datasets, configurable depth, and live training. An excellent next step.
- Universal Approximation Theorem (reference) · Wikipedia · The theorem that says the kind of network you see on this page can approximate any continuous function, given enough neurons. Why the bends matter.