jthomas.site// notebook · v.4.2026
Machine Learning, Visualized · Vol. XVII

Recurrent
Networks

A network with a memory. At each step it sees a new symbol and updates a hidden state — a running summary of everything it has seen so far. The same weights, applied over and over.

The concept

A recurrent neural network (RNN) is a network with a loop: at each timestep, it takes a new input and the previous hidden state, and produces a new hidden state.

The defining equation is h_t = tanh(W_xh · x_t + W_hh · h_{t−1} + b). The same weight matrices are reused at every step — that's why an RNN can read sequences of any length with a fixed parameter count.
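That one equation is the whole cell. A minimal sketch in NumPy — the vocabulary size, hidden dimension, and initialization scale here are illustrative assumptions, not the demo's actual values:

```python
import numpy as np

# Illustrative dimensions: 8-symbol vocabulary, 16-dim hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(16, 8))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(16, 16))  # hidden -> hidden (the loop)
b = np.zeros(16)

def rnn_step(x_t, h_prev):
    """One application of h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)
```

The same `W_xh`, `W_hh`, and `b` are applied at every timestep, so the parameter count stays fixed no matter how long the sequence grows.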

Watch the hidden state evolve as the network reads a sequence. Different dimensions of the hidden vector specialize to track different patterns — vowels, recurring symbols, position in the sequence.
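The matrix the visualization fills in column by column can be sketched directly — read a sequence one symbol at a time and stack the hidden states. The alphabet and weight scale below are assumptions chosen to mirror the demo, not its actual parameters:

```python
import numpy as np

alphabet = "abrcd "                          # the demo's symbol set (assumed)
rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.3, size=(16, len(alphabet)))
W_hh = rng.normal(scale=0.3, size=(16, 16))
b = np.zeros(16)

def one_hot(ch):
    v = np.zeros(len(alphabet))
    v[alphabet.index(ch)] = 1.0
    return v

def run(sequence):
    """Return the hidden-state matrix: one column per timestep."""
    h = np.zeros(16)
    columns = []
    for ch in sequence:
        h = np.tanh(W_xh @ one_hot(ch) + W_hh @ h + b)
        columns.append(h)
    return np.stack(columns, axis=1)         # shape (hidden_dim, seq_len)

H = run("abracadabra")
# Each column is the running summary after reading one more symbol;
# each row is one hidden dimension's trajectory across time.
```

Rows that spike whenever the same chunk of input reappears are the "specialized dimensions" the visualization highlights.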

Why ML cares

Before transformers ate the field in 2018, RNNs (and their LSTM/GRU variants) were the dominant architecture for everything sequential: language modeling, machine translation, speech recognition, time-series forecasting.

They're still the right tool when sequences are very long or irregular, or when memory is constrained — RNN inference needs only constant memory per step, while full self-attention's compute grows quadratically with sequence length. Modern "linear attention" variants (Mamba, RWKV) revive the recurrent recipe with new tricks.
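The constant-memory claim is easy to see in code: a recurrent cell overwrites one state vector in place, no matter how long the stream is. A sketch with assumed dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
W_xh = rng.normal(scale=0.1, size=(d, d))
W_hh = rng.normal(scale=0.1, size=(d, d))

def rnn_stream(inputs):
    """Consume an arbitrarily long stream while storing one state vector."""
    h = np.zeros(d)
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h)   # state overwritten in place
    return h                                # memory: O(d), independent of T

# A self-attention layer, by contrast, must cache every past key/value
# pair (memory growing with T) and score all pairs (O(T^2) compute).
h = rnn_stream(rng.normal(size=(10_000, d)))
```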

Try this
  1. Edit the sequence (symbols limited to a, b, r, c, d, and space). Hit Replay and watch the hidden state matrix fill in column by column.
  2. Observe how a repeated pattern like abracadabra creates a recurring rhythm in the hidden state — certain dimensions oscillate as familiar chunks reappear.
  3. Click ↻ weights to randomize. The same input produces a wildly different state trajectory — but the structure of "memory of recent history" is preserved.
Same recurrent cell, drawn out across time. The hidden state h flows left-to-right; each step takes a new input x_t and the previous h_{t−1}, and the same weight matrices W_xh and W_hh are reused at every timestep — the architectural definition of "recurrent."
Where you've seen this · 04 examples
Pre-2018 machine translation

Google Translate's neural backend launched in 2016 with stacked LSTMs (encoder–decoder with attention). Held the field until transformers swept it away in 2018.

Speech recognition

Bidirectional LSTMs powered Siri, Alexa, and Google's voice search through the late 2010s — taking a stream of audio frames and producing a stream of phonemes.

Time series and weather

RNNs forecast electricity demand, financial volatility, and short-term weather. Newer state-space models (S4, Mamba) are direct descendants of the recurrent idea.

Karpathy's char-RNN

The famous 2015 blog post that generated Shakespeare, Wikipedia, and source code letter-by-letter. The first time many people saw a neural net produce structured prose.

Further reading