Recurrent Networks
A network with a memory. At each step it sees a new symbol and updates a hidden state — a running summary of everything it has seen so far. The same weights, applied over and over.
A recurrent neural network (RNN) is a network with a loop: at each timestep, it takes a new input and the previous hidden state, and produces a new hidden state.
The defining equation is h_t = tanh(W_xh · x_t + W_hh · h_{t−1} + b). The same weight matrices are reused at every step — that's why an RNN can read sequences of any length with a fixed parameter count.
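A minimal NumPy sketch of that update, to make the shapes concrete. The sizes, the scale of the random weights, and the one-hot input are illustrative choices, not anything the page fixes:

```python
import numpy as np

# One step of a vanilla RNN cell: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b).
# Sizes are arbitrary illustrative choices.
input_size, hidden_size = 6, 8

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden -> hidden
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Apply the same weights at every timestep; only the state changes."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

h = np.zeros(hidden_size)            # h_0: empty memory
x = np.zeros(input_size); x[0] = 1.0 # a one-hot input symbol
h = rnn_step(x, h)                   # new hidden state, same shape as before
```

Because `rnn_step` reuses the same `W_xh`, `W_hh`, and `b` on every call, you can feed it a sequence of any length without adding parameters.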
Watch the hidden state evolve as the network reads a sequence. Different dimensions of the hidden vector specialize to track different patterns — vowels, recurring symbols, position in the sequence.
Before transformers ate the field in 2018, RNNs (and their LSTM/GRU variants) were the dominant architecture for everything sequential: language modeling, machine translation, speech recognition, time-series forecasting.
They're still the right tool when sequences are very long or irregular, or when memory is tight: an RNN carries a fixed-size hidden state from step to step, while full attention's cost grows quadratically with sequence length. Modern recurrent-flavored models (Mamba, RWKV, and other "linear attention" variants) revive the recipe with new tricks.
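A back-of-the-envelope illustration of that scaling claim; the state width and sequence length below are arbitrary numbers chosen only to show the orders of magnitude:

```python
# Rough memory comparison; d (state width) and T (sequence length) are illustrative.
d, T = 1024, 100_000

rnn_state   = d          # one hidden vector, regardless of sequence length
kv_cache    = 2 * T * d  # a transformer keeps keys and values for every past token
attn_scores = T * T      # full attention scores grow quadratically with length

print(rnn_state, kv_cache, attn_scores)  # 1024 vs 204800000 vs 10000000000 numbers
```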
- Edit the sequence (the only allowed symbols are a, b, r, c, d, and space). Hit Replay and watch the hidden-state matrix fill in column by column.
- Observe how a repeated pattern like abracadabra creates a recurring rhythm in the hidden state — certain dimensions oscillate as familiar chunks reappear.
- Click ↻ weights to randomize. The same input produces a wildly different state trajectory — but the structure of "memory of recent history" is preserved. A rough sketch after this list reproduces the same experiment in plain code.
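If you want to poke at the same effects outside the widget, here is an offline sketch under assumptions of my own choosing: a six-symbol vocabulary (a, b, r, c, d, space), one-hot inputs, and a small tanh cell. It builds the hidden-state matrix column by column for abracadabra, then re-randomizes the weights to show a different trajectory; all sizes, seeds, and helper names are made up for illustration.

```python
import numpy as np

# Offline version of the demo: six symbols, one-hot inputs, tanh update.
vocab = "abrcd "
hidden_size = 8
rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.5, size=(hidden_size, len(vocab)))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[vocab.index(ch)] = 1.0
    return v

def run(seq, W_xh, W_hh, b):
    """Feed the sequence one symbol at a time; return the hidden-state matrix,
    one column per timestep (what the demo fills in as it replays)."""
    h = np.zeros(hidden_size)
    cols = []
    for ch in seq:
        h = np.tanh(W_xh @ one_hot(ch) + W_hh @ h + b)
        cols.append(h)
    return np.stack(cols, axis=1)

H = run("abracadabra", W_xh, W_hh, b)
print(H.shape)  # (8, 11): one column of hidden state per symbol

# The "randomize weights" button: a completely different trajectory,
# but still a running summary of the recent symbols.
W_xh2 = rng.normal(scale=0.5, size=W_xh.shape)
W_hh2 = rng.normal(scale=0.5, size=W_hh.shape)
H2 = run("abracadabra", W_xh2, W_hh2, b)
```

Because abracadabra repeats the chunk "abra", the corresponding columns of H end up similar but not identical: the state depends on the whole prefix, not just the current symbol.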
Google Translate's neural backend launched in 2016 with stacked LSTMs (encoder–decoder with attention). Held the field until transformers swept it away in 2018.
Bidirectional LSTMs powered Siri, Alexa, and Google's voice search through the late 2010s — taking a stream of audio frames and producing a stream of phonemes.
RNNs forecast electricity demand, financial volatility, and short-term weather. Newer state-space models (S4, Mamba) are direct descendants of the recurrent idea.
The famous 2015 blog post whose character-level RNNs generated Shakespeare, Wikipedia markup, and source code letter by letter. The first time many people saw a neural net produce structured prose.
- The Unreasonable Effectiveness of Recurrent Neural Networks · essay · Andrej Karpathy (2015) · The blog post that launched a thousand char-RNN demos. Still the clearest "look what these things can do" introduction.
- Understanding LSTM Networks · essay · Christopher Olah · The diagrams everyone has seen. Bridges from vanilla RNNs to LSTMs and is the natural next read after this page.
- On the difficulty of training recurrent networks · paper · Pascanu et al. (2013) · The vanishing/exploding gradient analysis that motivated all subsequent gating mechanisms.
- Mamba: Linear-Time Sequence Modeling · paper · Gu & Dao (2023) · The state-space model that revived the recurrent idea — competitive with transformers, with linear scaling and constant memory.