Bias-Variance Tradeoff
A linear fit on quadratic data misses the curve. A 12th-degree polynomial fits every point exactly — and bends wildly between them. Somewhere in the middle, just enough flexibility, sits the model you want.
Total prediction error decomposes into bias² + variance + irreducible noise.
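Written out, this is the standard decomposition of the expected squared error at a point x, where f is the true signal, f̂_D is the model trained on a random dataset D, and y = f(x) + ε with noise variance σ² (the same quantities defined in the gloss further down):

```latex
\mathrm{Error}(x)
  = \mathbb{E}_{D,\varepsilon}\!\big[(y - \hat{f}_D(x))^2\big]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```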
Bias: how far is the model's average prediction from the truth, across many imagined training datasets? A linear model on curved data has high bias — it can't capture the curve, no matter which dataset you train it on. Underfitting.
Variance: how much do the model's predictions change when the training set changes? A 12th-degree polynomial has high variance — different training samples produce wildly different curves. Overfitting.
The sweet spot is whatever model complexity minimizes the sum. For a fixed dataset size, that's a specific degree, depth, or capacity.
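A minimal Monte Carlo sketch of that curve in numpy, reusing the demo's sin(πx) signal; the noise level, dataset size, and resample count below are illustrative assumptions, not the page's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)     # true signal, as in the demo
sigma = 0.3                         # noise std (assumed, not the page's value)
n_points, n_datasets = 20, 500      # training-set size, number of resamples
x_test = np.linspace(-1, 1, 101)    # grid where error is measured

for degree in (1, 3, 12):
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        y = f(x) + rng.normal(0, sigma, n_points)            # noisy labels
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_test)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)   # (avg fit - truth)^2
    variance = np.mean(preds.var(axis=0))                    # spread across fits
    print(f"degree {degree:2d}: bias^2={bias2:.3f}  variance={variance:.3f}  "
          f"total={bias2 + variance + sigma**2:.3f}")
```

Degree 1 should print high bias² and low variance, degree 12 the reverse, and the middle degree the smallest total.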
Bias-variance is the conceptual frame for nearly every model-selection decision: "should I use a deeper network, more features, more regularization?" Each choice slides the dial.
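Regularization slides the same dial without changing the architecture. A hypothetical illustration with ridge-penalized polynomial features (closed-form solution; all constants here are arbitrary): as the penalty λ grows, measured variance falls and bias² rises.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)
x_test = np.linspace(-1, 1, 101)
degree, n_points, sigma = 12, 20, 0.3

def ridge_poly(x, y, lam):
    # Closed-form ridge on polynomial features: w = (A'A + lam*I)^(-1) A'y.
    A = np.vander(x, degree + 1)
    w = np.linalg.solve(A.T @ A + lam * np.eye(degree + 1), A.T @ y)
    return np.vander(x_test, degree + 1) @ w

for lam in (1e-6, 1e-2, 1.0):
    preds = []
    for _ in range(300):
        x = rng.uniform(-1, 1, n_points)
        preds.append(ridge_poly(x, f(x) + rng.normal(0, sigma, n_points), lam))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    print(f"lambda={lam:g}: bias^2={bias2:.3f}  variance={np.mean(preds.var(axis=0)):.3f}")
```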
It's also the lens through which the double-descent phenomenon was discovered — overparameterized neural networks (more parameters than data points) sometimes show lower test error than smaller models, breaking the classical U-shape. The story is more nuanced than the textbook curve, but the textbook curve is still the right starting point.
- Drag the polynomial degree slider from 1 to 15. The dashed vertical line in the bottom plot moves with you — bias² (ink) shrinks, variance (oxblood) grows. Watch the connector lines between the spread of fits (top) and the variance reading (bottom).
- Click Resample several times. Watch how degree-1 fits stay similar while degree-15 fits look completely different — that bundle of wiggly curves is variance.
- Toggle Show noise. The data splits into the smooth true signal sin(πx) and the vertical sticks of added noise. The σ² readout in the gloss tells you the irreducible piece — no model, however clever, can beat it.
- bias² · how far the model's average prediction (across many imagined training datasets) is from the truth. Squared, so it can't be negative.
- variance · how much the model's prediction wobbles when you swap in a different training set of the same size.
- σ² (sigma-squared) · irreducible noise in the data — the labels themselves are noisy. No model can beat this; it sets the error floor.
- Error(x) · expected squared error at a point x, averaged over all the training datasets you might have drawn.
"Linear regression vs gradient-boosted trees vs deep neural net" is a bias-variance question. Linear regression is high-bias; deep networks are flexible (low-bias) but variance-prone unless regularized.
Held-out-set error approximates the bias-variance sum on unseen data. The standard model-selection workflow — try many model variants, pick whichever gives the lowest validation error — is this curve, made empirical.
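That workflow in miniature, reusing the sin(πx) setup from the sketches above (the split sizes and degree grid are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(np.pi * x)

# One fixed dataset, split into train and validation halves.
x = rng.uniform(-1, 1, 60)
y = f(x) + rng.normal(0, 0.3, x.size)
x_tr, y_tr = x[:40], y[:40]
x_va, y_va = x[40:], y[40:]

val_mse = {}
for degree in range(1, 16):
    coeffs = np.polyfit(x_tr, y_tr, degree)   # fit on the training half only
    val_mse[degree] = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)

best = min(val_mse, key=val_mse.get)          # the empirical bottom of the U
print(f"degree picked by validation MSE: {best} ({val_mse[best]:.3f})")
```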
Random forests average many high-variance trees to drive variance down without raising bias. Bagging is, formally, a variance-reduction technique.
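A sketch of that claim using scikit-learn's DecisionTreeRegressor as the high-variance base learner (constants are arbitrary): averaging trees fit on bootstrap resamples shrinks the measured prediction variance.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
f = lambda x: np.sin(np.pi * x)
X_test = np.linspace(-1, 1, 101).reshape(-1, 1)
n_points, sigma, n_trees = 50, 0.3, 50

def bagged_predict(X, y):
    # Average n_trees fully grown trees, each fit on a bootstrap resample.
    preds = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), len(y))   # sample with replacement
        preds.append(DecisionTreeRegressor().fit(X[idx], y[idx]).predict(X_test))
    return np.mean(preds, axis=0)

single, bagged = [], []
for _ in range(50):                             # independent training sets
    X = rng.uniform(-1, 1, (n_points, 1))
    y = f(X[:, 0]) + rng.normal(0, sigma, n_points)
    single.append(DecisionTreeRegressor().fit(X, y).predict(X_test))
    bagged.append(bagged_predict(X, y))

print("variance, single tree :", np.mean(np.var(single, axis=0)))
print("variance, bagged trees:", np.mean(np.var(bagged, axis=0)))
```

The bagged number should come in well below the single-tree number, with the trees' low bias essentially untouched.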
Empirical work (Belkin et al., 2019) showed that classical bias-variance curves break for very large neural networks. Generalization sometimes improves past the interpolation threshold — a still-debated frontier.
- The Elements of Statistical Learning, Ch. 7 · textbook · Hastie, Tibshirani, Friedman · The classical chapter on model assessment, with the bias-variance decomposition. Free PDF.
- Reconciling Modern Machine-Learning Practice and the Classical Bias-Variance Trade-off · paper · Belkin et al. (2019) · The double-descent paper. Shows empirically that deep networks past the interpolation threshold can generalize better, breaking the U.
- Cornell CS4780: Bias-Variance Tradeoff · course notes · Kilian Weinberger · A lucid academic introduction with the math worked out cleanly. Pairs nicely with this page's interactive.
- Understanding the Bias-Variance Tradeoff · essay · Scott Fortmann-Roe · A patient visual essay using dart-throwing diagrams as a bias-variance metaphor. Often the first link in undergrad ML courses.