Singular Value Decomposition
Every linear transformation, no matter how strange-looking, is secretly the same three-step move: rotate, stretch along axes, rotate again. SVD is the recipe that reveals it.
SVD says every linear map factors into exactly those three steps, written A = U Σ Vᵀ: Vᵀ rotates, Σ (diagonal, with nonnegative entries) stretches along the axes, and U rotates again.
Send the unit circle through any matrix and you always get an ellipse. The singular values σ₁ ≥ σ₂ ≥ 0 are the lengths of that ellipse's semi-axes. V names the input directions (in the original circle) that line up with those axes; U names where they end up in the output.
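Both claims are easy to check numerically. Here's a minimal numpy sketch (the matrix A is just an arbitrary example, not one from the demo):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])              # any 2x2 matrix works here

U, s, Vt = np.linalg.svd(A)             # A = U @ diag(s) @ Vt

# The three factors reconstruct A exactly (up to floating-point error).
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Each right-singular vector (row of Vt) is an input direction on the unit
# circle that A carries onto a semi-axis of the output ellipse; the length
# of that semi-axis is the corresponding singular value.
for i in range(2):
    print(np.linalg.norm(A @ Vt[i]), s[i])   # the two numbers match
```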
It's the deepest "structure theorem" in elementary linear algebra: arbitrary matrix multiplication is secretly always the same shape — and unlike eigen-decomposition, SVD works for any matrix, even non-square or singular ones.
Truncating the small singular values gives the best low-rank approximation of any matrix (the Eckart–Young theorem). This single fact powers image compression, latent semantic indexing, recommender systems, and every modern matrix-factorization method.
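As a sketch of what that truncation looks like in code (the random matrix is just a stand-in for real data):

```python
import numpy as np

def best_rank_k(A, k):
    """Keep only the k largest singular values: the best rank-k approximation."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

A = np.random.default_rng(0).normal(size=(60, 40))
k = 10
A_k = best_rank_k(A, k)

# Eckart-Young: the spectral-norm error of the best rank-k approximation
# equals the largest singular value that was thrown away.
s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - A_k, 2), s[k])   # these two numbers agree
```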
It's also the connecting tissue of the curriculum: PCA is just the SVD of mean-centered data; the pseudoinverse used in least squares is built from SVD; and the spectral norm — how much a matrix can stretch a unit vector — is exactly σ₁.
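Each of those connections is only a few lines of numpy; a sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # stand-in data: 200 samples, 5 features

# PCA = SVD of the mean-centered data: rows of Vt are the principal
# directions, and s**2 / (n - 1) are the variances along them.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                       # the data expressed in principal components

# Spectral norm (max stretch of a unit vector) is exactly sigma_1.
A = rng.normal(size=(4, 3))
assert np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])

# Pseudoinverse: invert the nonzero singular values and swap the rotations.
# (This A is full rank; in practice tiny singular values are zeroed first.)
Ua, sa, Vta = np.linalg.svd(A, full_matrices=False)
assert np.allclose(Vta.T @ np.diag(1.0 / sa) @ Ua.T, np.linalg.pinv(A))
```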
- Click Slanted, then Replay. Watch the four stages: identity → Vᵀ rotates → Σ stretches → U rotates again. (The sketch after this list replays the same stages in code.)
- Click Singular — σ₂ = 0, so the ellipse collapses to a line segment. The matrix has rank 1.
- Click Symmetric — U and V come out the same (up to sign). Symmetric SVD is eigendecomposition.
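A minimal sketch of those stages (the matrix below just stands in for whatever the Slanted button uses):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])                 # stand-in for the demo's slanted matrix
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # stage 1: identity

after_rotate  = Vt @ circle                # stage 2: still a circle, just rotated
after_stretch = np.diag(s) @ after_rotate  # stage 3: an axis-aligned ellipse
after_rotate2 = U @ after_stretch          # stage 4: the final, tilted ellipse

# The three moves compose back into the original map.
assert np.allclose(after_rotate2, A @ circle)
# If the matrix were singular, s[1] would be 0 and the ellipse would
# collapse onto a line segment, exactly as the Singular button shows.
```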
The Netflix Prize, launched in 2006, was won by methods built around SVD-style matrix factorization. Simon Funk's FunkSVD decomposed the (users × movies) rating matrix into low-rank factors learned by gradient descent. Today every major recommender, from TikTok and YouTube to Spotify and Amazon, uses some descendant of this idea.
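Funk's trick was not an exact SVD but a gradient-descent factorization fit only to the ratings that exist. A toy sketch of the idea (data and hyperparameters are invented, and this is not his original code):

```python
import numpy as np

# Toy (users x movies) ratings; 0 means "not rated yet".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)
observed = R > 0

rank, lr, reg = 2, 0.01, 0.02
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(R.shape[0], rank))   # user factors
Q = rng.normal(scale=0.1, size=(R.shape[1], rank))   # movie factors

# Stochastic gradient descent on the observed entries only: the low-rank
# product P @ Q.T learns to reproduce known ratings, and its remaining
# entries become the predictions for unseen (user, movie) pairs.
for _ in range(2000):
    for u, m in zip(*np.nonzero(observed)):
        err = R[u, m] - P[u] @ Q[m]
        p_u = P[u].copy()
        P[u] += lr * (err * Q[m] - reg * P[u])
        Q[m] += lr * (err * p_u - reg * Q[m])

print(np.round(P @ Q.T, 2))              # includes predictions for the zeros
```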
Modern LLM fine-tuning rarely updates the full weight matrices of a model with billions of parameters. Instead, Low-Rank Adaptation (LoRA) approximates each weight update as a product of two thin matrices, ΔW = BA with a small rank r, the same low-rank idea behind SVD truncation. A 7B-parameter model can be adapted with millions, not billions, of trainable weights.
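A sketch of the shapes involved (the layer size, rank, and initialization scale here are illustrative, not taken from any particular model):

```python
import numpy as np

d, k, r = 4096, 4096, 8                  # illustrative layer size and LoRA rank

W = np.zeros((d, k))                     # stands in for a pretrained weight, frozen
B = np.zeros((d, r))                     # trainable; starts at zero so the update starts at zero
A = np.random.default_rng(0).normal(scale=0.01, size=(r, k))   # trainable

def forward(x):
    # (W + B @ A) @ x, computed without ever forming the d x k update matrix.
    return W @ x + B @ (A @ x)

print("frozen weights:   ", W.size)           # ~16.8 million
print("trainable (B, A): ", B.size + A.size)  # ~65 thousand
```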
Compute SVD on a (documents × words) matrix and the leading singular components capture topics. A search for "auto" can match documents containing "car" because they share a latent topic dimension. This was the first scalable semantic search method, predating neural embeddings by decades.
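A toy sketch of that latent matching, with an invented term-document matrix:

```python
import numpy as np

terms = ["auto", "car", "engine", "road", "flower", "petal"]
# Columns are documents: d0 = "auto engine road", d1 = "car engine road",
# d2 = "flower petal". Counts are invented.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs = (np.diag(s[:k]) @ Vt[:k]).T                 # documents in topic space

query = np.zeros(len(terms))
query[terms.index("auto")] = 1.0
q = U[:, :k].T @ query                             # fold the query into topic space

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
# d1 never mentions "auto", yet it scores high because "auto" and "car"
# load on the same latent topic (shared "engine"/"road" context); d2 scores ~0.
print([round(cos(q, d), 2) for d in docs])
```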
An image is a matrix of pixels; keep only the components with the largest singular values and throw the rest away. Astronomers use SVD to denoise telescope images; medical imaging reconstructs MRIs from undersampled data. The Eckart–Young theorem says this truncation is provably the best low-rank approximation that exists.
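A sketch of the storage arithmetic (the "image" below is synthetic, standing in for real pixel data):

```python
import numpy as np

# Any grayscale image is just an m x n matrix; this one is synthetic.
m, n, k = 512, 512, 20
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(m, 5)) @ rng.normal(size=(5, n))
image = low_rank + 0.05 * rng.normal(size=(m, n))

U, s, Vt = np.linalg.svd(image, full_matrices=False)
compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Rank-k storage is k*(m + n + 1) numbers instead of m*n.
print("original numbers stored:  ", m * n)
print("compressed numbers stored:", k * (m + n + 1))
print("relative error:", np.linalg.norm(image - compressed) / np.linalg.norm(image))
```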
- MIT 18.06 — Singular Value Decomposition lecture Gilbert Strang · A 50-minute lecture from the man who probably taught your professor's professor SVD.
- Try This at Home (Netflix Prize) blog post Simon Funk (2006) · The single blog post that won most of the Netflix Prize. A working model in <100 lines, with the geometric intuition spelled out plainly.
- LoRA: Low-Rank Adaptation of Large Language Models paper Hu et al. (2021) · The paper that made fine-tuning billion-parameter models practical. The math is the same low-rank truncation you watched on this page.
- Singular Value Decomposition as Simply as Possible essay Gregory Gundersen · A clean derivation that connects all four views of SVD — geometric, eigen-, low-rank, and Eckart–Young — in one essay.