jthomas.site// notebook · v.4.2026
Machine Learning, Visualized · Vol. V

Singular Value Decomposition

Every linear transformation, no matter how strange-looking, is secretly the same three-step move: rotate, stretch along axes, rotate again. SVD is the recipe that reveals it.

The concept

SVD says every linear map factors into three steps: rotate, stretch along axes, rotate again — written A = U Σ Vᵀ.

Send the unit circle through any matrix and you always get an ellipse. The singular values σ₁ ≥ σ₂ ≥ 0 are the lengths of that ellipse's semi-axes. V names the input directions (in the original circle) that line up with those axes; U names where they end up in the output.
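
A few lines of numpy make both paragraphs concrete. The 2×2 matrix here is an arbitrary example, not one taken from the demo:

  import numpy as np

  A = np.array([[3.0, 1.0],
                [0.0, 2.0]])

  U, s, Vt = np.linalg.svd(A)        # A = U @ diag(s) @ Vt

  # The factorization reconstructs A exactly (up to floating point).
  assert np.allclose(A, U @ np.diag(s) @ Vt)

  # Each right singular vector (row of Vt) is sent to sigma_i times the
  # matching left singular vector (column of U): the ellipse's semi-axes.
  for i in range(2):
      assert np.allclose(A @ Vt[i], s[i] * U[:, i])

  print("singular values (semi-axis lengths):", s)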

It's the deepest "structure theorem" in elementary linear algebra: multiplying by an arbitrary matrix is secretly always this same rotate-stretch-rotate move — and unlike eigendecomposition, SVD works for any matrix, even non-square or singular ones.

Why ML cares

Truncating the small singular values gives the best low-rank approximation of any matrix, in both the Frobenius and spectral norms (the Eckart–Young theorem). This single fact powers image compression, latent semantic indexing, recommender systems, and every modern matrix-factorization method.
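
A quick sketch of that claim; the random matrix is purely for illustration:

  import numpy as np

  rng = np.random.default_rng(0)
  A = rng.standard_normal((8, 6))
  U, s, Vt = np.linalg.svd(A, full_matrices=False)

  k = 2
  A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # keep only the top-k components

  # The Frobenius error of the truncation equals the norm of the discarded
  # singular values -- Eckart-Young says no rank-k matrix can do better.
  err = np.linalg.norm(A - A_k, "fro")
  assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
  print(f"rank-{k} error: {err:.4f}")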

It's also the connective tissue of the curriculum: PCA is just the SVD of mean-centered data; the pseudoinverse used in least squares is built from SVD; and the spectral norm — how much a matrix can stretch a unit vector — is exactly σ₁.
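
All three connections can be checked in a few lines; the data matrix below is synthetic and only for illustration:

  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.standard_normal((100, 5))          # 100 samples, 5 features
  Xc = X - X.mean(axis=0)                    # mean-center the data

  U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

  # PCA: the principal directions are the right singular vectors of Xc, and
  # the covariance eigenvalues are sigma_i^2 / (n - 1).
  cov = Xc.T @ Xc / (len(X) - 1)
  eigvals = np.linalg.eigvalsh(cov)[::-1]    # descending, to match s
  assert np.allclose(eigvals, s ** 2 / (len(X) - 1))

  # Pseudoinverse: invert the nonzero singular values and swap U and V.
  X_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
  assert np.allclose(X_pinv, np.linalg.pinv(Xc))

  # Spectral norm: the largest stretch of a unit vector is exactly sigma_1.
  assert np.isclose(np.linalg.norm(Xc, 2), s[0])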

Try this
  1. Click Slanted, then Replay. Watch the four stages: identity → Vᵀ rotates → Σ stretches → U rotates again.
  2. Click Singular — σ₂ = 0, so the ellipse collapses to a line segment. The matrix has rank 1.
  3. Click Symmetric — U and V come out the same (up to sign): for a symmetric matrix, the SVD is its eigendecomposition with σᵢ = |λᵢ|. (The sketch after this list reproduces all three cases numerically.)
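
To replay the three cases off-screen, here is a sketch with stand-in matrices; the demo's exact entries aren't listed here, so these are guesses with the same character:

  import numpy as np

  slanted   = np.array([[1.0, 1.0], [0.0, 1.0]])   # a shear
  singular  = np.array([[2.0, 1.0], [4.0, 2.0]])   # rank 1: rows are parallel
  symmetric = np.array([[2.0, 1.0], [1.0, 2.0]])

  for name, M in [("slanted", slanted), ("singular", singular), ("symmetric", symmetric)]:
      U, s, Vt = np.linalg.svd(M)
      print(name, "sigma =", np.round(s, 3))

  # Singular case: sigma_2 is 0, so the unit circle collapses to a segment.
  # Symmetric case: U and V agree up to sign, as in step 3 above.
  U, s, Vt = np.linalg.svd(symmetric)
  assert np.allclose(np.abs(U), np.abs(Vt.T))
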
Where you've seen this · 4 examples
The Netflix Prize and modern recommenders

The 2006 Netflix Prize was dominated by SVD-style matrix factorization: Simon Funk's FunkSVD decomposed the (users × movies) rating matrix into low-rank factors, and the winning ensembles leaned heavily on the same idea. Today every major recommender — TikTok, YouTube, Spotify, Amazon — uses some descendant of it.
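
A bare-bones sketch in the spirit of that factorization (not Funk's actual code; the synthetic ratings and hyperparameters are illustrative):

  import numpy as np

  rng = np.random.default_rng(2)
  n_users, n_items, k = 50, 40, 3
  # (user, item, rating) triples; synthetic here, normally the sparse rating log.
  ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
             for _ in range(500)]

  P = 0.1 * rng.standard_normal((n_users, k))   # user factors
  Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
  lr, reg = 0.01, 0.05

  for epoch in range(20):
      for u, i, r in ratings:
          err = r - P[u] @ Q[i]                 # error on one observed rating
          P[u] += lr * (err * Q[i] - reg * P[u])
          Q[i] += lr * (err * P[u] - reg * Q[i])

  # A predicted rating for any (user, item) pair is just a dot product of factors.
  print("predicted r[0, 0]:", P[0] @ Q[0])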

LoRA — how Gemini and GPT get fine-tuned

Modern LLM fine-tuning rarely updates the full weight matrix of a model with billions of parameters. Instead, Low-Rank Adaptation approximates the update as a low-rank product UV — the same low-rank truncation idea SVD pioneered. A 7B-parameter model can be adapted with millions, not billions, of trainable weights.
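
A minimal numpy sketch of the low-rank-update idea; the shapes, names, and zero-initialization are illustrative assumptions, not any particular library's LoRA API:

  import numpy as np

  rng = np.random.default_rng(3)
  d_out, d_in, r = 1024, 1024, 8

  W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
  B = np.zeros((d_out, r))                      # trainable, starts at zero
  A = 0.01 * rng.standard_normal((r, d_in))     # trainable

  def forward(x):
      # Adapted layer: the full update W + B @ A is never materialized here;
      # only B and A (2 * d * r values) would be trained.
      return x @ W.T + (x @ A.T) @ B.T

  x = rng.standard_normal((4, d_in))
  print("trainable params:", B.size + A.size, "vs full matrix:", W.size)
  print(forward(x).shape)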

Latent Semantic Analysis in search

Compute SVD on a (documents × words) matrix and the leading singular components capture topics. A search for "auto" can match documents containing "car" because they share a latent topic dimension. This was the first scalable semantic search method, predating embeddings by a decade.
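
A toy sketch of the mechanics; the tiny term-document counts are made up for illustration, not drawn from a real corpus:

  import numpy as np

  terms = ["car", "auto", "engine", "moon", "star"]
  #             doc0 doc1 doc2 doc3
  X = np.array([[2,   0,   0,   0],    # car
                [0,   2,   0,   0],    # auto
                [1,   1,   1,   0],    # engine
                [0,   0,   0,   2],    # moon
                [0,   0,   1,   2]])   # star

  U, s, Vt = np.linalg.svd(X.astype(float), full_matrices=False)
  k = 2
  term_vecs = U[:, :k] * s[:k]           # each term as a point in topic space

  def cos(a, b):
      return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

  # "car" and "auto" never co-occur in a document, yet they should land close
  # together in the 2-D latent space because they share neighbors like "engine".
  print("cos(car, auto) in topic space:", cos(term_vecs[0], term_vecs[1]))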

Image and signal compression

An image is a matrix of pixels; keep only the components with the largest singular values and throw the rest away. Astronomers use SVD to denoise telescope images, and medical imaging uses low-rank reconstructions to rebuild MRIs from undersampled scans. The Eckart–Young theorem says the truncation is provably the best approximation of that rank.
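
A sketch of the truncation on a synthetic grayscale "image" (any 2-D pixel array would work the same way):

  import numpy as np

  rng = np.random.default_rng(4)
  h, w = 256, 256
  img = (np.outer(np.linspace(0, 1, h), np.linspace(0, 1, w))
         + 0.05 * rng.standard_normal((h, w)))

  U, s, Vt = np.linalg.svd(img, full_matrices=False)

  k = 20
  img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]       # keep top-k components only

  stored = k * (h + w + 1)                          # values in U_k, s_k, Vt_k
  print(f"storage: {stored} numbers instead of {h * w}")
  print(f"relative error: {np.linalg.norm(img - img_k) / np.linalg.norm(img):.3f}")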

Further reading