Machine Learning, Visualized · Vol. IV

Principal Components

A cloud of data has a grain — directions along which it spreads more, and directions where it barely moves. Find those directions and you have found what the data is really about. The math is just eigenvectors of the covariance matrix.

The concept

PCA finds the directions in your data along which it varies the most.

Picture a cloud of points. There's a "long way" through it and a "short way" through it. PC₁ is the long way; PC₂ is perpendicular to it and the second-longest. These directions are precisely the eigenvectors of the data's covariance matrix — the matrix that measures how each pair of features moves together.

The eigenvalue λ for each direction is the variance along it. Big λ = the data spreads a lot in that direction (informative). Small λ = barely any spread (discardable). PCA is just: keep the directions with the biggest λ's.
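Here is that recipe as a minimal NumPy sketch (the function name and variables are mine, not from the demo): center the data, form the covariance matrix, and read off its eigenpairs sorted by λ.

    import numpy as np

    def principal_components(X):
        # X: (n_samples, n_features). Returns eigenvalues (descending)
        # and the matching eigenvectors as columns.
        Xc = X - X.mean(axis=0)            # center each feature
        cov = (Xc.T @ Xc) / (len(Xc) - 1)  # sample covariance matrix
        lam, vecs = np.linalg.eigh(cov)    # symmetric matrix: real eigenpairs, ascending
        order = np.argsort(lam)[::-1]      # biggest variance first
        return lam[order], vecs[:, order]

lam[0] / lam.sum() is then the fraction of the total variance that PC₁ carries.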

Why ML cares

PCA is the simplest way to compress data without losing much: keep only the directions that carry the most variance and throw the rest away. A 10,000-pixel face becomes a 50-number signature with little information lost — the principal-component basis images themselves are the famous eigenfaces.
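A sketch of that compression, reusing principal_components from above on a hypothetical faces array of flattened images: project onto the top k eigenvectors, keep k numbers per face, reconstruct.

    k = 50
    lam, V = principal_components(faces)  # faces: (n_faces, n_pixels), hypothetical data
    mean = faces.mean(axis=0)
    codes = (faces - mean) @ V[:, :k]     # each face becomes a k-number signature
    approx = mean + codes @ V[:, :k].T    # faces rebuilt from k numbers each

(In practice, with 10,000-pixel images, one computes this via an SVD rather than a 10,000 × 10,000 covariance matrix; same subspace, far less work.)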

The same idea, on different data, gives latent topics in text, motion modes in molecular dynamics, and the first stage of nearly every classical visualization pipeline. PCA is also the linear baseline for nonlinear dimensionality reduction: a linear autoencoder trained with squared error recovers the PCA subspace, and t-SNE and UMAP are routinely run on top of a PCA projection.

Try this
  1. Try Slanted blob — PC₁ runs along the long axis. PC₂ is perpendicular and much shorter.
  2. Click Isotropic — λ₁ ≈ λ₂. The cloud is round, so neither direction is meaningfully "principal."
  3. Switch to III. Reduce — every point slides onto PC₁. That single number per point is the entire 1D compression of the dataset.
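In code, that slide onto PC₁ is a single projection (reusing principal_components from above; points stands in for the demo's cloud):

    lam, V = principal_components(points)              # points: (n, 2) array
    scores = (points - points.mean(axis=0)) @ V[:, 0]  # one number per point: its spot along PC₁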
Where you've seen this · 04 examples
Eigenfaces — face recognition, 1991

Turk and Pentland ran PCA on a database of face photos and found that the leading principal components looked like ghostly composite faces. Any new face could be reconstructed as a linear combination of, say, fifty eigenfaces. Eigenfaces dominated face recognition through the 1990s and remained a standard baseline well into the 2000s.

Genetics maps reveal ancestry

Run PCA on the genomes of thousands of Europeans and the first two principal components reproduce a recognizable map of Europe — without the algorithm being told anything about geography. Norwegians cluster in one corner, Greeks in the opposite (the orientation itself is arbitrary, since eigenvector signs are). Geographic ancestry is baked into the eigenvectors.

Stock-market "factors"

Quants run PCA on stock returns and find that PC₁ is essentially the market (everything moves together). PC₂ might be "tech vs banks." Modern factor-investing strategies bet on combinations of these PCs, and risk models are built on top of them.
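A hedged sketch of that factor extraction, assuming a hypothetical returns array of daily returns and reusing principal_components from above:

    lam, V = principal_components(returns)  # returns: (n_days, n_stocks)
    market = V[:, 0]                        # PC₁ loadings; typically all one sign, near equal-weight
    explained = lam[0] / lam.sum()          # share of total variance the "market" factor carries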

Visualizing LLM embeddings

Almost every t-SNE or UMAP plot of language-model embeddings starts by reducing 1024- or 4096-dim vectors to ~50 dims via PCA, then refining that further with the nonlinear method. PCA does the heavy lifting; t-SNE just polishes.
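A minimal version of that pipeline with scikit-learn, assuming embeddings is an (n, 4096) NumPy array (the parameter values are common defaults, not from any particular paper):

    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    coarse = PCA(n_components=50).fit_transform(embeddings)          # linear: 4096 -> 50 dims
    coords = TSNE(n_components=2, init="pca").fit_transform(coarse)  # nonlinear polish: 50 -> 2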

Further reading