jthomas.site// notebook · v.4.2026
Machine Learning, Visualized · Vol. XXV

A Geometry of Meaning

Words become vectors. Similar meanings cluster; analogies become arithmetic. The single trick that turns text into something a neural network can multiply.

The concept

An embedding assigns each word (or token, or item) a vector in some high-dimensional space. The geometry encodes meaning.

The famous result from Word2Vec (2013): vector arithmetic captures semantics. king − man + woman ≈ queen. Paris − France + Italy ≈ Rome. The directions in embedding space correspond to abstract relationships — gender, tense, country/capital — that nobody explicitly programmed.
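
A minimal sketch of both ideas in Python (hand-picked 2D vectors standing in for learned embeddings; the words and coordinates are illustrative, not taken from the demo): look up the vectors, do the arithmetic, and return the nearest remaining word by cosine similarity.

  import numpy as np

  # Toy 2D embedding table: each word maps to a hand-picked vector.
  # Real embeddings are learned and have hundreds of dimensions.
  E = {
      "king":  np.array([0.9, 0.8]),
      "queen": np.array([0.9, 0.2]),
      "man":   np.array([0.3, 0.8]),
      "woman": np.array([0.3, 0.2]),
      "paris": np.array([-0.7, 0.9]),
      "rome":  np.array([-0.7, 0.4]),
  }

  def cosine(u, v):
      return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

  def analogy(a, b, c):
      """Solve a - b + c ~ ?, excluding the three input words."""
      target = E[a] - E[b] + E[c]
      candidates = [w for w in E if w not in (a, b, c)]
      return max(candidates, key=lambda w: cosine(E[w], target))

  print(analogy("king", "man", "woman"))   # -> queen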

Modern embeddings (BERT, OpenAI embeddings, Cohere embeddings) live in 768- to 4096-dimensional spaces. The toy below shows a hand-tuned 2D version where you can see analogies as actual arrows.

Why ML cares

Embeddings are how text — discrete, symbolic — becomes geometric, continuous, and differentiable. Every transformer's first layer is an embedding lookup. Every retrieval system (RAG, semantic search) compares query and document embeddings by cosine similarity.
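
Both claims fit in a few lines of PyTorch (a sketch; the vocabulary and model sizes are illustrative): the first layer is literally an indexed table lookup, and retrieval is a cosine comparison between vectors.

  import torch
  import torch.nn.functional as F

  # First layer of a transformer: a lookup table from token ids to vectors.
  vocab_size, d_model = 50_000, 768          # illustrative sizes
  embed = torch.nn.Embedding(vocab_size, d_model)

  token_ids = torch.tensor([[17, 942, 3]])   # a "sentence" of 3 token ids
  x = embed(token_ids)                       # shape (1, 3, 768), differentiable

  # Retrieval: compare a query vector to document vectors by cosine similarity.
  query = torch.randn(d_model)
  docs = torch.randn(5, d_model)             # 5 document embeddings
  scores = F.cosine_similarity(query.unsqueeze(0), docs, dim=-1)
  best = scores.argmax().item()              # index of the closest document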

Beyond text, the same idea generalizes: face embeddings (FaceNet), product embeddings (Amazon's recommender), molecule embeddings (drug discovery), node embeddings (graph ML). Wherever you have items + a notion of "similar," embeddings are the common currency.

Try this
  1. Pick the king − man + woman preset. The faint arrow from man → king is parallel to the bright arrow from woman → queen. That parallelism is the "gender" direction in embedding space — a literal vector you can add to any other word.
  2. Pick the Paris − France + Italy preset. Same parallelogram shape, different content — country→capital is its own direction. Same trick generalizes to verb tense, comparatives, plurals.
  3. Open Build your own · A − B + C. Click three words on the canvas to set A, B, C. The result vector snaps to the closest word — try it on any cluster.
  4. Hover any word to see its 5 nearest neighbors (by cosine similarity). Words with related meanings cluster; the geometric closeness is the semantic similarity (see the sketch after this list).
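
For reference, step 4's hover is a plain k-nearest-neighbor query. A minimal sketch, where E is any word-to-vector dict such as the toy table in the earlier snippet:

  import numpy as np

  def nearest_neighbors(word, E, k=5):
      """Return the k words whose vectors have the highest cosine similarity to `word`."""
      q = E[word] / np.linalg.norm(E[word])
      scored = []
      for w, v in E.items():
          if w == word:
              continue
          scored.append((float(q @ (v / np.linalg.norm(v))), w))
      return [w for _, w in sorted(scored, reverse=True)[:k]]

  # nearest_neighbors("king", E)  ->  the 5 closest words to "king"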
Where you've seen this · 4 examples
Retrieval-augmented generation (RAG)

Every "chat with your PDF" tool embeds chunks of the document, embeds the query, and retrieves the chunks closest to the query in embedding space. The LLM then answers using the retrieved chunks as context.

Semantic search

Google, Bing, and most modern search engines now use embedding-based retrieval alongside keyword matching. A query for "fix my car" matches docs about "auto repair" because the vectors are close.

Face recognition

FaceNet embeds each face into a 128-d vector such that vectors of the same person are close and vectors of different people are far apart. Phone unlock, photo album grouping, and surveillance all use this geometry.
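
The same-person decision reduces to a distance check on those vectors. A minimal sketch; the threshold value here is an illustrative assumption that real systems tune on validation data:

  import numpy as np

  def same_person(face_a: np.ndarray, face_b: np.ndarray, threshold: float = 1.0) -> bool:
      """Compare two 128-d face embeddings by Euclidean distance.
      The threshold is illustrative; real systems tune it on a validation set."""
      return float(np.linalg.norm(face_a - face_b)) < threshold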

Amazon & Spotify recommendations

Product embeddings (item2vec) and song embeddings (sequence-based) live in spaces where "things bought together" are close. Recommendation = nearest-neighbor lookup in embedding space.
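
A sketch of that lookup with scikit-learn's NearestNeighbors, using random vectors as stand-ins for learned item embeddings:

  import numpy as np
  from sklearn.neighbors import NearestNeighbors

  # Stand-in for learned item embeddings (item2vec, song vectors, ...): 10k items, 64 dims.
  rng = np.random.default_rng(0)
  item_vectors = rng.normal(size=(10_000, 64))

  index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(item_vectors)

  # "People who bought item 42 also like": the items closest to item 42's vector.
  _, neighbors = index.kneighbors(item_vectors[42:43])
  print(neighbors[0])   # indices of the 5 nearest items (includes item 42 itself)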

Further reading