The Perceptron
One neuron. Two weights and a bias. It draws a single straight line through the plane and calls one side "yes," the other "no." Crude — but it learns.
A perceptron is the simplest neural network: it computes w·x + b, then outputs 1 if that's positive, 0 otherwise.
Read the formula slowly: w·x is a dot product — multiply each weight by the matching input feature, then sum (w₁·x₁ + w₂·x₂). Add the bias b. If the result is positive, output 1; otherwise 0. Geometrically, w·x + b = 0 traces a line (a hyperplane in higher dimensions): the two weights set the line's tilt, the bias slides it along.
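To make the arithmetic concrete, here is a minimal sketch in Python with NumPy; the weights and inputs are illustrative numbers chosen by hand, not anything learned:

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Output 1 if w·x + b is positive, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative parameters: the weights tilt the line, the bias slides it.
w = np.array([1.0, 1.0])
b = -0.5

print(perceptron_predict(w, b, np.array([1.0, 1.0])))  # 1*1 + 1*1 - 0.5 = 1.5 > 0, so 1
print(perceptron_predict(w, b, np.array([0.0, 0.0])))  # 0 + 0 - 0.5 = -0.5, not positive, so 0
```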
The learning rule is brutally simple: whenever the perceptron misclassifies a point, nudge the weights toward that point's correct side (add a scaled copy of the input if the true label is 1, subtract it if it is 0). Repeat until every point is classified correctly; Rosenblatt's convergence theorem guarantees this terminates, provided the data is linearly separable.
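A sketch of that training loop, assuming a tiny hand-made dataset and a learning rate of 0.1 (both illustrative):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100):
    """Perceptron rule: on every mistake, nudge w and b toward the correct side."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0
            if pred != yi:
                w += lr * (yi - pred) * xi   # (yi - pred) is +1 or -1
                b += lr * (yi - pred)
                mistakes += 1
        if mistakes == 0:                    # everything classified correctly
            break
    return w, b

# Linearly separable points: label 1 roughly above the diagonal x1 + x2 = 1.
X = np.array([[0.2, 0.1], [0.4, 0.3], [0.9, 0.8], [0.7, 0.9]])
y = np.array([0, 0, 1, 1])
print(train_perceptron(X, y))
```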
Every neuron in a modern neural network is, at its core, a perceptron with a smoother activation. Stack them in layers, train with backpropagation instead of the perceptron rule, and you get GPT and Gemini and image classifiers.
The perceptron's failure on XOR, the limitation that helped stall neural-network research for well over a decade, is also why deep networks were invented. Stack two layers and the wall falls.
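To see that claim in code rather than take it on faith, here is a hand-built two-layer network that computes XOR exactly; the hidden units and weights are picked by hand for illustration, not learned:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

def xor_two_layer(x1, x2):
    """Three perceptrons: an OR unit, an AND unit, then 'OR and not AND'."""
    h_or  = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)    # fires only if both inputs are on
    return step(h_or - h_and - 0.5)

x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
print(xor_two_layer(x1, x2))       # [0 1 1 0]
```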
- Pick the diagonal dataset and hit Train. Watch the line rotate as the perceptron rule fixes one mistake at a time. After a few seconds it lands on a separator.
- Switch to xor. Try training. Accuracy never gets above ~75%: XOR isn't linearly separable, so a single line can't do it (a quick brute-force check of that ceiling appears after this list). This is the wall that broke 1960s AI.
- Drag w₁ and w₂ manually. Watch the dark arrow (the weight vector) rotate; the decision line is always perpendicular to it.
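For the skeptical, a quick brute-force check of that 75% ceiling: sample many random lines and see how many of XOR's four points the best one gets right (the sampling range and count here are arbitrary):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # the four XOR inputs
y = np.array([0, 1, 1, 0])                       # and their labels

rng = np.random.default_rng(0)
best = 0.0
for _ in range(100_000):
    w = rng.uniform(-2, 2, size=2)               # random tilt
    b = rng.uniform(-2, 2)                       # random offset
    preds = (X @ w + b > 0).astype(int)
    best = max(best, (preds == y).mean())

print(best)   # 0.75: no single line gets all four points right
```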
Replace the hard step with a smooth sigmoid and you have logistic regression — still the workhorse classifier in fraud detection, click-through prediction, and medical risk scores. Same geometry: a hyperplane separating two classes.
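The swap is one line of code. A sketch reusing the illustrative weights from above; the only change is that the hard threshold becomes a probability:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([1.0, 1.0]), -0.5   # same illustrative line as before

def perceptron(x):                  # hard step: outputs a class, 0 or 1
    return int(np.dot(w, x) + b > 0)

def logistic(x):                    # smooth sigmoid: outputs P(class = 1)
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.4, 0.3])
print(perceptron(x), logistic(x))   # 1 and about 0.55: same line, graded confidence
```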
The last layer of a binary classifier ("is this email spam?" "does this CT scan show a tumor?") is essentially a perceptron operating on features extracted by the layers before it. Modern networks compose many of these.
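A tiny sketch of that composition in PyTorch; the layer sizes and the ReLU body are stand-ins for whatever feature extractor a real network would use:

```python
import torch
import torch.nn as nn

# Hypothetical binary classifier: a feature-extracting body,
# then a single linear unit + sigmoid, i.e. a smoothed perceptron.
model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),    # stand-in feature extractor
    nn.Linear(32, 1), nn.Sigmoid(),  # the perceptron-like final layer
)

x = torch.randn(8, 64)               # a batch of 8 input vectors
print(model(x).shape)                # torch.Size([8, 1]): one spam/tumor-style score each
```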
Linear classifiers powered web-scale filtering through the 2000s. Their interpretability — every weight is "how much does this feature pull toward spam?" — is still why they appear in audit-sensitive ML pipelines.
The perceptron's view of classification (find the right hyperplane) is the foundation of support-vector machines and other max-margin classifiers, and even of modern contrastive-learning objectives.
- The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain (paper) · Frank Rosenblatt, 1958 · The original. Surprisingly readable, and an interesting historical document, written a decade before the AI winter that the perceptron's limitations helped trigger.
- Perceptrons (book) · Marvin Minsky & Seymour Papert, 1969 · The book that proved single-layer perceptrons can't represent XOR, and was widely (mis)read as proof that neural networks were a dead end. Backpropagation revived the field 17 years later.
- Deep Learning, Chapter 6: Deep Feedforward Networks (textbook) · Ian Goodfellow, Yoshua Bengio, Aaron Courville · The bridge from one perceptron to many. The XOR example appears explicitly as the motivation for hidden layers.
- TensorFlow Playground (interactive) · Daniel Smilkov · A more elaborate sibling of the toy above, with multiple layers, multiple datasets, and live training animations. The natural next step after this page.