The Perceptron
One neuron. Two weights and a bias. It draws a single straight line through the plane and calls one side "yes," the other "no." Crude — but it learns.
A perceptron is the simplest neural network: it computes w·x + b, then outputs 1 if that's positive, 0 otherwise.
Read the formula slowly: w·x is a dot product — multiply each weight by the matching input feature, then sum (w₁·x₁ + w₂·x₂). Add the bias b. If the result is positive, output 1; otherwise 0. Geometrically, w·x + b = 0 traces a line (a hyperplane in higher dimensions): the two weights set the line's tilt, the bias slides it along.
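To make the arithmetic concrete, here is a minimal sketch in Python with NumPy; the weights and inputs are illustrative numbers chosen by hand, not anything learned:

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Output 1 if w·x + b is positive, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative parameters: the weights tilt the line, the bias slides it.
w = np.array([1.0, 1.0])
b = -0.5

print(perceptron_predict(w, b, np.array([1.0, 1.0])))  # 1*1 + 1*1 - 0.5 = 1.5 > 0, so 1
print(perceptron_predict(w, b, np.array([0.0, 0.0])))  # 0 + 0 - 0.5 = -0.5, not positive, so 0
```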
The learning rule is brutally simple: whenever the perceptron misclassifies a point, nudge the weights toward that point's correct side (add a scaled copy of the input if the true label is 1, subtract it if it is 0). Repeat until every point is classified correctly; Rosenblatt's convergence theorem guarantees this terminates, provided the data is linearly separable.
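A sketch of that training loop, assuming a tiny hand-made dataset and a learning rate of 0.1 (both illustrative):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100):
    """Perceptron rule: on every mistake, nudge w and b toward the correct side."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0
            if pred != yi:
                w += lr * (yi - pred) * xi   # (yi - pred) is +1 or -1
                b += lr * (yi - pred)
                mistakes += 1
        if mistakes == 0:                    # everything classified correctly
            break
    return w, b

# Linearly separable points: label 1 roughly above the diagonal x1 + x2 = 1.
X = np.array([[0.2, 0.1], [0.4, 0.3], [0.9, 0.8], [0.7, 0.9]])
y = np.array([0, 0, 1, 1])
print(train_perceptron(X, y))
```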
Every neuron in a modern neural network is, at its core, a perceptron with a smoother activation. Stack them in layers, train with backpropagation instead of the perceptron rule, and you get GPT and Gemini and image classifiers.
The perceptron's failure on XOR, the limitation that helped stall neural-network research for well over a decade, is also why deep networks were invented. Stack two layers and the wall falls.
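To see that claim in code rather than take it on faith, here is a hand-built two-layer network that computes XOR exactly; the hidden units and weights are picked by hand for illustration, not learned:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

def xor_two_layer(x1, x2):
    """Three perceptrons: an OR unit, an AND unit, then 'OR and not AND'."""
    h_or  = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)    # fires only if both inputs are on
    return step(h_or - h_and - 0.5)

x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
print(xor_two_layer(x1, x2))       # [0 1 1 0]
```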
- Pick the diagonal dataset and hit Train. Watch the line rotate as the perceptron rule fixes one mistake at a time. After a few seconds it lands on a separator.
- Switch to xor. Try training. Accuracy never gets above ~75%: XOR isn't linearly separable, so a single line can't do it (a quick brute-force check of that ceiling appears after this list). This is the wall that broke 1960s AI.
- Drag w₁ and w₂ manually. Watch the dark arrow (the weight vector) rotate; the decision line is always perpendicular to it.
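For the skeptical, a quick brute-force check of that 75% ceiling: sample many random lines and see how many of XOR's four points the best one gets right (the sampling range and count here are arbitrary):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # the four XOR inputs
y = np.array([0, 1, 1, 0])                       # and their labels

rng = np.random.default_rng(0)
best = 0.0
for _ in range(100_000):
    w = rng.uniform(-2, 2, size=2)               # random tilt
    b = rng.uniform(-2, 2)                       # random offset
    preds = (X @ w + b > 0).astype(int)
    best = max(best, (preds == y).mean())

print(best)   # 0.75: no single line gets all four points right
```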
Replace the hard step with a smooth sigmoid and you have logistic regression — still the workhorse classifier in fraud detection, click-through prediction, and medical risk scores. Same geometry: a hyperplane separating two classes.
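The swap is one line of code. A sketch reusing the illustrative weights from above; the only change is that the hard threshold becomes a probability:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([1.0, 1.0]), -0.5   # same illustrative line as before

def perceptron(x):                  # hard step: outputs a class, 0 or 1
    return int(np.dot(w, x) + b > 0)

def logistic(x):                    # smooth sigmoid: outputs P(class = 1)
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.4, 0.3])
print(perceptron(x), logistic(x))   # 1 and about 0.55: same line, graded confidence
```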
The last layer of a binary classifier ("is this email spam?" "does this CT scan show a tumor?") is essentially a perceptron operating on features extracted by the layers before it. Modern networks compose many of these.
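A tiny sketch of that composition in PyTorch; the layer sizes and the ReLU body are stand-ins for whatever feature extractor a real network would use:

```python
import torch
import torch.nn as nn

# Hypothetical binary classifier: a feature-extracting body,
# then a single linear unit + sigmoid, i.e. a smoothed perceptron.
model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),    # stand-in feature extractor
    nn.Linear(32, 1), nn.Sigmoid(),  # the perceptron-like final layer
)

x = torch.randn(8, 64)               # a batch of 8 input vectors
print(model(x).shape)                # torch.Size([8, 1]): one spam/tumor-style score each
```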
Linear classifiers powered web-scale filtering through the 2000s. Their interpretability — every weight is "how much does this feature pull toward spam?" — is still why they appear in audit-sensitive ML pipelines.
The perceptron's view of classification (find the right hyperplane) is the foundation of support-vector machines and other max-margin classifiers, and even of modern contrastive-learning objectives.
- The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain (paper) · Frank Rosenblatt, 1958 · The original. Surprisingly readable, and an interesting historical document, written a decade before the AI winter that the perceptron's limitations helped trigger.
- Perceptrons (book) · Marvin Minsky & Seymour Papert, 1969 · The book that proved single-layer perceptrons can't represent XOR, and was widely (mis)read as proof that neural networks were a dead end. Backpropagation revived the field 17 years later.
- Deep Learning, Chapter 6: Deep Feedforward Networks (textbook) · Ian Goodfellow, Yoshua Bengio, Aaron Courville · The bridge from one perceptron to many. The XOR example appears explicitly as the motivation for hidden layers.
- TensorFlow Playground (interactive) · Daniel Smilkov · A more elaborate sibling of the toy above, with multiple layers, multiple datasets, and live training animations. The natural next step after this page.