jthomas.site// notebook · v.4.2026
Machine Learning, Visualized · Vol. VIII

The Perceptron

One neuron. Two weights and a bias. It draws a single straight line through the plane and calls one side "yes," the other "no." Crude — but it learns.

The concept

A perceptron is the simplest neural network: it computes w·x + b, then outputs 1 if that's positive, 0 otherwise.

Read the formula slowly: w·x is a dot product — multiply each weight by the matching input feature, then sum (w₁·x₁ + w₂·x₂). Add the bias b. If the result is positive, output 1; otherwise 0. Geometrically, w·x + b = 0 traces a line (a hyperplane in higher dimensions): the two weights set the line's tilt, the bias slides it along.
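That computation is three lines of NumPy. A minimal sketch (the values of w, b, and x here are illustrative, not taken from the demo):

```python
import numpy as np

def perceptron_output(w, b, x):
    """Compute w·x + b, then threshold: 1 if positive, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Example: a line tilted by w = (1, -1), slid along by b = 0.5.
w = np.array([1.0, -1.0])
b = 0.5
print(perceptron_output(w, b, np.array([2.0, 1.0])))  # w·x + b = 1.5  -> 1
print(perceptron_output(w, b, np.array([0.0, 2.0])))  # w·x + b = -1.5 -> 0
```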

The learning rule is brutally simple: when the perceptron is wrong on a point, nudge the weights toward the correct answer (add the input to the weights when the true label is 1, subtract it when the true label is 0, and shift the bias the same way). Repeat until every point is classified correctly. If the data is linearly separable, this provably converges after a finite number of mistakes.
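A minimal sketch of that loop in NumPy, assuming a learning rate of 1.0 and a small made-up dataset:

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron rule: on each mistake, move w and b toward the true label."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0
            if pred != yi:                  # wrong: nudge toward correctness
                w += lr * (yi - pred) * xi  # add xi if yi is 1, subtract if 0
                b += lr * (yi - pred)
                mistakes += 1
        if mistakes == 0:                   # converged: everything classified
            break
    return w, b

# Linearly separable toy data: class 1 wherever x1 + x2 > 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 0, 0, 1, 1])
w, b = train_perceptron(X, y)
print(w, b)
```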

Why ML cares

Every neuron in a modern neural network is, at its core, a perceptron with a smoother activation. Stack them in layers, train with backpropagation instead of the perceptron rule, and you get GPT and Gemini and image classifiers.

The perceptron's failure on XOR, the limitation that helped freeze neural-network research for a decade, is also why deep networks were invented. Stack two layers and the wall falls.
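You don't even need training to see the wall fall. A two-layer network with hand-wired weights (illustrative values, not learned) computes XOR as "OR, and not AND", a combination no single line can express:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

# Hand-wired two-layer network. The hidden units compute OR and AND;
# the output unit fires when OR is on but AND is off, which is XOR.
def xor_net(x):
    h = step(np.array([x[0] + x[1] - 0.5,       # OR:  fires if either input is 1
                       x[0] + x[1] - 1.5]))     # AND: fires only if both are 1
    return step(np.array([h[0] - h[1] - 0.5]))  # OR, and not AND

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x))[0])  # 0, 1, 1, 0
```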

Try this
  1. Pick the diagonal dataset and hit Train. Watch the line rotate as the perceptron rule fixes one mistake at a time. After a few seconds it lands on a separator.
  2. Switch to xor. Try training. Accuracy never gets above ~75% — XOR isn't linearly separable, so a single line can't do it. This is the wall that broke 1960s AI.
  3. Drag w₁ and w₂ manually. Watch the dark arrow (the weight vector) rotate; the decision line is always perpendicular to it. (A quick check of why appears below the caption.)
· The accent line is where w·x + b = 0. One side says "1," the other "0." Training nudges the line until it splits the classes — if it can. Misclassified points are circled.
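That perpendicularity falls straight out of the equation: if x and x′ both sit on the line, then w·x + b = 0 = w·x′ + b, so w·(x − x′) = 0, meaning w is orthogonal to every direction along the line. A numeric check, with arbitrary illustrative values for w and b:

```python
import numpy as np

w = np.array([2.0, 1.0])
b = -3.0

# Two points on the line w·x + b = 0: pick x1, solve for x2.
point_on_line = lambda x1: np.array([x1, (-b - w[0] * x1) / w[1]])
a, c = point_on_line(0.0), point_on_line(5.0)

direction = c - a            # a vector lying along the decision line
print(np.dot(w, direction))  # 0.0: w is perpendicular to the line
```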
Where you've seen this · 4 examples
Logistic regression

Replace the hard step with a smooth sigmoid and you have logistic regression — still the workhorse classifier in fraud detection, click-through prediction, and medical risk scores. Same geometry: a hyperplane separating two classes.
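The swap is one line. A sketch in NumPy, reusing the same illustrative w, b, and x as the earlier examples:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.0, -1.0])  # same geometry as before: a hyperplane
b = 0.5

x = np.array([2.0, 1.0])
print(sigmoid(np.dot(w, x) + b))  # ~0.82: a probability, not a hard 0/1
```

In practice the training objective changes too (log-loss via gradient descent rather than the perceptron rule), but the decision boundary is still w·x + b = 0.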

Single-neuron output layers

The last layer of a binary classifier — "is this email spam?" "does this CT scan show a tumor?" — is essentially a perceptron operating on features extracted by deeper layers. Modern networks compose many of these.
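A schematic of that composition; the "features" here are made-up stand-ins for what deeper layers would actually produce:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pretend these activations came out of the deeper layers of a CT-scan model.
features = np.array([0.9, 0.2, 0.6])

# The output layer is one perceptron-shaped unit on those features:
# a dot product, a bias, and a squashing activation.
w_out = np.array([2.5, -1.0, 1.2])
b_out = -1.5
print(sigmoid(np.dot(w_out, features) + b_out))  # probability of "tumor"
```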

Spam filters and ad ranking

Linear classifiers powered web-scale filtering through the 2000s. Their interpretability — every weight is "how much does this feature pull toward spam?" — is still why they appear in audit-sensitive ML pipelines.
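A sketch of that read-out, with invented feature names and weights rather than a trained model: the sign of each weight says which class the feature pulls toward, the magnitude says how hard.

```python
import numpy as np

feature_names = ["contains 'free'", "sender in contacts", "num_links"]
w = np.array([1.8, -2.4, 0.6])  # illustrative values, not a trained model

# Rank features by how strongly they pull, in either direction.
for name, weight in sorted(zip(feature_names, w), key=lambda t: -abs(t[1])):
    pull = "toward spam" if weight > 0 else "away from spam"
    print(f"{name:>20}: {weight:+.1f} ({pull})")
```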

Geometric ML

The perceptron's view of classification — find the right hyperplane — is the foundation of support-vector machines, max-margin classifiers, and even modern contrastive-learning objectives.

Further reading