jthomas.site// notebook · v.4.2026
Machine Learning, Visualized · Vol. XVI

An Image Classifier, end to end

Pixels in, label out. Follow the signal through conv blocks, pooling, a flatten, and a final softmax — the whole assembly that turns a 32×32 grid of brightnesses into "this is a circle, 87% sure."

The concept

An image classifier is a pipeline: image → feature maps → pooled features → flattened vector → class probabilities.
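To make the pipeline concrete, here is a minimal sketch that only tracks array shapes from stage to stage. It assumes a single-channel 32×32 input, "same"-padded convolutions, 2×2 max pooling, and two conv blocks. The channel counts (8 and 16), the class count of 3, and the function name `trace_shapes` are illustrative assumptions, not values taken from the demo.

```python
def trace_shapes(height, width, channels_per_block, num_classes):
    """Return (stage name, shape) pairs for a conv -> pool -> ... -> softmax pipeline.

    Assumes 'same'-padded convolutions (spatial size unchanged) and
    2x2 max pooling with stride 2 (spatial size halved).
    """
    shapes = [("input", (height, width, 1))]
    for i, channels in enumerate(channels_per_block, start=1):
        # A conv layer changes the number of channels, not the spatial size.
        shapes.append((f"conv{i} + ReLU", (height, width, channels)))
        # Pooling halves height and width and leaves the channel count alone.
        height, width = height // 2, width // 2
        shapes.append((f"pool{i}", (height, width, channels)))
    # Flattening unrolls every remaining value into one long vector.
    flat = height * width * channels_per_block[-1]
    shapes.append(("flatten", (flat,)))
    shapes.append(("logits", (num_classes,)))
    shapes.append(("softmax", (num_classes,)))
    return shapes


# Hypothetical configuration: two blocks with 8 and 16 filters, 3 classes.
stages = trace_shapes(32, 32, channels_per_block=[8, 16], num_classes=3)
for name, shape in stages:
    print(f"{name:14s} {shape}")
```

Under these assumptions the spatial size goes 32 → 16 → 8, and the flattened vector has 8 × 8 × 16 = 1024 entries.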

Each conv block applies a few learned filters and a ReLU; pooling shrinks the spatial dimensions while keeping the strongest signals. Assuming 2×2 pooling, each block halves the height and width, so after two blocks the original 32×32 image becomes a stack of 8×8 high-level feature maps (a third block would shrink it to 4×4).
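A single conv block can be sketched in plain Python on a toy 4×4 image. This is an illustration under assumptions, not the demo's code: the kernel is a hand-picked vertical-edge detector rather than a learned filter, the padding is zero-filled "same" padding, and the helper names are made up for the example.

```python
def conv2d_same(image, kernel):
    """Slide a k x k kernel over a 2D image, zero-padded so the output keeps the input size.

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:  # pixels outside the image count as zero
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out


def relu(fmap):
    """Clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Toy input: dark first column, bright everywhere else (an edge near the left side).
image = [[0, 1, 1, 1] for _ in range(4)]
# Hand-picked (not learned) filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

pooled = max_pool_2x2(relu(conv2d_same(image, kernel)))
print(pooled)  # [[3.0, 0.0], [3.0, 0.0]]
```

The 4×4 input shrinks to 2×2, and the strong response survives only in the left column, where the edge is. A trained network would learn its kernel values from data instead of having them written by hand.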

Those feature maps get flattened into a long vector and fed to a small MLP that produces logits — one per class. Softmax turns logits into probabilities. The class with the highest probability is the prediction.
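The classification head can be sketched the same way. As a simplification, a single linear layer stands in for the small MLP. The weights, the zero biases, and the third class label ("square") are made-up values for illustration, not parameters from a trained model.

```python
import math


def flatten(feature_maps):
    """Unroll a list of 2D feature maps into one flat vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def linear(x, weights, bias):
    """One logit per class: dot product of x with that class's weight row, plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn logits into probabilities; subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["circle", "cross", "square"]      # "square" is a hypothetical third class
feature_maps = [[[3.0, 0.0], [3.0, 0.0]]]   # one toy 2x2 feature map
weights = [[0.5, -0.2, 0.5, -0.2],          # made-up weights, one row per class
           [0.1, 0.4, 0.1, 0.4],
           [-0.3, 0.2, -0.3, 0.2]]
bias = [0.0, 0.0, 0.0]

x = flatten(feature_maps)
probs = softmax(linear(x, weights, bias))
best = max(range(len(probs)), key=probs.__getitem__)
prediction = labels[best]
print(prediction, [round(p, 3) for p in probs])
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether you take the largest logit or the largest probability. The probabilities matter when you want a confidence to report alongside the label.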

Why ML cares

This is about the simplest end-to-end architecture in computer vision. AlexNet, VGG, ResNet, and EfficientNet are all scaled-up versions of it: more layers, more channels, and, in the later designs, batch norm and residual connections.

Even today, with vision transformers and diffusion models in the headlines, the convolutional classifier is the default choice for any "does this image contain X?" task that doesn't justify a billion-parameter model.

Try this
  1. Pick the circle input. Hit Replay and watch the signal propagate stage by stage. The probabilities at the end commit to one class.
  2. Try the cross input. The early conv outputs look very different, but the same architecture, with the same learned weights, handles it. The filters stay fixed; what changes is how strongly each one responds to the new shape.
  3. Switch inputs at the end of a replay. The pipeline runs again with the new image; only the output probabilities change in real time.
LeNet-style architecture · shape annotations
Where you've seen this
Photo organization apps

Apple Photos' "people" and "categories" features run essentially this pipeline (much deeper, trained on millions of images) on every photo in your library, locally on the device.

Plant disease detection

Apps like Pl@ntNet point a phone camera at a leaf and identify the species; similar apps flag diseases. The model in such an app: a CNN classifier trained on a few hundred thousand expert-labeled photos.

Industrial quality control

Manufacturing lines use vision classifiers to spot defects in microchips, glass bottles, paint finishes — anywhere a camera can see and a label exists. Faster and more consistent than human inspectors.

Captchas (sort of)

"Click on all the squares with traffic lights" trains exactly this kind of network in the background. Every solve adds another labeled image to a vast corpus that quietly powers Google's vision systems.

Further reading