jthomas.site// notebook · v.4.2026
Machine Learning, Visualized · Vol. XXXIII

Model
Interpretability

A modern model is a billion-parameter black box. SHAP and LIME crack it open just enough to answer the regulator's, the doctor's, and the user's question: which features pushed this particular prediction which way?

The concept

For a single prediction, ask: "if I remove this feature, what happens?" That difference, averaged over every possible combination of the other features, is each feature's SHAP value — its fair share of the prediction.

SHAP values are additive: start from the model's baseline (its average prediction across the dataset), add each feature's contribution, and you land exactly on the model's output for this case. It's a literal accounting trail.
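For two features, the averaging is small enough to do by hand: remove each feature in both possible orders, average the two effects, and verify the accounting trail lands exactly on the prediction. A minimal sketch (the model and baseline numbers are made up for illustration):

```python
# Exact Shapley values for a two-feature toy model (hypothetical numbers).
def f(x1, x2):
    return x1 * x2          # pure interaction, so neither order tells the whole story

b1, b2 = 1.0, 0.5           # baselines: each feature's dataset mean
x1, x2 = 3.0, 2.0           # the case being explained

# Average "what does adding feature i do?" over both orders of arrival.
phi1 = 0.5 * ((f(x1, b2) - f(b1, b2)) + (f(x1, x2) - f(b1, x2)))
phi2 = 0.5 * ((f(b1, x2) - f(b1, b2)) + (f(x1, x2) - f(x1, b2)))

baseline = f(b1, b2)
# The accounting trail: baseline + contributions = prediction, exactly.
assert abs(baseline + phi1 + phi2 - f(x1, x2)) < 1e-9
```

Note the interaction term gets split fairly between the two features; that split is what the order-averaging buys you.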

LIME takes a different angle: fit a simple, interpretable model (a small linear regression) to the local neighborhood of one prediction. The simple model's coefficients are the explanation.
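That local-surrogate idea fits in a few lines of NumPy: perturb around the point, weight samples by proximity, and fit a weighted linear regression. A sketch, not LIME's actual implementation (the toy model, kernel width, and sample count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda X: X[:, 0] ** 2 + 3 * X[:, 1]   # hypothetical black-box model
x0 = np.array([1.0, 2.0])                  # the case being explained

# Sample a neighborhood around x0 and weight by a Gaussian proximity kernel.
Z = x0 + rng.normal(scale=0.5, size=(500, 2))
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / (2 * 0.5 ** 2))

# Weighted least squares: intercept + one slope per feature.
A = np.hstack([np.ones((len(Z), 1)), Z]) * np.sqrt(w)[:, None]
coef, *_ = np.linalg.lstsq(A, f(Z) * np.sqrt(w), rcond=None)
print(coef[1:])   # ≈ the local gradient of f at x0: (2·x0[0], 3)
```

The slopes are the explanation: locally, feature 1 moves the prediction about 3 units per unit of input, feature 0 about 2.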

Why ML cares

Regulated industries — credit scoring (Equal Credit Opportunity Act), insurance, healthcare, hiring — require per-decision explanations. SHAP became the de facto standard because it rests on a clean axiomatic foundation while remaining computationally feasible.

Even outside regulated settings, interpretability matters for debugging ("why did my fraud model flag this?"), trust ("why does the model recommend this treatment?"), and discovering bias ("does our hiring model rely on zip code?"). Every deployed ML team eventually needs one of these tools.

Try this
  1. Pick a sample case. The waterfall starts at the baseline E[f(x)] (model's average) and walks step-by-step through each feature's SHAP value, ending at this case's actual prediction. Bars are sorted by magnitude — biggest pushers at the top.
  2. Switch to What-if. Click any feature toggle to "remove" it (replace with its average) and watch the prediction shift. The size of that shift is roughly that feature's SHAP value — a tiny taste of how Shapley values are actually computed.
  3. Switch to LIME bars. Same case, different method: a local linear surrogate's coefficients. Magnitudes line up with SHAP for clear-cut features but can diverge for interacting ones.
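Step 2's mean-replacement trick is easy to sketch. For a linear model with independent features, the shift from replacing a feature with its dataset average is exactly that feature's SHAP value; for nonlinear models it's only an approximation. The dataset and coefficients below are made up:

```python
import numpy as np

# Tiny hypothetical dataset and a linear model.
X = np.array([[2.0, 1.0], [0.0, 3.0], [4.0, 5.0]])
model = lambda M: 0.5 * M[:, 0] + 0.2 * M[:, 1]

x = X[0]
baseline = model(X).mean()                 # E[f(x)] over the dataset
shifts = []
for i in range(X.shape[1]):
    x_off = x.copy()
    x_off[i] = X[:, i].mean()              # "toggle off": replace with its average
    shifts.append(model(x[None])[0] - model(x_off[None])[0])

# For this linear model, baseline + all shifts reproduces the prediction exactly.
print(baseline, shifts, model(x[None])[0])
```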
Plain-English math
  • f(x) · the model's prediction for this case (e.g., 0.83 = 83% fraud probability).
  • E[f(x)] · the model's average prediction over the entire dataset — the "baseline" or starting point of the waterfall.
  • φi (phi-i) · the SHAP value for feature i in this case. Positive = pushed the prediction up; negative = pushed it down.
  • The accounting: f(x) = E[f(x)] + Σ φi. The baseline plus the sum of all feature contributions equals the prediction, exactly.
  • The intuition: imagine asking "if I drop feature i, what happens?" SHAP averages that question across every possible coalition of other features. The answer is feature i's fair share.
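The coalition-averaging intuition can be written down directly. A brute-force sketch of the Shapley formula — it enumerates every subset, so it's exponential in the number of features and strictly toy-sized (real libraries use much faster approximations); the model and numbers are hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley(f, x, baseline):
    """Exact SHAP values by enumerating every coalition. O(2^n) — toy use only."""
    n = len(x)

    def eval_coalition(S):
        # Features in S keep their actual values; the rest sit at the baseline.
        return f([x[i] if i in S else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (eval_coalition(set(S) | {i}) - eval_coalition(set(S)))
    return phi

f = lambda v: v[0] * v[1] + 2 * v[0]        # hypothetical toy model
phi = shapley(f, x=[3.0, 2.0], baseline=[1.0, 0.5])
base = f([1.0, 0.5])
# Accounting check: base + sum(phi) equals f(x) exactly.
```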
  • SHAP waterfall · start from the model's average prediction (left). Each feature is a step up (oxblood) or down (ink) based on its contribution to this sample's prediction. The final stop on the right is the model's actual output.
  • What-if · turn features off in the left panel; the waterfall recomputes against the remaining set.
  • LIME bars · coefficients of a local linear approximation around the same point.
Where you've seen this · 4 examples
Credit scoring (FCRA / ECOA)

US law requires lenders to give "specific reasons" for adverse credit decisions. SHAP attributions are the standard method for generating these reason codes — automated, auditable, and per-application.

Medical decision support

"This model says 78% chance of sepsis — but why?" SHAP-explained alerts let clinicians sanity-check and override the model when the explanation looks wrong.

Fraud detection

When a payment is flagged, the analyst's queue shows the top SHAP-attributed features — was it the merchant, the amount, the IP, or the velocity? Cuts investigation time dramatically.

Bias auditing

Aggregate SHAP attributions across protected subgroups (race, gender) reveal whether a model is leaning on proxies for disparate treatment. Now standard in fair-lending and HR audits.

Further reading