Model Interpretability
A modern model is a billion-parameter black box. SHAP and LIME crack it open just enough to answer the regulator's, the doctor's, and the user's question: which features pushed this particular prediction, and in which direction?
For a single prediction, ask: "if I remove this feature, what happens?" That difference, averaged over every possible combination of the other features, is each feature's SHAP value — its fair share of the prediction.
SHAP values are additive: start from the model's baseline (its average prediction across the dataset), add each feature's contribution, and you land exactly on the model's output for this case. It's a literal accounting trail.
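A minimal sketch of that accounting trail using the shap library's TreeExplainer. A small single-output regression forest keeps the array shapes simple; the dataset and variable names are illustrative, not from the demo above.

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# A stand-in black box: any tree ensemble works with TreeExplainer.
X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
sv = explainer(X)                  # Explanation object: one vector of phi values per case

i = 0                              # pick a sample case
phi = sv.values[i]                 # each feature's contribution for this case
baseline = sv.base_values[i]       # E[f(x)], the model's average prediction

# The accounting trail: baseline + contributions lands on the model's output.
print(baseline + phi.sum())        # matches, up to floating-point error
print(model.predict(X[i:i+1])[0])
```

For a classifier, the same identity holds for each class output; only the array shapes gain a class dimension.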
LIME takes a different angle: fit a simple, interpretable model (a small linear regression) to the local neighborhood of one prediction. The simple model's coefficients are the explanation.
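A stripped-down sketch of that idea, not the lime package itself: perturb around the case, query the black box, weight the samples by proximity, and read the explanation off a weighted linear fit. `predict_fn`, `X_background`, and the kernel settings are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_style_explanation(predict_fn, x, X_background, n_samples=2000, kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate to the black box in the neighborhood of one case x."""
    rng = np.random.default_rng(seed)
    scale = X_background.std(axis=0)

    # 1. Perturb: sample a cloud of points around the case being explained.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    # 2. Query the black box on every perturbed point.
    y = predict_fn(Z)
    # 3. Weight each point by its proximity to x (closer points count more).
    dist = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit the simple surrogate; its coefficients are the explanation.
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```

Usage might look like `lime_style_explanation(lambda Z: clf.predict_proba(Z)[:, 1], X[0], X)`; the real lime package layers discretization, feature selection, and text/image variants on top of the same recipe.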
Regulated industries — credit scoring (Equal Credit Opportunity Act), insurance, healthcare, hiring — require per-decision explanations. SHAP became the de facto standard because it rests on a clean set of axioms (the Shapley properties) while remaining computationally feasible.
Even outside regulated settings, interpretability matters for debugging ("why did my fraud model flag this?"), trust ("why does the model recommend this treatment?"), and discovering bias ("does our hiring model rely on zip code?"). Every deployed ML team eventually needs one of these tools.
- Pick a sample case. The waterfall starts at the baseline E[f(x)] (model's average) and walks step-by-step through each feature's SHAP value, ending at this case's actual prediction. Bars are sorted by magnitude — biggest pushers at the top.
- Switch to What-if. Click any feature toggle to "remove" it (replace with its average) and watch the prediction shift. The size of that shift is roughly that feature's SHAP value, a tiny taste of how Shapley values are actually computed (see the sketch after this list).
- Switch to LIME bars. Same case, different method: a local linear surrogate's coefficients. Magnitudes line up with SHAP for clear-cut features but can diverge for interacting ones.
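A rough version of that what-if toggle, with `predict_fn` and `X_background` as illustrative assumptions: "remove" one feature by replacing it with its dataset mean and measure how the prediction moves. This single shift only approximates the SHAP value, which averages the effect over every coalition of the other features.

```python
import numpy as np

def what_if_shift(predict_fn, x, X_background, feature_index):
    """Prediction shift from 'removing' one feature (replacing it with its dataset mean)."""
    x_removed = x.copy()
    x_removed[feature_index] = X_background[:, feature_index].mean()
    before = predict_fn(x.reshape(1, -1))[0]
    after = predict_fn(x_removed.reshape(1, -1))[0]
    return before - after
```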
- f(x) · the model's prediction for this case (e.g., 0.83 = 83% fraud probability).
- E[f(x)] · the model's average prediction over the entire dataset — the "baseline" or starting point of the waterfall.
- φi (phi-i) · the SHAP value for feature i in this case. Positive = pushed the prediction up; negative = pushed it down.
- The accounting: f(x) = E[f(x)] + Σ φi. The baseline plus the sum of all feature contributions equals the prediction, exactly.
- The intuition: imagine asking "if I drop feature i, what happens?" SHAP averages that question across every possible coalition of the other features. The answer is feature i's fair share, made concrete by the brute-force sketch below.
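To make that intuition concrete, here is a brute-force Shapley computation under interventional removal: "absent" features are filled in from background rows, and every coalition of the other features is enumerated with the standard Shapley weights. Exponential in the feature count, so purely illustrative; all names are hypothetical.

```python
from itertools import combinations
from math import factorial
import numpy as np

def exact_shap(predict_fn, x, X_background):
    """Exact Shapley values for one case x, by enumerating every coalition of features."""
    n = x.shape[0]

    def value(coalition):
        # Features in the coalition take x's values; the rest keep background values,
        # and the model output is averaged over the background rows.
        rows = X_background.copy()
        rows[:, list(coalition)] = x[list(coalition)]
        return predict_fn(rows).mean()

    phi = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(rest, size):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))   # marginal contribution of i
    return phi
```

With `baseline = predict_fn(X_background).mean()`, the sum `baseline + exact_shap(predict_fn, x, X_background).sum()` reproduces `predict_fn(x.reshape(1, -1))[0]` up to floating-point error, which is exactly the accounting identity above.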
US law requires lenders to give "specific reasons" for adverse credit decisions. SHAP attributions are the standard method for generating these reason codes — automated, auditable, and per-application.
"This model says 78% chance of sepsis — but why?" SHAP-explained alerts let clinicians sanity-check and override the model when the explanation looks wrong.
When a payment is flagged, the analyst's queue shows the top SHAP-attributed features — was it the merchant, the amount, the IP, or the velocity? Cuts investigation time dramatically.
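Turning one case's SHAP vector into that ranked queue is a few lines; the helper below is a hypothetical sketch, with `phi` holding the case's SHAP values and `feature_names` matching the model's inputs.

```python
import numpy as np

def top_reasons(phi, feature_names, k=3):
    """Rank features by the magnitude of their SHAP contribution for one case."""
    order = np.argsort(-np.abs(phi))[:k]
    return [(feature_names[i], float(phi[i])) for i in order]
```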
Aggregating SHAP attributions across protected subgroups (race, gender) reveals whether a model is leaning on proxies that produce disparate treatment. Now standard in fair-lending and HR audits.
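One hedged way to run such an audit, assuming you already have a per-row SHAP matrix and a column of subgroup labels (all names hypothetical): compare each feature's mean absolute attribution across subgroups and look for features that matter far more for one group than another.

```python
import numpy as np
import pandas as pd

def mean_abs_shap_by_group(shap_values, feature_names, group_labels):
    """Mean |SHAP| per feature within each protected subgroup."""
    df = pd.DataFrame(np.abs(shap_values), columns=feature_names)
    df["group"] = list(group_labels)
    return df.groupby("group").mean()
```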
- A Unified Approach to Interpreting Model Predictions (paper) · Lundberg & Lee (2017) · The original SHAP paper. Connects LIME, DeepLIFT, and several other feature-attribution methods under the Shapley-value framework.
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier (paper) · Ribeiro, Singh & Guestrin (2016) · The original LIME paper. Local linear approximations as a model-agnostic explanation.
- Interpretable Machine Learning (free book) · Christoph Molnar · The standard reference for interpretability methods. Covers SHAP, LIME, ICE plots, partial dependence, and more, with code.
- shap (Python library) · The reference implementation. Wide model support, beautiful plots, used in nearly every production interpretability stack.