Model Interpretability
A modern model is a billion-parameter black box. SHAP and LIME crack it open just enough to answer the regulator's, the doctor's, and the user's question: which features pushed this particular prediction, and in which direction?
For a single prediction, ask: "if I remove this feature, what happens?" That difference, averaged over every possible combination of the other features, is each feature's SHAP value — its fair share of the prediction.
SHAP values are additive: start from the model's baseline (its average prediction across the dataset), add each feature's contribution, and you land exactly on the model's output for this case. It's a literal accounting trail.
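A minimal sketch of that accounting trail using the shap library's TreeExplainer. A small single-output regression forest keeps the array shapes simple; the dataset and variable names are illustrative, not from the demo above.

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# A stand-in black box: any tree ensemble works with TreeExplainer.
X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
sv = explainer(X)                  # Explanation object: one vector of phi values per case

i = 0                              # pick a sample case
phi = sv.values[i]                 # each feature's contribution for this case
baseline = sv.base_values[i]       # E[f(x)], the model's average prediction

# The accounting trail: baseline + contributions lands on the model's output.
print(baseline + phi.sum())        # matches, up to floating-point error
print(model.predict(X[i:i+1])[0])
```

For a classifier, the same identity holds for each class output; only the array shapes gain a class dimension.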
LIME takes a different angle: fit a simple, interpretable model (a small linear regression) to the local neighborhood of one prediction. The simple model's coefficients are the explanation.
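A stripped-down sketch of that idea, not the lime package itself: perturb around the case, query the black box, weight the samples by proximity, and read the explanation off a weighted linear fit. `predict_fn`, `X_background`, and the kernel settings are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_style_explanation(predict_fn, x, X_background, n_samples=2000, kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate to the black box in the neighborhood of one case x."""
    rng = np.random.default_rng(seed)
    scale = X_background.std(axis=0)

    # 1. Perturb: sample a cloud of points around the case being explained.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    # 2. Query the black box on every perturbed point.
    y = predict_fn(Z)
    # 3. Weight each point by its proximity to x (closer points count more).
    dist = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit the simple surrogate; its coefficients are the explanation.
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```

Usage might look like `lime_style_explanation(lambda Z: clf.predict_proba(Z)[:, 1], X[0], X)`; the real lime package layers discretization, feature selection, and text/image variants on top of the same recipe.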
Regulated industries — credit scoring (Equal Credit Opportunity Act), insurance, healthcare, hiring — require per-decision explanations. SHAP became the de facto standard because it rests on a clean set of axioms (the Shapley properties) while remaining computationally feasible.
Even outside regulated settings, interpretability matters for debugging ("why did my fraud model flag this?"), trust ("why does the model recommend this treatment?"), and discovering bias ("does our hiring model rely on zip code?"). Every deployed ML team eventually needs one of these tools.
- Pick a sample case. The waterfall starts at the baseline E[f(x)] (model's average) and walks step-by-step through each feature's SHAP value, ending at this case's actual prediction. Bars are sorted by magnitude — biggest pushers at the top.
- Switch to What-if. Click any feature toggle to "remove" it (replace with its average) and watch the prediction shift. The size of that shift is roughly that feature's SHAP value, a tiny taste of how Shapley values are actually computed (see the sketch after this list).
- Switch to LIME bars. Same case, different method: a local linear surrogate's coefficients. Magnitudes line up with SHAP for clear-cut features but can diverge for interacting ones.
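A rough version of that what-if toggle, with `predict_fn` and `X_background` as illustrative assumptions: "remove" one feature by replacing it with its dataset mean and measure how the prediction moves. This single shift only approximates the SHAP value, which averages the effect over every coalition of the other features.

```python
import numpy as np

def what_if_shift(predict_fn, x, X_background, feature_index):
    """Prediction shift from 'removing' one feature (replacing it with its dataset mean)."""
    x_removed = x.copy()
    x_removed[feature_index] = X_background[:, feature_index].mean()
    before = predict_fn(x.reshape(1, -1))[0]
    after = predict_fn(x_removed.reshape(1, -1))[0]
    return before - after
```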
- f(x) · the model's prediction for this case (e.g., 0.83 = 83% fraud probability).
- E[f(x)] · the model's average prediction over the entire dataset — the "baseline" or starting point of the waterfall.
- φi (phi-i) · the SHAP value for feature i in this case. Positive = pushed the prediction up; negative = pushed it down.
- The accounting: f(x) = E[f(x)] + Σ φi. The baseline plus the sum of all feature contributions equals the prediction, exactly.
- The intuition: imagine asking "if I drop feature i, what happens?" SHAP averages that question across every possible coalition of the other features. The answer is feature i's fair share, made concrete by the brute-force sketch below.
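To make that intuition concrete, here is a brute-force Shapley computation under interventional removal: "absent" features are filled in from background rows, and every coalition of the other features is enumerated with the standard Shapley weights. Exponential in the feature count, so purely illustrative; all names are hypothetical.

```python
from itertools import combinations
from math import factorial
import numpy as np

def exact_shap(predict_fn, x, X_background):
    """Exact Shapley values for one case x, by enumerating every coalition of features."""
    n = x.shape[0]

    def value(coalition):
        # Features in the coalition take x's values; the rest keep background values,
        # and the model output is averaged over the background rows.
        rows = X_background.copy()
        rows[:, list(coalition)] = x[list(coalition)]
        return predict_fn(rows).mean()

    phi = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(rest, size):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))   # marginal contribution of i
    return phi
```

With `baseline = predict_fn(X_background).mean()`, the sum `baseline + exact_shap(predict_fn, x, X_background).sum()` reproduces `predict_fn(x.reshape(1, -1))[0]` up to floating-point error, which is exactly the accounting identity above.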
US law requires lenders to give "specific reasons" for adverse credit decisions. SHAP attributions are the standard method for generating these reason codes — automated, auditable, and per-application.
"This model says 78% chance of sepsis — but why?" SHAP-explained alerts let clinicians sanity-check and override the model when the explanation looks wrong.
When a payment is flagged, the analyst's queue shows the top SHAP-attributed features — was it the merchant, the amount, the IP, or the velocity? Cuts investigation time dramatically.
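Turning one case's SHAP vector into that ranked queue is a few lines; the helper below is a hypothetical sketch, with `phi` holding the case's SHAP values and `feature_names` matching the model's inputs.

```python
import numpy as np

def top_reasons(phi, feature_names, k=3):
    """Rank features by the magnitude of their SHAP contribution for one case."""
    order = np.argsort(-np.abs(phi))[:k]
    return [(feature_names[i], float(phi[i])) for i in order]
```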
Aggregating SHAP attributions across protected subgroups (race, gender) reveals whether a model is leaning on proxies that produce disparate treatment. Now standard in fair-lending and HR audits.
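One hedged way to run such an audit, assuming you already have a per-row SHAP matrix and a column of subgroup labels (all names hypothetical): compare each feature's mean absolute attribution across subgroups and look for features that matter far more for one group than another.

```python
import numpy as np
import pandas as pd

def mean_abs_shap_by_group(shap_values, feature_names, group_labels):
    """Mean |SHAP| per feature within each protected subgroup."""
    df = pd.DataFrame(np.abs(shap_values), columns=feature_names)
    df["group"] = list(group_labels)
    return df.groupby("group").mean()
```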
- A Unified Approach to Interpreting Model Predictions (paper) · Lundberg & Lee (2017) · The original SHAP paper. Connects LIME, DeepLIFT, and several other feature-attribution methods under the Shapley-value framework.
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier (paper) · Ribeiro, Singh & Guestrin (2016) · The original LIME paper. Local linear approximations as a model-agnostic explanation.
- Interpretable Machine Learning (free book) · Christoph Molnar · The standard reference for interpretability methods. Covers SHAP, LIME, ICE plots, partial dependence, and more, with code.
- shap (Python library) · The reference implementation. Wide model support, beautiful plots, used in nearly every production interpretability stack.