Explainable Machine Learning

Overview

Explainable Machine Learning (XAI) refers to mechanisms that provide reasons and justifications understandable to humans for the predictions and decisions made by machine learning models. In particular, with “black-box” models such as deep learning, it is often difficult to understand why a specific outcome was produced, which creates the need for techniques that enhance transparency and reliability.

Explainability is considered important from several perspectives:

  • First, in domains that directly affect human lives, such as medical diagnosis and credit assessment, predictions without explanation are difficult to accept, making improved trustworthiness indispensable.

  • Second, by identifying which features the model relies on, one can detect data leakage and biases, leading to model improvement and debugging.

  • Furthermore, explainability is essential from the standpoint of fairness and ethics, helping to confirm that decisions do not unfairly disadvantage particular races, genders, or other groups.

  • In addition, under the EU’s GDPR, individuals have the “right to explanation” for automated decision-making, making explainability a legal requirement as well.

There are two main approaches to achieving explainability:

  1. Intrinsic interpretability, in which the model itself is simple and has a structure that is inherently easy for humans to understand.

  2. Post-hoc explanation, in which explanations are added externally to complex black-box models such as deep learning and ensemble learning.

In this way, explainable machine learning enhances model transparency and reliability, and serves as a crucial foundation for effective collaboration between humans and AI.

Intrinsic Models and Post-hoc Explanations

As mentioned earlier, approaches to explainable machine learning can broadly be divided into two categories: intrinsically interpretable models and post-hoc explanations. Each is described below.

<Intrinsic Models>

These are models that are inherently simple and whose structures can be directly understood by humans.

For example, linear regression and logistic regression are based on mathematical formulas, and the coefficients represent the influence of each feature, making them intuitive to interpret. In logistic regression, the sign and magnitude of the coefficients indicate whether a feature pushes the prediction in a positive or negative direction.

Decision tree models are expressed in a rule-based format. For instance, a decision such as “If income is greater than 5 million yen and age is below 40, then approve the loan” can be directly read by humans from the branching structure. A drawback, however, is that if the tree becomes too deep, readability decreases.

Generalized Additive Models (GAMs) build models by summing the effects of each feature, allowing non-linear functions (such as splines) to be incorporated. As a result, they provide a good balance between flexibility and interpretability.

<Post-hoc Explanations>

This approach is used for models such as deep learning and ensemble methods, which are too complex to be directly understood. In these cases, explanations are added externally.

Feature importance is a representative method. In models such as Random Forests and XGBoost, metrics like Gini importance or permutation importance are used to measure which features most strongly influence the predictions. However, if there are strong correlations among features, interpretations may become distorted.

Surrogate models are another approach, where the inputs and outputs of a complex model are used to train a simpler model (for example, a decision tree) that approximates and explains the behavior of the black box.

Visualization methods are also important. In image recognition, for instance, saliency maps show which pixels of an input image influenced the prediction, while Grad-CAM uses the gradients flowing into the last convolutional layer of a CNN to highlight the influential regions as a heatmap.
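
As a minimal sketch of the saliency-map idea (assuming PyTorch/torchvision, a pretrained ResNet-18, and a placeholder image file "your_image.jpg"; a Grad-CAM reference implementation follows in the implementation section below), the gradient of the predicted-class score with respect to the input pixels can be inspected directly:

# Saliency-map sketch: gradient of the top-class score w.r.t. the input pixels
# (assumes torch/torchvision are installed; "your_image.jpg" is a placeholder path)
import torch, torchvision
from PIL import Image
from torchvision import transforms

model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
x = preprocess(Image.open("your_image.jpg").convert("RGB")).unsqueeze(0)
x.requires_grad_(True)

score = model(x)[0].max()                      # score of the predicted class
score.backward()                               # gradients w.r.t. the input pixels
saliency = x.grad[0].abs().max(dim=0).values   # [H, W]: per-pixel maximum over channels
print(saliency.shape)                          # visualize with matplotlib's imshow, for example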

In natural language processing, attention visualization in Transformer models makes it possible to understand which words are attending to which other words.
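
A minimal sketch of extracting such attention weights (assuming the Hugging Face transformers package and the bert-base-uncased checkpoint) is shown below; each attention matrix can then be rendered as a heatmap over the token sequence.

# Attention-extraction sketch (assumes: pip install transformers torch)
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True).eval()

inputs = tokenizer("Explainable machine learning builds trust.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
attn = outputs.attentions[-1][0, 0]   # last layer, first head: [seq_len, seq_len]

# For each token, print the token it attends to most strongly
for i, tok in enumerate(tokens):
    j = int(attn[i].argmax())
    print(f"{tok:>12} -> {tokens[j]} ({attn[i, j].item():.2f})")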

Furthermore, local explanations, which focus on individual predictions, are widely used. LIME generates synthetic data in the neighborhood of the target data point, passes them through the black-box model, and then approximates the local behavior with a simple linear model to clarify which features most strongly influenced that prediction. SHAP, on the other hand, applies the concept of Shapley values from game theory to fairly allocate the contribution of each feature to the prediction. The advantage of SHAP is that it can be used both for global explanations (overall feature importance) and local explanations (individual predictions), and that positive and negative contributions can be intuitively understood.


Implementation Examples

Below are examples of implementations for these approaches.

1. Intrinsically Interpretable Models (Intrinsic)

1-1. Linear Regression: Reading Influence through Coefficients

# Regression: coefficients indicate how much the target changes per unit increase in each feature
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
import pandas as pd

X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=0)
model = LinearRegression().fit(X, y)

coef = pd.Series(model.coef_, index=[f"x{i}" for i in range(X.shape[1])]).sort_values(key=abs, ascending=False)
print("Intercept:", model.intercept_)
print("Coefficients:\n", coef)

1-2. Logistic Regression: Sign and Magnitude Indicate Contribution Direction and Strength

# Classification: coefficient sign (+/-) and magnitude indicate contribution toward class
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import pandas as pd

data = load_breast_cancer()
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=500))
]).fit(data.data, data.target)

coef = pd.Series(pipe.named_steps["clf"].coef_[0], index=data.feature_names).sort_values(key=abs, ascending=False)
print("Coefficients (positive -> contribution toward malignant class):\n", coef.head(10))

1-3. Decision Tree: Extracting if-then Rules

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

rules = export_text(tree, feature_names=load_breast_cancer().feature_names.tolist())
print(rules)  # Human-readable if-then branching rules

1-4. “GAM-like” Implementation (Spline + Additive Linear Model)

# Approximate GAM: apply spline transformation to each feature → additive model in scikit-learn
from sklearn.preprocessing import SplineTransformer
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=400, n_features=3, noise=8, random_state=42)
n_features = X.shape[1]

preproc = ColumnTransformer([
    (f"spline_{i}", SplineTransformer(degree=3, n_knots=6), [i]) for i in range(n_features)
])

gam_like = Pipeline([
    ("spline", preproc),
    ("ridge", Ridge(alpha=1.0))
]).fit(X, y)

print("R^2:", gam_like.score(X, y))  # Each feature effect curve can be visualized to show contribution

2. Post-hoc Explanations

2-1. Feature Importance (Gini & Permutation)

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
import pandas as pd
import numpy as np

data = load_breast_cancer()
X, y = data.data, data.target

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Gini Importance
gini_imp = pd.Series(rf.feature_importances_, index=data.feature_names).sort_values(ascending=False)
print("Gini Importance:\n", gini_imp.head(10))

# Permutation Importance (model-agnostic, based on the performance drop when a feature is shuffled;
# avoids the cardinality bias of impurity-based importance, but can still be distorted by strongly correlated features)
pi = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
perm_imp = pd.Series(pi.importances_mean, index=data.feature_names).sort_values(ascending=False)
print("\nPermutation Importance:\n", perm_imp.head(10))

2-2. Surrogate Model: Approximating a Black-box

# Example: XGBoost (black box) → approximate predictions with decision tree for explanation
import xgboost as xgb
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

# Black-box (predicting probabilities here)
bst = xgb.XGBClassifier(n_estimators=300, max_depth=4, subsample=0.9, colsample_bytree=0.9, random_state=0)
bst.fit(X_train, y_train)
blackbox_pred = bst.predict_proba(X_train)[:, 1]

# Surrogate: train regression tree on black-box predictions
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, blackbox_pred)
print(export_text(surrogate, feature_names=list(data.feature_names)))
print("Surrogate R^2 on train:", surrogate.score(X_train, blackbox_pred))

2-3. LIME (Local Explanation for a Single Instance)

# pip install lime
from lime.lime_tabular import LimeTabularExplainer
import numpy as np

explainer = LimeTabularExplainer(
    training_data=X_train,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),  # ["malignant", "benign"], matching the 0/1 target encoding
    discretize_continuous=True,
    mode="classification"
)

idx = 0  # Sample to explain (first in test set)
exp = explainer.explain_instance(
    data_row=X_test[idx],
    predict_fn=bst.predict_proba
)
print("LIME explanation for one sample:")
for feature, weight in exp.as_list()[:10]:
    print(f"{feature}: {weight:+.3f}")

2-4. SHAP (Global & Local Explanations)

# pip install shap
import shap
import numpy as np
import pandas as pd

explainer = shap.TreeExplainer(bst)
shap_values = explainer.shap_values(X_test)

# Local explanation for one instance (force_plot available in Jupyter)
sample_i = 0
print("Base value (expected output):", explainer.expected_value)
contrib = shap_values[sample_i]
top_idx = np.argsort(np.abs(contrib))[-10:][::-1]
for i in top_idx:
    print(f"{data.feature_names[i]}: {contrib[i]:+.4f}")

# Global importance (mean absolute SHAP value)
mean_abs = np.mean(np.abs(shap_values), axis=0)
glob_imp = pd.Series(mean_abs, index=data.feature_names).sort_values(ascending=False)
print("\nGlobal importance by mean |SHAP|:\n", glob_imp.head(10))

2-5. Partial Dependence Plots (PDP) – Average Effect Visualization

# Shows how the model output changes on average as a feature increases
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

features = [0, 3]  # Example: first and fourth features
disp = PartialDependenceDisplay.from_estimator(bst, X, features=features, feature_names=data.feature_names)
plt.show()

3. (Reference) Visualizing Image Models with Grad-CAM (Minimal Example)

# Minimal Grad-CAM implementation for CNNs (highlight attention areas for classification results)
# pip install torch torchvision pillow
import torch, torchvision
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

# Model and target layer (last convolutional layer)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1).eval()
target_layer = model.layer4[-1].conv2  # e.g., last block convolution

# Input preprocessing
preprocess = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225]),
])
img = Image.open("your_image.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)

# Hooks to capture feature maps & gradients
feats = []
grads = []
def fwd_hook(m, i, o): feats.append(o.detach())
def bwd_hook(m, gi, go): grads.append(go[0].detach())
h1 = target_layer.register_forward_hook(fwd_hook)
h2 = target_layer.register_full_backward_hook(bwd_hook)

# Prediction & backpropagation on target class
logits = model(x)
cls = logits.argmax(dim=1)
model.zero_grad()
logits[0, cls].backward()

A = feats[0][0]           # [C,H,W]
G = grads[0][0]           # [C,H,W]
w = G.mean(dim=(1,2))     # [C] → weights
cam = (w[:,None,None] * A).sum(0)  # [H,W]
cam = F.relu(cam)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-6)

# Enlarge cam to 224x224 and overlay on original image (code omitted)
h1.remove(); h2.remove()
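
The upscaling-and-overlay step marked as omitted above could be completed with a sketch like the following (assuming matplotlib; it reuses cam and img from the code above):

# Sketch of the omitted step: upsample the CAM to the input size and blend it over the image
import matplotlib.pyplot as plt

cam_up = F.interpolate(cam[None, None], size=(224, 224), mode="bilinear", align_corners=False)[0, 0]
plt.imshow(img.resize((224, 224)))
plt.imshow(cam_up.numpy(), cmap="jet", alpha=0.4)   # heatmap overlay
plt.axis("off")
plt.show()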

Application Cases

1. Medical Image Diagnosis (Pneumonia Detection in Chest X-rays)

  • Objective: Determine the presence or absence of lesions and assist radiological interpretation

  • Model: ResNet-based CNN

  • Explanation: Grad-CAM (heatmap of regions contributing to lesion detection)

  • Insights: Consistent strong responses near the tracheal bifurcation and lower right lung field → false positives caused by heart shadows or medical devices

  • Decision-making: Introduced into radiology double-check workflow (use heatmap threshold to assign review priority)

2. Credit Scoring (Loan Risk Assessment)

  • Objective: Estimate default risk and ensure accountability

  • Model: XGBoost / LightGBM

  • Explanation: SHAP (individual applicant contributions, global feature importance ranking)

  • Insights: Debt ratio, delinquency count, and years of employment are main factors. No excessive contribution from specific demographic attributes, but overlapping effects of correlated derived features detected

  • Decision-making: For borderline applicants, attach SHAP-based explanations (e.g., repayment plan adjustments or credit limit changes) to internal reviews

3. Churn Prediction (Subscription / Telecom / Video Streaming)

  • Objective: Enable proactive retention of high-risk customers

  • Model: CatBoost (robust to categorical variables and missing values)

  • Explanation: Permutation Importance + PDP (partial dependence) + LIME (personalized textual explanations)

  • Insights: Strong signals from decline in usage time over past 4 weeks, sudden bill increases, and unresolved customer service tickets

  • Decision-making: PDP shows that increasing usage from 2h → 4h per week lowers churn probability by 7pt. LIME extracts individual drivers, enabling automated delivery of personalized coupons/content offers

4. Manufacturing Anomaly Detection (Quality Control)

  • Objective: Predict defect occurrence from multivariate sensor time series

  • Model: Isolation Forest / XGBoost (features include statistical and frequency domain signals)

  • Explanation: SHAP (top contributing sensors for anomaly detection), PDP (safe parameter ranges)

  • Insights: Large deviations in heating zone 3 temperature combined with conveyor rotation fluctuations sharply increase defect rates

  • Decision-making: Redefined control logic thresholds as functions of top SHAP features; updated auto-shutdown conditions accordingly

5. Hiring Screening Bias Audit

  • Objective: Verify fairness of automated resume screening scores

  • Model: Logistic Regression (starting with Intrinsic for transparency)

  • Explanation: Coefficient signs/magnitudes, group-specific AUC, counterfactual tests (e.g., flip gender while holding all other attributes constant; see the sketch after this list)

  • Insights: University name acted as an overvalued proxy feature; actual explanatory power came from interaction of years of experience × job type

  • Decision-making: Removed proxy features, visualized experience curves in a GAM-like form, and standardized HR approval rules with interpretable criteria
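
A counterfactual check of the kind used in this case might look like the following sketch (purely illustrative: screening_model, applicant_features, and the 0/1-encoded gender column are hypothetical names, not part of the case above):

# Hypothetical counterfactual fairness check: flip the gender field, keep all other attributes fixed,
# and measure how much the screening score changes per applicant.
import pandas as pd

def counterfactual_gender_gap(model, applicants: pd.DataFrame, gender_col: str = "gender") -> pd.Series:
    original = model.predict_proba(applicants)[:, 1]
    flipped = applicants.copy()
    flipped[gender_col] = 1 - flipped[gender_col]   # assumes gender is encoded as 0/1
    counterfactual = model.predict_proba(flipped)[:, 1]
    return pd.Series(counterfactual - original).describe()  # a consistently non-zero gap means the score responds directly to gender

# Example (hypothetical objects):
# print(counterfactual_gender_gap(screening_model, applicant_features))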

6. Recommendation Explanations (E-commerce)

  • Objective: Increase customer trust in product suggestions and improve conversion rates

  • Model: Dual-tower model + gradient boosting reranker

  • Explanation: SHAP (contribution of reranker), case-based “why this item suits you” reasons (price range, co-viewing behavior, size compatibility)

  • Insights: Recommendations are formed by interplay of recent browsing, seasonal factors, and price tolerance. Displaying explanations in UI improved CTR by +5–8%

  • Decision-making: Display top 3 factors as badges (e.g., “Matches past purchases,” “Size in stock,” “Discount this week”)

7. Call Center Auto-Summarization and Root Cause Analysis (NLP)

  • Objective: Summarize inquiries and visualize dissatisfaction drivers

  • Model: Transformer-based summarization + topic modeling (BERTopic)

  • Explanation: Attention visualization (sentences influencing summary), SHAP (key phrases driving topic assignment)

  • Insights: Main complaint drivers are delivery delays × stockouts; ambiguous wording in FAQ exacerbates confusion

  • Decision-making: Revised FAQ and strengthened stock alert integration; daily dashboard review of summaries and root causes

8. Community Analysis (Online Communities / SNS)

  • Objective: Explain drivers of “enthusiasm score” and thread diffusion to inform activation strategies

  • Model: XGBoost (enthusiasm score regression / diffusion probability classification)

  • Explanation:

    • SHAP (global/local): which indicators (posting frequency, reply network centrality, positive/negative emotions, topic novelty) contribute

    • PDP: thresholds where growth slows (e.g., activity frequency, closeness centrality)

    • LIME: case-specific explanations of why a user/thread scored highly

  • Insights:

    • “Posting at times when replies are likely × distance to central users” is a main driver of enthusiasm

    • Negative emotions drive short-term diffusion but harm long-term retention (confirmed by PDP diminishing returns)

  • Decision-making:

    • Align event announcements with times central users are active

    • Provide newcomers with templates/tags that facilitate getting replies

    • Dashboard highlights “actionable levers” sorted by SHAP contribution

9. Demand Forecasting and Price Optimization (Retail)

  • Objective: Weekly demand forecasting and visualization of price elasticity

  • Model: Gradient Boosting / Prophet + XGBoost

  • Explanation: SHAP (promotion/weather/price contributions), ICE/PDP (demand curves under price changes)

  • Insights: Discounts less effective in rainy weather; shelf space expansion more impactful before holidays

  • Decision-making: Back-calculated weekly discount levels using PDP, simultaneously optimized inventory allocation

Practical Tips (Common Across Domains)

  • Design: Start with Intrinsic (regression/decision tree/GAM) → add black-box + post-hoc methods only where accuracy shortfall remains

  • Visualization: For executives, present Top-N SHAP + PDP as a single “control panel.” For frontline staff, provide LIME-based textual explanations

  • Validation: Complement impurity-based importance with permutation importance, and interpret both with care when features are strongly correlated; re-train with time-series splits to check for leakage (see the sketch after this list)

  • Deployment: Expose explanation outputs via API and display them directly in dashboards (cases/user details)
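
For the validation tip, a minimal sketch of the time-ordered re-training check (assuming a feature matrix X and target y sorted by time, and any scikit-learn compatible model; the breast-cancer data used earlier is not a time series, so treat this as a template):

# Leakage check sketch: if performance on time-ordered splits drops sharply compared with a shuffled split,
# some features probably leak future information.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

model = RandomForestClassifier(n_estimators=200, random_state=0)

random_cv = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
time_cv = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))

print("Shuffled-split accuracy    :", random_cv.mean())
print("Time-ordered split accuracy:", time_cv.mean())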

