Shapley Value
The Shapley value is a concept from cooperative game theory proposed by Lloyd Shapley in 1953. It provides a fair method for determining how much of the total reward each player in a cooperative setting should receive.
This idea has also been applied in the field of machine learning, where each feature is considered a player, and the change in prediction score or profit is viewed as the reward. Using this framework, one can quantitatively evaluate how much each feature has contributed to the final prediction result.
For the Shapley value to function as a fair evaluation of contribution, it must satisfy the following four axioms (conditions):
- Efficiency: The sum of the contributions (Shapley values) of all features must equal the difference between the model's prediction and the baseline (average) prediction. In other words, no part of the prediction is left unassigned to a feature.
- Symmetry: If two features contribute equally in every combination, they should be assigned the same Shapley value.
- Dummy player: If a feature does not affect the prediction in any combination, its Shapley value must be zero.
- Additivity: If two different models are combined, the Shapley value in the resulting model should be the sum of the Shapley values from each original model.
The Shapley value is the unique allocation method that satisfies all four of these axioms, which is why it is regarded as the theoretically fair way to assign contributions.
Calculation of Shapley Values
Shapley values follow the game-theoretic framework to evaluate how much each feature contributes to a model’s prediction on average.
Each feature is treated as a “player,” and the change in the prediction is regarded as a “reward.”
Specifically, the process involves the following:
- Enumerate all possible orderings (permutations) of the features. For n features, there are n! (n factorial) permutations.
- For each permutation, add the features one by one and record the change in prediction at each step.
- Compute each feature's average marginal contribution across all permutations.
Mathematically, the Shapley value is given by:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]$$

Where:

- $\phi_i$: Shapley value (contribution) of feature $i$
- $N$: The set of all features
- $S$: A subset of $N$ that does not include feature $i$
- $v(S)$: The prediction value when only the features in $S$ are used
Although this method is mathematically rigorous, the computational cost grows exponentially as the number of features increases. Therefore, approximation techniques, such as the sampling-based estimators in the SHAP library, are commonly used in practice.
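To make the formula concrete, here is a minimal sketch that computes exact Shapley values by enumerating all subsets, using a toy additive value function. The function name `shapley_values` and the numbers are illustrative, not taken from any library, and the approach is only feasible for a small number of features.

import math
from itertools import combinations

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all subsets (feasible only for small n)."""
    phi = [0.0] * n_features
    all_features = set(range(n_features))
    for i in range(n_features):
        others = all_features - {i}
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                S = set(subset)
                # Weight |S|! (|N| - |S| - 1)! / |N|! from the formula above
                weight = (math.factorial(len(S)) * math.factorial(n_features - len(S) - 1)
                          / math.factorial(n_features))
                # Marginal contribution of feature i when added to coalition S
                phi[i] += weight * (value_fn(S | {i}) - value_fn(S))
    return phi

# Toy value function: an additive "model" over three features (illustrative numbers)
feature_effects = {0: 0.3, 1: 0.2, 2: -0.1}
v = lambda S: sum(feature_effects[j] for j in S)

print(shapley_values(v, 3))  # [0.3, 0.2, -0.1]: in an additive game, each feature receives exactly its own effect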
Intuitive Example: Loan Approval Model
To intuitively understand Shapley values (as implemented in SHAP), let’s consider a loan approval model.
This model predicts whether an individual’s loan application will be approved, based on personal attributes such as:
- Income
- Occupation
- Age
- Loan amount
Assume the model predicts that an applicant should be approved. Using SHAP, we can break down the contribution of each feature to this prediction:
- Income: +0.3 (High income positively contributed to approval)
- Occupation: +0.2 (Stable job status helped approval)
- Age: -0.1 (Slight negative effect from age)
- Loan Amount: -0.4 (Large loan request reduced the approval likelihood)
Adding up these contributions:
+0.3 + 0.2 - 0.1 - 0.4 = 0.0
This total represents the net change from the model’s base prediction (e.g., a neutral score), showing that the final prediction can be completely explained by the contributions of the individual features.
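As a minimal sketch of this decomposition, the snippet below checks the efficiency property with the hypothetical contributions above; the base value of 0.5 is an assumed, illustrative number, not part of the example.

# Efficiency check with the hypothetical contributions above
base_value = 0.5  # assumed baseline (the model's average prediction); illustrative only
contributions = {"Income": 0.3, "Occupation": 0.2, "Age": -0.1, "Loan Amount": -0.4}

prediction = base_value + sum(contributions.values())
print(prediction)  # -> 0.5: the final score equals the base value plus all feature contributions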
Thus, SHAP enables us to decompose a black-box prediction into understandable, quantifiable contributions, which significantly improves the model’s transparency and trustworthiness.
Applications of Shapley Values in Machine Learning
One of the most well-known applications of Shapley values in machine learning is through the use of SHAP (SHapley Additive exPlanations). SHAP is a tool based on the theory of Shapley values that enables users to quantitatively and intuitively visualize and explain how each feature contributes to individual predictions.
This library offers strong compatibility with gradient boosting algorithms such as XGBoost and LightGBM, and also supports a wide range of models including neural networks, linear models, and support vector machines (SVMs).
By using SHAP, the following types of applications become possible:
- Local interpretation: Visualizing the contribution of each feature to individual predictions
- Global interpretation: Analyzing feature importance trends across the entire dataset
- Model transparency: Verifying whether the model behaves as expected
- Accountability: Providing understandable justifications in domains such as business and healthcare where decision explanations are required
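As a minimal sketch of the first two items, SHAP's plotting API can be used roughly as follows, assuming a trained tree-based `model` and test data `X_test` as in the implementation example later in this article.

import shap

explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# Local interpretation: contribution of each feature to one individual prediction
shap.plots.waterfall(shap_values[0])

# Global interpretation: mean absolute contribution of each feature across the dataset
shap.plots.bar(shap_values)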
Other Applications of Shapley Values in Machine Learning
In addition to SHAP, Shapley values have a wide range of other applications in machine learning, including:
- Model comparison and selection: Shapley values can be used to compare how much different models rely on specific features.
- Anomaly detection: By detecting unusual patterns of feature contributions, Shapley values can help identify the root causes of anomalies.
- Fairness analysis: Ensuring that sensitive attributes such as age, gender, or race do not have undue influence on model predictions.
- Causal inference: Recent research has extended Shapley values to explain which variables contribute most to the effect of an intervention.
- Reward allocation in reinforcement learning: Shapley values can be used to fairly distribute rewards among multiple agents based on their contributions to a shared outcome.
SHAP has become a core technology in the field of Explainable AI (XAI), offering a principled, interpretable framework to understand and trust complex machine learning models.
Implementation Example
Below is a Python code example and visualization approach for interpreting the predictions of a loan approval model using SHAP (the code uses a sample dataset as a stand-in for real loan data).
1. Prerequisites: Install Required Libraries
Before running the code, make sure to install the necessary libraries:
pip install shap xgboost scikit-learn matplotlib
2. Model Training and Explanation with SHAP
import shap
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer # Using sample dataset
import matplotlib.pyplot as plt
# Load dataset (loan data should be used in a real scenario)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Train XGBoost model
model = xgb.XGBClassifier(eval_metric='logloss')
model.fit(X_train, y_train)
# Compute SHAP values
explainer = shap.Explainer(model)
shap_values = explainer(X_test)
3. Visualization with SHAP
Explaining a single prediction (force plot)
# Single prediction (e.g., first test data)
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0].values, X_test[0])
→ An interactive plot uses color to show which features pushed the prediction higher or lower
Visualize the overall impact of features (summary plot)
shap.summary_plot(shap_values, X_test)
→ A violin-style summary plot showing which features had the strongest effect on predictions across the entire dataset
Visualizing the relationship between a feature's value and its influence (dependence plot)
# Example: Relationship between the value of feature index=0 (first feature) and the SHAP value
shap.dependence_plot(0, shap_values.values, X_test)
→ Shows how the value of a feature relates to its effect on the prediction
Application Examples of Shapley Values (SHAP)
Below are specific examples of how Shapley values (via SHAP) are applied in real-world scenarios.
1. Banking: Loan Approval (Credit Scoring Model)
- Challenge: AI must be able to answer the question, "Why was this person denied a loan?", which is an issue of explainability.
- SHAP Application:
  - For each applicant, SHAP provides individualized explanations such as "The loan amount is too high" or "The credit score is too low."
  - Loan officers can better understand the model's reasoning, improving both accountability and fairness.
- Adoption:
  - SHAP has been implemented by several financial institutions in Europe, the U.S., and Japan.
2. Healthcare: Disease Risk Prediction
- Challenge: When machine learning models help prioritize patients for treatment, decisions must be explainable to be usable.
- SHAP Application:
  - SHAP reveals how much each factor, such as blood pressure, glucose levels, age, or family history, contributed to a patient's risk score.
  - Physicians can refer to SHAP outputs to guide diagnoses and treatment plans.
- Examples in Research:
  - Widely used in models for breast cancer, diabetes, and cardiovascular disease risk.
3. Manufacturing: Predictive Maintenance for Equipment Failures
- Challenge: Anomaly detection systems based on sensor data often can't explain why a particular anomaly was detected.
- SHAP Application:
  - SHAP identifies which sensor readings (temperature, vibration, pressure, etc.) contributed to the prediction of an impending failure.
  - This allows engineers to plan preventative maintenance more effectively.
4. Insurance: Fraudulent Claim Detection
- Challenge: When a claim is flagged as fraudulent, insurers must explain the reasoning behind it.
- SHAP Application:
  - Based on claim history, previous accidents, and contract details, SHAP highlights the suspicious features that triggered the fraud flag.
  - Enables insurers to maintain explainability while automating fraud detection.
5. E-commerce: Purchase Prediction and Recommendation
- Challenge: Increase user satisfaction and trust in recommendation systems.
- SHAP Application:
  - SHAP can explain recommendations with statements like "This item was suggested because it matches your past purchase patterns."
  - This boosts customer trust and helps reduce churn.
References
1. Theoretical Foundations – For Those Interested in Game Theory
- A Value for n-Person Games
  Author: Lloyd Shapley (1953, academic paper)
  Overview: The original work that introduced the Shapley value. It mathematically derives the axioms for fair value allocation in cooperative games.
2. Practical Machine Learning – Focused on SHAP and Model Explainability
- Interpretable Machine Learning
  Author: Christoph Molnar
  Overview: A practical and accessible guide to explainable AI techniques such as SHAP, LIME, and Partial Dependence Plots. Includes illustrations and Python code.
- Explainable AI: Interpreting, Explaining and Visualizing Deep Learning
  Editors: Wojciech Samek et al.
  Publisher: Springer
  Overview: Covers SHAP and a range of frameworks for interpreting deep learning models. Discusses the theoretical and practical aspects of explainability.
3. Hands-On Implementation – For Learning with Python
- Hands-On Explainable AI (XAI) with Python
  Author: Denis Rothman
  Publisher: Packt Publishing
  Overview: A practical guide that walks through implementing multiple XAI techniques, including SHAP, using Python, aimed at engineers and practitioners.