A. Interpretability and Explainability


Defining Explainability

In the context of machine learning models, explainability refers to the ability to understand and interpret the reasons behind a model's predictions.
We can loosely define an "explainable" model as one that gives you a reasonable estimate for the following questions:
Importance Measures - Which features explain most of the change in the predicted outcomes?
Explanation Methods - How would adjusting an individual feature change the predicted outcome?
Some problems are just too complex to explain, e.g., a 20-layer neural network with 4,000 features.
It's exactly their intractability to our brains that makes them ideal for equation-generating algorithms to solve.

Unintuitive Signals

It is for that reason that some firms don’t care too much about explainability. Anecdotally, firms like RenTech sometimes have no idea why their models are doing what they are doing.
“In fact, the firm’s executives often joked that they didn’t understand half the things their models were doing.”
“By 1997, though, more than half of the trading signals Simons’s team was discovering were nonintuitive, or those they couldn’t fully understand.”

Importance Measures

Interpretable Models

For interpretable models you know the exact contribution of every feature to the final output. For uninterpretable models you only have an estimate of each feature’s contribution to the final output.
Interpretable (white-box) models are inherently explainable; we don’t need methods like permutation importance or Shapley values to identify the feature effects.
Uninterpretable (black-box) models are not interpretable by nature, so we need explainability methods such as permutation importance or Shapley values.
Explainability methods seek to close the gap in understanding between white-box and black-box models.
No matter how many explainability methods you use, they will always be estimates and will never give you the intrinsic explanations of, say, a linear regression model, i.e., its parameter coefficients.
A well-performing model is a necessary criterion for trusting the explainability outcomes. Features that are deemed of low importance in a bad model might be very important in a good model.
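To illustrate what "exact contribution" means for an interpretable model, here is a minimal sketch of a linear model whose prediction decomposes term by term. The feature names, coefficients, and input values are made up for illustration and do not come from any fitted model.

```python
# Hypothetical fitted linear model: prediction = intercept + sum(coef_i * x_i).
# Feature names and coefficient values are assumptions for illustration only.
INTERCEPT = 0.5
COEFS = {"momentum": 2.0, "value": -1.0, "volatility": 0.25}


def predict(row):
    """Linear prediction for one observation given as a dict of feature values."""
    return INTERCEPT + sum(COEFS[name] * row[name] for name in COEFS)


def exact_contributions(row):
    """Exact contribution of each feature: coefficient times feature value."""
    return {name: COEFS[name] * row[name] for name in COEFS}


row = {"momentum": 1.5, "value": 2.0, "volatility": 4.0}
contributions = exact_contributions(row)

# Intercept plus contributions reproduces the prediction,
# so no part of the output is left unexplained.
reconstructed = INTERCEPT + sum(contributions.values())
print(contributions)
print(predict(row), reconstructed)
```

For a black-box model there is no such closed-form decomposition, which is why the contributions have to be estimated instead.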

Explanation Methods

Linear models are intrinsically interpretable, but often perform poorly on complex, nonlinear problems.
Nonlinear models are powerful, but not intrinsically interpretable.
Post hoc explainability approaches make ML models interpretable after training.
Explanation methods allow us to use more powerful models (e.g., gradient boosting machines, neural networks) while still understanding how they work.
They help turn what were previously black-box models into grey-box models, so we get the benefit of both model performance and model explainability.
Explainability methods are also sometimes called post hoc interpretability methods because they are applied to models that lack the intrinsic interpretability of, say, linear regression.
Intrinsic or post hoc? This criterion distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying explainability methods that analyze the model after training (post hoc).
Explainability methods can also be applied to inherently interpretable models, e.g., permutation importance applied to a linear regression model. It just wouldn’t make much sense, as you would be trading exact numbers for estimates.
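The trade-off between exact numbers and estimates can be sketched with a hand-rolled permutation importance applied to a linear model whose coefficients are already known. Everything here (the coefficients 3, 1 and 0, the synthetic data, the helper names) is an assumption for illustration, not a reference implementation.

```python
import random


def model(row):
    """Hypothetical fitted linear model; the third feature has coefficient 0."""
    return 3.0 * row[0] + 1.0 * row[1] + 0.0 * row[2]


def mse(predict, X, y):
    """Mean squared error of `predict` on rows X with targets y."""
    return sum((predict(r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, col, rng):
    """Increase in MSE after shuffling one column, which breaks its link to y."""
    baseline = mse(predict, X, y)
    shuffled = [r[col] for r in X]
    rng.shuffle(shuffled)
    X_perm = [r[:col] + [v] + r[col + 1:] for r, v in zip(X, shuffled)]
    return mse(predict, X_perm, y) - baseline


data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [model(r) for r in X]  # noise-free targets, so the baseline error is zero

# Two different shuffles give two different sets of importance scores.
for seed in (1, 2):
    rng = random.Random(seed)
    scores = [permutation_importance(model, X, y, c, rng) for c in range(3)]
    print(seed, [round(s, 3) for s in scores])
```

The ranking agrees with the coefficients, but the scores for the two active features depend on the shuffle, whereas the coefficients themselves are fixed and exact.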

Why explainability?

Model Performance Clarity
Understanding Success: Just as we analyze Buffett's strategies, we dissect model success to replicate and enhance it.
Example: Recognizing that specific volatility patterns precede market gains, guiding investment strategies.
Failure Analysis
Learning from Failure: We dissect model shortcomings during downturns to prevent repeat failures.
Example: Determining if unexpected transaction costs or unusual market conditions led to a strategy's underperformance.
Trust in Model Decisions
Building Model Trust: Demonstrating the model's logic earns trust for critical financial decisions.
Example: A model that uses well-understood factors like past returns to predict future performance builds investor confidence.
Data Collection Insights
Data Acquisition Focus: Invest in data that has proven value, informed by feature importance analysis.
Example: Increasing data collection on volatility-based features when they are shown to strongly influence model predictions.
Feature Selection Strategy
Optimizing Feature Selection: Exclude non-informative data to sharpen the model's focus and efficiency.
Example: Discarding general economic indicators that are not important to the asset being modeled.
Feature Engineering
Creative Feature Crafting: Use data insights to invent new predictive features from existing interactions.
Example: Creating a feature from the interaction between volatility and stock prices to predict market movements.
Empirical Discovery
Discovering Financial Insights: Apply model explanations to uncover new predictive financial patterns.
Example: Detecting that when both momentum and value are low, it might signal an upcoming increase in returns.

Explainability Types

Explainability methods can be classified by three criteria: are they (1) local or global, (2) model-specific or model-agnostic, and (3) numerical or visual?
Although not shown in the diagram above, all local methods can be turned into global methods: you can simply sum or average the local values across datapoints to create a global measure.
Local versus global: with local methods we calculate the contribution of every feature for every datapoint (row); with global methods we only have the aggregate importance value for every feature across the entire dataset.
Specific versus agnostic: model-specific explainability values arise from some internal characteristic, e.g., only linear models have coefficients and only tree-based models have feature splits, whereas agnostic methods can be applied to any model.
Numerical versus visual: some explanations are better communicated with visualizations, e.g., we can visualize how the output of a model changes with the progressive change of a feature value using Partial Dependence Plots.
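The numbers behind a Partial Dependence Plot can be computed with a few lines of code: force one feature to each value on a grid, keep the other features as observed, and average the predictions. The sketch below assumes a made-up `black_box` function and a tiny made-up dataset; it only produces the curve values, which would then be handed to a plotting library.

```python
def black_box(row):
    """Hypothetical trained model; stands in for any predict function."""
    x0, x1 = row
    return x0 ** 2 + 0.5 * x1


def partial_dependence(predict, X, col, grid):
    """Average prediction when column `col` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict(r[:col] + [value] + r[col + 1:]) for r in X]
        curve.append(sum(preds) / len(preds))
    return curve


# Small made-up dataset with two features.
X = [[-1.0, 0.0], [0.0, 2.0], [1.0, 4.0], [2.0, 2.0]]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

pd_curve = partial_dependence(black_box, X, col=0, grid=grid)
for value, avg in zip(grid, pd_curve):
    print(f"x0 = {value:+.1f} -> average prediction {avg:.2f}")
```

Because the function only calls `predict`, it is model-agnostic: any model exposing a prediction function could be substituted for `black_box`.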
It is my preference to work with methods that unify local and global model-agnostic techniques; one method in particular, Shapley values, comes to mind.
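As a rough sketch of how one method can serve both views, the code below computes exact Shapley values for every row (local) and then aggregates them into one importance per feature (global). It assumes a hypothetical three-feature `model`, a made-up background dataset, and the common convention of replacing "absent" features with background values. Enumerating all coalitions is only feasible for a handful of features; practical libraries rely on approximations. The aggregation uses the mean absolute value, since signed values can cancel out.

```python
from itertools import combinations
from math import factorial


def model(row):
    """Hypothetical black-box model with an interaction between features 0 and 1."""
    return 2.0 * row[0] + row[0] * row[1] + row[2]


def coalition_value(predict, x, background, subset):
    """Average prediction when features in `subset` take x's values and the
    remaining features take the values of each background row."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in subset else b[j] for j in range(len(x))]
        total += predict(mixed)
    return total / len(background)


def shapley_values(predict, x, background):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(predict, x, background, set(subset) | {i})
                without_i = coalition_value(predict, x, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [2.0, 1.0, 3.0], [1.0, 1.0, 0.0]]

# Local explanations: one vector of Shapley values per row.
local = [shapley_values(model, row, background) for row in background]

# Global importance: mean absolute Shapley value per feature.
global_importance = [
    sum(abs(phi[j]) for phi in local) / len(local) for j in range(3)
]
print(global_importance)
```

For each row, the Shapley values sum to the difference between that row's prediction and the average prediction over the background data, which is what makes them usable as per-row contributions.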

Common Misconceptions

1. Explaining the model ≠ explaining the data

Model inspection only tells you about the model.
The model might not accurately reflect the data.
If the model performance is good, then it might better reflect the data relationships.
A model with 50% (or random) accuracy is not going to say anything useful about the data.
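The points above can be sketched with synthetic data in which one feature fully determines the label. A model that has learned the rule assigns that feature a large permutation importance, while a chance-level model assigns it roughly none, even though the data relationship is identical in both cases. The data, both "models", and the helper names are assumptions made up for this illustration.

```python
import random

rng = random.Random(0)

# Made-up data: the label depends only on whether feature 0 is positive.
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
y = [1 if r[0] > 0 else 0 for r in X]


def good_model(row):
    """Hypothetical model that has learned the true rule."""
    return 1 if row[0] > 0 else 0


def coin_flip_model(row):
    """Hypothetical chance-level model: it ignores the features entirely."""
    return rng.randint(0, 1)


def accuracy(predict, X, y):
    """Share of rows where the prediction matches the label."""
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)


def importance_of_feature0(predict, X, y):
    """Drop in accuracy after shuffling feature 0 (permutation importance)."""
    baseline = accuracy(predict, X, y)
    shuffled = [r[0] for r in X]
    rng.shuffle(shuffled)
    X_perm = [[v, r[1]] for r, v in zip(X, shuffled)]
    return baseline - accuracy(predict, X_perm, y)


for name, m in [("good", good_model), ("coin flip", coin_flip_model)]:
    acc = accuracy(m, X, y)
    imp = importance_of_feature0(m, X, y)
    print(f"{name}: accuracy {acc:.3f}, importance of feature 0 {imp:.3f}")
```

Inspecting the coin-flip model would suggest that feature 0 is irrelevant, which is a statement about that model and not about the data.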

2. More Explainable + More Interpretable ≠ Better Decisions

You could be using the most interpretable and explainable models.
It doesn’t mean the performance of the model is great.
Nor does it mean that you are making the correct predictions.
Neither does it guarantee the robustness of the relationships over time.
The use of an interpretable linear regression doesn’t guarantee the right conclusions; you might not have collected adequate data or taken important non-linearities and interactions into account.

3. Explainable ≠ Understandable ≠ Trusted

A Random Forest model might be explainable, but do stakeholders understand it?
Even when an explainable model is understandable, it doesn’t mean that it can be trusted.
For robustness, you have to conduct additional tests beyond just developing an explainable model.
Perhaps you can explain how a random forest works, but it doesn’t mean that it will be trusted by stakeholders.

Explaining the Model vs. Explaining the Data
MI(M) ≠ DE(D)
MI(M) + High MP(M;D) → Better DE(D)
If A(M;D) = 50%, then AU(M→D) = Low

Explainability and Interpretability vs. Decision Quality
High EI(M) ≠ High Quality Decisions(D)
High EI(M) ≠ Correct Predictions(P)
High EI(M) ≠ Robustness(R)

Explainability vs. Understandability vs. Trust
Explainable(M) ≠ Understandable(M by S)
Understandable(M by S) ≠ Trusted(M by S)
Explainable(M) + Understandable(M by S) + Robustness Tests(M) → Trusted(M by S)

Notation
M = Model
D = Data
P = Predictions
R = Relationship Robustness
S = Stakeholders
MI = Model Inspection
DE = Data Explanation
MP = Model Performance
A = Accuracy
AU = Accuracy Usefulness
EI = Explainability and Interpretability