Regarding model interpretability: in addition to linear models and decision trees, which are inherently interpretable, many models in sklearn expose a feature-importance interface (for tree ensembles, the feature_importances_ attribute), so you can inspect how important each feature is.
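As a minimal sketch of reading these importances (assuming a scikit-learn tree ensemble; the dataset here is purely illustrative):

# minimal sketch: impurity-based importances from a fitted sklearn ensemble
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# one importance value per column, normalized to sum to 1
for name, imp in zip(X.columns, model.feature_importances_):
    print(f"{name}: {imp:.3f}")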
Classic global feature importance
Using the plot_importance() method gives an attractively simple bar chart representing the importance of each feature in our dataset. XGBoost supports three importance types:
- Weight: the number of times a feature is used to split the data across all trees.
  xgboost.plot_importance(model, importance_type="weight")
- Cover: the number of times a feature is used to split the data across all trees, weighted by the number of training data points that go through those splits.
  xgboost.plot_importance(model, importance_type="cover")
- Gain: the average training loss reduction gained when using a feature for splitting.
  xgboost.plot_importance(model, importance_type="gain")
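For context, a minimal end-to-end sketch (the dataset, learning rate, and number of boosting rounds are illustrative choices, not taken from the original example):

# minimal sketch: train a small XGBoost model and plot its "gain" importance
import matplotlib.pyplot as plt
import xgboost
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = xgboost.train({"learning_rate": 0.01}, xgboost.DMatrix(X, label=y), num_boost_round=100)

xgboost.plot_importance(model, importance_type="gain")
plt.show()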
Common importance evaluation methods
SHAP (SHapley Additive exPlanations)
A newer individualized method that is both consistent and accurate. It shows not only the importance of a feature but also whether its effect on a prediction is positive or negative.
Below is an example showing the SHAP values of the RM (rooms) feature from the Boston housing data.
The shap Python package makes computing SHAP values easy. We first call shap.TreeExplainer(model).shap_values(X) to explain every prediction, then call shap.summary_plot(shap_values, X) to plot these explanations.
explainer
- deep: works for deep learning models, based on the DeepLIFT algorithm
- gradient: for deep learning models, based on expected gradients (an extension of integrated gradients)
- kernel: model-agnostic, works for any model
- linear: for linear models with independent (uncorrelated) features
- tree: fast, exact SHAP values for tree-based models and ensembles of trees; used in the example below
import shap
import xgboost

shap.initjs()
X, y = shap.datasets.boston()
model = xgboost.train({"learning_rate": 0.01}, xgboost.DMatrix(X, label=y), 100)

# there are many explainers available; here we use the tree explainer as an example
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
- sampling: based on the assumption of feature independence (a sketch follows the kernel example below)
- kernel: the example below summarizes the training data with k-means to keep the computation tractable
import time
import shap

# summarize the background data with k-means to speed up Kernel SHAP
X_train_summary = shap.kmeans(X_train, 10)

t0 = time.time()
explainerKNN = shap.KernelExplainer(knn.predict, X_train_summary)  # knn is a previously fitted model
shap_values_KNN_train = explainerKNN.shap_values(X_train)
shap_values_KNN_test = explainerKNN.shap_values(X_test)
timeit = time.time() - t0
timeit
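For the sampling explainer, a minimal sketch under the same (assumed) setup of a fitted knn model and X_train/X_test from above; the sampling approach treats features as independent:

# minimal sketch: sampling-based SHAP values, assuming knn, X_train and X_test as above
import shap

explainer_sampling = shap.SamplingExplainer(knn.predict, X_train)
shap_values_sampling = explainer_sampling.shap_values(X_test)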
visualization
Local interpretation
- single prediction: displays the contribution of each feature to one prediction
  shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
- multiple predictions: stacks the explanations for many predictions into one interactive plot
  shap.force_plot(explainer.expected_value, shap_values, X)
Global interpretation
explanation of the overall structure of the model
- summary_plot
  # summarize the effects of all the features
  shap.summary_plot(shap_values, X)
- feature importance: the mean absolute SHAP value per feature, drawn as a bar chart
  shap.summary_plot(shap_values, X, plot_type="bar")
- interaction values
  shap_interaction_values = explainer.shap_interaction_values(X)
  shap.summary_plot(shap_interaction_values, X)
- dependence_plot
  # create a SHAP dependence plot to show the effect of a single feature across the whole dataset
  shap.dependence_plot("RM", shap_values, X)
- permutation importance: https://github.com/Qiuyan918/Permutation_Importance_Experiment (the procedure and a sketch are given at the end of this section)
visualization of networks
- NetworkX: non-interactive (static) visualization of a graph
  import networkx as nx
  # df is an edge-list DataFrame with Source, Target and weight columns
  G = nx.from_pandas_edgelist(df, source='Source', target='Target', edge_attr='weight')
The network can then be visualized with NetworkX's drawing functions, for example as sketched below.
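A minimal drawing sketch, assuming the graph G built above (the layout and styling parameters are illustrative):

import matplotlib.pyplot as plt
import networkx as nx

# compute node positions, then draw nodes, edges and labels
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_size=300, font_size=8)
plt.show()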
- PyVis: interactive visualization of a network graph
  from pyvis.network import Network
  net = Network(notebook=True)
  net.from_nx(G)
  net.show("test.html")
- visdcc: interactive network visualization inside a Dash app (see the sketch below)
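A minimal visdcc sketch, assuming the visdcc package and a Dash app; the nodes, edges, and options here are illustrative:

# minimal sketch: render a small network with visdcc inside a Dash app
import dash
from dash import html
import visdcc

app = dash.Dash(__name__)
app.layout = html.Div([
    visdcc.Network(
        id='net',
        data={
            'nodes': [{'id': 1, 'label': 'A'}, {'id': 2, 'label': 'B'}],
            'edges': [{'from': 1, 'to': 2}],
        },
        options={'height': '400px', 'width': '100%'},
    )
])

if __name__ == '__main__':
    app.run_server(debug=True)  # on newer Dash versions: app.run(debug=True)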
Other feature attribution methods for comparison:
- Saabas: an individualized heuristic feature attribution method.
- mean(|Tree SHAP|): a global attribution method based on the average magnitude of the individualized Tree SHAP attributions.
- Gain: the same method used above in XGBoost, also equivalent to the Gini importance measure used in scikit-learn tree models.
- Split count: represents both the closely related "weight" and "cover" methods in XGBoost, but is computed using the "weight" method.
- Permutation: the resulting drop in accuracy of the model when a single feature is randomly permuted in the test data set.
The permutation procedure:
1. Train the model on dataset A.
2. Test the model on dataset B and record the score (e.g. MSE or log loss).
3. Randomly permute one feature in dataset B and test again; the difference between the scores from steps 2 and 3 is that feature's permutation importance.
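A minimal sketch of this procedure using scikit-learn's permutation_importance helper (the dataset and model are illustrative; the train/test split plays the role of datasets A and B):

# minimal sketch: permutation importance with scikit-learn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)  # dataset A / dataset B

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)        # step 1: train on A

# steps 2-3: score on B, then re-score with each feature permuted; the drop is the importance
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, imp in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {imp:.4f}")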