SHAP example
Here we show how to use the shap Python library and its TreeExplainer to calculate SHAP values for the LGBMOrdinal model.
The LGBMOrdinal model is built on LightGBM's LGBMRegressor rather than on a classifier, so it outputs a single raw score per sample. As a result, each prediction gets one SHAP value per feature, however many classes there are, instead of a separate set of SHAP values for every class as a multiclass classifier would produce. This makes it much easier to explain both how much each feature contributes to a prediction and in which direction.
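To make the single-score idea concrete, here is a minimal sketch with made-up numbers. It assumes, as in a cumulative-link ordinal model, that LGBMOrdinal turns its raw score into a class by comparing it with a set of increasing thresholds; the thresholds, the SHAP values and the helper latent_to_class are hypothetical and not part of ordinalgbt's API. Under that assumption, a positive SHAP value can only push a prediction towards higher classes and a negative one towards lower classes.

```python
import numpy as np

def latent_to_class(raw_score, thresholds):
    # Assumed decision rule (hypothetical): the predicted class is the number
    # of increasing thresholds that the raw score lies above.
    return int(np.sum(raw_score > np.asarray(thresholds)))

# Made-up values for one sample with four features.
base_value = 0.5                          # expected raw score over the background data
phi = np.array([1.2, -0.4, 0.0, 0.3])     # one SHAP value per feature, whatever the number of classes
thresholds = [-1.0, 0.5, 2.0]             # hypothetical cut points separating 4 classes

# SHAP values are additive: base value + sum of SHAP values = raw score.
raw_score = base_value + phi.sum()        # about 1.6
predicted_class = latent_to_class(raw_score, thresholds)  # 1.6 exceeds two thresholds -> class 2
```

If the first feature's SHAP value were 0.5 larger, the raw score would rise to about 2.1, past the last threshold, and the prediction would move up to class 3.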
import shap
from shap import TreeExplainer
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from ordinalgbt.data import make_ordinal_classification
from ordinalgbt.lgb import LGBMOrdinal
Training a model
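# Generate a synthetic ordinal dataset: 4 ordered classes, 100 features of which 10 are informative
X, y = make_ordinal_classification(n_classes=4, n_samples=1000, n_features=100, n_informative=10, noise=2, random_state=42)
# Hold out 20% of the data for evaluation (fixed seed so the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)
# Fit the ordinal gradient-boosting model with default hyperparameters
model = LGBMOrdinal()
model.fit(X_train, y_train)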
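The internals of make_ordinal_classification are not shown in this example. One common way to build synthetic ordinal data, sketched below as a hypothetical stand-in, is to compute a noisy linear score from a few informative features and cut it into ordered bins. make_ordinal_data_sketch is not the ordinalgbt implementation, and its noise parameter may not mean the same thing as the real one.

```python
import numpy as np

def make_ordinal_data_sketch(n_samples=1000, n_features=100, n_informative=10,
                             n_classes=4, noise=2.0, random_state=42):
    """Hypothetical stand-in for make_ordinal_classification (not its real code)."""
    rng = np.random.default_rng(random_state)
    X = rng.normal(size=(n_samples, n_features))
    # Only the first n_informative features carry signal; the rest are pure noise.
    coef = np.zeros(n_features)
    coef[:n_informative] = rng.normal(size=n_informative)
    # Latent score = linear signal + Gaussian noise with standard deviation `noise`.
    latent = X @ coef + rng.normal(scale=noise, size=n_samples)
    # Cut the latent score at its quantiles to get n_classes ordered, roughly balanced labels.
    cuts = np.quantile(latent, np.linspace(0, 1, n_classes + 1)[1:-1])
    y = np.searchsorted(cuts, latent)
    return X, y

X_demo, y_demo = make_ordinal_data_sketch()
```

Because the labels come from binning a single score, neighbouring classes are genuinely "closer" to each other than distant ones, which is the structure an ordinal model is meant to exploit.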
# Compare true and predicted classes on the held-out test set
ConfusionMatrixDisplay.from_predictions(y_test, model.predict(X_test))
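ConfusionMatrixDisplay.from_predictions counts, for every pair of true and predicted class, how many test samples fall into that cell, with true classes as rows and predicted classes as columns. The plain-Python sketch below reproduces that count on a few made-up labels; confusion_counts and error_distance_counts are hypothetical helpers. Because the classes are ordered, a cell's distance from the diagonal is the number of classes by which those predictions are off.

```python
def confusion_counts(y_true, y_pred, n_classes):
    # cm[i][j] = number of samples with true class i and predicted class j
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def error_distance_counts(cm):
    # Tally samples by |true - predicted|: 0 is correct, 1 is off by one class, ...
    counts = {}
    for i, row in enumerate(cm):
        for j, n in enumerate(row):
            counts[abs(i - j)] = counts.get(abs(i - j), 0) + n
    return counts

# Made-up labels for six samples and four classes.
y_true = [0, 1, 2, 3, 2, 1]
y_pred = [0, 2, 2, 3, 1, 1]
cm = confusion_counts(y_true, y_pred, n_classes=4)
distances = error_distance_counts(cm)  # 4 correct, 2 off by one class
```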
Calculating the SHAP values
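# Explain the model's raw output: a single score per sample
explainer = TreeExplainer(model, model_output='raw')
# Array of shape (n_samples, n_features): one SHAP value per feature for each prediction
shap_values = explainer.shap_values(X_test)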
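With model_output='raw', TreeExplainer explains the raw score of the boosted trees and exploits the tree structure to compute SHAP values efficiently. The brute-force sketch below instead computes exact Shapley values straight from their definition for a hypothetical three-feature function, treating an "absent" feature as taking a single baseline value. This is a simplification: TreeExplainer derives absent-feature behaviour from the training data or a background dataset and never enumerates subsets. The sketch only illustrates two properties relied on above: each prediction gets exactly one value per feature, and the values sum to the raw score minus the baseline score, just as explainer.expected_value plus a row of shap_values should reproduce the model's raw score for that row.

```python
from itertools import combinations
from math import factorial

def baseline_shapley(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline value.

    Enumerates every feature subset (2^n evaluations), so only for tiny n.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # size of the coalition that feature i joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical raw score with an interaction between features 0 and 2.
def toy_raw_score(z):
    return 2.0 * z[0] - 1.0 * z[1] + z[0] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = baseline_shapley(toy_raw_score, x, baseline)  # approximately [3.5, -2.0, 1.5]
```

The interaction term x0 * x2 = 3 is split equally between features 0 and 2, and the three values add up to toy_raw_score(x) - toy_raw_score(baseline) = 3.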
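# Beeswarm summary plot: each dot is one test sample, placed by its SHAP value and colored by the feature's value
shap.summary_plot(shap_values, X_test)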
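# Calling the explainer directly returns a shap.Explanation object, which the shap.plots API expects
explanation = explainer(X_test)
# Global importance: mean absolute SHAP value of each feature across the test set
shap.plots.bar(explanation)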
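Given an Explanation with many rows, shap.plots.bar ranks features by their mean absolute SHAP value across the samples. The sketch below reproduces that ranking with NumPy on a made-up 3 x 3 SHAP matrix; mean_abs_shap_ranking is a hypothetical helper, not part of shap.

```python
import numpy as np

def mean_abs_shap_ranking(shap_matrix, feature_names):
    # Global importance of a feature = mean of |SHAP value| over all samples (rows).
    importance = np.abs(shap_matrix).mean(axis=0)
    # Sort features from most to least important.
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]

# Hypothetical SHAP values: 3 samples (rows) x 3 features (columns).
shap_matrix = np.array([[ 0.5, -1.0, 0.1],
                        [-0.5,  2.0, 0.0],
                        [ 1.0, -0.5, 0.2]])
ranking = mean_abs_shap_ranking(shap_matrix, ["f0", "f1", "f2"])
# Order: f1 (about 1.17), f0 (about 0.67), f2 (about 0.1)
```

Taking absolute values first matters: f0's signed SHAP values average only about 0.33 because positive and negative contributions cancel, whereas its mean absolute value is about 0.67.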