InputIBA: Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information
This repository is the official implementation of our paper accepted at NeurIPS 2021.
We propose an attribution method called InputIBA that provides input-level explanations by applying an information bottleneck to a latent layer and using a GAN to fit distributions. For details of the method, please refer to our paper. More information can be found on the project's homepage.
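The feature-level bottleneck that InputIBA builds on (the original IBA) restricts information by blending each hidden feature with noise drawn from the feature distribution. The mean and standard deviation of that distribution are what the estimation set is used for. The NumPy sketch below illustrates the idea only; it is not this repository's implementation, and the names bottleneck and capacity are hypothetical. Here lam is a per-element mask in [0, 1), and capacity is the per-element KL divergence between the bottleneck's output distribution and the estimated feature distribution, which the original IBA penalizes during optimization.

```python
import numpy as np


def bottleneck(features, lam, mu, sigma, rng):
    """Blend features with noise: Z = lam * X + (1 - lam) * eps, eps ~ N(mu, sigma^2).

    lam = 1 keeps a feature unchanged; lam = 0 replaces it entirely with noise.
    mu and sigma may be scalars or arrays that broadcast against features.
    """
    eps = rng.normal(loc=mu, scale=sigma, size=features.shape)
    return lam * features + (1.0 - lam) * eps


def capacity(features, lam, mu, sigma):
    """Per-element KL( N(lam*x + (1-lam)*mu, ((1-lam)*sigma)^2) || N(mu, sigma^2) ).

    Requires lam < 1 (the KL diverges as lam approaches 1).
    """
    x_norm = (features - mu) / sigma
    return 0.5 * ((1.0 - lam) ** 2 + (lam * x_norm) ** 2 - 1.0) - np.log(1.0 - lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(3, 5))
    # Stand-ins for the statistics that would be estimated on an estimation set.
    mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-6
    lam = np.full(feats.shape, 0.5)
    z = bottleneck(feats, lam, mu, sigma, rng)
    print(z.shape, capacity(feats, lam, mu, sigma).mean())
```

A mask of all zeros passes no information (capacity 0), while values closer to 1 let more of each feature through at a higher information cost.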
The method produces a fine-grained attribution map that is optimized directly on the input, so the attribution has the resolution of the input and can reveal more detail. In the example below, the generated attribution map directly reflects the regions of interest for the model's decision, while other similar features (such as the coins in the image) are ruled out.
Moreover, our method relaxes some assumptions of the previous method, which makes it model-agnostic. We demonstrate this model-agnostic ability on both vision and NLP tasks, with models such as convolutional and recurrent neural networks.
Here is an example of attribution maps produced by various attribution methods. By inspection, the attribution map of our method is much more fine-grained than those of other explanation methods.
Another example shows the identification of informative tokens (words and symbols). Our method highlights the important features, and the result is more interpretable to humans than those of other methods.
- Install torch and torchvision (and torchtext for NLP tasks) following the official PyTorch installation instructions.
- Install mmcv or mmcv-full following the official MMCV installation instructions. Since our code only uses a limited set of MMCV features, the lite version can simply be installed with pip install mmcv.
- Install the additional requirements with pip install -r requirements.txt.
- Install the package in develop mode: python setup.py develop.
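After installation, a quick way to confirm that the dependencies can be found is a small check with importlib. This is a hypothetical helper, not part of the repository. The import name input_iba is an assumption based on the input_iba/ directory mentioned in this README; both mmcv and mmcv-full are imported under the name mmcv.

```python
import importlib.util


def check_dependencies(nlp=False):
    """Map each required top-level package to whether it can be found (without importing it)."""
    # mmcv and mmcv-full both provide the "mmcv" module.
    # "input_iba" is an assumed import name for this package after `python setup.py develop`.
    packages = ["torch", "torchvision", "mmcv", "input_iba"]
    if nlp:
        # torchtext is only needed for the NLP tasks.
        packages.append("torchtext")
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_dependencies(nlp=True).items():
        print(f"{name:<12} {'ok' if found else 'MISSING'}")
```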
We provide two Jupyter notebooks under tutorials/, one for the computer vision task and one for the NLP task. The notebooks show interactively how to run attribution with InputIBA on a single sample. They are available here for the vision task and here for the NLP task.
The scripts below are for generating attributions in batch.
- Download the ImageNet validation set and, if necessary, arrange it in torchvision.datasets.ImageFolder style (one subdirectory per class). Use this script to generate two small sets: an estimation set and an attribution set. The estimation set is used to estimate the mean and standard deviation of the hidden features, while the attribution set consists of the images for the neural network to explain. Copy this JSON file to the dataset root. The dataset should have the following structure:

      .
      |-- annotations
      |   `-- attribution
      |       |-- n01440764
      |       |-- n01443537
      |       |-- n01484850
      |       ...
      |-- imagenet_class_index.json
      `-- images
          |-- attribution
          |   |-- n01440764
          |   |-- n01443537
          |   |-- n01484850
          |   ...
          `-- estimation
              |-- n01440764
              |-- n01443537
              |-- n01484850
              ...

  Note that the annotations/ directory is only necessary for evaluating the localization ability of attribution methods (the EHR metric proposed in the paper). If no bounding box annotations are available, set with_bbox=False on line 35 of the config file. We also provide a preprocessed small ImageNet dataset, which can be downloaded from this link.
- Create a directory under this repository with mkdir data, and link the ImageNet data path to data/imagenet with ln -s path/to/imagenet_data/ data/imagenet.
- Create a directory to store the output files: mkdir workdirs.
- Run the training script with a specified configuration file (e.g. configs/vgg_imagenet.py) to train the attributor:

      python tools/vision/train.py \
          configs/vgg_imagenet.py \
          --work-dir workdirs/vgg_imagenet/ \
          --gpu-id 0 \
          --pbar

- Check the results saved in workdirs/vgg_imagenet/: input_masks/ contains the final attribution maps, while feat_masks/ contains the attribution maps produced by the IB at the feature-map level (the original IBA).
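The dataset preparation steps above can be approximated with a short Python helper when the linked split script is not at hand. The sketch below is hypothetical: split_imagenet_val and check_layout are not part of this repository, the repository's own script may sample images differently, and the number of images per class is left to the caller. It copies a class-per-folder ImageNet validation set into disjoint estimation and attribution subsets under images/, then reports which expected paths are still missing. The with_bbox parameter is only named after the config option of the same name.

```python
import random
import shutil
from pathlib import Path

# File extensions treated as images (compared case-insensitively).
IMG_EXTS = {".jpeg", ".jpg", ".png"}


def split_imagenet_val(src, dst, n_estimation, n_attribution, seed=0):
    """Copy a class-per-folder image set into disjoint estimation/attribution subsets.

    Output layout: dst/images/{estimation,attribution}/<wnid>/<file>.
    """
    rng = random.Random(seed)
    src, dst = Path(src), Path(dst)
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir() if p.suffix.lower() in IMG_EXTS)
        rng.shuffle(images)
        # Take the first n_estimation shuffled images, then the next n_attribution,
        # so the two subsets never share an image.
        subsets = {
            "estimation": images[:n_estimation],
            "attribution": images[n_estimation:n_estimation + n_attribution],
        }
        for subset, files in subsets.items():
            out_dir = dst / "images" / subset / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, out_dir / f.name)


def check_layout(root, with_bbox=True):
    """Return the expected paths that are missing under the dataset root."""
    root = Path(root)
    required = [
        root / "imagenet_class_index.json",
        root / "images" / "estimation",
        root / "images" / "attribution",
    ]
    if with_bbox:
        # Bounding-box annotations are only needed for the EHR localization metric.
        required.append(root / "annotations" / "attribution")
    return [str(p) for p in required if not p.exists()]
```

The class index JSON file still has to be copied into the root by hand; check_layout will list it until it is there.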
- We provide a pretrained multi-layer LSTM on the IMDb dataset. Download the checkpoint file from this link.
- Create a directory with mkdir pretrained and move the downloaded checkpoint file into pretrained/.
- Run the training script with the specified configuration (configs/deep_lstm.py) to train the attributor:

      python tools/nlp/train_nlp.py \
          configs/deep_lstm.py \
          --work-dir workdirs/lstm_imdb/ \
          --gpu-id 0 \
          --pbar

- Check the results saved in workdirs/lstm_imdb/: input_masks/ contains the final attribution maps (at the word level), while feat_masks/ contains the attribution maps produced by the IB at the feature-map level.
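The on-disk format of the word-level masks in input_masks/ is not described in this README. Assuming a mask has already been loaded as one score per token, aligned with the tokenized review, a hypothetical helper like the one below lists the most informative tokens. The function name and its inputs are illustrative assumptions, not part of the repository.

```python
import numpy as np


def top_tokens(tokens, scores, k=5):
    """Return the k (token, score) pairs with the highest attribution, highest first.

    Assumes `scores` is a 1-D sequence with exactly one score per token.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.shape != (len(tokens),):
        raise ValueError("expected exactly one score per token")
    # Sort by descending score; a stable sort keeps the original order on ties.
    order = np.argsort(-scores, kind="stable")[:k]
    return [(tokens[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    print(top_tokens(["a", "great", "movie", "!"], [0.1, 0.9, 0.4, 0.2], k=2))
```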
Like many attribution methods, our method can only be applied on a per-sample basis (e.g. per image). For each new input, the Attributor trains new components (FeatureIBA, WGAN, InputIBA). Attribution methods are used to explain already-trained models, so there is no need to provide any pretrained attributor models here.
We implemented a handful of evaluation metrics including Sanity Check, Insertion/Deletion, Sensitivity-N, and our own proposed metric called EHR (Effective Heat Ratios).
Details on how to run evaluations of attribution methods can be found in input_iba/evaluation or in this tutorial.
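To make the Insertion/Deletion idea concrete, here is a NumPy sketch of the deletion metric as it is commonly defined: pixels are removed in order of decreasing attribution, the model's class score is recorded after each step, and the area under that curve summarizes the result (lower means the attribution found the pixels the model relies on). This is not the implementation in input_iba/evaluation; model_fn, the baseline value, and the step schedule are assumptions. The insertion metric is the mirror image: start from the baseline, add pixels in the same order, and prefer a higher area.

```python
import numpy as np


def deletion_curve(image, attribution, model_fn, n_steps=10, baseline=0.0):
    """Deletion metric sketch.

    image: (H, W, C) array; attribution: (H, W) array; model_fn maps an image
    to a scalar class score. Returns (scores, auc) where scores has n_steps + 1 entries.
    """
    h, w = attribution.shape
    # Pixel indices sorted from most to least important.
    order = np.argsort(-attribution.ravel(), kind="stable")
    step = int(np.ceil(order.size / n_steps))
    current = image.astype(float).copy()
    scores = [model_fn(current)]
    for i in range(n_steps):
        rows, cols = np.unravel_index(order[i * step:(i + 1) * step], (h, w))
        current[rows, cols] = baseline  # "delete" these pixels in every channel
        scores.append(model_fn(current))
    scores = np.asarray(scores, dtype=float)
    # Trapezoidal area, with the x-axis (fraction of steps taken) scaled to [0, 1].
    auc = float(((scores[:-1] + scores[1:]) / 2.0).sum() / n_steps)
    return scores, auc


if __name__ == "__main__":
    img = np.ones((4, 4, 1))
    attr = np.arange(16, dtype=float).reshape(4, 4)
    print(deletion_curve(img, attr, lambda x: float(x.mean()), n_steps=4))
```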
This repository is released under the MIT license.