- OpenXAI: Towards a Transparent Evaluation of Model Explanations
- Concept based XAI library - CXAI
- Xplique
- DALEX
- AIX360
- ALIBI - Python XAI toolkit (see the usage sketch after this list)
- Proceedings of ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI
- Neurocartography-Tool - Global explanation - Neuron level visualization
- TorchEsegata - GitHub repository
- Tutorials
- An alternative perspective on problems due to XAI
- Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations
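
Most of the toolkits above expose a similar fit/explain workflow. Below is a minimal sketch using ALIBI's `AnchorTabular` on a scikit-learn classifier (illustrative only: the iris data and random-forest model are placeholders, and attribute access on the returned explanation may vary slightly between alibi versions).

```python
# Minimal sketch, not an official example: explaining a sklearn classifier with
# Anchors via alibi. Dataset and model are placeholders.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

data = load_iris()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

predict_fn = lambda x: clf.predict(x)                    # Anchors expects a label-returning predictor
explainer = AnchorTabular(predict_fn, feature_names=data.feature_names)
explainer.fit(data.data)                                 # learns per-feature discretization bins
explanation = explainer.explain(data.data[0], threshold=0.95)

print("Anchor   :", explanation.anchor)                  # list of predicate strings
print("Precision:", explanation.precision)
print("Coverage :", explanation.coverage)
```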
Title | Paper Title | Source Link | Code | Tags |
---|---|---|---|---|
Visualization of CNN | Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps | CVPR2013 | PyTorch | Visualization gradient-based saliency maps |
Title | Paper Title | Source Link | Code | Tags |
---|---|---|---|---|
CAM | Learning Deep Features for Discriminative Localization | CVPR2016 | PyTorch (Official) | class activation mapping (see the CAM sketch below this table) |
LIME | “Why Should I Trust You?” Explaining the Predictions of Any Classifier | KDD2016 | PyTorch (Official) | trust a prediction |
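
A rough CAM sketch for the entry above (generic PyTorch, not the official code): it assumes a torchvision ResNet-18, i.e. an architecture whose last conv features feed a global-average-pool and a single fc layer, which is what CAM requires; the input is a placeholder tensor.

```python
# Minimal CAM sketch: weight the last conv feature maps by the fc weights of the
# predicted class. The model is untrained here for brevity; load pretrained
# weights in practice.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18().eval()

feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(maps=o.detach()))

x = torch.randn(1, 3, 224, 224)                             # placeholder image tensor
with torch.no_grad():
    cls = model(x).argmax(dim=1).item()

fmap = feats["maps"][0]                                     # (512, 7, 7) last conv feature maps
w = model.fc.weight[cls].detach()                           # (512,) fc weights of the predicted class
cam = F.relu(torch.einsum("c,chw->hw", w, fmap))            # weighted sum over channels
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)    # normalize to [0, 1]
cam = F.interpolate(cam[None, None], size=x.shape[-2:], mode="bilinear")[0, 0]
```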
Title | Paper Title | Source Link | Code | Tags |
---|---|---|---|---|
Grad-CAM | Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization | ICCV2017, CVPR2016 (original) | PyTorch | Visualization gradient-based saliency maps (see the Grad-CAM sketch below this table) |
Network Dissection | Network Dissection: Quantifying Interpretability of Deep Visual Representations | CVPR2017 | PyTorch (Official) | Visualization |
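
A rough Grad-CAM sketch for the entry above (generic PyTorch hooks, not the official implementation; the model is untrained and the input is a placeholder).

```python
# Minimal Grad-CAM sketch: weight the last conv feature maps by the spatially
# averaged gradients of the target class score, sum over channels, and ReLU.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18().eval()            # untrained here for brevity; use a trained model in practice
acts, grads = {}, {}

def fwd_hook(module, inputs, output):
    acts["a"] = output                                       # keep last conv activations
    output.register_hook(lambda g: grads.update(g=g))        # and their gradients on backward
model.layer4.register_forward_hook(fwd_hook)

x = torch.randn(1, 3, 224, 224)      # placeholder image tensor
score = model(x)[0].max()            # score of the top class
model.zero_grad()
score.backward()

weights = grads["g"].mean(dim=(2, 3), keepdim=True)          # (1, C, 1, 1) channel importances
cam = F.relu((weights * acts["a"]).sum(dim=1))               # (1, 7, 7) coarse localization map
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)     # normalize to [0, 1]
```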
Title | Paper Title | Source Link | Code | Tags |
---|---|---|---|---|
TCAV | Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) | ICML 2018 | Tensorflow 1.15.2 | interpretability method (see the CAV sketch below this table) |
Interpretable CNN | Interpretable Convolutional Neural Networks | CVPR 2018 | Tensorflow 1.x | explainability by design |
Anchors | Anchors: High-Precision Model-Agnostic Explanations | AAAI 2018 | sklearn (Official) | model-agnostic |
Sanity Checks | Sanity checks for saliency maps | NeurIPS 2018 | PyTorch | saliency methods vs edge detector |
Grad-CAM++ | Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks | WACV 2018 | PyTorch | saliency maps |
Interpretable Basis | Interpretable Basis Decomposition for Visual Explanation | ECCV 2018 | PyTorch | ibd |
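
A rough TCAV-style sketch for the entry above (sklearn + numpy, not the official TensorFlow code): the concept/random activations and class-score gradients are placeholder arrays standing in for a real bottleneck layer.

```python
# A CAV is the normal of a linear classifier separating concept activations from
# random activations; the TCAV score is the fraction of inputs whose class-score
# gradient has a positive directional derivative along the CAV.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 512                                          # bottleneck activation dimension (illustrative)
concept_acts = rng.normal(0.5, 1.0, (100, d))    # activations of concept images (placeholder)
random_acts = rng.normal(0.0, 1.0, (100, d))     # activations of random images (placeholder)

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]   # concept activation vector

# grads: d(class logit)/d(bottleneck activation) for each test input (placeholder values)
grads = rng.normal(0.0, 1.0, (200, d))
tcav_score = float(np.mean(grads @ cav > 0))
print(f"TCAV score: {tcav_score:.2f}")
```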
Title | Paper Title | Source Link | Code | Tags |
---|---|---|---|---|
Full-grad | Full-Gradient Representation for Neural Network Visualization | NeurIPS2019 | PyTorch (Official) Tensorflow | saliency map representation |
This looks like that | This Looks Like That: Deep Learning for Interpretable Image Recognition | NeurIPS2019 | PyTorch (Official) | object (see the prototype sketch below this table) |
Counterfactual visual explanations | Counterfactual visual explanations | ICML2019 | | interpretability |
concept with contribution interpretable cnn | Explaining Neural Networks Semantically and Quantitatively | ICCV 2019 | ||
SIS | What made you do this? Understanding black-box decisions with sufficient input subsets | AISTATS 2019 - Supplementary Material | Tensorflow 1.x | |
Filter as concept detector | Filters in Convolutional Neural Networks as Independent Detectors of Visual Concepts | ACM |
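
A rough ProtoPNet-style sketch for the "This Looks Like That" entry above (generic PyTorch with illustrative sizes, not the official code): conv feature patches are compared to learned prototypes, per-prototype similarities are max-pooled, and a linear layer maps them to class logits.

```python
# Prototype layer sketch: squared L2 distance between every spatial patch and
# every prototype, converted to a similarity score as in ProtoPNet.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_proto, proto_dim, n_classes = 10, 128, 5        # illustrative sizes
prototypes = nn.Parameter(torch.randn(n_proto, proto_dim, 1, 1))
classifier = nn.Linear(n_proto, n_classes, bias=False)

feat = torch.randn(2, proto_dim, 7, 7)            # placeholder conv feature maps
dists = (feat.unsqueeze(1) - prototypes.unsqueeze(0)).pow(2).sum(dim=2)   # (B, n_proto, 7, 7)
sims = torch.log((dists + 1) / (dists + 1e-4))    # distance-to-similarity as in the paper
scores = F.max_pool2d(sims, kernel_size=sims.shape[-2:]).flatten(1)       # (B, n_proto)
logits = classifier(scores)                       # prototype evidence feeds the classifier
```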
Title | Paper Title | Source Link | Code | Tags |
---|---|---|---|---|
INN | Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs | ECCV 2020 | PyTorch | explainability by design |
X-Grad CAM | Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs | | PyTorch ||
Revisiting BP saliency | There and Back Again: Revisiting Backpropagation Saliency Methods | CVPR 2020 | PyTorch | grad cam failure noted |
Interacting with explanation | Making deep neural networks right for the right scientific reasons by interacting with their explanations | Nature Machine Intelligence | sklearn | |
Class specific Filters | Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters | ECCV Supplementary Material | Code - not yet updated | ICLR rejected version with reviews |
Interpretable Decoupling | Interpretable Neural Network Decoupling | ECCV 2020 | ||
iCaps | iCaps: An Interpretable Classifier via Disentangled Capsule Networks | ECCV Supplementary Material | ||
VQA | Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision | ECCV 2020 | PyTorch | |
When explanations lie | When Explanations Lie: Why Many Modified BP Attributions Fail | ICML 2020 | PyTorch | |
Similarity models | Towards Visually Explaining Similarity Models | Arxiv | ||
Quantify trust | How Much Should I Trust You? Modeling Uncertainty of Black Box Explanations | NeurIPS 2020 submission | | hima_lakkaraju, sameer_singh, model-agnostic |
Concepts for segmentation task | ABSTRACTING DEEP NEURAL NETWORKS INTO CONCEPT GRAPHS FOR CONCEPT LEVEL INTERPRETABILITY | Arxiv | Tensorflow 1.14 | brain tumour segmentation |
Deep Lift based Network Pruning | Utilizing Explainable AI for Quantization and Pruning of Deep Neural Networks | Arxiv NeurIPS format | | nas, deep_lift |
Unified Attribution Framework | A Unified Taylor Framework for Revisiting Attribution Methods | Arxiv (updated) | | taylor, attribution_framework |
Global Concept Attribution | Towards Global Explanations of Convolutional Neural Networks with Concept Attribution | CVPR 2020 ||
relevance estimation | Determining the Relevance of Features for Deep Neural Networks | ECCV 2020 | ||
localized concept maps | Explaining AI-based Decision Support Systems using Concept Localization Maps | Arxiv | Just repository created | |
quantify saliency | Quantifying Explainability of Saliency Methods in Deep Neural Networks | Arxiv | PyTorch | |
generalization of LIME - MeLIME | MeLIME: Meaningful Local Explanation for Machine Learning Models | Arxiv | Tensorflow 1.15 | |
global counterfactual explanations | Interpretable and Interactive Summaries of Actionable Recourses | Arxiv | ||
fine grained counterfactual heatmaps | SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition | Arxiv | PyTorch | scouter |
quantify trust | How Much Can We Really Trust You? Towards Simple, Interpretable Trust Quantification Metrics for Deep Neural Networks | Arxiv | ||
Non-negative concept activation vectors | IMPROVING INTERPRETABILITY OF CNN MODELS USING NON-NEGATIVE CONCEPT ACTIVATION VECTORS | Arxiv | ||
different layer activations | Explaining Neural Networks by Decoding Layer Activations | Arxiv | ||
concept bottleneck networks | Concept Bottleneck Models | ICML 2020 | PyTorch | see the bottleneck sketch below this table |
attribution | Visualizing the Impact of Feature Attribution Baselines | Distill | ||
CSI | Contextual Semantic Interpretability | Arxiv | | explainable_by_design |
Improve black box via explanation | Introspective Learning by Distilling Knowledge from Online Self-explanation | Arxiv | | knowledge_distillation |
Patch explanations | Information-Theoretic Visual Explanation for Black-Box Classifiers | Arxiv | Tensorflow 1.13.1 | patch_sampling ,information_theory |
Causality | Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect | NeurIPS 2020 | PyTorch | |
Concept in Time series data | Conceptual Explanations of Neural Network Prediction for Time Series | IJCNN 2020 | | time series, see if useful someway |
Explainable by Design | Trustworthy Convolutional Neural Networks: A Gradient Penalized-based Approach | Arxiv ||
Colorwise Saliency | Visualizing Color-wise Saliency of Black-Box Image Classification Models | Arxiv | ||
concept based | Concept Discovery for The Interpretation of Landscape Scenicness | Downloadable File | ||
Integrated Score CAM | IS-CAM: Integrated Score-CAM for axiomatic-based explanations | Arxiv | ||
Grad LAM | Grad-LAM: Visualization of Deep Neural Networks for Unsupervised Learning | EURASIP 2020 | ||
Cites TCAV | Integrating Intrinsic and Extrinsic Explainability: The Relevance of Understanding Neural Networks for Human-Robot Interaction | AAAI 2020 | ||
Attribution | Learning Propagation Rules for Attribution Map Generation | Arxiv | ||
Zoom CAM | Zoom-CAM: Generating Fine-grained Pixel Annotations from Image Labels | Arxiv | | must read before modularity proposal |
Masking based saliency maps investigation | INVESTIGATING AND SIMPLIFYING MASKING-BASED SALIENCY MAP METHODS FOR MODEL INTERPRETABILITY | Arxiv | PyTorch | |
Evaluation | Evaluating Attribution Methods using White-Box LSTMs | EMNLP Workshop | PyTorch | cites TCAV , says all explanations fail their test |
Interpretable Bayesian Neural Networks | Incorporating Interpretable Output Constraints in Bayesian Neural Networks | NeurIPS 2020 | PyTorch | |
Survey - Counterfactual explanations | Counterfactual Explanations for Machine Learning: A Review | Arxiv | ||
Standardised Explainability | The Need for Standardised Explainability | ICML 2020 Workshop | ||
CME | Now You See Me (CME): Concept-based Model Extraction | CIKM 2020 workshop | sklearn | |
Q FIT | Q-FIT: The Quantifiable Feature Importance Technique for Explainable Machine Learning | Arxiv | ||
Outside black box | Learning outside the Black-Box: The pursuit of interpretable models | NeurIPS 2020 | sklearn | |
Discrete Mask | Interpreting Image Classifiers by Generating Discrete Masks | IEEE - PAMI | ||
Contrastive explanations | Learning Global Transparent Models Consistent with Local Contrastive Explanations | NeurIPS 2020 | ||
Empirical study of Ideal Explanations | How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods | NeurIPS 2020 | tensorflow 1.15 | Example based matching library |
This Looks Like That + Relevance | This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition | Arxiv | PyTorch | must read before relevance |
Concept based posthoc | ProtoViewer: Visual Interpretation and Diagnostics of Deep Neural Networks with Factorized Prototypes | Paper | | refer human subject experiments |
Shapley Flow | Shapley Flow: A Graph-based Approach to Interpreting Model Predictions | Arxiv | ||
Attention Vs Saliency and Beyond | The elephant in the interpretability room: Why use attention as explanation when we have saliency methods? | Arxiv | ||
Unification of removal methods | Feature Removal Is A Unifying Principle For Model Explanation Methods | NeurIPS 2020 workshop | PyTorch | from the authors of SHAP Extended Arxiv version |
Robust and Stable Black Box Explanations | Robust and Stable Black Box Explanations | ICML 2020 | | hima lakkaraju |
Debugging test | Debugging Tests for Model Explanations | Arxiv | ||
AISTATS 2020 submission | Ensuring Actionable Recourse via Adversarial Training | Arxiv | | hima lakkaraju |
Layer wise explanation | Investigating Learning in Deep Neural Networks using Layer-Wise Weight Change | ResearchGate | ||
cites TCAV | Debiasing Convolutional Neural Networks via Meta Orthogonalization | Arxiv | Code page not found | |
Introducing concepts | SeXAI: Introducing Concepts into Black Boxes for Explainable Artificial Intelligence | Paper | Tensorflow 1.4 | |
Additive explainers | Learning simplified functions to understand | Paper | ||
BIN | Born Identity Network: Multi-way Counterfactual Map Generation to Explain a Classifier’s Decision | Arxiv | Tensorflow 2.2 | counterfactual explanations |
Explantion using Generative models | Explaining image classifiers by removing input features using generative models | ACCV 2020 | Tensorflow 1.12 & Pytorch 1.1 | Nguyen's paper |
Action Recognition Explanation | Play Fair: Frame Attributions in Video Models | ACCV 2020 | PyTorch | |
Concepts in VQA | Interpretable Visual Reasoning via Induced Symbolic Space | Arxiv | Code not yet updated, just repository created | |
Recourses | Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses | NeurIPS 2020 | | hima lakkaraju |
Feature Importance of CNN | Measuring Feature Importance of Convolutional Neural Networks | IEEE | ||
Causal Inference | Causal inference using deep neural networks | Arxiv | Keras | |
Match up | Match Them Up: Visually Explainable Few-shot Image Classification | Arxiv | PyTorch | |
Right for the Right Concept | Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations | Arxiv | ||
MALC | Transparency Promotion with Model-Agnostic Linear Competitors | ICML 2020 | ||
Shapley Taylor Index | The Shapley Taylor Interaction Index | ICML 2020 | ||
Concept based explanation + user feedback | Teaching the Machine to Explain Itself using Domain Knowledge | Openreview | ||
Counterfactual produces Adversarial | Semantics and explanation: why counterfactual explanations produce adversarial examples in deep neural networks | AIJ submission | ||
MEME | MEME: Generating RNN Model Explanations via Model Extraction | OpenReview | Keras | RNN specific LIME, see if any improvisations for MACE come from here |
ProtoPShare | ProtoPShare: Prototype Sharing for Interpretable Image Classification and Similarity Discovery | Arxiv - Accepted at ACM SIGKDD 2021 | PyTorch | Improved ProtoPNet (This looks like that) |
RANCC | RANCC: Rationalizing Neural Networks via Concept Clustering | ACL | Tensorflow 1.x | |
EAN | Efficient Attention Network: Accelerate Attention by Searching Where to Plug | Arxiv | PyTorch | |
LIME Analysis | Why model why? Assessing the strengths and limitations of LIME | Arxiv | sklearn | |
Rethink positive aggregation | Rethinking Positive Aggregation and Propagation of Gradients in Gradient-based Saliency Methods | ICML 2020 workshop WHI | ||
Pixel wise interpretation metric | A Metric to Compare Pixel-wise Interpretation Methods for Neural Networks | IEEE | ||
Latent space debiasing | Fair Attribute Classification through Latent Space De-biasing | Arxiv | PyTorch | |
Explanation - Teacher Student | Evaluating Explanations: How much do explanations from the teacher aid students? | Arxiv | ||
Neural Prototype Trees | Neural Prototype Trees for Interpretable Fine-grained Image Recognition | Arxiv | PyTorch | same group of This looks like that + relevance |
FixOut | FixOut: an ensemble approach to fairer models | Paper | ||
Concepts on Tabular data | Learning Interpretable Concept-Based Models with Human Feedback | Arxiv | ||
BayLIME | BayLIME: Bayesian Local Interpretable Model-Agnostic Explanations | Arxiv | Keras | |
PPI | Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models | Arxiv | Anonymous PyTorch code link given | |
Generalized distillation | Understanding Interpretability by generalized distillation in Supervised Classification | AAAI 2021 submission | Code will be public upon acceptance | |
RIG | A Singular Value Perspective on Model Robustness | Arxiv | ||
Activation analysis | Explaining Predictions of Deep Neural Classifier via Activation Analysis | Arxiv | ||
Evaluation metrics | Evaluating Explainable Methods for Predictive Process Analytics: A Functionally-Grounded Approach | Arxiv | sklearn | |
Explanations based on train set | Explainable Artificial Intelligence: How Subsets of the Training Data Affect a Prediction | Arxiv | ||
DAX | DAX: Deep Argumentative eXplanation for Neural Networks | Arxiv | ||
Debiased CAM | Debiased-CAM for bias-agnostic faithful visual explanations of deep convolutional networks | Arxiv | Tensorflow 2.1.0 | lot of human subject experiments found |
Bias via explanation | Investigating Bias in Image Classification using Model Explanations | ICML WHI 2020 | ||
Shapley Credit Allocation | On Shapley Credit Allocation for Interpretability | Arxiv | ||
Dependency Decomposition | Dependency Decomposition and a Reject Option for Explainable Models | Arxiv | ||
Interpretation Network | xRAI: Explainable Representations through AI | Arxiv | ||
Explainable by Design | Evolutionary Generative Contribution Mappings | IEEE | | explainable by design |
Transformer Explanation | Transformer Interpretability Beyond Attention Visualization | Arxiv CVPR format | PyTorch | |
MANE | MANE: Model-Agnostic Non-linear Explanations for Deep Learning Model | IEEE | | see how similar to MAIRE |
Why and Why Not Explanations | On Relating ‘Why?’ and ‘Why Not?’ Explanations | Arxiv | sklearn | gives theoretical relationship between feature importance and counterfactual techniques |
cites ACE | Analyzing Representations inside Convolutional Neural Networks | Arxiv | PyTorch | |
CEN | CEN: Concept Evolution Network for Image Classification Tasks | ACM RICAI 2020 | | explainable by design |
Quantitative evaluation metrics | Quantitative Evaluations on Saliency Methods: An Experimental Study | Arxiv | ||
Integrating black box and Interpretable model | IB-M: A Flexible Framework to Align an Interpretable Model and a Black-box Model | IEEE - BIBM 2020 | ||
X-GradCAM | Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs | BMVC 2020 | ||
RCAV | Robust Semantic Interpretability: Revisiting Concept Activation Vectors | ICML WHI 2020 | PyTorch |
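
A rough concept-bottleneck sketch for the "Concept Bottleneck Models" (ICML 2020) entry above (generic PyTorch with illustrative sizes, not the paper's code): the network first predicts human-interpretable concepts, then predicts the label from those concepts alone, so both stages can be supervised and inspected.

```python
# Minimal joint-training sketch: concept loss + task loss on a two-stage model.
import torch
import torch.nn as nn

n_features, n_concepts, n_classes = 64, 8, 3    # illustrative sizes

concept_net = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                            nn.Linear(128, n_concepts))
label_net = nn.Linear(n_concepts, n_classes)    # operates only on predicted concepts

x = torch.randn(16, n_features)                 # placeholder batch
concept_targets = torch.randint(0, 2, (16, n_concepts)).float()
labels = torch.randint(0, n_classes, (16,))

concept_logits = concept_net(x)
class_logits = label_net(torch.sigmoid(concept_logits))

# joint objective; the relative weighting of the two terms is a design choice
loss = nn.BCEWithLogitsLoss()(concept_logits, concept_targets) \
     + nn.CrossEntropyLoss()(class_logits, labels)
loss.backward()
```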
Title | Paper Title | Source Link | Code | Tags |
---|---|---|---|---|
Debiasing concepts | Debiasing Concept Bottleneck Models with Instrumental Variables | ICLR 2021 submissions page - Accepted as Poster | | causality |
Prototype Trajectory | Interpretable Sequence Classification Via Prototype Trajectory | ICLR 2021 submissions page | | this looks like that styled RNN |
Shapley dependence assumption | Shapley explainability on the data manifold | ICLR 2021 submissions page | ||
High dimension Shapley | Human-interpretable model explainability on high-dimensional data | ICLR 2021 submissions page | ||
L2x like paper | A Learning Theoretic Perspective on Local Explainability | ICLR 2021 submissions page | ||
Evaluation | Evaluation of Similarity-based Explanations | ICLR 2021 submissions page | | like adebayo paper for this looks like that styled methods |
Model correction | Defuse: Debugging Classifiers Through Distilling Unrestricted Adversarial Examples | ICLR 2021 submissions page | ||
Subspace explanation | Constraint-Driven Explanations of Black-Box ML Models | ICLR 2021 submissions page | | to see how close to MUSE by Hima Lakkaraju 2019 |
Catastrophic forgetting | Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting | ICLR 2021 submissions page | Code available in their Supplementary zip file | |
Non trivial counterfactual explanations | Beyond Trivial Counterfactual Generations with Diverse Valuable Explanations | ICLR 2021 submissions page | ||
Explainable by Design | Interpretability Through Invertibility: A Deep Convolutional Network With Ideal Counterfactuals And Isosurfaces | ICLR 2021 submissions page | ||
Gradient attribution | Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability | ICLR 2021 submissions page | | looks like extension of Sixt et al paper |
Mask based Explainable by Design | Investigating and Simplifying Masking-based Saliency Methods for Model Interpretability | ICLR 2021 submissions page | ||
NBDT - Explainable by Design | NBDT: Neural-Backed Decision Tree | ICLR 2021 submissions page | ||
Variational Saliency Maps | Variational saliency maps for explaining model's behavior | ICLR 2021 submissions page | ||
Network dissection with coherency or stability metric | Importance and Coherence: Methods for Evaluating Modularity in Neural Networks | ICLR 2021 submissions page | ||
Modularity | Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks | ICLR 2021 submissions page | Code made anonymous for review, link given in paper | |
Explainable by design | A self-explanatory method for the black problem on discrimination part of CNN | ICLR 2021 submissions page | | seems concepts of game theory applied |
Attention not Explanation | Why is Attention Not So Interpretable? | ICLR 2021 submissions page | ||
Ablation Saliency | Ablation Path Saliency | ICLR 2021 submissions page | ||
Explainable Outlier Detection | Explainable Deep One-Class Classification | ICLR 2021 submissions page | ||
XAI without approximation | Explainable AI Without Interpretable Model | Arxiv ||
Learning theoretic Local Interpretability | A LEARNING THEORETIC PERSPECTIVE ON LOCAL EXPLAINABILITY | Arxiv | ||
GANMEX | GANMEX: ONE-VS-ONE ATTRIBUTIONS USING GAN-BASED MODEL EXPLAINABILITY | Arxiv | ||
Evaluating Local Explanations | Evaluating local explanation methods on ground truth | Artificial Intelligence Journal Elsevier | sklearn | |
Structured Attention Graphs | Structured Attention Graphs for Understanding Deep Image Classifications | AAAI 2021 | PyTorch | see how close to MACE |
Ground truth explanations | Data Representing Ground-Truth Explanations to Evaluate XAI Methods | AAAI 2021 | sklearn | trained models available in their github repository |
AGF | Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization | AAAI 2021 | PyTorch | |
RSP | Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations | AAAI 2021 | ||
HyDRA | HYDRA: Hypergradient Data Relevance Analysis for Interpreting Deep Neural Networks | AAAI 2021 | PyTorch | |
SWAG | SWAG: Superpixels Weighted by Average Gradients for Explanations of CNNs | WACV 2021 | ||
FastIF | FASTIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging | Arxiv | PyTorch | |
EVET | EVET: Enhancing Visual Explanations of Deep Neural Networks Using Image Transformations | WACV 2021 | ||
Local Attribution Baselines | On Baselines for Local Feature Attributions | AAAI 2021 | PyTorch | |
Differentiated Explanations | Differentiated Explanation of Deep Neural Networks with Skewed Distributions | IEEE - TPAMI journal | PyTorch | |
Human game based survey | Explainable AI and Adoption of Algorithmic Advisors: an Experimental Study | Arxiv | ||
Explainable by design | Learning Semantically Meaningful Features for Interpretable Classifications | Arxiv | ||
Expred | Explain and Predict, and then Predict again | ACM WSDM 2021 | PyTorch | |
Progressive Interpretation | An Information-theoretic Progressive Framework for Interpretation | Arxiv | PyTorch | |
UCAM | Uncertainty Class Activation Map (U-CAM) using Gradient Certainty method | IEEE - TIP | Project Page | PyTorch |
progressive GAN explainability- smiling dataset- ICLR 2020 group | Explaining the Black-box Smoothly - A Counterfactual Approach | Arxiv | ||
Head pasted in another image - experimented | WHAT DO DEEP NETS LEARN? CLASS-WISE PATTERNS REVEALED IN THE INPUT SPACE | Arxiv | ||
Model correction | ExplOrs Explanation Oracles and the architecture of explainability | Paper | ||
Explanations - Knowledge Representation | A Basic Framework for Explanations in Argumentation | IEEE | ||
Eigen CAM | Eigen-CAM: Visual Explanations for Deep Convolutional Neural Networks | Springer | ||
Evaluation of Posthoc | How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations | ACM | ||
GLocalX | GLocalX - From Local to Global Explanations of Black Box AI Models | Arxiv | ||
Consistent Interpretations | Explainable Models with Consistent Interpretations | AAAI 2021 | ||
SIDU | Introducing and assessing the explainable AI (XAI) method: SIDU | Arxiv | ||
cites This looks like that | Explaining black-box classifiers using post-hoc explanations-by-example: The effect of explanations and error-rates in XAI user studies | AIJ | ||
i-Algebra | i-Algebra: Towards Interactive Interpretability of Deep Neural Networks | AAAI 2021 | ||
Shape texture bias | SHAPE OR TEXTURE: UNDERSTANDING DISCRIMINATIVE FEATURES IN CNNS | ICLR 2021 | ||
Class agnostic features | THE MIND’S EYE: VISUALIZING CLASS-AGNOSTIC FEATURES OF CNNS | Arxiv | ||
IBEX | A Multi-layered Approach for Tailored Black-box Explanations | Paper | Code | |
Relevant explanations | Learning Relevant Explanations | Paper | ||
Guided Zoom | Guided Zoom: Zooming into Network Evidence to Refine Fine-grained Model Decisions | IEEE | ||
XAI survey | A Survey on Understanding, Visualizations, and Explanation of Deep Neural Networks | Arxiv | ||
Pattern theory | Convolutional Neural Network Interpretability with General Pattern Theory | Arxiv | PyTorch | |
Gaussian Process based explanations | Bandits for Learning to Explain from Explanations | AAAI 2021 | sklearn | |
LIFT CAM | LIFT-CAM: Towards Better Explanations for Class Activation Mapping | Arxiv | ||
ObAIEx | Right for the Right Reasons: Making Image Classification Intuitively Explainable | Paper | tensorflow | |
VAE based explainer | Combining an Autoencoder and a Variational Autoencoder for Explaining the Machine Learning Model Predictions | IEEE | ||
Segmentation based explanation | Deep Co-Attention Network for Multi-View Subspace Learning | Arxiv | PyTorch | |
Integrated CAM | INTEGRATED GRAD-CAM: SENSITIVITY-AWARE VISUAL EXPLANATION OF DEEP CONVOLUTIONAL NETWORKS VIA INTEGRATED GRADIENT-BASED SCORING | ICASSP 2021 | PyTorch | |
Human study | VitrAI - Applying Explainable AI in the Real World | Arxiv | ||
Attribution Mask | Attribution Mask: Filtering Out Irrelevant Features By Recursively Focusing Attention on Inputs of DNNs | Arxiv | PyTorch | |
LIME faithfulness | What does LIME really see in images? | Arxiv | Tensorflow 1.x | |
Assess model reliability | Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs | Arxiv | ||
Perturbation + Gradient unification | Towards the Unification and Robustness of Perturbation and Gradient Based Explanations | Arxiv | | hima lakkaraju |
Gradients faithful? | Do Input Gradients Highlight Discriminative Features? | Arxiv | PyTorch | |
Untrustworthy predictions | Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis | Arxiv | ||
Explaining misclassification | Explaining Inaccurate Predictions of Models through k-Nearest Neighbors | Paper | cites Oscar Li AAAI 2018 prototypes paper | |
Explanations inside predictions | Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations | AISTATS 2021 | ||
Layerwise interpretation | LAYER-WISE INTERPRETATION OF DEEP NEURAL NETWORKS USING IDENTITY INITIALIZATION | Arxiv | ||
Visualizing Rule Sets | Visualizing Rule Sets: Exploration and Validation of a Design Space | Arxiv | PyTorch | |
Human experiments | Are Explanations Helpful? A Comparative Study of the Effects of Explanations in AI-Assisted Decision-Making | IUI 2021 | ||
Attention fine-grained classification | Interpretable Attention Guided Network for Fine-grained Visual Classification | Arxiv | ||
Concept construction | Explaining Classifiers by Constructing Familiar Concepts | Paper | PyTorch | |
EbD | Human-Understandable Decision Making for Visual Recognition | Arxiv | ||
Bridging XAI algorithm , Human needs | Towards Connecting Use Cases and Methods in Interpretable Machine Learning | Arxiv | ||
Generative trustworthy classifiers | Generative Classifiers as a Basis for Trustworthy Image Classification | Paper | Github | |
Counterfactual explanations | Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties | AISTATS 2021 | PyTorch | |
Role categorization of CNN units | Quantitative Effectiveness Assessment and Role Categorization of Individual Units in Convolutional Neural Networks | ICML 2021 | ||
Non-trivial counterfactual explanations | Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations | Arxiv | ||
NP-ProtoPNet | These do not Look Like Those: An Interpretable Deep Learning Model for Image Recognition | IEEE | ||
Correcting neural networks based on explanations | Refining Neural Networks with Compositional Explanations | Arxiv | Code link given in paper, but page not found | |
Contrastive reasoning | Contrastive Reasoning in Neural Networks | Arxiv | ||
Concept based | Intersection Regularization for Extracting Semantic Attributes | Arxiv | ||
Boundary explanations | Boundary Attributions Provide Normal (Vector) Explanations | Arxiv | PyTorch | |
Generative Counterfactuals | ECINN: Efficient Counterfactuals from Invertible Neural Networks | Arxiv | ||
ICE | Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors | AAAI 2021 | ||
Group CAM | Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks | Arxiv | PyTorch | |
HMM interpretability | Towards interpretability of Mixtures of Hidden Markov Models | AAAI 2021 | sklearn | |
Empirical Explainers | Efficient Explanations from Empirical Explainers | Arxiv | PyTorch | |
FixNorm | FIXNORM: DISSECTING WEIGHT DECAY FOR TRAINING DEEP NEURAL NETWORKS | Arxiv | ||
CoDA-Net | Convolutional Dynamic Alignment Networks for Interpretable Classifications | CVPR 2021 | Code link given in paper. Repository not yet created | |
Like Dr. Chandru sir's (IITPKD) XAI work | Neural Response Interpretation through the Lens of Critical Pathways | Arxiv | PyTorch - Pathway Grad, PyTorch - ROAR ||
Inaugment | InAugment: Improving Classifiers via Internal Augmentation | Arxiv | Code yet to be updated | |
Gradual Grad CAM | Enhancing Deep Neural Network Saliency Visualizations with Gradual Extrapolation | Arxiv | PyTorch | |
A-FMI | A-FMI: LEARNING ATTRIBUTIONS FROM DEEP NETWORKS VIA FEATURE MAP IMPORTANCE | Arxiv | ||
Trust - Regression | To Trust or Not to Trust a Regressor: Estimating and Explaining Trustworthiness of Regression Predictions | AAAI 2021 | sklearn | |
Concept based explanations - study | IS DISENTANGLEMENT ALL YOU NEED? COMPARING CONCEPT-BASED & DISENTANGLEMENT APPROACHES | ICLR 2021 workshop | tensorflow 2.3 | |
Faithful attribution | Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution | Arxiv | ||
Counterfactual explanation | Counterfactual attribute-based visual explanations for classification | Springer | ||
User based explanations | That's (not) the output I expected!” On the role of end user expectations in creating explanations of AI systems | AIJ | ||
Human understandable concept based explanations | Towards Human-Understandable Visual Explanations: Imperceptible High-frequency Cues Can Better Be Removed | Arxiv | ||
Improved attribution | Improving Attribution Methods by Learning Submodular Functions | Arxiv | ||
SHAP tractability | On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results | Arxiv | ||
SHAP explanation network | SHAPLEY EXPLANATION NETWORKS | ICLR 2021 | PyTorch | |
Concept based dataset shift explanation | FAILING CONCEPTUALLY: CONCEPT-BASED EXPLANATIONS OF DATASET SHIFT | ICLR 2021 workshop | tensorflow 2 | |
EbD | Towards Human-Understandable Visual Explanations: Imperceptible High-frequency Cues Can Better Be Removed | Arxiv | ||
Evaluating CAM | Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis | Arxiv | ||
EFC-CAM | Exclusive Feature Constrained Class Activation Mapping for Better Visual Explanation | IEEE | ||
Causal Interpretation | Instance-wise Causal Feature Selection for Model Interpretation | Arxiv | PyTorch | |
Fairness in Learning | Learning to Learn to be Right for the Right Reasons | Arxiv | ||
Feature attribution correctness | Do Feature Attribution Methods Correctly Attribute Features? | Arxiv | Code not yet updated | |
NICE | NICE: AN ALGORITHM FOR NEAREST INSTANCE COUNTERFACTUAL EXPLANATIONS | Arxiv | Own Python Package | |
SCG | A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts | Arxiv | ||
Visual Concepts | A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts | Arxiv | ||
This looks like that - drawback | This Looks Like That... Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks | Arxiv | PyTorch | |
Exemplar based classification | Visualizing Association in Exemplar-Based Classification | ICASSP 2021 | ||
Correcting classification | CORRECTING CLASSIFICATION: A BAYESIAN FRAMEWORK USING EXPLANATION FEEDBACK TO IMPROVE CLASSIFICATION ABILITIES | Arxiv | ||
Concept Bottleneck Networks | DO CONCEPT BOTTLENECK MODELS LEARN AS INTENDED? | ICLR workshop 2021 | ||
Sanity for saliency | Sanity Simulations for Saliency Methods | Arxiv | ||
Concept based explanations | Cause and Effect: Concept-based Explanation of Neural Networks | Arxiv | ||
CLIMEP | How to Explain Neural Networks: A perspective of data space division | Arxiv | ||
Sufficient explanations | Probabilistic Sufficient Explanations | Arxiv | Empty Repository | |
SHAP baseline | Learning Baseline Values for Shapley Values | Arxiv | ||
Explainable by Design | EXoN: EXplainable encoder Network | Arxiv | tensorflow 2.4.0 | explainable VAE |
Concept based explanations | Aligning Artificial Neural Networks and Ontologies towards Explainable AI | AAAI 2021 | ||
XAI via Bayesian teaching | ABSTRACTION, VALIDATION, AND GENERALIZATION FOR EXPLAINABLE ARTIFICIAL INTELLIGENCE | Arxiv | ||
Explanation blind spots | DO NOT EXPLAIN WITHOUT CONTEXT: ADDRESSING THE BLIND SPOT OF MODEL EXPLANATIONS | Arxiv | ||
BLA | Bounded logit attention: Learning to explain image classifiers | Arxiv | tensorflow | L2X++ |
Interpretability - mathematical model | The Definitions of Interpretability and Learning of Interpretable Models | Arxiv | ||
Similar to our ICML workshop 2021 work | The effectiveness of feature attribution methods and its correlation with automatic evaluation scores | Arxiv | ||
EDDA | EDDA: Explanation-driven Data Augmentation to Improve Model and Explanation Alignment | Arxiv | ||
Relevant set explanations | Efficient Explanations With Relevant Sets | Arxiv | ||
Model transfer | Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning | Arxiv | ||
Model correction | Finding and Fixing Spurious Patterns with Explanations | Arxiv | ||
Neuron graph communities | On the Evolution of Neuron Communities in a Deep Learning Architecture | Arxiv | ||
Mid level features explanations | A general approach for Explanations in terms of Middle Level Features | Arxiv | see how different from MUSE by Hima Lakkaraju group | |
Concept based knowledge distillation | Towards Black-Box Explainability with Gaussian Discriminant Knowledge Distillation | CVPR 2021 workshop | compare and contrast with network dissection | |
CNN high frequency bias | Dissecting the High-Frequency Bias in Convolutional Neural Networks | CVPR 2021 workshop | Tensorflow | |
Explainable by design | Entropy-based Logic Explanations of Neural Networks | Arxiv | PyTorch | concept based |
CALM | Keep CALM and Improve Visual Feature Attribution | Arxiv | PyTorch | |
Relevance CAM | Relevance-CAM: Your Model Already Knows Where to Look | CVPR 2021 | PyTorch | |
S-LIME | S-LIME: Stabilized-LIME for Model Explanation | Arxiv | sklearn | |
Local + Global | Best of both worlds: local and global explanations with human-understandable concepts | Arxiv | Been Kim's group | |
Guided integrated gradients | Guided Integrated Gradients: an Adaptive Path Method for Removing Noise | CVPR 2021 | ||
Concept based | Meaningfully Explaining a Model’s Mistakes | Arxiv | ||
Explainable by design | It’s FLAN time! Summing feature-wise latent representations for interpretability | Arxiv | ||
SimAM | SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks | ICML 2021 | PyTorch | |
DANCE | DANCE: Enhancing saliency maps using decoys | ICML 2021 | Tensorflow 1.x | |
EbD Concept formation | Explore Visual Concept Formation for Image Classification | ICML 2021 | PyTorch | |
Explainable by design | Interpretable Compositional Convolutional Neural Networks | Arxiv | ||
Attribution aggregation | Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation | AAAI 2021 - pdf | ||
Perturbation based activation | A Novel Visual Interpretability for Deep Neural Networks by Optimizing Activation Maps with Perturbation | AAAI 2021 | ||
Global explanations | Feature Synergy, Redundancy, and Independence in Global Model Explanations using SHAP Vector Decomposition | Arxiv | Github package | |
L2E | Learning to Explain: Generating Stable Explanations Fast | ACL 2021 | PyTorch | NLE |
Joint Shapley | Joint Shapley values: a measure of joint feature importance | Arxiv | ||
Explainable by design | Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment | Arxiv | ||
Explainable by design | SONG: SELF-ORGANIZING NEURAL GRAPHS | Arxiv | ||
Explainable by design | Designing Shapelets for Interpretable Data-Agnostic Classification | AIES 2021 | sklearn | Interpretable block of time series extended to other data modalities like image, text, tabular |
Global explanations + Model correction | Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability | Arxiv | PyTorch | |
HIL- Model correction | Human-in-the-loop Extraction of Interpretable Concepts in Deep Learning Models | Arxiv | ||
Activation based Cause Analysis | Activation-Based Cause Analysis Method for Neural Networks | IEEE Access 2021 | ||
Local explanations | Leveraging Latent Features for Local Explanations | ACM SIGKDD 2021 | Amit Dhurandhar group | |
Fairness | Adequate and fair explanations | Arxiv - Accepted in CD-MAKE 2021 | ||
Global explanations | Finding Representative Interpretations on Convolutional Neural Networks | ICCV 2021 | ||
Groupwise explanations | Learning Groupwise Explanations for Black-Box Models | IJCAI 2021 | PyTorch | |
Mathematical | On Smoother Attributions using Neural Stochastic Differential Equations | IJCAI 2021 | ||
AGI | Explaining Deep Neural Network Models with Adversarial Gradient Integration | IJCAI 2021 | PyTorch | |
Accountable attribution | Longitudinal Distance: Towards Accountable Instance Attribution | Arxiv | Tensorflow Keras | |
Global explanation | Understanding of Kernels in CNN Models by Suppressing Irrelevant Visual Features in Images | Arxiv | ||
Concepts based - Explainable by design | Inducing Semantic Grouping of Latent Concepts for Explanations: An Ante-Hoc Approach | Arxiv | IITH Vineeth sir group | |
Explainable by design | This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation | Arxiv | ||
MIL | ProtoMIL: Multiple Instance Learning with Prototypical Parts for Fine-Grained Interpretability | Arxiv | ||
Concept based explanations | Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation | Arxiv | ||
Counterfactual explanation + Theory of Mind | CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models | Arxiv | ||
Evaluation metric | Counterfactual Evaluation for Explainable AI | Arxiv | ||
CIM - FSC | CIM: Class-Irrelevant Mapping for Few-Shot Classification | Arxiv | ||
Causal Concepts | Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation | Arxiv | ||
ECE | Ensemble of Counterfactual Explainers | Paper | Code - seems hybrid of tf and torch | |
Structured Explanations | From Heatmaps to Structured Explanations of Image Classifiers | Arxiv | ||
XAI metric | An Objective Metric for Explainable AI - How and Why to Estimate the Degree of Explainability | Arxiv | ||
DisCERN | DisCERN:Discovering Counterfactual Explanations using Relevance Features from Neighbourhoods | Arxiv | ||
PSEM | Towards Better Model Understanding with Path-Sufficient Explanations | Arxiv | Amit Dhurandhar sir group | |
Evaluation traps | The Logic Traps in Evaluating Post-hoc Interpretations | Arxiv | ||
Interactive explanations | Explainability Requires Interactivity | Arxiv | PyTorch | |
CounterNet | CounterNet: End-to-End Training of Counterfactual Aware Predictions | Arxiv | PyTorch | |
Evaluation metric - Concept based explanation | Detection Accuracy for Evaluating Compositional Explanations of Units | Arxiv | ||
Explanation - Uncertainity | Effects of Uncertainty on the Quality of Feature Importance Explanations | Arxiv | ||
Survey Paper | TOWARDS USER-CENTRIC EXPLANATIONS FOR EXPLAINABLE MODELS: A REVIEW | JISTM Journal Paper | ||
Feature attribution | The Struggles and Subjectivity of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets | AAAI 2021 workshop | ||
Contextual explanation | Context-based image explanations for deep neural networks | Image and Vision Computing Journal | ||
Causal + Counterfactual | Counterfactual Instances Explain Little | Arxiv | ||
Case based Posthoc | Explaining Deep Learning using examples: Optimal feature weighting methods for twin systems using post-hoc, explanation-by-example in XAI | Elsevier | ||
Debugging gray box model | Toward a Unified Framework for Debugging Gray-box Models | Arxiv | ||
Explainable by design | Optimising for Interpretability: Convolutional Dynamic Alignment Networks | Arxiv | ||
XAI negative effect | Explainability Pitfalls: Beyond Dark Patterns in Explainable AI | Arxiv | ||
Evaluate attributions | WHO EXPLAINS THE EXPLANATION? QUANTITATIVELY ASSESSING FEATURE ATTRIBUTION METHODS | Arxiv | ||
Counterfactual explanations | Designing Counterfactual Generators using Deep Model Inversion | Arxiv | ||
Model correction using explanation | Consistent Explanations by Contrastive Learning | Arxiv | ||
Visualize feature maps | Visualizing Feature Maps for Model Selection in Convolutional Neural Networks | ICCV 2021 Workshop | Tensorflow 1.15 | |
SPS | Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-grained Recognition | ICCV 2021 | PyTorch | |
DMBP | Generating Attribution Maps with Disentangled Masked Backpropagation | ICCV 2021 | ||
Better CAM | Towards Better Explanations of Class Activation Mapping | ICCV 2021 | ||
LEG | Statistically Consistent Saliency Estimation | ICCV 2021 | Keras | |
IBA | Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information | NeurIPS 2021 | PyTorch | |
Looks similar to This Looks Like That | Interpretable Image Recognition by Constructing Transparent Embedding Space | ICCV 2021 | Code not yet publicly released | |
Causal Imagenet | CAUSAL IMAGENET: HOW TO DISCOVER SPURIOUS FEATURES IN DEEP LEARNING? | Arxiv | ||
Model correction | Logic Constraints to Feature Importances | Arxiv | ||
Receptive field Misalignment CAM | On the Receptive Field Misalignment in CAM-based Visual Explanations | Pattern recognition Letters | PyTorch | |
Simplex | Explaining Latent Representations with a Corpus of Examples | Arxiv | PyTorch | |
Sanity checks | Revisiting Sanity Checks for Saliency Maps | Arxiv - NeurIPS 2021 workshop | | see the randomization sketch below this table |
Model correction | Debugging the Internals of Convolutional Networks | |||
SITE | Self-Interpretable Model with Transformation Equivariant Interpretation | Arxiv - Accepted at NeurIPS 2021 | | EbD |
Influential examples | Revisiting Methods for Finding Influential Examples | Arxiv | ||
SOBOL | Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis | NeurIPS 2021 | Tensorflow and PyTorch | |
Feature vectors | Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature Semantics | Arxiv | global interpretability | |
OOD in explainability | The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations | NeurIPS 2021 | sklearn | |
RPS LJE | Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models | NeurIPS 2021 | PyTorch | |
Model correction | Editing a Classifier by Rewriting Its Prediction Rules | NeurIPS 2021 | Code | |
suppressor variable litmus test | Scrutinizing XAI using linear ground-truth data with suppressor variables | Arxiv | ||
Explainable knowledge distillation | Learning Interpretation with Explainable Knowledge Distillation | Arxiv | ||
STEEX | STEEX: Steering Counterfactual Explanations with Semantics | Arxiv | Code | |
Binary counterfactual explanation | Counterfactual Explanations via Latent Space Projection and Interpolation | Arxiv | ||
ECLAIRE | Efficient Decompositional Rule Extraction for Deep Neural Networks | Arxiv | R | |
CartoonX | Cartoon Explanations of Image Classifiers | Researchgate | ||
concept based explanation | Explanations in terms of Hierarchically organised Middle Level Features | Paper | see how close to MACE and PACE | |
Concept ball | Ontology-based 𝑛-ball Concept Embeddings Informing Few-shot Image Classification | Paper | ||
SPARROW | SPARROW: Semantically Coherent Prototypes for Image Classification | BMVC 2021 | ||
XAI evaluation criteria | Objective criteria for explanations of machine learning models | Paper | ||
Code inversion with human perception | EXPLORING ALIGNMENT OF REPRESENTATIONS WITH HUMAN PERCEPTION | Arxiv | ||
Deformable ProtoPNet | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes | Arxiv | ||
ICSN | Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations | Arxiv | ||
HIVE | HIVE: Evaluating the Human Interpretability of Visual Explanations | Arxiv | Project Page | |
Jitter CAM | Jitter-CAM: Improving the Spatial Resolution of CAM-Based Explanations | BMVC 2021 | PyTorch | |
Interpreting last layer | Identifying Class Specific Filters with L1 Norm Frequency Histograms in Deep CNNs | Arxiv ||
FCP | Forward Composition Propagation for Explainable Neural Reasoning | Arxiv | ||
Protopool | Interpretable Image Classification with Differentiable Prototypes Assignment | Arxiv | ||
PRELIM | Pedagogical Rule Extraction for Learning Interpretable Models | Arxiv | ||
Fair correction vectors | FAIR INTERPRETABLE LEARNING VIA CORRECTION VECTORS | ICLR 2021 | ||
Smooth LRP | SmoothLRP: Smoothing LRP by Averaging over Stochastic Input Variations | ESANN 2021 | ||
Causal CAM | EXTRACTING CAUSAL VISUAL FEATURES FOR LIMITED LABEL CLASSIFICATION | ICIP 2021 |
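
A rough model-parameter-randomization sanity check in the spirit of the sanity-check entries above (generic PyTorch, not any paper's code; model and input are placeholders): randomize the top layer, recompute a gradient saliency map, and check how much the explanation changes. An explanation that barely changes fails the check.

```python
# Randomize the classifier head and compare plain gradient saliency maps before
# and after; a high correlation indicates the explanation is insensitive to the
# learned parameters.
import torch
from torchvision.models import resnet18

def grad_saliency(model, x):
    """|d(top class score)/d(input)| summed over channels, shape (H, W)."""
    x = x.clone().requires_grad_(True)
    model(x)[0].max().backward()
    return x.grad.abs().sum(dim=1)[0]

x = torch.randn(1, 3, 224, 224)                  # placeholder image
model = resnet18().eval()                        # use a trained model in practice
sal_trained = grad_saliency(model, x)

torch.nn.init.normal_(model.fc.weight)           # cascading randomization: start at the top layer
torch.nn.init.zeros_(model.fc.bias)
sal_random = grad_saliency(model, x)

sim = torch.corrcoef(torch.stack([sal_trained.flatten(), sal_random.flatten()]))[0, 1]
print(f"Saliency correlation after randomizing fc: {sim.item():.3f}")
```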
Title | Paper Title | Source Link | Code | Tags |
---|---|---|---|---|
SNI | Semantic Network Interpretation | WACV 2022 | ||
F-CAM | F-CAM: Full Resolution Class Activation Maps via Guided Parametric Upscaling | WACV 2022 | PyTorch | |
PCACE | PCACE: A Statistical Approach to Ranking Neurons for CNN Interpretability | Arxiv | ||
Evaluating Attribution methods | Evaluating Attribution Methods in Machine Learning Interpretability | IEEE International Conference on Big Data | ||
X-decision making | Explainable Decision Making with Lean and Argumentative Explanations | Arxiv | ||
Include domain knowledge to neural network | A review of some techniques for inclusion of domain‑knowledge into deep neural networks | Nature | ||
CNN Hierarchical Decomposition | Deeply Explain CNN via Hierarchical Decomposition | Arxiv | ||
Explanatory learning | EXPLANATORY LEARNING: BEYOND EMPIRICISM IN NEURAL NETWORKS | Arxiv | ||
Conceptor CAM | Conceptor Learning for Class Activation Mapping | IEEE-TIP | ||
Classifier orthogonalization | CONTROLLING DIRECTIONS ORTHOGONAL TO A CLASSIFIER | ICLR 2022 | PyTorch | |
Attention not explanation | Attention cannot be an Explanation | Arxiv | ||
CNN sensitivity analysis | A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes | Arxiv | ||
Trusting extrapolation | To what extent should we trust AI models when they extrapolate? | Arxiv | ||
LAP | LAP: An Attention-Based Module for Faithful Interpretation and Knowledge Injection in Convolutional Neural Networks | Arxiv | concept based explanations | |
Saliency map evaluation metrics | Metrics for saliency map evaluation of deep learning explanation methods | Arxiv | | see the deletion-style sketch at the end of this table |
LINEX | Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning | Arxiv | ||
ROAD | Evaluating Feature Attribution: An Information-Theoretic Perspective | Arxiv | PyTorch | |
CBM-AUC | Concept Bottleneck Model with Additional Unsupervised Concepts | Arxiv | ||
Explainability as dialogue | Rethinking Explainability as a Dialogue: A Practitioner’s Perspective | Arxiv | ||
IAA | Aligning Eyes between Humans and Deep Neural Network through Interactive Attention Alignment | Arxiv | ||
Plug in | A Novel Plug-in Module for Fine-Grained Visual Classification | Arxiv | PyTorch | |
Hierarchical concepts | Cause and Effect: Hierarchical Concept-based Explanation of Neural Networks | Arxiv | ||
Model correction by design | LEARNING ROBUST CONVOLUTIONAL NEURAL NETWORKS WITH RELEVANT FEATURE FOCUSING VIA EXPLANATIONS | Arxiv | ||
Concept discovery | Discovering Concepts in Learned Representations using Statistical Inference and Interactive Visualization | Arxiv | ||
Rare spurious correlation | Understanding Rare Spurious Correlations in Neural Networks | Arxiv | PyTorch | |
Causal | Matching Learned Causal Effects of Neural Networks with Domain Priors | Arxiv | ||
PYLON | Improved image classification explainability with high accuracy heatmaps | iScience Journal | ||
Causal counterfactual | REALISTIC COUNTERFACTUAL EXPLANATIONS BY LEARNED RELATIONS | Arxiv | ||
Argumentative Causal explanation | Forging Argumentative Explanations from Causal Models | Paper | ||
EVA | Don’t Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis | Arxiv | ||
Conceptual modelling | ConceptSuperimposition: Using Conceptual Modeling Method for Explainable AI | Paper | ||
SIDU | Visual Explanation of Black-Box Model : Similarity Difference and Uniqueness (SIDU) Method | Pattern Recognition Journal | Tensorflow 2.x | |
Explainable representations | Explaining, Evaluating and Enhancing Neural Networks’ Learned Representations | Arxiv | ||
XAI Overview | Explanatory Paradigms in Neural Networks | Arxiv | ||
Evaluating attribution methods | Evaluating Feature Attribution Methods in the Image Domain | Arxiv | PyTorch | |
Prototype vector + perturbation | The Need for Empirical Evaluation of Explanation Quality | Arxiv | ||
ADVISE | ADVISE: ADaptive Feature Relevance and VISual Explanations for Convolutional Neural Networks | Arxiv | Matlab | |
Improving Grad CAM | Improving the Interpretability of GradCAMs in Deep Classification Networks | Science Direct | ||
Explainable by design | Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks | CVPR 2022 | PyTorch | |
CAMP | Do Explanations Explain? Model Knows Best | Arxiv | PyTorch | |
Attribution stability | RETHINKING STABILITY FOR ATTRIBUTION-BASED EXPLANATIONS | Arxiv | ||
SSCCD | Sparse Subspace Clustering for Concept Discovery (SSCCD) | Arxiv ||
Model improvement | Beyond Explaining: Opportunities and Challenges of XAI-Based Model Improvement | Arxiv | ||
Causal explanations | Trying to Outrun Causality in Machine Learning: Limitations of Model Explainability Techniques for Identifying Predictive Variables | Arxiv | sklearn | |
Causal explanations | Diffusion Causal Models for Counterfactual Estimation | Arxiv | ||
Causal inference influence functions | A Free Lunch with Influence Functions? Improving Neural Network Estimates with Concepts from Semiparametric Statistics | Arxiv | PyTorch | |
Causal discovery | Causal discovery for observational sciences using supervised machine learning | Arxiv | ||
Causal DA | Causal Domain Adaptation with Copula Entropy based Conditional Independence Test | Arxiv | ||
Causal experimental design | Interventions, Where and How? Experimental Design for Causal Models at Scale | Arxiv | seems ICML format | |
Causal discovery | SCORE MATCHING ENABLES CAUSAL DISCOVERY OF NONLINEAR ADDITIVE NOISE MODELS | Arxiv | ||
Causal Explanation - Cynthia Rudin | WHY INTERPRETABLE CAUSAL INFERENCE IS IMPORTANT FOR HIGH-STAKES DECISION MAKING FOR CRITICALLY ILL PATIENTS AND HOW TO DO IT | Arxiv | ||
Semantically consistent counterfactuals | Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals | Arxiv | ||
Posthoc global hypersphere | Post-hoc Global Explanation using Hypersphere Sets | ICAART 2022 | ||
CapsNet explanation | Investigation of Capsule Networks Regarding their Potential of Explainability and Image Rankings | ICAART 2022 | ||
XAI evaluation | A Unified Study of Machine Learning Explanation Evaluation Metrics | Arxiv | ||
Concept based counterfactual explanations | DISSECT: Disentangled Simultaneous Explanations via Concept Traversals | ICLR 2022 | tensorflow 1.12 | Been Kim's group |
concept evolution | ConceptEvo: Interpreting Concept Evolution in Deep Learning Training | Arxiv | ||
Poly-CAM | Backward recursive Class Activation Map refinement for high resolution saliency map | Paper | ||
Interactive Concept explanation | ConceptExplainer: Interactive Explanation for Deep Neural Networks from a Concept Perspective | Arxiv | ||
Quasi ProtoPNet | Think positive: An interpretable neural network for image recognition | Neural Networks Journal | ||
TAM | VISUALIZING DEEP NEURAL NETWORKS WITH TOPOGRAPHIC ACTIVATION MAPS | Arxiv | ||
S-XAI | Semantic interpretation for convolutional neural networks: What makes a cat a cat? | Arxiv | ||
See through DNN | Perception Visualization: Seeing Through the Eyes of a DNN | Arxiv | ||
IOM | Understanding CNNs from excitations | Arxiv | ||
KICE | Integrating Prior Knowledge in Post-hoc Explanations | Arxiv |
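
A rough deletion-style faithfulness metric in the spirit of the saliency-map-evaluation entries above (generic PyTorch; the perturbation scheme and baseline differ between papers such as ROAD or deletion/insertion AUC): remove the most-attributed pixels step by step and track the drop in the predicted-class probability. A faithful attribution should cause a fast drop.

```python
# Generic deletion curve: blank pixels in order of decreasing attribution and
# record the predicted-class probability after each step.
import torch

def deletion_curve(model, x, attribution, target, steps=10, baseline=0.0):
    """x: (1, C, H, W) image, attribution: (H, W) importance map."""
    order = attribution.flatten().argsort(descending=True)
    probs = []
    x_pert = x.clone()
    chunk = len(order) // steps
    for i in range(steps + 1):
        with torch.no_grad():
            p = torch.softmax(model(x_pert), dim=1)[0, target].item()
        probs.append(p)
        idx = order[i * chunk:(i + 1) * chunk]
        x_pert.view(x.shape[1], -1)[:, idx] = baseline   # blank the next most-important pixels
    return probs   # the area under this curve (lower = more faithful) can be averaged over a test set

# usage sketch: deletion_curve(model, image, saliency_map, target=predicted_class)
```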