This is the official repository of Washing The Unwashable : On The (Im)possibility of Fairwashing Detection, a work published in the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS), New Orleans, Louisiana, USA, November 28 - December 9, 2022.
Clone source code from GitHub:
git clone https://github.com/cleverhans-lab/FRAUD-Detect.git
Create virtual environment (use Python 3.8):
python3 -m venv FRAUDDetect_env
source FRAUDDetect_env/bin/activate
pip3 install -r requirements.txt
The code contains:
- Training black-box models
- Fairwashing and detecting fairwashing using FRAUD-Detect
- Evading FRAUD-Detect using an informed adversary
The ./FRAUD-Detect_code/models/ directory contains several files to train black-box models (AdaBoost, DNN, RF, and XgBoost) from scratch. Their architectures and all hyper-parameters are located in the train_models.py script, while main.sh trains the models for all seeds and generates labels for both the suing and test sets, which are identical to those used in our paper.
Note: training the black-box models requires tensorflow.
Trained black-box models: all the pretrained models are provided in the ./FRAUD-Detect/FRAUD-Detect_code/models/pretrained/ directory.
FRAUD-Detect detects fairwashed models by measuring the Kullback–Leibler (KL) divergence over subpopulation-wise confusion matrices of the interpretable model.
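The detection idea above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the binary sensitive attribute, the use of black-box labels as the reference, the direction of the divergence, and the smoothing constant are all assumptions made for illustration.

```python
import numpy as np

def confusion_distribution(y_ref, y_pred, smoothing=1e-6):
    """Normalized 2x2 confusion matrix as a flat vector [TN, FP, FN, TP]."""
    y_ref = np.asarray(y_ref, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    counts = np.zeros(4, dtype=float)
    # Cell index = 2 * reference label + predicted label.
    np.add.at(counts, 2 * y_ref + y_pred, 1.0)
    counts += smoothing  # assumed smoothing to avoid log(0) on empty cells
    return counts / counts.sum()

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return float(np.sum(p * np.log(p / q)))

def subgroup_confusion_kl(y_ref, y_pred, group):
    """KL divergence between the confusion matrices of two subpopulations.

    y_ref:  reference labels (assumed here to be the black-box predictions)
    y_pred: predictions of the interpretable model
    group:  binary sensitive attribute (1 or 0), an assumption of this sketch
    """
    y_ref, y_pred, group = (np.asarray(a) for a in (y_ref, y_pred, group))
    p = confusion_distribution(y_ref[group == 1], y_pred[group == 1])
    q = confusion_distribution(y_ref[group == 0], y_pred[group == 0])
    return kl_divergence(p, q)
```

Under these assumptions, an interpretable model that behaves identically on both subpopulations yields a divergence of zero, and the value grows as its errors relative to the black-box become group-dependent.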
To observe the KL_confusion_matrix vs Demographic Parity graph for 10 seeds of a given dataset, interpretable model, and black-box model combination:
cd FRAUD-Detect_code/washing_analysis
python analysis.py --help
Rerun the last command with the requested arguments (note that epsilons are optional and given as space-separated floats).
Results will be generated in the ./sample_results/ directory.
The code currently supports:
- 3 datasets (Adult income, COMPAS, and Bank Marketing)
- 4 black-box models (DNN, AdaBoost, XgBoost, and Random Forest)
- 2 interpretable models (Decision Trees, Logistic Regression)
The generated results include:
- A plot of the KL divergence as a function of Demographic Parity for each seed, with dashed lines showing the black-box fairness;
- A plot of Demographic Parity and KL divergence as a function of fairwashing strength, with error shadings.
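For readers unfamiliar with the fairness metric used in these plots, the following is a minimal sketch of a Demographic Parity gap, assuming binary predictions and a binary sensitive attribute. The function name is hypothetical and the repository may compute the metric differently.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups.

    y_pred: binary predictions (0 or 1)
    group:  binary sensitive attribute (1 or 0), an assumption of this sketch
    """
    y_pred = np.asarray(y_pred, dtype=float)
    group = np.asarray(group)
    rate_group_1 = y_pred[group == 1].mean()
    rate_group_0 = y_pred[group == 0].mean()
    return float(abs(rate_group_1 - rate_group_0))
```

A gap of zero means both groups receive positive predictions at the same rate; fairwashing aims to make the interpretable model show a smaller gap than the black-box it explains.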
We assume an informed adversary who is aware of FRAUD-Detect and wants to evade it while performing fairwashing.
The ./FRAUD-Detect/FRAUD-Detect_code/quantifying_fairwashing/ directory contains code that explores the range of the fairness gap, given a fixed fidelity and a fixed KL divergence, by solving the informed adversary's optimization problem.
The output is a plot showing the range of the Demographic Parity gap achievable by the informed adversary seeking to evade the fairwashing detector.
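Fidelity, one of the two quantities held fixed in the adversary's problem, is commonly understood as the agreement rate between the interpretable model and the black-box. The sketch below assumes that definition; the function name is hypothetical and not taken from the repository.

```python
import numpy as np

def fidelity(y_blackbox, y_interpretable):
    """Fraction of inputs on which the interpretable model matches the black-box."""
    y_blackbox = np.asarray(y_blackbox)
    y_interpretable = np.asarray(y_interpretable)
    return float(np.mean(y_blackbox == y_interpretable))
```

Holding this value fixed limits how far the adversary can move the interpretable model's predictions away from the black-box in order to shrink the reported fairness gap.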
If you use our code, please cite the following paper:
@InProceedings{shamsabadi2022FraudDetect,
title = {Washing The Unwashable : On The (Im)possibility of Fairwashing Detection},
author = {Shahin Shamsabadi, Ali and Yaghini, Mohammad and Dullerud, Natalie and Wyllie, Sierra and Aïvodji, Ulrich and Alaagib, Aisha and Gambs, Sébastien and Papernot, Nicolas},
booktitle = {The Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS)},
year = {2022},
address = {New Orleans, Louisiana, USA},
month = {November 28 - December 9}
}