This repository contains tools for applying information bottleneck (IB) analysis to neural networks, specifically exact IB analysis in quantized neural networks. It also contains the experiments for the ICLR 2022 paper [1].
The study applies an exact version of the IB analysis for neural networks as investigated and discussed in [2,3].
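Because a quantized network's activations can take only finitely many values, the mutual information between the input and a layer can be computed exactly by counting rather than estimated. The following is a minimal sketch that assumes the input X is uniformly distributed over N training samples and the layer output T is a deterministic function of X. Under those assumptions H(T|X) = 0, so I(X;T) = H(T). The function names are hypothetical and this is not the code in `IB/util/estimator.py`:

```python
import numpy as np


def entropy_bits(labels: np.ndarray) -> float:
    """Shannon entropy (in bits) of the empirical distribution of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def rows_to_ids(t: np.ndarray) -> np.ndarray:
    """Map each distinct activation pattern (row of t) to an integer id."""
    _, ids = np.unique(t, axis=0, return_inverse=True)
    return ids.ravel()  # ravel: the inverse's shape differs across NumPy versions


def exact_mi(t: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Return (I(X;T), I(T;Y)) in bits for a deterministic, discrete representation.

    Assumes X is uniform over the rows and t[i] = f(x_i), so H(T|X) = 0 and
    I(X;T) = H(T). I(T;Y) is computed as H(T) + H(Y) - H(T,Y).
    """
    t_ids = rows_to_ids(t)
    y = np.asarray(y).ravel()
    h_t = entropy_bits(t_ids)
    h_y = entropy_bits(y)
    h_ty = entropy_bits(rows_to_ids(np.column_stack([t_ids, y])))
    return h_t, h_t + h_y - h_ty
```

For example, `exact_mi(np.array([[0, 1], [0, 1], [1, 0], [1, 1]]), np.array([0, 0, 1, 1]))` gives `(1.5, 1.0)`. There are three distinct activation patterns with probabilities 1/2, 1/4 and 1/4, and each pattern fully determines the label.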
Figure: Information plane for the `Bottleneck-2` network applied to MNIST.

The code has been tested using (Arch) Linux, but runs on Windows with only a few modifications. The following Python libraries are required:
- `numpy`
- `pandas`
- `tensorflow`
- `tensorflow_model_optimization`
Furthermore, `lualatex` is required for making the plots.
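Before starting long experiment runs, it can be worth checking that the requirements can be found. This is a minimal sketch that uses only the standard library. The helper name `missing_requirements` is hypothetical and not part of the repository:

```python
import importlib.util
import shutil

# Python modules listed above (tensorflow_model_optimization is the import
# name of the tensorflow-model-optimization package).
REQUIRED_MODULES = ["numpy", "pandas", "tensorflow", "tensorflow_model_optimization"]


def missing_requirements() -> list[str]:
    """Return the names of required modules/programs that cannot be found."""
    missing = [m for m in REQUIRED_MODULES if importlib.util.find_spec(m) is None]
    # lualatex is only needed for the plotting targets.
    if shutil.which("lualatex") is None:
        missing.append("lualatex")
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    print("Missing: " + ", ".join(missing) if missing else "All requirements found.")
```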
The important directories and files in the repository are:
- `IB`: Module containing the code for models, training procedure and computation of mutual information (MI).
- `IB/experiment.py`: Code for the training procedure.
- `IB/util/estimator.py`: Code for MI estimation/computation.
- `IB/models/models.py`: Networks analysed in the experiments.
- `IB/data`: Contains code for loading MNIST [4] and the synthetic data set of [2].
- `experiment`: Directory with the experiments contained in the paper [1].
- `experiment/Makefile`: Linux Makefile for running the experiments and making plots (see below).
- `experiment/helpers/{binning,quantize}.py`: Helper Python scripts for running binning and quantization experiments.
A Makefile is supplied in the `experiment` directory for running the experiments of [1] and creating plots.
Before running any experiments, `cd` to `experiment` and run
make link
make data
This will link the `IB` module and collect the MNIST data set [4].
To run the experiments contained in the main body of [1], use:
make quantize
To run the experiments from the appendix of [1], use:
make binning # Non-quantized/Binning experiments, Appendix A
make bit-width # 4-/32-bit quantization experiments, Appendix C
make quantize-prefit # Quantization with randomized prefitting, Appendix D
make quantize-archs # Quantization of different MNIST architectures, Appendix E
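In the non-quantized (binning) experiments, continuous activations must first be discretized before counting. The result is therefore an estimate that depends on the chosen binning. The following is a minimal sketch of a binning estimate of I(X;T). It assumes equal-width bins over a fixed range such as the tanh range [-1, 1]. The bin count, the range and the function names are illustrative assumptions, not the settings or code used in the repository:

```python
import numpy as np


def entropy_bits(ids: np.ndarray) -> float:
    """Shannon entropy (bits) of the empirical distribution of integer ids."""
    _, counts = np.unique(ids, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def bin_activations(t: np.ndarray, n_bins: int = 30,
                    lo: float = -1.0, hi: float = 1.0) -> np.ndarray:
    """Map continuous activations to bin indices 0..n_bins-1 (equal-width bins on [lo, hi]).

    Values outside [lo, hi] are clipped into the first/last bin.
    """
    edges = np.linspace(lo, hi, n_bins + 1)
    # Digitizing against the interior edges yields indices 0..n_bins-1.
    return np.digitize(np.clip(t, lo, hi), edges[1:-1])


def binned_mi_xt(t: np.ndarray, n_bins: int = 30) -> float:
    """Binning estimate of I(X;T) = H(binned T), assuming X uniform over the rows."""
    binned = bin_activations(t, n_bins)
    _, ids = np.unique(binned, axis=0, return_inverse=True)
    return entropy_bits(ids.ravel())
```

With `t = np.array([[-0.8], [-0.3], [0.3], [0.8]])`, `binned_mi_xt(t, n_bins=4)` gives 2.0 bits but `binned_mi_xt(t, n_bins=2)` gives 1.0 bit. The estimate changes with the bin count, and this dependence is what the exact analysis of quantized networks avoids.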
Please note that the experiments above may run for a long time and require a significant amount of memory. Most experiments run for many iterations and compute/estimate MI between large quantities. Time per iteration ranges from minutes (synthetic data) to more than 12 hours (convolutional MNIST network).
To create the plots from [1] (corresponding to the above experiments), use:
make plot-quantize # Main body and Appendix B
make plot-binning # Appendix A
make plot-bit-width # Appendix C
make plot-prefit # Appendix D
make plot-archs # Appendix E
make plot-accuracy # Accuracy plots, Appendix F
The plots are made using `lualatex` and will take a few minutes to compile, as many data points are plotted for each information plane.
Other experiments can be run using the module directly (note that some options are left out here; use `--help` for the full list):
python -m IB experiment [-h] [-n NETWORK] [-af ACT_FUNC] [-lr LR] [-e EPOCHS] [-q]
[-b BITS] [-d DATA] [-r REPEATS]
options:
-n NETWORK Network to use.
-af ACT_FUNC Activation function.
-lr LR Learning rate used in training.
-e EPOCHS Number of epochs.
-q Quantize the model (changes default binning strategy!).
-b BITS Number of bits for quantization, if -q set, must be in (4,8).
-d DATA Data for experiment
-r REPEATS Number of experiment repeats
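When launching many runs from a script, the command line can be assembled programmatically. This is a minimal sketch that uses only the options documented above. The function `ib_experiment_cmd` is hypothetical. The valid values for `-n`, `-af` and `-d` are not listed here, so check `--help`:

```python
import sys


def ib_experiment_cmd(network=None, act_func=None, lr=None, epochs=None,
                      quantize=False, bits=None, data=None, repeats=None) -> list[str]:
    """Build an argument list for `python -m IB experiment` from the documented options.

    Options left as None are omitted, so the module's own defaults apply.
    """
    cmd = [sys.executable, "-m", "IB", "experiment"]
    options = [("-n", network), ("-af", act_func), ("-lr", lr),
               ("-e", epochs), ("-d", data), ("-r", repeats)]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    if quantize:
        # Per the option list: with -q set, the bit-width must be 4 or 8.
        if bits is not None and bits not in (4, 8):
            raise ValueError("with -q, bits must be 4 or 8")
        cmd.append("-q")
    if bits is not None:
        cmd += ["-b", str(bits)]
    return cmd
```

The list can be passed to `subprocess.run(ib_experiment_cmd(epochs=10, quantize=True, bits=8), check=True)` from a directory where the `IB` module is importable. Passing a list rather than a shell string avoids quoting issues.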
Alternatively, one may use the Python helper scripts in `experiment/helpers`:
python helpers/binning.py <experiment> [prefit=0] [repeats=50]
python helpers/quantize.py <experiment> <bits> [prefit=0] [repeats=50]
where `experiment` is one of `SYN-{Tanh,Relu}` or `MNIST-{4x10,Conv,Bottleneck-2,Bottleneck-4,HourGlass}`, `bits` is the precision of the quantization (4, 8 or 32), and `prefit` is the number of randomized prefitting epochs (0 means no prefitting).
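The bit-width bounds how many distinct values each activation can take. With k bits there are at most 2^k levels per unit, so a layer of d units has at most 2^(k·d) possible states. The following is a minimal sketch of uniform k-bit quantization over a fixed range. It assumes a simple affine min/max scheme. The repository's quantization relies on `tensorflow_model_optimization` and may choose and round the range differently. The function name is hypothetical:

```python
import numpy as np


def quantize_uniform(x: np.ndarray, bits: int, lo: float, hi: float) -> np.ndarray:
    """Round x onto 2**bits evenly spaced levels spanning [lo, hi].

    Values outside [lo, hi] are clipped first. The de-quantized values (floats
    lying on the grid) are returned, as in simulated ("fake") quantization.
    """
    steps = 2 ** bits - 1                  # gaps between the 2**bits levels
    scale = (hi - lo) / steps
    codes = np.round((np.clip(x, lo, hi) - lo) / scale)  # integer codes 0..steps
    return lo + codes * scale
```

Lower bit-widths give fewer levels and hence fewer distinct activation patterns. For example, 4 bits map any input in [-1, 1] onto 16 values.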
Please cite our paper if you find the code useful in your work/publication:
@inproceedings{lorenzen2022information,
  title={Information Bottleneck: Exact Analysis of (Quantized) Neural Networks},
  author={Stephan Sloth Lorenzen and Christian Igel and Mads Nielsen},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=kF9DZQQrU0w}
}
[3] Andrew M. Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, & Brendan D. Tracey. On the information bottleneck theory of deep learning. ICLR (2018).
[4] Li Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine (2021).