Codes for [NeurIPS'21] Chasing Sparsity in Vision Transformers: An End-to-End Exploration.
Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
Extensive results on ImageNet with diverse ViT backbones validate the effectiveness of our proposals which obtain significantly reduced computational cost and almost unimpaired generalization. Perhaps most surprisingly, we find that the proposed sparse (co-)training can even improve the ViT accuracy rather than compromising it, making sparsity a tantalizing “free lunch”. For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture), improves 0.28% top-1 accuracy, and meanwhile enjoys 49.32% FLOPs and 4.40% running time savings.
conda create -n vit python=3.6
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install tqdm scipy timm
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
pip install -v --disable-pip-version-check --no-cache-dir ./
Command for unstructured sparsity, i.e., SViTE.
- SViTE-Small
bash cmd/ vm/0426/vm1.sh 0,1,2,3,4,5,6,7
Details
CUDA_VISIBLE_DEVICES=$1 \
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env main.py \
--model deit_small_patch16_224 \
--epochs 600 \
--batch-size 64 \
--data-path ../../imagenet \
--output_dir ./small_dst_uns_0426_vm1 \
--dist_url tcp://127.0.0.1:23305 \
--sparse_init fixed_ERK \
--density 0.4 \
--update_frequency 15000 \
--growth gradient \
--death magnitude \
--redistribution none
- SViTE-Base
bash cmd/ vm/0426/vm3.sh 0,1,2,3,4,5,6,7
Details
CUDA_VISIBLE_DEVICES=$1 \
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env main.py \
--model deit_base_patch16_224 \
--epochs 600 \
--batch-size 128 \
--data-path ../../imagenet \
--output_dir ./base_dst_uns_0426_vm3 \
--dist_url tcp://127.0.0.1:23305 \
--sparse_init fixed_ERK \
--density 0.4 \
--update_frequency 7000 \
--growth gradient \
--death magnitude \
--redistribution none
Remark. More commands can be found under the "cmd" folder.
@misc{chen2021chasing,
title={Chasing Sparsity in Vision Transformers:An End-to-End Exploration},
author={Tianlong Chen and Yu Cheng and Zhe Gan and Lu Yuan and Lei Zhang and Zhangyang Wang},
year={2021},
eprint={2106.04533},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
ViT : https://github.com/jeonsworld/ViT-pytorch
ViT : https://github.com/google-research/vision_transformer