Code for the CVPR 2023 paper: Token Contrast for Weakly-Supervised Semantic Segmentation.
We propose Token Contrast to address the over-smoothing issue and further leverage the strengths of ViT for the weakly-supervised semantic segmentation task.
VOC dataset
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
The augmented annotations are from the SBD dataset. A download link for the augmented annotations is available at DropBox. After downloading SegmentationClassAug.zip, unzip it and move it to VOCdevkit/VOC2012. The directory structure should then be:
VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
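Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_voc_dirs` and the default root `VOCdevkit` are assumptions made for illustration.

```python
import os

# Sub-directories of VOC2012 listed in the tree above.
EXPECTED_VOC_DIRS = [
    "Annotations",
    "ImageSets",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationClassAug",
    "SegmentationObject",
]


def missing_voc_dirs(devkit_root):
    """Return the expected VOC2012 sub-directories absent under devkit_root.

    Hypothetical helper for illustration; it only checks that the
    directories exist, not what they contain.
    """
    voc_root = os.path.join(devkit_root, "VOC2012")
    return [
        name
        for name in EXPECTED_VOC_DIRS
        if not os.path.isdir(os.path.join(voc_root, name))
    ]


if __name__ == "__main__":
    # "VOCdevkit" is an assumed location; adjust it to your own path.
    missing = missing_voc_dirs("VOCdevkit")
    print("layout OK" if not missing else f"missing: {missing}")
```

An empty result means every directory from the tree was found; a missing `SegmentationClassAug` usually indicates that the augmented annotations were not unzipped into `VOCdevkit/VOC2012`.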
COCO dataset
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
To generate VOC-style segmentation labels for the COCO dataset, you can use the scripts provided at this repo, or simply download the generated masks from Google Drive.
I recommend organizing the images and labels in coco2014 and SegmentationClass, respectively.
MSCOCO/
├── coco2014
│   ├── train2014
│   └── val2014
└── SegmentationClass
    ├── train2014
    └── val2014
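As a quick sanity check of the COCO layout above, the sketch below reports how many entries each expected directory holds. It is illustrative only: the helper name `coco_layout_summary` and the default root `MSCOCO` are assumptions, and the script does not verify that images and masks actually match.

```python
import os

# Sub-directories from the tree above, relative to the MSCOCO root.
EXPECTED_COCO_DIRS = [
    "coco2014/train2014",
    "coco2014/val2014",
    "SegmentationClass/train2014",
    "SegmentationClass/val2014",
]


def coco_layout_summary(coco_root):
    """Map each expected sub-directory to its entry count, or None if missing.

    Hypothetical helper for illustration only.
    """
    summary = {}
    for rel in EXPECTED_COCO_DIRS:
        path = os.path.join(coco_root, *rel.split("/"))
        summary[rel] = len(os.listdir(path)) if os.path.isdir(path) else None
    return summary


if __name__ == "__main__":
    # "MSCOCO" is an assumed location; adjust it to your own path.
    for rel, count in coco_layout_summary("MSCOCO").items():
        print(f"{rel}: {'MISSING' if count is None else count}")
```

A `MISSING` entry points to a directory that still needs to be created or downloaded; a large gap between an image split and its mask split may suggest an incomplete download.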
I used Docker to build the environment.
## build docker
docker build -t toco --network=host - < Dockerfile
## activate docker
docker run -it --gpus all --network=host --ipc=host -v $CODE_PATH:/workspace/TOCO -v $VOC_PATH:/workspace/VOCdevkit -v $COCO_ANNO_PATH:/workspace/MSCOCO -v $COCO_IMG_PATH:/workspace/coco2014 toco:latest /bin/bash
git clone https://github.com/rulixiang/toco.git
cd toco
To use the regularized loss, download and compile the Python extension; see Here.
To start training, just run:
## for VOC
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=29501 scripts/dist_train_voc_seg_neg.py --work_dir work_dir_voc
## for COCO
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=29501 scripts/dist_train_coco_seg_neg.py --work_dir work_dir_coco
To evaluate:
## for VOC
python tools/infer_seg_voc.py --model_path $model_path --backbone vit_base_patch16_224 --infer val
## for COCO
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=29501 tools/infer_seg_voc.py --model_path $model_path --backbone vit_base_patch16_224 --infer val
Here we report the performance on the VOC and COCO datasets. MS+CRF denotes multi-scale testing and CRF post-processing.
Dataset | Backbone | val | Log | Weights | val (with MS+CRF) | test (with MS+CRF) |
---|---|---|---|---|---|---|
VOC | DeiT-B | 68.1 | log | weights | 69.8 | 70.5 |
VOC | ViT-B | 69.2 | log | weights | 71.1 | 72.2 |
COCO | DeiT-B | -- | log | weights | 41.3 | -- |
COCO | ViT-B | -- | log | weights | 42.2 | -- |
Please kindly cite our paper if you find it helpful in your work.
@inproceedings{ru2023token,
title = {Token Contrast for Weakly-Supervised Semantic Segmentation},
author = {Lixiang Ru and Heliang Zheng and Yibing Zhan and Bo Du},
booktitle = {CVPR},
year = {2023},
}
We mainly use ViT-B and DeiT-B as backbones, both of which are based on timm. We also use the Regularized Loss. Many thanks to the authors for their brilliant work!