PyTorch implementation for BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency (CVPR 2023).
If you have any questions, feel free to contact [email protected]
- Python 3.7
- PyTorch ~1.7.1
- numpy
- scikit-learn
- Punkt Sentence Tokenizer:
import nltk
nltk.download()
> d punkt
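For headless or scripted setups, the interactive downloader prompt can be skipped. The following is a minimal sketch, not part of this repository: the helper name `ensure_punkt` is hypothetical, and it assumes the Punkt models are registered under the NLTK resource path `tokenizers/punkt` and the package identifier `punkt`.

```python
def ensure_punkt(resource="tokenizers/punkt", package="punkt"):
    """Return True once the Punkt models are available locally.

    Hypothetical helper (not part of the BiCro code base). It downloads
    the package only when NLTK cannot already find the resource.
    """
    import nltk  # imported lazily so the helper can be defined without NLTK loaded

    try:
        # Raises LookupError when the resource is not installed.
        nltk.data.find(resource)
    except LookupError:
        # Non-interactive download; returns True on success.
        return bool(nltk.download(package, quiet=True))
    return True
```

Calling `ensure_punkt()` once before training should have the same effect as entering `d punkt` in the interactive downloader.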
We follow SCAN to obtain image features and vocabularies.
We use a subset of Conceptual Captions (CC), named CC152K. CC152K contains 150,000 training samples from the CC training split, plus 1,000 validation samples and 1,000 testing samples from the CC validation split. We follow the pre-processing steps in SCAN to obtain the image features and vocabularies.
Modify the necessary parameters in the training script for your dataset, then run it.
For Flickr30K:
sh train_f30k.sh
For MSCOCO:
sh train_coco.sh
For CC152K:
sh train_cc152k.sh
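The three commands above differ only in the script name. If you prefer to launch training from Python, a minimal sketch might look like the following. It assumes the scripts are in the current working directory; the `TRAIN_SCRIPTS` mapping and the `train_command` and `run_training` helpers are hypothetical names introduced here for illustration and are not part of this repository.

```python
import subprocess

# Dataset key -> training script named in this README.
TRAIN_SCRIPTS = {
    "f30k": "train_f30k.sh",
    "coco": "train_coco.sh",
    "cc152k": "train_cc152k.sh",
}


def train_command(dataset):
    """Build the shell command for a dataset key (case-insensitive)."""
    try:
        script = TRAIN_SCRIPTS[dataset.lower()]
    except KeyError:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(TRAIN_SCRIPTS)}"
        ) from None
    return ["sh", script]


def run_training(dataset):
    """Run the training script and raise if it exits with a non-zero status."""
    subprocess.run(train_command(dataset), check=True)
```

For example, `run_training("f30k")` is equivalent to running `sh train_f30k.sh` by hand.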
The pre-trained models are available here:
F30K 20% noise model Download
F30K 40% noise model Download
F30K 60% noise model Download
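The percentages above presumably denote the fraction of training image-text pairs whose correspondence has been corrupted. As a rough illustration of how such synthetic noise is commonly simulated, the sketch below reassigns captions among a randomly chosen fraction of pairs. This is an assumption about the general technique, not code taken from this repository's data loader; the function name `inject_noise` and its parameters are hypothetical.

```python
import numpy as np


def inject_noise(num_pairs, noise_ratio, seed=0):
    """Return a caption index per image with a fraction of pairs shuffled.

    Hypothetical illustration: entry i is the index of the caption
    assigned to image i. Pairs outside the chosen subset stay matched.
    """
    rng = np.random.default_rng(seed)
    caption_index = np.arange(num_pairs)

    # Pick the subset of pairs to corrupt.
    num_noisy = int(noise_ratio * num_pairs)
    chosen = rng.choice(num_pairs, size=num_noisy, replace=False)

    # Shuffle captions among the chosen pairs only. A few pairs may map
    # back to themselves, so the effective noise can be slightly lower.
    caption_index[chosen] = rng.permutation(chosen)
    return caption_index
```

With `noise_ratio=0.2`, at most 20% of the entries differ from the clean identity mapping, and every caption is still used exactly once.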
If BiCro is useful for your research, please cite the following paper:
@inproceedings{BiCro2023,
author = {Shuo Yang and Zhaopan Xu and Kai Wang and Yang You and Hongxun Yao and Tongliang Liu and Min Xu},
title = {BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency},
year = {2023},
booktitle = {CVPR},
}
The code is based on NCR, SGRAF, and SCAN licensed under Apache 2.0.