An implementation of BYOL with DistributedDataParallel (1 GPU : 1 process) in PyTorch.
This allows scaling to arbitrarily large batch sizes given enough GPUs; as an example, a batch size of 4096 is possible using 64 GPUs, each with a batch size of 64 at a resolution of 224x224x3 in FP32 (see below for FP16 support).
NOTE0: a small-batch run such as the single-GPU example below will not produce SOTA results, but is good for debugging. The authors use a batch size of 4096+ for SOTA.
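As a rough sketch of the batch-size arithmetic, assuming (as the two-node example further down suggests, where --batch-size=128 divides into 64 per node) that --batch-size is the global batch split evenly across --num-replicas processes. The helper name per_replica_batch_size is hypothetical and not part of main.py:

```python
def per_replica_batch_size(global_batch_size: int, num_replicas: int) -> int:
    """Split a global batch evenly across DDP replicas (one GPU per process).

    Hypothetical helper: assumes --batch-size is the *global* batch size.
    """
    if num_replicas < 1:
        raise ValueError("num_replicas must be >= 1")
    if global_batch_size % num_replicas != 0:
        raise ValueError(
            f"batch size {global_batch_size} is not divisible by {num_replicas} replicas"
        )
    return global_batch_size // num_replicas


# 64 GPUs x 64 samples each gives the 4096-sample global batch mentioned above.
print(per_replica_batch_size(4096, 64))  # 64
# The two-node example: --batch-size=128 with --num-replicas=2.
print(per_replica_batch_size(128, 2))  # 64
```

Requiring exact divisibility keeps every replica's batch identical, which avoids silently dropping or duplicating samples when the global batch is sharded.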
NOTE1: Set up your GitHub SSH keys first; if you get an authentication error from the git clone, this is most likely the cause.
> git clone --recursive git+ssh://git@github.com/jramapuram/BYOL.git
# DATADIR is the location of ImageNet or any dataset that works with ImageFolder.
> ./docker/run.sh "python main.py --data-dir=$DATADIR \
--batch-size=64 \
--num-replicas=1 \
--epochs=100" 0 # add --debug-step to do a single minibatch
The bash script docker/run.sh pulls the appropriate Docker container.
If you want to set up your own environment, use environment.yml (conda) in addition to requirements.txt (pip), or just take a look at docker/Dockerfile.
Configure the SLURM bash script (slurm/run.sh) for your cluster. Then:
> cd slurm && sbatch run.sh
- Start each replica worker pointing to the master using --distributed-master=
- Set the total number of replicas appropriately using --num-replicas=
- Give each node a unique --distributed-rank= in the range [0, num_replicas).
- Ensure network connectivity between workers. You will get NCCL errors if there are hostname-resolution problems here.
- Profit.
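The flags in the list above correspond naturally to the standard arguments of PyTorch's torch.distributed.init_process_group (backend, init_method, world_size, rank). The sketch below shows one plausible mapping and the rank-range check; it is an assumption about how main.py wires things up rather than a copy of its code, and the helper name ddp_init_kwargs is hypothetical:

```python
from typing import Any, Dict


def ddp_init_kwargs(master: str, port: int, rank: int, num_replicas: int) -> Dict[str, Any]:
    """Build keyword arguments for torch.distributed.init_process_group.

    Hypothetical helper mirroring --distributed-master, --distributed-port,
    --distributed-rank and --num-replicas.
    """
    if num_replicas < 1:
        raise ValueError("num_replicas must be >= 1")
    # Every worker needs a unique rank in [0, num_replicas); rank 0 is the master.
    if not 0 <= rank < num_replicas:
        raise ValueError(f"rank {rank} must lie in [0, {num_replicas})")
    return {
        "backend": "nccl",  # NCCL is the usual backend for multi-GPU training
        "init_method": f"tcp://{master}:{port}",  # all workers rendezvous at the master
        "world_size": num_replicas,
        "rank": rank,
    }


kwargs = ddp_init_kwargs("127.0.0.1", 29301, rank=0, num_replicas=2)
print(kwargs["init_method"])  # tcp://127.0.0.1:29301
# With PyTorch installed, each worker would then call:
#   torch.distributed.init_process_group(**kwargs)
```

If a worker cannot resolve or reach the master address in init_method, the rendezvous fails, which is where the NCCL and connection errors mentioned above usually surface.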
For example, with a 2-node setup, run the following on the master node:
# --batch-size is the global batch size and divides into 64 per node.
# --num-replicas is the total number of nodes (2 in this example); rank 0 is the master.
# --visdom-url/--visdom-port are optional; omitting them uses tensorboard.
python main.py \
  --epochs=100 \
  --data-dir=<YOUR_DATA_DIR> \
  --batch-size=128 \
  --convert-to-sync-bn \
  --visdom-url=http://MY_VISDOM_URL \
  --visdom-port=8097 \
  --num-replicas=2 \
  --distributed-master=127.0.0.1 \
  --distributed-port=29301 \
  --distributed-rank=0 \
  --uid=byolv00_0
and the following on the child node:
export MASTER=<IP_ADDR_OF_MASTER_ABOVE>
# Same flags as on the master; only --distributed-master and --distributed-rank change.
# Rank 1 is this child node; increment --distributed-rank for each additional node.
python main.py \
  --epochs=100 \
  --data-dir=<YOUR_DATA_DIR> \
  --batch-size=128 \
  --convert-to-sync-bn \
  --visdom-url=http://MY_VISDOM_URL \
  --visdom-port=8097 \
  --num-replicas=2 \
  --distributed-master=$MASTER \
  --distributed-port=29301 \
  --distributed-rank=1 \
  --uid=byolv00_0
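Because the master and child commands above differ only in --distributed-master and --distributed-rank, they can be generated programmatically, e.g. to paste into each node or to feed your own launcher. A minimal sketch; the function names, the placeholder data path and the example master IP are hypothetical, and the optional visdom flags are left out:

```python
import shlex
from typing import List


def launch_args(rank: int, master: str, num_replicas: int,
                data_dir: str = "/path/to/imagenet",  # hypothetical placeholder
                batch_size: int = 128, port: int = 29301,
                uid: str = "byolv00_0") -> List[str]:
    """argv for one node; only --distributed-master and --distributed-rank vary."""
    return [
        "python", "main.py",
        "--epochs=100",
        f"--data-dir={data_dir}",
        f"--batch-size={batch_size}",
        "--convert-to-sync-bn",
        f"--num-replicas={num_replicas}",
        f"--distributed-master={master}",
        f"--distributed-port={port}",
        f"--distributed-rank={rank}",
        f"--uid={uid}",
    ]


def all_launch_commands(master_ip: str, num_replicas: int) -> List[str]:
    """One shell-quoted command per node, indexed by rank."""
    commands = []
    for rank in range(num_replicas):
        # As in the example above, the master (rank 0) points at itself via
        # loopback, while every child points at the master's reachable IP.
        master = "127.0.0.1" if rank == 0 else master_ip
        commands.append(shlex.join(launch_args(rank, master, num_replicas)))
    return commands


for cmd in all_launch_commands("10.0.0.1", num_replicas=2):  # hypothetical master IP
    print(cmd)
```

Using shlex.join (Python 3.8+) quotes any argument that needs it, so the printed lines can be pasted into a shell as-is.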
Grab ImageNet, do the standard preprocessing and use --data-dir=$DATADIR. Note: this implementation expects two PyTorch ImageFolder locations, train and test, as opposed to the val directory produced by the standard ImageNet preprocessing.
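torchvision's ImageFolder expects one sub-directory per class under each split. The following stdlib-only sketch (the function name check_imagefolder_splits is hypothetical) checks that a data directory contains the train and test splits described above, and points out a leftover val directory that needs renaming:

```python
from pathlib import Path
from typing import Dict, List, Sequence


def check_imagefolder_splits(data_dir: str,
                             splits: Sequence[str] = ("train", "test")) -> Dict[str, List[str]]:
    """Check that data_dir/<split>/<class>/ exists for every split; return class names."""
    root = Path(data_dir)
    found: Dict[str, List[str]] = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            hint = ""
            if split == "test" and (root / "val").is_dir():
                hint = " (found 'val': rename or symlink it to 'test')"
            raise FileNotFoundError(f"missing split directory {split_dir}{hint}")
        # ImageFolder treats each immediate sub-directory as one class.
        found[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not found[split]:
            raise FileNotFoundError(f"{split_dir} contains no class sub-directories")
    return found
```

Calling check_imagefolder_splits("/path/to/imagenet") (a placeholder path) before launching a long multi-node job fails fast on a misnamed split instead of inside the dataloader.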
If you have GPUs that work well with FP16, you can try the --half flag.
This allows faster training with larger batch sizes (~95 per GPU with 12 GB of GPU memory).
If training doesn't work well, try changing the AMP optimization level.
If dataloading is a bottleneck, try increasing --workers-per-replica or placing your dataset on a drive with higher IOPS.
Optionally, you can also try the NVIDIA DALI image-loading backend by specifying --task=dali_multi_augment_image_folder. However, the DALI backend is missing the grayscale and Gaussian blur augmentations, so model performance might be degraded.
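When picking a starting value for --workers-per-replica, a common rule of thumb (not something this repository prescribes) is to share a node's CPU cores among the replicas running on it; with 1 GPU : 1 process, that is the number of GPUs per node. A minimal sketch; the helper name is hypothetical:

```python
import os
from typing import Optional


def suggest_workers_per_replica(replicas_per_node: int,
                                cpu_cores: Optional[int] = None) -> int:
    """Rule-of-thumb starting value: share this node's CPU cores among its replicas."""
    if replicas_per_node < 1:
        raise ValueError("replicas_per_node must be >= 1")
    # os.cpu_count() can return None, so fall back to a single core.
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    # Every process gets at least one loader worker.
    return max(cores // replicas_per_node, 1)


print(suggest_workers_per_replica(replicas_per_node=8, cpu_cores=64))  # 8
```

Treat the result only as a starting point: if GPU utilization stays low, the bottleneck is more likely disk IOPS or decoding than the worker count.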
This implementation supports tensorboard and visdom.
Omitting the --visdom-url and --visdom-port args defaults to tensorboard (which stores its logs in ./runs).
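The logger selection described above can be sketched with argparse. This is a hypothetical mirror of the documented behavior, not the repository's code, and it assumes visdom is used only when both flags are given:

```python
import argparse
from typing import List, Optional


def pick_logger(argv: Optional[List[str]] = None) -> str:
    """Hypothetical: visdom when both --visdom-url and --visdom-port are set, else tensorboard."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--visdom-url", type=str, default=None)
    parser.add_argument("--visdom-port", type=int, default=None)
    # Ignore the many other main.py flags; only the two logging flags matter here.
    args, _ = parser.parse_known_args(argv)
    if args.visdom_url is not None and args.visdom_port is not None:
        return f"visdom at {args.visdom_url}:{args.visdom_port}"
    # Documented fallback: tensorboard logs under ./runs
    return "tensorboard (./runs)"


print(pick_logger([]))  # tensorboard (./runs)
print(pick_logger(["--visdom-url=http://localhost", "--visdom-port=8097"]))  # visdom at http://localhost:8097
```

With the tensorboard fallback, the logs can be viewed with `tensorboard --logdir runs`.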
Cite the original authors for their great work:
@article{DBLP:journals/corr/abs-2006-07733,
author = {Jean{-}Bastien Grill and
Florian Strub and
Florent Altch{\'{e}} and
Corentin Tallec and
Pierre H. Richemond and
Elena Buchatskaya and
Carl Doersch and
Bernardo {\'{A}}vila Pires and
Zhaohan Daniel Guo and
Mohammad Gheshlaghi Azar and
Bilal Piot and
Koray Kavukcuoglu and
R{\'{e}}mi Munos and
Michal Valko},
title = {Bootstrap Your Own Latent: {A} New Approach to Self-Supervised Learning},
journal = {CoRR},
volume = {abs/2006.07733},
year = {2020},
url = {https://arxiv.org/abs/2006.07733},
archivePrefix = {arXiv},
eprint = {2006.07733},
timestamp = {Wed, 17 Jun 2020 14:28:54 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2006-07733.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}