triplet_loss_dataloader.py #4
Hi Daniel, thank you very much for catching this one. The intention was only to speed up the triplet generation process, not to replicate the same generated triplets across the spawned processes, hehe. I have edited the dataloader as you described: the RandomState() object is now initialized with seed=None, so each process gets a fresh random seed and independently chooses the elements required for triplet creation.

To be clear, the current pre-trained model was trained on 10 million generated triplets that were not generated with the multi-processing method. The reason I am using this "triplet generation" method is to have some kind of naive reproducibility when changing training parameters. The intention is to conduct future experiments with a set number of human identities per triplet batch, whereby the dataloader would generate and yield a set number of triplets per training iteration instead of relying on a pre-generated list of triplets like the current version does.

However, there are two current issues I am dealing with that you should be aware of before using this project:

1. After some training "epochs", the BatchNorm2D operation requires more VRAM allocation and causes a CUDA out-of-memory exception. Since one epoch takes around 11 hours on my PC, I was training one epoch per day and shutting the process down afterwards so I could use the machine for other things. That way I managed to get the 256 batch size to work, but leaving it running for several epochs causes an OOM. I would therefore recommend using a lower batch size that initially allocates around 40-60% of your GPU VRAM.

2. I tried switching to CPU for the iterations that caused the OOM in order to continue training. Unfortunately, switching to CPU has a negative impact on the model's performance metrics, and I still don't know why that is the case.

Again, thank you very much for catching the issue.
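For reference, a minimal sketch of the per-process seeding idea described above (the function and variable names here are hypothetical, not the project's actual code): each spawned worker builds its own `np.random.RandomState(seed=None)`, so the sampled classes differ between processes.

```python
import numpy as np
from multiprocessing import Pool

def generate_triplets(args):
    """Sketch of one worker's share of the triplet generation (hypothetical)."""
    num_triplets, classes = args
    # seed=None draws a fresh seed from OS entropy inside each process,
    # so every worker samples different (positive, negative) class pairs.
    rng = np.random.RandomState(seed=None)
    triplets = []
    for _ in range(num_triplets):
        pos_class, neg_class = rng.choice(classes, size=2, replace=False)
        triplets.append((pos_class, neg_class))
    return triplets

if __name__ == "__main__":
    classes = list(range(100))  # stand-in for the dataset's identity labels
    with Pool(processes=4) as pool:
        chunks = pool.map(generate_triplets, [(250, classes)] * 4)
    all_triplets = [t for chunk in chunks for t in chunk]
```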
Hello. Thank you for answering my question. There is no problem at present. One small difference is that performance is a bit low, but I mostly use torch.cuda.empty_cache() to avoid OOM.
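A hedged sketch of the workaround mentioned above, assuming a standard PyTorch triplet training loop (the function and argument names are illustrative): `torch.cuda.empty_cache()` only returns cached, unreferenced blocks to the driver, so tensor references should be dropped first.

```python
import torch

def train_one_epoch(model, loader, optimizer, criterion, device):
    """Illustrative loop; criterion is assumed to be e.g. nn.TripletMarginLoss."""
    model.train()
    for batch in loader:
        anchors, positives, negatives = (t.to(device) for t in batch)
        optimizer.zero_grad()
        loss = criterion(model(anchors), model(positives), model(negatives))
        loss.backward()
        optimizer.step()
        del loss  # drop references before clearing the cache
    if device.type == "cuda":
        torch.cuda.empty_cache()  # release cached, unreferenced GPU memory
```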
We may work on this as well. I noticed that the triplet generation is not a very fast process; DataFrames are probably not that fast for this kind of usage.
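One possible direction for such a speed-up, shown purely as a sketch (the column names `class` and `image_path` are assumptions, not necessarily the project's schema): group the DataFrame by class once into NumPy arrays, then sample from those arrays inside the loop instead of filtering the DataFrame repeatedly.

```python
import numpy as np
import pandas as pd

def build_class_index(df: pd.DataFrame) -> dict:
    # One-time grouping: class label -> NumPy array of that class's image paths.
    return {cls: grp["image_path"].to_numpy() for cls, grp in df.groupby("class")}

def sample_triplet(class_index: dict, rng: np.random.RandomState):
    # Assumes every class has at least two images.
    classes = list(class_index.keys())
    pos_idx, neg_idx = rng.choice(len(classes), size=2, replace=False)
    anchor, positive = rng.choice(class_index[classes[pos_idx]], size=2, replace=False)
    negative = rng.choice(class_index[classes[neg_idx]])
    return anchor, positive, negative
```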
Hello, I'm Daniel.
While running your project, a question arose.
In dataloader/triplet_loss_dataloader, the triplets are generated by randomly drawing (pos, neg) classes for the number of triplets allocated to each process and then randomly selecting images.
However, when using np.random.choice, I confirmed that the same random values are produced in every process.
When I switched to np.random.RandomState(), each process produced different random values.
Please let me know whether I have understood this process correctly.
Thank you.
Daniel
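For anyone hitting the same behaviour, here is a small, hedged illustration of the symptom and the fix (a standalone example, not the project's dataloader): on platforms where multiprocessing forks workers (e.g. Linux), each child inherits the parent's global NumPy RNG state, so np.random.choice produces the same values in every process, whereas a per-process np.random.RandomState(seed=None) reseeds from fresh OS entropy.

```python
import numpy as np
from multiprocessing import Pool

def sample_global(_):
    # Forked workers share the parent's global RNG state at fork time,
    # so this first draw is typically identical in every worker.
    return np.random.choice(1000, size=3).tolist()

def sample_local(_):
    rng = np.random.RandomState(seed=None)  # fresh seed inside each process
    return rng.choice(1000, size=3).tolist()

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(sample_global, range(4)))  # often four identical triples
        print(pool.map(sample_local, range(4)))   # four different triples
```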