triplet_loss_dataloader.py #4

Open
YoonSeongGyeol opened this issue Oct 27, 2020 · 3 comments


@YoonSeongGyeol

Hello, I'm Daniel.
While running your project, a question came up.

In dataloader/triplet_loss_dataloader, the total number of triplets is divided among the processes; each process randomly picks (positive, negative) classes and then randomly selects images from them.
However, when using np.random.choice, I confirmed that every process outputs the same random values.
So I used np.random.RandomState() instead, and each process then produced different random values.
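The behaviour described above can be reproduced without spawning any processes. This is a minimal sketch, assuming the workers are started with the `fork` start method, under which every child inherits an exact copy of the parent's global NumPy RNG state (the hidden state that `np.random.choice` uses); `copy.deepcopy` stands in for that copy here. The variable names are illustrative only.

```python
import copy

import numpy as np

# Parent process: a single generator, playing the role of the hidden
# global RandomState behind np.random.choice.
parent_rng = np.random.RandomState()

# Under "fork", each child receives an identical copy of that state;
# deepcopy models the copy inside one process.
child_a = copy.deepcopy(parent_rng)
child_b = copy.deepcopy(parent_rng)
same_a = child_a.choice(1000, size=5, replace=False)
same_b = child_b.choice(1000, size=5, replace=False)

# The fix: each worker builds its own RandomState(seed=None), which is
# seeded from fresh OS entropy (or the clock) at creation time.
fresh_a = np.random.RandomState(seed=None).choice(1000, size=5, replace=False)
fresh_b = np.random.RandomState(seed=None).choice(1000, size=5, replace=False)

print("copied state:", same_a, same_b)   # identical arrays
print("fresh state: ", fresh_a, fresh_b)  # almost surely different
```

With other start methods (such as `spawn`) the children re-import NumPy and may not share state, so the duplication is most visible with `fork`.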

Please let me know whether I understand this process correctly.

Thank you.
Daniel

@tamerthamoqa
Owner

tamerthamoqa commented Oct 27, 2020

Hi Daniel,

Thank you very much for catching this one. The intention was only to speed up the triplet generation process, not to replicate the same generated triplets across the spawned processes, hehe. I have edited the dataloader as you described: the RandomState() object is now initialized with seed=None, so each process seeds its generator from fresh entropy and independently chooses the required elements for triplet creation.
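The per-process pattern described above can be sketched as follows. The names `split_work` and `generate_triplets_worker`, and the `{class_id: [image_paths]}` input format, are hypothetical and not taken from the project's `triplet_loss_dataloader.py`. In real use each share would be handed to a `multiprocessing.Pool`; here the workers are called one after another so the example stays self-contained.

```python
import numpy as np


def split_work(num_triplets, num_workers):
    """Divide num_triplets as evenly as possible across num_workers."""
    base, extra = divmod(num_triplets, num_workers)
    return [base + (1 if i < extra else 0) for i in range(num_workers)]


def generate_triplets_worker(num_triplets, face_classes):
    """Hypothetical worker returning (anchor, positive, negative, pos_class, neg_class)."""
    # Each worker creates its own generator; seed=None seeds it from fresh
    # entropy instead of reusing a state inherited from the parent process.
    rng = np.random.RandomState(seed=None)
    all_classes = list(face_classes)  # assumes at least two classes exist
    # Only classes with two or more images can supply an anchor/positive pair.
    pos_candidates = [c for c in all_classes if len(face_classes[c]) >= 2]
    triplets = []
    for _ in range(num_triplets):
        pos_class = pos_candidates[rng.randint(len(pos_candidates))]
        neg_class = pos_class
        while neg_class == pos_class:
            neg_class = all_classes[rng.randint(len(all_classes))]
        a, p = rng.choice(len(face_classes[pos_class]), size=2, replace=False)
        n = rng.randint(len(face_classes[neg_class]))
        triplets.append((
            face_classes[pos_class][a],
            face_classes[pos_class][p],
            face_classes[neg_class][n],
            pos_class,
            neg_class,
        ))
    return triplets


# Toy data; "id_2" has one image, so it can only serve as a negative.
face_classes = {
    "id_0": ["a0.jpg", "a1.jpg"],
    "id_1": ["b0.jpg", "b1.jpg", "b2.jpg"],
    "id_2": ["c0.jpg"],
}
shares = split_work(10, 4)  # [3, 3, 2, 2]
results = [generate_triplets_worker(n, face_classes) for n in shares]
```

Creating the generator inside the worker, rather than passing one in from the parent, is what keeps the processes from sharing a state.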

To be clear, the current pre-trained model was trained on 10 million generated triplets that were not generated with the multi-processing method.

The reason I am using the "triplet generation" method is to have some naive reproducibility when changing training parameters. The intention is to conduct future experiments with a set number of human identities per triplet batch, where the dataloader would generate and yield a set number of triplets per training iteration instead of relying on a pre-generated list of triplets as the current version does.
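The planned on-the-fly dataloader could look roughly like the generator below. This is a hypothetical sketch: `triplet_batches` and its parameters are illustrative names, and it assumes every identity has at least two images and that at least two identities are drawn per batch. A fixed `seed` gives the naive reproducibility mentioned above.

```python
import numpy as np


def triplet_batches(face_classes, num_identities, triplets_per_batch, num_batches, seed=None):
    """Hypothetical generator: each batch draws triplets from only
    `num_identities` randomly chosen identities.

    Assumes num_identities >= 2 and that every identity has >= 2 images.
    """
    rng = np.random.RandomState(seed)
    class_ids = sorted(face_classes)  # fixed order so a seed maps to the same batches
    for _ in range(num_batches):
        # Choose the identities this batch is restricted to.
        chosen = rng.choice(len(class_ids), size=num_identities, replace=False)
        batch_classes = [class_ids[i] for i in chosen]
        batch = []
        for _ in range(triplets_per_batch):
            pos_class = batch_classes[rng.randint(len(batch_classes))]
            negatives = [c for c in batch_classes if c != pos_class]
            neg_class = negatives[rng.randint(len(negatives))]
            images = face_classes[pos_class]
            a, p = rng.choice(len(images), size=2, replace=False)
            n = rng.randint(len(face_classes[neg_class]))
            batch.append((images[a], images[p], face_classes[neg_class][n]))
        yield batch


# Toy data: 10 identities with 3 images each.
face_classes = {f"id_{i}": [f"id_{i}/{j}.jpg" for j in range(3)] for i in range(10)}
batches = list(triplet_batches(face_classes, num_identities=4,
                               triplets_per_batch=6, num_batches=3, seed=0))
```

Because triplets are produced per iteration, nothing needs to be pre-generated or stored between epochs.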

However, there are two current issues I am dealing with that you should be aware of before using this project:

1- After some training "epochs", the BatchNorm2d operation would require more VRAM and cause a CUDA out-of-memory (OOM) exception. One epoch took around 11 hours on my PC, so I trained one epoch per day and stopped the process when it finished so I could use my PC for other things. That way I somehow managed to get training with a batch size of 256 to work, but it would cause an OOM if left running for several epochs. So I would recommend using a lower batch size that initially allocates around 40-60% of your GPU VRAM.
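The 40-60% rule of thumb can be turned into a small helper. This is a sketch under a loudly stated assumption: memory use is modelled as a fixed overhead plus a linear per-sample cost, both of which you would measure yourself (for example by reading `torch.cuda.max_memory_allocated()` after a few iterations at two different batch sizes). The function name and the numbers in the example are hypothetical.

```python
def pick_batch_size(total_vram_gb, overhead_gb, per_sample_gb,
                    candidates=(256, 192, 128, 96, 64, 32), target_fraction=0.6):
    """Return the largest candidate batch size whose estimated initial VRAM use
    stays at or below target_fraction of the card (linear memory model assumed)."""
    for batch_size in sorted(candidates, reverse=True):
        estimated_gb = overhead_gb + per_sample_gb * batch_size
        if estimated_gb <= target_fraction * total_vram_gb:
            return batch_size
    return None  # even the smallest candidate exceeds the target


# Hypothetical measurements for a 12 GB card.
print(pick_batch_size(total_vram_gb=12, overhead_gb=1.0, per_sample_gb=0.04))  # 128
```

Aiming at the upper end of the range leaves headroom for the later growth in allocation that the comment describes.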

2- I tried switching to the CPU for the iterations that caused the OOM in order to continue training. Unfortunately, switching to the CPU has a negative impact on the model's performance metrics, and I still don't know why.

Again, thank you very much for catching the issue.

@YoonSeongGyeol
Author

Hello.

Thank you for answering my question.
My PC has four TITAN GPUs (12 GB each), so I used multi-GPU training (data parallel); in effect, each GPU processes a batch of 256/4 = 64.
Currently, one epoch (10,000,000 triplets) takes approximately 3 hours.
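The figures above can be checked with a few lines of arithmetic. This assumes the split is done by PyTorch's `nn.DataParallel`, which scatters each input batch along dimension 0 across the visible GPUs; the throughput numbers are derived only from the stated 10,000,000 triplets per epoch and the reported 3-hour and 11-hour epoch times.

```python
import math

batch_size, num_gpus = 256, 4
num_triplets = 10_000_000

per_gpu_batch = batch_size // num_gpus  # samples each GPU replica sees
iterations_per_epoch = math.ceil(num_triplets / batch_size)


def triplets_per_second(hours):
    """Average throughput for one epoch that took `hours` hours."""
    return num_triplets / (hours * 3600)


print(per_gpu_batch)                   # 64
print(iterations_per_epoch)            # 39063
print(round(triplets_per_second(3)))   # 926 (4 x TITAN, ~3 h per epoch)
print(round(triplets_per_second(11)))  # 253 (owner's PC, ~11 h per epoch)
```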

There are no problems at present. The one slight difference is that performance is somewhat lower, but I mostly rely on torch.cuda.empty_cache() to avoid OOM.
Training is now running without any problems.

@AGenchev
Contributor

AGenchev commented Jan 23, 2021

We may work on this as well. I noticed that triplet generation is not a very fast process; DataFrames are probably not that fast for this kind of usage.
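One common way to speed this up, offered as an assumption about where the time goes rather than a profiled result, is to stop filtering the DataFrame for every triplet and instead group image paths by class once into a plain dict. The row format `(image_path, class_id)` and the name `build_class_index` are hypothetical. With pandas, `df.groupby(class_column)[path_column].apply(list).to_dict()` builds the same mapping in one call (column names again hypothetical).

```python
from collections import defaultdict


def build_class_index(rows):
    """Group (image_path, class_id) rows into {class_id: [image_path, ...]} in one pass."""
    index = defaultdict(list)
    for image_path, class_id in rows:
        index[class_id].append(image_path)
    return dict(index)


rows = [("a0.jpg", 0), ("a1.jpg", 0), ("b0.jpg", 1), ("b1.jpg", 1), ("c0.jpg", 2)]
class_index = build_class_index(rows)  # one O(number of rows) pass, done once

# Classes usable as anchor/positive sources are also computed once.
pos_classes = [c for c, paths in class_index.items() if len(paths) >= 2]

# Inside the triplet loop, fetching a class's images is now a dict lookup
# rather than a boolean-mask filter over the whole DataFrame.
print(class_index[1])  # ['b0.jpg', 'b1.jpg']
print(pos_classes)     # [0, 1]
```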


3 participants