triplet_loss_dataloader.py #4
Hi Daniel, thank you very much for catching this one. The intention was only to speed up the triplet generation process, not to replicate the same generated triplets across the spawned processes, hehe. I have edited the dataloader as you described: the RandomState() object is now initialized with seed=None, so each process gets a fresh random seed and independently chooses the elements required for triplet creation.

To be clear, the current pre-trained model was trained on 10 million generated triplets that were not generated with the multi-processing method. The reason I am using this "triplet generation" method is to have some kind of naive reproducibility when changing training parameters. The intention is to conduct future experiments with a set number of human identities per triplet batch, whereby the dataloader would generate and yield a set number of triplets per training iteration instead of relying on a pre-generated list of triplets like the current version does.

However, there are two current issues I am dealing with that you should be aware of before using this project:

1. After some training "epochs", the BatchNorm2D operation requires more VRAM allocation and causes a CUDA out-of-memory exception. Since one epoch takes around 11 hours on my PC, I was training one epoch per day and shutting the process down afterwards so I could use the machine for other things. That way I managed to get the 256 batch size to work, but leaving it running for several epochs causes an OOM. I would therefore recommend using a lower batch size that initially allocates around 40-60% of your GPU VRAM.

2. I tried switching to CPU for the iterations that caused the OOM in order to continue training. Unfortunately, switching to CPU has a negative impact on the model's performance metrics, and I still don't know why that is the case.

Again, thank you very much for catching the issue.
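For reference, a minimal sketch of the per-process seeding idea described above (the function and variable names here are hypothetical, not the project's actual code): each spawned worker builds its own `np.random.RandomState(seed=None)`, so the sampled classes differ between processes.

```python
import numpy as np
from multiprocessing import Pool

def generate_triplets(args):
    """Sketch of one worker's share of the triplet generation (hypothetical)."""
    num_triplets, classes = args
    # seed=None draws a fresh seed from OS entropy inside each process,
    # so every worker samples different (positive, negative) class pairs.
    rng = np.random.RandomState(seed=None)
    triplets = []
    for _ in range(num_triplets):
        pos_class, neg_class = rng.choice(classes, size=2, replace=False)
        triplets.append((pos_class, neg_class))
    return triplets

if __name__ == "__main__":
    classes = list(range(100))  # stand-in for the dataset's identity labels
    with Pool(processes=4) as pool:
        chunks = pool.map(generate_triplets, [(250, classes)] * 4)
    all_triplets = [t for chunk in chunks for t in chunk]
```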
Hello. Thank you for answering my question. There is no problem at present. One small difference is that performance is a bit low, but I mostly use torch.cuda.empty_cache() to avoid OOM.
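A hedged sketch of the workaround mentioned above, assuming a standard PyTorch triplet training loop (the function and argument names are illustrative): `torch.cuda.empty_cache()` only returns cached, unreferenced blocks to the driver, so tensor references should be dropped first.

```python
import torch

def train_one_epoch(model, loader, optimizer, criterion, device):
    """Illustrative loop; criterion is assumed to be e.g. nn.TripletMarginLoss."""
    model.train()
    for batch in loader:
        anchors, positives, negatives = (t.to(device) for t in batch)
        optimizer.zero_grad()
        loss = criterion(model(anchors), model(positives), model(negatives))
        loss.backward()
        optimizer.step()
        del loss  # drop references before clearing the cache
    if device.type == "cuda":
        torch.cuda.empty_cache()  # release cached, unreferenced GPU memory
```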
We may work on this as well. I noticed that the triplet generation is not a very fast process; DataFrames are probably not that fast for this kind of usage.
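One possible direction for such a speed-up, shown purely as a sketch (the column names `class` and `image_path` are assumptions, not necessarily the project's schema): group the DataFrame by class once into NumPy arrays, then sample from those arrays inside the loop instead of filtering the DataFrame repeatedly.

```python
import numpy as np
import pandas as pd

def build_class_index(df: pd.DataFrame) -> dict:
    # One-time grouping: class label -> NumPy array of that class's image paths.
    return {cls: grp["image_path"].to_numpy() for cls, grp in df.groupby("class")}

def sample_triplet(class_index: dict, rng: np.random.RandomState):
    # Assumes every class has at least two images.
    classes = list(class_index.keys())
    pos_idx, neg_idx = rng.choice(len(classes), size=2, replace=False)
    anchor, positive = rng.choice(class_index[classes[pos_idx]], size=2, replace=False)
    negative = rng.choice(class_index[classes[neg_idx]])
    return anchor, positive, negative
```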
Hello, I'm Daniel.
While running your project, a question arose.
In dataloader/triplet_loss_dataloader, the triplets are generated by randomly drawing (pos, neg) classes for the number of triplets allocated to each process and then randomly selecting images.
However, when using np.random.choice, I confirmed that the same random values are produced in every process.
When I switched to np.random.RandomState(), each process produced different random values.
Please let me know whether I have understood this process correctly.
Thank you.
Daniel
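For anyone hitting the same behaviour, here is a small, hedged illustration of the symptom and the fix (a standalone example, not the project's dataloader): on platforms where multiprocessing forks workers (e.g. Linux), each child inherits the parent's global NumPy RNG state, so np.random.choice produces the same values in every process, whereas a per-process np.random.RandomState(seed=None) reseeds from fresh OS entropy.

```python
import numpy as np
from multiprocessing import Pool

def sample_global(_):
    # Forked workers share the parent's global RNG state at fork time,
    # so this first draw is typically identical in every worker.
    return np.random.choice(1000, size=3).tolist()

def sample_local(_):
    rng = np.random.RandomState(seed=None)  # fresh seed inside each process
    return rng.choice(1000, size=3).tolist()

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(sample_global, range(4)))  # often four identical triples
        print(pool.map(sample_local, range(4)))   # four different triples
```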