
torch.utils.data.DataLoader with BalancedBatchSampler results in a higher number of batches than intended? #3

Open · candasunal opened this issue May 29, 2022 · 3 comments

candasunal commented May 29, 2022

Hello Federico,

First of all thank you very much for this repo, it seems like your solution is just the one I needed.

I wanted my batches to have an equal number of samples from each class (in this case, 10 samples from each MNIST class). But when I use it as the sampler argument of torch.utils.data.DataLoader with a batch size of 100, the resulting loader has more batches than it should.

For example, the code below creates a trainloader with 675 batches instead of the expected 600 (60,000 samples / 100 per batch):

import torch
from torchvision import datasets, transforms

from sampler import BalancedBatchSampler  # the sampler from this repo

batch_size = 100

# Standard MNIST training split: 60,000 images across 10 classes.
train_MNIST = datasets.MNIST('./content/MNIST_DATA/train/',
                             train=True,
                             transform=transforms.ToTensor(),
                             download=True)

trainloader = torch.utils.data.DataLoader(train_MNIST,
                                          sampler=BalancedBatchSampler(train_MNIST),
                                          batch_size=batch_size)
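
A quick way to see the mismatch (a minimal check reusing the variables above):

import math

# Observed: iterate the balanced loader and count its batches.
print(sum(1 for _ in trainloader))               # 675

# Expected for a single pass over the dataset without oversampling.
print(math.ceil(len(train_MNIST) / batch_size))  # 600 = ceil(60000 / 100)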

(Screenshot attached: the batch count as shown in the Spyder IDE.)

Am I missing something? Shouldn't it be 600 instead of 675?

Thank you in advance.

@uzair789

Hi, did you manage to figure this out? I have the same issue. Thank you!

@uzair789

I figured it out. It's because of the imbalance in your dataset. The while loop in the sampler keeps running as long as each class's count is less than balanced_max (the number of samples in your largest class). So if balanced_max is large and the other classes have far fewer samples, the smaller classes keep being re-drawn until every class reaches balanced_max, and covering all the samples from the largest class creates the additional batches.
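
For what it's worth, the numbers line up for MNIST too: its training split is only roughly balanced (digit 1 has 6,742 samples, digit 5 only 5,421). A minimal sketch of that arithmetic, assuming the sampler draws every class up to balanced_max:

import math
from collections import Counter

from torchvision import datasets, transforms

train_MNIST = datasets.MNIST('./content/MNIST_DATA/train/', train=True,
                             transform=transforms.ToTensor(), download=True)

# Per-class sample counts in the MNIST training split.
counts = Counter(train_MNIST.targets.tolist())
balanced_max = max(counts.values())  # 6742 (digit 1)

# If every class is oversampled up to balanced_max, the sampler yields
# balanced_max * num_classes indices per epoch.
total = balanced_max * len(counts)   # 6742 * 10 = 67420
print(math.ceil(total / 100))        # 675 batches, matching the report above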

@candasunal (Author)

Hello Uzair,

I used the standard MNIST dataset for this, and that isn't unbalanced.

Since I couldn't find a solution, I moved on to something else, but I'll take another look at my implementation in light of your comment.
