add micro batches #148

Merged: 4 commits merged into main on May 17, 2022
Conversation

lukas-blecher (Owner)

Proposed in issue #147

@lukas-blecher lukas-blecher linked an issue May 13, 2022 that may be closed by this pull request
lukas-blecher (Owner, Author)

Maybe we should scale all gradients by args.micro_batchsize/args.batchsize, because right now the gradients are summed over the micro batches, resulting in a larger gradient norm on average.
But I've tested it out on a toy model, and this constant factor did not hinder convergence.
It can also be compensated for by choosing different betas and a different initial learning rate in the case of the Adam optimizer.

Still, scaling might be better for consistency.
Add

for p in model.parameters():
    if p.grad is not None:
        p.grad *= args.micro_batchsize / args.batchsize

before

opt.step()
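
For context, a minimal sketch of how the accumulation and rescaling could fit together (hypothetical names dataloader, loss_fn, model, opt; assumes args.batchsize is a multiple of args.micro_batchsize and each dataloader item is one micro batch, not the exact code in this PR):

steps = args.batchsize // args.micro_batchsize   # micro batches per optimizer step
opt.zero_grad()
for i, (x, y) in enumerate(dataloader):
    loss = loss_fn(model(x), y)                  # mean loss of one micro batch
    loss.backward()                              # gradients are summed into p.grad
    if (i + 1) % steps == 0:
        for p in model.parameters():             # rescale the summed gradients
            if p.grad is not None:
                p.grad *= args.micro_batchsize / args.batchsize
        opt.step()
        opt.zero_grad()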

TITC (Collaborator) commented May 14, 2022

As always, I've learned from your code and comments.

I found some code on other websites where they do not directly average the gradients but instead average the loss before backpropagation.

I think both approaches lead to the same result. Although it is called gradient descent, in fact it's a directional derivative, because the direction is fixed once the architecture is determined.

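A sketch of the equivalence I have in mind, assuming K = args.batchsize / args.micro_batchsize micro batches of equal size and writing L_k for the mean loss of micro batch k (my notation, not from the code):

$$
\nabla_\theta \Bigl( \tfrac{1}{K} \sum_{k=1}^{K} L_k \Bigr)
  = \tfrac{1}{K} \sum_{k=1}^{K} \nabla_\theta L_k
  = \frac{\texttt{micro\_batchsize}}{\texttt{batchsize}} \sum_{k=1}^{K} \nabla_\theta L_k
$$

so averaging the loss before backward and rescaling the summed gradients before opt.step() give the same gradient.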

I'm not sure if this is the correct understanding; I'd like to hear your opinion. @lukas-blecher

lukas-blecher (Owner, Author)

Yes, I think you're right, it is equivalent.
Scaling the loss would be computationally more efficient.
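
For reference, a minimal sketch of the loss-scaling variant under the same hypothetical names as above (dataloader, loss_fn, opt; not the code merged in this PR): each micro-batch loss is divided by the number of micro batches, so no per-parameter rescaling is needed before opt.step().

steps = args.batchsize // args.micro_batchsize   # micro batches per optimizer step
opt.zero_grad()
for i, (x, y) in enumerate(dataloader):
    loss = loss_fn(model(x), y) / steps          # scale the loss instead of the gradients
    loss.backward()                              # accumulated gradients are already averaged
    if (i + 1) % steps == 0:
        opt.step()
        opt.zero_grad()

Only one scalar per micro batch is scaled, instead of every parameter's gradient, which is why it is slightly cheaper.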

@lukas-blecher lukas-blecher merged commit 6a91f0f into main May 17, 2022
@lukas-blecher lukas-blecher deleted the micro-batch branch May 17, 2022 08:53