Params for Training en-indic model #23
Hello, we use the following command for the en-indic training.
^ For the results in our paper, we ensured the effective batch size (max_tokens * distributed_world_size * update_freq) was ~64K. We haven't tried training the 4x model only for en-hi.
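The effective-batch-size formula above can be turned into a quick calculation for --update-freq. The concrete numbers below (4096 tokens per GPU, 8 GPUs) are illustrative assumptions, not settings confirmed in this thread:

```shell
# Assumed example values: adjust to your own hardware and --max-tokens setting.
max_tokens=4096        # tokens per batch per GPU (fairseq --max-tokens)
world_size=8           # number of GPUs (distributed_world_size)
target=64000           # desired effective batch size in tokens (~64K)

# effective batch = max_tokens * world_size * update_freq,
# so update_freq = ceil(target / (max_tokens * world_size))
per_step=$(( max_tokens * world_size ))
update_freq=$(( (target + per_step - 1) / per_step ))
echo "update_freq=$update_freq"   # -> update_freq=2 (effective batch 65536 ~ 64K)
```

With fewer GPUs, the same arithmetic gives a proportionally larger --update-freq (e.g. 1 GPU at 4096 tokens would need --update-freq 16).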
@gowtham1997 - thanks for sharing the params. Any specific reason for this?
Sorry, I missed replying to this yesterday. We observed that larger effective batch sizes utilized the GPUs fully and also showed better results in our initial experiments, hence we chose ~64K. Effective batch sizes > 64K would also help, but with time constraints in mind, we chose ~64K for our paper.
Ok, got it. Thank you for the info.
We are trying to replicate the results from the Samanantar IndicTrans paper. We are training the model for en-hi translation only. We are currently using these params, following the paper:
fairseq-train ../en_hi_4x/final_bin \
  --max-source-positions=210 --max-target-positions=210 \
  --save-interval-updates=10000 --arch=transformer_4x \
  --criterion=label_smoothed_cross_entropy \
  --source-lang=SRC --target-lang=TGT --label-smoothing=0.1 \
  --lr-scheduler=inverse_sqrt --optimizer adam --adam-betas '(0.9, 0.98)' \
  --clip-norm 1.0 --warmup-init-lr 1e-07 --lr 0.0005 --warmup-updates 4000 \
  --dropout 0.2 --save-dir ../en_hi_4x/model --keep-last-epochs 5 --patience 5 \
  --skip-invalid-size-inputs-valid-test --fp16 --user-dir model_configs \
  --wandb-project 'train_1' --max-tokens 300
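One gap worth noting in the command above: with --max-tokens 300 and no --update-freq, the effective batch size is far below the ~64K tokens the maintainers report targeting. A hedged sketch of how the relevant flags might be adjusted on a single-GPU setup (the specific values 4096 and 16 are assumptions for illustration, not the authors' exact settings):

```shell
# Sketch only: same training recipe, with --max-tokens raised and
# --update-freq added so that, on 1 GPU,
# effective batch = max_tokens * num_gpus * update_freq
#                 = 4096 * 1 * 16 = 65536 tokens (~64K).
fairseq-train ../en_hi_4x/final_bin \
  --max-source-positions=210 --max-target-positions=210 \
  --save-interval-updates=10000 --arch=transformer_4x \
  --criterion=label_smoothed_cross_entropy \
  --source-lang=SRC --target-lang=TGT --label-smoothing=0.1 \
  --lr-scheduler=inverse_sqrt --optimizer adam --adam-betas '(0.9, 0.98)' \
  --clip-norm 1.0 --warmup-init-lr 1e-07 --lr 0.0005 --warmup-updates 4000 \
  --dropout 0.2 --save-dir ../en_hi_4x/model --keep-last-epochs 5 --patience 5 \
  --skip-invalid-size-inputs-valid-test --fp16 --user-dir model_configs \
  --wandb-project 'train_1' \
  --max-tokens 4096 --update-freq 16
```

With more GPUs, --update-freq can be reduced proportionally, since fairseq multiplies the per-GPU batch by both the world size and the update frequency.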
Can you please share the params you used for training the en-indic model, or specifically, whether you have tried en-hi separately?