Find the best values for batch size 🧮 #2232
wasertech started this conversation in Show and tell
Replies: 1 comment
-
Turns out this is not as useful as I planned it to be… The split sets are divided using the given batch sizes, taking into account the number of computing units at hand, so my script's computation is incorrect. Batch size is really a value that bridges the length of your data onto the hardware you have available for computing.

If you are still looking for the ideal batch size for your data/hardware setup, start by testing a high batch size (128) and see whether it passes the batch test. Use:

```
python -m coqui_stt_training.train \
    --train_cudnn true \
    --alphabet_config_path /mnt/models/alphabet.txt \
    --scorer_path /mnt/lm/kenlm.scorer \
    --feature_cache /mnt/sources/feature_cache \
    --train_files ${all_train_csv} \
    --dev_files ${all_dev_csv} \
    --train_batch_size ${TRAIN_BATCH_SIZE} \
    --dev_batch_size ${DEV_BATCH_SIZE} \
    --n_hidden ${N_HIDDEN} \
    --epochs 1 \
    --learning_rate ${LEARNING_RATE} \
    --dropout_rate ${DROPOUT} \
    --checkpoint_dir /mnt/checkpoints/ \
    --skip_batch_test false
```

If the test fails with 128, try 64, 32, 16, 8, 4, or 2.
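If you would rather not edit the command by hand for each attempt, here is a minimal Python sketch of that halving search (not from the original post): it re-runs the one-epoch command above with decreasing batch sizes and reports the first one whose process exits successfully. The flag values are copied from the command above; the assumption that a failed batch test yields a non-zero exit code is mine.

```python
import os
import subprocess

# Candidate batch sizes to try, largest first, as suggested above.
CANDIDATES = [128, 64, 32, 16, 8, 4, 2]

def batch_test_passes(batch_size: int) -> bool:
    """Run the one-epoch training command above with the given batch size.

    Assumes the same environment variables as the shell command
    (all_train_csv, all_dev_csv, N_HIDDEN, LEARNING_RATE, DROPOUT) are set,
    and that the process exits non-zero when the batch test fails.
    """
    env = os.environ
    cmd = [
        "python", "-m", "coqui_stt_training.train",
        "--train_cudnn", "true",
        "--alphabet_config_path", "/mnt/models/alphabet.txt",
        "--scorer_path", "/mnt/lm/kenlm.scorer",
        "--feature_cache", "/mnt/sources/feature_cache",
        "--train_files", env["all_train_csv"],
        "--dev_files", env["all_dev_csv"],
        "--train_batch_size", str(batch_size),
        "--dev_batch_size", str(batch_size),
        "--n_hidden", env["N_HIDDEN"],
        "--epochs", "1",
        "--learning_rate", env["LEARNING_RATE"],
        "--dropout_rate", env["DROPOUT"],
        "--checkpoint_dir", "/mnt/checkpoints/",
        "--skip_batch_test", "false",
    ]
    return subprocess.run(cmd).returncode == 0

if __name__ == "__main__":
    for size in CANDIDATES:
        if batch_test_passes(size):
            print(f"Largest batch size that passed the batch test: {size}")
            break
    else:
        print("No candidate batch size passed the batch test.")
```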
-
I wanted to find the biggest batch size I could fit on my GPUs, but none worked above 64, so I made a script to find all the valid batch sizes for a given sample count.
Check out my gist.
In my case, with 428,864 samples, you can see that the next possible size after 64 is 6,701. This is way too much for my 24 GB GPUs… so I'm stuck with 64…
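The gist itself isn't reproduced here, but the underlying idea is to list the batch sizes that divide the sample count evenly, so no partial batch is left over. A minimal sketch of that calculation (the function name is mine, and the real gist may differ, e.g. by accounting for the number of GPUs):

```python
def candidate_batch_sizes(sample_count: int) -> list[int]:
    """Return every batch size that divides the sample count with no remainder."""
    return [b for b in range(1, sample_count + 1) if sample_count % b == 0]

# 428,864 = 2**6 * 6701, so after 64 the next even divisor jumps straight to 6701.
print(candidate_batch_sizes(428_864))
# [1, 2, 4, 8, 16, 32, 64, 6701, 13402, 26804, 53608, 107216, 214432, 428864]
```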
Test with your sample size now 🧮