Optimal hardware setup (GPUs) #2296
-
You need more VRAM per GPU to fit bigger batch sizes.
Adding another 8 GB GPU won't really help.
You need to fit 16 or 32 batched samples 3 times on each GPU, so aim for 24 GB for the biggest batch sizes (64, 128), although 12 GB or 11 GB might just cut it for you.
Only then can you think about adding more GPUs of the same size to make training even faster.
Try to limit the length of each sample (e.g. prefer 10 s over 20 s). Sample length is the biggest factor in how much memory a batch takes, so use it carefully.
Use automatic mixed precision (AMP) to enable bigger batch sizes and speed up training, but use it only while exploring different parameters. Once you are happy with your tests, run a final training without AMP to export your final model.
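For reference, here is a minimal sketch of how AMP can be enabled in a plain PyTorch training loop. The model, optimizer, and data below are illustrative placeholders, not this project's trainer; if your framework already exposes a mixed-precision option in its config, prefer that instead.

```python
import torch
import torch.nn as nn

# Toy stand-ins for illustration only; substitute your real model and data pipeline.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()
use_amp = device == "cuda"  # AMP only pays off on CUDA

# GradScaler scales the loss so fp16 gradients do not underflow.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for step in range(10):  # stands in for iterating over your DataLoader
    inputs = torch.randn(32, 80, device=device)   # fake batch of 32 samples
    targets = torch.randn(32, 80, device=device)

    optimizer.zero_grad()

    # Forward pass runs in mixed precision: fp16 where safe, fp32 elsewhere.
    with torch.cuda.amp.autocast(enabled=use_amp):
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    # Backward on the scaled loss, then unscale and step the optimizer.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Because half-precision activations take roughly half the memory, AMP usually lets you fit a noticeably larger batch on the same GPU, which is why it is suggested here for the exploration phase.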
-
I am trying to speed up my training process. Current setup: 1 RTX 2080 GPU with 8 GB VRAM (server with 12 CPUs and 32 GB RAM), and I am training on over 1k hours of data. Unfortunately, the process is very slow: it takes almost a day for a single epoch, and I can only use a batch size of 1 (over 4 GB of GPU RAM are used). My questions here are:
I would appreciate any advice or details from your experience.
Thank you in advance :))