Fine tuning with different sample rate than the one used to create checkpoints #1723

JRMeyer · 2021-03-08T08:40:04Z

JRMeyer
Mar 8, 2021
Maintainer

>>> fhalamos
[January 20, 2021, 9:42pm]

Hi all.

I have to fine-tune models for various accents of Spanish, English and
French. My plan is to use the checkpoints found here
https://gitlab.com/Jaco-Assistant/deepspeech-polyglot#language-models-and-checkpoints
and then retrain with the audio I have for specific accents. I have
some questions on how to do this properly.

1. When training, do I have to use audio with the same sample rate as
the audio used to generate the checkpoints? For example, my Spanish
audio files have a sample rate of 8 kHz, but I believe the checkpoints
I am using were generated by training with 16 kHz audio. Hence, I am
receiving the following message in the Optimization step:
'WARNING: sample rate of sample '.../train.wav' ( 8000 ) does not
match FLAGS.audio_sample_rate. This can lead to incorrect results.'
And the transcriptions of the test files, which are also 8 kHz, are
just empty strings.

I guess that the way to proceed is to first convert all my .wav
files to 16 kHz, but I thought that DeepSpeech already supported
training with different sample rates. Is that incorrect? I know that
client.py has that feature enabled, but I am not sure if it also
works for training.

2. When fine-tuning, I should use the --load_evaluate last flag, right?
So that testing uses the newly trained checkpoints, and not the ones
that I am using as a starting point.

3. How do I know how much data I need for fine-tuning? For example,
if I use only 1 data point for fine-tuning, model performance
actually decreases. I guess this makes sense because the model might
be overfitting that single data point.

Thanks a lot

Code I am running:

python3 -u DeepSpeech.py \
--train_files .../train.csv \
--test_files .../test.csv \
--train_batch_size 1 \
--test_batch_size 1 \
--load_cudnn true \
--epochs 3 \
--checkpoint_dir .../DeepSpeech-Ployglot-ES-20201026T155049Z-001/checkpoint/cclmtv \
--learning_rate 0.0001 \
--alphabet_config_path ../deepspeech-polyglot/data/alphabet_es.txt \
--load_evaluate last

[This is an archived TTS discussion thread from discourse.mozilla.org/t/fine-tuning-with-different-sample-rate-than-the-one-used-to-create-checkpoints]

JRMeyer · 2021-03-08T08:40:07Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> othiele
[January 20, 2021, 10:14pm]

> When training, do I have to use audios with the same sample rate as
> the ones used to generate the checkpoints?

Ideally yes. Most people just use 16 kHz. But search this forum for '8
KHz' and you'll find some good comments on upsampling, ...
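
As an illustration of the upsampling mentioned above, here is a minimal sketch that rewrites a mono 16-bit WAV file at 16 kHz using only the Python standard library. The function names `upsample_linear` and `convert_wav` are hypothetical, and plain linear interpolation is an assumption chosen for brevity; a dedicated resampler (for example SoX) applies proper filtering and is likely the better choice for real training data. Upsampling also does not restore the frequency content above 4 kHz that an 8 kHz recording never captured.

```python
import array
import sys
import wave


def upsample_linear(samples, src_rate, dst_rate):
    """Resample a sequence of PCM samples by linear interpolation."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(int(round(samples[left] * (1 - frac) + samples[right] * frac)))
    return out


def convert_wav(src_path, dst_path, dst_rate=16000):
    """Rewrite a mono 16-bit PCM WAV file at dst_rate (sketch only)."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        src_rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    resampled = array.array("h", upsample_linear(samples, src_rate, dst_rate))
    if sys.byteorder == "big":
        resampled.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(dst_rate)
        dst.writeframes(resampled.tobytes())
```

Calling `convert_wav("in_8k.wav", "out_16k.wav")` (both paths are placeholders) would produce a file with twice as many frames and a 16000 Hz header.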

> And the transcriptions of the test files, which are also 8 kHz, are
> just empty strings.

You have a really high learning rate, maybe you are shaping off too
much. Oh, are you taking away a layer at all? Doesn't look like it.

> I thought that DeepSpeech was already supporting training with
> different samplerates. Is that incorrect?

In theory no, but read the other comments. Best to stick to just one
system.
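
To check that a dataset sticks to a single sample rate before training, a small script can read the WAV headers listed in a training CSV. This is a sketch under assumptions: the function name `find_rate_mismatches` is hypothetical, the CSV is assumed to have a `wav_filename` column as in the DeepSpeech importers, and the paths in it are assumed to be absolute or resolvable from the current directory.

```python
import csv
import wave


def find_rate_mismatches(csv_path, expected_rate=16000):
    """Return (wav_filename, rate) for every listed file whose rate differs."""
    mismatches = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Only the header is read here, so this stays fast on large sets.
            with wave.open(row["wav_filename"], "rb") as wav:
                rate = wav.getframerate()
            if rate != expected_rate:
                mismatches.append((row["wav_filename"], rate))
    return mismatches
```

An empty result means every listed file already matches the expected rate; anything else is a list of files to convert first.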

> When fine-tuning, I should use the --load_evaluate last flag, right?
> So that testing uses the newly trained checkpoints, and not the ones
> that I am using as a starting point.

No, please read about deep learning somewhere. The last checkpoint is
not necessarily the best.
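
The difference between the last and the best checkpoint can be shown with a toy example. The sketch below is generic and not DeepSpeech code: `best_and_last` is a hypothetical helper and the loss values are made up for illustration. As far as I recall, DeepSpeech's `--load_evaluate` flag also accepts `best`, which refers to the checkpoint with the lowest validation loss and therefore only makes sense when a dev set is supplied; treat that as an assumption and check the flag help for your version.

```python
def best_and_last(dev_losses):
    """Return (best_epoch, last_epoch) as 0-based indices into dev_losses."""
    if not dev_losses:
        raise ValueError("need at least one epoch")
    # The best checkpoint is the one with the lowest validation loss.
    best_epoch = min(range(len(dev_losses)), key=dev_losses.__getitem__)
    return best_epoch, len(dev_losses) - 1


# Hypothetical per-epoch validation losses: overfitting sets in after epoch 2.
losses = [41.2, 35.7, 33.9, 36.4, 39.8]
best, last = best_and_last(losses)
print(best, last)  # prints: 2 4
```

Here the last epoch has a clearly worse validation loss than epoch 2, so evaluating on the last checkpoint would understate what the fine-tuning run achieved.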

> How do I know how much data do I need for finetuning? For example, if
> I use only 1 data point for fine tuning, model performance actually
> decreases. I guess this makes sense cause the model might be
> overfitting that single data point.

You mean a single file of a couple of seconds. This won't lead to
anything. Fine-tuning usually requires thousands of chunks, and
depending on the task even more. The models are trained with millions;
use tens of thousands to fine-tune.

[Archived Post]

JRMeyer · 2021-03-08T08:40:10Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> othiele
[January 20, 2021, 10:17pm]

And you have no dev set. This will lead nowhere. Study some of the
examples or other code here in the forum. And start reading the
documentation on fine-tuning (#fine-tuning-same-alphabet).
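
One way to get a dev set is to hold out part of the training CSV. The following is a minimal sketch: `split_rows` and `write_split` are hypothetical helpers, the 10% dev fraction is an arbitrary assumption, and a random row-level split ignores speaker overlap, which may matter for accent data. The resulting dev file would then be passed to training via `--dev_files`.

```python
import csv
import random


def split_rows(rows, dev_fraction=0.1, seed=0):
    """Shuffle rows deterministically and return (train_rows, dev_rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    # Keep at least one dev row whenever there is more than one row in total.
    n_dev = max(1, int(len(rows) * dev_fraction)) if len(rows) > 1 else 0
    return rows[n_dev:], rows[:n_dev]


def write_split(src_csv, train_csv, dev_csv, dev_fraction=0.1, seed=0):
    """Split src_csv into train_csv and dev_csv, preserving the header."""
    with open(src_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    train, dev = split_rows(rows, dev_fraction, seed)
    for path, part in ((train_csv, train), (dev_csv, dev)):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(part)
    return len(train), len(dev)
```

The fixed seed keeps the split reproducible, so later runs are evaluated against the same held-out rows.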

[Archived Post]
