>>> othiele |
>>> fhalamos
[January 20, 2021, 9:42pm]
Hi all.
I have to fine-tune models for various accents of Spanish, English and
French. My plan is to use the checkpoints found here
https://gitlab.com/Jaco-Assistant/deepspeech-polyglot#language-models-and-checkpoints
and then fine-tune them on my own recordings of those specific accents. I have
some questions on how to do this properly.
1. When training, do I have to use audio with the same sample rate as
the audio used to create the checkpoints? For example, my Spanish
audio is sampled at 8 kHz, but I believe the checkpoints I am
using were trained on 16 kHz audio. As a result, I get the
following warning during the Optimization step:
WARNING: sample rate of sample '.../train.wav' ( 8000 ) does not match FLAGS.audio_sample_rate. This can lead to incorrect results.
Also, the transcriptions of the test files, which are also 8 kHz, are
just empty strings.
I guess the way to proceed is to convert all my .wav files to
16 kHz first, but I thought DeepSpeech already supported
training with different sample rates. Is that incorrect? I know that
client.py has that feature enabled, but I am not sure whether it also
works for training.
2. When fine-tuning, I should use the --load_evaluate last flag, right?
That way testing uses the newly trained checkpoints rather than the ones
I am using as a starting point.
3. How do I know how much data I need for fine-tuning? For example,
if I use only one data point for fine-tuning, model performance
actually decreases. I guess this makes sense, because the model might
be overfitting to that single data point.
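For point 1, the conversion to 16 kHz is usually done with an external tool such as SoX (for example `sox input.wav -r 16000 output.wav`). As a dependency-free illustration, the Python sketch below upsamples a mono 16-bit PCM WAV file by linear interpolation. `upsample_wav` is a hypothetical helper, not part of DeepSpeech, and linear interpolation is only a crude stand-in for a proper band-limited resampler:

```python
import array
import sys
import wave


def upsample_wav(src_path, dst_path, target_rate=16000):
    """Resample a mono 16-bit PCM WAV file to target_rate by linear interpolation.

    Hypothetical helper for illustration only; a band-limited resampler
    such as SoX gives better quality.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        src_rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    n_out = len(samples) * target_rate // src_rate
    out = array.array("h")
    for i in range(n_out):
        # Position of output sample i on the input time axis, kept as an
        # integer index plus remainder so there is no floating-point drift.
        j, rem = divmod(i * src_rate, target_rate)
        frac = rem / target_rate
        # Repeat the last input sample when there is no right neighbour.
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(int(round(samples[j] * (1 - frac) + nxt * frac)))

    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(out.tobytes())
```

Whatever tool is used, upsampling cannot restore content above 4 kHz that an 8 kHz recording never captured, so the converted audio will still differ from native 16 kHz training data.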
Thanks a lot
Code I am running:
python3 -u DeepSpeech.py \
--train_files .../train.csv \
--test_files .../test.csv \
--train_batch_size 1 \
--test_batch_size 1 \
--load_cudnn true \
--epochs 3 \
--checkpoint_dir .../DeepSpeech-Ployglot-ES-20201026T155049Z-001/checkpoint/cclmtv \
--learning_rate 0.0001 \
--alphabet_config_path ../deepspeech-polyglot/data/alphabet_es.txt \
--load_evaluate last
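The warning quoted in point 1 is printed when a sample's rate differs from --audio_sample_rate, which defaults to 16000. One way to see in advance which files in the training and test CSVs would trigger it is to read each WAV header. The sketch below assumes the usual DeepSpeech CSV columns (wav_filename, wav_filesize, transcript) and that relative paths are resolved against the CSV's folder; find_rate_mismatches is a hypothetical helper name:

```python
import csv
import os
import wave


def find_rate_mismatches(csv_path, expected_rate=16000):
    """Return (wav_filename, sample_rate) pairs from a DeepSpeech-style CSV
    whose sample rate differs from expected_rate."""
    base_dir = os.path.dirname(os.path.abspath(csv_path))
    mismatches = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = row["wav_filename"]
            # Assumption: relative paths are resolved against the CSV's folder.
            path = name if os.path.isabs(name) else os.path.join(base_dir, name)
            with wave.open(path, "rb") as w:
                rate = w.getframerate()
            if rate != expected_rate:
                mismatches.append((name, rate))
    return mismatches
```

Running it on both train.csv and test.csv before training gives a list of every file that still needs converting; an empty list means the sample-rate warning should no longer appear.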
[This is an archived TTS discussion thread from discourse.mozilla.org/t/fine-tuning-with-different-sample-rate-than-the-one-used-to-create-checkpoints]