Spanish blank inferences #1677

JRMeyer · 2021-03-08T08:29:18Z

JRMeyer
Mar 8, 2021
Maintainer

>>> manuelservex
[December 15, 2020, 3:20pm]

I trained my model on the Spanish Common Voice dataset for 10
epochs. Both the validation during training and the inference
obtained by running:

> deepspeech \
>   --model deepspeech-0.9.1-models.pbmm \
>   --scorer deepspeech-0.9.1-models.scorer \
>   --audio my_audio_file.wav

return a blank result:

Example:

WER: 1.000000, CER: 0.864865, loss: 107.391472
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1565384151948609.wav
- src: 'qué peleas se agarraban entre ustedes'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.852941, loss: 107.340851
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/common_voice_es_19602468.wav
- src: 'sentí que cada riff estaba escrito'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.852941, loss: 107.299416
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1562620039745670.wav
- src: 'oyó a un grupo releyendo geografía'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.810811, loss: 107.287590
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/common_voice_es_19139609.wav
- src: 'en roma estuvo en el colegio de lieja'
- res: ' '
--------------------------------------------------------------------------------
Worst WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.333333, loss: 21.902287
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1557876943950223.wav
- src: 'non'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 20.615292
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-156452630840292.wav
- src: 'rossi'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 20.549049
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1556823942669887.wav
- src: 'sisisi'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 17.611378
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1565617749088932.wav
- src: 'enid'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.800000, loss: 17.374151
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1556197352940137.wav
- src: 'no no'
- res: ' '
--------------------------------------------------------------------------------

Training execution line:

> CUDA_VISIBLE_DEVICES=0 python3 DeepSpeech.py \
>   --train_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/train.csv \
>   --dev_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/dev.csv \
>   --test_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/test.csv \
>   --automatic_mixed_precision \
>   --alphabet_config_path ~/train_deepspeech/alphabet.txt \
>   --checkpoint_dir ~/train_deepspeech/deepspeech/checkpoints \
>   --export_dir ~/train_deepspeech/deepspeech/checkpoints/export \
>   --log_level 0 \
>   --epochs 10 \
>   --limit_test 5000

Number dataset files:

train.csv: 256522
dev.csv: 28611
test.csv: 21574

Alphabet.txt:

> a
> á
> à
> â
> ä
> b
> c
> d
> e
> é
> è
> ê
> ë
> f
> g
> h
> i
> í
> ì
> î
> ï
> j
> k
> l
> m
> n
> ñ
> o
> ó
> ò
> ô
> ö
> p
> q
> r
> s
> t
> u
> ú
> ù
> û
> ü
> v
> w
> x
> y
> z
> !
> ¡
> ?
> ¿
> ´
> ¨
> '
> 'blank space'

Environment:

[This is an archived TTS discussion thread from discourse.mozilla.org/t/spanish-blank-inferences]

JRMeyer · 2021-03-08T08:29:20Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> othiele
[December 15, 2020, 3:36pm]

Thanks, Manuel, for a well-written post. Blanks can mean that the model
is not trained enough, but 10 epochs on a dataset of that size should
produce something.

1. What about dropout and learning rate? The default dropout is not
suitable. Maybe try 0.3.

2. Did you build your own scorer, or is this the English one?

3. Don't limit the test set; reduce its size if you want to.

4. Reduce the alphabet to just Spanish letters. Maybe even just the
English ones. The more letters, the more training material you need.

5. Use a batch size of 32 or 64 for all 3 (train, dev, test). A V100
should be able to process that.

6. "Blank space" is just a blank in the file, but doesn't show here?
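
One way to act on point 4 is to check which characters the transcripts actually contain before deciding what to keep in alphabet.txt. The following is only a minimal sketch, not part of DeepSpeech itself: it assumes the usual training CSV layout with a `transcript` column, and the function name `alphabet_from_transcripts` and the sample rows are hypothetical.

```python
import csv
import io


def alphabet_from_transcripts(csv_file):
    """Return the sorted set of characters used in the 'transcript' column."""
    chars = set()
    for row in csv.DictReader(csv_file):
        chars.update(row["transcript"])
    return sorted(chars)


# Tiny in-memory stand-in for train.csv (illustrative rows, not real data).
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,sí\n"
    "b.wav,1000,no no\n"
)
alphabet = alphabet_from_transcripts(sample)
print(alphabet)
```

On real data you would pass an open file handle for train.csv, dev.csv and test.csv in turn, then compare the result against the alphabet file to spot letters that never occur.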

[Archived Post]


JRMeyer · 2021-03-08T08:29:23Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> dan.bmh
[December 16, 2020, 11:25am]

You could also try using the Spanish checkpoint and scorer from the
DeepSpeech-Polyglot project as a basis and run transfer learning on top
of it, if you want to keep your alphabet.

[Archived Post]


JRMeyer · 2021-03-08T08:29:26Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> othiele
[December 16, 2020, 12:01pm]

How silly of me not to mention your models. Great idea. You'll find an
alphabet and a working scorer there if you want to train on just your
material.

[Archived Post]


JRMeyer · 2021-03-08T08:29:28Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> manuelservex
[January 5, 2021, 7:54pm]

I don't have any scorer files. Are they really necessary?

I ran the training with the recommendations you gave me:

> CUDA_VISIBLE_DEVICES=0 python3 DeepSpeech.py \
>   --train_files ~/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/train.csv \
>   --dev_files ~/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/dev.csv \
>   --test_files ~/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/test.csv \
>   --automatic_mixed_precision \
>   --alphabet_config_path ~/train_deepspeech/alphabet.txt \
>   --checkpoint_dir ~/train_deepspeech/train_04_01_2021/checkpoints \
>   --export_dir ~/train_deepspeech/train_04_01_2021/checkpoints/export \
>   --log_level 0 \
>   --epochs 10 \
>   --dropout_rate 0.3 \
>   --train_batch_size 64 \
>   --dev_batch_size 64 \
>   --test_batch_size 64 \
>   --export_batch_size 64

and I get the following results:

--------------------------------------------------------------------------------
Best WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.857143, loss: 30.235117
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21987879.wav
- src: 'firefox'
- res: 'o'
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.833333, loss: 23.909798
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21942424.wav
- src: 'cuatro'
- res: 'oo'
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 16.605837
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21961351.wav
- src: 'nueve'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 14.875109
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_22036789.wav
- src: 'siete'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 14.321535
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21983279.wav
- src: 'cinco'
- res: ''
--------------------------------------------------------------------------------
Median WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 14.321535
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21983279.wav
- src: 'cinco'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 13.271736
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_22043319.wav
- src: 'tres'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 12.867823
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21886210.wav
- src: 'hey'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 12.829750
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21944989.wav
- src: 'cero'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 11.550548
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21962058.wav
- src: 'seis'
- res: ''
--------------------------------------------------------------------------------
Worst WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 11.550548
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21962058.wav
- src: 'seis'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 10.331042
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21944355.wav
- src: 'sí'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 9.695055
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21989345.wav
- src: 'dos'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 9.292615
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21939380.wav
- src: 'uno'
- res: ''
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 5.831096
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/servex_common_voice/clips/common_voice_es_21939330.wav
- src: 'no'
- res: ''
--------------------------------------------------------------------------------
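
For readers unfamiliar with the two metrics in the report above: WER and CER are conventionally an edit distance divided by the length of the reference, counted in words and characters respectively. The sketch below is a plain Levenshtein implementation written for illustration, not DeepSpeech's own evaluation code; the function names are my own. It does reproduce the reported CER of 0.857143 for 'firefox' versus 'o'.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(src, res):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(src, res) / len(src)


def wer(src, res):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = src.split()
    return edit_distance(ref_words, res.split()) / len(ref_words)


print(round(cer("firefox", "o"), 6), wer("firefox", "o"))
```

An empty result always gives a CER of 1.0 under this definition, which is why most of the single-word clips above sit at exactly 1.000000.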

And in the latest training runs, is it always the same test files?

How do I build my own scorer?

[Archived Post]


JRMeyer · 2021-03-08T08:29:31Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> othiele
[January 5, 2021, 8:20pm]

1. I am not sure what happens if you don't use a scorer for testing;
it kind of defeats the purpose ... Try inferencing without a scorer
on a known/trained chunk.

2. Why did you choose an export batch size of 64? I have never seen
that. I advised it for all 3, you used 4. Please try to understand
what each parameter does, read some other posts, ... This is not an
end-user product yet.

3. What are the loss values at the end of each epoch like for
train/dev? This should give an indication how the training went
along.

4. How many hours is your material or what is the mean? Looks like
really short commands.
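
The question in point 4 can be answered from the WAV files themselves. Here is a minimal sketch using only Python's standard `wave` module; the helper names are hypothetical, and the silent in-memory clips exist only so the example runs without any dataset.

```python
import io
import wave


def clip_seconds(wav_source):
    """Duration of one WAV file (path or file-like object) in seconds."""
    with wave.open(wav_source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize(durations):
    """Return (total hours, mean clip length in seconds)."""
    total = sum(durations)
    return total / 3600, total / len(durations)


def make_silent_wav(seconds, rate=16000):
    """Build an in-memory 16-bit mono WAV of silence, for the demo only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


# Three synthetic clips stand in for the real training audio.
durations = [clip_seconds(make_silent_wav(s)) for s in (1.0, 2.0, 3.0)]
hours, mean = summarize(durations)
print(f"{hours:.6f} hours total, {mean:.2f} s mean clip length")
```

For a real dataset, call `clip_seconds` on each path in the `wav_filename` column of train.csv instead of the synthetic clips.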

[Archived Post]


JRMeyer · 2021-03-08T08:29:33Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> manuelservex
[January 5, 2021, 9:45pm]

Thanks for your answer

1. I tested the English model with the scorer from the English
repository, and everything worked correctly.

2. I chose it for simple tests, because of the results I was getting,
but the results are the same with or without an export batch size of 64.

3. Loss:
   Train: 13.899992
   Dev: 15.962547
4. But less than 350 hours

[Archived Post]


JRMeyer · 2021-03-08T08:29:36Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> othiele
[January 5, 2021, 10:09pm]

Please read the docs carefully and try to understand what the scorer
does. It looks like you did not try the model for just inference without
a scorer as I suggested?

Losses only make sense over time, single data points are not really
helpful.

What is your use case?

[Archived Post]


JRMeyer · 2021-03-08T08:29:39Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> manuelservex
[January 6, 2021, 8:36pm]

I have downloaded the Spanish model that you trained from your
repository. How can I test it?

[Archived Post]


JRMeyer · 2021-03-08T08:29:41Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> dan.bmh
[January 8, 2021, 5:58pm]

There are multiple ways to test it. You can either follow the setup
steps in DS-Polyglot, or use the DS testing script directly, which might
be faster if you have already set up DS for training. You can also use
the provided .pbmm and .scorer files for normal inference, as you did
in your first post.

[Archived Post]

