Replies: 9 comments
-
>>> manuelservex
[December 15, 2020, 3:20pm]
I trained my model on the Spanish Common Voice dataset for 10 epochs.
Both the evaluation at the end of training and the inference obtained
by running:
> deepspeech --model deepspeech-0.9.1-models.pbmm \
>   --scorer deepspeech-0.9.1-models.scorer \
>   --audio my_audio_file.wav
return a blank result:
Example:
WER: 1.000000, CER: 0.864865, loss: 107.391472
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1565384151948609.wav
- src: 'qué peleas se agarraban entre ustedes'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.852941, loss: 107.340851
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/common_voice_es_19602468.wav
- src: 'sentí que cada riff estaba escrito'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.852941, loss: 107.299416
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1562620039745670.wav
- src: 'oyó a un grupo releyendo geografía'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.810811, loss: 107.287590
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/common_voice_es_19139609.wav
- src: 'en roma estuvo en el colegio de lieja'
- res: ' '
--------------------------------------------------------------------------------
Worst WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.333333, loss: 21.902287
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1557876943950223.wav
- src: 'non'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 20.615292
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-156452630840292.wav
- src: 'rossi'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 20.549049
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1556823942669887.wav
- src: 'sisisi'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 17.611378
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1565617749088932.wav
- src: 'enid'
- res: ' '
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.800000, loss: 17.374151
- wav: file:///home/manuel_servex/training_audios/audios_entrenamiento/temp/alpha_deepspeech/clips/archivo-1556197352940137.wav
- src: 'no no'
- res: ' '
--------------------------------------------------------------------------------
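For reference, figures like the ones in this report can be reproduced with a plain edit distance. The following is a minimal sketch, not DeepSpeech's own evaluation code: the helper names `levenshtein`, `wer` and `cer` are hypothetical, and it assumes both metrics are the edit distance divided by the reference length (counted in words and in characters respectively). Under that assumption, a result with no words in it scores a WER of 1.0 whatever the source sentence is.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate: character-level edit distance over reference length."""
    return levenshtein(ref, hyp) / len(ref)


# Last entry of the report: src 'no no', res ' ' (assumed to be a single space).
src = "no no"
res = " "
print(wer(src, res), cer(src, res))
```

For `src = 'no no'` and a single-space result this prints `1.0 0.8`, which matches the WER and CER of the last entry in the report.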
Training execution line:
> CUDA_VISIBLE_DEVICES=0 python3 DeepSpeech.py \
>   --train_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/train.csv \
>   --dev_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/dev.csv \
>   --test_files ~/training_audios/audios_entrenamiento/temp/deepspeech/clips/test.csv \
>   --automatic_mixed_precision \
>   --alphabet_config_path ~/train_deepspeech/alphabet.txt \
>   --checkpoint_dir ~/train_deepspeech/deepspeech/checkpoints \
>   --export_dir ~/train_deepspeech/deepspeech/checkpoints/export \
>   --log_level 0 \
>   --epochs 10 \
>   --limit_test 5000
Number of files per dataset:
train.csv: 256522
dev.csv: 28611
test.csv: 21574
Alphabet.txt:
> a
> á
> à
> â
> ä
> b
> c
> d
> e
> é
> è
> ê
> ë
> f
> g
> h
> i
> í
> ì
> î
> ï
> j
> k
> l
> m
> n
> ñ
> o
> ó
> ò
> ô
> ö
> p
> q
> r
> s
> t
> u
> ú
> ù
> û
> ü
> v
> w
> x
> y
> z
> !
> ¡
> ?
> ¿
> ´
> ¨
> '
> 'blank space'
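Since the alphabet has to cover every character that occurs in the transcripts, one quick sanity check is to compare the two. The sketch below is an illustration under stated assumptions, not part of DeepSpeech: `load_alphabet` and `missing_characters` are hypothetical helper names, the CSV files are assumed to have a `transcript` column, and the alphabet file is assumed to hold one character per line, with lines starting with `#` treated as comments and the space stored as a line containing a single space.

```python
import csv
import io


def load_alphabet(lines):
    """Return the set of characters listed one per line."""
    chars = set()
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip empty lines and comment lines (assumed to start with '#').
        if line == "" or line.startswith("#"):
            continue
        chars.add(line)
    return chars


def missing_characters(csv_file, alphabet):
    """Count transcript characters that do not appear in the alphabet."""
    missing = {}
    for row in csv.DictReader(csv_file):
        for ch in row["transcript"]:
            if ch not in alphabet:
                missing[ch] = missing.get(ch, 0) + 1
    return missing


# Tiny in-memory stand-ins for alphabet.txt and train.csv (made-up data).
alphabet = load_alphabet(io.StringIO("a\nn\no\n \n"))
sample_csv = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,no no\n"
    "b.wav,1000,año\n"
)
print(missing_characters(sample_csv, alphabet))  # {'ñ': 1}
```

With real files, the `io.StringIO` objects would be replaced by handles such as `open(path, encoding="utf-8", newline="")`. An empty result means every transcript character is present in the alphabet.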
Environment:
[This is an archived TTS discussion thread from discourse.mozilla.org/t/spanish-blank-inferences]