French Model 0.5.2

Pre-release
@lissyx released this 26 Aug 14:34
· 21 commits to master since this release
99d2b70

Datasets:

  • Lingua Libre (~40h)
  • Common Voice FR (v2) (~490h, allowing up to 32 duplicates)
  • Training Speech (~180h)
  • African Accented French (~15h)
  • M-AILABS French (~315h)

Total : ~1040h
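
As a quick sanity check, the approximate per-dataset durations listed above can be summed to confirm the stated total. This is only a minimal sketch; the dictionary name is illustrative and the hour values are the rounded figures from the list.

```python
# Approximate hours per dataset, copied from the list above.
dataset_hours = {
    "Lingua Libre": 40,
    "Common Voice FR (v2)": 490,
    "Training Speech": 180,
    "African Accented French": 15,
    "M-AILABS French": 315,
}

# Sum of the rounded figures; the real durations are only approximate.
total_hours = sum(dataset_hours.values())
print(f"Total: ~{total_hours}h")
```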

Parameters:

  • EPOCHS=30
  • LEARNING_RATE=0.0001
  • DROPOUT=0.3
  • BATCH_SIZE=64
  • LM_ALPHA=0.7203202402564637
  • LM_BETA=1.5747698919871918

Language model: Wikipedia dump + dump of French National Assembly debates.

Works with DeepSpeech v0.7, v0.8, v0.9.

Fixed the packaging of kenlm.scorer
Fixed the default alpha/beta values in kenlm.scorer
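
With the corrected defaults, the scorer should already carry the intended alpha/beta values. If you want to set them explicitly anyway, a minimal sketch could look like the following. It assumes a model object that exposes `enableExternalScorer()` and `setScorerAlphaBeta()`, as `deepspeech.Model` does in the Python bindings for these versions. The helper name `configure_scorer` and the model file name in the usage comment are hypothetical.

```python
# LM weights from the parameter list above.
LM_ALPHA = 0.7203202402564637
LM_BETA = 1.5747698919871918


def configure_scorer(model, scorer_path="kenlm.scorer",
                     alpha=LM_ALPHA, beta=LM_BETA):
    """Attach an external scorer to a model and set its LM weights.

    `model` is assumed to expose enableExternalScorer() and
    setScorerAlphaBeta(), like deepspeech.Model in the Python bindings.
    """
    model.enableExternalScorer(scorer_path)
    model.setScorerAlphaBeta(alpha, beta)
    return model


# Hypothetical usage (needs the deepspeech package and the released files;
# "output_graph.pbmm" is an assumed file name, not taken from this release):
#   from deepspeech import Model
#   model = configure_scorer(Model("output_graph.pbmm"))
```

Keeping the helper duck-typed means it can be exercised with a stub object, without loading an actual model.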

Test set results:

Test on /mnt/extracted/data/African_Accented_French/African_Accented_French/African_Accented_French_test.csv - WER: 0.442362, CER: 0.235577, loss: 42.941334
Test on /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_test.csv - WER: 0.092794, CER: 0.026505, loss: 11.276774
Test on /mnt/extracted/data/trainingspeech/ts_2019-04-11_fr_FR_test.csv - WER: 0.200373, CER: 0.059958, loss: 16.225618
Test on /mnt/extracted/data/cv-fr/clips/test.csv - WER: 0.300508, CER: 0.147202, loss: 39.204407
Test on /mnt/extracted/data/lingualibre/lingua_libre_Q21-fra-French_test.csv - WER: 0.577170, CER: 0.171211, loss: 6.977585
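
To compare these figures across datasets, the result lines can be parsed into numeric records. The sketch below assumes every line follows the exact `Test on <path> - WER: ..., CER: ..., loss: ...` shape shown above. The names `LINE_RE` and `parse_results` are illustrative, and the sample holds two lines copied from the results.

```python
import re

# Pattern for lines of the form:
#   Test on <path> - WER: <float>, CER: <float>, loss: <float>
LINE_RE = re.compile(
    r"^Test on (?P<path>\S+) - WER: (?P<wer>[\d.]+), "
    r"CER: (?P<cer>[\d.]+), loss: (?P<loss>[\d.]+)$"
)


def parse_results(lines):
    """Return (csv_path, wer, cer, loss) tuples, skipping non-matching lines."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((
                match["path"],
                float(match["wer"]),
                float(match["cer"]),
                float(match["loss"]),
            ))
    return rows


# Two lines copied from the results above.
sample = [
    "Test on /mnt/extracted/data/M-AILABS/fr_FR/fr_FR_test.csv - WER: 0.092794, CER: 0.026505, loss: 11.276774",
    "Test on /mnt/extracted/data/cv-fr/clips/test.csv - WER: 0.300508, CER: 0.147202, loss: 39.204407",
]

# Print WER and CER as percentages, keyed by the CSV file name.
for path, wer, cer, loss in parse_results(sample):
    print(f"{path.rsplit('/', 1)[-1]:<20} WER {wer:.1%}  CER {cer:.1%}")
```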