Pre-trained model - how are they created? #2060
Hello, the docs at https://stt.readthedocs.io/en/latest/AUGMENTATION.html mention various augmentation methods. I was wondering whether the pre-trained models available for download were created using these methods, or whether they were trained on unmodified LibriSpeech and Common Voice data. I get OK results on my audio when the recording is very good, but accuracy quickly degrades on real-life recordings of meetings and phone calls, where the audio has noise (such as air conditioning) and other quality issues (a roomy sound, thin tonality because the speaker is not close enough to the mic, etc.).
Replies: 2 comments
@JRMeyer ping
Hi @etlweather,
there was quite a bit of augmentation used in training the v1.0 English model. The kind of augmentation was a simple variation on SpecAugment [1].
First 50 Epochs [1]
Second 50 Epochs [2]
Final 50 Epochs [3]
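For readers unfamiliar with SpecAugment-style augmentation, here is a minimal sketch of its two masking steps (frequency masking and time masking) on a toy spectrogram. This is not the training code or configuration used for the v1.0 model, and it is not the STT API: the function name `mask_spectrogram`, its parameters, and the default widths are all hypothetical, chosen only for illustration. It assumes a non-empty spectrogram stored as a list of time frames, each a list of frequency-bin values, and it leaves out SpecAugment's time-warping step.

```python
import random


def mask_spectrogram(spec, max_freq_width=2, max_time_width=3, rng=None):
    """Hypothetical helper: return a copy of `spec` with one random
    frequency band and one random run of time frames set to 0.0.

    `spec` is a non-empty list of time frames; each frame is a list of
    frequency-bin values of equal length.
    """
    rng = rng or random.Random()
    n_frames = len(spec)
    n_bins = len(spec[0])
    out = [list(frame) for frame in spec]  # copy so the input is untouched

    # Frequency mask: zero a contiguous band of bins in every frame.
    f_width = rng.randint(0, min(max_freq_width, n_bins))
    f_start = rng.randint(0, n_bins - f_width)
    for frame in out:
        for b in range(f_start, f_start + f_width):
            frame[b] = 0.0

    # Time mask: zero a contiguous run of whole frames.
    t_width = rng.randint(0, min(max_time_width, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [0.0] * n_bins

    return out


# Toy input: 10 frames x 8 bins, all ones, so masked cells are easy to spot.
spec = [[1.0] * 8 for _ in range(10)]
augmented = mask_spectrogram(spec, rng=random.Random(0))
```

Because the masks are drawn fresh on each call, the model sees a differently corrupted version of the same utterance every epoch, which is the general idea behind this family of augmentations.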