This is the v0.3.0 pre-release of speech recognition models prepared for CMU Sphinx-5prealpha. Currently, the only available language is Catalan.
This release includes acoustic models trained on 240 hours of training data (plus 4 hours of test data) from TV3, a 3-gram language model built from a corpus of TV3 and OpenSubtitles subtitles, and a phonetic dictionary.
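To illustrate what a 3-gram language model estimates, here is a minimal sketch of unsmoothed maximum-likelihood trigram probabilities, P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2). It is not how the released model was built: real models (typically stored in ARPA format) add smoothing and backoff, and the toolkit and settings used for this release are not stated here. The sentence-boundary padding with `<s>`/`</s>` and the sample sentences are assumptions for illustration only.

```python
from collections import Counter


def trigram_counts(sentences):
    """Count trigrams and their two-word histories.

    Each sentence is lowercased, split on whitespace and padded with two
    <s> markers at the start and one </s> at the end (an illustrative
    convention, not necessarily the one used for this release).
    """
    trigrams = Counter()
    histories = Counter()
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(2, len(words)):
            history = (words[i - 2], words[i - 1])
            trigrams[history + (words[i],)] += 1
            histories[history] += 1
    return trigrams, histories


def trigram_prob(trigrams, histories, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1 w2)."""
    seen = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / seen if seen else 0.0


# Hypothetical two-line "subtitle corpus".
corpus = ["bon dia a tothom", "bon dia"]
tri, hist = trigram_counts(corpus)
print(trigram_prob(tri, hist, "bon", "dia", "a"))  # 0.5
```

Without smoothing, any trigram absent from the corpus gets probability zero, which is why production language models reserve probability mass for unseen word sequences.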
The models are in the directory ca-es and the acoustic data is accessible here.
The acoustic models were trained on 16 kHz mono audio for continuous speech recognition, using 6000 tied states and 32 Gaussians per mixture.
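Because the acoustic models were trained on 16 kHz mono audio, input recordings should match that format. The sketch below uses Python's standard `wave` module to flag WAV files that do not. The 16-bit PCM sample width is an assumption (it is the usual Sphinx input, but these notes do not state it), and `check_wav` is a hypothetical helper, not part of Sphinx.

```python
import wave

EXPECTED_RATE = 16000      # from the release notes: 16 kHz training audio
EXPECTED_CHANNELS = 1      # from the release notes: mono
EXPECTED_SAMPLE_WIDTH = 2  # assumption: 16-bit PCM samples (2 bytes)


def check_wav(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems
```

Files that fail the check would need resampling or downmixing (for example with an audio tool such as SoX) before decoding; feeding 44.1 kHz stereo audio to 16 kHz models usually degrades accuracy badly.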
The tests were run on 4 hours of FESTCAT data using sphinxtrain's decode process, and the final WER (word error rate) on this clean audio set is 11.68%. However, this score was reached with an in-domain language model. The models are currently not ready to be used for generic transcription applications.
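For readers unfamiliar with the metric, WER counts the substitutions (S), deletions (D) and insertions (I) needed to turn the reference transcript into the recognizer's output, divided by the number of reference words N: WER = (S + D + I) / N. A WER of 11.68% therefore means roughly 11.68 word errors per 100 reference words. The sketch below computes it for one utterance with a word-level Levenshtein distance; it is an illustration of the standard definition, not sphinxtrain's own scoring code, and the Catalan sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Compute WER = (S + D + I) / N with a word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("el gat és negre", "el gat negre"))  # 0.25 (one deletion)
```

Over a whole test set, WER is normally aggregated by summing errors and reference words across all utterances rather than averaging per-utterance rates, so long utterances weigh more. Because insertions count as errors, WER can exceed 100%.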
For general issues with Sphinx, please consult the official CMU Sphinx webpage and forum. Issues concerning the models can be reported here, or you can contact us by email at [email protected]
A major part of this work has been financed by Softcatalà.