This repository has been archived by the owner on Mar 8, 2023. It is now read-only.
2020.08.07
Changelog
After 5 months, we are releasing a new model with a lot of improvements!
- The text corpus is no longer the Wikipedia dump but Mitads (see https://github.com/MozillaItalia/DeepSpeech-Italian-Model/releases/tag/Mitads-1.0.0-alpha2)
- DeepSpeech 0.8 based
- New docker scripts
- Sets of `.env` files to try different parameters: https://github.com/MozillaItalia/DeepSpeech-Italian-Model/tree/master/DeepSpeech/env_files
- In the meantime, we also released the notebooks!
- We are using the official Mozilla DS model for English as the base for transfer learning, to improve quality
Second version of the Italian model, trained with:
- ~130 hours of the Common Voice IT dataset
- ~127 hours of the m-ailabs Italian dataset
total: ~257h
Available in 2 versions: the transfer version uses transfer learning from the official English model released by Mozilla, and the other one is trained from scratch.
model hyper-parameters:
- batch_size=64
- n_hidden=2048
- epochs=30
- learning_rate=0.0001
- dropout=0.4
- lm_alpha=0
- lm_beta=0
- es_epochs=10
- early_stop=1
- amp=1
For transfer learning model:
- amp=0
- drop_source_layer=1
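The two parameter sets above can be sketched as a base configuration plus a small set of overrides. This is only an illustration: it assumes the transfer learning model reuses every base value except the ones listed for it, and the `KEY=VALUE` rendering is a hypothetical layout that may not match the actual `.env` files in the repository. The helper names `build_params` and `to_env_lines` are invented for this sketch.

```python
# Base hyperparameters, with names and values copied from this release.
BASE_PARAMS = {
    "batch_size": 64,
    "n_hidden": 2048,
    "epochs": 30,
    "learning_rate": 0.0001,
    "dropout": 0.4,
    "lm_alpha": 0,
    "lm_beta": 0,
    "es_epochs": 10,
    "early_stop": 1,
    "amp": 1,
}

# Values listed for the transfer learning model.
# Assumption: everything not listed here is inherited from BASE_PARAMS.
TRANSFER_OVERRIDES = {
    "amp": 0,
    "drop_source_layer": 1,
}


def build_params(transfer: bool) -> dict:
    """Return the parameter set for the from-scratch or the transfer model."""
    params = dict(BASE_PARAMS)  # copy, so the base set is never mutated
    if transfer:
        params.update(TRANSFER_OVERRIDES)
    return params


def to_env_lines(params: dict) -> list:
    """Render parameters as KEY=VALUE lines (hypothetical file layout)."""
    return [f"{key}={value}" for key, value in params.items()]


if __name__ == "__main__":
    print("\n".join(to_env_lines(build_params(transfer=True))))
```

Keeping the overrides in their own dictionary makes the difference between the two models explicit: only `amp` changes and `drop_source_layer` is added.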
Check the README for usage instructions.
Thanks
This release wouldn't have been possible without the huge work of @nefastosaturo on the Docker and DS side, in addition to generating the new model.
I (@Mte90) worked on the project management side of the model, and together with @astrastefania and the server offered by the University of Turin, we were able to do everything.
License
CC0 (public domain).