Changelog

After five months, we are releasing a new model with a lot of improvements!

The text corpus is no longer the Wikipedia dump but Mitads (see https://github.com/MozillaItalia/DeepSpeech-Italian-Model/releases/tag/Mitads-1.0.0-alpha2)
Based on DeepSpeech 0.8
New docker scripts
- Sets of .env files to try different parameters https://github.com/MozillaItalia/DeepSpeech-Italian-Model/tree/master/DeepSpeech/env_files
In the meantime, we also released the notebooks!
We now use the official Mozilla DeepSpeech (DS) English model for transfer learning to improve quality

Second version of the Italian model, trained with:

~130 hours of the Common Voice IT dataset
~127 hours of m-ailabs Italian dataset
total: ~257h

Available in two versions: one uses transfer learning from the official English model released by Mozilla, and the other is trained from scratch.

model hyper-parameters:

batch_size=64
n_hidden=2048
epochs=30
learning_rate=0.0001
dropout=0.4
lm_alpha=0
lm_beta=0
es_epochs=10
early_stop=1
amp=1

For the transfer learning model:

amp=0
drop_source_layer=1
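
The two parameter sets above can be sketched in Python as a base configuration plus a small set of overrides. This is only an illustration: it assumes the transfer learning model reuses every base value except the two listed here, and the function names (build_params, to_env_lines) and the key=value rendering are hypothetical, not the actual layout of the project's .env files or training scripts.

```python
# Base hyper-parameters, copied from the list in these release notes.
BASE_PARAMS = {
    "batch_size": 64,
    "n_hidden": 2048,
    "epochs": 30,
    "learning_rate": 0.0001,
    "dropout": 0.4,
    "lm_alpha": 0,
    "lm_beta": 0,
    "es_epochs": 10,
    "early_stop": 1,
    "amp": 1,
}

# Values listed for the transfer learning model.
# Assumption: everything not listed here stays at its base value.
TRANSFER_OVERRIDES = {
    "amp": 0,
    "drop_source_layer": 1,
}


def build_params(transfer_learning: bool) -> dict:
    """Return the parameter set for one of the two released variants."""
    params = dict(BASE_PARAMS)  # copy so the base set is never mutated
    if transfer_learning:
        params.update(TRANSFER_OVERRIDES)
    return params


def to_env_lines(params: dict) -> list:
    """Render parameters as key=value lines (hypothetical layout)."""
    return [f"{key}={value}" for key, value in params.items()]


if __name__ == "__main__":
    for name, flag in (("from scratch", False), ("transfer learning", True)):
        print(f"# {name}")
        print("\n".join(to_env_lines(build_params(flag))))
```

Keeping the overrides in a separate dictionary makes the difference between the two variants explicit and avoids duplicating the shared values.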

Check the README for usage instructions.

Thanks

This release would not have been possible without the huge work of @nefastosaturo on the Docker and DeepSpeech side, in addition to generating the new model.

I (@Mte90) worked on the project management side of the model and, with @astrastefania and the help of the server offered by the University of Turin, we were able to do everything.

License

CC0 (public domain).

2020.08.07
