First complete successful training #2
wasertech started this conversation in Show and tell
-
You don't have to imagine anymore: here is a model trained on 2.2K hours of good-quality French audio. https://huggingface.co/bofenghuang/asr-wav2vec2-ctc-french
I've already made a draft PR for Listen that uses Wav2Vec 2.0: https://gitlab.com/waser-technologies/technologies/listen/-/merge_requests/2 Here is a demo of this model working pretty well in real time on open-domain transcription.
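If you just want to try the checkpoint outside of Listen, a minimal sketch with the plain transformers pipeline API should be enough (the audio path is a placeholder of mine, not something from the PR):

```python
# Minimal sketch: transcribe a French recording with the linked checkpoint.
# This is not the Listen integration itself, just the standard transformers API.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="bofenghuang/asr-wav2vec2-ctc-french",
)

# "sample.wav" is a placeholder path; any mono French recording works,
# the pipeline decodes and resamples it to the model's 16 kHz input.
print(asr("sample.wav")["text"])
```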
-
I have successfully trained a small model using only 1K hours of audio.
You can find its model card on the hub:
https://huggingface.co/wasertech/wav2vec2-cv-fr-9
You can also test it in its dedicated space:
https://huggingface.co/spaces/wasertech/French_Wav2Vec2_ASR
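If you would rather test it locally than in the Space, something along these lines should work. This is only a sketch: it assumes a standard CTC processor without a language model, uses greedy decoding, and "clip.wav" is a placeholder path.

```python
# Local test of the checkpoint with greedy CTC decoding (no language model).
import torch
import torchaudio
from transformers import AutoModelForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("wasertech/wav2vec2-cv-fr-9")
model = AutoModelForCTC.from_pretrained("wasertech/wav2vec2-cv-fr-9")

# "clip.wav" is a placeholder; resample whatever you load to 16 kHz.
waveform, sample_rate = torchaudio.load("clip.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(
    waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt"
)
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy decoding; plugging in a language model would likely improve accuracy.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```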
The goal was originally to train on all my data (2.5K+ hours), but I noticed that only around 1,175 hours were imported instead. I have fixed my data importer and am now retraining on everything. 🤞
Metrics are promising considering the data used for training.
This is about what I get with 2.5K hours of unaugmented audio using STT.
https://github.com/wasertech/commonvoice-fr/releases/tag/v0.8.0-fr-0.3
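For reference, the word error rate can be reproduced roughly like this. The Common Voice 9 split, column names, and text normalization here are my assumptions, not taken from the release above, so the exact numbers will differ without matching normalization.

```python
# Rough WER sketch with jiwer on a handful of Common Voice clips.
# mozilla-foundation/common_voice_9_0 is gated: accept its terms on the Hub
# and be logged in before load_dataset will work.
import jiwer
from datasets import load_dataset
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="wasertech/wav2vec2-cv-fr-9")
cv = load_dataset("mozilla-foundation/common_voice_9_0", "fr", split="test[:20]")

references, hypotheses = [], []
for sample in cv:
    audio = {
        "raw": sample["audio"]["array"],
        "sampling_rate": sample["audio"]["sampling_rate"],
    }
    # Lowercasing only; the reported metrics likely use fuller normalization
    # (punctuation stripping, etc.), so treat this as a ballpark check.
    references.append(sample["sentence"].lower())
    hypotheses.append(asr(audio)["text"].lower())

print("WER:", jiwer.wer(references, hypotheses))
```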
Even with less data, Wav2Vec 2.0 gets results similar to what STT gives with twice as much data (granted, STT used no augmentation, but augmentation doesn't improve it that much either). Imagine the results on the full 2.5K hours... plus augmentation can also be tried with Wav2Vec 2.0, as in the sketch below. We could also add a dedicated transformer network to remove noise from speech, so the acoustic model always receives clean audio.
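As a taste of what that augmentation could look like, here is a small sketch with audiomentations; the library and the parameter ranges are my own choices, nothing here was used for the run above.

```python
# Waveform-level augmentation sketch for Wav2Vec 2.0 training data.
import numpy as np
from audiomentations import AddGaussianNoise, Compose, PitchShift, TimeStretch

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.9, max_rate=1.1, p=0.5),
    PitchShift(min_semitones=-2, max_semitones=2, p=0.5),
])

# One second of placeholder audio at 16 kHz; in practice this would be a
# Common Voice clip already resampled for the acoustic model.
waveform = np.random.uniform(-0.5, 0.5, 16_000).astype(np.float32)
augmented = augment(samples=waveform, sample_rate=16_000)
```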