First complete successful training #2
wasertech started this conversation in Show and tell
-
You don't have to imagine anymore: here is a model trained on 2.2K hours of good-quality French audio. https://huggingface.co/bofenghuang/asr-wav2vec2-ctc-french
I've already made a draft PR for Listen that uses Wav2Vec 2.0: https://gitlab.com/waser-technologies/technologies/listen/-/merge_requests/2 Here is a demo of this model working pretty well in real time on open-domain transcription.
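If you just want to try the checkpoint outside of Listen, a minimal sketch with the plain transformers pipeline API should be enough (the audio path is a placeholder of mine, not something from the PR):

```python
# Minimal sketch: transcribe a French recording with the linked checkpoint.
# This is not the Listen integration itself, just the standard transformers API.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="bofenghuang/asr-wav2vec2-ctc-french",
)

# "sample.wav" is a placeholder path; any mono French recording works,
# the pipeline decodes and resamples it to the model's 16 kHz input.
print(asr("sample.wav")["text"])
```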
-
I have successfully trained a small model using only 1K hours of audio.
You can find its model card on the hub:
https://huggingface.co/wasertech/wav2vec2-cv-fr-9
You can also test it in its dedicated space:
https://huggingface.co/spaces/wasertech/French_Wav2Vec2_ASR
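If you would rather test it locally than in the Space, something along these lines should work. This is only a sketch: it assumes a standard CTC processor without a language model, uses greedy decoding, and "clip.wav" is a placeholder path.

```python
# Local test of the checkpoint with greedy CTC decoding (no language model).
import torch
import torchaudio
from transformers import AutoModelForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("wasertech/wav2vec2-cv-fr-9")
model = AutoModelForCTC.from_pretrained("wasertech/wav2vec2-cv-fr-9")

# "clip.wav" is a placeholder; resample whatever you load to 16 kHz.
waveform, sample_rate = torchaudio.load("clip.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(
    waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt"
)
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy decoding; plugging in a language model would likely improve accuracy.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```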
The goal was originally to train on all my data (2.5K+ hours), but I noticed that only around 1,175 hours were imported instead. I have fixed my data importer and am now retraining on everything. 🤞
Metrics are promising considering the data used for training.
This is about what I get with 2.5K hours of unaugmented audio using STT.
https://github.com/wasertech/commonvoice-fr/releases/tag/v0.8.0-fr-0.3
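For reference, the word error rate can be reproduced roughly like this. The Common Voice 9 split, column names, and text normalization here are my assumptions, not taken from the release above, so the exact numbers will differ without matching normalization.

```python
# Rough WER sketch with jiwer on a handful of Common Voice clips.
# mozilla-foundation/common_voice_9_0 is gated: accept its terms on the Hub
# and be logged in before load_dataset will work.
import jiwer
from datasets import load_dataset
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="wasertech/wav2vec2-cv-fr-9")
cv = load_dataset("mozilla-foundation/common_voice_9_0", "fr", split="test[:20]")

references, hypotheses = [], []
for sample in cv:
    audio = {
        "raw": sample["audio"]["array"],
        "sampling_rate": sample["audio"]["sampling_rate"],
    }
    # Lowercasing only; the reported metrics likely use fuller normalization
    # (punctuation stripping, etc.), so treat this as a ballpark check.
    references.append(sample["sentence"].lower())
    hypotheses.append(asr(audio)["text"].lower())

print("WER:", jiwer.wer(references, hypotheses))
```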
Even with less data, Wav2Vec 2.0 gets results similar to what STT gives with twice as much data (granted, STT used no augmentation, but augmentation doesn't improve it that much either). Imagine the results on the full 2.5K hours... plus augmentation can also be tried with Wav2Vec 2.0, as in the sketch below. We could also add a dedicated transformer network to remove noise from speech, so the acoustic model always receives clean audio.
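As a taste of what that augmentation could look like, here is a small sketch with audiomentations; the library and the parameter ranges are my own choices, nothing here was used for the run above.

```python
# Waveform-level augmentation sketch for Wav2Vec 2.0 training data.
import numpy as np
from audiomentations import AddGaussianNoise, Compose, PitchShift, TimeStretch

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.9, max_rate=1.1, p=0.5),
    PitchShift(min_semitones=-2, max_semitones=2, p=0.5),
])

# One second of placeholder audio at 16 kHz; in practice this would be a
# Common Voice clip already resampled for the acoustic model.
waveform = np.random.uniform(-0.5, 0.5, 16_000).astype(np.float32)
augmented = augment(samples=waveform, sample_rate=16_000)
```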