Switch to Coqui STT 1.4.0 #163

wasertech · 2022-05-20T22:12:13Z

This branch implements everything needed to train French STT models on Common Voice 9.0 with Coqui STT 1.4.0.

Notes

Check out the released models from this branch: STT French v0.9.

I've added the import_cv_perso.sh importer script to download personal CV data and ease the process of fine-tuning from checkpoints. See this commit and this article on Discourse.

I've also added a custom Python script for lm_optimizer that captures the results of the optimization and saves them to disk so they can be reused during the testing and exporting steps.
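As a rough illustration of the idea (not the actual script from this branch), persisting the optimizer's best language-model weights and reading them back later could look like the sketch below. The file layout, the function names `save_lm_params` and `load_lm_params`, and the fallback defaults are all assumptions made for this example; only the `lm_alpha` and `lm_beta` names follow STT's terminology.

```python
import json
from pathlib import Path


def save_lm_params(path, lm_alpha, lm_beta):
    """Write the best lm_alpha/lm_beta found by the optimizer to a JSON file.

    Hypothetical helper: the JSON layout is an assumption for this sketch.
    """
    payload = {"lm_alpha": lm_alpha, "lm_beta": lm_beta}
    Path(path).write_text(json.dumps(payload))


def load_lm_params(path, default_alpha, default_beta):
    """Read saved values back, falling back to defaults if no file exists."""
    p = Path(path)
    if not p.is_file():
        # Nothing was saved (e.g. the optimization step was skipped).
        return default_alpha, default_beta
    data = json.loads(p.read_text())
    return float(data["lm_alpha"]), float(data["lm_beta"])
```

With something along these lines, the testing and exporting steps can call `load_lm_params` and reuse whatever the optimization step stored, without re-running the optimization.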

train.sh has been split into train.sh, test.sh and export.sh. See this commit.

wasertech · 2022-05-23T22:18:07Z

Managed to make this branch export a model from scratch. See the full logs here.

wasertech · 2022-09-04T21:34:37Z

STT 1.4.0 was released as stable! I've updated stt_branch accordingly. This branch, stt140-cv9, is now complete.

Full build logs and checks.

Version 10 of CV is out so I'll probably make another branch for it (I'll probably wait for more affordable energy to train cv-fr-10 though).

wasertech · 2023-03-05T05:39:00Z

This branch made the mistake of deleting commonvoice-fr/DeepSpeech/ to create commonvoice-fr/STT/.
It is now obsolete thanks to #168.

wasertech added 17 commits December 1, 2021 18:09

Updated cv release to 7.0

7851873

skeleton fix

fcfc3cc

this will fail in so many ways

f794d07

move DeepSpeech to STT

3e47911

update to stt 1.3.0

ddcd523

hashed cv-fr9

fc70841

fix hell itself

6f7cf06

ds -> stt

48b0500

don't restrict unidecode (or even install it?)

fdbadc6

fix ken lm path

3430bc0

add excluded sentences to lm

01324b3

Upgrade to STT alpha 1.4.0-1

7737b17

make sure ffmpeg is there

e3d906d

update cv9 sha

41ce236

independence for batches of sizes

8f21100

fix cv9 sha256

374a51b

This will pass batch memory test

eaf2ad7

This was referenced May 20, 2022

Skeleton 🐸 1.3.0 #161

Closed

Using CV9 to show that producing 22 hours of audio is not enough to train new models. #162

Closed

wasertech added 11 commits May 21, 2022 01:12

update docs

9086e43

use previous default batch size for all

7f4772e

docker file opt.

61ad902

phrasing

c4c5f1f

added skip_batch_test

b867dd1

phrasing II

53d0c42

lint fr/validate_labels

a9e738c

add personal cv data for fine-tuning

648e638

lint importers

44572a4

phrasing III

67578ee

fix eval lm and personal cv

57c6816

wasertech added 2 commits May 24, 2022 19:16

fix if statements

3fb4dcd

remove useless env var

ba9bca1

wasertech mentioned this pull request May 25, 2022

Use 🐸 STT #159

Closed

wasertech changed the title ~~This branch passes the batch memory test~~ Switch to Coqui STT 1.4.0 May 25, 2022

wasertech added 2 commits May 29, 2022 00:50

Add basic data augmentation

483efe4

Update stt_branch

f10e072

wasertech marked this pull request as ready for review May 30, 2022 18:18

This comment was marked as outdated.

wasertech and others added 13 commits May 30, 2022 21:05

remove useless deps

2528184

Update README.md

0409327

Update README.md

c349820

Update README.md

2f772a3

clean train, test and export

42ce1c9

Merge branch 'stt140-cv9' of github.com:wasertech/commonvoice-fr into stt140-cv9

8c23b78

rebuild scorer with best default values

682a125

set default epochs to converge

65aef10

Build training module from checkout commit to avoid inconsistencies with version

74a0c6a

Update docs

b1c5249

fix lm values load

13ffd1b

chill those epochs

7bfaad0

Updated stt to stable release 1.4.0

f83bed3

wasertech added 3 commits September 5, 2022 00:59

rm unused torch lib

51d7d57

cleaned useless comments

2751130

fix crash missing scorer when deleted manually

1ff10d8

wasertech mentioned this pull request Sep 6, 2022

Use MLS dataset #150

Closed

Update README.md

a973df1

Added link to French tutorial for fine-tuning

wasertech closed this Mar 5, 2023
