This repository has been archived by the owner on Nov 28, 2022. It is now read-only.

Submission for WEDO Team #9

Open
wants to merge 102 commits into
base: main
Changes from 1 commit
Commits
102 commits
2edf5c8
Add pretrained model
BeamNC Oct 12, 2022
89c27b5
add inference code
BeamNC Oct 12, 2022
97eef82
Update script
BeamNC Oct 12, 2022
669fd1d
first_commit
nattanaa Oct 12, 2022
4d5cd37
commit2
nattanaa Oct 12, 2022
8e6bf57
commit3
nattanaa Oct 12, 2022
917eddc
Add setup.sh for install
BeamNC Oct 12, 2022
cfa2d43
Delete submit/Gender_Category/gender_classification/pretrained_model …
beam11221 Oct 12, 2022
2f68b8b
Delete hparams_inference.yaml
beam11221 Oct 12, 2022
c772afc
Update setup.sh
BeamNC Oct 14, 2022
129e3d9
Add readme
BeamNC Oct 14, 2022
44e555f
From local. Fixed conflict
BeamNC Oct 14, 2022
6128900
Update README.md
beam11221 Oct 14, 2022
75a3089
Merge pull request #1 from KongpolC/gender_clf
KongpolC Oct 14, 2022
d12ba78
commit3
nattanaa Oct 14, 2022
5239ad1
commit4
nattanaa Oct 14, 2022
f8706aa
setup update
nattanaa Oct 14, 2022
75ba2a1
update setup.sh to download validated.tsv for cv11
BeamNC Oct 14, 2022
fe2da24
Add cv11 gender inference. Update setup.sh
BeamNC Oct 14, 2022
13b49e7
Update README.md
beam11221 Oct 14, 2022
8fceeb3
Add Thai-ser download script, add data preprocessing notebook; mp3 ->…
BeamNC Oct 14, 2022
281de7e
fixed local conflict
BeamNC Oct 14, 2022
82d6689
Merge pull request #2 from KongpolC/gender_clf_2
beam11221 Oct 14, 2022
67750a8
2 update
nattanaa Oct 14, 2022
73725db
Merge branch 'main' of https://github.com/KongpolC/our-voices-model-c…
nattanaa Oct 14, 2022
799d40c
Update readme
nattanaa Oct 14, 2022
31e0126
Update all
nattanaa Oct 16, 2022
a934cd3
Update readme
nattanaa Oct 16, 2022
7e9b9ee
Add training script
BeamNC Oct 17, 2022
7794fe5
Merge pull request #3 from KongpolC/gender_clf_3
beam11221 Oct 17, 2022
e1e985c
first commit
KongpolC Oct 17, 2022
b0552d6
Merge branch 'main' of github.com:KongpolC/our-voices-model-competition
KongpolC Oct 17, 2022
0175562
Change download directory to ./models
BeamNC Oct 17, 2022
5b6f4f4
Fix setup.sh
BeamNC Oct 17, 2022
eb1132e
Merge pull request #4 from KongpolC/edit_pretrain_path
beam11221 Oct 17, 2022
0eed352
Update dataset internal path
BeamNC Oct 17, 2022
ee12e67
change paths to data
KongpolC Oct 17, 2022
4a9fd68
Merge branch 'main' of github.com:KongpolC/our-voices-model-competition
KongpolC Oct 17, 2022
b475c29
Add requirements
BeamNC Oct 17, 2022
d7cbf71
Update main.ipynb
nattanaa Oct 17, 2022
346b645
Update main.ipynb
nattanaa Oct 17, 2022
83f9276
Add gitignore, add audio preprocessing script
BeamNC Oct 17, 2022
18171af
Update training config
BeamNC Oct 17, 2022
af94756
Add load_dataset.sh
BeamNC Oct 17, 2022
2f12875
Update load_dataset.sh; Add copy tsv file from cv11 to commonvoice11/…
BeamNC Oct 17, 2022
5058376
change paths and add more explanation
KongpolC Oct 17, 2022
03f550b
Update readme
BeamNC Oct 17, 2022
0865f00
Update comment in model_inference notebook. UPdate readme
BeamNC Oct 17, 2022
5606b7a
update_all
nattanaa Oct 17, 2022
7aa9265
Merge branch 'main' of https://github.com/KongpolC/our-voices-model-c…
nattanaa Oct 17, 2022
5f97e9f
modify analysis 5.3
KongpolC Oct 18, 2022
a165796
Merge branch 'main' of github.com:KongpolC/our-voices-model-competition
KongpolC Oct 18, 2022
b5b2002
modify analysis 5.3
KongpolC Oct 18, 2022
fdca75c
Merge remote-tracking branch 'origin/migrate'
KongpolC Oct 18, 2022
e9c0959
rearange data
KongpolC Oct 18, 2022
15a2e4f
sample clips
KongpolC Oct 18, 2022
6744a25
ignores .wav files except one
KongpolC Oct 18, 2022
6e22c0d
remove README
KongpolC Oct 18, 2022
c1b3a8a
rename
KongpolC Oct 18, 2022
4d8b3f7
Move data file to scripts
BeamNC Oct 19, 2022
ff8f973
Fix path to compat with new directory
BeamNC Oct 19, 2022
d16231c
Merge pull request #5 from KongpolC/migrate_script
beam11221 Oct 19, 2022
ad13bac
Add training script
BeamNC Oct 19, 2022
36ade8e
Update training config
BeamNC Oct 19, 2022
c7cf4f2
update path
nattanaa Oct 19, 2022
4377c4b
add_floder_data_prep
nattanaa Oct 19, 2022
e2296e0
Update readme
BeamNC Oct 19, 2022
1294cfb
Merge branch 'main' of https://github.com/KongpolC/our-voices-model-c…
BeamNC Oct 19, 2022
7f93408
update sh
nattanaa Oct 19, 2022
aa6cac3
Merge branch 'main' of https://github.com/KongpolC/our-voices-model-c…
nattanaa Oct 19, 2022
d0df94a
update setup to train
nattanaa Oct 19, 2022
7583333
Update README.md
nattanaa Oct 19, 2022
b277dbe
Split load_dataset.sh into 2 files for commonvoice11 and Thai-SER
BeamNC Oct 19, 2022
32d49b0
Update load_commonvoice11.sh
beam11221 Oct 19, 2022
d74af66
Update commonvoice11 loading script
BeamNC Oct 19, 2022
9cd17c1
Merge pull request #7 from KongpolC/split_load_dataset
beam11221 Oct 19, 2022
e7ec1a3
Update dataset path
BeamNC Oct 19, 2022
8d61e07
Update setup.sh & requirement
BeamNC Oct 19, 2022
12e1846
Update parameter for inference
BeamNC Oct 19, 2022
b572dc8
Add Commonvoice11 annotation genereator & annotation
BeamNC Oct 19, 2022
44b8110
Add Thai-SER annotation and annotation generate scripts
BeamNC Oct 19, 2022
2815225
Update ds_path
BeamNC Oct 19, 2022
3fe660e
Add manifest generator for gender classification training
BeamNC Oct 19, 2022
fad601a
Update readme.md
BeamNC Oct 19, 2022
01bc75d
Merge pull request #8 from KongpolC/add_create_anno
beam11221 Oct 19, 2022
82b77b3
Update README.md
nattanaa Oct 20, 2022
0f01361
Update README.md
nutchascg Oct 20, 2022
da49f73
Update README.md
nutchascg Oct 20, 2022
8793047
Update README.md
nutchascg Oct 20, 2022
eabc1e6
Update README.md
nutchascg Oct 20, 2022
c581687
Update README.md
nutchascg Oct 20, 2022
a42a529
Update README.md
nutchascg Oct 20, 2022
24c09b9
Update README.md
nutchascg Oct 20, 2022
f06ea43
Add README
BeamNC Oct 20, 2022
0e32565
Update README
BeamNC Oct 20, 2022
0187234
Update README.md
nutchascg Oct 20, 2022
21a19a5
Remove files
BeamNC Oct 20, 2022
eeb86c7
Update README.md
nutchascg Oct 20, 2022
db4345c
Update README.md
nutchascg Oct 20, 2022
23dc3ad
Update main notebook
BeamNC Oct 20, 2022
5a61c44
Merge branch 'doc_string' of https://github.com/KongpolC/our-voices-m…
BeamNC Oct 20, 2022
ffadf04
Merge pull request #9 from KongpolC/doc_string
beam11221 Oct 20, 2022
Update readme
nattanaa committed Oct 16, 2022
commit a934cd3bb7003db5068330cfc7c09edb973dbebe
32 changes: 22 additions & 10 deletions submit/Gender_Category/STT/README.md
@@ -2,10 +2,6 @@

### Setup


```
bash ./setup.sh
```
```
pip install -r requirements.txt
```
@@ -17,23 +13,34 @@ Then, download followings or download sh file
- <a href="https://drive.google.com/file/d/1TX-Fp9CWz7U2AicAjhy3gmDoM7XHqSty/view?usp=sharing">Language Model</a>
- <a href="https://drive.google.com/drive/folders/1LAkmsgQ1KrxuFO54UOTnrA7NWcOGAshX?usp=sharing">WavAugment</a>

```
bash ./setup.sh
```
This will automatically download the essential files for model training.




### Model training
- Model Initialization
Our base model is the Data2VecAudio model with a language modeling head on top for Connectionist Temporal Classification (CTC). Data2VecAudio was proposed in "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. For more information, visit https://huggingface.co/docs/transformers/model_doc/data2vec


```py
# pretrained model (data2vec)
# pretrained model
BASE_MODEL = "./data2vec-thai-pretrained"
# load data
mixed_train = load_dataset("./cv11.py", "th", split="train+validation")
mixed_test = load_dataset("./cv11.py", "th", split="test")
# processor
processor = Wav2Vec2Processor.from_pretrained("./processor")
# import augment
# import WavAugment
import sys
sys.path.append("./WavAugment")
# clips path
abs_path_to_clips = "./Methods_and_Measures/commonvoice11/data/clips_wav"
```
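
The CTC head mentioned above emits one label per audio frame, so the raw frame sequence contains repeated labels and a special blank label. The following is a minimal sketch of greedy CTC decoding (merge consecutive repeats, then drop blanks), not the code used in this repository. The helper name `ctc_greedy_collapse`, the toy vocabulary, and `blank_id=0` are assumptions made for illustration only.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Collapse per-frame label ids: merge consecutive repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive ids
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank (assumption).
vocab: Dict[int, str] = {0: "<blank>", 1: "c", 2: "a", 3: "t"}

# Per-frame argmax ids, as a CTC model might produce for a short clip.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]

decoded = "".join(vocab[i] for i in ctc_greedy_collapse(frames))
print(decoded)
```

A blank between two identical labels keeps them as separate symbols, which is how CTC represents doubled characters: `ctc_greedy_collapse([1, 1, 0, 1])` keeps two copies of label `1`.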

Model :
Our trained models can be downloaded below:

- trained with the 1st dataset (original gender ratio)
<a href="https://drive.google.com/drive/folders/1YPmUk3ZsfMxqq2nFwUV3fWL3uKFxz13q?usp=sharing">load model</a>
@@ -44,15 +51,15 @@ Model :
- trained with the 3rd dataset (balanced ratio of female to male speakers, speaking the same sentences)
<a href="https://drive.google.com/drive/folders/10DZLSO6ftUzZlvfme2FMbUIpH2ZZoYvS?usp=sharing">load model</a>

Model after we upsampling training set:
Models after upsampling the training set:

- trained with the added 2nd dataset (balanced ratio of female to male speakers)
<a href="https://drive.google.com/drive/folders/1nsyl3VLo76DIRNg0Zrrrvy_o4QYlUtXJ?usp=sharing">load model</a>

- trained with the added 3rd dataset (balanced ratio of female to male speakers, speaking the same sentences)
<a href="https://drive.google.com/drive/folders/1lBu9JD-_cQOBjsN747ElV-kAsAhR6rD6?usp=sharing">load model</a>

### Evaluate
### Evaluation
#
```py
# processor
@@ -72,4 +79,9 @@ audio_paths = [
]

```
- The output of `data2vec_evaluate.py` is a .csv file with WER and CER scores per record, which you can group by gender to see the final results.
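
Grouping the per-record scores by gender could look like the sketch below. It uses only the Python standard library. The column names `gender`, `wer`, and `cer`, the helper `mean_scores_by_gender`, and the toy rows are assumptions for illustration; they are placeholders, not actual results, and the real .csv may use different headers.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_scores_by_gender(csv_file, group_col="gender", score_cols=("wer", "cer")):
    """Average each score column within every group found in group_col."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(csv_file):
        for col in score_cols:
            grouped[row[group_col]][col].append(float(row[col]))
    return {
        group: {col: mean(values) for col, values in scores.items()}
        for group, scores in grouped.items()
    }


# Made-up placeholder rows standing in for the evaluation output (assumed schema).
toy_csv = io.StringIO(
    "path,gender,wer,cer\n"
    "a.wav,female,0.25,0.125\n"
    "b.wav,female,0.75,0.375\n"
    "c.wav,male,0.5,0.25\n"
)

summary = mean_scores_by_gender(toy_csv)
for gender, scores in summary.items():
    print(gender, scores)
```

To run it on a real file, pass an open file handle instead of `toy_csv`, for example `open("results.csv", newline="", encoding="utf-8")`, where the file name is hypothetical. Note that the mean of per-record WER is not the same as a corpus-level WER, which weights each record by its reference length.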