Update readme

common-voice · beam11221 · Oct 12, 2022
commit a934cd3bb7003db5068330cfc7c09edb973dbebe
diff --git a/submit/Gender_Category/STT/README.md b/submit/Gender_Category/STT/README.md
@@ -2,10 +2,6 @@
 
 ### Setup
 
-
-```
-bash ./setup.sh
-```
 ```
 pip install -r requirements.txt
 ```
@@ -17,23 +13,34 @@ Then, download followings or download sh file
 - <a href="https://drive.google.com/file/d/1TX-Fp9CWz7U2AicAjhy3gmDoM7XHqSty/view?usp=sharing">Language Model</a>
 - <a href="https://drive.google.com/drive/folders/1LAkmsgQ1KrxuFO54UOTnrA7NWcOGAshX?usp=sharing">WavAugment</a>
 
+```
+bash ./setup.sh
+```
+This script automatically downloads the files listed above, which are required for model training.
 
 ### Model training
-- Model Initiation
+Our base model is Data2VecAudio with a language modeling head on top for Connectionist Temporal Classification (CTC). Data2VecAudio was proposed in *data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language* by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. For more information, visit https://huggingface.co/docs/transformers/model_doc/data2vec
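A CTC head emits one label distribution per audio frame, including a special blank label. As an illustration only, the sketch below shows greedy (best-path) CTC decoding in plain Python: take the most likely label per frame, merge consecutive repeats, then drop blanks. The blank index `0`, the toy vocabulary and the frame scores are hypothetical; with the Hugging Face model, calling `processor.batch_decode` on the argmax ids performs the equivalent collapsing.

```python
from typing import List, Sequence

BLANK_ID = 0  # hypothetical: index of the CTC blank label in this toy vocabulary
VOCAB = ["<blank>", "a", "b", "c"]  # hypothetical toy vocabulary


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], blank_id: int = BLANK_ID) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    # Pick the highest-scoring label for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Keep a label only if it differs from the previous frame's label and is not blank.
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Per-frame argmax is: a, a, blank, a, b, b
    scores = [
        [0.1, 0.8, 0.05, 0.05],
        [0.1, 0.7, 0.1, 0.1],
        [0.9, 0.05, 0.03, 0.02],
        [0.2, 0.6, 0.1, 0.1],
        [0.1, 0.1, 0.7, 0.1],
        [0.1, 0.1, 0.6, 0.2],
    ]
    ids = greedy_ctc_decode(scores)
    print("".join(VOCAB[i] for i in ids))  # prints "aab"
```

The blank between the two `a` frames is what lets CTC output a genuinely repeated character; without it the repeats would be merged into one.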
 
 
 ```py
-# pretrianed model (data2vec)
+from datasets import load_dataset
+from transformers import Wav2Vec2Processor
+# pretrained model
 BASE_MODEL = "./data2vec-thai-pretrained"
+# load data
+mixed_train = load_dataset("./cv11.py", "th", split="train+validation")
+mixed_test = load_dataset("./cv11.py", "th", split="test")
+# processor
 processor = Wav2Vec2Processor.from_pretrained("./processor")
-# import augment
+# import WavAugment
 import sys
 sys.path.append("./WavAugment")
 # clips path
 abs_path_to_clips = "./Methods_and_Measures/commonvoice11/data/clips_wav" 
 ```
 
-Model :
+Our trained models can be downloaded below:
 
 - trained with the 1st dataset (original ratio of gender) 
 <a href="https://drive.google.com/drive/folders/1YPmUk3ZsfMxqq2nFwUV3fWL3uKFxz13q?usp=sharing">load model</a>
@@ -44,15 +51,15 @@ Model :
 - trained with the 3rd dataset (balance ratio between female & male with speaking same sentence) 
 <a href="https://drive.google.com/drive/folders/10DZLSO6ftUzZlvfme2FMbUIpH2ZZoYvS?usp=sharing">load model</a>
 
-Model after we upsampling training set:
+Models trained after upsampling the training set:
 
 - trained with added 2nd dataset (balance ratio between female & male) 
 <a href="https://drive.google.com/drive/folders/1nsyl3VLo76DIRNg0Zrrrvy_o4QYlUtXJ?usp=sharing">load model</a>
 
 - trained with added 3rd dataset (balance ratio between female & male with speaking same sentence)
 <a href="https://drive.google.com/drive/folders/1lBu9JD-_cQOBjsN747ElV-kAsAhR6rD6?usp=sharing">load model</a>
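The README does not state how the training set was upsampled. One common approach, sketched here as an assumption rather than the authors' exact procedure, is random oversampling: duplicate randomly chosen clips from the under-represented gender until every group matches the largest one. The record format (dicts with a `gender` key) and the function name `upsample_to_balance` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def upsample_to_balance(records: List[Dict[str, str]], key: str = "gender", seed: int = 0) -> List[Dict[str, str]]:
    """Randomly duplicate records from smaller groups until every group matches the largest one."""
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    if not groups:
        return []

    target = max(len(items) for items in groups.values())
    balanced: List[Dict[str, str]] = []
    for items in groups.values():
        balanced.extend(items)
        # Sample with replacement to fill the gap for the smaller groups.
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical clip metadata: three male clips, one female clip.
    train = [
        {"path": "m1.wav", "gender": "male"},
        {"path": "m2.wav", "gender": "male"},
        {"path": "m3.wav", "gender": "male"},
        {"path": "f1.wav", "gender": "female"},
    ]
    for row in upsample_to_balance(train):
        print(row)
```

Common Voice also contains clips with an empty or `other` gender label, so in this sketch you would filter the records to `male` and `female` first if only those two groups should be balanced.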
 
-### Evaluate
+### Evaluation
 #
 ```py
 # processor
@@ -72,4 +79,9 @@ audio_paths = [
               ]
 
 ```
+- The output of `data2vec_evaluate.py` is a .csv file with WER and CER scores per record, which you can group by gender to see the final results.
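WER and CER are the word-level and character-level edit distances between reference and hypothesis, divided by the reference length. The sketch below computes them in plain Python and then averages per-record scores by gender from a CSV. The column names `gender`, `wer` and `cer` are assumptions about the output file; check the header actually written by `data2vec_evaluate.py`. Thai is normally written without spaces between words, so the whitespace split in `wer` is only meaningful if the transcripts are already word-segmented.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)


def mean_scores_by_gender(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, float]]:
    """Average per-record WER/CER for each value of the (assumed) 'gender' column."""
    totals = defaultdict(lambda: {"wer": 0.0, "cer": 0.0, "n": 0})
    for row in rows:
        bucket = totals[row["gender"]]
        bucket["wer"] += float(row["wer"])
        bucket["cer"] += float(row["cer"])
        bucket["n"] += 1
    return {g: {"wer": t["wer"] / t["n"], "cer": t["cer"] / t["n"]} for g, t in totals.items()}


if __name__ == "__main__":
    # Hypothetical CSV content; in practice open the file written by data2vec_evaluate.py.
    sample = "gender,wer,cer\nmale,0.25,0.10\nmale,0.75,0.30\nfemale,0.50,0.20\n"
    print(mean_scores_by_gender(csv.DictReader(io.StringIO(sample))))
```

Averaging per-record scores weights every clip equally; a corpus-level WER, which sums all edits and divides by the total number of reference words, can differ when clip lengths vary between groups.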