Extract the wav files for the segments of the different corpora:
./get_wavs.py mandarin train
./get_wavs.py french train
./get_wavs.py english train
./get_wavs.py LANG1 train
./get_wavs.py LANG2 train
This also writes list files such as wavs/mandarin_train.list, each containing the absolute paths to the wav files.
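Before moving on to segmentation, it can help to confirm that every path in a list file points to a real wav file. The following is a minimal sketch, not part of the repository: the helper name missing_wavs is hypothetical, and it assumes each list file holds one path per line.

```python
from pathlib import Path


def missing_wavs(list_path):
    """Return the entries of a list file that do not point to an existing file.

    Assumes one path per line (an assumption about the .list format).
    """
    missing = []
    for line in Path(list_path).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if not Path(entry).is_file():
            missing.append(entry)
    return missing


# Example usage (list path taken from the text above):
# print(missing_wavs("wavs/mandarin_train.list"))
```

An empty result means every listed wav file was found.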
Move to the directory containing Okko's syllable segmentation code, open Matlab, and run the unsupervised syllable segmentation:
cd thetaOscillator
matlab -desktop
process_wavs % execute in Matlab
exit
cd ..
To process the different corpora, edit the basename variable in process_wavs.m to match the appropriate language and subset.
Convert the landmarks extracted above to the appropriate Python format:
./get_syl_landmarks.py mandarin train
./get_syl_landmarks.py french train
./get_syl_landmarks.py english train
./get_syl_landmarks.py LANG1 train
./get_syl_landmarks.py LANG2 train
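The per-language commands above can also be issued from a short driver. This is a sketch only: the names LANGUAGES, landmark_commands and run_all are hypothetical, and it assumes get_syl_landmarks.py takes exactly the two positional arguments shown above and that the Matlab step has already been run for each corpus.

```python
import subprocess

# LANG1 and LANG2 are the placeholders used in the commands above.
LANGUAGES = ["mandarin", "french", "english", "LANG1", "LANG2"]


def landmark_commands(languages=LANGUAGES, subset="train"):
    """Build one get_syl_landmarks.py command (as an argv list) per language."""
    return [["./get_syl_landmarks.py", lang, subset] for lang in languages]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example usage, from the directory containing get_syl_landmarks.py:
# run_all(landmark_commands())
```

Building the commands separately from running them makes it easy to print and inspect them before anything is executed.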