CLI tool for word count estimation in audio files.
- Download wce from GitLab and install the required Python packages using pip:
$ git clone https://gitlab.coml.lscp.ens.fr/babycloud/ml/wce.git
$ cd wce
$ pip install -r requirements.txt
- Install the external dependencies Perl, SoX and libsndfile1. On Debian/Ubuntu run:
$ sudo apt-get update && sudo apt-get install perl sox libsndfile1
The wce can be used directly through the CLI or run within a Docker container.
For a complete list of available options, run:
$ python cli.py -h
The tool provides two commands: train and predict.
- Train:
Trains a WCE model on audio files given their respective SAD files and annotation files.
$ python cli.py train wav_dir annotation_dir -r sad_dir -s sad_name -w output_model_file
If no model file is specified, the model is saved to the default file adapted_model.pickle. To see all the options, use the help command.
- Predict:
Predicts the word counts of audio files given their respective SAD files.
$ python cli.py predict wav_dir output_file -r sad_dir -s sad_name -w model_file
If no model file is specified, the default model default_model.pickle is used. To see all the options, use the help command.
Providing SAD files is recommended, as results obtained without them are not yet conclusive.
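The two usage lines above can also be driven from a Python script. The following is a minimal sketch, not part of wce itself: the helper name build_wce_command is hypothetical, and it uses only the positional arguments and the -r, -s and -w flags shown above. It assumes the script is run from the repository root, where cli.py lives.

```python
import sys
from typing import List, Optional


def build_wce_command(
    command: str,
    positional: List[str],
    sad_dir: Optional[str] = None,
    sad_name: Optional[str] = None,
    model_file: Optional[str] = None,
    cli_path: str = "cli.py",  # assumption: run from the repository root
) -> List[str]:
    """Assemble the argument list for `python cli.py train|predict ...`."""
    if command not in ("train", "predict"):
        raise ValueError(f"unknown command: {command!r}")
    args = [sys.executable, cli_path, command, *positional]
    # Optional flags, written exactly as in the usage lines of this README.
    if sad_dir is not None:
        args += ["-r", sad_dir]
    if sad_name is not None:
        args += ["-s", sad_name]
    if model_file is not None:
        args += ["-w", model_file]
    return args


# Example: predict with SAD files and the default model (no -w flag).
cmd = build_wce_command(
    "predict", ["wav_dir", "output.csv"], sad_dir="sad_dir", sad_name="sadName"
)
print(" ".join(cmd))
# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when directory names contain spaces.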
All data preprocessing parameters can be tweaked in the configuration file at models/envelope_estimator/data_processing_config.yml.
Using the provided Dockerfile:
- Build the docker image:
$ sudo docker build -t wce .
- Then, to predict the word counts for some audio files using the pre-trained model, run a docker container and mount your data and result directories to the intended directories in the container:
$ sudo docker run \
    --name wce \
    -v my_data/:/app/data \
    -v my_results/:/app/results \
    wce
Your data folder must contain the audio files and their respective SAD files.
When the process is done, output.csv will be available in my_results/.
- For any other command, the arguments will need to be specified and the volumes adapted. For instance, to use the train command, a models volume will need to be specified for the output model file.
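Docker generally expects absolute host paths for bind mounts passed with -v. As a hedged sketch (the helper name build_docker_run is hypothetical and not part of wce), the prediction command above could be assembled from Python with the host directories resolved to absolute paths. The container paths /app/data and /app/results and the image name wce are taken from the command above.

```python
from pathlib import Path
from typing import List


def build_docker_run(
    data_dir: str,
    results_dir: str,
    image: str = "wce",
    name: str = "wce",
) -> List[str]:
    """Build a `docker run` argument list with absolute host paths."""
    data = Path(data_dir).resolve()        # host folder with .wav and SAD files
    results = Path(results_dir).resolve()  # host folder that receives output.csv
    return [
        "docker", "run",
        "--name", name,
        "-v", f"{data}:/app/data",
        "-v", f"{results}:/app/results",
        image,
    ]


cmd = build_docker_run("my_data", "my_results")
print(" ".join(cmd))
# To launch the container: import subprocess; subprocess.run(cmd, check=True)
# (prefix the list with "sudo" if your Docker setup requires it)
```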
Currently the WCE only supports certain formats:
- Audio files must be in the .wav format.
- SAD files must be in the .rttm format and have the following fields:
SPEAKER fname 1 onset duration <NA> <NA> speech <NA>
- Annotation files must be in the .eaf format and should include the speaker tiers CHI, MOT and FAT, as only those are processed.
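To make the expected SAD layout concrete, here is a minimal sketch of a reader for lines with the nine fields listed above. It is an illustration only, not the parser used by wce. The names SpeechSegment and parse_rttm are hypothetical, and onset and duration are assumed to be in seconds, as is usual for RTTM files.

```python
from typing import List, NamedTuple


class SpeechSegment(NamedTuple):
    fname: str
    onset: float     # assumed: seconds from the start of the recording
    duration: float  # assumed: seconds


def parse_rttm(text: str) -> List[SpeechSegment]:
    """Parse lines of the form
    SPEAKER fname 1 onset duration <NA> <NA> speech <NA>
    """
    segments = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 9 or fields[0] != "SPEAKER":
            raise ValueError(f"line {line_no}: unexpected RTTM record: {line!r}")
        # fields[1] is the file name, fields[3] the onset, fields[4] the duration
        segments.append(SpeechSegment(fields[1], float(fields[3]), float(fields[4])))
    return segments


example = (
    "SPEAKER myfile 1 0.50 1.25 <NA> <NA> speech <NA>\n"
    "SPEAKER myfile 1 3.00 0.75 <NA> <NA> speech <NA>\n"
)
segments = parse_rttm(example)
total_speech = sum(s.duration for s in segments)
print(len(segments), total_speech)
```

Running a check like this before training or prediction surfaces malformed SAD files early.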
SAD files and annotation files must have the same name as the audio file they relate to. In addition, each SAD file name must be prefixed with the SAD algorithm's name, separated by an underscore ('_'). For example:
myfile.wav
sadName_myfile.rttm
myfile.eaf
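The naming convention above can be checked before running train or predict. The following is a small sketch under the stated convention only: the function name find_missing_companions is hypothetical, and the annotation check is optional because predict does not need annotation files.

```python
from pathlib import Path
from typing import Dict, List, Optional, Union

PathLike = Union[str, Path]


def find_missing_companions(
    wav_dir: PathLike,
    sad_dir: PathLike,
    sad_name: str,
    annotation_dir: Optional[PathLike] = None,
) -> Dict[str, List[str]]:
    """Map each .wav file name to the companion files that are missing."""
    missing: Dict[str, List[str]] = {}
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        # SAD file: <sadName>_<audio name>.rttm
        expected = [Path(sad_dir) / f"{sad_name}_{wav.stem}.rttm"]
        if annotation_dir is not None:
            # Annotation file: <audio name>.eaf
            expected.append(Path(annotation_dir) / f"{wav.stem}.eaf")
        absent = [p.name for p in expected if not p.is_file()]
        if absent:
            missing[wav.name] = absent
    return missing
```

For instance, find_missing_companions("wav_dir", "sad_dir", "sadName", annotation_dir="annotation_dir") returns an empty dictionary when every audio file has both its SAD file and its annotation file.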
Two tests are available:
- On the default model, checks if the results are still the same.
- On a trained model, checks if the RMSE is still under a certain threshold.
To run them:
- Install pytest:
$ pip install pytest
- And run:
$ pytest test/test.py
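The second test compares an RMSE against a threshold. As an illustration of what such a check computes, here is a minimal sketch of the standard root-mean-square error between predicted and reference word counts. It is not the code in test/test.py, which may define the error differently, and the threshold and the sample counts below are hypothetical.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


HYPOTHETICAL_THRESHOLD = 50.0  # placeholder value, not taken from the test suite

predicted_counts = [120.0, 80.0, 95.0]   # made-up word counts for illustration
reference_counts = [110.0, 85.0, 100.0]
error = rmse(predicted_counts, reference_counts)
assert error < HYPOTHETICAL_THRESHOLD
print(round(error, 2))
```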
This tool is a reimplementation in Python of the original MATLAB WCE based on:
Rasanen, O., Seshadri, S., Karadayi, J., Riebling, E., Bunce, J., Cristia, A., Metze, F., Casillas, M., Rosemberg, C., Bergelson, E., & Soderstrom, M. (submitted): Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech