Implementation of Exploring Convolutional Neural Networks for Voice Activity Detection

Implementation of the VAD model from the paper of the same name by Diego Augusto Silva, José Augusto Stuchi, Ricardo P. Velloso Violato, and Luís Gustavo D. Cuozzo.

This is not an official repository of the authors; I created it because I wanted to use their model in my own project.

The official paper can be found here: Silva, Diego Augusto, et al. "Exploring convolutional neural networks for voice activity detection." Cognitive Technologies. Springer, Cham, 2017. 37-47.

Structure

The code is split into separate files, one for each part of the pipeline (data creation, spectrogram generation, model, training, and evaluation). To run the whole program, execute run.py.

For the program to work, path_data must be replaced with the path to your local copy of the QUT-NOISE-TIMIT dataset.
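
A minimal sketch of a sanity check that could be run before starting the program, to catch a path_data value that was never replaced. It assumes path_data is a plain string or path; the helper name check_dataset_path and the placeholder path are hypothetical and not part of this repository.

```python
from pathlib import Path


def check_dataset_path(path_data):
    """Return the dataset root as a Path, or raise if it is not a directory.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    root = Path(path_data).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(
            f"path_data does not point to a directory: {root}"
        )
    return root


if __name__ == "__main__":
    # Placeholder value: replace with the local QUT-NOISE-TIMIT location.
    path_data = "/path/to/QUT-NOISE-TIMIT"
    try:
        print("Using dataset at", check_dataset_path(path_data))
    except FileNotFoundError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than a missing-file error raised later during data creation.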

The sound_viewer_tool is an adapted copy of the tool originally published at https://github.com/ljvillanueva/Sound-Viewer-Tool/blob/master/svt.py. Because the tool was discontinued and parts of it did not work for me, I adapted the code: audiolab has been replaced with librosa, Image is now imported from PIL, and parts of the code have been adjusted to work with Python 3.

TODO

To fully reproduce the methodology of the paper, the following changes are still needed:

Change the optimizer and the learning rate.
During data creation, give each frame exactly one label and update the frame sizes accordingly. Currently, a frame is labeled as speech if it contains any speech at all, regardless of the percentage of speech in it.
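
The difference between the current labeling rule and a stricter one can be sketched as follows. This is only an illustration and not the code in create_data.py: the function name label_frames, the per-sample 0/1 label array, the non-overlapping frames, and the 0.5 threshold are all assumptions, and the paper's exact labeling procedure may differ.

```python
import numpy as np


def label_frames(sample_labels, frame_len, min_speech_fraction=None):
    """Assign one label (0 = non-speech, 1 = speech) per non-overlapping frame.

    sample_labels: 1-D sequence with one 0/1 label per audio sample.
    min_speech_fraction=None mimics the rule described above: a frame is
    speech as soon as it contains any speech sample. Otherwise a frame is
    speech only if its fraction of speech samples reaches the threshold.
    """
    labels = np.asarray(sample_labels)
    n_frames = len(labels) // frame_len  # an incomplete trailing frame is dropped
    frames = labels[: n_frames * frame_len].reshape(n_frames, frame_len)
    speech_fraction = frames.mean(axis=1)
    if min_speech_fraction is None:
        return (speech_fraction > 0).astype(int)
    return (speech_fraction >= min_speech_fraction).astype(int)


# Three frames of four samples with speech fractions 0.25, 0.75 and 0.0.
samples = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
any_speech = label_frames(samples, frame_len=4)
thresholded = label_frames(samples, frame_len=4, min_speech_fraction=0.5)
```

With the any-speech rule the first frame counts as speech although only a quarter of it is; with the assumed 0.5 threshold it does not.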

