This is demo software containing scripts that punctuate audio recordings using the punkProse library. It is intended for demonstration purposes. It was developed and tested on macOS, but it should run on any UNIX-based OS.
- Python 3.x
- Python packages
  - Numpy
  - Theano
  - yaml
  - Proscript
  - speech_recognition
  - pyaudio
  - wave
- PunkProse library scripts
- Montreal forced aligner
- Praat
- Google Cloud credentials
Install the required Python packages. Install the Montreal Forced Aligner and link its binaries and models (MFA_ALIGN_BINARY, MFA_LEXICON, MFA_LM) in microphone_recognition.py. Praat should be installed and accessible from the command line as praat.
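Before running the demo, it can help to confirm that the external tools are reachable. The following is a minimal sketch and not part of the repository: the helper name find_missing_tools and the placeholder paths are hypothetical, and only praat and the three MFA_* names come from the instructions above.

```python
import os
import shutil


def find_missing_tools(commands, paths):
    """Return the commands not found on PATH, then the names of paths that do not exist."""
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    missing += [name for name, path in paths.items() if not os.path.exists(path)]
    return missing


if __name__ == "__main__":
    # Placeholder locations (assumptions): replace them with the values
    # you set in microphone_recognition.py.
    mfa_paths = {
        "MFA_ALIGN_BINARY": "/path/to/MFA_ALIGN_BINARY",
        "MFA_LEXICON": "/path/to/MFA_LEXICON",
        "MFA_LM": "/path/to/MFA_LM",
    }
    # An empty list means praat is on PATH and all three paths exist.
    print(find_missing_tools(["praat"], mfa_paths))
```

Anything printed in the list still needs to be installed or linked before the demo is started.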
Speech recognition is done through the Python package speech_recognition. The current setting uses the Google Cloud Speech API as the recognition engine. The credentials need to be placed in a file named credentials.py, in a variable named GOOGLE_CLOUD_SPEECH_CREDENTIALS.
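One possible layout for credentials.py is sketched below. It assumes the demo expects the contents of a Google Cloud service-account JSON key as a string, which should be checked against how microphone_recognition.py uses the variable. The helper load_credentials and the environment variable GOOGLE_CLOUD_KEY_FILE are hypothetical names introduced for illustration; reading the key from a file avoids pasting secrets into source code.

```python
import json
import os


def load_credentials(path):
    """Read a JSON key file and return its contents as a string."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    json.loads(text)  # raises ValueError if the file is not valid JSON
    return text


# Hypothetical environment variable pointing at the downloaded key file.
# Assumption: the demo reads the key contents from this module-level name.
_key_file = os.environ.get("GOOGLE_CLOUD_KEY_FILE")
GOOGLE_CLOUD_SPEECH_CREDENTIALS = load_credentials(_key_file) if _key_file else None
```

With this layout, credentials.py itself contains no secret, and the key file can be kept outside the repository.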
Two English punctuation models are currently provided under the directory models: one trained on words only, and another trained with the prosodic features pause and mean f0, plus POS features.
To run the demo, type:
python listen_and_punctuate.py
You can choose either to record from the microphone or to open a pre-recorded audio file. Raw and punctuated proscripts will be created under the directory rec.
To visualize the recordings, install Prosograph and link the directory rec in the file dataconfig_newdata.py under Prosograph. You can switch between different punctuation outputs using the number keys.
This demo was presented at Interspeech 2018:
@inproceedings{punkProse,
author = {Alp Oktem and Mireia Farrus and Antonio Bonafonte},
title = {Visualizing Punctuation Restoration in Speech Transcripts with Prosograph},
booktitle = {Interspeech 2018},
year = {2018},
address = {Hyderabad, India}
}