Automatic Speech Recognition (ASR) - DeepSpeech German

This is the project for the paper "German End-to-end Speech Recognition based on DeepSpeech", published at KONVENS 2019.

This project aims to develop a working speech-to-text module using Mozilla DeepSpeech that can be used in any audio processing pipeline. Mozilla DeepSpeech is a state-of-the-art open-source automatic speech recognition (ASR) toolkit. Its model is trained with machine-learning techniques based on Baidu's Deep Speech research paper, and Project DeepSpeech uses Google's TensorFlow to make the implementation easier.

Important Links:

Paper: https://www.researchgate.net/publication/336532830_German_End-to-end_Speech_Recognition_based_on_DeepSpeech

DeepSpeech-API: https://github.com/AASHISHAG/DeepSpeech-API

This README is written for DeepSpeech v0.5.0. Refer to Mozilla DeepSpeech for the latest updates.

$ cd ..

# Tuda-De
$ git clone https://github.com/AASHISHAG/deepspeech-german.git
$ python3 deepspeech-german/pre-processing/prepare_data.py --tuda $tuda_corpus_path $export_path_data_tuda

# Voxforge
$ deepspeech-german/pre-processing/run_to_utf_8.sh
$ python3 deepspeech-german/pre-processing/prepare_data.py --voxforge $voxforge_corpus_path $export_path_data_voxforge

# Mozilla Common Voice
$ python3 DeepSpeech/bin/import_cv2.py --filter_alphabet deepspeech-german/data/alphabet.txt $export_path_data_mozilla

NOTE: Adjust the paths in run_to_utf_8.sh to match your setup.
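To our understanding, DeepSpeech's importers write CSV files with the columns wav_filename, wav_filesize and transcript, and --filter_alphabet drops utterances whose transcripts contain characters not listed in alphabet.txt. The Python sketch below illustrates that filtering step. It assumes alphabet.txt lists one character per line and that lines starting with '#' are comments; parse_alphabet and manifest_csv are hypothetical helpers, not functions from this repository or from DeepSpeech.

```python
import csv
import io


def parse_alphabet(lines):
    """Collect the characters of a DeepSpeech-style alphabet file.

    Assumed format: one character per line; lines starting with '#' are
    comments (the escaped form for a literal '#' is not handled here).
    """
    chars = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chars.add(line)
    return chars


def manifest_csv(rows, alphabet):
    """Return CSV text for (wav_filename, wav_filesize, transcript) rows,
    skipping transcripts that use characters missing from the alphabet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    for wav_filename, wav_filesize, transcript in rows:
        if set(transcript) <= alphabet:
            writer.writerow([wav_filename, wav_filesize, transcript])
    return buf.getvalue()


# Usage: "abc" contains 'c', which is not in this toy alphabet, so y.wav is dropped.
alphabet = parse_alphabet(["# toy alphabet\n", "a\n", "b\n", " \n", "ä\n"])
print(manifest_csv([("x.wav", 10, "ab ä"), ("y.wav", 20, "abc")], alphabet))
```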

Language Model

We used the KenLM toolkit, Kenneth Heafield's language model inference code, to train a 3-gram language model.
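A 3-gram model predicts each word from the two words before it. The Python sketch below shows only the counting step and an unsmoothed maximum-likelihood estimate, padding sentences with <s> and </s> as in ARPA files. It is an illustration, not a replacement for KenLM: lmplz estimates a modified Kneser-Ney smoothed model and also stores the lower-order n-grams. The function names are hypothetical.

```python
from collections import Counter


def trigram_counts(sentences):
    """Count word trigrams, padding each sentence with <s> and </s>."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts


def mle_probability(counts, w1, w2, w3):
    """Unsmoothed estimate P(w3 | w1, w2) = c(w1 w2 w3) / c(w1 w2 *)."""
    context = sum(c for (a, b, _), c in counts.items() if (a, b) == (w1, w2))
    return counts[(w1, w2, w3)] / context if context else 0.0


counts = trigram_counts(["ich bin hier", "ich bin da"])
print(mle_probability(counts, "ich", "bin", "hier"))  # -> 0.5
```

Unseen contexts get probability 0 here; avoiding exactly that is what smoothing in lmplz is for.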

Installation

$ git clone https://github.com/kpu/kenlm.git
$ cd kenlm
$ mkdir -p build
$ cd build
$ cmake ..
$ make -j `nproc`

Corpus

We used an open-source German text corpus (filtered German sentences) released by the University of Hamburg.

Download the data

$ wget http://ltdata1.informatik.uni-hamburg.de/kaldi_tuda_de/German_sentences_8mil_filtered_maryfied.txt.gz
$ gzip -d German_sentences_8mil_filtered_maryfied.txt.gz

Pre-process the data

$ deepspeech-german/pre-processing/prepare_vocab.py $text_corpus_path $exp_path/clean_vocab.txt
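The exact cleaning rules live in prepare_vocab.py. As a rough sketch of what such a step typically does, the hypothetical normalize_line below lowercases each line, replaces every character outside an assumed alphabet (lowercase a-z, ä, ö, ü, ß and space) with a space, and collapses whitespace. The alphabet is an assumption; check deepspeech-german/data/alphabet.txt for the one actually used.

```python
import re

# Assumed alphabet; the real one is defined in deepspeech-german/data/alphabet.txt.
ALLOWED = set("abcdefghijklmnopqrstuvwxyzäöüß ")


def normalize_line(line):
    """Lowercase, map characters outside ALLOWED to spaces, squeeze whitespace."""
    line = line.lower()
    line = "".join(ch if ch in ALLOWED else " " for ch in line)
    return re.sub(r"\s+", " ", line).strip()


def clean_corpus(lines):
    """Yield normalized, non-empty lines."""
    for line in lines:
        cleaned = normalize_line(line)
        if cleaned:
            yield cleaned


print(normalize_line("Grüße, Welt! 2019"))  # -> grüße welt
```

This sketch simply drops digits; a real pipeline may prefer to spell numbers out so they remain in the language model.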

Build the Language Model

$ kenlm/build/bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/words.arpa -o 3
$ kenlm/build/bin/build_binary -T -s $exp_path/words.arpa $exp_path/lm.binary

NOTE: If a malloc exception occurs, limit lmplz's memory use with -S followed by a percentage of RAM.

Example:

$ kenlm/build/bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/words.arpa -o 3 -S 50%

Trie

Build a trie for the language model trained above.

Requirements

Build the DeepSpeech native client, which provides generate_trie.

# The DeepSpeech tools are used to create the trie
$ git clone https://github.com/mozilla/tensorflow.git
$ cd tensorflow
$ git checkout origin/r1.13
$ ./configure
$ ln -s ../DeepSpeech/native_client ./
$ bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie --config=cuda

NOTE: We answered the ./configure prompts for TensorFlow as follows:

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with ROCm support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Do you want to use clang as CUDA compiler? [y/N]: N
Do you wish to build TensorFlow with MPI support? [y/N]: N
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N

Refer to Mozilla's documentation for updates. We used Bazel (build label 0.19.2) with DeepSpeech v0.5.0.

Build Trie

$ DeepSpeech/native_client/generate_trie $path/alphabet.txt $path/lm.binary $exp_path/trie
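generate_trie writes DeepSpeech's own binary format, which is not reproduced here. Conceptually, as we understand it, the decoder uses the trie to check whether a partially decoded character sequence can still grow into a word from the language model's vocabulary. The Python class below is a minimal, hypothetical illustration of that idea; VocabTrie is not a DeepSpeech class.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_word = False  # True if the path from the root spells a word


class VocabTrie:
    """Character trie over a vocabulary (illustration only)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def has_prefix(self, prefix):
        """True if some vocabulary word starts with prefix."""
        return self._find(prefix) is not None

    def contains(self, word):
        """True if word itself is in the vocabulary."""
        node = self._find(word)
        return node is not None and node.is_word


trie = VocabTrie(["haus", "hase", "hund"])
print(trie.has_prefix("ha"), trie.contains("hau"))  # -> True False
```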

Training

Set the corpus paths and the hyperparameters in deepspeech-german/train_model.sh.

$ nohup deepspeech-german/train_model.sh &

Hyperparameter Optimization

Set the corpus paths and the hyperparameters in deepspeech-german/hyperparameter_optimization.sh.

$ nohup deepspeech-german/hyperparameter_optimization.sh &
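The contents of hyperparameter_optimization.sh are not reproduced here. As a generic illustration of the idea only, the sketch below runs an exhaustive grid search, assuming a hypothetical evaluate callable that trains or decodes with the given settings and returns a validation error such as WER. The parameter names and values in the usage example are placeholders, not the settings we used.

```python
import itertools


def grid_search(evaluate, grid):
    """Try every combination in grid (name -> list of values) and return
    the (params, score) pair with the lowest score from evaluate(**params)."""
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Usage with a toy objective standing in for a real training run.
def toy_error(learning_rate, dropout_rate):
    return abs(learning_rate - 0.0001) + abs(dropout_rate - 0.25)


best, score = grid_search(
    toy_error,
    {"learning_rate": [0.001, 0.0001], "dropout_rate": [0.15, 0.25]},
)
print(best, score)
```

A full grid grows multiplicatively with each added parameter, so each value list should stay short when every evaluation is a training run.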

Results

Some results from our findings (word error rate, WER; lower is better):

Dataset                       WER
Mozilla                       79.7%
Voxforge                      72.1%
Tuda-De                       26.8%
Tuda-De+Mozilla               57.3%
Tuda-De+Voxforge              15.1%
Tuda-De+Voxforge+Mozilla      21.5%

NOTE: Refer to our paper for more information.
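Word error rate is the minimum number of word substitutions, deletions and insertions needed to turn the recognized text into the reference, divided by the number of reference words. A minimal Python sketch of that metric via word-level Levenshtein distance; word_error_rate is a hypothetical helper, not the evaluation code DeepSpeech uses.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("das ist ein test", "das ist test"))  # -> 0.25
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.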

Transfer Learning

1. German to German

Specify the checkpoint directory in transfer_model.sh

$ nohup deepspeech-german/transfer_model.sh &

2. English to German

Replace the German characters ä, ö, ü, ß with ae, oe, ue, ss
Rebuild the language model, trie and corpus
Specify the checkpoint directory in transfer_model.sh

$ nohup deepspeech-german/transfer_model.sh &
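The character replacement in the English-to-German steps can be expressed as a single translation table. A minimal sketch using Python's str.translate, assuming transcripts are already lowercased (uppercase Ä, Ö, Ü are not mapped here); to_english_alphabet is a hypothetical helper name.

```python
# Map German-specific characters onto the English alphabet.
UMLAUT_TABLE = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def to_english_alphabet(text):
    """Replace ä, ö, ü, ß with ae, oe, ue, ss."""
    return text.translate(UMLAUT_TABLE)


print(to_english_alphabet("grüße aus köln"))  # -> gruesse aus koeln
```

The same mapping has to be applied to the language-model text and to every transcript, so the rebuilt language model, trie and corpus all share the English model's alphabet.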

NOTE: To perform transfer learning, the checkpoints should come from the same DeepSpeech version.

Trained Models (Language Model, Trie, Speech Model and Checkpoints)

The DeepSpeech models can be re-trained directly on a new dataset. The required files are available at:

1. v0.5.0

This model is trained on DeepSpeech v0.5.0 with Mozilla_v3+Voxforge+Tuda-De (please refer to the paper for more details): https://drive.google.com/drive/folders/1nG6xii2FP6PPqmcp4KtNVvUADXxEeakk?usp=sharing

https://drive.google.com/file/d/1VN1xPH0JQNKK6DiSVgyQ4STFyDY_rle3/view

2. v0.6.0

This model is trained on DeepSpeech v0.6.0 with Mozilla_v4+Voxforge+Tuda-De+MAILABS (454+57+184+233h=928h).

https://drive.google.com/drive/folders/1BKblYaSLnwwkvVOQTQ5roOeN0SuQm8qr?usp=sharing

3. v0.7.4

This model is trained on DeepSpeech v0.7.4, starting from the pre-trained English model released by Mozilla, with Mozilla English+Mozilla_v5+MAILABS+Tuda-De+Voxforge (1700+750+233+184+57h=2924h).

https://drive.google.com/drive/folders/1PFSIdmi4Ge8EB75cYh2nfYOXlCIgiMEL?usp=sharing

4. v0.9.0

This model is trained on DeepSpeech v0.9.0, starting from the pre-trained English model released by Mozilla, with Mozilla English+Mozilla_v5+SWC+MAILABS+Tuda-De+Voxforge (1700+750+248+233+184+57h=3172h).

Thanks to @koh-osug for providing the TFLite model.

Link: https://drive.google.com/drive/folders/1L7ILB-TMmzL8IDYi_GW8YixAoYWjDMn1?usp=sharing

Why be SHY about STARRING the repository if you use the resources? :D

TODO LIST

Release model for DeepSpeech v0.6.0
Release model for DeepSpeech v0.7.4
Release model for DeepSpeech v0.9.0
Add datasets - SWC

Acknowledgments

Prof. Dr.-Ing. Torsten Zesch - Co-Author
Dipl.-Ling. Andrea Horbach
Matthias

References

If you use our findings/scripts in your academic work, please cite:

@inproceedings{agarwal-zesch-2019-german,
    author = "Aashish Agarwal and Torsten Zesch",
    title = "German End-to-end Speech Recognition based on DeepSpeech",
    booktitle = "Preliminary proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019): Long Papers",
    year = "2019",
    address = "Erlangen, Germany",
    publisher = "German Society for Computational Linguistics \& Language Technology",
    pages = "111--119"
}

