Using Decoder on single audio file #907

Closed · ishan-modi opened this issue Nov 30, 2020 · 9 comments

ishan-modi commented Nov 30, 2020

How do I use the decoder?

I have downloaded the model files as mentioned in
https://github.com/facebookresearch/wav2letter/wiki/Inference-Run-Examples#download-the-example-trained-models-from-aws-s3

Now I want to use the decoder to decode an audio file, so I have created a decoder.cfg:

--am=<path to acoustic_model.bin>
--test=<path to train.lst>
--show=true
--showletters=true
--uselexicon=true
--lm=<path to language_model.bin>
--lmtype=kenlm
--decodertype=wrd
--lmweight=2.5
--wordscore=1
--beamsize=500
--beamthreshold=25
--silweight=-0.5
--nthread_decoder=4
--smearing=max

The train.lst contains the path to my audio file.

I am a bit new to this framework, so please guide me through and correct me if I am wrong.

abhinavkulkarni commented:

@ishan-modi: Only Flashlight backend models can be used with the Train/Test/Decode binaries. The models that you downloaded are the FBGEMM backend ones; those are meant to be used in streaming fashion with the inference example binaries.

ishan-modi (Author) commented:

Thank you for the response, got it!

OK, so now I am running the Flashlight backend models from this link:

https://github.com/facebookresearch/wav2letter/tree/master/recipes/streaming_convnets/librispeech

and I want to recreate beam-search decoding for a single audio file.
How do I generate a .lst file for this audio file which I can use as input for decoding?

abhinavkulkarni commented Dec 1, 2020

@ishan-modi: Take a look at the instructions on how to prepare data for training (and testing).

Also, if disk space and internet bandwidth are not a problem, try running the data preparation scripts for one of the recipes. That will download the Librispeech data and lay it out in the format the Train/Test/Decode binaries expect (including the .wav and .lst files).
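Something like the following, assuming the usual wav2letter recipe layout (the script location and the --dst flag are from memory, so check the recipe README for the exact invocation):

# Sketch: download and prepare Librispeech in the layout the binaries expect.
cd wav2letter/recipes/data/librispeech
python3 prepare.py --dst /data/librispeech
# Afterwards /data/librispeech should contain the audio plus the .lst files
# referenced by the --train/--valid/--test flags.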

Also, you may want to edit the subject title of this post for the benefit of others.

tlikhomanenko (Contributor) commented:

Just a quick answer on the list file: the expected format is 3 or 4 tab- or space-separated columns:

# audio_id (whatever name you want) absolute_audio_path audio_duration (in ms) transcription
1 /home/../1.wav 1234.34 hello world
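
For a single file you can produce that line yourself; here is a sketch using sox to get the duration (soxi -D prints duration in seconds; any tool that reports duration works, and the paths here are placeholders):

# Sketch: build a one-line .lst for a single audio file (assumes sox and bc).
audio=/home/user/1.wav
dur_ms=$(echo "$(soxi -D "$audio") * 1000" | bc)
echo "1 ${audio} ${dur_ms} hello world" > single.lst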

ishan-modi changed the title from "[Question title here]" to "Using Decoder on single audio file" on Dec 1, 2020

Adportas commented Dec 1, 2020

Replying to @tlikhomanenko's note above on the list-file format: I have related doubts in this thread.

  1. Do you have to provide the transcription text to the Decoder in order to compare results?
  2. If you only want to transcribe and you don't have the texts, and just want to use the model, how can it be done?

ishan-modi (Author) commented:

Thank you so much for the response. Issue is resolved!

ishan-modi (Author) commented Dec 2, 2020

Replying to @Adportas's questions above:

  1. No, you don't need to have transcripts if you want to decode.

  2. Check out their inference module to generate transcripts by following the steps at the link below:

https://github.com/facebookresearch/wav2letter/wiki
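
Roughly, running the streaming inference example from that wiki looks like the sketch below; the flag name and model-directory layout are as I recall from the wiki page, so double-check there:

# Sketch: transcribe one wav with the prebuilt streaming inference example.
# simple_streaming_asr_example reads the audio from stdin.
simple_streaming_asr_example \
  --input_files_base_path=/path/to/streaming_model_dir \
  < /home/user/1.wav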


Adportas commented Dec 2, 2020


Hi @ishan-modi, thanks for answering.

  1. I am a bit confused about the purposes of, and differences between, the decoder and inference. To me they look like the same thing, but obviously both exist for a reason. Looking at the structure of the .lst file suggested by @tlikhomanenko above (/home/../1.wav 1234.34 hello world), it seems to me that the decoder compares the transcription that comes in the .lst file against the text the model generates from the listed wav/flac files. Is that right? Otherwise the transcription data (hello world) would be unnecessary and illogical, I think.
  2. Once I got the model, I started with the inference framework tutorial but got stuck because the model I generated was not of the streaming type, which is the only type supported by the listed examples: simple_streaming_asr_example, multithreaded_streaming_asr_example.
  3. So I wanted to convert it to the streaming format using the StreamingTDSModelConverter tool, but that can only be done with TDS-type models, and my model is not one (see the linked thread).
  4. So I started testing the decoder at the suggestion of @abhinavkulkarni in this thread, hoping it would help me transcribe some wav files, but it is a bit frustrating to read that it requires the transcripts, or else to go back to inference, which has TDS and streaming requirements my model doesn't meet.
  5. If I have a model that is neither streaming nor TDS type, how do I use it to transcribe wav files (without the texts, obviously)? I've been reading the wiki and testing for a few weeks without being able to use it.

tlikhomanenko (Contributor) commented Dec 11, 2020

@Adportas: Inference runs purely on CPU (in a streaming fashion), while decode.cpp runs the network forward pass on either CPU or GPU for any network, followed by beam-search decoding on CPU. Inference right now works only with conv-type networks. The decoder takes a list file and predicts transcriptions, so you don't need to have targets. At the same time, decode.cpp also computes WER, and right now it does so in any case; so if you just provide fake targets (there is a reported bug with empty targets, so please put fake text there), you will still obtain predictions along with a WER that you can simply ignore.

So please just use decode.cpp with some fake transcripts (or even try empty strings there)!
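
Putting that together, a single-file decode could look like the sketch below (Decode is one of the Train/Test/Decode binaries mentioned above; all paths are placeholders):

# Sketch: decode one unlabeled audio file using a fake transcript.
echo "1 /home/user/1.wav 1234.34 hello world" > single.lst
# Decode is the wav2letter binary built with the Flashlight backend;
# decoder.cfg is the flags file from earlier in this thread.
"$W2L_BUILD_DIR"/Decode --flagsfile=decoder.cfg --test=single.lst
# --show/--showletters print the predicted transcriptions;
# the WER against the fake transcript can be ignored.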
