Using Decoder on single audio file #907

Closed · ishan-modi opened this issue Nov 30, 2020 · 9 comments

ishan-modi commented Nov 30, 2020

How do I use the decoder?

I have downloaded the model files as mentioned in
https://github.com/facebookresearch/wav2letter/wiki/Inference-Run-Examples#download-the-example-trained-models-from-aws-s3

Now I want to use the decoder to decode an audio file, so I have created a decoder.cfg:

--am=<path to acoustic_model.bin>
--test=<path to train.lst>
--show=true
--showletters=true
--uselexicon=true
--lm=<path to language_model.bin>
--lmtype=kenlm
--decodertype=wrd
--lmweight=2.5
--wordscore=1
--beamsize=500
--beamthreshold=25
--silweight=-0.5
--nthread_decoder=4
--smearing=max

The train.lst contains the path to my audio file.

I am a bit new to this framework, so please guide me through and correct me if I am wrong.

abhinavkulkarni commented:

@ishan-modi: Only Flashlight backend models can be used with the Train/Test/Decode binaries. The models that you downloaded are the FBGEMM backend ones; those are meant to be used in streaming fashion with the inference example binaries.

ishan-modi (Author) commented:

Thank you for the response, got it!

OK, so now I am running the Flashlight backend models from this link:

https://github.com/facebookresearch/wav2letter/tree/master/recipes/streaming_convnets/librispeech

and I want to recreate beam-search decoding for a single audio file.
How do I generate a .lst file for this audio file which I can use as input for decoding?

abhinavkulkarni commented Dec 1, 2020

@ishan-modi: Take a look at the instructions on how to prepare data for training (and testing).

Also, if disk space and internet bandwidth are not a problem, try running the data preparation scripts for one of the recipes. That will download the Librispeech data and lay it out in the format the Train/Test/Decode binaries expect (including the .wav and .lst files).
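Something like the following, assuming the usual wav2letter recipe layout (the script location and the --dst flag are from memory, so check the recipe README for the exact invocation):

# Sketch: download and prepare Librispeech in the layout the binaries expect.
cd wav2letter/recipes/data/librispeech
python3 prepare.py --dst /data/librispeech
# Afterwards /data/librispeech should contain the audio plus the .lst files
# referenced by the --train/--valid/--test flags.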

Also, you may want to edit the subject title of this post for the benefit of others.

tlikhomanenko (Contributor) commented:

Just a quick answer on the list file: the expected format is 3 or 4 tab- or space-separated columns:

# audio_id (whatever name you want) absolute_audio_path audio_duration (in ms) transcription
1 /home/../1.wav 1234.34 hello world
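
For a single file you can produce that line yourself; here is a sketch using sox to get the duration (soxi -D prints duration in seconds; any tool that reports duration works, and the paths here are placeholders):

# Sketch: build a one-line .lst for a single audio file (assumes sox and bc).
audio=/home/user/1.wav
dur_ms=$(echo "$(soxi -D "$audio") * 1000" | bc)
echo "1 ${audio} ${dur_ms} hello world" > single.lst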

ishan-modi changed the title from "[Question title here]" to "Using Decoder on single audio file" on Dec 1, 2020

Adportas commented Dec 1, 2020

Replying to @tlikhomanenko's note above on the list-file format: I have related doubts in this thread.

  1. Do you have to provide the transcription text to the Decoder in order to compare results?
  2. If you only want to transcribe and you don't have the texts, and just want to use the model, how can it be done?

ishan-modi (Author) commented:

Thank you so much for the response. Issue is resolved!

ishan-modi (Author) commented Dec 2, 2020

Replying to @Adportas's questions above:

  1. No, you don't need to have transcripts if you want to decode.

  2. Check out their inference module to generate transcripts by following the steps at the link below:

https://github.com/facebookresearch/wav2letter/wiki
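
Roughly, running the streaming inference example from that wiki looks like the sketch below; the flag name and model-directory layout are as I recall from the wiki page, so double-check there:

# Sketch: transcribe one wav with the prebuilt streaming inference example.
# simple_streaming_asr_example reads the audio from stdin.
simple_streaming_asr_example \
  --input_files_base_path=/path/to/streaming_model_dir \
  < /home/user/1.wav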


Adportas commented Dec 2, 2020


Hi @ishan-modi, thanks for answering.

  1. I am a bit confused about the purposes of, and differences between, the decoder and inference. To me they look like the same thing, but obviously both exist for a reason. Looking at the structure of the .lst file suggested by @tlikhomanenko above (/home/../1.wav 1234.34 hello world), it seems to me that the decoder compares the transcription that comes in the .lst file against the text the model generates from the listed wav/flac files. Is that right? Otherwise the transcription data (hello world) would be unnecessary and illogical, I think.
  2. Once I got the model, I started with the inference framework tutorial but got stuck because the model I generated was not of the streaming type, which is the only type supported by the listed examples: simple_streaming_asr_example, multithreaded_streaming_asr_example.
  3. So I wanted to convert it to the streaming format using the StreamingTDSModelConverter tool, but that can only be done with TDS-type models, and my model is not one (see the linked thread).
  4. So I started testing the decoder at the suggestion of @abhinavkulkarni in this thread, hoping it would help me transcribe some wav files, but it is a bit frustrating to read that it requires the transcripts, or else to go back to inference, which has TDS and streaming requirements my model doesn't meet.
  5. If I have a model that is neither streaming nor TDS type, how do I use it to transcribe wav files (without the texts, obviously)? I've been reading the wiki and testing for a few weeks without being able to use it.

tlikhomanenko (Contributor) commented Dec 11, 2020

@Adportas: Inference runs purely on CPU (in a streaming fashion), while decode.cpp runs the network forward pass on either CPU or GPU for any network, followed by beam-search decoding on CPU. Inference right now works only with conv-type networks. The decoder takes a list file and predicts transcriptions, so you don't need to have targets. At the same time, decode.cpp also computes WER, and right now it does so in any case; so if you just provide fake targets (there is a reported bug with empty targets, so please put fake text there), you will still obtain predictions along with a WER that you can simply ignore.

So please just use decode.cpp with some fake transcripts (or even try empty strings there)!
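
Putting that together, a single-file decode could look like the sketch below (Decode is one of the Train/Test/Decode binaries mentioned above; all paths are placeholders):

# Sketch: decode one unlabeled audio file using a fake transcript.
echo "1 /home/user/1.wav 1234.34 hello world" > single.lst
# Decode is the wav2letter binary built with the Flashlight backend;
# decoder.cfg is the flags file from earlier in this thread.
"$W2L_BUILD_DIR"/Decode --flagsfile=decoder.cfg --test=single.lst
# --show/--showletters print the predicted transcriptions;
# the WER against the fake transcript can be ignored.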
