jiping_s | December 3, 2017, 3:12pm
In traditional speech recognizers, the language model specifies which word sequences are possible. DeepSpeech seems to generate its final output based on statistics at the letter level (not the word level).
I have a language model containing a few hundred words, in ARPA format:

    \data\
    ngram 1=655
    ngram 2=3133
    ngram 3=4482

    \1-grams:
    0 <s> -0.8111794
    ...
With this model, the word sequence 'would you like to try our strudel for twenty five cents' is possible. However, the final output is not what I would expect if the language model were used in the traditional way.
Here is the detailed process:
(1) Building the language model:

    ./lmplz --text corpus.txt --arpa corpus.arpa --o 3
    ./build_binary -T -s corpus.arpa lm.binary
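As a sanity check, the freshly built lm.binary can be queried to confirm that it assigns the test sentence a finite probability and knows all of its words. This is a minimal sketch, assuming the KenLM Python bindings (the kenlm module) are installed:

    import kenlm  # Python bindings for KenLM, the toolkit that built lm.binary

    model = kenlm.Model('lm.binary')
    sentence = 'would you like to try our strudel for twenty five cents'
    print(model.score(sentence))  # total log10 probability of the sentence

    # per-word scores; oov=True would flag a word missing from the 655-word vocabulary
    for word, (logprob, ngram_len, oov) in zip(sentence.split(),
                                               model.full_scores(sentence)):
        print(word, round(logprob, 3), '(OOV)' if oov else '')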
(2) Building the trie:

    ./generate_trie models/alphabet.txt lm.binary corpus.txt trie
(in the trie-building step, alphabet.txt is the original file from the DeepSpeech release, lm.binary and corpus.txt are my own files from step (1), and trie is the newly generated file)
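Conceptually, the trie built in step (2) records which letter prefixes can still be completed into a vocabulary word, so the decoder can penalize character hypotheses that lead nowhere. A toy Python sketch of the idea (an illustration only, not the actual generate_trie file format):

    class TrieNode:
        def __init__(self):
            self.children = {}   # next letter -> child node
            self.is_word = False

    root = TrieNode()
    for word in ['try', 'our', 'strudel', 'twenty', 'five', 'cents']:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def is_live_prefix(prefix):
        """Can this letter prefix still be extended into a vocabulary word?"""
        node = root
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True

    print(is_live_prefix('stru'))   # True: 'strudel' lies below this prefix
    print(is_live_prefix('struo'))  # False: no vocabulary word starts this way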
(3) Run deepspeech (the wave file says 'would you like to try our strudel for twenty five cents?'):

(3.1) First, use my language model with DeepSpeech's original acoustic model (the .pb file):

    deepspeech models/output_graph.pb test13.wav models/alphabet.txt ./lm.binary ./trie
output:

    Loading model from file models/output_graph.pb
    Loaded model in 0.204s.
    Loading language model from files ./lm.binary ./trie
    Loaded language model in 0.004s.
    Running inference.
    would you like to trialastruodle for twenty five cents
    Inference took 5.162s for 4.057s audio file.
(3.2) Then, use everything from the DeepSpeech release:

    deepspeech models/output_graph.pb test13.wav models/alphabet.txt models/lm.binary models/trie
output:

    Loading model from file models/output_graph.pb
    Loaded model in 0.223s.
    Loading language model from files models/lm.binary models/trie
    Loaded language model in 1.092s.
    Running inference.
    would i like to trialastruodlefortwentyfvecents
    Inference took 5.141s for 4.057s audio file.
    (deepspeech-venv) jeremy@levono:~/DeepSpeech$
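For reference, the same runs can be scripted through the Python package that ships with the 0.1-era release. The constants below (26 MFCC features, context window of 9, beam width 500, and the three weights passed to enableDecoderWithLM) are assumptions recalled from that release's example client, not values verified here:

    from deepspeech.model import Model
    import scipy.io.wavfile as wav

    ds = Model('models/output_graph.pb', 26, 9, 'models/alphabet.txt', 500)
    # LM weight, word count weight, valid word count weight (assumed defaults)
    ds.enableDecoderWithLM('models/alphabet.txt', './lm.binary', './trie',
                           1.75, 1.00, 1.00)

    fs, audio = wav.read('test13.wav')
    print(ds.stt(audio, fs))  # prints the decoded transcript

Note the explicit LM weight: the language model is mixed into the beam search as a score, not used as a hard constraint on what the output may contain.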
Now compare the output of the two runs:

    would you like to trialastruodle for twenty five cents
    would i like to trialastruodlefortwentyfvecents
DeepSpeech seems to use the language model in a way that differs from the traditional one: a letter sequence such as 'trialastruodle' has only a rough similarity to the word sequence 'try our strudel', which the language model does contain. It seems that after the neural network generates letter sequences, the language model is applied as a second processing layer, which is why the two runs above give different results with different language models. My question is: why do these strange letter sequences still appear?
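The two runs suggest an answer in outline: the acoustic model emits characters, and the decoder only reweights hypotheses with the LM score rather than restricting them to vocabulary words. A minimal, hypothetical sketch of that kind of weighted scoring (the alpha/beta names follow the usual CTC-decoding convention, and all numbers are made up for illustration):

    def lm_log10(word, vocab={'try': -1.2, 'our': -1.5, 'strudel': -2.0}):
        # stand-in for a KenLM query; out-of-vocabulary words get a harsh penalty
        return vocab.get(word, -10.0)

    def total_score(text, acoustic_log10, alpha=1.75, beta=1.0):
        """Acoustic log-prob + alpha * LM log-prob + beta * word-count bonus."""
        words = text.split()
        return acoustic_log10 + alpha * sum(lm_log10(w) for w in words) + beta * len(words)

    # If the acoustic model is confident enough in the garbled characters,
    # the weighted LM penalty cannot overturn them:
    print(total_score('try our strudel', acoustic_log10=-16.0))  # -16 + 1.75*(-4.7) + 3 = -21.225
    print(total_score('trialastruodle', acoustic_log10=-3.0))    # -3 + 1.75*(-10.0) + 1 = -19.5

Under such a mix, a letter string the network strongly prefers can beat an in-vocabulary word sequence even though the LM treats it as a single, very unlikely out-of-vocabulary 'word', which matches the 'trialastruodle' outputs above.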
[This is an archived TTS discussion thread from discourse.mozilla.org/t/how-language-model-is-used-in-deepspeech]