Generate KenLM trie based language model #1237

kdavis-mozilla · 2018-02-15T10:46:09Z

No description provided.

kdavis-mozilla · 2018-05-28T16:20:05Z

Generated trie-based language models separately from three data sets: LibriSpeech, VoxForge, and English Wikipedia.

Rather than building a single trie model, the parameters defining the model were varied: for example, the n-gram depth, pruning, vocabulary estimate, array, quantization, and bit depth.
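As a rough illustration of what such a parameter sweep could look like, the sketch below only builds the command lines for KenLM's `lmplz` and `build_binary` tools over a small grid; it does not run them. The corpus path, output file names, and the specific values in the grid are hypothetical and are not the settings used for this issue. It assumes the standard KenLM options `-o` (n-gram order), `--prune` (count thresholds per order), `-q`/`-b` (quantization bits for probabilities and backoffs), and `-a` (maximum bits for array pointer compression).

```python
from itertools import product


def kenlm_commands(corpus, order, prune, quant_bits, array_bits):
    """Return (lmplz, build_binary) argument lists for one configuration.

    All file names here are hypothetical and chosen only for illustration.
    """
    arpa = f"lm_o{order}_p{prune}.arpa"
    binary = f"lm_o{order}_p{prune}_q{quant_bits}_a{array_bits}.binary"

    # Estimate an ARPA model of the given n-gram order.
    lmplz = ["lmplz", "-o", str(order), "--text", corpus, "--arpa", arpa]
    if prune > 0:
        # Unigrams are never pruned, so the first threshold is 0;
        # the last threshold applies to all remaining orders.
        lmplz += ["--prune", "0", str(prune)]

    # Convert the ARPA file into a quantized, pointer-compressed trie.
    build = [
        "build_binary",
        "-a", str(array_bits),
        "-q", str(quant_bits),
        "-b", str(quant_bits),
        "trie", arpa, binary,
    ]
    return lmplz, build


def sweep(corpus, orders, prunes, quant_bits, array_bits):
    """Build the command pairs for every combination in the grid."""
    grid = product(orders, prunes, quant_bits, array_bits)
    return [kenlm_commands(corpus, *combo) for combo in grid]


if __name__ == "__main__":
    for lmplz, build in sweep("corpus.txt", [3, 5], [0, 1], [8], [255]):
        print(" ".join(lmplz))
        print(" ".join(build))
```

Each resulting binary can then be compared on file size and on recognition accuracy with a fixed acoustic model.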

Across all of these variations, the tries produced from the English Wikipedia text, an 11 GB text, were always larger than the current 1.48 GB language model. They would not give a size win, so they were not pursued further.

For the LibriSpeech and VoxForge models, tests were run against the librivox test clean set, all with the same acoustic model, to determine which language model performed best. Results are here[1].
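The comment above does not say which metric was used for the comparison. Assuming it was word error rate (WER), the usual measure for this kind of test, a minimal sketch of scoring one transcript looks like the following. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the hat sat"))
```

Averaging this over the test set for each candidate language model, with the acoustic model held fixed, gives one number per model to compare.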

lock · 2019-01-03T00:52:51Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

kdavis-mozilla added the enhancement label Feb 15, 2018

kdavis-mozilla self-assigned this Feb 15, 2018

kdavis-mozilla closed this as completed May 28, 2018

This was referenced May 28, 2018

Generate publically releasable corpus to train the language model on #1244

Closed

Language model does not include apostrophe #955

Closed

lock bot locked and limited conversation to collaborators Jan 3, 2019
