Generate KenLM trie based language model #1237

Closed
kdavis-mozilla opened this issue Feb 15, 2018 · 2 comments
@kdavis-mozilla
Contributor

No description provided.

@kdavis-mozilla
Contributor Author

Generated trie-based language models separately from three data sets: LibriSpeech, VoxForge, and English Wikipedia.

Instead of creating a single trie model, the parameters defining the model were varied: for example, the n-gram depth, pruning, vocab estimate, array, quantization, and bit depth.
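A sweep of this kind could be enumerated with a short script. The following is only a sketch, not the procedure actually used for this issue: the helper name `sweep_commands`, the file naming scheme, and the parameter values are all hypothetical. It assumes KenLM's `lmplz` (`-o`, `--prune`, `--vocab_estimate`, `--text`, `--arpa`) and `build_binary` (`-a`, `-q`, `-b`, `trie`) command-line tools, and it only builds the argument lists without running anything.

```python
from itertools import product


def sweep_commands(corpus, orders, prunings, quant_bits, array_bits,
                   vocab_estimate=None):
    """Yield (lmplz_args, build_binary_args) for every parameter combination.

    Hypothetical helper: the output names and value grids are illustrative.
    A value of 0 for quant_bits or array_bits means "leave that option off".
    """
    for order, prune, q, a in product(orders, prunings, quant_bits, array_bits):
        tag = f"o{order}_p{'-'.join(map(str, prune))}_q{q}_a{a}"
        arpa = f"{tag}.arpa"

        # Estimate an ARPA model of the given n-gram order.
        lmplz = ["lmplz", "-o", str(order), "--text", corpus, "--arpa", arpa]
        if any(prune):
            # Count thresholds per order; only passed when pruning is requested.
            lmplz += ["--prune", *map(str, prune)]
        if vocab_estimate is not None:
            lmplz += ["--vocab_estimate", str(vocab_estimate)]

        # Convert the ARPA file to a trie binary.
        build = ["build_binary"]
        if a:
            build += ["-a", str(a)]  # array-compress pointers
        if q:
            # Assumption: same bit depth for probabilities (-q) and backoffs (-b).
            build += ["-q", str(q), "-b", str(q)]
        build += ["trie", arpa, f"{tag}.binary"]

        yield lmplz, build


if __name__ == "__main__":
    grid = sweep_commands(
        "corpus.txt",                 # hypothetical input text
        orders=[3, 5],
        prunings=[(0,), (0, 0, 1)],
        quant_bits=[0, 8],
        array_bits=[0, 255],
    )
    for lmplz, build in grid:
        print(" ".join(lmplz))
        print(" ".join(build))
```

Each printed pair can then be run by hand or passed to `subprocess.run`, and the resulting `.binary` file sizes compared.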

Across all of these variations, the tries built from the English Wikipedia text (11 GB) were always larger than the current 1.48 GB language model. They would not yield a size win, so they were not pursued further.

For the LibriSpeech and VoxForge models, tests were run against the librivox test clean set, all with the same acoustic model, to determine which language model performed best. Results are here[1].

@lock

lock bot commented Jan 3, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 3, 2019