Generate publicly releasable corpus to train the language model on #1244

kdavis-mozilla · 2018-02-15T11:09:53Z

Generate a text corpus on which the language model can be trained and which can be released under the current licensing.

kdavis-mozilla · 2018-05-28T16:24:06Z

In light of the test results in #1237, the corpus will be LibriSpeech's training data set.

kmonachopoulos · 2018-07-29T15:27:50Z

Hello,

Did you generate the full Mozilla vocab.txt? There is a LibriSpeech LM here: http://www.openslr.org/11/, but librispeech-lm-norm.txt.gz contains transcripts in capital letters. Can we rebuild the LM using this? Does the capitalization make a difference?

Thanks
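For anyone who wants to experiment with the LibriSpeech LM text linked above, here is a minimal sketch of one way to normalize its case before building an LM. It assumes the target alphabet is lowercase `a`-`z`, apostrophe and space (the `ALPHABET` set is an assumption; check it against your own alphabet file), and the names `normalize_lines` and `convert` are hypothetical helpers, not part of DeepSpeech. It only lowercases and filters text; it is not a fix for the drop in recognition quality reported in this thread.

```python
import gzip
from typing import Iterable, Iterator

# Assumed target alphabet (hypothetical): lowercase a-z, apostrophe and space.
ALPHABET = frozenset("abcdefghijklmnopqrstuvwxyz' ")


def normalize_lines(lines: Iterable[str]) -> Iterator[str]:
    """Lowercase each line, collapse whitespace, and skip empty lines
    or lines containing characters outside ALPHABET."""
    for line in lines:
        text = " ".join(line.lower().split())
        if text and set(text) <= ALPHABET:
            yield text


def convert(src_gz: str, dst_txt: str) -> int:
    """Stream a gzip-compressed corpus into a lowercase plain-text file.
    Returns the number of lines written."""
    written = 0
    with gzip.open(src_gz, "rt", encoding="utf-8") as fin, \
            open(dst_txt, "w", encoding="utf-8") as fout:
        for text in normalize_lines(fin):
            fout.write(text + "\n")
            written += 1
    return written


if __name__ == "__main__":
    import sys

    # Usage: python normalize_corpus.py librispeech-lm-norm.txt.gz corpus.txt
    count = convert(sys.argv[1], sys.argv[2])
    print(f"wrote {count} lines")
```

Dropping out-of-alphabet lines (rather than stripping the offending characters) is a conservative choice: it avoids silently merging words, at the cost of discarding some text.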

kdavis-mozilla · 2018-07-29T18:39:10Z

@kmonachopoulos I've tried building with that and the quality of recognition goes down.

lock · 2019-01-02T21:52:57Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

kdavis-mozilla added foundations server time labels Feb 15, 2018

kdavis-mozilla self-assigned this Feb 15, 2018

lissyx mentioned this issue Apr 23, 2018

Full size of Mozilla vocab.txt #1351

Closed

kdavis-mozilla closed this as completed May 28, 2018

lock bot locked and limited conversation to collaborators Jan 2, 2019
