Create language model with kenlm #620

JRMeyer · 2021-03-08T03:08:54Z

JRMeyer
Mar 8, 2021
Maintainer

>>> bem0302
[April 30, 2019, 3:25pm]

Hi guys, I'm creating a language model with KenLM, and I know that
KenLM uses an N-gram model. I have 2 questions about this:

1. When I build an *.arpa file from a text.txt file, do all the
sentences in text.txt need to be 3 to 5 words long to get the best
LM? I ask because my text has about 12000 sentences, and more than
80% of them are about 8-15 words long.

2. I'm using this command to build the *.arpa file:
./lmplz --text text.txt --arpa text2.arpa --o 5. Do I need to
change the value of the last param (currently 5) to some other value
like 3 or 4, based on my data as described above?

[This is an archived TTS discussion thread from discourse.mozilla.org/t/create-language-model-with-kenlm]

JRMeyer · 2021-03-08T03:08:57Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> bem0302
[May 2, 2019, 3:49am]

Can anyone help me, please?

[Archived Post]


JRMeyer · 2021-03-08T03:08:59Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> eggonlea
[May 2, 2019, 11:09pm]

1. I don't think sentence length is limited to 3-5 words.
2. It depends on your own requirements: whether you need a 5-gram or a
3/4-gram model. Without more background, others cannot answer the
question. You can try different values and see which one gives
you the best result/performance/resource tradeoff.
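
To illustrate point 1, here is a minimal Python sketch of how n-grams are read off a sentence with a sliding window. It is a simplification for illustration, not KenLM's actual counting code: the `ngrams` helper and the example sentence are hypothetical, and it assumes one `<s>` and one `</s>` marker around each sentence. The point is only that a sentence longer than N contributes several N-grams rather than being unusable.

```python
def ngrams(sentence, n):
    """Return all n-grams of order n from one sentence.

    Assumption: the sentence is whitespace-tokenized and wrapped in a
    single <s> ... </s> pair. This is a simplified picture, not KenLM's
    internal implementation.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    # Slide a window of width n over the tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical 10-word sentence, in the 8-15 word range from the question.
sentence = "the quick brown fox jumps over the lazy dog today"

for n in (3, 4, 5):
    print(f"order {n}: {len(ngrams(sentence, n))} n-grams")
```

Under these assumptions, a 10-word sentence yields more windows than a 3-word one at every order, so longer sentences give the model more context to count, not less.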

[Archived Post]


JRMeyer · 2021-03-08T03:09:02Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> bem0302
[May 3, 2019, 1:30am]

So the value 5 in the command
./lmplz --text text.txt --arpa text2.arpa --o 5 is the N in the N-gram LM?

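
For trying several orders as suggested earlier in the thread, a small Python sketch like the following could generate one lmplz command per order. It assumes the order is passed with KenLM's `-o` (`--order`) option alongside the `--text` and `--arpa` options from the command above; the helper names and the output file naming scheme are hypothetical.

```python
import subprocess


def lmplz_command(order, text_path, arpa_path, lmplz="./lmplz"):
    """Build the argument list for one lmplz run.

    Assumption: -o sets the n-gram order; --text and --arpa are the
    input and output paths, as in the command quoted in this thread.
    """
    return [lmplz, "-o", str(order), "--text", text_path, "--arpa", arpa_path]


def build_models(orders, text_path, lmplz="./lmplz"):
    """Run lmplz once per order; returns {order: arpa_path}.

    The output names (text.o3.arpa, ...) are a made-up convention.
    """
    paths = {}
    for order in orders:
        arpa_path = f"text.o{order}.arpa"
        subprocess.run(lmplz_command(order, text_path, arpa_path, lmplz), check=True)
        paths[order] = arpa_path
    return paths


# Only print the commands here; call build_models(...) to actually run them.
for order in (3, 4, 5):
    print(" ".join(lmplz_command(order, "text.txt", f"text.o{order}.arpa")))
```

Each resulting model can then be compared on held-out text or in the downstream task to pick the order with the best tradeoff.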
[Archived Post]

