Adds support for the use of external dictionaries for segmenters backed by lindera #326

Draft
wants to merge 5 commits into
base: main


PedroTurik
Contributor

Pull Request

Related issue

Fixes #322

What does this PR do?

  • This PR adds the korean-segmentation-external and japanese-segmentation-external features, which let the user decouple the download of the Japanese and Korean dictionaries from the compilation process and instead configure the path to already-downloaded, lindera-compatible dictionaries through the MEILISEARCH_JAPANESE_EXTERNAL_DICTIONARY and MEILISEARCH_KOREAN_EXTERNAL_DICTIONARY env vars.
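To make the configuration step concrete, here is a minimal sketch of how a dictionary path could be resolved from those env vars. The env var names come from the description above; the helper functions are hypothetical, the lookup is assumed to happen at runtime, and the dictionary is assumed to be a directory on disk. None of this is confirmed to match the PR's actual implementation.

```rust
use std::env;
use std::path::PathBuf;

// Env var names as given in the PR description.
const JAPANESE_DICT_VAR: &str = "MEILISEARCH_JAPANESE_EXTERNAL_DICTIONARY";
const KOREAN_DICT_VAR: &str = "MEILISEARCH_KOREAN_EXTERNAL_DICTIONARY";

/// Hypothetical helper (not part of the PR or of lindera): checks that the
/// configured value exists and points to a directory. Assumes a built
/// dictionary is stored as a directory.
fn validate_dictionary_path(var: &str, value: Option<String>) -> Result<PathBuf, String> {
    let raw = value.ok_or_else(|| format!("{var} is not set or is not valid Unicode"))?;
    let path = PathBuf::from(raw);
    if path.is_dir() {
        Ok(path)
    } else {
        Err(format!("{var} does not point to a directory: {}", path.display()))
    }
}

/// Hypothetical helper: reads the env var and validates its value.
fn external_dictionary_path(var: &str) -> Result<PathBuf, String> {
    validate_dictionary_path(var, env::var(var).ok())
}
```

A caller would pass JAPANESE_DICT_VAR or KOREAN_DICT_VAR to external_dictionary_path and surface the error message to the user if the path is missing or invalid. Splitting validation from the env lookup keeps the check testable without touching the process environment.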

This PR is not finished. Since we can't control which dictionary the user will use, we can't be sure of the segmentation results, so activating the features disables the segmentation tests for now. Another thing worth mentioning: for lindera to use an external dictionary, you need to generate it through the lindera CLI. The process isn't exactly obvious and needs to be documented. It's described here
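One way to disable the segmentation tests when a feature is active is conditional compilation. The sketch below assumes cfg-based gating on the feature name from this PR; the segment function and the test module are hypothetical stand-ins, and the PR may gate its tests differently.

```rust
/// Hypothetical stand-in for a dictionary-backed segmenter: it only splits
/// on whitespace so the sketch stays self-contained.
fn segment(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

// Compiled only for test builds where the external-dictionary feature is
// off, because expected tokens depend on the dictionary that is in use.
#[cfg(all(test, not(feature = "japanese-segmentation-external")))]
mod segmentation_tests {
    use super::segment;

    #[test]
    fn segments_with_known_dictionary() {
        assert_eq!(segment("foo bar"), vec!["foo", "bar"]);
    }
}
```

With this layout, running the tests with the feature enabled simply skips the module instead of failing on dictionary-specific expectations.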

@slatian

slatian commented Feb 4, 2025

Thanks for working on this!

If writing documentation is currently holding this back, I'd help out with that.

@ManyTheFish
Member

Hey @PedroTurik,
Let me know when the work is done and when the CI passes.
I'll review your PR!

@PedroTurik
Contributor Author

Hey @slatian

Yes, you are correct. Where do you think I should document the process to use this feature?

Also, thanks @ManyTheFish. I will fix the CI error and open the PR for review, as you said.

@slatian

slatian commented Feb 6, 2025

> Where do you think I should document the process to use this feature?

As a user of the crate, I'd expect to find this alongside the rest of the documentation on docs.rs, so I'd document it in a module.

Depending on the length, either a documentation module or a section beneath the "Build features" section in the main module.

But @ManyTheFish will probably have an opinion on that too.

Successfully merging this pull request may close these issues.

Allow loading the lindera dictionaries at runtime instead of compiling them in