add khmer segmenter #203
hello @ManyTheFish, how do I test this with meilisearch on my local machine? |
Hello @xshadowlegendx, don't hesitate to pull the main branch of Meilisearch and update the milli/Cargo.toml file so that it links to your branch instead of a fixed version of Charabia. This way you will be able to run Meilisearch with your changes! |
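As a rough sketch of the suggestion above, the Charabia entry in milli/Cargo.toml could be switched from a published version to a Git or path dependency. The fork URL, branch name, and relative path below are placeholders, not values taken from this thread; any feature flags already set on the existing entry should be kept.

```toml
# milli/Cargo.toml (fragment, illustrative only)
[dependencies]
# Option A: build against a branch of your fork.
# The URL and branch name are placeholders; replace them with your own.
charabia = { git = "https://github.com/your-username/charabia", branch = "your-branch" }

# Option B: build against a local checkout instead.
# The relative path is a placeholder and depends on where you cloned Charabia.
# charabia = { path = "../../charabia/charabia" }
```

Only one of the two entries should be active at a time; after changing it, rebuilding Meilisearch picks up the modified tokenizer.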
@ManyTheFish ok thanks will try it |
Hello @xshadowlegendx, did you manage to test the Khmer segmenter with Meilisearch? |
hello @ManyTheFish, sorry for the late reply. Currently I am still testing the khmer segmenter with meilisearch, so I am able to compile and run it with local |
@xshadowlegendx, a test you could try is to push a document containing I hope it helps! |
ok thanks @ManyTheFish let me try to test it again |
hello @ManyTheFish , so I tried pushing document that contains
This screenshot suggests that, without the Khmer segmenter, Meilisearch is also able to pick up the word, but it also matches input that is not a word. So I tried testing the Khmer segmenter by cloning Meilisearch to my local machine and pointing the Charabia dependency at my local checkout. Below are the changes I made to my local Meilisearch to test with the Khmer segmenter:
|
hello @xshadowlegendx,
I asked for small changes in your PR,
- Do these changes enhance your search results?
- Do you see some search results that are missing?
See you!
Could you add some sentences to the benchmarks, please? See charabia/charabia/benches/bench.rs, lines 7 to 28 in 366417d.
|
hello @ManyTheFish, sorry for the late answers to your questions:
After those changes on my local setup, it now shows some differences: the left one is running with Docker and the right one is running the local development build. So I tried searching the word |
Co-authored-by: Many the fish <[email protected]>
…bia into add-khmer-segmenter
Hello @xshadowlegendx, Meilisearch is a prefix search engine, so it's expected that if you type the start of a word, Meilisearch finds all the documents containing a word starting with the query. |
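To illustrate the prefix behaviour described above, here is a toy sketch in Rust. It is not how Meilisearch is implemented; `prefix_search` is a hypothetical helper, and splitting on whitespace is only a stand-in for real segmentation (Khmer text is written without spaces between words, which is why a dedicated segmenter is needed in the first place).

```rust
/// Toy prefix search: returns the indices of the documents that contain
/// at least one whitespace-separated word starting with `query`.
/// Hypothetical helper for illustration, not a Meilisearch API.
fn prefix_search(docs: &[&str], query: &str) -> Vec<usize> {
    // Compare case-insensitively.
    let query = query.to_lowercase();
    docs.iter()
        .enumerate()
        .filter(|(_, doc)| {
            // A document matches if any of its words begins with the query.
            doc.split_whitespace()
                .any(|word| word.to_lowercase().starts_with(&query))
        })
        .map(|(index, _)| index)
        .collect()
}

fn main() {
    let docs = ["the segmenter works", "segment the text", "no match here"];
    // "seg" is a prefix of "segmenter" and of "segment".
    let hits = prefix_search(&docs, "seg");
    println!("{:?}", hits);
}
```

With this behaviour, typing only the beginning of a word returns every document containing a word that starts with it, so partial input that is not itself a word can still produce hits.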
hello @ManyTheFish, yes, that is the case, so it is expected behavior then. Next, I will test some documents that are closer to real-world data to see how the segmenter performs. |
Hello @xshadowlegendx, any news on this PR? Do you feel comfortable enough with your tests to merge it? Thanks! |
hello @ManyTheFish, sorry for the late replies. Weeks ago I tested with about 253 movies from my workplace's production data. The search is very fast: movies can be found by title, actor name, and genre, and typo tolerance also works after updating the default, since Khmer words are shorter. So I think it is ready to be merged. But the current |
Hello @xshadowlegendx |
@xshadowlegendx |
hello @curquiza, the
should I apply the suggested fix? |
@ManyTheFish can answer this |
hello @xshadowlegendx, Thanks! |
bors try |
Looks good to me,
thank you @xshadowlegendx for your work!
bors merge
try
Build succeeded: |
Build succeeded: |
Pull Request
Related issue
Fixes #200
What does this PR do?
khmer language
PR checklist
Please check if your PR fulfills the following requirements:
Thank you so much for contributing to Meilisearch!