fix: Segment number into word instead of chars (#271) #311

Merged (1 commit) on Oct 14, 2024

@dqkqd (Contributor) commented Oct 6, 2024

Pull Request

Related issue

Fixes #271

What does this PR do?

  • In AhoSegmentedStrIter, try to return a Match when a string can be represented as a number, so that numbers are segmented as whole words instead of individual characters
  • Add numbers to the existing test cases
  • Add two more test cases for segmenting and tokenizing numbers: segmenter_segment_number and tokenize_number
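
The idea behind the first point can be sketched in isolation. The snippet below is only an illustration and not the code from this PR: segment_keeping_numbers is a hypothetical helper, it only considers ASCII digits, and it stands in for a segmenter that otherwise falls back to emitting one character at a time.

```rust
/// Hypothetical helper, not charabia's API: splits `text` so that each
/// maximal run of ASCII digits becomes one segment, while every other
/// character is emitted on its own (a stand-in for a char-by-char fallback).
fn segment_keeping_numbers(text: &str) -> Vec<&str> {
    let mut segments = Vec::new();
    let mut chars = text.char_indices().peekable();

    while let Some((start, c)) = chars.next() {
        let mut end = start + c.len_utf8();
        if c.is_ascii_digit() {
            // Extend the segment while the next character is still a digit.
            while let Some(&(idx, next)) = chars.peek() {
                if !next.is_ascii_digit() {
                    break;
                }
                end = idx + next.len_utf8();
                chars.next();
            }
        }
        // `start` and `end` always fall on char boundaries, so slicing is safe.
        segments.push(&text[start..end]);
    }
    segments
}
```

With this sketch, segment_keeping_numbers("第123回") yields the three segments "第", "123" and "回", rather than splitting the number into "1", "2", "3".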

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

@ManyTheFish (Member)

Hello @dqkqd, thank you for the work!
LGTM. We may want to go further by categorizing these tokens as TokenKind::Number in the Classifier, but that would be better done in another PR.
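
As a rough illustration of that follow-up idea, the sketch below tags a segment as a number when it consists only of ASCII digits. Everything here is an assumption made for the example: SketchKind and classify_segment are hypothetical names and do not reflect charabia's actual TokenKind or Classifier.

```rust
/// Hypothetical stand-in for a token category; not charabia's TokenKind.
#[derive(Debug, PartialEq)]
enum SketchKind {
    Number,
    Word,
}

/// Hypothetical classifier step: a non-empty segment made only of ASCII
/// digits is tagged as a number, anything else is treated as a word.
fn classify_segment(segment: &str) -> SketchKind {
    if !segment.is_empty() && segment.chars().all(|c| c.is_ascii_digit()) {
        SketchKind::Number
    } else {
        SketchKind::Word
    }
}
```

A real classifier would likely also need to decide how to treat separators, signs, and non-ASCII digits, which this sketch deliberately ignores.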

@ManyTheFish (Member)

bors merge

meili-bors bot (Contributor) commented Oct 14, 2024

Build succeeded:

@meili-bors bot merged commit 1b48ada into meilisearch:main on Oct 14, 2024
4 checks passed
Development

Successfully merging this pull request may close these issues.

Numbers are not segmented the same way depending on the Script/Language