Accent support for ObjectFilters.text #702

Closed
Doc1faux opened this issue Mar 4, 2022 · 3 comments

Comments

@Doc1faux

Doc1faux commented Mar 4, 2022

Hi @anidotnet :)

As in #144, I've run into an issue with accents, this time when searching text, e.g. a user's first name.
For instance, my own first name has an accent on the first 'e' (Sébastien). When I search for it in a user collection in a LIKE-style way with ObjectFilters.text("firstname", "*se*"), it is not returned (with a "*sé*" search it is, obviously ;)).
I already added an index on the searched field with @Indices({ @Index(value = "firstname", type = IndexType.Fulltext) }), so I suppose the only fix would be to add a Collator parameter to the ObjectFilters.text method as well?

@Doc1faux
Author

Doc1faux commented Mar 4, 2022

Diving into the code, I've just found the TextTokenizer and TextIndexingService classes and, according to your documentation, a way to pass them at database creation, so that should do the job. I'll test it :)

@anidotnet
Contributor

Has it resolved your issue?

@Doc1faux
Author

Doc1faux commented Mar 4, 2022

Unfortunately, TextTokenizer doesn't help, as it is only a list of stop words.
TextIndexingService could have helped, but a known and stable indexing service for Android, like the Apache Lucene you mentioned in the documentation, does not seem to exist :/
I took a look at the Collator class for this specific case, but "se" and "sé" are still different strings for it, which is expected for a sorting feature.
The only solution I've found is to add a normalized firstname field to the collection, set on document insertion with Normalizer.normalize(firstname, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "") and used only for searching. The search term is normalized the same way, obviously.
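
For anyone landing here, a minimal sketch of that normalization step in isolation. The class name AccentFolding and the method fold are hypothetical names chosen for illustration and are not part of Nitrite; only the Normalizer call and the regex come from the comment above.

```java
import java.text.Normalizer;

public class AccentFolding {
    // Hypothetical helper (not a Nitrite API): strips accents so that
    // "Sébastien" and "Sebastien" produce the same searchable value.
    public static String fold(String text) {
        // NFD decomposes "é" (U+00E9) into "e" followed by U+0301 (combining acute accent).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Remove every non-ASCII character, which includes the combining marks.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // Value to store in the extra, search-only field at insertion time.
        String stored = fold("S\u00e9bastien"); // "Sébastien"
        // The search term goes through the same function before querying.
        String term = fold("s\u00e9");          // "sé"
        System.out.println(stored + " / " + term);
    }
}
```

One caveat with this regex: characters that have no canonical decomposition, such as "ø", are not replaced by a base letter but dropped entirely, so it suits names whose non-ASCII characters are accented Latin letters.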

@anidotnet closed this as not planned on Aug 5, 2023