respect_word_boundaries: true breaks when first character of the search term is non-ASCII #1916
Comments
For now, I'm reverting the default behavior of respect_word_boundaries to false. This will work the same as it did before the new feature was introduced. I do think that we need much better Unicode support in general, which will be a bigger fix. Good catch! |
This addresses a bad default that causes issues with non-ASCII characters in Sifter, and defaults to headless Chrome instead of the unmaintained PhantomJS for unit testing.
Hi guys,
|
Any chance that this change, setting the default value of respect_word_boundaries to false, will be part of a release? |
Found a related issue here with a solution to set
|
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days |
Issues don't magically fix themselves, do they? (That's reaction to the bot.) |
This problem also occurs when searching for Chinese. There are
|
The example from @heyyo-droid also breaks with the English dash. |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days |
Bot, this issue is still relevant |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days |
An option whose string begins with a non-ASCII (Unicode) character cannot be found by searching for that character.
Steps to reproduce:
TL;DR Define two options, like "Čápkova" and "Ečerova", and then search for "č" or "Č" with respect_word_boundaries enabled (default).
Expected result:
Only option "Čápkova" should be listed (there is a match on the first letter, i.e. word boundary).
Actual result:
Only option "Ečerova" is listed - presumably because a non-ASCII character is not treated as a word character, so no word boundary is detected in front of it?!
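The behavior can be reproduced without Selectize using a plain regular expression. This is only a minimal sketch: it assumes the search term is compiled into a case-insensitive pattern prefixed with \b, which is a simplification and not Sifter's actual code. The names options, pattern and matches are made up for the illustration.

```javascript
// Hypothetical stand-in for the search: "\b" + term, case-insensitive.
// This simplifies what a word-boundary search does; it is not Sifter's code.
const options = ["Čápkova", "Ečerova"];
const pattern = new RegExp("\\b" + "č", "i");

// \b only sees [A-Za-z0-9_] as word characters, so there is no boundary
// in front of the leading "Č", but there is one between "E" and "č".
const matches = options.filter((option) => pattern.test(option));

console.log(matches);
```

Under that assumption only "Ečerova" passes the filter, which matches the actual result reported here.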
As far as I can tell, this is caused by \b added in Sifter for respect_word_boundaries: true. This looks like a problem with the \b definition, so Unicode-aware word boundary detection needs some other trick. This attempt at regex101.com seems to confirm that.
SO seems to somewhat agree with this diagnosis:
https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters
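One possible "other trick" is to replace \b with a negative lookbehind built from Unicode property escapes. The sketch below is an illustration only, not a patch for Sifter: escapeRegExp and wordStartRegex are hypothetical helper names, and the approach assumes an engine with ES2018 lookbehind and \p{...} support, which older browsers may lack.

```javascript
// Escape regex syntax characters so the term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match `term` only where it is NOT preceded by a
// Unicode letter, a Unicode digit or an underscore, i.e. at a word start.
// The "u" flag enables \p{...}; the "i" flag keeps the search case-insensitive.
function wordStartRegex(term) {
  return new RegExp("(?<![\\p{L}\\p{N}_])" + escapeRegExp(term), "iu");
}

const options = ["Čápkova", "Ečerova"];
const matches = options.filter((option) => wordStartRegex("č").test(option));

console.log(matches);
```

With this pattern only "Čápkova" passes the filter, which is the expected result from the report. The design choice is to define "word start" by what precedes the term rather than by \b, because \b is tied to the ASCII-only \w class in JavaScript.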