respect_word_boundaries: true breaks when first character of the search term is non-ASCII #1916

spacekpe · 2022-11-20T14:27:09Z

I did:

Search for if my issue has already been submitted
Make sure I'm reporting something precise that needs to be fixed
Give my issue a descriptive and concise title
Create a minimal working example on JsFiddle or Codepen
(or gave a link to a demo on the Selectize docs)
Indicate precise steps to reproduce in numbers and the result,
like below

Non-ASCII/Unicode character at the beginning of an option string cannot be looked up using search.

Steps to reproduce:

Use code from https://jsfiddle.net/w9gecnyo/4/
Search for one of the two Unicode characters: "č" or "Č"

TL;DR Define two options, like "Čápkova" and "Ečerova", and then search for "č" or "Č" with respect_word_boundaries enabled (default).

Expected result:
Only option "Čápkova" should be listed (there is a match on the first letter, i.e. word boundary).

Actual result:
Only option "Ečerova" is listed - presumably because a non-ASCII character does not act as a word boundary?!

As far as I can tell, this is caused by the \b that Sifter adds for respect_word_boundaries: true. This looks like a problem with the definition of \b, so Unicode-aware word boundary detection needs some other trick.

This attempt at regex101.com seems to confirm that:

SO seems to somewhat agree with this diagnosis:
https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters
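The behaviour described in this report can be reproduced with plain JavaScript regular expressions, without Selectize or Sifter. This is a minimal sketch that assumes the search pattern is simply \b followed by the search term, as the diagnosis above suggests; the variable names are illustrative only.

```javascript
// In JavaScript, \b is defined in terms of \w, which covers only [A-Za-z0-9_].
// "Č" is therefore a non-word character, and there is no boundary between
// the start of the string and "Č".
const options = ["Čápkova", "Ečerova"];

// Assumed shape of the generated pattern: \b + search term, case-insensitive.
const asciiBoundary = /\bč/i;

const matches = options.filter((option) => asciiBoundary.test(option));
console.log(matches); // ["Ečerova"]
```

In "Ečerova" the ASCII letter "E" is a word character and "č" is not, so \b matches between them, which is why the wrong option is listed.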


risadams · 2022-11-22T13:57:44Z

For now, I'm reverting the default behavior of respect_word_boundaries to false. This will work the same as it did prior to the introduction of the new feature. I do think that we need much better Unicode support in general, which will be a bigger fix.

Good catch!

This address a bad default causing issues with non-ascii chars in sifter, and default to headless chrome instead of the unmaintained phantomJS for unit testing

heyyo-droid · 2022-12-28T13:18:46Z

Hi guys,
I think I'm facing the same issue. But in my case, searching for a Hebrew letter doesn't return anything.

For example, searching for: ש
English letters are OK.

https://jsfiddle.net/sw9Lkcdy/4/

heyyo-droid · 2023-02-07T09:30:09Z

Any chance that setting the default value of respect_word_boundaries to false will be part of a release?
We are using the library from npmjs, which doesn't provide a dev version.
https://www.npmjs.com/package/@selectize/selectize

AndersFreund · 2023-02-15T15:45:34Z

Found a related issue here with a solution that sets respect_word_boundaries.
The code below sets respect_word_boundaries to false by default.
It fixed the problem for me. Please let us know if a more elegant solution exists.

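// Keep a reference to the original method so it can still be called.
var getSearchOptions = Selectize.prototype.getSearchOptions;
// Wrap it: build the usual search options, then force word boundaries off.
Selectize.prototype.getSearchOptions = function () {
	var options = getSearchOptions.apply(this, arguments);
	options.respect_word_boundaries = false;
	return options;
};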
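To show what the wrapper does in isolation, here is a sketch of the same pattern applied to a stand-in constructor so that it runs without Selectize or jQuery. Widget and its settings are hypothetical and only mimic the shape of the method being wrapped; they are not the real Selectize implementation.

```javascript
// Hypothetical stand-in for Selectize, used only to make the sketch runnable.
function Widget(settings) {
  this.settings = settings;
}
Widget.prototype.getSearchOptions = function () {
  return { fields: this.settings.searchField, respect_word_boundaries: true };
};

// Same wrapping pattern as the workaround: call the original, override one key.
var originalGetSearchOptions = Widget.prototype.getSearchOptions;
Widget.prototype.getSearchOptions = function () {
  var options = originalGetSearchOptions.apply(this, arguments);
  options.respect_word_boundaries = false;
  return options;
};

var searchOptions = new Widget({ searchField: ["text"] }).getSearchOptions();
console.log(searchOptions);
```

Every other search option is passed through unchanged; only the one flag is overridden, and it is overridden for all instances because the patch is applied to the prototype.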

github-actions · 2023-06-16T02:30:32Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

pspacek · 2023-06-20T16:09:17Z

Issues don't magically fix themselves, do they? (That's a reaction to the bot.)

big-dream · 2023-10-10T07:08:27Z

This problem also occurs when searching for Chinese characters. The options contain 一二三, but that option cannot be found by typing 一.
Example: https://codepen.io/big-dream-the-solid/pen/poqGWrB

My solution for this problem is to use an older version, such as 4.6.9.

rcuhljr · 2023-11-06T16:25:04Z

The example from @heyyo-droid also breaks with the English dash character (-): if you have an item like "Item - 3", it is filtered out as soon as you type a dash.
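The dash case follows from the same rule, since "-" is not a word character either. A small sketch, again assuming the generated pattern is \b followed by the search term:

```javascript
const label = "Item - 3";

// The dash sits between two spaces, so both neighbours are non-word
// characters and \b cannot match directly in front of it.
const dashWithBoundary = /\b-/.test(label);
const dashWithoutBoundary = /-/.test(label);

// An ASCII word at the start of the label still works as expected.
const wordWithBoundary = /\bItem/.test(label);

console.log(dashWithBoundary, dashWithoutBoundary, wordWithBoundary);
```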

github-actions · 2024-03-06T02:01:07Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

spacekpe · 2024-03-07T13:23:14Z

Bot, this issue is still relevant

github-actions · 2024-07-06T02:10:11Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

bplace · 2024-07-10T08:41:05Z

Hi, can't we just activate Unicode support in the regular expression?

See the initial example with the u flag activated:
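One caveat when trying this in JavaScript specifically: the u flag enables Unicode property escapes such as \p{L}, but \b and \w stay ASCII-based even with it. The sketch below shows one possible direction, replacing \b with a negative lookbehind that checks for a preceding letter, digit, or underscore. The helpers escapeRegex and startsWord are hypothetical names introduced here, not part of Sifter or Selectize, and the sketch assumes an engine that supports lookbehind assertions.

```javascript
const options = ["Čápkova", "Ečerova", "Item - 3", "一二三"];

// Hypothetical helper: escape regex syntax characters in the search term.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match the term only where it is NOT preceded by a
// Unicode letter, a Unicode digit, or an underscore.
function startsWord(term) {
  return new RegExp("(?<![\\p{L}\\p{N}_])" + escapeRegex(term), "iu");
}

function search(term) {
  return options.filter((option) => startsWord(term).test(option));
}

// With the u flag alone, \b behaves as before for this input.
const withUFlagOnly = options.filter((option) => /\bč/iu.test(option));

console.log(withUFlagOnly); // ["Ečerova"]
console.log(search("č"));   // ["Čápkova"]
console.log(search("-"));   // ["Item - 3"]
console.log(search("一"));  // ["一二三"]
```

A side effect of this design is that scripts written without spaces between words, such as Chinese, only match at the start of a run of letters, so typing 二 would not find 一二三. Whether that is acceptable is a separate decision.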

github-actions · 2024-11-08T02:34:52Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

risadams added a commit that referenced this issue Nov 22, 2022

Address #1916, Remove PhantomJs

74cf7fb


github-actions bot added the no-issue-activity label Jun 16, 2023

github-actions bot removed the no-issue-activity label Jun 21, 2023

github-actions bot added the no-issue-activity label Mar 6, 2024

github-actions bot removed the no-issue-activity label Mar 8, 2024

github-actions bot added the no-issue-activity label Jul 6, 2024

github-actions bot removed the no-issue-activity label Jul 11, 2024

github-actions bot added the no-issue-activity label Nov 8, 2024

github-actions bot closed this as not planned Nov 23, 2024
