fuzzy file matching is too fuzzy #43
Comments
That's interesting. Why does fuzzaldrin fail? Will the new version solve it? The full path clearly scores higher in your "Best Result" mark. Zoomed in, it's less clear. All in all, I believe it's a matter of preference for file name or full path.
@jeancroy the problem is that the algorithm (and, it sounds like, your new optimizations) does not take into account word boundaries or match length. That's very important for fuzzy matching in my experience. See the example here: https://github.com/garybernhardt/selecta#theory-of-operation Basically, the length of the match of
By taking into account word boundaries and minimum match length, app/models/user is clearly better.
By the way, instead of considering base name vs full path, you might consider giving preference to shorter paths and to matches closer to the end of the path. I'm not sure how selecta and ctrl-p-cmatcher do it, but they do it well.
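To make the suggestion above concrete, here is a minimal sketch of ranking by match length, word boundaries, and path length. It is not fuzzaldrin's or selecta's actual code: the function names, the separator set, the query "amu", and the two example paths are all made up for illustration, and the ordering of the criteria (shortest span first, then boundary hits, then shorter path) is just one possible choice. It assumes plain ASCII paths.

```python
from typing import List, Optional, Tuple

# Hypothetical set of characters that separate "words" in a path.
SEPARATORS = "/\\_-. "


def is_boundary(path: str, i: int) -> bool:
    """True when path[i] starts a word: first char, after a separator, or a camelCase hump."""
    if i == 0:
        return True
    prev, cur = path[i - 1], path[i]
    return prev in SEPARATORS or (prev.islower() and cur.isupper())


def match_key(query: str, path: str) -> Optional[Tuple[int, int, int]]:
    """Return a sort key (span length, -boundary hits, path length); smaller is better.

    Returns None when the query characters do not appear in order in the path.
    """
    q, p = query.lower(), path.lower()
    if not q:
        return (0, 0, len(path))
    best = None
    for start in range(len(p)):
        if p[start] != q[0]:
            continue
        # From this start, greedily take the earliest occurrence of each next
        # character; that gives the shortest span beginning at `start`.
        pos = start
        hits = int(is_boundary(path, start))
        for ch in q[1:]:
            pos = p.find(ch, pos + 1)
            if pos == -1:
                # A later start only has less text to search, so stop here.
                return best
            hits += int(is_boundary(path, pos))
        key = (pos - start + 1, -hits, len(path))
        if best is None or key < best:
            best = key
    return best


def rank(query: str, paths: List[str]) -> List[str]:
    """Sort matching paths from best to worst; non-matching paths are dropped."""
    scored = [(match_key(query, p), p) for p in paths]
    return [p for _, p in sorted((k, p) for k, p in scored if k is not None)]


if __name__ == "__main__":
    candidates = ["app/admin/moderator_columns.rb", "app/models/user.rb"]
    for path in rank("amu", candidates):
        print(match_key("amu", path), path)
```

With these made-up inputs, the shorter span in the second path puts it ahead of the first, and path length only matters as a tiebreaker when span and boundary hits are equal.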
@aaronjensen added a test; it passed without modification 👍
awesome! hope to see it get merged in soon :) Thanks.
Because you were kind enough to share your idea, I'll develop a bit more. See that use case? Another one: Those expectations kind of put haystack size in the role of tiebreaker. However, in this case "moderator" is almost "model". Also, my script would have attached the "u" to "user" instead of "columns". Finally, both match a landmark character (word boundary) at "m" and "u". So I cannot guarantee that a large string will be scored like garbage (because it's not garbage), and a tiebreaker is exactly what you need. Another interesting fact is that I find consecutive characters very intuitive and use them to resolve otherwise contradictory cases. How do I know the lowercase "i" of "itc" should prefer the uppercase "I" of "Importance", while the lowercase "d" of "diag" should reject the uppercase "D" of "Diagonal" in favor of "diagonal"? Well, "diag" scores consecutive points in the actual word, while "itc" scores consecutive points in the acronym of the word! Where the consecutive characters are controls the affinity for exact case vs. acronym camelCase. It also ensures that if a large string accidentally matches a landmark that is not part of a sequence requested by the query, we still get a garbage-like score.
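One rough way to read the "where the consecutive characters are" idea is sketched below. This is an illustration only, not the code under discussion: the helper names, the weights, and the candidate "ImportanceTableCtrl" are invented, and a "run" is simplified to the longest prefix of the query found contiguously. A run found in the acronym is scored without regard to case, while a run found in the word itself earns extra points for exact case.

```python
SEPARATORS = "/\\_-. "


def acronym(text: str) -> str:
    """Lowercased first letters of each word (after a separator or at a camelCase hump)."""
    letters = []
    for i, ch in enumerate(text):
        if ch in SEPARATORS:
            continue
        starts_word = (
            i == 0
            or text[i - 1] in SEPARATORS
            or (text[i - 1].islower() and ch.isupper())
        )
        if starts_word:
            letters.append(ch.lower())
    return "".join(letters)


def longest_run(query: str, text: str) -> int:
    """Length of the longest prefix of `query` that occurs contiguously in `text`."""
    for k in range(len(query), 0, -1):
        if query[:k] in text:
            return k
    return 0


def score(query: str, candidate: str) -> int:
    """Higher is better. Weights are arbitrary, chosen only for illustration."""
    q = query.lower()
    # Consecutive hits in the acronym: case is ignored on purpose.
    acronym_points = 2 * longest_run(q, acronym(candidate))
    # Consecutive hits in the word itself: one point per character,
    # plus one more per character when the case matches exactly.
    word_points = longest_run(q, candidate.lower()) + longest_run(query, candidate)
    return max(acronym_points, word_points)


if __name__ == "__main__":
    for query, candidate in [
        ("itc", "ImportanceTableCtrl"),
        ("diag", "diagonal"),
        ("diag", "Diagonal"),
    ]:
        print(query, candidate, score(query, candidate))
```

Under these assumptions, "itc" is satisfied by the acronym of the invented "ImportanceTableCtrl" despite the case difference, while "diag" scores higher on "diagonal" than on "Diagonal" because its run sits inside the word, where exact case counts.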
Looks like improvements are currently under development in atom/fuzzaldrin#22