ref(pattern): Replace regex based matching #4073
Looks great. The very thorough comments definitely help.
Co-authored-by: Sebastian Zivota <[email protected]>
relay-pattern/src/wildmatch.rs
Outdated
struct CaseInsensitive;

impl Matcher for CaseInsensitive {
    #[inline(always)]
Out of curiosity, does #[inline(always)] actually bring a performance improvement in the places where you added it?
https://std-dev-guide.rust-lang.org/policy/inline.html#what-about-inlinealways
I didn't, and usually I would just rely on the compiler (which is still allowed to ignore an inline(always)). I added it here to hopefully force more aggressive monomorphization, which hopefully leads to better codegen. It's specifically also added on internal helpers.
A lot of 'hopefully's in there, but it should (another subjunctive) only make it better, not worse.
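To illustrate the idea behind the reply above, here is a minimal sketch of how zero-sized strategy types combined with #[inline(always)] can give the compiler one specialized copy of a generic helper per strategy. The trait method `eq_char` and the helper `common_prefix_len` are hypothetical names for this example; the real `Matcher` trait in relay-pattern is not shown in this conversation and may look quite different.

```rust
/// Hypothetical stand-in for the `Matcher` trait from the diff.
/// The method name `eq_char` is an assumption made for this sketch.
trait Matcher {
    fn eq_char(a: char, b: char) -> bool;
}

struct CaseSensitive;
struct CaseInsensitive;

impl Matcher for CaseSensitive {
    #[inline(always)]
    fn eq_char(a: char, b: char) -> bool {
        a == b
    }
}

impl Matcher for CaseInsensitive {
    #[inline(always)]
    fn eq_char(a: char, b: char) -> bool {
        // Compare the full lowercase expansions, because a single char
        // may lowercase to more than one char.
        a == b || a.to_lowercase().eq(b.to_lowercase())
    }
}

/// Generic helper (hypothetical): the compiler generates one copy per `M`,
/// so the comparison can be inlined into each specialized loop.
fn common_prefix_len<M: Matcher>(a: &str, b: &str) -> usize {
    a.chars()
        .zip(b.chars())
        .take_while(|&(x, y)| M::eq_char(x, y))
        .count()
}
```

Calling `common_prefix_len::<CaseSensitive>("Foo", "foo")` and `common_prefix_len::<CaseInsensitive>("Foo", "foo")` selects the strategy at compile time. Whether the attribute actually improves the generated code would still have to be measured.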
relay-pattern/src/wildmatch.rs
Outdated
// TODO: implement manual lowercase which remembers if there were 'special' unicode
// conversion involved, if not, there is no recovery necessary.
// TODO: benchmark if a lut from offset -> original offset makes sense.
// TODO: benchmark allocation free and search with proper case insensitive search.
I assume we could implement a special is_prefix_ignore_case ourselves?
Yes. I essentially settled on this implementation for now because it was the easiest to get working. This is definitely a place where we can potentially gain a lot of performance with a better implementation. A better impl here will also benefit is_prefix (it's essentially the same code).
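As a rough sketch of what such an allocation-free helper might look like: the function below compares the lowercase expansions of both strings char by char and returns the number of bytes consumed in the original haystack, so no offset recovery is needed afterwards. The name `prefix_len_ignore_case` and its return type are assumptions for this example and not the relay-pattern API; it uses simple `char::to_lowercase` mapping rather than full Unicode case folding.

```rust
/// Hypothetical helper: returns the byte length of the prefix of `haystack`
/// that equals `needle` when both are lowercased, or `None` if there is no
/// such prefix. No intermediate `String` is allocated.
fn prefix_len_ignore_case(haystack: &str, needle: &str) -> Option<usize> {
    // Lazily lowercased needle; one char may expand to several.
    let mut needle_lower = needle.chars().flat_map(char::to_lowercase).peekable();
    let mut consumed = 0;

    for c in haystack.chars() {
        if needle_lower.peek().is_none() {
            break; // the whole needle has been matched
        }
        // Every lowercase char of this haystack char must match the needle.
        // A needle that ends in the middle of an expansion is rejected.
        for lc in c.to_lowercase() {
            if needle_lower.next() != Some(lc) {
                return None;
            }
        }
        // Count bytes of the original char, not of its lowercase form.
        consumed += c.len_utf8();
    }

    if needle_lower.peek().is_none() {
        Some(consumed)
    } else {
        None // haystack ended before the needle did
    }
}
```

Because the returned length refers to the original, non-lowercased haystack, the caller can slice the input directly. Whether this is faster than lowercasing up front would need to be benchmarked, as the TODOs in the diff suggest.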
Co-authored-by: Joris Bayer <[email protected]>
Replaces the regex fallback for complex patterns with a matching strategy using the parsed tokens.
The algorithm is loosely based on the one described by Kirk J. Krauss, with some modifications: for example, it does not use two loops and it supports character classes as well as alternations.
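For readers unfamiliar with this family of algorithms, the following is a minimal single-loop sketch of wildcard matching with backtracking to the most recent `*`. It is not the code from this PR: it works on raw pattern characters instead of parsed tokens, supports only `*` and `?`, and collects both inputs into `Vec<char>` for simplicity. The function name `wildcard_match` is made up for this example.

```rust
/// Simplified illustration: `*` matches any sequence of characters,
/// `?` matches exactly one character. No classes, no alternations.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // (pattern index after the last `*`, text index where that `*` stops)
    let mut backtrack: Option<(usize, usize)> = None;

    while ti < t.len() {
        match p.get(pi).copied() {
            Some('*') => {
                // First try to let the star match nothing and remember
                // where to resume if that turns out to be wrong.
                backtrack = Some((pi + 1, ti));
                pi += 1;
            }
            Some(c) if c == '?' || c == t[ti] => {
                pi += 1;
                ti += 1;
            }
            _ => match backtrack {
                // Mismatch: let the last star consume one more character
                // and retry the rest of the pattern from there.
                Some((star_pi, star_ti)) => {
                    pi = star_pi;
                    ti = star_ti + 1;
                    backtrack = Some((star_pi, star_ti + 1));
                }
                None => return false,
            },
        }
    }

    // The text is exhausted; only trailing stars may remain.
    p[pi..].iter().all(|&c| c == '*')
}
```

Only the most recent `*` needs to be remembered: once a later star has been reached, everything before it has already matched, so retrying an earlier star cannot help.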
The implementation beats the current one of regex_lite in the added "complex" benchmark.
When comparing it with the regex crate, it's 2x slower for case sensitive and 8x slower for case insensitive.
There are still a lot of performance improvements to be found; some of them are annotated in the code, and more can probably easily be found by running a profiler. Especially the case insensitive code paths are very expensive.
But this is something for future PRs.
The code is fuzzed with:
With full (expected) coverage in wildmatch (except for the impossible condition(s), like nested alternates).