-
-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance Improvement 5 - Cache compiled regexes #995
base: master
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for djlint canceled.
|
Hi. I came here mainly to apologize that I broke your branch with my PR. Sorry for that! I'm not sure if I understand all the changes, but I naively thought that your approach seems a little bit complicated, and I was gonna suggest to just use lru_cache (with what I assumed was a small performance penalty), basically like this: # Call this module something like `regex.py` or `regex_custom_wrappers.py`
import re
from functools import lru_cache
def search(regex, text, use_cache: bool = True, flags=None, **kwargs):
if use_cache:
re_compiled = _compile_cached(regex, flags=flags)
return re_compiled.search(text, **kwargs)
return re.search(regex, text, flags=flags, **kwargs)
# ... more regex functions
@lru_cache(maxsize=256)
def _compile_cached(regex, flags=None) -> re.Pattern:
return re.compile(regex, flags=flags) and then it's basically just find & replace. I've got about 20% faster formatting, I think, but I didn't want to spend too much time before I check in with you. |
No worries :)
Thanks, I hope I made your workflows faster :p
I have an even better idea combining both approaches: import regex as re
def search(regex: ???, text: str, flags:RegexFlags|None=None, **kwargs):
return _compile_cached(regex, flags=flags).search(text, **kwargs)
--- Somewhere
old_search = re.search
re.search = re_search
format_it()
re.search = old_search (Just the signatures have to match). I would make the cache as big as possible. (So, I think I have no clear preference, I just did it like this because I was first like:
And so on. So we have three things we could do:
That's probably on monosans to decide :) |
After thinking a bit more, I think your solution @oliverhaas is superior in every manner. I've implemented it like you suggested (I added you as co-author as you made substantial improvements) |
I've added a few extra commits that optimize stuff that wasn't visible earlier. Based on the results of edx-platform:
|
(Ignore my previous comment if you had seen it) Here some stuff I've tried (average of 10 runs reformatting edx-platform):
So as far as I can see it would probably the best to use |
It seems there is an issue with the
You probably would have to port those regexes that won't work. And if I understand correctly, But I think the best way would be to first merge this PR and then build stuff upon it (imo) |
Weird, I did not get an error, but I only reformatted the edx-platform repo on one system and haven't tested anything else. For me it's hard to keep track of whether From me it's definitely a thumbs up for merging. (Off-topic: I just got a 16-core/32-threads CPU for my old retired workstation for fairly cheap. Haven't benchmarked djlint specifically, but compiling or running tests is literally more than 4 times faster compared to my 4-core laptop, which actually is noticeably making my workflows easier.) |
We will not replace |
@oliverhaas If you want do do more optimizations, there are maybe a few things worth looking into it (And out of scope of this PR):
|
Hey, @JCWasmx86! There are some conflicts because of the changes I've made, sorry. Could you please resolve them and do some profiling to see how much this PR improves the performance now? Thanks! |
I've resolved them with a lot of hard work :/, sadly the git history had to suffer for that HEAD: (4 runs summed up)
This PR: (4 runs summed up)
So there are still improvements |
Hey @monosans, what is needed for this PR to get merged? Can I assist in any kind? |
Please see review comments |
@monosans I'm sorry there are none. Did you maybe forget to finalize the review? |
@@ -118,6 +121,7 @@ def linter( | |||
"match": match.group().strip()[:20], | |||
"message": rule["message"], | |||
}) | |||
build_flags.cache_clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one quick question, why was this added? flags
are just constants so the cache shouldn't grow that big
@JCWasmx86 Thanks again for the hard work. I'm at 1.2s for edx-platform with this branch on my desktop, which is crazy to think about where djlint was just a while ago. I basically only have one feature/bug left on my pain points, and just a month ago I was looking for alternatives to djlint... If you could share your profiling script in a gist or something, that would be awesome. I can't seem to get the profiler output quite as readable as your images. |
@oliverhaas I just used gprof2dot and snakeviz and temporarily modified the as code as described here: #986 (comment) (Parallel execution => Serial execution as otherwise the profiler gives stupid results) I've profiled again with
|
Even though the regex module has a cache, it's access is not that fast. E.g. re.sub is a combination of re._compile and pattern.sub. re._compile is checking the flags for e.g. DEBUG values or the verbosity. It uses an enum for the flags. enum.__and__ is quite slow, so even for a cache hit we have around two to three enum.__and__ calls. This patch caches commonly used regexes. The naming probably has to be adjusted. Commonly used is defined as: pattern._compile is hit with this pattern+flags combination more than 1000 times in the netbox repo. That somewhat balances out the time needed for compilation, the runtime speed and the maintenance effort.
Having the cache in helpers.py has adverse effects
Even though the regex module has a cache, it's access is not that fast. E.g. re.sub is a combination of re._compile and pattern.sub. re._compile is checking the flags for e.g. DEBUG values or the verbosity. It uses an enum for the flags. enum.and is quite slow, so even for a cache hit we have around two to three enum.and calls. This patch caches commonly used regexes. The naming probably has to be adjusted.
Commonly used is defined as: pattern._compile is hit with this pattern+flags combination more than 1000 times in the netbox repo. That somewhat balances out the time needed for compilation, the runtime speed and the maintenance effort.
Before:
regex._compile is called 392.678x times, taking 40% of the time
enum.and is called 1.038.531x times, taking 20% of the time
After:
regex._compile is called 7226x times, taking 20% of the time
enum.and is called 267.691x times, taking 8-9% of the time. (That's still a lot)
This can probably improved for more regexes, but I think it's somewhat balanced at the point. The compilation time on my PC for the regexes is at around 0.1s
Timings:
Netbox, parallel: 2.5-2.8s
EDX Platform, parallel: 19s
This is the final patch of the performance improvement patch series. There are probably more improvements, but they are not that low-hanging fruits like the last 6 patches.