-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Implement compiler sharding #202
Conversation
Signed-off-by: Will Beason <[email protected]>
Externs should be an argument passed to Driver on initialization. Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Hopefully two reminders makes it so if I forget, it'll be obvious to code reviewers that I made a mistake. Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Also one query per object/constraint for Audit. Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Now Driver only maintains state for Templates and cached objects for referential constraints. Signed-off-by: Will Beason <[email protected]>
Client and Driver are fully usable after calling New() Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Just pass path elements instead - this eliminates a lot of complexity around both writing paths from slices of strings and back to slices of strings. Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
This is much more efficient than constantly marhsalling/unmarshalling the objects. Also remove "Resource" since it is redundant Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
It can't exist since it's a new storage. Signed-off-by: Will Beason <[email protected]>
Prevent more inconsistent states Signed-off-by: Will Beason <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!! LGTM
@willbeason looks like unit tests were timing out, re-ran it (edit: timed out again). do we need to increase the timeout value or is it a problem with code/tests? here's the full output
|
This prevents cases where we need mutexes to guard different sets of information. Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
Signed-off-by: Will Beason <[email protected]>
It was a problem with the code - I didn't lock properly. It's fixed, and I've moved that part of the code to it's own class so future devs will be unlikely to make the same mistake I did. Otherwise the logic is unchanged - just mutexes moved around. |
Signed-off-by: Will Beason <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a massive PR - you may need more than one sitting to consume it.
You may find it helpful to skim https://docs.google.com/document/d/1ibCxaI-7HyWyDjQNL4iDRMHnrauufIMJ1D6LwIKdlsI/edit?usp=sharing again - the high-level design presented in the doc is largely unchanged.
Notable design changes:
data.inventory
to run Go matching logic, and in-rego matching logic costs too much time to run per-Template queries. Consumers of frameworks (e.g. Gatekeeper) will need to implement their own audit-from-cache. There are options for maintaining audit-from-cache in frameworks while keeping these performance gains, but they aren't pretty.Refactoring changes (non-behavioral)
I've refrained from allowing much more parallelization in AddConstraint/AddTemplate operations (as in - adding Constraints and Templates simultaneously, or adding Constraints for different Templates simultaneously). While AddConstraint can be made more parallel, it comes at the cost of a lot of complexity and the absolute gains are not meaningful for known use cases (~5,000 Constraints added per second vs. ~20,000 Constraints added per second).
Update: Benchmark data is available here: https://docs.google.com/spreadsheets/d/1WYLMDdr9w9QwF9NGYvF1bzrYrKwF52Efj4bl7jaGMqA/edit?usp=sharing
AddTemplate
Ignore multithreaded stats (crossed out). We aren't going to do this in this PR as it would hold this up. In the future, we can get an additional ~5x speedup in template compilation on top of what is shown here (e.g. 20x to 100x faster).
Compiling individual ConstraintTemplates is 2x-3x faster than before, from 4-5ms to 1.5-2ms.
Compiling 200 ConstraintTemplates is 10x-20x faster than before, from 3.4s-11s to 0.3s-0.5s.
There is improvement for all tested cases. Tests were run with both simple and complex rego code in Templates, so it is reasonable to assume all cases are improved.
AddConstraint
AddConstraint is ~2x faster than before, so it is theoretically possible to add ~30,000 Constraints to a Gatekeeper installation per second. This is unlikely to be a bottleneck in any use case, so further improvements aren't necessary.
Review
Review is complex.
TLDR: For most use cases, there are significant query improvements. The CPU cost of filtering out objects (and so not running Constraints) is dropped by 20x or more, so users should expect improvements in nearly all cases.
Note that for the data in the review benchmarks, I ran queries where every Constraint returned the same result. So "10 Templates & 100 Constraints / Success" means that all 100 Constraints are run against every object, and every Constraint resulted in no violation. These individual cases do not match reality where most objects are not matched by most Constraints. However, by testing these extreme cases individually, you can construct the expected improvement for various distributions of successes/failures/filters/autorjections.
I've taken data when running queries serially (in 1 thread) and in parallel (in 12 threads). The data for runtime in each page on the linked benchmark data is of throughput, not spent CPU time. In single-threaded mode these are equivalent, but this is not the case for multithreaded. For example, autorejecting a query for 1 Template/10 Constraints in one thread takes 18us, but only 4.3us with 12 threads. Effectively, this means that running in 12 threads means queries can be rejected at a rate of (1/4.3us) = 233,000/second, whereas running in a single thread the throughput is (1/18us) = 55,000/second.
Note that the throughput improvements are universally greater than the CPU improvements. This is largely due to less read locking within Driver - all state is stored in the Rego storage (which has its own locking), so no synchronization is required directly by Driver.
For Constraints which are filtered out against a Review, or which autoreject a Review, speed is universally increased 10x-100x. Since this logic entirely happens in Go instead of Rego, we see a huge performance increase.
Otherwise, there is generally a 2x-3x speedup in Reviews where all Constraints are run for every object. The exception is the case where:
This case is pathological - there shouldn't need to be so many Templates with a single Constraint, each matching every object in a Cluster. At these scales query time is still very reasonable - for example 18ms for 100 Constraints/100 Templates. We wouldn't expect there to be noticeable problems/degradation until a user had ~1,000 Constraints/1,000 Templates, when queries start to take longer than 200ms.