No violations found with gatekeeper 3.8.0 although there are violations and they were found with 3.7.2. #2026
Comments
I am also experiencing the same issue upgrading helm chart from 3.7.2 to 3.8.0. Gatekeeper no longer warns on policy failure. I also have the same namespace exemption config. Before upgrade:
After upgrade:
I also tried changing the policy to deny; same behavior after upgrade. The pod gets created as if the constraint doesn't exist. After removing the … Same behavior in …
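For context, "changing the policy to deny" on a Gatekeeper constraint is done through spec.enforcementAction. A minimal sketch, assuming a required-labels constraint along the lines of the one in this report (the kind, name, and parameters here are illustrative, not the reporter's actual resource):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-label
spec:
  enforcementAction: deny   # "warn" only reports violations; "deny" (the default) rejects the request
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["gatekeeper"]
```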
I created a similar config resource in a kind cluster and it appears to function properly:
I wonder if this is a case of the webhook failing open? @mrjoelkamp When you say "Did some digging and it seems that the constraints aren't running for the request," what did you notice? What are:
@mbrowatzki It looks like you are observing a lack of expected audit results.
Also @mbrowatzki, is it just a namespace label constraint, or do other constraints break?
Audit also appears to be working for me:
@mbrowatzki It looks like you are observing a lack of expected audit results. What are the contents of your gatekeeper config if you run kubectl get -n gatekeeper-system config -oyaml? (Please preserve whitespace, as it might be important.)
Do you see any crashing on your pods in gatekeeper-system?
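The requested information can be gathered with standard kubectl commands; a sketch, assuming the default gatekeeper-system namespace:

```sh
# Dump the Gatekeeper Config resource, preserving whitespace
kubectl get -n gatekeeper-system config -o yaml

# Check whether any Gatekeeper pods are restarting or crash-looping
kubectl get pods -n gatekeeper-system
```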
@maxsmythe thanks for looking into this
I checked the gatekeeper-controller-manager pod logs. I expected to see something related to admission like the following:
Here is the status of one of the many psp constraints we have active:
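The constraint's status block is where audit results land; an illustrative (not the reporter's actual) status looks roughly like the following, with totalViolations and a violations list populated by the audit pod:

```yaml
status:
  auditTimestamp: "2022-05-10T12:00:00Z"   # illustrative values throughout
  byPod:
  - enforced: true
    id: gatekeeper-audit-5c4d8d6b7-abcde
    observedGeneration: 1
    operations: ["audit", "status"]
  totalViolations: 4
  violations:
  - enforcementAction: warn
    kind: Namespace
    message: 'you must provide labels: {"gatekeeper"}'
    name: default
```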
Pod status (I am using a custom namespace):
No abnormal log entries
I am using AWS EKS. I enabled control plane API server logs and there are only a few gatekeeper-related logs. I am not seeing any webhook-related entries.
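On EKS, control plane API server logging is enabled per cluster; one way to do that (the cluster name and region here are placeholders):

```sh
aws eks update-cluster-config \
  --region us-east-1 \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api"],"enabled":true}]}'
```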
No audit violations for resources that were admitted with the …. I also tested with the gatekeeper webhook configurations …. Here is the output for the …:
Thanks for the data! This is not good; users who have configs should probably avoid this release until we sort this out. @open-policy-agent/gatekeeper-maintainers @ritazh @sozercan @willbeason

I was able to replicate the bug with this config (reliably, no flaking):
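The exact replication config is not preserved in this extract; presumably it resembled the reporter's, i.e. a Config with wildcard namespace exclusions, something along these lines:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  match:
  - excludedNamespaces: ["kube-*", "gatekeeper-system"]
    processes: ["*"]
```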
Poking around, it looks like OPA's data cache is somehow getting wiped of constraints even though the constraint framework knows about the constraints. Here is some debug output from a custom build showing the OPA cache being empty but other caches being full:
The logs show the constraint framework matching against the …. Removing these lines of code from the config controller appears to fix the behavior (though we can't actually remove those lines, since they're needed to avoid stale cached data):
It looks like somehow calling …

constraint storage code: …
code defining constraint storage root: …
data removal code: …
data removal root: …

So I'm not sure how one is clobbering the other. That's the current state of what I've found out.
Actually, here is the debug run where the "remove data" code is disabled:
Still no dumping of data, so the driver dump command may not be dumping the constraints, though note that the tracing is much more active.
Ah, there is a bug in Dump(): it's querying …. Fixing that to see what actually lives in ….
Now Dump() is behaving as expected. It looks like cache wiping is not the issue. Here is the output for the "bad" build (without …):
This fixes the issues uncovered in open-policy-agent#2026 Signed-off-by: Max Smythe <[email protected]>
Fixes open-policy-agent/gatekeeper#2026 Signed-off-by: Max Smythe <[email protected]>
* Make sure that the Rego hook is well-behaved with no data cache (#222). Fixes open-policy-agent/gatekeeper#2026. Signed-off-by: Max Smythe <[email protected]>
* Upgrade linter. Signed-off-by: Max Smythe <[email protected]>
* Upgrade workflows. Signed-off-by: Max Smythe <[email protected]>
@mbrowatzki @mrjoelkamp thanks for reporting this! We fixed this in v3.8.1; let us know if this fixes your issue.
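For anyone else hitting this via the Helm chart, picking up the fix is just an upgrade to the 3.8.1 chart; a sketch, assuming the release is named gatekeeper and was installed from the official chart repo into gatekeeper-system:

```sh
helm repo update
helm upgrade gatekeeper gatekeeper/gatekeeper --version 3.8.1 -n gatekeeper-system
```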
Hello, …
@sozercan @maxsmythe Thanks for the update! It is working as expected again. I appreciate the help on this!
Thank you for reporting the issue!
What steps did you take and what happened:
With gatekeeper 3.7.2 there are Total Violations: 4 found in my ns-must-have-label constraint. It's similar to the all_ns_must_have_gatekeeper.yaml example under the demo examples.
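For reference, the all_ns_must_have_gatekeeper.yaml demo constraint looks roughly like the following (reproduced from memory; the reporter's own constraint name and label list may differ):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-gk
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["gatekeeper"]
```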
After updating to 3.8.0 there are Total Violations: 0.
I played around a little bit.
A fresh install of 3.8.0 without a config (no excluded namespaces) shows Total Violations: 8.
After deploying the following config, there are Total Violations: 0 again.
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: '{{ .Release.Namespace }}'
spec:
  match:
  - excludedNamespaces: ["kube-*", "gatekeeper-system"]
    processes: ["*"]
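The "Total Violations" figure quoted above comes from the constraint's status; one way to read it, assuming the constraint is literally named ns-must-have-label:

```sh
kubectl describe k8srequiredlabels ns-must-have-label | grep "Total Violations"
```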
What did you expect to happen:
Same amount of violations in 3.7.2 and 3.8.0
Anything else you would like to add:
Same behavior with 3.9.0-beta.0.
Environment: