🐛 Fixed a bug in newGVKFixupWatcher which caused the metadata informer to hang #1790
Hi @wallrj. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
pkg/cache/internal/informers_map.go
Outdated
@@ -434,12 +434,12 @@ func (w *gvkFixupWatcher) run() {
		w.ch <- e
	}
	w.wg.Done()
	close(w.ch)
Should this be moved above w.wg.Done() so that there's a guarantee that w.ch is closed by the time w.Stop() returns?
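The ordering concern can be sketched in isolation. The following is a minimal, standard-library-only illustration, not the controller-runtime code: `forwarder`, `newForwarder` and `demoStop` are hypothetical names, and plain strings stand in for watch events. It assumes that `Stop` waits on the `sync.WaitGroup`, as the question implies.

```go
package main

import (
	"fmt"
	"sync"
)

// forwarder is a hypothetical stand-in for gvkFixupWatcher: it copies events
// from an upstream channel to its own result channel.
type forwarder struct {
	in  chan string
	out chan string
	wg  sync.WaitGroup
}

func newForwarder(in chan string) *forwarder {
	f := &forwarder{in: in, out: make(chan string)}
	f.wg.Add(1)
	go f.run()
	return f
}

func (f *forwarder) run() {
	// Deferred calls run last-in, first-out, so close(f.out) executes before
	// f.wg.Done(). Whoever returns from f.wg.Wait() sees a closed channel.
	defer f.wg.Done()
	defer close(f.out)
	for e := range f.in {
		f.out <- e
	}
}

// Stop ends the upstream (an assumption of this sketch) and waits for run.
func (f *forwarder) Stop() {
	close(f.in)
	f.wg.Wait()
}

// demoStop forwards one event, stops the forwarder, and reports whether the
// result channel is still open afterwards.
func demoStop() (first string, openAfterStop bool) {
	in := make(chan string)
	f := newForwarder(in)
	go func() { in <- "ADDED" }()
	first = <-f.out
	f.Stop()
	_, openAfterStop = <-f.out
	return first, openAfterStop
}

func main() {
	first, open := demoStop()
	fmt.Println("first event:", first)
	fmt.Println("open after Stop:", open)
}
```

If the two deferred calls were swapped, `Stop` could return in the short window before the close, which is the gap the question points at.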
I found watch.Filter which has the correct channel closing behaviour so I've re-implemented. See what you think.
Nice! TIL about watch.Filter.
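For readers who have not met the pattern, here is a rough standard-library-only sketch of what a filtering watch wrapper does. It is not the k8s.io/apimachinery implementation: `Event`, `Filter` and `fixupKinds` are simplified, hypothetical stand-ins, and only the `FilterFunc` shape is taken from the diff in this PR. The point is that the wrapper closes its own result channel as soon as the wrapped source closes.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for watch.Event.
type Event struct {
	Type string
	Kind string
}

// FilterFunc mirrors the shape used in the PR: it may rewrite an event and
// reports whether the event should be kept.
type FilterFunc func(in Event) (out Event, keep bool)

// Filter forwards events from src through f. When src is closed, the returned
// channel is closed too, so a consumer can tell that the watch has ended.
func Filter(src <-chan Event, f FilterFunc) <-chan Event {
	result := make(chan Event)
	go func() {
		defer close(result)
		for in := range src {
			if out, keep := f(in); keep {
				result <- out
			}
		}
	}()
	return result
}

// fixupKinds sends one event per type through a filter that overrides Kind
// and collects everything that comes out of the wrapper.
func fixupKinds(kind string, types ...string) []Event {
	src := make(chan Event)
	go func() {
		defer close(src)
		for _, t := range types {
			src <- Event{Type: t}
		}
	}()
	var got []Event
	// This loop only terminates because Filter closes its result channel.
	for e := range Filter(src, func(in Event) (Event, bool) {
		in.Kind = kind
		return in, true
	}) {
		got = append(got, e)
	}
	return got
}

func main() {
	for _, e := range fixupKinds("Secret", "ADDED", "MODIFIED", "DELETED") {
		fmt.Println(e.Type, e.Kind)
	}
}
```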
Force-pushed from 0359498 to cab8dd1.
pkg/cache/internal/informers_map.go
Outdated
		watcher,
		func(in watch.Event) (out watch.Event, keep bool) {
			in.DeepCopyInto(&out)
			out.Object.GetObjectKind().SetGroupVersionKind(gvk)
The original implementation mutated the watch event.
I've used DeepCopy here, but I'm not sure if it's necessary.
I don't think DeepCopy is necessary.
The in watch.Event is passed in by value, but it has a pointer to a runtime.Object (e.g. &metav1.PartialObjectMetadata) which might be mutated elsewhere, so I thought it was safest to DeepCopy.
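The aliasing described here can be shown with a small standard-library-only sketch. The `object` and `event` types and both helper functions are hypothetical stand-ins, not the Kubernetes types; the sketch only demonstrates why passing a struct by value does not protect the object behind its pointer field. Whether that matters for this watcher is a separate question, which the rest of the thread settles.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a runtime.Object such as
// *metav1.PartialObjectMetadata.
type object struct{ Kind string }

// event is a hypothetical stand-in for watch.Event: a small struct that
// carries a pointer to the object.
type event struct {
	Type   string
	Object *object
}

// setKindShallow receives the event by value, but the copy still points at
// the caller's object, so the caller observes the mutation.
func setKindShallow(in event, kind string) event {
	in.Object.Kind = kind
	return in
}

// setKindDeep copies the pointed-to object before changing it, leaving the
// caller's object untouched.
func setKindDeep(in event, kind string) event {
	copied := *in.Object
	copied.Kind = kind
	in.Object = &copied
	return in
}

func main() {
	orig := event{Type: "ADDED", Object: &object{}}

	out := setKindDeep(orig, "Secret")
	fmt.Printf("deep:    caller=%q returned=%q\n", orig.Object.Kind, out.Object.Kind)

	out = setKindShallow(orig, "Secret")
	fmt.Printf("shallow: caller=%q returned=%q\n", orig.Object.Kind, out.Object.Kind)
}
```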
Yeah, no strong objections to DeepCopy from me!
Don't think it is necessary, but no strong objections from me too :)
+1 to remove deepcopy
Ok. Done.
	f.Modify(newTestType("bar"))
	f.Delete(newTestType("bar"))
	f.Error(newTestType("error: blah"))
	f.Stop()
Note: The test sends events to the wrapped watcher and then checks the results that come from the wrapper.
/ok-to-test
// TestGVKFixupWatcher tests that gvkFixupWatcher behaves like the watch that it
// wraps and that it overrides the GVK.
// Adapted from https://github.com/kubernetes/kubernetes/blob/adbda068c1808fcc8a64a94269e0766b5c46ec41/staging/src/k8s.io/apimachinery/pkg/watch/watch_test.go#L33-L78
func TestGVKFixupWatcher(t *testing.T) {
AFAIK, the rest of controller-runtime's tests use Ginkgo/Gomega. Would you mind refactoring this test to be consistent with the rest of the repo?
I've re-implemented the tests using Gomega Expect, but tried to retain the spirit of the original tests so that they can be easily compared to the original.
Also replaced the convoluted fake type with actual metav1.PartialObjectMetadata
Signed-off-by: Richard Wall <[email protected]>
Force-pushed from cab8dd1 to 5f4904e.
Some evidence that the problem has been fixed:
Test fails with the old implementation (checked out at 0fbdc4b):
$ go test ./pkg/cache/internal/... -v --ginkgo.v
=== RUN TestSource
Running Suite: Cache Internal Suite
===================================
Random Seed: 1643975186
Will run 1 of 1 specs
gvkFixupWatcher
behaves like watch.FakeWatcher
/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:35
STEP: Fixing up watch.EventType: ADDED and passing it on
STEP: Fixing up watch.EventType: MODIFIED and passing it on
STEP: Fixing up watch.EventType: MODIFIED and passing it on
STEP: Fixing up watch.EventType: DELETED and passing it on
STEP: Fixing up watch.EventType: ERROR and passing it on
• Failure [1.002 seconds]
gvkFixupWatcher
/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:34
behaves like watch.FakeWatcher [It]
/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:35
Timed out after 1.001s.
Expected
<<-chan watch.Event | len:0, cap:0>: 0xc0000be540
to be closed
/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:79
------------------------------
Summarizing 1 Failure:
[Fail] gvkFixupWatcher [It] behaves like watch.FakeWatcher
/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:79
Ran 1 of 1 Specs in 1.002 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
--- FAIL: TestSource (1.00s)
FAIL
FAIL sigs.k8s.io/controller-runtime/pkg/cache/internal 1.039s
FAIL
Test succeeds with the new implementation (branch 1789-close-metadata-watch-result-channel):
$ go test ./pkg/cache/internal/... -v --ginkgo.v
=== RUN TestSource
Running Suite: Cache Internal Suite
===================================
Random Seed: 1643975277
Will run 1 of 1 specs
gvkFixupWatcher
behaves like watch.FakeWatcher
/home/richard/projects/kubernetes-sigs/controller-runtime/pkg/cache/internal/informers_map_test.go:35
STEP: Fixing up watch.EventType: ADDED and passing it on
STEP: Fixing up watch.EventType: MODIFIED and passing it on
STEP: Fixing up watch.EventType: MODIFIED and passing it on
STEP: Fixing up watch.EventType: DELETED and passing it on
STEP: Fixing up watch.EventType: ERROR and passing it on
•
Ran 1 of 1 Specs in 0.010 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestSource (0.01s)
PASS
ok sigs.k8s.io/controller-runtime/pkg/cache/internal 0.027s
A sample informer now closes and re-establishes watches regularly:
diff --git a/go.mod b/go.mod
index fb929dc..99e989c 100644
--- a/go.mod
+++ b/go.mod
@@ -14,6 +14,8 @@ require (
sigs.k8s.io/controller-runtime v0.11.0
)
+replace sigs.k8s.io/controller-runtime => ../../kubernetes-sigs/controller-runtime
+
require (
cloud.google.com/go v0.81.0 // indirect
github.com/Azure/go-autorest v14.2.0+incompatible // indirect
# in ~/projects/wallrj/partial-object-watch (master)
$ go run ./cmd/ctrl-metadata-informer --zap-log-level 8 --zap-time-encoding iso8601 --namespace test1
2022-02-04T10:21:54.413Z LEVEL(-6) Config loaded from file: /home/richard/.kube/config
...
2022-02-04T11:38:59.523Z INFO add {"name": "test1/foo-9673"}
2022-02-04T11:39:23.998Z LEVEL(-4) pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Watch close - *v1.PartialObjectMetadata total 15 items received
2022-02-04T11:39:23.998Z INFO GET https://127.0.0.1:46255/api/v1/namespaces/test1/secrets?allowWatchBookmarks=true&resourceVersion=78217&timeout=8m3s&timeoutSeconds=483&watch=true
2022-02-04T11:39:23.998Z INFO Request Headers:
2022-02-04T11:39:23.998Z INFO Accept: application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json
2022-02-04T11:39:23.998Z INFO User-Agent: ctrl-metadata-informer/v0.0.0 (linux/amd64) kubernetes/$Format
2022-02-04T11:39:24.000Z INFO Response Status: 200 OK in 1 milliseconds
2022-02-04T11:39:24.000Z INFO Response Headers:
2022-02-04T11:39:24.000Z INFO Content-Type: application/vnd.kubernetes.protobuf;stream=watch
2022-02-04T11:39:24.000Z INFO Date: Fri, 04 Feb 2022 11:39:24 GMT
2022-02-04T11:39:24.000Z INFO Cache-Control: no-cache, private
2022-02-04T11:39:59.621Z INFO add {"name": "test1/foo-25053"}
2022-02-04T11:40:59.744Z INFO add {"name": "test1/foo-10768"}
2022-02-04T11:41:59.849Z INFO add {"name": "test1/foo-7385"}
2022-02-04T11:42:59.958Z INFO add {"name": "test1/foo-24315"}
2022-02-04T11:44:00.062Z INFO add {"name": "test1/foo-26661"}
2022-02-04T11:45:00.151Z INFO add {"name": "test1/foo-21396"}
2022-02-04T11:46:00.299Z INFO add {"name": "test1/foo-23017"}
2022-02-04T11:47:00.451Z INFO add {"name": "test1/foo-21648"}
2022-02-04T11:47:27.000Z LEVEL(-4) pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Watch close - *v1.PartialObjectMetadata total 17 items received
2022-02-04T11:47:27.001Z INFO GET https://127.0.0.1:46255/api/v1/namespaces/test1/secrets?allowWatchBookmarks=true&resourceVersion=79033&timeout=8m2s&timeoutSeconds=482&watch=true
2022-02-04T11:47:27.001Z INFO Request Headers:
2022-02-04T11:47:27.001Z INFO Accept: application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json
2022-02-04T11:47:27.001Z INFO User-Agent: ctrl-metadata-informer/v0.0.0 (linux/amd64) kubernetes/$Format
2022-02-04T11:47:27.004Z INFO Response Status: 200 OK in 2 milliseconds
2022-02-04T11:47:27.004Z INFO Response Headers:
2022-02-04T11:47:27.004Z INFO Cache-Control: no-cache, private
2022-02-04T11:47:27.004Z INFO Content-Type: application/vnd.kubernetes.protobuf;stream=watch
2022-02-04T11:47:27.004Z INFO Date: Fri, 04 Feb 2022 11:47:27 GMT
2022-02-04T11:48:00.596Z INFO add {"name": "test1/foo-207"}
2022-02-04T11:49:00.692Z INFO add {"name": "test1/foo-5199"}
/approve
Thanks for all the help on this!
// newGVKFixupWatcher adds a wrapper that preserves the GVK information when
// events come in.
//
// This works around a bug where GVK information is not passed into mapping
// functions when using the OnlyMetadata option in the builder.
// This issue is most likely caused by kubernetes/kubernetes#80609.
// See kubernetes-sigs/controller-runtime#1484.
//
// This was originally implemented as a cache.ResourceEventHandler wrapper but
// that contained a data race which was resolved by setting the GVK in a watch
// wrapper, before the objects are written to the cache.
// See kubernetes-sigs/controller-runtime#1650.
//
// The original watch wrapper was found to be incompatible with
// k8s.io/client-go/tools/cache.Reflector so it has been re-implemented as a
// watch.Filter which is compatible.
// See kubernetes-sigs/controller-runtime#1789.
🎉 Thanks for capturing all of that history!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: joelanford, wallrj
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cc @vincepri because I see that you introduced the metadata-only watches in ClusterAPI, which may have been affected by this bug since upgrading to controller-runtime v0.10.1 in kubernetes-sigs/cluster-api#5249
Mostly LG. Copyright should be 2022.
@@ -0,0 +1,94 @@
/*
Copyright 2018 The Kubernetes Authors.
s/2018/2022/
@@ -0,0 +1,31 @@
/*
Copyright 2018 The Kubernetes Authors.
s/2018/2022/
/lgtm Thanks for all the help on this!
Let's see if this works: /cherry-pick release-0.11
@joelanford: new pull request created: #1801 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Fixes: #1789
I copied some tests from client-go and observed that the original implementation of gvkFixupWatcher fails those tests:
I've now re-implemented the gvkFixupWatcher as a watch.FilterFunc along with watch.Filter, which seems to have the desired behaviour.
This change allows the reflector to know when the ResultChannel is closed, at which point it breaks from its watch loop and establishes a new watch:
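As an illustration only (this is not the log output the description refers to, and not the client-go Reflector code), the loop behaviour can be sketched with the standard library. `startWatch` and `watchLoop` are hypothetical names, and each "watch" is just a channel that the server side closes when its session ends.

```go
package main

import "fmt"

// startWatch is a hypothetical stand-in for establishing a watch: it returns
// a result channel that delivers the given events and is then closed, as
// happens when the server ends the watch.
func startWatch(events ...string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, e := range events {
			ch <- e
		}
	}()
	return ch
}

// watchLoop consumes one watch at a time. The inner loop ends only when the
// result channel is closed; the outer loop then establishes the next watch.
// A wrapper that never closed its channel would leave the inner loop blocked.
func watchLoop(sessions [][]string) (watches int, seen []string) {
	for _, session := range sessions {
		watches++
		for e := range startWatch(session...) {
			seen = append(seen, e)
		}
	}
	return watches, seen
}

func main() {
	watches, seen := watchLoop([][]string{
		{"ADDED foo-1", "ADDED foo-2"},
		{"ADDED foo-3"},
	})
	fmt.Println("watches established:", watches)
	fmt.Println("events seen:", seen)
}
```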