-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tekton modifies objects in the informer cache #2734
Comments
/kind bug I'm currently looking at this in spare cycles, but would be open to handing off to someone with more bandwidth to run down the issues this has dislodged. |
One complication here is that the single control loop is performing multiple Lister reads -> Client writes, but given that the fakes aren't updating the informers index during the client writes the second Lister read -> Client write reads a stale version of the object from the Lister cache. |
One partial solution to the above (since the same will happen during normal reconciliation) is to use a Patch like this, which bypasses the stale read and optimistic concurrency checks to update the labels and annotations:
I'm working through some other test failures as well, at least one of which is a test bug that was masked by this bug. |
In each of the `{Pipeline,Task}Run` reconcilers the functions to update status and labels/annotations refetch the resource from the informer cache, check the field they want to update, and if an update is needed they set the field on the informer's copy and call the appropriate update method. In pseudo-code: ```go func update(fr *FooRun) { newFr := lister.Get(fr.Name) if reflect.DeepEqual(newFr.Field, fr.Field) { newFr.Field = fr.Field // This modified the informer's copy! return client.Update(newFr) } } ``` I have worked around this in two different ways: 1. For the status updates I added a line like `newFr = newFr.DeepCopy()` immediately above the mutation to avoid writing to the informer's copy. 2. For the label/annotation updates, I changed the `Update` call to a `Patch` that bypasses optimistic concurrency checks. This last bit is important because otherwise the update above will lead to the first reconciliation *always* failing due to `resourceVersion` skew caused by the status update. This also works around some fun interactions with the test code (see fixed issue). There are two other notable aspects to this change: 1. Test bugs! There were a good number of places that were assuming that the object stored in the informer was altered. I changed most of these to refetch through the client. 2. D-Fence! I added some logic to some of the common test setup code to `DeepCopy()` resources before feeding them to the fake clients to try and avoid assumptions about "same object" creeping back in. It is also worth calling out that this change will very likely destabilize the metric that I identified [here](tektoncd#2729) as racy, which is likely masked by the mutation of the informer copies. Fixes: tektoncd#2734
In each of the `{Pipeline,Task}Run` reconcilers the functions to update status and labels/annotations refetch the resource from the informer cache, check the field they want to update, and if an update is needed they set the field on the informer's copy and call the appropriate update method. In pseudo-code: ```go func update(fr *FooRun) { newFr := lister.Get(fr.Name) if reflect.DeepEqual(newFr.Field, fr.Field) { newFr.Field = fr.Field // This modified the informer's copy! return client.Update(newFr) } } ``` I have worked around this in two different ways: 1. For the status updates I added a line like `newFr = newFr.DeepCopy()` immediately above the mutation to avoid writing to the informer's copy. 2. For the label/annotation updates, I changed the `Update` call to a `Patch` that bypasses optimistic concurrency checks. This last bit is important because otherwise the update above will lead to the first reconciliation *always* failing due to `resourceVersion` skew caused by the status update. This also works around some fun interactions with the test code (see fixed issue). There are two other notable aspects to this change: 1. Test bugs! There were a good number of places that were assuming that the object stored in the informer was altered. I changed most of these to refetch through the client. 2. D-Fence! I added some logic to some of the common test setup code to `DeepCopy()` resources before feeding them to the fake clients to try and avoid assumptions about "same object" creeping back in. It is also worth calling out that this change will very likely destabilize the metric that I identified [here](#2729) as racy, which is likely masked by the mutation of the informer copies. Fixes: #2734
Expected Behavior
Tekton should
.DeepCopy()
any resource that is fetched from an informer's Lister, otherwise the resource in the underlying cache is altered and bad things result (TBH, I'm not even sure of the extent, but this is a "no no").Actual Behavior
In a few places Tekton updates the resource returned from the informer cache.
Steps to Reproduce the Problem
I outlined this a bit here: #2729 (comment), and also highlighted a few of the problem spots.
The text was updated successfully, but these errors were encountered: