Integration tests for changing feature flags #5999
Referenced commit ("… Switch"): This commit fixes the updates for reconciling PipelineRunStatus when switching the EmbeddedStatus feature flag. It resets status.runs and status.taskruns to nil with minimal EmbeddedStatus, and status.childReferences to nil with full EmbeddedStatus. Prior to this change, the childReferences, runs, and taskruns in PipelineRunStatus would not have been reset to nil, which could lead to validation errors when the feature flag is switched. The following issue could help prevent such bugs in the future: Integration tests for changing feature flags tektoncd#5999
I agree that these changes will be infrequent. So one option might be to declare that changing values mid-flight results in undefined behavior.
The only way I can imagine we could freeze the config would be to store it in the status of the particular run, but I'm not sure we should go down this path. Perhaps declaring "undefined behaviour" like @dibyom suggested is a good way to handle this.
SGTM, I'll open a PR.
@afrittoli @dibyom I realized we now do actually store feature flags in taskrun/pipelinerun status via the provenance field. Is it worth updating the codebase to use the value of feature flags stored in the provenance field, if they are present, rather than always using the value from the configmap? @chuangw6 curious to hear your thoughts as well.
Thanks @lbernick for bringing this up. Good point. However, the point at which we store the feature flags into the status is the same as when we store the taskSpec/pipelineSpec into the status (line173-prepare), which is before execution (line188-reconcile). I am wondering if there is a race condition where the feature flag version is X at the point we store the flags into the status, but the flags are changed to version Y before execution. This makes me wonder whether …
This seems like a reason in favor of using the feature flags from provenance, not a reason against. Shouldn't we make sure the provenance reflects the flags that were actually used to execute the pipelinerun?
+1
Yes, we definitely need to make sure the data there reflects the actual flags being used. I will revise this with @chitrangpatel and get back here with updates.
I've been giving this a lot of thought and tried to prototype things a bit. The challenge with feature flags is as follows:
@chitrangpatel thanks for the update, that's a great point. Just one clarification: Are the feature flags in provenance meant to reflect only the values used for the taskrun/pipelinerun, and not for the task/pipeline? And would it be worth creating a separate issue to figure out how feature flags in provenance can better reflect the execution of the pipelinerun, or do you think this issue is sufficient?
The feature flags in the provenance are meant to reflect the values for the entire build process, assuming they weren't mutated midway (so it assumes that both the pipeline and the pipelinerun were validated with the same set of feature flags). Clearly we cannot guarantee that, since if the feature flags change midway we don't currently record it or fail the task/pipeline. I think we can open a separate issue to figure out how feature flags in provenance can better reflect the execution of the pipelinerun. Let me do that now.
#6797 continues this discussion from the provenance perspective. |
When a cluster operator changes the value of a feature flag, or when they upgrade to a new Pipeline version that modifies the default value for a feature flag, this has the potential to affect running PipelineRuns. We don't have any tests for this as far as I know.
One example of how this can cause problems is #5991, where a running PipelineRun fails to complete if embedded-status is modified during its execution (thanks @JeromeJu for spotting this problem!). There are other feature flags (disable-affinity-assistant, enable-api-fields, etc.) that have the potential to interfere with running PipelineRuns, but we haven't tested these interactions.

I'm not sure how problematic this is. I expect these changes happen infrequently, but operators should be able to smoothly upgrade Tekton versions, and clusters that run a large volume of PipelineRuns will likely run into this problem. We are also planning to change the value of disable-affinity-assistant in the v1 release.
I'm also not sure of the best way to test for this. Modifying feature flags during unit tests is challenging, because our unit tests test one reconcile loop at a time. Inserting a flag change into unit tests means baking in assumptions about when during execution a flag flip would happen, and makes tests much more complicated. However, doing the same during integration tests also has challenges, because we run all integration tests at the same time on the same cluster. Maybe we could write some new integration tests that run on their own cluster?
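For reference, an integration test on a dedicated cluster could flip a flag mid-run by patching Tekton's feature-flags ConfigMap in the tekton-pipelines namespace. This is a cluster-operations sketch, not part of any existing test suite; it assumes a running cluster with Tekton installed and a PipelineRun named my-pipeline-run already in progress.

```shell
# Start a long-running PipelineRun first, then flip embedded-status while
# it executes; the merge patch changes only the one key.
kubectl patch configmap feature-flags -n tekton-pipelines \
  --type merge -p '{"data":{"embedded-status":"minimal"}}'

# The test would then wait for the PipelineRun to finish and assert it
# still completes successfully despite the mid-flight flag change.
kubectl wait pipelinerun/my-pipeline-run --for=condition=Succeeded \
  --timeout=10m
```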