Auto-Restoring deleted ServiceImports and EndPointSlices #696
@davidohana Could you clarify what you mean by deleting locally? I expect you mean deleting locally on the source cluster where the export exists, but I just wanted to be sure.
Yes, that's what I mean - deleting on the origin cluster.
There are likely other objects that would fail in similar ways if a user deletes them, because they are internal objects that we don't expect users to mess with. To handle this properly we would need a more systematic review of the code base, which would likely benefit from an EP and a broader discussion about what should change and how.
This issue has been automatically marked as stale because it has not had activity for 60 days. It will be closed if no further activity occurs. Please make a comment if this issue/pr is still valid. Thank you for your contributions.
@vthapar pointed out that finalizers may be a good, low-overhead way of doing this.
Discussing this again, it's important to note that monitoring resources across many clusters for deletion comes with overhead that doesn't exist for the parallel case of EndpointSlices in a single Kubernetes cluster. If there's a use case for this, we'd be happy to discuss it and prioritize this work. Again, this likely applies to other resources, and finalizers might enable this with reasonable overhead. This might be somewhat related to submariner-io/enhancements#161, or at least a good time to think about finalizers for various resources.
While a finalizer would prevent out-of-band deletion, we wouldn't be able to update the resource since it's in the process of being deleted. A simple solution would be to specify a resync period with the informers.
I tested finalizer behavior: after deletion with a finalizer still present, the resource can still be updated; you just can't modify the […]. A safer solution is to recreate a resource on out-of-band deletion. We can do this fairly simply via the resource syncer: on deletion of a resource, check if the originating resource still exists and, if so, re-queue it so the deleted resource is re-created. If not, process the deleted resource normally.
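A dependency-free sketch of that delete-handling logic, assuming hypothetical stand-ins for the resource syncer's real machinery (the `syncer` type, `exports` lookup, and `onDelete` hook are illustrative, not Submariner's actual API):

```go
package main

import "fmt"

// syncer is a toy stand-in for the resource syncer: it knows which
// originating exports exist and holds a queue of names to re-create.
type syncer struct {
	exports map[string]bool // originating ServiceExports, keyed by name
	queue   []string        // names re-queued for re-creation
}

// onDelete models the proposed handling: if the originating export
// still exists, re-queue the resource so it gets re-created;
// otherwise process the deletion normally.
func (s *syncer) onDelete(name string) string {
	if s.exports[name] {
		s.queue = append(s.queue, name)
		return "requeued"
	}
	return "deleted"
}

func main() {
	s := &syncer{exports: map[string]bool{"nginx": true}}
	fmt.Println(s.onDelete("nginx"))   // export still exists → requeued
	fmt.Println(s.onDelete("retired")) // no originating export → deleted
}
```

This keeps the export as the source of truth: an out-of-band deletion of a derived object is undone as soon as the delete event is processed, rather than waiting for an agent restart or resync.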
My team runs a multi-cluster setup with Submariner's service discovery (v0.15.2), where we might be seeing a case of this issue. Our workloads can be placed on a particular cluster based on […]. We are seeing that this results in the deletion of the aggregated […]. To mitigate the issue, we had to manually re-create […]. @tpantelis Will your solution handle the above scenario? And is there a timeline for when a solution will be rolled out?
Not sure whether to call this a bug or an enhancement.
What happened:
What you expected to happen:
The agent should recreate the ServiceImport / EndpointSlice after some time, as the "source of truth" is the export and it still exists. Currently this only happens after a lighthouse agent restart.
Additional notes:
To compare, if I delete a native Kubernetes EndpointSlice, it is recreated immediately.
I realize that these edge cases are only possible if someone with sufficient permissions deletes those objects manually.
Slack discussion link: https://kubernetes.slack.com/archives/C010RJV694M/p1645966829687039