DWO recently began watching PVC cleanup jobs for errors and reporting them as workspace cleanup failures. However, a side effect of this detection is that DevWorkspaces can get stuck in a terminating state unnecessarily when a cleanup job encounters a transient error that later resolves:
1. The DevWorkspace is deleted and a cleanup job is created.
2. The cleanup job encounters an error, and the workspace is set to the `Errored` state.
3. The error in the cleanup job is resolved, and the job runs successfully.
4. The finalizer is not cleared, because we don't check errored workspaces.
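The sequence above can be modeled as a small state machine. The following is a minimal sketch only, not DWO's actual code: `JobStatus`, `Workspace`, `reconcileDeletion`, and `simulate` are hypothetical names invented for illustration. It shows why the finalizer stays in place when errored workspaces are skipped, and how re-checking the job for errored workspaces (one possible remedy) would let the deletion finish.

```go
package main

import "fmt"

// JobStatus is a hypothetical, simplified view of a cleanup job's state.
type JobStatus int

const (
	JobRunning JobStatus = iota
	JobFailed
	JobSucceeded
)

// Workspace is a hypothetical stand-in for a DevWorkspace being deleted.
type Workspace struct {
	Errored      bool
	HasFinalizer bool
}

// reconcileDeletion models one reconcile pass during deletion.
// With recheckErrored set to false it mirrors the behavior described above:
// errored workspaces are skipped, so a later job success is never observed.
func reconcileDeletion(ws *Workspace, job JobStatus, recheckErrored bool) {
	if ws.Errored && !recheckErrored {
		return
	}
	switch job {
	case JobFailed:
		ws.Errored = true
	case JobSucceeded:
		ws.Errored = false
		ws.HasFinalizer = false
	}
}

// simulate replays a sequence of job statuses against a fresh workspace.
func simulate(statuses []JobStatus, recheckErrored bool) Workspace {
	ws := Workspace{HasFinalizer: true}
	for _, s := range statuses {
		reconcileDeletion(&ws, s, recheckErrored)
	}
	return ws
}

func main() {
	// A transient failure followed by a successful run of the cleanup job.
	transient := []JobStatus{JobRunning, JobFailed, JobSucceeded}
	fmt.Println(simulate(transient, false).HasFinalizer) // true: stuck terminating
	fmt.Println(simulate(transient, true).HasFinalizer)  // false: finalizer cleared
}
```

In this model, the only difference between the two runs is whether an errored workspace is reconciled again after the job's status changes.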
This is a significant issue: unlike the DevWorkspace startup case (where a DevWorkspace can simply be restarted), there is no way to clear the errored status from a DevWorkspace that is being deleted. As a result, users must check the cleanup job's status, notice that it completed successfully, and then manually remove the finalizer from the DevWorkspace.
How To Reproduce
This is not easy to reproduce, as it requires a transient error in the cluster. In the most recent occurrence, a few workspaces were stuck terminating due to `CreateContainerError` errors in the cleanup job. This seems to have been caused by a temporary issue on the cluster, as all the jobs had completed and the event history had been cleared by the time the problem was noticed.
Additional context