We've conducted a local reproduction of this scenario: nomadEventLog-ExecuteDraining.txt.
It shows that POSEIDON-3W (#590) with the sub-error `the allocation was rescheduled` indicates this error. This has not happened for at least 90 days.
If a fix for this becomes necessary in the future, we could listen to Nomad's Node events to receive drain updates, fetch all allocations of the draining node, and block new executions for these allocations/runners. Further, we should ensure that the drain deadline matches the maximum of all allowed execution timeouts (of CodeOcean).
This issue is still valid and could be a nice improvement. However, we don't expect this problem to occur often, so it doesn't have a high priority.
Our current strategy for stopping Nomad agents is `drain_on_shutdown` (drain-on-shutdown). Its `deadline` is meant to give all running executions time to finish. The executions that don't have enough time to finish result in a user-visible error.
See #651
Unfortunately, we currently don't have any metric to count how often this issue occurs.