Describe the bug
When a task execution takes longer than its timeout and the job is re-activated on the same worker, the worker stops completely (due to a raised exception). For some tasks the execution time can vary widely, so a hard limit on the timeout is not always a good fit.
To Reproduce
Steps to reproduce the behavior:
1. Create a worker with a task that runs longer than its timeout, e.g. time.sleep(20) (see the sketch below)
2. Make sure timeout_ms is set smaller than the job's execution time
3. Start the worker
4. Wait until ValueError: Job 2251799814133421 already registered in TaskState is raised
5. The worker has now stopped and no longer accepts any new jobs
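For reference, a minimal sketch of such a worker, assuming the pyzeebe 3.x API; the slow_task task type, the 5 s timeout and the startup call are placeholders, and a running process instance with a matching service task is needed to actually trigger the job:

```python
import time

from pyzeebe import ZeebeWorker, create_insecure_channel

# Connects to a local gateway by default; adjust the channel for your setup.
channel = create_insecure_channel()
worker = ZeebeWorker(channel)


# timeout_ms is deliberately smaller than the task's run time, so the broker
# re-activates the job while the first activation is still executing.
@worker.task(task_type="slow_task", timeout_ms=5_000)
def slow_task():
    time.sleep(20)


# The exact startup call depends on the pyzeebe version; worker.start() is
# used here to match the wording of this issue.
worker.start()
```

Once the re-activated job is handed back to this same worker, the ValueError above is raised and the worker stops polling.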
Expected behavior
The worker should not stop working at all. I think re-activating an already registered job should either: (1) ignore the newly activated job and keep executing the already active one, (2) simply execute the newly activated job, or (3) cancel the currently running job and execute the newly activated one.
In any case the worker should keep accepting other jobs and log this situation as a warning, since it is obviously not what you want.
When running multiple workers, option 2 is effectively what already happens whenever the job is re-activated on a different worker. However, I think option 1, combined with a warning, might be the best way to handle this specific situation.
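To make option 1 concrete, a rough sketch of what the registration step could do instead of raising; the function name, the membership check and task_state.add() are hypothetical illustrations rather than pyzeebe's actual internals:

```python
import logging

logger = logging.getLogger(__name__)


def register_activated_job(task_state, job) -> bool:
    """Hypothetical registration hook illustrating option 1 (ignore and warn)."""
    if job in task_state:  # hypothetical membership check on the TaskState
        logger.warning(
            "Job %s was re-activated while still running on this worker; "
            "ignoring the new activation and keeping the active one.",
            job.key,
        )
        return False  # the caller would skip executing the duplicate activation
    task_state.add(job)  # hypothetical; roughly where the ValueError is raised today
    return True
```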
Additional context
Version: v3.0.0rc3
If we can decide which solution is best, I am willing to create a pull request for it.
In the meantime there are a few workarounds:
1. Make sure your timeout is big enough so this never happens...
2. Remove the job from the TaskState before the deadline runs out (worker._task_state.remove(job)). This will cause your job to be run again if it is re-activated; however, I have not tested this with completing jobs after their deadline (see the first sketch below).
3. Wrap worker.start() in a while loop to automatically restart the worker when this occurs. The downside of this is that the crashing job is still activated, so you have to wait for it to expire before it is re-activated (see the second sketch below).
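A rough, untested sketch of workaround 2, building on the reproduction sketch above (it reuses that worker); the before decorator hook and the private _task_state attribute are assumptions and may differ between pyzeebe versions:

```python
import threading
import time

from pyzeebe import Job


def forget_near_deadline(job: Job) -> Job:
    # Schedule the removal a bit before timeout_ms (5 s in the sketch above) so a
    # re-activation no longer collides with the TaskState entry. Note that
    # worker._task_state is private API and this timer is not thread-safe.
    threading.Timer(4.0, worker._task_state.remove, args=[job]).start()
    return job


@worker.task(task_type="slow_task_forgetful", timeout_ms=5_000, before=[forget_near_deadline])
def slow_task_forgetful():
    time.sleep(20)
```

And a crude sketch of workaround 3, again reusing the worker from above; the startup call follows the wording of this issue and catching a bare ValueError is deliberately simplistic:

```python
import logging
import time

logger = logging.getLogger(__name__)

# Restart loop around the worker. The job that crashed it is still activated on
# the broker, so it is only retried once its timeout expires again.
while True:
    try:
        worker.start()
    except ValueError:
        logger.warning("Worker stopped on a re-activated job, restarting", exc_info=True)
        time.sleep(1)  # small back-off before restarting
```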