
Worker stops completely when a Job is re-activated #212

Closed
jelleklaver opened this issue Sep 8, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@jelleklaver

jelleklaver commented Sep 8, 2021

Describe the bug
When a task execution takes longer than its timeout and the job is re-activated on the same worker, the worker stops completely (due to a raised exception). For some tasks the execution time varies widely, so a hard limit on the timeout is not always the best way to go.

To Reproduce
Steps to reproduce the behavior (a minimal sketch follows the list):

  1. Create a worker with a task which runs longer than its timeout (e.g. time.sleep(20))
  2. Make sure the timeout_ms is set smaller than the job execution time
  3. Start the worker
  4. Wait until the ValueError: Job 2251799814133421 already registered in TaskState is raised
  5. The worker has now stopped working and is not accepting any new jobs anymore
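
For reference, here is a minimal reproduction sketch. It assumes the pyzeebe v3 asyncio API (create_insecure_channel, ZeebeWorker, the @worker.task decorator with timeout_ms, and worker.work()); the gateway address, task type and durations are illustrative.

```python
# Minimal reproduction sketch (assumed pyzeebe v3 asyncio API).
# Gateway address, task type and durations are illustrative.
import asyncio

from pyzeebe import ZeebeWorker, create_insecure_channel


async def main():
    channel = create_insecure_channel(hostname="localhost", port=26500)
    worker = ZeebeWorker(channel)

    # timeout_ms (5 s) is deliberately smaller than the task duration (20 s),
    # so the broker re-activates the job while it is still being executed.
    @worker.task(task_type="slow_task", timeout_ms=5000)
    async def slow_task():
        await asyncio.sleep(20)
        return {}

    # After ~5 s the job is re-activated on this same worker and
    # "ValueError: Job <key> already registered in TaskState" is raised,
    # which stops the worker entirely.
    await worker.work()


if __name__ == "__main__":
    asyncio.run(main())
```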

Expected behavior
The worker should not stop working at all. I think re-activating an already existing job should do one of the following:

  1. Ignore the newly activated job and keep executing the already active one.
  2. Simply execute the newly activated job.
  3. Cancel the currently running job and execute the newly activated one.

In any case the worker should keep accepting other jobs and log this situation as a warning, since it is clearly not what you want.

When running multiple workers, option 2 already happens whenever the job is re-activated on a different worker. However, I think option 1, combined with a warning, might be the best way to handle this specific situation.
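
To illustrate option 1, here is a hypothetical sketch of a job-tracking structure that warns about and skips a duplicate activation instead of raising. The class and method names (TaskState, add, remove) are only inferred from the error message and from worker._task_state, so treat this as an illustration rather than the library's actual internals.

```python
# Hypothetical sketch of option 1: warn and ignore a duplicate activation
# instead of raising. Class/method names are illustrative only.
import logging

logger = logging.getLogger(__name__)


class TaskState:
    def __init__(self):
        self._active_job_keys = set()

    def add(self, job_key: int) -> bool:
        """Register a job key; return False (and warn) instead of raising on duplicates."""
        if job_key in self._active_job_keys:
            logger.warning("Job %d already registered; ignoring re-activation", job_key)
            return False
        self._active_job_keys.add(job_key)
        return True

    def remove(self, job_key: int) -> None:
        self._active_job_keys.discard(job_key)
```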

Additional context
Version: v3.0.0rc3

If we can decide which solution is best, I am willing to create a pull request for it.

@jelleklaver
Author

jelleklaver commented Sep 9, 2021

Current workarounds I found:

  1. Make sure your timeout is big enough so this never happens...
  2. Remove the job from the TaskState before the deadline runs out (worker._task_state.remove(job)). This causes the job to be run again if it is re-activated. However, I have not tested how this interacts with completing jobs after their deadline.
  3. Wrap the worker.start() call in a while loop to automatically restart the worker when this occurs (sketched after this list). The downside is that the job is still marked as activated, so you have to wait for it to expire before it can be re-activated once more.
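
For completeness, a sketch of workaround 3, again assuming the pyzeebe v3 asyncio API (worker.work() is used here in place of worker.start(), and task registration is omitted):

```python
# Sketch of workaround 3: restart the worker whenever it dies on the
# "already registered in TaskState" ValueError. Assumed pyzeebe v3 asyncio API;
# task registration is omitted for brevity.
import asyncio
import logging

from pyzeebe import ZeebeWorker, create_insecure_channel

logger = logging.getLogger(__name__)


async def run_worker_forever():
    channel = create_insecure_channel(hostname="localhost", port=26500)
    worker = ZeebeWorker(channel)
    # ... register @worker.task(...) handlers here ...

    while True:
        try:
            await worker.work()
        except ValueError as exc:
            # e.g. "Job <key> already registered in TaskState"
            logger.warning("Worker stopped (%s); restarting", exc)


if __name__ == "__main__":
    asyncio.run(run_worker_forever())
```

As noted above, the re-activated job itself is lost until its timeout expires again, so this only keeps the worker alive for other jobs.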

@JonatanMartens
Collaborator

Fixed in v3.0.0rc4
