
Worker stops completely when a Job is re-activated #212

Closed
jelleklaver opened this issue Sep 8, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@jelleklaver

jelleklaver commented Sep 8, 2021

Describe the bug
When a task execution takes longer than its timeout and the job is re-activated on the same worker, the worker stops completely (due to a raised exception). For some tasks the execution time varies widely, so a hard limit on the timeout is not always the best way to go.

To Reproduce
Steps to reproduce the behavior (a minimal sketch follows the list):

  1. Create a worker with a task which runs longer than its timeout (e.g. time.sleep(20))
  2. Make sure the timeout_ms is set smaller than the job execution time
  3. Start the worker
  4. Wait until the ValueError: Job 2251799814133421 already registered in TaskState is raised
  5. The worker has now stopped working and is not accepting any new jobs anymore
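
For reference, here is a minimal reproduction sketch. It assumes the pyzeebe v3 asyncio API (create_insecure_channel, ZeebeWorker, the @worker.task decorator with timeout_ms, and worker.work()); the gateway address, task type and durations are illustrative.

```python
# Minimal reproduction sketch (assumed pyzeebe v3 asyncio API).
# Gateway address, task type and durations are illustrative.
import asyncio

from pyzeebe import ZeebeWorker, create_insecure_channel


async def main():
    channel = create_insecure_channel(hostname="localhost", port=26500)
    worker = ZeebeWorker(channel)

    # timeout_ms (5 s) is deliberately smaller than the task duration (20 s),
    # so the broker re-activates the job while it is still being executed.
    @worker.task(task_type="slow_task", timeout_ms=5000)
    async def slow_task():
        await asyncio.sleep(20)
        return {}

    # After ~5 s the job is re-activated on this same worker and
    # "ValueError: Job <key> already registered in TaskState" is raised,
    # which stops the worker entirely.
    await worker.work()


if __name__ == "__main__":
    asyncio.run(main())
```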

Expected behavior
The worker should not stop working at all. I think re-activating an already existing job should do one of the following:

  1. Ignore the newly activated job and keep executing the already active one.
  2. Simply execute the newly activated job.
  3. Cancel the currently running job and execute the newly activated one.

In any case the worker should keep accepting other jobs and log this situation as a warning, since it is clearly not what you want.

When running multiple workers, option 2 already happens whenever the job is re-activated on a different worker. However, I think option 1, combined with a warning, might be the best way to handle this specific situation.
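
To illustrate option 1, here is a hypothetical sketch of a job-tracking structure that warns about and skips a duplicate activation instead of raising. The class and method names (TaskState, add, remove) are only inferred from the error message and from worker._task_state, so treat this as an illustration rather than the library's actual internals.

```python
# Hypothetical sketch of option 1: warn and ignore a duplicate activation
# instead of raising. Class/method names are illustrative only.
import logging

logger = logging.getLogger(__name__)


class TaskState:
    def __init__(self):
        self._active_job_keys = set()

    def add(self, job_key: int) -> bool:
        """Register a job key; return False (and warn) instead of raising on duplicates."""
        if job_key in self._active_job_keys:
            logger.warning("Job %d already registered; ignoring re-activation", job_key)
            return False
        self._active_job_keys.add(job_key)
        return True

    def remove(self, job_key: int) -> None:
        self._active_job_keys.discard(job_key)
```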

Additional context
Version: v3.0.0rc3

If we can decide which solution is best, I am willing to create a pull request for it.

@jelleklaver
Author

jelleklaver commented Sep 9, 2021

Current workarounds I found:

  1. Make sure your timeout is big enough so this never happens...
  2. Remove the job from the TaskState before the deadline runs out (worker._task_state.remove(job)). This causes the job to be run again if it is re-activated. However, I have not tested how this interacts with completing jobs after their deadline.
  3. Wrap the worker.start() call in a while loop to automatically restart the worker when this occurs (sketched after this list). The downside is that the job is still marked as activated, so you have to wait for it to expire before it can be re-activated once more.
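
For completeness, a sketch of workaround 3, again assuming the pyzeebe v3 asyncio API (worker.work() is used here in place of worker.start(), and task registration is omitted):

```python
# Sketch of workaround 3: restart the worker whenever it dies on the
# "already registered in TaskState" ValueError. Assumed pyzeebe v3 asyncio API;
# task registration is omitted for brevity.
import asyncio
import logging

from pyzeebe import ZeebeWorker, create_insecure_channel

logger = logging.getLogger(__name__)


async def run_worker_forever():
    channel = create_insecure_channel(hostname="localhost", port=26500)
    worker = ZeebeWorker(channel)
    # ... register @worker.task(...) handlers here ...

    while True:
        try:
            await worker.work()
        except ValueError as exc:
            # e.g. "Job <key> already registered in TaskState"
            logger.warning("Worker stopped (%s); restarting", exc)


if __name__ == "__main__":
    asyncio.run(run_worker_forever())
```

As noted above, the re-activated job itself is lost until its timeout expires again, so this only keeps the worker alive for other jobs.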

@JonatanMartens
Collaborator

Fixed in v3.0.0rc4
