(Relates to HPC Pack 2019, 6.1.7531.0 and probably earlier)
Feature Request Description
Using job submit on the command line, we can set /parentjobids so that a job stays queued until the other job(s) finish. This is really useful.
By default though, if the parent job fails, the child job remains in the queue and never executes, so we have to cancel those jobs manually.
I'm not quite sure if /faildependenttasks fixes this - since I am talking about jobs mainly, rather than tasks. Perhaps it does.
If it does, then it would be good if /faildependenttasks could be set to true by default, at the job template level.
Describe Preferred Solution
An option to select Fail Dependent Tasks (or jobs?) in HPC Cluster Manager, under Configuration -> Job Templates -> Job Template Editor -> Add (property) drop-down. We already have "Fail on Task Failure", but not "Fail if parent tasks/jobs fail".
Describe Alternatives Considered
Alternatively - I cannot really see a reason why you wouldn't want /faildependenttasks to be on all the time. Presumably it makes no difference if there are no dependent jobs, but I think it's reasonable that all child jobs fail by default if the parent fails.
To follow up - I think /faildependenttasks does not do what I hoped it would, so I may be requesting some additional functionality - perhaps /faildependentjobs - in which a job fails if any of the jobs listed in its /parentjobids fails.
@weshinsley, thanks for the feedback. The original design keeps the child jobs in the Queued state when any of the parent jobs is canceled or fails. Since a canceled or failed parent job can be requeued, the child jobs will run after the requeued parent job completes successfully. If the canceled or failed parent job is deleted from the database after a long period, the queued child jobs are set to the Failed state. I agree we could provide another option to cancel or fail the child jobs immediately after any of the parent jobs is canceled or fails.
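Until such an option exists, the selection logic for a manual cleanup could look something like the sketch below. It is only an illustration of the requested behaviour, not how the scheduler works: the function name jobs_to_cancel and the two input dictionaries are hypothetical, and the sketch assumes the job states and parent IDs have already been collected by some other means. It walks the dependency chain so that grandchildren of a failed job are picked up too, and prints a job cancel command for each affected job.

```python
from typing import Dict, List, Set

# Assumption: a parent in either of these states blocks its children.
BLOCKING_STATES = {"Failed", "Canceled"}


def jobs_to_cancel(states: Dict[int, str], parents: Dict[int, List[int]]) -> List[int]:
    """Return IDs of queued jobs that have a failed or canceled ancestor.

    states  maps job ID -> state name (hypothetical input, gathered elsewhere).
    parents maps job ID -> list of parent job IDs (as given to /parentjobids).
    """
    blocked: Set[int] = set()
    changed = True
    # Repeat until no new job is marked, so dependency chains are followed.
    while changed:
        changed = False
        for job_id, state in states.items():
            if job_id in blocked or state != "Queued":
                continue
            for parent in parents.get(job_id, []):
                if states.get(parent) in BLOCKING_STATES or parent in blocked:
                    blocked.add(job_id)
                    changed = True
                    break
    return sorted(blocked)


if __name__ == "__main__":
    # Made-up example data: job 1 failed, 2 depends on 1, 3 depends on 2,
    # and 5 depends on job 4, which finished normally.
    states = {1: "Failed", 2: "Queued", 3: "Queued", 4: "Finished", 5: "Queued"}
    parents = {2: [1], 3: [2], 5: [4]}
    for job_id in jobs_to_cancel(states, parents):
        print(f"job cancel {job_id}")
```

Because a failed parent can be requeued, as described above, this kind of cleanup only makes sense when the parent job is not going to be rerun.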