
Optimize slurm config #234

Closed
wants to merge 3 commits into from

Conversation

nirmalasrjn
Contributor

The FastSchedule directive defaults to 1, which enables fast scheduling: decisions are based on the node definitions in slurm.conf. However, if a node reports fewer resources than configured, it is set to DRAIN, meaning it finishes its currently running job but no further jobs are scheduled on it. When FastSchedule is set to 0, scheduling decisions are instead based on the actual configuration of each individual node.

Adding FastSchedule Directive to Slurm Specs
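For illustration only, a minimal sketch of how this might look in slurm.conf; the node names, resource counts, and partition definition below are hypothetical and not taken from this PR:

    # Schedule against the resources each node actually reports,
    # rather than the values declared below (FastSchedule=0)
    FastSchedule=0
    NodeName=c[1-4] Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN
    PartitionName=normal Nodes=c[1-4] Default=YES MaxTime=24:00:00 State=UP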
@koomie
Contributor

koomie commented Jun 3, 2016

Quick question. Do you know if you can avoid having to specify node configuration details in slurm.conf if you adopt FastSchedule=0?

@nirmalasrjn
Contributor Author

As far as I know, it does not eliminate the need to specify node configuration details in slurm.conf. When I use the FastSchedule directive, I put in node configuration details too.

@koomie
Contributor

koomie commented Jun 3, 2016

So, if the node does not match the configuration that is called out in the slurm.conf file, is there a reason not to want it to be set to DRAIN? If you didn't have to create the node entries by using FastSchedule=0, that would be an advantage in my mind, but if not, I'm not sure one is necessarily better than the other.

@JohnWestlund
Member

That’s correct. Nodes within a pool should be homogeneous and should be correctly defined in slurm.conf so they can be allocated based on a job's constraints. If a pool is heterogeneous, you're asking for performance and results that may be highly variable, but the resource manager should still have the correct definition of the resources. You might be able to get by with an incorrect core count, but other node attributes impose a much harder limit (memory, etc.).

At the end of the day, this is not something that should change often; your slurm.conf should be relatively stable. And if something is suddenly mismatched, you probably want to know about it.

John
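As an aside, a quick sketch of how to compare what a node actually reports against what the controller believes; the node name c1 is hypothetical:

    # On the compute node: print the configuration slurmd detects locally
    slurmd -C
    # On any node: print the controller's view of that node, per slurm.conf
    scontrol show node c1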

The ReturnToService directive has a default value of 0, which means a node that goes DOWN does not return to service unless an administrator manually returns it.
If it is set to 1, the node can return to service automatically, provided it has a valid configuration and was set to DOWN only because it was unresponsive.
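For reference, this is how the setting discussed here would appear in slurm.conf (the comment line is mine; the directive and value are those referenced later in the thread):

    # Let a node that was DOWN only for being unresponsive rejoin automatically
    # once it registers with a valid configuration
    ReturnToService=1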
koomie added a commit that referenced this pull request Jun 9, 2016
@koomie
Contributor

koomie commented Jun 9, 2016

Landed the latest, which updates the ReturnToService directive, onto the 1.1.1 branch.

@koomie koomie closed this Jun 9, 2016
@koomie koomie added this to the 1.1.1 milestone Jun 10, 2016
@koomie
Contributor

koomie commented Jun 10, 2016

Re-opening to enable build for 1.1

@koomie
Contributor

koomie commented Jun 16, 2016

Confirmed the ReturnToService=1 setting was applied during the CI install. Closing this out.
