Optimize slurm config #234
The FastSchedule directive has a default value of 1, which permits fast scheduling. However, if a node has fewer resources than configured, it is set to DRAIN: the node finishes its currently running job, but no further jobs are scheduled on it. When FastSchedule is set to 0, scheduling decisions are based on the actual configuration of each individual node.
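For illustration, a minimal slurm.conf sketch of the setting described above. The node names, socket and core counts, and the partition name are hypothetical placeholders and are not taken from this change. The node definition is still listed explicitly alongside the directive.

```
# slurm.conf fragment (sketch; node names and resource counts are assumed)
# 0 = schedule on each node's actual configuration
# 1 = default; schedule on the configuration declared below
FastSchedule=0

# Hypothetical node and partition definitions
NodeName=c[1-4] Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
PartitionName=normal Nodes=c[1-4] Default=YES MaxTime=24:00:00 State=UP
```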
Adding FastSchedule Directive to Slurm Specs
Quick question. Do you know if you can avoid having to specify node configuration details in slurm.conf if you adopt FastSchedule=0?
As far as I know, it does not eliminate the need to specify node configuration details in slurm.conf. When I use the FastSchedule directive, I put in node configuration details too.
So, if the node does not match the configuration that is called out in the slurm.conf file, is there a reason to not want it to be set to DRAIN? If you didn't have to create the node entries by using FastSchedule=0, that would be an advantage in my mind, but if not, I'm not sure one is necessarily better than the other.
That’s correct. Nodes within a pool should be homogeneous and correctly defined in slurm.conf so they can be allocated based on a job's constraints. If a pool is heterogeneous, you are asking for performance and results to be highly variable, but the resource manager should still have the correct definition of the resources. You might be able to get by with an incorrect core count, but other node features impose a much harder limit (memory, etc.). At the end of the day this is not something that should change often; your slurm.conf should be relatively stable. And if something is suddenly mismatched, you probably want to know about it. John
ReturnToService defaults to 0, which means a DOWN node does not return to service unless the administrator manually brings it back. If it is set to 1, a node can return to service on its own if it has a valid configuration and was set to DOWN only because it was unresponsive.
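As a sketch of the behavior just described, the corresponding slurm.conf line might look like the fragment below. The node name in the comment is a hypothetical placeholder.

```
# slurm.conf fragment (sketch)
# 0 = default; a DOWN node stays DOWN until an administrator resumes it,
#     e.g. with: scontrol update NodeName=c1 State=RESUME   (c1 is a placeholder)
# 1 = a node that was set DOWN only for being unresponsive returns to
#     service once it registers with a valid configuration
ReturnToService=1
```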
Landed the latest, which updates the ReturnToService directive, onto the 1.1.1 branch.
Re-opening to enable build for 1.1
Confirmed