UFM is a powerful platform for managing scale-out computing environments. UFM enables data center operators to monitor, efficiently provision, and operate the modern data center fabric. Using UFM with SLURM provides an effective way to track network bandwidth, congestion, errors, and resource utilization of the compute nodes that SLURM jobs run on.
Prerequisites:
- UFM 6.10 installed on a RH7.x machine, running in management mode, with sharp_enabled and enable_sharp_allocation set to true.
- Python 2.7 on the SLURM controller.
- The UFM-SLURM integration tar file.
tar -xf ufm_slurm_integration.tar.gz
sudo ./install.sh
Alternatively, run ./install.sh directly with root permissions.
Generate token_auth:
If you set auth_type=token_auth in the UFM-SLURM config file, you must generate a new token by logging into the UFM server and running the following curl command:
curl -H "X-Remote-User:admin" -XPOST http://127.0.0.1:8000/app/tokens
Then copy the generated token and assign it to the token parameter in the config file.
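For example, the token-related lines in /etc/slurm/ufm_slurm.conf would look like the following (the token value shown is only a placeholder for the string returned by the curl command above):

```
auth_type=token_auth
token=<generated_token>
```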
1) Verify that both the UFM server and the Kerberos server are appropriately configured to support Kerberos authentication.
2) Install Required Packages: Execute the command yum install krb5-libs krb5-workstation to install the necessary packages.
3) Adjust the /etc/krb5.conf file to match your realm, domain, and other settings, either manually or by copying it from the Kerberos server.
Using the keytab file:
Copy the keytab file from the Kerberos server to the SLURM server machine.
Create the Kerberos ticket by running: kinit -k -t /path/to/your-keytab-file HTTP/YOUR-HOST-NAME@YOUR-REALM
Using a user principal:
On the Kerberos server, create a user principal by running: kadmin.local -q "addprinc user_name"
On the SLURM client, acquire a TGT by running: kinit user_name
Use curl to verify the user's ability to authenticate to UFM REST APIs using Kerberos authentication:
curl --negotiate -i -u : -k 'https://<ufm_server_name>/ufmRestKrb/app/tokens'
Set ufm_server=<ufm_host_name>; it is recommended to use the host name rather than the host IP.
Set auth_type=kerberos_auth.
Set principal_name=<your_principal_name>; you can retrieve it with the klist command.
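Taken together, the Kerberos-related settings in /etc/slurm/ufm_slurm.conf would look like this sketch (host and principal names are placeholders; use the principal reported by klist on your system):

```
ufm_server=ufm-host.example.com
auth_type=kerberos_auth
principal_name=HTTP/slurm-host.example.com@EXAMPLE.COM
```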
After installation, several settings must be configured before the integration can run, and the UFM machine must be up.
sudo vim /etc/slurm/ufm_slurm.conf
ufm_server: IP address or host name of the UFM server to connect to.
ufm_server_user: Username used to connect to the UFM server when auth_type=basic_auth.
ufm_server_pass: Password of the UFM server user.
partially_alloc: Whether or not to allow partial allocation of nodes.
pkey: The partition key to use; defaults to the management PKey 0x7fff.
auth_type: One of token_auth, basic_auth, or kerberos_auth; defaults to token_auth.
token: The generated token, required when auth_type=token_auth. See the Generate token_auth section.
log_file_name: The name of the integration log file.
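As an illustration, a minimal /etc/slurm/ufm_slurm.conf for basic authentication might look like the following sketch (all values are examples, not shipped defaults):

```
ufm_server=192.168.1.10
ufm_server_user=admin
ufm_server_pass=password
auth_type=basic_auth
pkey=0x7fff
log_file_name=/tmp/ufm_slurm.log
```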
- On the UFM machine, open the UFM config file /opt/ufm/files/conf/gv.cfg.
- In the [Server] section, set the key "monitoring_mode" to no and save the file:
monitoring_mode=no
- Start UFM
* HA mode: /etc/init.d/ufmha start
* SA mode: /etc/init.d/ufmd start
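The monitoring_mode change above can also be scripted. The following sketch edits a sample copy of the file in /tmp purely so that it is self-contained; on a real UFM host you would point it at /opt/ufm/files/conf/gv.cfg instead:

```shell
# Work on a sample copy so the sketch is self-contained; substitute the
# real path /opt/ufm/files/conf/gv.cfg on an actual UFM machine.
cfg=/tmp/gv.cfg
printf '[Server]\nmonitoring_mode=yes\n' > "$cfg"

# Flip monitoring_mode to "no" in place (GNU sed).
sed -i 's/^monitoring_mode=.*/monitoring_mode=no/' "$cfg"

# Show the resulting setting.
grep '^monitoring_mode=' "$cfg"
```

Remember to restart UFM afterwards, as described above, so that the new mode takes effect.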
Once the UFM-SLURM integration is installed and deployed, it runs automatically for every submitted SLURM job.
- From the SLURM controller, submit a new SLURM job, for example:
  # sbatch -N2 batch1.sh
- On the UFM side, a new SHArP reservation is created based on the job ID, the job nodes, and the pkey set in the ufm_slurm.conf file.
- A new PKey is created containing all the ports of the job nodes, allowing the SHArP nodes to communicate on top of it.
- After the SLURM job completes, UFM deletes the created SHArP reservation and PKey.
- From the time a job is submitted by the SLURM server until its completion, all actions and errors that occur during execution are logged to /tmp/ufm_slurm.log. The log file location can be changed by modifying the log_file_name parameter in the UFM-SLURM config file /etc/slurm/ufm_slurm.conf.