A new open-source SDK project for NVIDIA UFM (Unified Fabric Manager)
- Scripts:
  - Example Python scripts showing how to collect data and operate devices via the UFM REST API.
- Plugins:
  - SLURM Integration Plugin: Utilize UFM with SLURM to monitor network bandwidth, congestion, errors, and resource utilization of SLURM job compute nodes.
  - UFM NDT Plugin: Compare topologies, manage opensm input file creation, and merge topologies for enhanced UFM functionality.
  - Advanced Hello World Plugin: An advanced example demonstrating the construction of plugins for both backend and GUI.
  - Bright Plugin: Augment UFM's network perspective with data from Bright Cluster Manager, improving network-centered root cause analysis (RCA) tasks.
  - Fluentd Telemetry Plugin: Extract UFM telemetry counters via Prometheus metrics and stream them using the Fluentd protocol to the telemetry console.
  - Fluentd Topology Plugin: Extract topology via UFM API and stream it using the Fluentd protocol to the telemetry console.
  - Grafana InfiniBand Telemetry Plugin: Provides a new UFM telemetry Prometheus endpoint with human-readable labels for monitoring using Grafana.
  - Grafana Telemetry Plugin: Grafana dashboard to monitor UFM telemetry metrics collected by Prometheus Server.
  - gRPC Streamer Plugin: Provides gRPC streaming of the UFM REST API.
  - Hello World Plugin: Basic example of a backend plugin with minimal requirements.
  - PDR Deterministic Plugin: Implements UFM Packet Drop Rate (PDR) deterministic port isolation.
  - SNMP Receiver Plugin: Listens to SNMP traps from managed switches in the fabric and redirects them as events to UFM.
  - Sysinfo Plugin: Sends query commands to switches using AIOHttps communication.
  - UFM Syslog Streaming Plugin: Extracts UFM events from UFM syslog and streams them to a remote Fluentd destination, with an option to duplicate syslog messages to a remote syslog destination.
  - Zabbix Telemetry Plugin: Explains how to stream UFM events to Zabbix for monitoring.
- Utils:
  - General-purpose tools built on the UFM REST API. Some of them are already used by the available third-party plugins.
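
The scripts and utils above talk to UFM over its REST API. As a rough illustration only, the sketch below builds (but does not send) an authenticated GET request using the Python standard library. The `/ufmRest/resources/<name>` path, the use of HTTP basic authentication, the host name, and the `build_ufm_request` helper are all assumptions made for this example; check the UFM REST API documentation for the endpoints and authentication scheme your UFM version exposes.

```python
import base64
import urllib.request


def build_ufm_request(host, resource, username, password):
    """Build (but do not send) a GET request for a UFM REST resource.

    Hypothetical helper: the URL layout and basic authentication used
    here are assumptions, not taken from this repository.
    """
    # Assumption: resources are served under /ufmRest/resources/<name>.
    url = f"https://{host}/ufmRest/resources/{resource}"
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only.
    req = build_ufm_request("ufm.example.com", "systems", "admin", "secret")
    print(req.get_method(), req.full_url)
```

To actually send the request, pass the returned object to `urllib.request.urlopen` and decode the response body as JSON.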
To build, use the following job: http://hpc-master.lab.mtl.com:8080/job/UFM_PLUGINS_RELEASE/
This project supports integration of custom plugins into the Continuous Integration (CI) pipeline. The process for integrating your plugin is simple and described below.
Your plugin should have a dedicated .ci directory that contains a ci_matrix.yaml file. This file is used by the CI pipeline to manage the build process for your plugin.
You can use the ci_matrix.yaml file found in the hello_world_plugin directory as a template:
https://github.com/Mellanox/ufm_sdk_3.0/blob/main/plugins/hello_world_plugin/.ci/ci_matrix.yaml
To add your plugin to the CI pipeline, follow these steps:
- Create a .ci directory in your plugin's root directory.
- In the .ci directory, create a ci_matrix.yaml file. Use the ci_matrix.yaml file in the hello_world_plugin directory as a template.
- The CI pipeline will detect changes in your plugin directory and trigger the CI process. It uses the ci_matrix.yaml file in your plugin's .ci directory to manage the CI process.
If your plugin directory does not contain a .ci directory, the CI process will fail.
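
Before pushing, it can be handy to check locally that a plugin has the layout described above. The following is a minimal sketch, not part of the CI pipeline itself; `missing_ci_files` is a hypothetical helper name, and it only checks that the .ci directory and ci_matrix.yaml exist, not that the YAML content is valid.

```python
import tempfile
from pathlib import Path


def missing_ci_files(plugin_dir):
    """Return the expected CI paths that are absent from plugin_dir."""
    plugin = Path(plugin_dir)
    missing = []
    if not (plugin / ".ci").is_dir():
        missing.append(".ci")
    if not (plugin / ".ci" / "ci_matrix.yaml").is_file():
        missing.append(".ci/ci_matrix.yaml")
    return missing


if __name__ == "__main__":
    # Demonstrate on a throwaway directory standing in for a plugin root.
    with tempfile.TemporaryDirectory() as tmp:
        print("before:", missing_ci_files(tmp))
        (Path(tmp) / ".ci").mkdir()
        (Path(tmp) / ".ci" / "ci_matrix.yaml").write_text("# copied from template\n")
        print("after:", missing_ci_files(tmp))
```

An empty list means both the directory and the file are in place.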
The CI pipeline is triggered by changes made to the plugins. If a change touches multiple plugins, the pipeline does not run the individual plugins' .ci configurations; it triggers a default empty CI instead.
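
The trigger rule can be illustrated with a short sketch. This is not the pipeline's actual implementation, which is not shown here; the function names are hypothetical, and it assumes plugins live under a top-level plugins directory (as the hello_world_plugin template path suggests). Treating a change that touches no plugin the same as the multi-plugin case is also an assumption.

```python
from pathlib import PurePosixPath


def changed_plugins(changed_files, plugins_root="plugins"):
    """Return the sorted names of plugin directories touched by changed_files."""
    names = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # A path belongs to a plugin when it looks like plugins/<name>/...
        if len(parts) >= 3 and parts[0] == plugins_root:
            names.add(parts[1])
    return sorted(names)


def select_ci_matrix(changed_files, plugins_root="plugins"):
    """Return the ci_matrix.yaml path to run, or None for the default empty CI."""
    plugins = changed_plugins(changed_files, plugins_root)
    if len(plugins) == 1:
        return f"{plugins_root}/{plugins[0]}/.ci/ci_matrix.yaml"
    # Multiple plugins changed (or, by assumption, none): default empty CI.
    return None


if __name__ == "__main__":
    single = ["plugins/hello_world_plugin/README.md"]
    several = single + ["plugins/grpc_streamer_plugin/README.md"]
    print(select_ci_matrix(single))
    print(select_ci_matrix(several))
```

In practice this means keeping each change confined to one plugin if you want that plugin's own CI configuration to run.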
Release job URL: https://nbuprod.blsm.nvidia.com/swx-ufm/job/UFM_PLUGINS_SDK_RELEASE/
Release job URL - DRP: will be synced and updated soon.
Build instructions:
- Go to the release job URL.
- Click on login (top right corner).
- Log in with your corporate username (without '@nvidia.com') and password.
- Once logged in, you can build the job.
- Click on "Build With Parameters".
- Update the necessary parameters.
- Click on "Build" and the job will be executed.