
[Core feature] Slurm agent #5634

Open
2 tasks done
BerndDoser opened this issue Aug 5, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@BerndDoser

Motivation: Why do you think this is important?

Slurm is a widely used workload manager on many HPC (High-Performance Computing) clusters. It plays a vital role in allocating compute resources, running work on the allocated resources, and managing a queue of pending work.

Integrating Slurm with Flyte would make Flyte workflows much easier to run on HPC clusters, where Slurm typically controls access to the compute resources.

Goal: What should the final outcome look like, ideally?

The goal is a Flyte agent that submits tasks to HPC resources through the Slurm scheduler.

Typically, users interact with Slurm through its command-line interface (CLI). For instance, the sbatch command submits a job script for later execution. An optional Slurm daemon, slurmrestd, also offers a REST API for interacting with the Slurm system.
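To illustrate the CLI route, here is a minimal Python sketch of how an agent might wrap sbatch. It assumes the Slurm client tools are installed wherever the agent runs, and the helper names (build_sbatch_command, parse_parsable_output, submit_job) are hypothetical. It relies on sbatch's --parsable option, which prints only the job ID (followed by ";<cluster>" on multi-cluster setups) instead of the usual "Submitted batch job ..." message.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def build_sbatch_command(script_path: str, extra_args: Sequence[str] = ()) -> List[str]:
    # --parsable makes sbatch print only "<job_id>" or "<job_id>;<cluster>"
    # instead of the human-readable "Submitted batch job <job_id>".
    return ["sbatch", "--parsable", *extra_args, script_path]


def parse_parsable_output(stdout: str) -> Tuple[str, Optional[str]]:
    # Split "<job_id>[;<cluster>]" into the job ID and the optional cluster name.
    job_id, _, cluster = stdout.strip().partition(";")
    return job_id, (cluster or None)


def submit_job(script_path: str, extra_args: Sequence[str] = ()) -> Tuple[str, Optional[str]]:
    # Needs the Slurm client tools on PATH; raises CalledProcessError if sbatch fails.
    result = subprocess.run(
        build_sbatch_command(script_path, extra_args),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_parsable_output(result.stdout)
```

On a cluster login node, submit_job("job.sh", ["--time=00:10:00", "--ntasks=1"]) would submit a (hypothetical) job.sh and return its job ID. Keeping command construction and output parsing in separate pure functions makes them testable without a Slurm installation.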

Describe alternatives you've considered

I don't know of anything comparable.

Propose: Link/Inline OR Additional context

I am available to offer support using Slurm and to test the Flyte agent.
https://github.com/JBris/slurm-rest-api-docker can be used for testing the Slurm CLI and the Slurm REST API.

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@BerndDoser BerndDoser added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Aug 5, 2024

welcome bot commented Aug 5, 2024

Thank you for opening your first issue here! 🛠


runllm bot commented Aug 5, 2024

To integrate Slurm with Flyte, you can create a Flyte agent that interacts with the Slurm scheduler using its REST API. This approach leverages Flyte's extensibility and allows for efficient task submission to HPC resources managed by Slurm.
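For the REST route, the sketch below builds the two requests such an agent would need: submitting a job and querying its state. It only constructs urllib.request.Request objects, so nothing is sent until one is passed to urllib.request.urlopen. Assumptions: slurmrestd is configured for JWT authentication (hence the X-SLURM-USER-NAME and X-SLURM-USER-TOKEN headers), the API version path segment is v0.0.40, and the submit body has the shape {"script": ..., "job": {...}}. Both the version and the body schema change between Slurm releases, so check them against the OpenAPI specification served by your slurmrestd. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: the slurmrestd API version; this path segment differs between Slurm releases.
API_VERSION = "v0.0.40"


def _headers(user: str, token: str) -> Dict[str, str]:
    # JWT authentication: slurmrestd reads the user name and token from these headers.
    return {
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,
        "Content-Type": "application/json",
    }


def build_submit_request(base_url: str, user: str, token: str,
                         script: str, job_props: Dict[str, Any]) -> urllib.request.Request:
    # Assumed body shape; verify it against your version's OpenAPI spec.
    body = {"script": script, "job": job_props}
    return urllib.request.Request(
        f"{base_url}/slurm/{API_VERSION}/job/submit",
        data=json.dumps(body).encode("utf-8"),
        headers=_headers(user, token),
        method="POST",
    )


def build_status_request(base_url: str, user: str, token: str,
                         job_id: str) -> urllib.request.Request:
    # GET the job's record, which includes its current state.
    return urllib.request.Request(
        f"{base_url}/slurm/{API_VERSION}/job/{job_id}",
        headers=_headers(user, token),
        method="GET",
    )
```

The slurm-rest-api-docker setup linked in the issue could serve as a test target for requests like these.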


@eapolinario eapolinario self-assigned this Aug 8, 2024
@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Aug 8, 2024
@eapolinario
Contributor

@BerndDoser, would you be interested in collaborating on this? The guide on developing a new agent has a lot of detail, including how to develop agents locally so that the development cycle stays short.
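Besides submitting jobs, a Slurm agent has to track them until they finish. A dependency-free sketch of that step: classify maps a Slurm job state string onto three hypothetical coarse phases, and wait_for polls a caller-supplied fetch_state function until the job leaves the active states. fetch_state could, for example, wrap sacct -j <job_id> -X -n -o State, which (assuming job accounting is enabled) keeps reporting a job after squeue has stopped listing it. Treating PREEMPTED as a failure is a simplification, since a preempted job may be requeued depending on cluster configuration. wait_for is only a local stand-in for whatever polling the agent framework performs.

```python
import time
from enum import Enum
from typing import Callable


class Phase(Enum):
    # Hypothetical coarse phases; a real agent would translate these into Flyte task phases.
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Slurm job states that mean the job has not finished yet.
ACTIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING",
                 "SUSPENDED", "REQUEUED", "RESIZING"}
# Terminal states other than COMPLETED; this sketch treats all of them as failures.
FAILED_STATES = {"FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL",
                 "PREEMPTED", "BOOT_FAIL", "DEADLINE"}


def classify(state: str) -> Phase:
    # sacct may append details such as "CANCELLED by 1000"; keep the first word only.
    words = state.split()
    base = words[0].upper() if words else ""
    if base == "COMPLETED":
        return Phase.SUCCEEDED
    if base in ACTIVE_STATES:
        return Phase.RUNNING
    if base in FAILED_STATES:
        return Phase.FAILED
    raise ValueError(f"unrecognized Slurm job state: {state!r}")


def wait_for(job_id: str, fetch_state: Callable[[str], str],
             interval_s: float = 30.0,
             sleep: Callable[[float], None] = time.sleep) -> Phase:
    # Poll until the job reaches a terminal phase; fetch_state is supplied by the caller.
    while True:
        phase = classify(fetch_state(job_id))
        if phase is not Phase.RUNNING:
            return phase
        sleep(interval_s)
```

Injecting fetch_state and sleep lets the polling logic be tested against scripted state sequences without a cluster.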

@BerndDoser
Author

Hi @eapolinario, thank you for your interest in the feature request. I am generally interested in collaborating. For example, I can set up a Slurm container to test such a feature. I am currently on vacation but will be back at the beginning of September.

@kumare3
Contributor

kumare3 commented Sep 11, 2024

Please let us know when you try.

Projects
Status: Assigned
Development

No branches or pull requests

4 participants