This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Need to know all the jobs in a Node #2128

Closed
scarlett2018 opened this issue Feb 2, 2019 · 8 comments

Comments

@scarlett2018
Member

scarlett2018 commented Feb 2, 2019

One useful scenario: before a node is upgraded, migrated, or deleted, the admin needs to know all the jobs running on that node. There are other scenarios as well.

@fanyangCS
Contributor

close and track in #349

@scarlett2018 scarlett2018 changed the title from "Before Node upgrade/migration/deletion, need to know all the jobs in the Node." to "Need to know all the jobs in a Node" Feb 26, 2019
@scarlett2018
Member Author

@fanyangCS - I'm reopening the issue because it is a display and query request for the node-to-job list, not just for decommissioning. I rephrased the title accordingly.

@scarlett2018 scarlett2018 reopened this Feb 26, 2019
@scarlett2018 scarlett2018 added this to the 0.11.0 milestone Feb 26, 2019
@xudifsd
Member

xudifsd commented Feb 26, 2019

Related PR: #2197

@xudifsd
Member

xudifsd commented Feb 28, 2019

#2197 has been merged. However, it seems users want something like the following, where jobs with low resource utilization are marked in red:

image

We can add an extra column to the following page:

image

Users also want ordinary users, not only admins, to be able to see the page.

Should we discuss this? @scarlett2018 and @fanyangCS @Gerhut @Anbang-Hu

@scarlett2018
Member Author

@xudifsd - Agreed, the services page seems to be the best place for this request for now.

@xudifsd
Member

xudifsd commented Feb 28, 2019

Webportal can query Prometheus with the query task_gpu_percent to get the node-to-job mapping and display it on the page. The result of the query should look like this:

task_gpu_percent{container_env_PAI_TASK_INDEX="0",container_label_PAI_CURRENT_TASK_ROLE_NAME="worker",container_label_PAI_HOSTNAME="paigcr-a-gpu-1104",container_label_PAI_JOB_NAME="admin~dixu-gpu-test",container_label_PAI_USER_NAME="admin",instance="10.151.40.4:9102",job="pai_serivce_exporter",minor_number="0",pai_service_name="job-exporter",scraped_from="job-exporter-wxhc4"}

Here, instance is where the job is running, and the other labels can be extracted to make up the job name.
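To illustrate the label extraction described above, here is a minimal Python sketch (not Webportal's actual code). It assumes the response has the standard shape returned by Prometheus's `/api/v1/query` endpoint for an instant vector, and it groups `container_label_PAI_JOB_NAME` values by the `instance` label. The function name `jobs_by_node` and the sample payload are hypothetical and only mirror the labels shown in the comment above.

```python
from collections import defaultdict
from typing import Dict, List


def jobs_by_node(response: dict) -> Dict[str, List[str]]:
    """Group job names by the `instance` label of each returned series."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    grouped = defaultdict(set)
    for series in response["data"]["result"]:
        labels = series["metric"]
        node = labels.get("instance")
        job_name = labels.get("container_label_PAI_JOB_NAME")
        # Skip series that lack either label instead of guessing a value.
        if node and job_name:
            grouped[node].add(job_name)
    # One job can expose several series (one per GPU), hence the set above.
    return {node: sorted(jobs) for node, jobs in grouped.items()}


# Hypothetical payload: two GPUs of the same job on the same node.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "task_gpu_percent",
                    "container_label_PAI_JOB_NAME": "admin~dixu-gpu-test",
                    "instance": "10.151.40.4:9102",
                    "minor_number": "0",
                },
                "value": [1551340800, "12"],
            },
            {
                "metric": {
                    "__name__": "task_gpu_percent",
                    "container_label_PAI_JOB_NAME": "admin~dixu-gpu-test",
                    "instance": "10.151.40.4:9102",
                    "minor_number": "1",
                },
                "value": [1551340800, "80"],
            },
        ],
    },
}

mapping = jobs_by_node(sample_response)
print(mapping)
```

Keying on `instance` follows the comment above; keying on `container_label_PAI_HOSTNAME` instead would give host names rather than exporter addresses.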

To find jobs with low resource utilization, Webportal can query task_gpu_percent < 50 and task_gpu_mem_percent < 50, assuming we only care about GPU and the threshold for low utilization is 50%.
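The low-utilization query above could be assembled as in the following Python sketch. It only builds the PromQL expression and the request URL for Prometheus's `/api/v1/query` endpoint and does not send the request. The helper names and the base URL `http://prometheus.example:9090` are hypothetical, and the sketch assumes both metrics carry matching label sets so that the `and` operator pairs them per GPU.

```python
from urllib.parse import urlencode


def low_util_query(threshold: int = 50) -> str:
    """Build the PromQL expression for GPUs below the utilization threshold."""
    return (
        f"task_gpu_percent < {threshold} "
        f"and task_gpu_mem_percent < {threshold}"
    )


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical Prometheus address; replace with the cluster's own.
url = query_url("http://prometheus.example:9090", low_util_query(50))
print(url)
```

Keeping the threshold as a parameter makes it easy to change the 50% cutoff later without touching the query template.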

@scarlett2018
Member Author

> #2197 has been merged. However, it seems users want something like the following, where jobs with low resource utilization are marked in red:
>
> image
>
> We can add an extra column to the following page:
>
> image
>
> Users also want ordinary users, not only admins, to be able to see the page.
>
> Should we discuss this? @scarlett2018 and @fanyangCS @Gerhut @Anbang-Hu

I created issue #2250 for this request.

@xudifsd
Member

xudifsd commented Mar 1, 2019

Closing this; let's discuss in #2250.

@xudifsd xudifsd closed this as completed Mar 1, 2019