This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

YARN dedicated VC #2790

Closed
4 tasks done
mzmssg opened this issue May 20, 2019 · 4 comments

@mzmssg
Member

mzmssg commented May 20, 2019

Goal:
Provide a way to reserve nodes for a virtual cluster (VC).

Solution:
Leverage YARN node labels to create an exclusive VC.

Items:

  • Command-line tool to add a dedicated VC
  • REST server exposes absolute resource amounts instead of percentages
  • Webportal shows dedicated VCs
  • Metrics fix

Features that might be impacted:
image

  1. Available GPUs
    Available GPUs should now exclude dedicated GPUs. The REST API exposes this in resourcesTotal.
  2. Available nodes
    Some nodes are now dedicated to a VC, so the mapping should be shown. The REST API exposes it in nodeList.
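The two impacted views above could be derived from the REST payload roughly as follows. This is only a sketch: it assumes the server returns a mapping from VC name to an object with the `dedicated`, `resourcesTotal`, and `nodeList` fields described in this issue, and the helper names and sample data are hypothetical.

```python
from typing import Any, Dict


def dedicated_node_mapping(vcs: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Map each dedicated node name to the VC that owns it."""
    mapping = {}
    for vc_name, info in vcs.items():
        if info.get("dedicated"):
            for node in info.get("nodeList", []):
                mapping[node] = vc_name
    return mapping


def shared_gpu_total(vcs: Dict[str, Dict[str, Any]]) -> int:
    """Sum the GPUs of all non-dedicated VCs (dedicated GPUs are excluded)."""
    return sum(
        info["resourcesTotal"]["GPUs"]
        for info in vcs.values()
        if not info.get("dedicated")
    )


# Hypothetical sample data, not taken from a real cluster.
sample_vcs = {
    "default": {"dedicated": False, "resourcesTotal": {"GPUs": 8}, "nodeList": []},
    "research": {
        "dedicated": True,
        "resourcesTotal": {"GPUs": 4},
        "nodeList": ["node1", "node2"],
    },
}
```

Treating the sum over non-dedicated VCs as the "available GPUs" figure is an assumption about how the webportal would aggregate the data.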
@mzmssg
Member Author

mzmssg commented May 30, 2019

Command-line tools:
Add three commands to node_maintain.py:

python node_maintain.py dedicated-vc get -m {master ip}
python node_maintain.py dedicated-vc add -m {master ip} -v {vc name} -n {dedicated nodes}
python node_maintain.py dedicated-vc remove -m {master ip} -v {vc name}
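For illustration, the three subcommands could be parsed with `argparse` along these lines. This is a sketch, not the actual code in node_maintain.py: only the `-m`, `-v`, and `-n` short flags come from this issue, while the long option names, the `build_parser` helper, and the comma-separated node list format are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the dedicated-vc get/add/remove subcommands."""
    parser = argparse.ArgumentParser(prog="node_maintain.py")
    commands = parser.add_subparsers(dest="command", required=True)

    dedicated = commands.add_parser("dedicated-vc")
    actions = dedicated.add_subparsers(dest="action", required=True)

    # get: list the current dedicated VCs
    get_cmd = actions.add_parser("get")
    get_cmd.add_argument("-m", "--master-ip", dest="master_ip", required=True)

    # add: create a dedicated VC from a set of nodes
    add_cmd = actions.add_parser("add")
    add_cmd.add_argument("-m", "--master-ip", dest="master_ip", required=True)
    add_cmd.add_argument("-v", "--vc-name", dest="vc_name", required=True)
    # Assumption: nodes are passed as a comma-separated string.
    add_cmd.add_argument(
        "-n", "--nodes", dest="nodes", required=True,
        type=lambda value: [n for n in value.split(",") if n],
    )

    # remove: delete a dedicated VC
    remove_cmd = actions.add_parser("remove")
    remove_cmd.add_argument("-m", "--master-ip", dest="master_ip", required=True)
    remove_cmd.add_argument("-v", "--vc-name", dest="vc_name", required=True)

    return parser
```

Calling `build_parser().parse_args()` would then yield `args.action` plus the per-action fields, which a dispatcher could route to the YARN node-label operations.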

@mzmssg
Member Author

mzmssg commented May 30, 2019

REST API:
Add dedicated and total resource fields:

{
  // capacity percentage of the entire cluster that this virtual cluster can use
  "capacity": 50,
  // maximum capacity percentage of the entire cluster that this virtual cluster can use
  "maxCapacity": 100,
  // capacity percentage of the entire cluster that this virtual cluster is using
  "usedCapacity": 0,
  "numActiveJobs": 0,
  "numJobs": 0,
  "numPendingJobs": 0,
  "resourcesUsed": {
    "memory": 0,
    "vCores": 0,
    "GPUs": 0
  },
  "resourcesTotal": {
    "memory": 0,
    "vCores": 0,
    "GPUs": 0
  },
  "dedicated": true/false,
  // RUNNING: the VC is enabled.
  // STOPPED: the VC is disabled and has neither new nor running jobs.
  // DRAINING: intermediate state between RUNNING and STOPPED, waiting for existing jobs to finish.
  "status": "RUNNING"/"STOPPED"/"DRAINING",
  "nodeList": [node1, node2]
}
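A client could use the absolute fields above instead of the percentages, for example to compute what is still free in a VC. The sketch below assumes the payload has already been parsed into a Python dict; the helper names are hypothetical, and treating only RUNNING as accepting new jobs is an inference from the status comments, not confirmed behavior.

```python
from typing import Any, Dict


def free_resources(vc: Dict[str, Any]) -> Dict[str, int]:
    """Return resourcesTotal minus resourcesUsed for each resource type."""
    total = vc["resourcesTotal"]
    used = vc["resourcesUsed"]
    return {name: total[name] - used.get(name, 0) for name in total}


def accepts_new_jobs(vc: Dict[str, Any]) -> bool:
    """Assumption: only a RUNNING VC takes new jobs; DRAINING and STOPPED do not."""
    return vc["status"] == "RUNNING"


# Hypothetical payload for one dedicated VC.
sample_vc = {
    "dedicated": True,
    "status": "DRAINING",
    "resourcesUsed": {"memory": 2048, "vCores": 4, "GPUs": 1},
    "resourcesTotal": {"memory": 8192, "vCores": 16, "GPUs": 4},
    "nodeList": ["node1", "node2"],
}
```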

@mzmssg
Member Author

mzmssg commented May 30, 2019

Webportal:
Add a dedicated VC table, add total resources to the existing columns, and add a bonus column:

image

image

@mzmssg
Member Author

mzmssg commented May 31, 2019

Exporter & Prometheus:
The exporter will take node labels into account when calculating available resources.
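One way to make the calculation label-aware is to aggregate free GPUs per node label rather than over the whole cluster. The following is a minimal sketch under assumed inputs: the `label`, `gpu_total`, and `gpu_used` keys and the function name are hypothetical and do not reflect the exporter's real data model.

```python
from collections import defaultdict
from typing import Any, Dict, List


def available_gpus_by_label(nodes: List[Dict[str, Any]]) -> Dict[str, int]:
    """Sum free GPUs per node label; unlabeled nodes are grouped under ""."""
    totals: Dict[str, int] = defaultdict(int)
    for node in nodes:
        label = node.get("label") or ""
        totals[label] += node["gpu_total"] - node["gpu_used"]
    return dict(totals)


# Hypothetical node inventory.
sample_nodes = [
    {"name": "node1", "label": "research", "gpu_total": 4, "gpu_used": 1},
    {"name": "node2", "label": "research", "gpu_total": 4, "gpu_used": 0},
    {"name": "node3", "label": None, "gpu_total": 8, "gpu_used": 2},
]
```

With this grouping, the figure reported for the shared pool is the entry under the empty label, so GPUs on dedicated nodes no longer inflate it.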
