This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

[dedicated vc] management tool #2923

Merged
merged 6 commits into from
Jun 14, 2019
98 changes: 98 additions & 0 deletions docs/tools/dedicated_vc.md
@@ -0,0 +1,98 @@
# dedicated_vc

## Overview

Unlike shared Virtual Clusters (VCs), which share cluster nodes, a dedicated VC is bound to one or more physical nodes.
Once a node is assigned to a dedicated VC, shared VCs can no longer use its resources.
The whole cluster resource is split as follows:

```
Cluster Resource
├── Shared Resource:
│   ├── DEFAULT: capacity
│   ├── Shared VC_1: capacity
│   └── Shared VC_2: capacity
└── Dedicated Resource:
    ├── Dedicated VC_1: node1, node2
    └── Dedicated VC_2: node3, node4

shared_vc_resource = shared_resource * shared_vc_capacity
dedicated_vc_resource = sum(dedicated_vc_nodes)
```
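
The two formulas above can be sketched in Python as follows. This is only an illustration, not code from `node_maintain.py`: the function names, the dictionary keys, and the pool and node sizes are assumptions chosen to match the resource fields shown in the `get` output.

```python
def shared_vc_resource(shared_resource, capacity):
    """Resource of one shared VC: its capacity fraction of the shared pool."""
    return {name: amount * capacity for name, amount in shared_resource.items()}


def dedicated_vc_resource(nodes):
    """Resource of a dedicated VC: the sum over the nodes bound to it."""
    total = {}
    for node in nodes:
        for name, amount in node.items():
            total[name] = total.get(name, 0) + amount
    return total


# Hypothetical numbers: a shared pool and two identical dedicated nodes.
shared_pool = {"cpus": 48, "memory_mb": 417792, "gpus": 8}
node = {"cpus": 12, "memory_mb": 104448, "gpus": 2}

vc1 = shared_vc_resource(shared_pool, 0.25)   # a shared VC with 25% capacity
ded = dedicated_vc_resource([node, node])     # a dedicated VC with two nodes
```

Note that a shared VC's resource changes whenever the shared pool changes size, while a dedicated VC's resource depends only on its own nodes.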

A job submitted to a shared VC may be scheduled to any shared node.
In contrast, a job submitted to a dedicated VC can only be scheduled to the nodes dedicated to that VC.

Currently, shared VCs can be configured through the web UI, but dedicated VCs can only be managed with a command-line tool.
This document describes that tool.


## Commands

`node_maintain.py` provides `get`, `add`, and `remove` subcommands for dedicated VCs. Run it from the working directory `pai/src/tools`.
```bash
python node_maintain.py dedicated-vc {get,add,remove}
```

### Get dedicated-vc

```bash
python node_maintain.py dedicated-vc get -m {master_ip}
```
This command prints the name, nodes, and total resource of each dedicated VC.

#### Examples:

```
$ python node_maintain.py dedicated-vc get -m 10.0.0.1
dedicated_1:
Nodes:
Resource: <CPUs:0.0, Memory:0.0MB, GPUs:0.0>
dedicated_2:
Nodes: 10.0.0.2, 10.0.0.3
Resource: <CPUs:24.0, Memory:208896.0MB, GPUs:4.0>
```
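
If the output needs to be consumed by a script, it can be parsed with a few lines of Python. The sketch below assumes the text layout is exactly as in the example above (a `name:` line followed by `Nodes:` and `Resource:` lines); the function name and the returned dictionary shape are hypothetical, and the tool itself does not provide this parser.

```python
import re

# Matches the resource triple as printed in the example output.
RESOURCE_RE = re.compile(r"<CPUs:([\d.]+), Memory:([\d.]+)MB, GPUs:([\d.]+)>")


def parse_dedicated_vc_output(text):
    """Parse `dedicated-vc get` output into {vc_name: {"nodes": [...], "resource": {...}}}."""
    vcs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Nodes:"):
            if current is not None:
                names = line[len("Nodes:"):].split(",")
                vcs[current]["nodes"] = [n.strip() for n in names if n.strip()]
        elif line.startswith("Resource:"):
            match = RESOURCE_RE.search(line)
            if current is not None and match:
                cpus, memory_mb, gpus = (float(g) for g in match.groups())
                vcs[current]["resource"] = {
                    "cpus": cpus, "memory_mb": memory_mb, "gpus": gpus,
                }
        elif line.endswith(":"):
            # Any other line ending in ":" starts a new VC entry.
            current = line[:-1]
            vcs[current] = {"nodes": [], "resource": None}
    return vcs


sample = """dedicated_1:
Nodes:
Resource: <CPUs:0.0, Memory:0.0MB, GPUs:0.0>
dedicated_2:
Nodes: 10.0.0.2, 10.0.0.3
Resource: <CPUs:24.0, Memory:208896.0MB, GPUs:4.0>
"""
parsed = parse_dedicated_vc_output(sample)
```

Lines are stripped before matching, so the parser does not depend on how the `Nodes:` and `Resource:` lines are indented.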


### Add dedicated-vc

```bash
python node_maintain.py dedicated-vc add -m {master_ip} -v {added_vc_name} [-n {added_nodes}]
```
This command adds {added_nodes} to {added_vc_name}. If {added_vc_name} does not exist, the command creates it first.
The dedicated VC's resource is taken from the shared VC pool and subtracted from the DEFAULT VC quota.
The capacities of the remaining shared VCs are recalculated so that their **GPU** quotas stay constant.
If the DEFAULT VC does not have enough quota, the allocation raises an error.
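
The capacity recalculation can be illustrated with a small Python sketch. It is a simplified model of the behavior described above, not the actual implementation: it tracks GPUs only, treats capacities as fractions of the shared pool, and the function name and argument names are hypothetical.

```python
def move_gpus_to_dedicated(shared_gpus, capacities, moved_gpus):
    """Return (new_shared_gpus, new_capacities) after moving GPUs to a dedicated VC.

    capacities maps each shared VC name (including "DEFAULT") to its
    fraction of the shared pool.
    """
    # Absolute GPU quota of each shared VC before the change.
    quotas = {vc: shared_gpus * cap for vc, cap in capacities.items()}
    if moved_gpus > quotas["DEFAULT"]:
        raise ValueError("not enough DEFAULT quota")

    # The moved GPUs come out of DEFAULT; other quotas stay constant.
    quotas["DEFAULT"] -= moved_gpus
    new_shared = shared_gpus - moved_gpus
    if new_shared == 0:
        return 0, {vc: 0.0 for vc in quotas}

    # Re-express the unchanged quotas as fractions of the smaller pool.
    new_caps = {vc: quota / new_shared for vc, quota in quotas.items()}
    return new_shared, new_caps


# Hypothetical cluster: 8 shared GPUs, a node with 2 GPUs becomes dedicated.
caps = {"DEFAULT": 0.5, "VC_1": 0.25, "VC_2": 0.25}
new_shared, new_caps = move_gpus_to_dedicated(8, caps, 2)
```

In this example `VC_1` and `VC_2` keep their 2-GPU quotas, but their capacities rise from one quarter of 8 GPUs to one third of 6 GPUs, while DEFAULT drops from 4 GPUs to 2.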

#### Examples:

```
# Add an empty dedicated_3
$ python node_maintain.py dedicated-vc add -m 10.0.0.1 -v dedicated_3

# Add 10.0.0.4 to dedicated_3
$ python node_maintain.py dedicated-vc add -m 10.0.0.1 -v dedicated_3 -n 10.0.0.4
```

### Remove dedicated-vc

```bash
python node_maintain.py dedicated-vc remove -m {master_ip} -v {removed_vc_name} [-n {removed_nodes}]
```
This command removes {removed_nodes} from {removed_vc_name}. If {removed_nodes} is omitted, it deletes the whole VC.
The freed resource returns to the shared VC pool, more specifically to the DEFAULT VC.
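
The effect on shared capacities can be sketched in Python. This is a GPU-only model under stated assumptions, not the tool's code: it assumes the other shared VCs keep their absolute quotas when the pool grows, which the text above implies but does not state, and the function name is hypothetical.

```python
def return_gpus_to_shared(shared_gpus, capacities, returned_gpus):
    """Return (new_shared_gpus, new_capacities) after freeing dedicated GPUs.

    capacities maps each shared VC name (including "DEFAULT") to its
    fraction of the shared pool.
    """
    # Absolute GPU quota of each shared VC before the change.
    quotas = {vc: shared_gpus * cap for vc, cap in capacities.items()}

    # Freed GPUs are credited to DEFAULT (assumption: other quotas unchanged).
    quotas["DEFAULT"] = quotas.get("DEFAULT", 0.0) + returned_gpus
    new_shared = shared_gpus + returned_gpus
    if new_shared == 0:
        return 0, {vc: 0.0 for vc in quotas}

    # Re-express the quotas as fractions of the larger pool.
    new_caps = {vc: quota / new_shared for vc, quota in quotas.items()}
    return new_shared, new_caps


# Hypothetical cluster: 4 shared GPUs, a dedicated VC with 4 GPUs is removed.
caps = {"DEFAULT": 0.5, "VC_1": 0.25, "VC_2": 0.25}
new_shared, new_caps = return_gpus_to_shared(4, caps, 4)
```

Here DEFAULT grows from 2 GPUs to 6 (capacity 0.75 of the 8-GPU pool), while `VC_1` and `VC_2` each keep 1 GPU (capacity 0.125).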

#### Examples:

```
# Remove 10.0.0.2 from dedicated_2
$ python node_maintain.py dedicated-vc remove -m 10.0.0.1 -v dedicated_2 -n 10.0.0.2

# Remove dedicated_2 and free all nodes
$ python node_maintain.py dedicated-vc remove -m 10.0.0.1 -v dedicated_2
```




2 changes: 1 addition & 1 deletion src/dev-box/build/dev-box.dockerfile
@@ -52,7 +52,7 @@ RUN apt-get -y update && \
net-tools && \
mkdir -p /cluster-configuration &&\
git clone https://github.com/Microsoft/pai.git &&\
- pip install python-etcd docker kubernetes GitPython jsonschema
+ pip install python-etcd docker kubernetes GitPython jsonschema attrs dicttoxml beautifulsoup4

WORKDIR /tmp
