Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[router] Release router 0.1.0 with dynamic scaling and fault tolerance #2455

Merged
merged 4 commits into from
Dec 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion rust/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "sglang-router"
version = "0.0.11"
version = "0.1.0"
description = "SGLang router is a standalone module implemented in Rust to achieve data parallelism across SGLang instances."
authors = [{name = "Byron Hsu", email = "[email protected]"}]
requires-python = ">=3.8"
Expand Down
58 changes: 58 additions & 0 deletions rust/v0.1.0.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
# SGLang Router v0.1.0: Dynamic Scaling and Fault Tolerance

We have released `sglang-router` v0.1.0 equipped with dynamic scaling and fault tolerance! It is essential for the router to be able to dynamically scale the number of workers and handle worker failures. To achieve this, we have implemented the following features:

## 1. Dynamic scaling: The router can dynamically scale the number of workers based on the request load.

We offer `/add_worker` and `/remove_worker` APIs to dynamically add or remove workers from the router.

- `/add_worker`

Usage:

```bash
$ curl -X POST http://localhost:30000/add_worker?url=http://worker_url_1
```

Example:

```bash
$ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30001
$ curl -X POST http://localhost:30000/add_worker?url=http://127.0.0.1:30001
Successfully added worker: http://127.0.0.1:30001
```

- `/remove_worker`

Usage:

```bash
$ curl -X POST http://localhost:30000/remove_worker?url=http://worker_url_1
```

Example:

```bash
$ curl -X POST http://localhost:30000/remove_worker?url=http://127.0.0.1:30001
Successfully removed worker: http://127.0.0.1:30001
```

Note:

- For cache-aware router, the worker will be removed from the tree and the queues.

## 2. Fault tolerance: The router can handle worker failures and automatically remove the failed worker from the router.

We provide retries based for failure tolerance.

1. If the request to a worker fails for `max_worker_retries` times, the router will remove the worker from the router and move on to the next worker.
2. If the total number of retries exceeds `max_total_retries`, the router will return an error.

Note:

- `max_worker_retries` is 3 and `max_total_retries` is 6 by default.

Closing remarks:

1. Please read the full usage at https://sgl-project.github.io/router/router.html
2. The feature is still under active improvement, so please don't hesitate to raise issues or submit PRs if you have any suggestions or feedback.
Loading