docs: Update Triton documentation and examples (#3668)
ssheng authored Mar 14, 2023
1 parent eaa6218 commit 42a62d8
Showing 7 changed files with 44 additions and 75 deletions.
11 changes: 11 additions & 0 deletions docs/source/integrations/triton.rst
@@ -408,6 +408,17 @@ HTTP/REST APIs is disabled by default, though it can be enabled when creating th

Additionally, BentoML will allocate a random port for the gRPC/HTTP server, so any ``grpc-port`` or ``http-port`` options passed to the Runner's ``cli_args`` will be omitted.

Adaptive Batching
^^^^^^^^^^^^^^^^^

:ref:`Adaptive batching <guides/batching:Adaptive Batching>` is a feature supported by BentoML runners that allows for efficient batch size selection during inference. However, it's important to note that this feature is not compatible with ``TritonRunner``.
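
For contrast, adaptive batching on a regular (non-Triton) BentoML runner is enabled when the model is saved, by marking a signature as batchable. The snippet below is a minimal sketch; the model name, file path, and signature values are illustrative rather than taken from this project.

.. code-block:: python

    import bentoml
    import onnx

    # Illustrative only: load an ONNX model from a hypothetical local file.
    model = onnx.load("yolov5s.onnx")

    # Marking the "run" signature as batchable lets the runner created from this
    # model use BentoML's adaptive batching; batch_dim is the axis along which
    # concurrent requests are concatenated.
    bentoml.onnx.save_model(
        "onnx_yolov5s",
        model,
        signatures={"run": {"batchable": True, "batch_dim": 0}},
    )

    runner = bentoml.onnx.get("onnx_yolov5s:latest").to_runner()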

``TritonRunner`` is designed as a standalone Triton server, so the adaptive batching logic in BentoML runners is not invoked when ``TritonRunner`` is used.

Fortunately, Triton supports its own solution for efficient batching called `dynamic batching <https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#scheduling-and-batching>`_.
Similar to adaptive batching, dynamic batching also allows for the selection of the optimal batch size during inference. To use dynamic batching in Triton, relevant settings can be specified in the
`model configuration <https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#model-configuration>`_ file.
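
As an illustration, the relevant portion of a model's ``config.pbtxt`` might look like the following sketch; the batch sizes and queue delay shown are placeholder values rather than recommendations.

.. code-block:: protobuf

    # Maximum batch size Triton will form for this model.
    max_batch_size: 8

    # Enable the dynamic batcher: requests are grouped into preferred batch sizes,
    # waiting at most the given delay before a batch is dispatched.
    dynamic_batching {
      preferred_batch_size: [ 4, 8 ]
      max_queue_delay_microseconds: 100
    }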

.. admonition:: 🚧 Help us improve the integration!

    This integration is still in its early stages, and we are looking for feedback and contributions to make it even better!
33 changes: 10 additions & 23 deletions examples/triton/onnx/README.md
@@ -22,6 +22,16 @@ triton_runner = bentoml.triton.Runner(
)
```

CLI arguments can be passed through the `cli_args` argument of `bentoml.triton.Runner`:

```python
triton_runner = bentoml.triton.Runner(
    "triton-runners",
    model_repository="s3://path/to/model_repository",
    cli_args=["--load-model=torchscript_yolov5s", "--model-control-mode=explicit"],
)
```

An example of an inference API:

```python
@@ -60,29 +70,6 @@ docker:
> `tritonserver` is currently only supported with the `--production` flag. Make sure
> to have the `tritonserver` binary available in PATH if running locally.

To pass Triton arguments to `serve`, use `--triton-options ARG=VALUE[, VALUE]`:

```bash
bentoml serve --production --triton-options log-verbose=True
```

or via `bentoml.serve`:

```python
import bentoml
server = bentoml.serve(
    bento,
    server_type='grpc',
    production=True,
    triton_args=[
        "model-control-mode=explicit",
        "load-model=onnx_yolov5s",
    ],
)
```

To find out more about BentoML Runner architecture, see
[our latest documentation](https://docs.bentoml.org/en/latest/concepts/runner.html#)

1 change: 1 addition & 0 deletions examples/triton/onnx/train.py
@@ -29,6 +29,7 @@
        raise bentoml.exceptions.NotFound(
            "'override=True', overriding previously saved weights/conversions."
        )
    print(f"{bento_model_name} already exists. Skipping...")
except bentoml.exceptions.NotFound:
    ModelProto = onnx.load(MODEL_FILE.with_suffix(".onnx").__fspath__())
    onnx_checker.check_model(ModelProto)
36 changes: 10 additions & 26 deletions examples/triton/pytorch/README.md
@@ -22,6 +22,16 @@ triton_runner = bentoml.triton.Runner(
)
```

CLI arguments can be passed through the `cli_args` argument of `bentoml.triton.Runner`:

```python
triton_runner = bentoml.triton.Runner(
    "triton-runners",
    model_repository="s3://path/to/model_repository",
    cli_args=["--load-model=torchscript_yolov5s", "--model-control-mode=explicit"],
)
```

An example of an inference API:

```python
@@ -57,32 +67,6 @@ docker:
base_image: nvcr.io/nvidia/tritonserver:22.12-py3
```
> `tritonserver` is currently only supported with the `--production` flag. Make sure
> to have the `tritonserver` binary available in PATH if running locally.

To pass Triton arguments to `serve`, use `--triton-options ARG=VALUE[, VALUE]`:

```bash
bentoml serve --production --triton-options log-verbose=True
```

or via `bentoml.serve`:

```python
import bentoml
server = bentoml.serve(
    bento,
    server_type='grpc',
    production=True,
    triton_args=[
        "model-control-mode=explicit",
        "load-model=pytorch_yolov5s",
    ],
)
```

To find out more about BentoML Runner architecture, see
[our latest documentation](https://docs.bentoml.org/en/latest/concepts/runner.html#)
1 change: 1 addition & 0 deletions examples/triton/pytorch/train.py
@@ -33,6 +33,7 @@
        raise bentoml.exceptions.NotFound(
            "'override=True', overriding previously saved weights/conversions."
        )
    print(f"{bento_model_name} already exists. Skipping...")
except bentoml.exceptions.NotFound:
    print(
        "Saved model:",
36 changes: 10 additions & 26 deletions examples/triton/tensorflow/README.md
@@ -22,6 +22,16 @@ triton_runner = bentoml.triton.Runner(
)
```

CLI arguments can be passed through the `cli_args` argument of `bentoml.triton.Runner`:

```python
triton_runner = bentoml.triton.Runner(
    "triton-runners",
    model_repository="s3://path/to/model_repository",
    cli_args=["--load-model=torchscript_yolov5s", "--model-control-mode=explicit"],
)
```

An example of an inference API:

```python
@@ -57,32 +67,6 @@ docker:
base_image: nvcr.io/nvidia/tritonserver:22.12-py3
```
> `tritonserver` is currently only supported with the `--production` flag. Make sure
> to have the `tritonserver` binary available in PATH if running locally.

To pass Triton arguments to `serve`, use `--triton-options ARG=VALUE[, VALUE]`:

```bash
bentoml serve --production --triton-options log-verbose=True
```

or via `bentoml.serve`:

```python
import bentoml
server = bentoml.serve(
    bento,
    server_type='grpc',
    production=True,
    triton_args=[
        "model-control-mode=explicit",
        "load-model=tensorflow_yolov5s",
    ],
)
```

To find out more about BentoML Runner architecture, see
[our latest documentation](https://docs.bentoml.org/en/latest/concepts/runner.html#)
1 change: 1 addition & 0 deletions examples/triton/tensorflow/train.py
@@ -27,6 +27,7 @@
        raise bentoml.exceptions.NotFound(
            "'override=True', overriding previously saved weights/conversions."
        )
    print(f"{bento_model_name} already exists. Skipping...")
except bentoml.exceptions.NotFound:
    _, metadata = load_traced_script()
    model = tf.saved_model.load(
