[Doc] Organise installation documentation into categories and tabs #11935

Merged
merged 23 commits into from
Jan 13, 2025
23 commits

0d41fd3 Initial commit (hmellor, Jan 10, 2025)
cc06ab3 Merge branch 'main' into group-installation-guides (hmellor, Jan 12, 2025)
090b662 Extract python env setup so it can be used by all index files (hmellor, Jan 10, 2025)
b513038 Complete CPU merge (hmellor, Jan 10, 2025)
cd25045 Match directory naming scheme (`-` -> `_`) (hmellor, Jan 10, 2025)
bfbe5b6 Change OpoenVINO title to match others (hmellor, Jan 10, 2025)
1a52098 Complete AI accelerator merging (hmellor, Jan 10, 2025)
3a89bde codespell (hmellor, Jan 10, 2025)
373078b Fix duplicate labels created by `.md` files that were in the source a… (hmellor, Jan 10, 2025)
4c4ec3e Missed one `.inc.md` file (hmellor, Jan 10, 2025)
a3705e1 Remove extra dependency (hmellor, Jan 11, 2025)
ae99d91 Merge branch 'main' into group-installation-guides (hmellor, Jan 12, 2025)
c7349f9 `pymarkdownlnt fix docs -r` (hmellor, Jan 12, 2025)
f86e495 Respond to comment on GPU supported features (hmellor, Jan 12, 2025)
c07fede Respond to comment on Apple silicon (hmellor, Jan 12, 2025)
5c45c16 Respond to comment on OpenVINO (hmellor, Jan 12, 2025)
64ea29b `format.sh` (hmellor, Jan 12, 2025)
a1edbb3 Change AI accelerator index title (hmellor, Jan 13, 2025)
3c3ab44 Add tip for building CPU docker image (hmellor, Jan 13, 2025)
c5cfe49 Add placeholders for extra information (hmellor, Jan 13, 2025)
630e44f `## (Python|Docker)` -> `## Set up using (Python|Docker)` (hmellor, Jan 13, 2025)
1740d65 Add placeholders for env creation section in GPU index (hmellor, Jan 13, 2025)
7b150a4 Make suggested title change (hmellor, Jan 13, 2025)
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -56,7 +56,7 @@
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This pattern also affects html_static_path and html_extra_path.
-exclude_patterns: List[str] = ["**/*.template.md"]
+exclude_patterns: List[str] = ["**/*.template.md", "**/*.inc.md"]

 # Exclude the prompt "$" when copying code
 copybutton_prompt_text = r"\$ "
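The added `**/*.inc.md` entry stops Sphinx from building the `.inc.md` fragments as standalone pages; the commit list mentions fixing duplicate labels created by `.md` files in the source tree. The sketch below approximates how such a glob selects files, assuming that `**` may span directories, `*` stays inside one path segment, and a leading `**/` also matches files at the source root. `is_excluded` and `translate_glob` are hypothetical helpers written for illustration, not Sphinx's own matcher, and the example paths are made up.

```python
import re


def translate_glob(pattern: str) -> re.Pattern:
    """Translate a Sphinx-style glob into a compiled regex (approximation)."""
    i, n = 0, len(pattern)
    parts = []
    while i < n:
        c = pattern[i]
        i += 1
        if c == "*":
            if i < n and pattern[i] == "*":
                i += 1
                parts.append(".*")      # '**' may cross directory boundaries
            else:
                parts.append("[^/]*")   # '*' stays within one path segment
        elif c == "?":
            parts.append("[^/]")        # '?' matches one non-separator char
        else:
            parts.append(re.escape(c))
    return re.compile("".join(parts) + r"\Z")


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if `path` (relative to the source dir) matches any pattern."""
    expanded = list(patterns)
    # Assumption: a leading '**/' should also match files in the source root.
    expanded += [p[3:] for p in patterns if p.startswith("**/")]
    return any(translate_glob(p).match(path) for p in expanded)


exclude_patterns = ["**/*.template.md", "**/*.inc.md"]

print(is_excluded("installation/example.inc.md", exclude_patterns))  # True
print(is_excluded("installation/index.md", exclude_patterns))  # False
```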
4 changes: 4 additions & 0 deletions docs/source/deployment/docker.md
@@ -2,6 +2,8 @@

 # Using Docker

+(deployment-docker-pre-built-image)=
+
 ## Use vLLM's Official Docker Image

 vLLM offers an official Docker image for deployment.
@@ -23,6 +25,8 @@ container to access the host's shared memory. vLLM uses PyTorch, which uses shar
 memory to share data between processes under the hood, particularly for tensor parallel inference.
 ```

+(deployment-docker-build-image-from-source)=
+
 ## Building vLLM's Docker Image from Source

 You can build and run vLLM from source via the provided <gh-file:Dockerfile>. To build vLLM:
4 changes: 3 additions & 1 deletion docs/source/features/compatibility_matrix.md
@@ -322,7 +322,9 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar

 ```

-### Feature x Hardware
+(feature-x-hardware)=
+
+## Feature x Hardware

 ```{list-table}
 :header-rows: 1
@@ -1,38 +1,23 @@
-(installation-gaudi)=
+# Installation

-# Installation for Intel® Gaudi®
+This tab provides instructions on running vLLM with Intel Gaudi devices.

-This README provides instructions on running vLLM with Intel Gaudi devices.
+## Requirements

-## Requirements and Installation
+- OS: Ubuntu 22.04 LTS
+- Python: 3.10
+- Intel Gaudi accelerator
+- Intel Gaudi software version 1.18.0

 Please follow the instructions provided in the [Gaudi Installation
 Guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html)
 to set up the execution environment. To achieve the best performance,
 please follow the methods outlined in the [Optimizing Training Platform
 Guide](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Optimization_in_Training_Platform.html).

-### Requirements
-
-- OS: Ubuntu 22.04 LTS
-- Python: 3.10
-- Intel Gaudi accelerator
-- Intel Gaudi software version 1.18.0
-
-### Quick start using Dockerfile
-
-```console
-docker build -f Dockerfile.hpu -t vllm-hpu-env .
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --rm vllm-hpu-env
-```
-
-```{tip}
-If you're observing the following error: `docker: Error response from daemon: Unknown runtime specified habana.`, please refer to "Install Using Containers" section of [Intel Gaudi Software Stack and Driver Installation](https://docs.habana.ai/en/v1.18.0/Installation_Guide/Bare_Metal_Fresh_OS.html). Make sure you have `habana-container-runtime` package installed and that `habana` container runtime is registered.
-```
+## Configure a new environment

-### Build from source
-
-#### Environment verification
+### Environment verification

 To verify that the Intel Gaudi software was correctly installed, run:

@@ -47,7 +32,7 @@ Refer to [Intel Gaudi Software Stack
 Verification](https://docs.habana.ai/en/latest/Installation_Guide/SW_Verification.html#platform-upgrade)
 for more details.

-#### Run Docker Image
+### Run Docker Image

 It is highly recommended to use the latest Docker image from Intel Gaudi
 vault. Refer to the [Intel Gaudi
@@ -61,7 +46,13 @@ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-i
 docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
 ```

-#### Build and Install vLLM
+## Set up using Python
+
+### Pre-built wheels
+
+Currently, there are no pre-built Intel Gaudi wheels.
+
+### Build wheel from source

 To build and install vLLM from source, run:

@@ -80,7 +71,26 @@ git checkout habana_main
 python setup.py develop
 ```

-## Supported Features
+## Set up using Docker
+
+### Pre-built images
+
+Currently, there are no pre-built Intel Gaudi images.
+
+### Build image from source
+
+```console
+docker build -f Dockerfile.hpu -t vllm-hpu-env .
+docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --rm vllm-hpu-env
+```
+
+```{tip}
+If you're observing the following error: `docker: Error response from daemon: Unknown runtime specified habana.`, please refer to "Install Using Containers" section of [Intel Gaudi Software Stack and Driver Installation](https://docs.habana.ai/en/v1.18.0/Installation_Guide/Bare_Metal_Fresh_OS.html). Make sure you have `habana-container-runtime` package installed and that `habana` container runtime is registered.
+```
+
+## Extra information
+
+## Supported features

 - [Offline inference](#offline-inference)
 - Online serving via [OpenAI-Compatible Server](#openai-compatible-server)
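The tip above traces the `Unknown runtime specified habana` error to a `habana` container runtime that Docker does not know about. A minimal sketch for checking this from Python, assuming the Docker CLI is on `PATH` and that `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name. The helper names and the sample JSON are hypothetical and are not part of vLLM or the Intel Gaudi tooling.

```python
import json
import subprocess


def registered_runtimes(runtimes_json: str) -> set[str]:
    """Return runtime names from `docker info --format '{{json .Runtimes}}'` output."""
    data = json.loads(runtimes_json)
    # Assumed shape: {"runc": {"path": "runc"}, "habana": {...}, ...}
    return set(data) if isinstance(data, dict) else set()


def habana_runtime_registered(runtimes_json: str) -> bool:
    """True if a runtime named "habana" is registered with the Docker daemon."""
    return "habana" in registered_runtimes(runtimes_json)


def query_docker_runtimes() -> str:
    """Run the Docker CLI and return its JSON description of the runtimes."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Illustrative sample output; on a real host pass query_docker_runtimes() instead.
sample = '{"habana": {"path": "habana-container-runtime"}, "runc": {"path": "runc"}}'
print(habana_runtime_registered(sample))  # True
```

On a Gaudi host, `habana_runtime_registered(query_docker_runtimes())` performs the real check; a `False` result points back to the "Install Using Containers" section linked in the tip.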
@@ -94,14 +104,14 @@ python setup.py develop
 for accelerating low-batch latency and throughput
 - Attention with Linear Biases (ALiBi)

-## Unsupported Features
+## Unsupported features

 - Beam search
 - LoRA adapters
 - Quantization
 - Prefill chunking (mixed-batch inferencing)

-## Supported Configurations
+## Supported configurations

 The following configurations have been validated to be function with
 Gaudi2 devices. Configurations that are not listed may or may not work.
@@ -137,7 +147,7 @@ Gaudi2 devices. Configurations that are not listed may or may not work.
 - [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)
 with tensor parallelism on 8x HPU, BF16 datatype with random or greedy sampling

-## Performance Tuning
+## Performance tuning

 ### Execution modes

@@ -368,7 +378,7 @@ Additionally, there are HPU PyTorch Bridge environment variables impacting vLLM
 - `PT_HPU_LAZY_MODE`: if `0`, PyTorch Eager backend for Gaudi will be used, if `1` PyTorch Lazy backend for Gaudi will be used, `1` is default
 - `PT_HPU_ENABLE_LAZY_COLLECTIVES`: required to be `true` for tensor parallel inference with HPU Graphs

-## Troubleshooting: Tweaking HPU Graphs
+## Troubleshooting: tweaking HPU graphs

 If you experience device out-of-memory issues or want to attempt
 inference at higher batch sizes, try tweaking HPU Graphs by following
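The two HPU PyTorch Bridge variables listed in the last hunk interact with the chosen parallelism. The sketch below encodes only those documented rules (`PT_HPU_LAZY_MODE` is `0` for Eager or `1` for Lazy, defaulting to `1`; `PT_HPU_ENABLE_LAZY_COLLECTIVES` must be `true` for tensor parallel inference with HPU Graphs) as a hypothetical pre-flight check. `hpu_env_warnings` and its parameters are illustrative names, not a vLLM API.

```python
import os
from collections.abc import Mapping


def hpu_env_warnings(
    env: Mapping[str, str], tensor_parallel_size: int, use_hpu_graphs: bool
) -> list[str]:
    """Return warnings for PT_HPU_* settings that conflict with a planned run.

    Hypothetical helper: it only encodes the two documented rules.
    """
    warnings = []
    # "0" selects the PyTorch Eager backend, "1" (the default) the Lazy backend.
    lazy_mode = env.get("PT_HPU_LAZY_MODE", "1")
    if lazy_mode not in ("0", "1"):
        warnings.append(f"PT_HPU_LAZY_MODE={lazy_mode!r} should be '0' or '1'")
    # Tensor parallel inference with HPU Graphs requires lazy collectives.
    needs_lazy_collectives = tensor_parallel_size > 1 and use_hpu_graphs
    if needs_lazy_collectives and env.get("PT_HPU_ENABLE_LAZY_COLLECTIVES") != "true":
        warnings.append(
            "set PT_HPU_ENABLE_LAZY_COLLECTIVES=true for tensor parallel "
            "inference with HPU Graphs"
        )
    return warnings


# Example: check the current shell environment for an 8x HPU run.
for message in hpu_env_warnings(os.environ, tensor_parallel_size=8, use_hpu_graphs=True):
    print("warning:", message)
```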