From b831cf3858c60b227a62cf8f992fae0eccc8bccf Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Mon, 6 Jan 2025 06:50:11 +0000
Subject: [PATCH] Update

Signed-off-by: DarkLight1337
---
 docs/source/contributing/model/basic.md       | 16 ++----
 docs/source/contributing/model/index.md       |  2 +-
 docs/source/contributing/model/oot.md         | 34 -----------
 .../source/contributing/model/registration.md | 56 +++++++++++++++++++
 4 files changed, 62 insertions(+), 46 deletions(-)
 delete mode 100644 docs/source/contributing/model/oot.md
 create mode 100644 docs/source/contributing/model/registration.md

diff --git a/docs/source/contributing/model/basic.md b/docs/source/contributing/model/basic.md
index fd364d70a71ba..14690ffe24a83 100644
--- a/docs/source/contributing/model/basic.md
+++ b/docs/source/contributing/model/basic.md
@@ -6,18 +6,12 @@ This guide walks you through the steps to implement a basic vLLM model.
 
 ## 1. Bring your model code
 
-Start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](#build-from-source).
-This gives you the ability to modify the codebase and test your model.
-
-Clone the PyTorch model code from the HuggingFace Transformers repository and put it into the <gh-dir:vllm/model_executor/models> directory.
-For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from the HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.
+First, clone the PyTorch model code from the source repository.
+For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from
+HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.
 
 ```{warning}
-When copying the model code, make sure to review and adhere to the code's copyright and licensing terms.
-```
-
-```{tip}
-If you don't want to fork the repository and modify vLLM's codebase, please refer to [Out-of-Tree Model Integration](#new-model-oot).
+Make sure to review and adhere to the original code's copyright and licensing terms!
 ```
 
 ## 2. Make your code compatible with vLLM
@@ -105,4 +99,4 @@ This method should load the weights from the HuggingFace's checkpoint file and a
 
 ## 5. Register your model
 
-Finally, add your `*ForCausalLM` class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is available by default.
+See [this page](#new-model-registration) for instructions on how to register your new model to be used by vLLM.
diff --git a/docs/source/contributing/model/index.md b/docs/source/contributing/model/index.md
index 1e976873474b5..e3587754a6401 100644
--- a/docs/source/contributing/model/index.md
+++ b/docs/source/contributing/model/index.md
@@ -10,7 +10,7 @@ This section provides more information on how to integrate a [HuggingFace Transf
 
 basic
 multimodal
-oot
+registration
 ```
 
 ```{note}
diff --git a/docs/source/contributing/model/oot.md b/docs/source/contributing/model/oot.md
deleted file mode 100644
index 780d2d542ea10..0000000000000
--- a/docs/source/contributing/model/oot.md
+++ /dev/null
@@ -1,34 +0,0 @@
-(new-model-oot)=
-
-# Out-of-Tree Model Integration
-
-You can integrate a model using a plugin without modifying the vLLM codebase.
-
-```{seealso}
-[vLLM's Plugin System](#plugin-system)
-```
-
-To register the model, use the following code:
-
-```python
-from vllm import ModelRegistry
-from your_code import YourModelForCausalLM
-ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
-```
-
-If your model imports modules that initialize CUDA, consider lazy-importing it to avoid errors like `RuntimeError: Cannot re-initialize CUDA in forked subprocess`:
-
-```python
-from vllm import ModelRegistry
-
-ModelRegistry.register_model("YourModelForCausalLM", "your_code:YourModelForCausalLM")
-```
-
-```{important}
-If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
-Read more about that [here](#enabling-multimodal-inputs).
-```
-
-```{note}
-Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
-```
diff --git a/docs/source/contributing/model/registration.md b/docs/source/contributing/model/registration.md
new file mode 100644
index 0000000000000..d591f378b70e5
--- /dev/null
+++ b/docs/source/contributing/model/registration.md
@@ -0,0 +1,56 @@
+(new-model-registration)=
+
+# Model Registration
+
+vLLM relies on a model registry to determine how to run each model.
+A list of pre-registered architectures can be found on the [Supported Models](#supported-models) page.
+
+If your model is not on this list, you must register it with vLLM.
+This page provides detailed instructions on how to do so.
+
+## Built-in models
+
+To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](#build-from-source).
+This gives you the ability to modify the codebase and test your model.
+
+After you have implemented your model (see [tutorial](#new-model-basic)), put it into the <gh-dir:vllm/model_executor/models> directory.
+Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
+You should also include an example HuggingFace repository for this model in <gh-file:tests/models/registry.py> to run the unit tests.
+Finally, update the [Supported Models](#supported-models) documentation page to promote your model!
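+
+As an illustration (the exact dictionaries in <gh-file:vllm/model_executor/models/registry.py> may be organized differently), each registry entry maps an architecture name, matching `architectures` in the model's `config.json`, to the module and class that implement it:
+
+```python
+# Hypothetical entry sketching the registry format:
+# architecture name -> (module under vllm/model_executor/models, class name)
+_TEXT_GENERATION_MODELS = {
+    # ... existing entries ...
+    "YourModelForCausalLM": ("your_model", "YourModelForCausalLM"),
+}
+```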
+
+```{important}
+The list of models in each section should be maintained in alphabetical order.
+```
+
+## Out-of-tree models
+
+You can load an external model using a plugin without modifying the vLLM codebase.
+
+```{seealso}
+[vLLM's Plugin System](#plugin-system)
+```
+
+To register the model, use the following code:
+
+```python
+from vllm import ModelRegistry
+from your_code import YourModelForCausalLM
+ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
+```
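+
+As a usage sketch, assuming a local checkpoint whose `config.json` lists `YourModelForCausalLM` in its `architectures` field (the path below is hypothetical):
+
+```python
+from vllm import LLM, ModelRegistry
+from your_code import YourModelForCausalLM
+
+ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
+
+# The name passed to register_model() must match the `architectures` field
+# in the checkpoint's config.json so that vLLM resolves it to this class.
+llm = LLM(model="path/to/your-model-checkpoint")
+outputs = llm.generate("Hello, my name is")
+```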
+``` + +```{note} +Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server. +```