
Commit: Update
Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 committed Jan 6, 2025
1 parent 22677bc commit b831cf3
Showing 4 changed files with 62 additions and 46 deletions.
16 changes: 5 additions & 11 deletions docs/source/contributing/model/basic.md
@@ -6,18 +6,12 @@ This guide walks you through the steps to implement a basic vLLM model.

 ## 1. Bring your model code

-Start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](#build-from-source).
-This gives you the ability to modify the codebase and test your model.
-
-Clone the PyTorch model code from the HuggingFace Transformers repository and put it into the <gh-dir:vllm/model_executor/models> directory.
-For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from the HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.
+First, clone the PyTorch model code from the source repository.
+For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from
+HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.

 ```{warning}
-When copying the model code, make sure to review and adhere to the code's copyright and licensing terms.
-```
-
-```{tip}
-If you don't want to fork the repository and modify vLLM's codebase, please refer to [Out-of-Tree Model Integration](#new-model-oot).
+Make sure to review and adhere to the original code's copyright and licensing terms!
 ```

 ## 2. Make your code compatible with vLLM
@@ -105,4 +99,4 @@ This method should load the weights from the HuggingFace's checkpoint file and a

 ## 5. Register your model

-Finally, add your `*ForCausalLM` class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is available by default.
+See [this page](#new-model-registration) for instructions on how to register your new model to be used by vLLM.
2 changes: 1 addition & 1 deletion docs/source/contributing/model/index.md
@@ -10,7 +10,7 @@ This section provides more information on how to integrate a [HuggingFace Transf
 basic
 multimodal
-oot
+registration
 ```

```{note}
34 changes: 0 additions & 34 deletions docs/source/contributing/model/oot.md

This file was deleted.

56 changes: 56 additions & 0 deletions docs/source/contributing/model/registration.md
@@ -0,0 +1,56 @@
(new-model-registration)=

# Model Registration

vLLM relies on a model registry to determine how to run each model.
A list of pre-registered architectures can be found on the [Supported Models](#supported-models) page.

If your model is not on this list, you must register it with vLLM.
This page provides detailed instructions on how to do so.

## Built-in models

To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](#build-from-source).
This gives you the ability to modify the codebase and test your model.

After you have implemented your model (see [tutorial](#new-model-basic)), put it into the <gh-dir:vllm/model_executor/models> directory.
Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
You should also include an example HuggingFace repository for this model in <gh-file:tests/models/registry.py> to run the unit tests.
Finally, update the [Supported Models](#supported-models) documentation page to promote your model!

```{important}
The list of models in each section should be maintained in alphabetical order.
```
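
To make that step concrete, here is a minimal sketch of what such an entry might look like. It assumes a hypothetical `your_model.py` module under <gh-dir:vllm/model_executor/models>; the exact layout of the registry dict varies between vLLM versions, so match the surrounding entries in your checkout:

```python
# Sketch of a hypothetical entry in vllm/model_executor/models/registry.py.
# The key is the architecture name from the model's config.json; the value
# points at the module and class that implement it.
_VLLM_MODELS = {
    # ... existing entries ...
    "YourModelForCausalLM": ("your_model", "YourModelForCausalLM"),
}
```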

## Out-of-tree models

You can load an external model using a plugin without modifying the vLLM codebase.

```{seealso}
[vLLM's Plugin System](#plugin-system)
```

To register the model, use the following code:

```python
from vllm import ModelRegistry
from your_code import YourModelForCausalLM

ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
```
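
Once registered, the model is resolved like any built-in architecture: vLLM matches the name against the `architectures` field in the checkpoint's `config.json`. A usage sketch, with a placeholder checkpoint path:

```python
from vllm import LLM

# "path/to/your_model" is a placeholder for a checkpoint whose
# config.json lists "YourModelForCausalLM" under "architectures".
llm = LLM(model="path/to/your_model")
outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```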

If your model imports modules that initialize CUDA, consider lazy-importing your model to avoid errors like `RuntimeError: Cannot re-initialize CUDA in forked subprocess`:

```python
from vllm import ModelRegistry

ModelRegistry.register_model("YourModelForCausalLM", "your_code:YourModelForCausalLM")
```

```{important}
If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
Read more about that [here](#enabling-multimodal-inputs).
```
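
As a rough sketch of what that declaration looks like (not a complete implementation; the interface also requires the multimodal processing hooks covered in the linked page):

```python
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsMultiModal

class YourModelForCausalLM(nn.Module, SupportsMultiModal):
    # Implement the usual forward/load_weights methods plus the
    # multimodal hooks described in "Enabling Multimodal Inputs".
    ...
```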

```{note}
Although you can put these code snippets directly in a script that uses `vllm.LLM`, the recommended way is to place them in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
```
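
For reference, a minimal sketch of such a plugin, assuming a hypothetical package name and the `vllm.general_plugins` entry point group from [vLLM's Plugin System](#plugin-system):

```python
# setup.py of a hypothetical plugin package; all names are placeholders.
from setuptools import setup

setup(
    name="vllm_your_model_plugin",
    version="0.1",
    packages=["vllm_your_model_plugin"],
    entry_points={
        # vLLM discovers and calls the referenced function at startup.
        "vllm.general_plugins": [
            "register_your_model = vllm_your_model_plugin:register"
        ]
    },
)
```

The referenced function then performs the registration shown above:

```python
# vllm_your_model_plugin/__init__.py
def register():
    from vllm import ModelRegistry

    # Lazy string reference, as recommended above for CUDA safety.
    ModelRegistry.register_model(
        "YourModelForCausalLM", "your_code:YourModelForCausalLM")
```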
