From b831cf3858c60b227a62cf8f992fae0eccc8bccf Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Mon, 6 Jan 2025 06:50:11 +0000
Subject: [PATCH] Update

Signed-off-by: DarkLight1337
---
 docs/source/contributing/model/basic.md       | 16 ++----
 docs/source/contributing/model/index.md       |  2 +-
 docs/source/contributing/model/oot.md         | 34 -----------
 .../source/contributing/model/registration.md | 56 +++++++++++++++++++
 4 files changed, 62 insertions(+), 46 deletions(-)
 delete mode 100644 docs/source/contributing/model/oot.md
 create mode 100644 docs/source/contributing/model/registration.md

diff --git a/docs/source/contributing/model/basic.md b/docs/source/contributing/model/basic.md
index fd364d70a71ba..14690ffe24a83 100644
--- a/docs/source/contributing/model/basic.md
+++ b/docs/source/contributing/model/basic.md
@@ -6,18 +6,12 @@ This guide walks you through the steps to implement a basic vLLM model.
 
 ## 1. Bring your model code
 
-Start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](#build-from-source).
-This gives you the ability to modify the codebase and test your model.
-
-Clone the PyTorch model code from the HuggingFace Transformers repository and put it into the <gh-dir:vllm/model_executor/models> directory.
-For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from the HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.
+First, clone the PyTorch model code from the source repository.
+For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from
+HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.
 
 ```{warning}
-When copying the model code, make sure to review and adhere to the code's copyright and licensing terms.
-```
-
-```{tip}
-If you don't want to fork the repository and modify vLLM's codebase, please refer to [Out-of-Tree Model Integration](#new-model-oot).
+Make sure to review and adhere to the original code's copyright and licensing terms!
 ```
 
 ## 2. Make your code compatible with vLLM
@@ -105,4 +99,4 @@ This method should load the weights from the HuggingFace's checkpoint file and a
 
 ## 5. Register your model
 
-Finally, add your `*ForCausalLM` class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is available by default.
+See [this page](#new-model-registration) for instructions on how to register your new model to be used by vLLM.
diff --git a/docs/source/contributing/model/index.md b/docs/source/contributing/model/index.md
index 1e976873474b5..e3587754a6401 100644
--- a/docs/source/contributing/model/index.md
+++ b/docs/source/contributing/model/index.md
@@ -10,7 +10,7 @@ This section provides more information on how to integrate a [HuggingFace Transf
 
 basic
 multimodal
-oot
+registration
 ```
 
 ```{note}
diff --git a/docs/source/contributing/model/oot.md b/docs/source/contributing/model/oot.md
deleted file mode 100644
index 780d2d542ea10..0000000000000
--- a/docs/source/contributing/model/oot.md
+++ /dev/null
@@ -1,34 +0,0 @@
-(new-model-oot)=
-
-# Out-of-Tree Model Integration
-
-You can integrate a model using a plugin without modifying the vLLM codebase.
-
-```{seealso}
-[vLLM's Plugin System](#plugin-system)
-```
-
-To register the model, use the following code:
-
-```python
-from vllm import ModelRegistry
-from your_code import YourModelForCausalLM
-ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
-```
-
-If your model imports modules that initialize CUDA, consider lazy-importing it to avoid errors like `RuntimeError: Cannot re-initialize CUDA in forked subprocess`:
-
-```python
-from vllm import ModelRegistry
-
-ModelRegistry.register_model("YourModelForCausalLM", "your_code:YourModelForCausalLM")
-```
-
-```{important}
-If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
-Read more about that [here](#enabling-multimodal-inputs).
-```
-
-```{note}
-Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
-```
diff --git a/docs/source/contributing/model/registration.md b/docs/source/contributing/model/registration.md
new file mode 100644
index 0000000000000..d591f378b70e5
--- /dev/null
+++ b/docs/source/contributing/model/registration.md
@@ -0,0 +1,56 @@
+(new-model-registration)=
+
+# Model Registration
+
+vLLM relies on a model registry to determine how to run each model.
+A list of pre-registered architectures can be found on the [Supported Models](#supported-models) page.
+
+If your model is not on this list, you must register it with vLLM.
+This page provides detailed instructions on how to do so.
+
+## Built-in models
+
+To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](#build-from-source).
+This gives you the ability to modify the codebase and test your model.
+
+After you have implemented your model (see [tutorial](#new-model-basic)), put it into the <gh-dir:vllm/model_executor/models> directory.
+Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
+You should also include an example HuggingFace repository for this model in <gh-file:tests/models/registry.py> to run the unit tests.
+Finally, update the [Supported Models](#supported-models) documentation page to promote your model!
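+
+As an illustration (the exact dictionaries in <gh-file:vllm/model_executor/models/registry.py> may be organized differently), each registry entry maps an architecture name, matching `architectures` in the model's `config.json`, to the module and class that implement it:
+
+```python
+# Hypothetical entry sketching the registry format:
+# architecture name -> (module under vllm/model_executor/models, class name)
+_TEXT_GENERATION_MODELS = {
+    # ... existing entries ...
+    "YourModelForCausalLM": ("your_model", "YourModelForCausalLM"),
+}
+```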
+
+```{important}
+The list of models in each section should be maintained in alphabetical order.
+```
+
+## Out-of-tree models
+
+You can load an external model using a plugin without modifying the vLLM codebase.
+
+```{seealso}
+[vLLM's Plugin System](#plugin-system)
+```
+
+To register the model, use the following code:
+
+```python
+from vllm import ModelRegistry
+from your_code import YourModelForCausalLM
+ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
+```
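+
+As a usage sketch, assuming a local checkpoint whose `config.json` lists `YourModelForCausalLM` in its `architectures` field (the path below is hypothetical):
+
+```python
+from vllm import LLM, ModelRegistry
+from your_code import YourModelForCausalLM
+
+ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
+
+# The name passed to register_model() must match the `architectures` field
+# in the checkpoint's config.json so that vLLM resolves it to this class.
+llm = LLM(model="path/to/your-model-checkpoint")
+outputs = llm.generate("Hello, my name is")
+```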
+``` + +```{note} +Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server. +```