docs: Add model loading doc (#5049)
* Add model loading docs

Signed-off-by: Sherlock113 <[email protected]>

* Add API doc

Signed-off-by: Sherlock113 <[email protected]>

---------

Signed-off-by: Sherlock113 <[email protected]>
Sherlock113 authored Oct 30, 2024
1 parent e684484 commit bc2e5d0
Showing 10 changed files with 75 additions and 47 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -159,7 +159,7 @@ Check out the [examples](./examples/) folder for more sample code and usage.
 - [GPU inference](https://docs.bentoml.com/en/latest/guides/gpu-inference.html)
 - [Distributed serving systems](https://docs.bentoml.com/en/latest/guides/distributed-services.html)
 - [Concurrency and autoscaling](https://docs.bentoml.com/en/latest/bentocloud/how-tos/autoscaling.html)
-- [Model packaging and Model Store](https://docs.bentoml.com/en/latest/guides/model-store.html)
+- [Model loading and Model Store](https://docs.bentoml.com/en/latest/guides/model-loading-and-management.html)
 - [Observability](https://docs.bentoml.com/en/latest/guides/observability/index.html)
 - [BentoCloud deployment](https://docs.bentoml.com/en/latest/guides/deployment.html)

docs/source/_static/img/guides/model-loading-and-management/hf-model-on-bentocloud.png (binary image; preview not available)
2 changes: 1 addition & 1 deletion docs/source/get-started/introduction.rst
@@ -56,7 +56,7 @@ The following is the basic workflow of using the BentoML framework.
 1. Model registration
 ^^^^^^^^^^^^^^^^^^^^^

-To get started, you can save your model in the BentoML :doc:`/guides/model-store`, a centralized repository for managing all local models. BentoML is compatible with a variety of models, including pre-trained models from Hugging Face or custom models trained on your custom datasets. The Model Store simplifies the process of iterating and evaluating different model versions, providing an efficient way to track and manage your ML assets.
+To get started, you can save your model in the BentoML :doc:`/guides/model-loading-and-management`, a centralized repository for managing all local models. BentoML is compatible with a variety of models, including pre-trained models from Hugging Face or custom models trained on your own datasets. The Model Store simplifies the process of iterating and evaluating different model versions, providing an efficient way to track and manage your ML assets.

 Note that for simple use cases, you can **skip this step** and use pre-trained models directly when creating your BentoML Service.
2 changes: 1 addition & 1 deletion docs/source/guides/build-options.rst
@@ -132,7 +132,7 @@ Alternatively, create a ``.bentoignore`` file in the ``build_ctx`` directory as
 ``models``
 ^^^^^^^^^^

-You can specify the model to be used for building a Bento using a string model tag or a dictionary. When you start from an existing project, you can download models from BentoCloud to your local :doc:`/guides/model-store` with the ``models`` configurations by running ``bentoml models pull``.
+You can specify the model to be used for building a Bento using a string model tag or a dictionary. When you start from an existing project, you can download models from BentoCloud to your local :doc:`/guides/model-loading-and-management` with the ``models`` configuration by running ``bentoml models pull``.

 See the following example for details. If you don't define models in ``bentofile.yaml``, the model specified in the Service is used to build the Bento.
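
For reference, a hedged sketch of triggering such a build programmatically with ``bentoml.bentos.build_bentofile`` (the API is documented later in this diff); the file path and working directory are assumptions:

    import bentoml

    # Build a Bento from a bentofile.yaml whose `models` section pins the
    # models to package alongside the service code
    bento = bentoml.bentos.build_bentofile("bentofile.yaml")
    print(bento.tag)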
8 changes: 4 additions & 4 deletions docs/source/guides/index.rst
@@ -45,11 +45,11 @@ This chapter introduces the key features of BentoML. We recommend you read :doc:

         Customize the build configurations of a Bento.

-    .. grid-item-card:: :doc:`/guides/model-store`
-        :link: /guides/model-store
+    .. grid-item-card:: :doc:`/guides/model-loading-and-management`
+        :link: /guides/model-loading-and-management
         :link-type: doc

-        Use the BentoML local Model Store to manage your models in a unified way.
+        Load AI models and manage them in a unified way.

     .. grid-item-card:: :doc:`/guides/tasks`
         :link: /guides/tasks
@@ -138,7 +138,7 @@ This chapter introduces the key features of BentoML. We recommend you read :doc:
     containerization
     workers
     build-options
-    model-store
+    model-loading-and-management
     tasks
     gpu-inference
     model-composition
docs/source/guides/{model-store.rst → model-loading-and-management.rst}
@@ -1,17 +1,16 @@
-===========
-Model Store
-===========
+============================
+Model loading and management
+============================

-BentoML provides a local Model Store to save and manage models, which is essentially a local file directory maintained by BentoML. This document explains how to use the BentoML Model Store.
+BentoML offers simple APIs for you to load, store, and manage AI models.

-When should you use the Model Store?
-------------------------------------
+Understand the Model Store
+--------------------------

-While it's straightforward to download and use pre-trained models from public model hubs like Hugging Face directly within a ``service.py`` file for simple use cases, more complex scenarios often require a more organized approach to model management. We recommend you use the BentoML Model Store in the following scenarios:
+BentoML provides a local Model Store to save and manage models, which is essentially a local file directory maintained by BentoML. It is useful in several scenarios, including:

-- **Private model management**: If you are working with private models that have been fine-tuned or trained from scratch for specific tasks, using BentoML's Model Store offers a secure and efficient way to store, version, and access these models across your projects.
-- **Model cataloging**: BentoML's Model Store facilitates easy cataloging and versioning of models, enabling you to maintain a clear record of model iterations and switch between different model versions as required.
-- **Model downloading acceleration in BentoCloud**: For deployment on BentoCloud, the Model Store improves the cold start time of model downloading. BentoCloud caches models to expedite their availability and supports streaming loading of models directly to GPU memory.
+- **Private model management**: For private models fine-tuned or trained for specific tasks, BentoML's Model Store offers a secure and efficient way to store, version, and access them.
+- **Model cataloging**: BentoML's Model Store facilitates easy cataloging and versioning of models, enabling you to maintain a clear record of model iterations and switch between versions as needed.

 Save a model
 ------------
@@ -51,49 +50,65 @@ If you have an existing model on disk, you can import it into the BentoML Model
         shutil.copytree(local_model_dir, model_ref.path, dirs_exist_ok=True)
         print(f"Model saved: {model_ref}")

-Retrieve a model
-----------------
+Load a model
+------------

-To retrieve a model from the BentoML Model Store, use the ``get`` method.
+BentoML provides an efficient mechanism for loading AI models to accelerate model deployment, reducing image build time and cold start time.

-.. code-block:: python
+.. tab-set::

-    import bentoml
-    bento_model: bentoml.Model = bentoml.models.get("summarization-model:latest")
-
-    # Print related attributes of the model object.
-    print(bento_model.tag)
-    print(bento_model.path)
-
-``bentoml.models.get`` returns a ``bentoml.Model`` instance, linking to a saved model entry in the BentoML Model Store. You can then use the instance to get model information like tag, labels, and file system paths, or create a :doc:`Service </guides/services>` on top of it.
-
-For example, you can load the model into a Transformers pipeline from the ``path`` provided by the ``bentoml.Model`` instance as below. See more in :doc:`/get-started/quickstart`.
-
-.. code-block:: python
-
-    import bentoml
-    from transformers import pipeline
-
-    @bentoml.service
-    class Summarization:
-        # Define the model as a class variable
-        model_ref = bentoml.models.get("summarization-model")
-
-        def __init__(self) -> None:
-            # Load model into pipeline
-            self.pipeline = pipeline('summarization', self.model_ref.path)
-
-        @bentoml.api
-        def summarize(self, text: str = EXAMPLE_INPUT) -> str:
-            ...
-
-Models must be retrieved from the class scope of a Service. Defining the model as a class variable declares it as a dependency of the Service, ensuring the models are referenced by the Bento when transported and deployed.
-
-.. warning::
-
-   If ``bentoml.models.get()`` is called inside the constructor of a Service class, the model will not be referenced by the Bento therefore not pushed or deployed, leading to model ``NotFound`` in BentoML store error.
+    .. tab-item:: From Hugging Face
+
+        To load a model from Hugging Face (HF), instantiate a ``HuggingFaceModel`` class from ``bentoml.models`` and specify the model ID as shown on HF. For a gated Hugging Face model, remember to export your `Hugging Face API token <https://huggingface.co/docs/hub/en/security-tokens>`_ as an environment variable before loading the model.
+
+        Here is an example:
+
+        .. code-block:: python
+
+            import bentoml
+            from bentoml.models import HuggingFaceModel
+            from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+            @bentoml.service(resources={"cpu": "200m", "memory": "512Mi"})
+            class MyService:
+                # Specify a model from HF with its ID
+                model_ref = HuggingFaceModel("google-bert/bert-base-uncased")
+
+                def __init__(self):
+                    # Load the actual model and tokenizer within the instance context
+                    self.model = AutoModelForSequenceClassification.from_pretrained(self.model_ref)
+                    self.tokenizer = AutoTokenizer.from_pretrained(self.model_ref)
+
+        If you deploy the HF model to BentoCloud, you can view and verify it within your Bento on the details page. It is indicated with the HF icon. Clicking the icon redirects you to the model page on HF.
+
+        .. image:: ../../_static/img/guides/model-loading-and-management/hf-model-on-bentocloud.png
+
+    .. tab-item:: From the Model Store or BentoCloud
+
+        To load a model from the local Model Store or BentoCloud, instantiate a ``BentoModel`` from ``bentoml.models`` and specify its model tag. Make sure the model is stored locally or available in BentoCloud.
+
+        Here is an example:
+
+        .. code-block:: python
+
+            import bentoml
+            from bentoml.models import BentoModel
+            import joblib
+
+            @bentoml.service(resources={"cpu": "200m", "memory": "512Mi"})
+            class MyService:
+                # Define the model reference at the class level to load it
+                # from the Model Store or BentoCloud
+                iris_ref = BentoModel("iris_sklearn:latest")
+
+                def __init__(self):
+                    self.iris_model = joblib.load(self.iris_ref.path_of("model.pkl"))
+
+.. important::
+
+    When using ``HuggingFaceModel`` and ``BentoModel``, you must load the model from the class scope of a Service. Defining the model as a class variable declares it as a dependency of the Service, ensuring the model is referenced by the Bento when transported and deployed. If you call these two APIs within the constructor of a Service class, the model will not be referenced by the Bento. As a result, it will not be pushed or deployed, leading to a model ``NotFound`` error.
+
+For more information, see :doc:`/reference/stores`.

 Manage models
 -------------
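
The class-scope requirement in the ``important`` admonition above is easy to miss; here is a minimal sketch of the two patterns it contrasts (the model tag is illustrative):

    import bentoml
    from bentoml.models import BentoModel

    @bentoml.service
    class Works:
        # Class scope: the model is declared as a dependency, so it is
        # referenced by the Bento when pushed and deployed
        model_ref = BentoModel("summarization-model:latest")

    @bentoml.service
    class Breaks:
        def __init__(self):
            # Constructor scope: the Bento never records the dependency,
            # so deployment fails with a model NotFound error
            self.model_ref = BentoModel("summarization-model:latest")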
2 changes: 1 addition & 1 deletion docs/source/guides/services.rst
@@ -35,7 +35,7 @@ Methods within the class which are defined as accessible HTTP API endpoints are

 .. note::

-    This Service downloads a pre-trained model from Hugging Face. It is possible to use your own model within the Service class. For more information, see :doc:`/guides/model-store`.
+    This Service downloads a pre-trained model from Hugging Face. It is also possible to use your own model within the Service class. For more information, see :doc:`/guides/model-loading-and-management`.

 Test the Service code
 ---------------------
13 changes: 13 additions & 0 deletions docs/source/reference/stores.rst
@@ -15,6 +15,19 @@ Manage Bentos
 .. autofunction:: bentoml.build
 .. autofunction:: bentoml.bentos.build_bentofile

+Load models
+-----------
+
+.. autoclass:: bentoml.models.BentoModel
+    :members: to_info, from_info, resolve
+    :undoc-members:
+    :show-inheritance:
+
+.. autoclass:: bentoml.models.HuggingFaceModel
+    :members: to_info, from_info, resolve
+    :undoc-members:
+    :show-inheritance:
+
 Manage models
 -------------

2 changes: 1 addition & 1 deletion docs/source/use-cases/custom-models/mlflow.rst
@@ -41,7 +41,7 @@ This example uses the ``scikit-learn`` framework to train a classification model
     model.fit(X_train, Y_train)
     mlflow.sklearn.save_model(model, model_uri.resolve())

-Next, use the ``bentoml.mlflow.import_model`` API to save the model to the BentoML :doc:`/guides/model-store`, a local directory to store and manage models. You can retrieve this model later in other services to run predictions.
+Next, use the ``bentoml.mlflow.import_model`` API to save the model to the BentoML :doc:`/guides/model-loading-and-management`, a local directory to store and manage models. You can retrieve this model later in other services to run predictions.

 .. code-block:: bash
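
For reference, a minimal sketch of the ``bentoml.mlflow.import_model`` call described above (the model name and URI are illustrative, following the snippet's ``model_uri``):

    import bentoml

    # Import the MLflow model saved above into the local BentoML Model Store
    bento_model = bentoml.mlflow.import_model("iris", model_uri="./iris_model")
    print(f"Model imported: {bento_model.tag}")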
2 changes: 1 addition & 1 deletion docs/source/use-cases/custom-models/xgboost.rst
@@ -49,7 +49,7 @@ This example uses the ``scikit-learn`` framework to load and preprocess the `bre
     # Train the model
     model = xgb.train(param, dt)

-After training, use the ``bentoml.xgboost.save_model`` API to save the model to the BentoML :doc:`/guides/model-store`, a local directory to store and manage models. You can retrieve this model later in other services to run predictions.
+After training, use the ``bentoml.xgboost.save_model`` API to save the model to the BentoML :doc:`/guides/model-loading-and-management`, a local directory to store and manage models. You can retrieve this model later in other services to run predictions.

 .. code-block:: bash
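
Likewise, a minimal sketch of the ``bentoml.xgboost.save_model`` call described above (the model tag is illustrative; ``model`` is the trained Booster from the snippet):

    import bentoml

    # Save the trained XGBoost model to the local Model Store
    bento_model = bentoml.xgboost.save_model("cancer", model)
    print(f"Model saved: {bento_model.tag}")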
