
docs: Add model loading doc #5049

Merged · 2 commits · Oct 30, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -159,7 +159,7 @@ Check out the [examples](./examples/) folder for more sample code and usage.
- [GPU inference](https://docs.bentoml.com/en/latest/guides/gpu-inference.html)
- [Distributed serving systems](https://docs.bentoml.com/en/latest/guides/distributed-services.html)
- [Concurrency and autoscaling](https://docs.bentoml.com/en/latest/bentocloud/how-tos/autoscaling.html)
-- [Model packaging and Model Store](https://docs.bentoml.com/en/latest/guides/model-store.html)
+- [Model loading and Model Store](https://docs.bentoml.com/en/latest/guides/model-loading-and-management.html)
- [Observability](https://docs.bentoml.com/en/latest/guides/observability/index.html)
- [BentoCloud deployment](https://docs.bentoml.com/en/latest/guides/deployment.html)

2 changes: 1 addition & 1 deletion docs/source/get-started/introduction.rst
@@ -56,7 +56,7 @@ The following is the basic workflow of using the BentoML framework.
1. Model registration
^^^^^^^^^^^^^^^^^^^^^

-To get started, you can save your model in the BentoML :doc:`/guides/model-store`, a centralized repository for managing all local models. BentoML is compatible with a variety of models, including pre-trained models from Hugging Face or custom models trained on your custom datasets. The Model Store simplifies the process of iterating and evaluating different model versions, providing an efficient way to track and manage your ML assets.
+To get started, you can save your model in the BentoML :doc:`/guides/model-loading-and-management`, a centralized repository for managing all local models. BentoML is compatible with a variety of models, including pre-trained models from Hugging Face or custom models trained on your own datasets. The Model Store simplifies the process of iterating and evaluating different model versions, providing an efficient way to track and manage your ML assets.

Note that for simple use cases, you can **skip this step** and use pre-trained models directly when creating your BentoML Service.

2 changes: 1 addition & 1 deletion docs/source/guides/build-options.rst
@@ -132,7 +132,7 @@ Alternatively, create a ``.bentoignore`` file in the ``build_ctx`` directory as
``models``
^^^^^^^^^^

-You can specify the model to be used for building a Bento using a string model tag or a dictionary. When you start from an existing project, you can download models from BentoCloud to your local :doc:`/guides/model-store` with the ``models`` configurations by running ``bentoml models pull``.
+You can specify the model to be used for building a Bento using a string model tag or a dictionary. When you start from an existing project, you can download models from BentoCloud to your local :doc:`/guides/model-loading-and-management` with the ``models`` configurations by running ``bentoml models pull``.

See the following example for details. If you don't define models in ``bentofile.yaml``, the model specified in the Service is used to build the Bento.

8 changes: 4 additions & 4 deletions docs/source/guides/index.rst
@@ -45,11 +45,11 @@ This chapter introduces the key features of BentoML. We recommend you read :doc:

Customize the build configurations of a Bento.

-.. grid-item-card:: :doc:`/guides/model-store`
-    :link: /guides/model-store
+.. grid-item-card:: :doc:`/guides/model-loading-and-management`
+    :link: /guides/model-loading-and-management
:link-type: doc

-    Use the BentoML local Model Store to manage your models in a unified way.
+    Load AI models and manage them in a unified way.

.. grid-item-card:: :doc:`/guides/tasks`
:link: /guides/tasks
@@ -138,7 +138,7 @@ This chapter introduces the key features of BentoML. We recommend you read :doc:
containerization
workers
build-options
-   model-store
+   model-loading-and-management
tasks
gpu-inference
model-composition
@@ -1,17 +1,16 @@
-===========
-Model Store
-===========
+============================
+Model loading and management
+============================

-BentoML provides a local Model Store to save and manage models, which is essentially a local file directory maintained by BentoML. This document explains how to use the BentoML Model Store.
+BentoML offers simple APIs for you to load, store, and manage AI models.

-When should you use the Model Store?
-------------------------------------
+Understand the Model Store
+--------------------------

-While it's straightforward to download and use pre-trained models from public model hubs like Hugging Face directly within a ``service.py`` file for simple use cases, more complex scenarios often require a more organized approach to model management. We recommend you use the BentoML Model Store in the following scenarios:
+BentoML provides a local Model Store to save and manage models, which is essentially a local file directory maintained by BentoML. It is useful in several scenarios, including:

-- **Private model management**: If you are working with private models that have been fine-tuned or trained from scratch for specific tasks, using BentoML's Model Store offers a secure and efficient way to store, version, and access these models across your projects.
-- **Model cataloging**: BentoML's Model Store facilitates easy cataloging and versioning of models, enabling you to maintain a clear record of model iterations and switch between different model versions as required.
-- **Model downloading acceleration in BentoCloud**: For deployment on BentoCloud, the Model Store improves the cold start time of model downloading. BentoCloud caches models to expedite their availability and supports streaming loading of models directly to GPU memory.
+- **Private model management**: For private models fine-tuned or trained for specific tasks, using BentoML's Model Store offers a secure and efficient way to store, version, and access them.
+- **Model cataloging**: BentoML's Model Store facilitates easy cataloging and versioning of models, enabling you to maintain a clear record of model iterations and switch between versions as required.

Save a model
------------
@@ -51,49 +50,65 @@ If you have an existing model on disk, you can import it into the BentoML Model
shutil.copytree(local_model_dir, model_ref.path, dirs_exist_ok=True)
print(f"Model saved: {model_ref}")

-Retrieve a model
-----------------
+Load a model
+------------

-To retrieve a model from the BentoML Model Store, use the ``get`` method.
+BentoML provides an efficient mechanism for loading AI models to accelerate model deployment, reducing image build time and cold start time.

-.. code-block:: python
+.. tab-set::

-    import bentoml
-    bento_model: bentoml.Model = bentoml.models.get("summarization-model:latest")
+    .. tab-item:: From Hugging Face

-    # Print related attributes of the model object.
-    print(bento_model.tag)
-    print(bento_model.path)
+        To load a model from Hugging Face (HF), instantiate a ``HuggingFaceModel`` class from ``bentoml.models`` and specify the model ID as shown on HF. For a gated Hugging Face model, remember to export your `Hugging Face API token <https://huggingface.co/docs/hub/en/security-tokens>`_ as an environment variable before loading the model.

-``bentoml.models.get`` returns a ``bentoml.Model`` instance, linking to a saved model entry in the BentoML Model Store. You can then use the instance to get model information like tag, labels, and file system paths, or create a :doc:`Service </guides/services>` on top of it.
+        Here is an example:

-For example, you can load the model into a Transformers pipeline from the ``path`` provided by the ``bentoml.Model`` instance as below. See more in :doc:`/get-started/quickstart`.
+        .. code-block:: python

-.. code-block:: python
+            import bentoml
+            from bentoml.models import HuggingFaceModel
+            from transformers import AutoModelForSequenceClassification, AutoTokenizer

-    import bentoml
-    from transformers import pipeline
+            @bentoml.service(resources={"cpu": "200m", "memory": "512Mi"})
+            class MyService:
+                # Specify a model from HF with its ID
+                model_ref = HuggingFaceModel("google-bert/bert-base-uncased")

+                def __init__(self):
+                    # Load the actual model and tokenizer within the instance context
+                    self.model = AutoModelForSequenceClassification.from_pretrained(self.model_ref)
+                    self.tokenizer = AutoTokenizer.from_pretrained(self.model_ref)

+        If you deploy the HF model to BentoCloud, you can view and verify it within your Bento on the details page. It is indicated with the HF icon. Clicking it redirects you to the model page on HF.

+        .. image:: ../../_static/img/guides/model-loading-and-management/hf-model-on-bentocloud.png
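
Assembled from the added lines above, a self-contained sketch of the Hugging Face flow; the ``classify`` endpoint is a hypothetical addition for illustration, not part of this diff:

.. code-block:: python

    import bentoml
    import torch
    from bentoml.models import HuggingFaceModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    @bentoml.service(resources={"cpu": "200m", "memory": "512Mi"})
    class MyService:
        # Model ID on Hugging Face, as in the diff above
        model_ref = HuggingFaceModel("google-bert/bert-base-uncased")

        def __init__(self):
            self.model = AutoModelForSequenceClassification.from_pretrained(self.model_ref)
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_ref)

        @bentoml.api
        def classify(self, text: str) -> int:
            # Tokenize, run the model, and return the highest-scoring class index
            inputs = self.tokenizer(text, return_tensors="pt")
            with torch.no_grad():
                logits = self.model(**inputs).logits
            return int(logits.argmax(dim=-1).item())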

+    .. tab-item:: From the Model Store or BentoCloud

+        To load a model from the local Model Store or BentoCloud, instantiate a ``BentoModel`` from ``bentoml.models`` and specify its model tag. Make sure the model is stored locally or available in BentoCloud.

-    @bentoml.service
-    class Summarization:
-        # Define the model as a class variable
-        model_ref = bentoml.models.get("summarization-model")
+        Here is an example:

-        def __init__(self) -> None:
-            # Load model into pipeline
-            self.pipeline = pipeline('summarization', self.model_ref.path)
+        .. code-block:: python

-        @bentoml.api
-        def summarize(self, text: str = EXAMPLE_INPUT) -> str:
-            ...
+            import bentoml
+            from bentoml.models import BentoModel
+            import joblib

+            @bentoml.service(resources={"cpu": "200m", "memory": "512Mi"})
+            class MyService:
+                # Define model reference at the class level
+                # Load a model from the Model Store or BentoCloud
+                iris_ref = BentoModel("iris_sklearn:latest")

-Models must be retrieved from the class scope of a Service. Defining the model as a class variable declares it as a dependency of the Service, ensuring the models are referenced by the Bento when transported and deployed.
+                def __init__(self):
+                    self.iris_model = joblib.load(self.iris_ref.path_of("model.pkl"))
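
Continuing that example, a hypothetical endpoint (not part of this diff) that serves predictions from the joblib-loaded model:

.. code-block:: python

    import numpy as np

    # Added inside the MyService class shown above
    @bentoml.api
    def predict(self, input_data: list[list[float]]) -> list[int]:
        # Run the scikit-learn model on a batch of feature rows
        return self.iris_model.predict(np.asarray(input_data)).tolist()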

-.. warning::
+.. important::

-    If ``bentoml.models.get()`` is called inside the constructor of a Service class, the model will not be referenced by the Bento therefore not pushed or deployed, leading to model ``NotFound`` in BentoML store error.
+    When using ``HuggingFaceModel`` and ``BentoModel``, you must load the model from the class scope of a Service. Defining the model as a class variable declares it as a dependency of the Service, ensuring the models are referenced by the Bento when transported and deployed. If you call these two APIs within the constructor of a Service class, the model will not be referenced by the Bento. As a result, it will not be pushed or deployed, leading to a model ``NotFound`` error.

+For more information, see :doc:`/reference/stores`.
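
The rule in the note above can be made concrete; a sketch contrasting the failing and working patterns (class names are illustrative):

.. code-block:: python

    import bentoml
    from bentoml.models import BentoModel

    @bentoml.service
    class Broken:
        def __init__(self):
            # Anti-pattern: resolved only at runtime, so the model is never
            # recorded as a Bento dependency and is missing after deployment
            self.ref = BentoModel("iris_sklearn:latest")

    @bentoml.service
    class Working:
        # Class scope: declared as a dependency when the Bento is built
        ref = BentoModel("iris_sklearn:latest")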

Manage models
-------------
2 changes: 1 addition & 1 deletion docs/source/guides/services.rst
@@ -35,7 +35,7 @@ Methods within the class which are defined as accessible HTTP API endpoints are

.. note::

-    This Service downloads a pre-trained model from Hugging Face. It is possible to use your own model within the Service class. For more information, see :doc:`/guides/model-store`.
+    This Service downloads a pre-trained model from Hugging Face. It is possible to use your own model within the Service class. For more information, see :doc:`/guides/model-loading-and-management`.

Test the Service code
---------------------
13 changes: 13 additions & 0 deletions docs/source/reference/stores.rst
@@ -15,6 +15,19 @@ Manage Bentos
.. autofunction:: bentoml.build
.. autofunction:: bentoml.bentos.build_bentofile

+Load models
+-----------

+.. autoclass:: bentoml.models.BentoModel
+    :members: to_info, from_info, resolve
+    :undoc-members:
+    :show-inheritance:

+.. autoclass:: bentoml.models.HuggingFaceModel
+    :members: to_info, from_info, resolve
+    :undoc-members:
+    :show-inheritance:

Manage models
-------------

2 changes: 1 addition & 1 deletion docs/source/use-cases/custom-models/mlflow.rst
@@ -41,7 +41,7 @@ This example uses the ``scikit-learn`` framework to train a classification model
model.fit(X_train, Y_train)
mlflow.sklearn.save_model(model, model_uri.resolve())

-Next, use the ``bentoml.mlflow.import_model`` API to save the model to the BentoML :doc:`/guides/model-store`, a local directory to store and manage models. You can retrieve this model later in other services to run predictions.
+Next, use the ``bentoml.mlflow.import_model`` API to save the model to the BentoML :doc:`/guides/model-loading-and-management`, a local directory to store and manage models. You can retrieve this model later in other services to run predictions.

.. code-block:: bash
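
The body of this code block sits outside the hunk; as a sketch, the import step presumably resembles the following (the model name is illustrative, and ``model_uri`` comes from the training snippet above):

.. code-block:: python

    import bentoml

    # Register the saved MLflow model in the BentoML Model Store
    bento_model = bentoml.mlflow.import_model(
        "iris",  # illustrative name for the Model Store entry
        model_uri=str(model_uri.resolve()),
    )
    print(f"Model imported: {bento_model}")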

2 changes: 1 addition & 1 deletion docs/source/use-cases/custom-models/xgboost.rst
@@ -49,7 +49,7 @@ This example uses the ``scikit-learn`` framework to load and preprocess the `breast
# Train the model
model = xgb.train(param, dt)

-After training, use the ``bentoml.xgboost.save_model`` API to save the model to the BentoML :doc:`/guides/model-store`, a local directory to store and manage models. You can retrieve this model later in other services to run predictions.
+After training, use the ``bentoml.xgboost.save_model`` API to save the model to the BentoML :doc:`/guides/model-loading-and-management`, a local directory to store and manage models. You can retrieve this model later in other services to run predictions.

.. code-block:: bash
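
The block's body is collapsed beyond the hunk; a sketch of the save call, assuming the trained booster ``model`` from the snippet above and an illustrative name:

.. code-block:: python

    import bentoml

    # Persist the trained XGBoost booster in the BentoML Model Store
    bento_model = bentoml.xgboost.save_model("breast_cancer_xgb", model)
    print(f"Model saved: {bento_model}")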
