From c7aacd61b7e20b183f3d2c41e61f6d07ecdaf324 Mon Sep 17 00:00:00 2001 From: Sherlock113 Date: Tue, 8 Oct 2024 10:41:50 +0800 Subject: [PATCH 1/2] Clean up examples Signed-off-by: Sherlock113 --- docs/source/index.rst | 6 - docs/source/use-cases/audio/index.rst | 30 --- docs/source/use-cases/audio/whisperx.rst | 178 -------------- docs/source/use-cases/audio/xtts.rst | 177 -------------- .../use-cases/diffusion-models/index.rst | 14 -- .../diffusion-models/sdxl-lcm-lora.rst | 186 --------------- .../source/use-cases/diffusion-models/svd.rst | 196 ---------------- .../use-cases/embeddings/clip-embeddings.rst | 222 ------------------ docs/source/use-cases/embeddings/index.rst | 31 --- .../embeddings/sentence-transformer.rst | 196 ---------------- docs/source/use-cases/index.rst | 28 +-- docs/source/use-cases/more-examples/index.rst | 23 ++ .../more-examples/inference-apis.rst | 25 ++ docs/source/use-cases/multimodality/blip.rst | 155 ------------ docs/source/use-cases/multimodality/index.rst | 23 -- 15 files changed, 55 insertions(+), 1435 deletions(-) delete mode 100644 docs/source/use-cases/audio/index.rst delete mode 100644 docs/source/use-cases/audio/whisperx.rst delete mode 100644 docs/source/use-cases/audio/xtts.rst delete mode 100644 docs/source/use-cases/diffusion-models/sdxl-lcm-lora.rst delete mode 100644 docs/source/use-cases/diffusion-models/svd.rst delete mode 100644 docs/source/use-cases/embeddings/clip-embeddings.rst delete mode 100644 docs/source/use-cases/embeddings/index.rst delete mode 100644 docs/source/use-cases/embeddings/sentence-transformer.rst create mode 100644 docs/source/use-cases/more-examples/index.rst create mode 100644 docs/source/use-cases/more-examples/inference-apis.rst delete mode 100644 docs/source/use-cases/multimodality/blip.rst delete mode 100644 docs/source/use-cases/multimodality/index.rst diff --git a/docs/source/index.rst b/docs/source/index.rst index 3934b188861..b6ed170f84b 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -46,12 +46,6 @@ Featured examples Deploy an image generation application capable of creating high-quality visuals with just a single inference step. - .. grid-item-card:: :doc:`/use-cases/audio/whisperx` - :link: /use-cases/audio/whisperx - :link-type: doc - - Deploy a speech recognition application. - Start your BentoML journey -------------------------- diff --git a/docs/source/use-cases/audio/index.rst b/docs/source/use-cases/audio/index.rst deleted file mode 100644 index 909001fc2ad..00000000000 --- a/docs/source/use-cases/audio/index.rst +++ /dev/null @@ -1,30 +0,0 @@ -===== -Audio -===== - -This section provides example projects for audio ML tasks. - -.. grid:: 1 2 2 2 - :gutter: 3 - :margin: 0 - :padding: 3 4 0 0 - - .. grid-item-card:: :doc:`/use-cases/audio/whisperx` - :link: /use-cases/audio/whisperx - :link-type: doc - - Deploy a speech recognition application with BentoML. - - .. grid-item-card:: :doc:`/use-cases/audio/xtts` - :link: /use-cases/audio/xtts - :link-type: doc - - Deploy a text-to-speech application with BentoML. - -.. toctree:: - :maxdepth: 1 - :titlesonly: - :hidden: - - whisperx - xtts diff --git a/docs/source/use-cases/audio/whisperx.rst b/docs/source/use-cases/audio/whisperx.rst deleted file mode 100644 index 6fde37410a3..00000000000 --- a/docs/source/use-cases/audio/whisperx.rst +++ /dev/null @@ -1,178 +0,0 @@ -============================ -WhisperX: Speech recognition -============================ - -Speech recognition involves the translation of spoken words into text. 
It is widely used in AI scenarios like virtual assistants, voice-controlled devices, and automated transcription services. - -This document demonstrates how to create a speech recognition application with BentoML. It is inspired by the `WhisperX `_ project. - -All the source code in this tutorial is available in the `BentoWhisperX GitHub repository `_. - -Prerequisites -------------- - -- Python 3.9+ and ``pip`` installed. See the `Python downloads page `_ to learn more. -- You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read :doc:`/get-started/quickstart` first. -- If you want to test this project locally, install FFmpeg on your system. -- Gain access to the model used in this project: `pyannote/segmentation-3.0 `_. -- (Optional) We recommend you create a virtual environment for dependency isolation. See the `Conda documentation `_ or the `Python documentation `_ for details. - -Install dependencies --------------------- - -Clone the project repository and install all the dependencies. - -.. code-block:: bash - - git clone https://github.com/bentoml/BentoWhisperX.git - cd BentoWhisperX - pip install -r requirements.txt - -Create a BentoML Service ------------------------- - -Create a :doc:`BentoML Service ` to define the serving logic of this project. Here is an example file in the project: - -.. code-block:: python - :caption: `service.py` - - import bentoml - import os - import typing as t - - from pathlib import Path - - LANGUAGE_CODE = "en" - - - @bentoml.service( - traffic={ - "timeout": 30, - "concurrency": 1, - }, - resources={ - "gpu": 1, - "gpu_type": "nvidia_tesla_t4", - }, - ) - class WhisperX: - """ - This class is inspired by the implementation shown in the whisperX project. - Source: https://github.com/m-bain/whisperX - """ - - def __init__(self): - import torch - import whisperx - - self.batch_size = 16 # reduce if low on GPU mem - self.device = "cuda" if torch.cuda.is_available() else "cpu" - compute_type = "float16" if torch.cuda.is_available() else "int8" - self.model = whisperx.load_model("large-v2", self.device, compute_type=compute_type, language=LANGUAGE_CODE) - self.model_a, self.metadata = whisperx.load_align_model(language_code=LANGUAGE_CODE, device=self.device) - - @bentoml.api - def transcribe(self, audio_file: Path) -> t.Dict: - import whisperx - - audio = whisperx.load_audio(audio_file) - result = self.model.transcribe(audio, batch_size=self.batch_size) - result = whisperx.align(result["segments"], self.model_a, self.metadata, audio, self.device, return_char_alignments=False) - - return result - -A breakdown of the Service code: - -* The ``@bentoml.service`` decorator is used to define the ``WhisperX`` class as a BentoML Service, specifying additional configurations like timeout and resource allocations (GPU and memory). -* During initialization, this Service does the following: - - - Loads the Whisper model with a specific language code, device, and compute type. It runs on either a GPU or CPU based on availability. - - Loads an alignment model and metadata for the specified language. - -* The Service exposes a ``transcribe`` API endpoint: Takes an audio file path as input, uses the Whisper model to transcribe the audio, and aligns the transcription with the audio using the alignment model and metadata. The transcription result is returned as a dictionary. - -Run ``bentoml serve`` to start the Service. - -.. 
code-block:: bash - - $ bentoml serve service:WhisperX - - 2024-01-22T02:29:10+0000 [WARNING] [cli] Converting 'WhisperX' to lowercase: 'whisperx'. - 2024-01-22T02:29:11+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:BentoWhisperX" listening on http://localhost:3000 (Press CTRL+C to quit) - -The server is active at `http://localhost:3000 `_. You can interact with it in different ways. - -.. tab-set:: - - .. tab-item:: CURL - - .. code-block:: bash - - curl -X 'POST' \ - 'http://localhost:3000/transcribe' \ - -H 'accept: application/json' \ - -H 'Content-Type: multipart/form-data' \ - -F 'audio_file=@female.wav;type=audio/wav' - - .. tab-item:: Python client - - You can either include an URL or a local path to your audio file in the BentoML :doc:`client `. - - .. code-block:: python - - from pathlib import Path - import bentoml - - with bentoml.SyncHTTPClient('http://localhost:3000') as client: - audio_url = 'https://example.org/female.wav' - response = client.transcribe(audio_file=audio_url) - print(response) - - .. tab-item:: Swagger UI - - Visit `http://localhost:3000 `_, scroll down to **Service APIs**, and select an audio file for interaction. - - .. image:: ../../_static/img/use-cases/audio/whisperx/service-ui.png - -Expected output: - -.. code-block:: bash - - {"segments":[{"start":0.009,"end":2.813,"text":" The Hispaniola was rolling scuppers under in the ocean swell.","words":[{"word":"The","start":0.009,"end":0.069,"score":0.0},{"word":"Hispaniola","start":0.109,"end":0.81,"score":0.917},{"word":"was","start":0.83,"end":0.95,"score":0.501},{"word":"rolling","start":0.99,"end":1.251,"score":0.839},{"word":"scuppers","start":1.311,"end":1.671,"score":0.947},{"word":"under","start":1.751,"end":1.932,"score":0.939},{"word":"in","start":1.952,"end":2.012,"score":0.746},{"word":"the","start":2.032,"end":2.132,"score":0.667},{"word":"ocean","start":2.212,"end":2.472,"score":0.783},{"word":"swell.","start":2.512,"end":2.813,"score":0.865}]},{"start":3.494,"end":10.263,"text":"The booms were tearing at the blocks, the rudder was banging to and fro, and the whole ship creaking, groaning, and jumping like a 
manufactory.","words":[{"word":"The","start":3.494,"end":3.594,"score":0.752},{"word":"booms","start":3.614,"end":3.914,"score":0.867},{"word":"were","start":3.934,"end":4.054,"score":0.778},{"word":"tearing","start":4.074,"end":4.315,"score":0.808},{"word":"at","start":4.335,"end":4.395,"score":0.748},{"word":"the","start":4.415,"end":4.475,"score":0.993},{"word":"blocks,","start":4.495,"end":4.855,"score":0.918},{"word":"the","start":5.236,"end":5.316,"score":0.859},{"word":"rudder","start":5.356,"end":5.576,"score":0.894},{"word":"was","start":5.596,"end":5.717,"score":0.711},{"word":"banging","start":5.757,"end":6.117,"score":0.767},{"word":"to","start":6.177,"end":6.317,"score":0.781},{"word":"and","start":6.377,"end":6.458,"score":0.833},{"word":"fro,","start":6.498,"end":6.758,"score":0.657},{"word":"and","start":7.058,"end":7.159,"score":0.759},{"word":"the","start":7.179,"end":7.259,"score":0.833},{"word":"whole","start":7.299,"end":7.479,"score":0.807},{"word":"ship","start":7.539,"end":7.759,"score":0.79},{"word":"creaking,","start":7.859,"end":8.26,"score":0.774},{"word":"groaning,","start":8.44,"end":8.821,"score":0.75},{"word":"and","start":8.861,"end":8.941,"score":0.837},{"word":"jumping","start":8.981,"end":9.321,"score":0.859},{"word":"like","start":9.382,"end":9.502,"score":0.876},{"word":"a","start":9.542,"end":9.582,"score":0.5},{"word":"manufactory.","start":9.622,"end":10.263,"score":0.886}]}],"word_segments":[{"word":"The","start":0.009,"end":0.069,"score":0.0},{"word":"Hispaniola","start":0.109,"end":0.81,"score":0.917},{"word":"was","start":0.83,"end":0.95,"score":0.501},{"word":"rolling","start":0.99,"end":1.251,"score":0.839},{"word":"scuppers","start":1.311,"end":1.671,"score":0.947},{"word":"under","start":1.751,"end":1.932,"score":0.939},{"word":"in","start":1.952,"end":2.012,"score":0.746},{"word":"the","start":2.032,"end":2.132,"score":0.667},{"word":"ocean","start":2.212,"end":2.472,"score":0.783},{"word":"swell.","start":2.512,"end":2.813,"score":0.865},{"word":"The","start":3.494,"end":3.594,"score":0.752},{"word":"booms","start":3.614,"end":3.914,"score":0.867},{"word":"were","start":3.934,"end":4.054,"score":0.778},{"word":"tearing","start":4.074,"end":4.315,"score":0.808},{"word":"at","start":4.335,"end":4.395,"score":0.748},{"word":"the","start":4.415,"end":4.475,"score":0.993},{"word":"blocks,","start":4.495,"end":4.855,"score":0.918},{"word":"the","start":5.236,"end":5.316,"score":0.859},{"word":"rudder","start":5.356,"end":5.576,"score":0.894},{"word":"was","start":5.596,"end":5.717,"score":0.711},{"word":"banging","start":5.757,"end":6.117,"score":0.767},{"word":"to","start":6.177,"end":6.317,"score":0.781},{"word":"and","start":6.377,"end":6.458,"score":0.833},{"word":"fro,","start":6.498,"end":6.758,"score":0.657},{"word":"and","start":7.058,"end":7.159,"score":0.759},{"word":"the","start":7.179,"end":7.259,"score":0.833},{"word":"whole","start":7.299,"end":7.479,"score":0.807},{"word":"ship","start":7.539,"end":7.759,"score":0.79},{"word":"creaking,","start":7.859,"end":8.26,"score":0.774},{"word":"groaning,","start":8.44,"end":8.821,"score":0.75},{"word":"and","start":8.861,"end":8.941,"score":0.837},{"word":"jumping","start":8.981,"end":9.321,"score":0.859},{"word":"like","start":9.382,"end":9.502,"score":0.876},{"word":"a","start":9.542,"end":9.582,"score":0.5},{"word":"manufactory.","start":9.622,"end":10.263,"score":0.886}]}% - -Deploy to BentoCloud --------------------- - -After the Service is ready, you can deploy the project to 
BentoCloud for better management and scalability. `Sign up `_ for a BentoCloud account and get $10 in free credits. - -First, specify a configuration YAML file (``bentofile.yaml``) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project directory: - -.. code-block:: yaml - :caption: `bentofile.yaml` - - service: "service:WhisperX" - labels: - owner: bentoml-team - project: gallery - include: - - "*.py" - python: - requirements_txt: "./requirements.txt" - docker: - system_packages: - - ffmpeg - - git - -:ref:`Log in to BentoCloud ` by running ``bentoml cloud login``, then run the following command to deploy the project. - -.. code-block:: bash - - bentoml deploy . - -Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL. - -.. image:: ../../_static/img/use-cases/audio/whisperx/whisperx-bentocloud.png - -.. note:: - - For custom deployment in your own infrastructure, use BentoML to :doc:`generate an OCI-compliant image`. diff --git a/docs/source/use-cases/audio/xtts.rst b/docs/source/use-cases/audio/xtts.rst deleted file mode 100644 index 390f450661f..00000000000 --- a/docs/source/use-cases/audio/xtts.rst +++ /dev/null @@ -1,177 +0,0 @@ -==================== -XTTS: Text to speech -==================== - -Text-to-speech machine learning technology can convert written text into spoken words. This may involve analyzing the text, understanding its structure and meaning, and then generating speech that mimics human voice and intonation. - -This document demonstrates how to build a text-to-speech application using BentoML, powered by the model `XTTS `_. - -All the source code in this tutorial is available in the `BentoXTTS GitHub repository `_. - -Prerequisites -------------- - -- Python 3.9+ and ``pip`` installed. See the `Python downloads page `_ to learn more. -- You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read :doc:`/get-started/quickstart` first. -- (Optional) We recommend you create a virtual environment for dependency isolation. See the `Conda documentation `_ or the `Python documentation `_ for details. - -Install dependencies --------------------- - -Clone the project repository and install all the dependencies. - -.. code-block:: bash - - git clone https://github.com/bentoml/BentoXTTS.git - cd BentoXTTS - pip install -r requirements.txt - -Create a BentoML Service ------------------------- - -Define a :doc:`BentoML Service ` to customize the serving logic of the model. You can find the following example ``service.py`` file in the cloned repository. - -.. 
code-block:: python
    :caption: `service.py`

    from __future__ import annotations

    import os
    import typing as t
    from pathlib import Path

    import torch
    from TTS.api import TTS

    import bentoml

    MODEL_ID = "tts_models/multilingual/multi-dataset/xtts_v2"

    sample_input_data = {
        'text': 'It took me quite a long time to develop a voice and now that I have it I am not going to be silent.',
        'language': 'en',
    }

    @bentoml.service(
        traffic={
            "timeout": 30,
            "concurrency": 1,
        },
        resources={
            "gpu": 1,
            "gpu_type": "nvidia_tesla_t4",
        },
    )
    class XTTS:
        def __init__(self) -> None:
            self.tts = TTS(MODEL_ID, gpu=torch.cuda.is_available())

        @bentoml.api
        def synthesize(
            self,
            context: bentoml.Context,
            text: str = sample_input_data["text"],
            lang: str = sample_input_data["language"],
        ) -> t.Annotated[Path, bentoml.validators.ContentType('audio/*')]:
            output_path = os.path.join(context.temp_dir, "output.wav")
            sample_path = "./female.wav"
            if not os.path.exists(sample_path):
                sample_path = "./src/female.wav"

            self.tts.tts_to_file(
                text,
                file_path=output_path,
                speaker_wav=sample_path,
                language=lang,
                split_sentences=True,
            )
            return Path(output_path)

A breakdown of the Service code:

- ``@bentoml.service`` decorates the class ``XTTS`` to define it as a BentoML Service, configuring its GPU resources and traffic settings (timeout and concurrency).
- In the class, the ``__init__`` method initializes an instance of the ``TTS`` model using the ``MODEL_ID`` specified. It checks if a GPU is available and sets the model to use it if so.
- The ``synthesize`` method is defined as an API endpoint. It takes ``context``, ``text``, and ``lang`` as parameters, with defaults provided for ``text`` and ``lang`` in ``sample_input_data``. This method generates an audio file from the provided text and language, using the TTS model. It creates an output file path in the temporary directory (``temp_dir``). A sample WAV file path (``sample_path``) is used for the TTS process.
- The Service calls ``tts.tts_to_file`` to generate the audio file (``output.wav``) based on the provided text and language.

Run ``bentoml serve`` in your project directory to start the Service. Set the environment variable ``COQUI_TOS_AGREED=1`` to agree to the Coqui TTS terms of service.

.. code-block:: bash

    $ COQUI_TOS_AGREED=1 bentoml serve .

    2024-01-30T10:06:43+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:XTTS" listening on http://localhost:3000 (Press CTRL+C to quit)

The server is active at `http://localhost:3000 `_. You can interact with it in different ways.

.. tab-set::

    .. tab-item:: CURL

        .. code-block:: bash

            curl -X 'POST' \
              'http://localhost:3000/synthesize' \
              -H 'accept: */*' \
              -H 'Content-Type: application/json' \
              -d '{
              "text": "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
              "lang": "en"
            }'

    .. tab-item:: Python client

        This client returns the audio file as a ``Path`` object. You can use it to access or process the file. See :doc:`/guides/clients` for details.

        .. code-block:: python

            import bentoml

            with bentoml.SyncHTTPClient("http://localhost:3000") as client:
                result = client.synthesize(
                    text="It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
                    lang="en"
                )

    .. tab-item:: Swagger UI

        Visit `http://localhost:3000 `_, scroll down to **Service APIs**, and click **Try it out**.
In the **Request body** box, enter your prompt and click **Execute**. - - .. image:: ../../_static/img/use-cases/audio/xtts/service-ui.png - -Deploy to BentoCloud --------------------- - -After the Service is ready, you can deploy the project to BentoCloud for better management and scalability. `Sign up `_ for a BentoCloud account and get $10 in free credits. - -First, specify a configuration YAML file (``bentofile.yaml``) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project: - -.. code-block:: yaml - :caption: `bentofile.yaml` - - service: "service:XTTS" - labels: - owner: bentoml-team - project: gallery - include: - - "*.py" - - "female.wav" - python: - requirements_txt: requirements.txt - envs: - - name: "COQUI_TOS_AGREED" - value: 1 - -:ref:`Log in to BentoCloud ` by running ``bentoml cloud login``, then run the following command to deploy the project. - -.. code-block:: bash - - bentoml deploy . - -Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL. - -.. image:: ../../_static/img/use-cases/audio/xtts/xtts-bentocloud.png - -.. note:: - - For custom deployment in your own infrastructure, use BentoML to :doc:`generate an OCI-compliant image`. diff --git a/docs/source/use-cases/diffusion-models/index.rst b/docs/source/use-cases/diffusion-models/index.rst index 11b5ad9f571..507a8bb1b48 100644 --- a/docs/source/use-cases/diffusion-models/index.rst +++ b/docs/source/use-cases/diffusion-models/index.rst @@ -15,18 +15,6 @@ This section provides example projects for image and video generation use cases. Deploy an image generation server with Stable Diffusion XL Turbo and BentoML. - .. grid-item-card:: :doc:`/use-cases/diffusion-models/sdxl-lcm-lora` - :link: /use-cases/diffusion-models/sdxl-lcm-lora - :link-type: doc - - Deploy an image generation server with Stable Diffusion XL and Latent Consistency Model (LCM) LoRAs. - - .. grid-item-card:: :doc:`/use-cases/diffusion-models/svd` - :link: /use-cases/diffusion-models/svd - :link-type: doc - - Deploy a video generation server with Stable Video Diffusion and BentoML. - .. grid-item-card:: :doc:`/use-cases/diffusion-models/controlnet` :link: /use-cases/diffusion-models/controlnet :link-type: doc @@ -39,6 +27,4 @@ This section provides example projects for image and video generation use cases. :hidden: sdxl-turbo - sdxl-lcm-lora - svd controlnet diff --git a/docs/source/use-cases/diffusion-models/sdxl-lcm-lora.rst b/docs/source/use-cases/diffusion-models/sdxl-lcm-lora.rst deleted file mode 100644 index a86f91eec18..00000000000 --- a/docs/source/use-cases/diffusion-models/sdxl-lcm-lora.rst +++ /dev/null @@ -1,186 +0,0 @@ -================================== -Stable Diffusion XL with LCM LoRAs -================================== - -`Latent Consistency Models (LCM) `_ offer a new approach to enhancing the efficiency of the image generation workflow, particularly when applied to models like Stable Diffusion (SD) and Stable Diffusion XL (SDXL). To deliver high-quality inference outcomes within a significantly reduced computational timeframe within just 2 to 8 steps, `LCM LoRA `_ is proposed as a universal acceleration module for SD-based models. - -This document explains how to deploy `SDXL `_ with `LCM LoRA weights `_ using BentoML. - -All the source code in this tutorial is available in the `BentoDiffusion GitHub repository `_. - -Prerequisites -------------- - -- Python 3.9+ and ``pip`` installed. 
See the `Python downloads page `_ to learn more. -- You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read :doc:`/get-started/quickstart` first. -- To run this BentoML Service locally, you need a Nvidia GPU with at least 12G VRAM. -- (Optional) We recommend you create a virtual environment for dependency isolation. See the `Conda documentation `_ or the `Python documentation `_ for details. - -Install dependencies --------------------- - -Clone the project repository and install all the dependencies. - -.. code-block:: bash - - git clone https://github.com/bentoml/BentoDiffusion.git - cd BentoDiffusion/lcm - pip install -r requirements.txt - -Create a BentoML Service ------------------------- - -Create a BentoML :doc:`Service ` in a ``service.py`` file to wrap the capabilities of the SDXL model with LCM LoRA weights. You can use this example file in the cloned project: - -.. code-block:: python - :caption: `service.py` - - import bentoml - from PIL.Image import Image - - model_id = "stabilityai/stable-diffusion-xl-base-1.0" - lcm_lora_id = "latent-consistency/lcm-lora-sdxl" - - sample_prompt = "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux" - - @bentoml.service( - traffic={ - "timeout": 300, - "external_queue": True, - "concurrency": 1, - }, - workers=1, - resources={ - "gpu": 1, - "gpu_type": "nvidia-l4", - }, - ) - class LatentConsistency: - def __init__(self) -> None: - from diffusers import DiffusionPipeline, LCMScheduler - import torch - - self.lcm_txt2img = DiffusionPipeline.from_pretrained( - model_id, - torch_dtype=torch.float16, - variant="fp16", - ) - self.lcm_txt2img.load_lora_weights(lcm_lora_id) - self.lcm_txt2img.scheduler = LCMScheduler.from_config(self.lcm_txt2img.scheduler.config) - self.lcm_txt2img.to(device="cuda", dtype=torch.float16) - - @bentoml.api - def txt2img( - self, - prompt: str = sample_prompt, - num_inference_steps: int = 4, - guidance_scale: float = 1.0, - ) -> Image: - image = self.lcm_txt2img( - prompt=prompt, - num_inference_steps=num_inference_steps, - guidance_scale=guidance_scale, - ).images[0] - return image - -A breakdown of the Service code: - -* Uses the ``@bentoml.service`` decorator to define a Service called ``LatentConsistency``. It includes service-specific :doc:`configurations ` such as timeout settings, the number of workers, and resources (in this example, GPU requirements on BentoCloud). -* Loads and configures the SDXL model, LoRA weights, and the LCM scheduler during initialization. The model is moved to a GPU device for efficient computation. -* Exposes the ``txt2img`` method as a web API endpoint, making it callable via HTTP requests. It accepts a text prompt, the number of inference steps, and a guidance scale as inputs, all of which provide default values. These parameters control the image generation process: - - - ``prompt``: The textual description based on which an image will be generated. - - ``num_inference_steps``: The number of steps the model takes to refine the generated image. A higher number can lead to more detailed images but requires more computation. Using 4 to 6 steps for this example should be sufficient. See this `Hugging Face blog post `_ to learn the difference among images created using different steps. - - ``guidance_scale``: A factor that influences how closely the generated image should adhere to the input prompt. A higher value may affect the creativity of the result. 
- -Run ``bentoml serve`` to start the BentoML server. - -.. code-block:: bash - - $ bentoml serve service:LatentConsistency - - 2024-02-19T07:20:29+0000 [WARNING] [cli] Converting 'LatentConsistency' to lowercase: 'latentconsistency'. - 2024-02-19T07:20:29+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:LatentConsistency" listening on http://localhost:3000 (Press CTRL+C to quit) - -The server is active at `http://localhost:3000 `_. You can interact with it in different ways. - -.. tab-set:: - - .. tab-item:: CURL - - .. code-block:: bash - - curl -X 'POST' \ - 'http://localhost:3000/txt2img' \ - -H 'accept: image/*' \ - -H 'Content-Type: application/json' \ - --output output.png \ - -d '{ - "prompt": "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux", - "num_inference_steps": 4, - "guidance_scale": 1 - }' - - .. tab-item:: Python client - - The Service returns the image as a ``Path`` object. You can use it to access, read, or process the file. In the following example, the client saves the image to the path ``/path/to/save/image.png``. - - For more information, see :doc:`/guides/clients`. - - .. code-block:: python - - import bentoml - from pathlib import Path - - with bentoml.SyncHTTPClient("http://localhost:3000") as client: - result_path = client.txt2img( - guidance_scale=1, - num_inference_steps=4, - prompt="close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux", - ) - - destination_path = Path("/path/to/save/image.png") - result_path.rename(destination_path) - - .. tab-item:: Swagger UI - - Visit `http://localhost:3000 `_, scroll down to **Service APIs**, specify the parameters, and click **Execute**. - - .. image:: ../../_static/img/use-cases/diffusion-models/sdxl-lcm-lora/service-ui.png - -Expected output: - -.. image:: ../../_static/img/use-cases/diffusion-models/sdxl-lcm-lora/output-image.png - -Deploy to BentoCloud --------------------- - -After the Service is ready, you can deploy the project to BentoCloud for better management and scalability. `Sign up `_ for a BentoCloud account and get $10 in free credits. - -First, specify a configuration YAML file (``bentofile.yaml``) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project: - -.. code-block:: yaml - :caption: `bentofile.yaml` - - service: "service:LatentConsistency" - labels: - owner: bentoml-team - project: gallery - include: - - "*.py" - python: - requirements_txt: "./requirements.txt" - -:ref:`Log in to BentoCloud ` by running ``bentoml cloud login``, then run the following command to deploy the project. - -.. code-block:: bash - - bentoml deploy . - -Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL. - -.. image:: ../../_static/img/use-cases/diffusion-models/sdxl-lcm-lora/sdxl-lcm-bentocloud.png - -.. note:: - - For custom deployment in your own infrastructure, use BentoML to :doc:`generate an OCI-compliant image`. diff --git a/docs/source/use-cases/diffusion-models/svd.rst b/docs/source/use-cases/diffusion-models/svd.rst deleted file mode 100644 index c727c1fd954..00000000000 --- a/docs/source/use-cases/diffusion-models/svd.rst +++ /dev/null @@ -1,196 +0,0 @@ -====================== -Stable Video Diffusion -====================== - -`Stable Video Diffusion (SVD) `_ is a latent diffusion model developed by Stability AI. 
It's designed to generate short video clips from a still image. Specifically, the model can create 25 frames at a resolution of 576x1024 from a context frame of the same size. - -This document demonstrates how to create a video generation server with SVD and BentoML. - -All the source code in this tutorial is available in the `BentoDiffusion GitHub repository `_. - -Prerequisites -------------- - -- Python 3.9+ and ``pip`` installed. See the `Python downloads page `_ to learn more. -- You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read :doc:`/get-started/quickstart` first. -- To run this BentoML Service locally, you need a Nvidia GPU with at least 16G VRAM. -- (Optional) We recommend you create a virtual environment for dependency isolation. See the `Conda documentation `_ or the `Python documentation `_ for details. - -Install dependencies --------------------- - -Clone the project repository and install all the dependencies. - -.. code-block:: bash - - git clone https://github.com/bentoml/BentoDiffusion.git - cd BentoDiffusion/svd - pip install -r requirements.txt - -Create a BentoML Service ------------------------- - -Create a BentoML :doc:`Service ` in a ``service.py`` file to define the serving logic of the model. You can use this example file in the cloned project: - -.. code-block:: python - :caption: `service.py` - - from __future__ import annotations - - import os - import typing as t - from pathlib import Path - from PIL.Image import Image - - import bentoml - - MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt" - - - @bentoml.service( - traffic={ - "timeout": 600, - "external_queue": True, - "concurrency": 1, - }, - resources={ - "gpu": 1, - "gpu_type": "nvidia-l4", - }, - ) - class StableDiffusionVideo: - - def __init__(self) -> None: - import torch - import diffusers - - self.pipe = diffusers.StableVideoDiffusionPipeline.from_pretrained( - MODEL_ID, torch_dtype=torch.float16, variant="fp16" - ) - self.pipe.to("cuda") - - - @bentoml.api - def generate( - self, context: bentoml.Context, - image: Image, - decode_chunk_size: int = 2, - seed: t.Optional[int] = None, - ) -> t.Annotated[Path, bentoml.validators.ContentType("video/*")]: - import torch - from diffusers.utils import load_image, export_to_video - - generator = torch.manual_seed(seed) if seed is not None else None - image = image.resize((1024, 576)) - image = image.convert("RGB") - output_path = os.path.join(context.temp_dir, "output.mp4") - - frames = self.pipe( - image, decode_chunk_size=decode_chunk_size, generator=generator, - ).frames[0] - export_to_video(frames, output_path) - return Path(output_path) - -A breakdown of the Service code: - -- It defines a BentoML Service ``StableDiffusionVideo`` using the ``@bentoml.service`` decorator, with specified GPU requirements for deployment on BentoCloud, and a timeout of 600 seconds. See :doc:`/guides/configurations` for details. -- During initialization, the Service loads the model into the ``StableVideoDiffusionPipeline`` and moves it to GPU for efficient computation. -- It defines an endpoint for video generation using the ``@bentoml.api`` decorator, taking the following parameters: - - - ``image``: A base image for generating video, which will be resized and converted to RGB format for the SVD model. - - ``decode_chunk_size``: The number of frames that are decoded at once. A lower ``decode_chunk_size`` value means reduced memory consumption but may lead to inconsistencies between frames, such as flickering. 
Set this value based on your GPU resources. - - ``seed``: A randomly generated number when not specified. Every time you generate a video with the same seed and input image, you will get the exact same output. This is particularly useful for generating reproducible results. - - ``context``: ``bentoml.Context`` lets you access information about the existing Service context. The ``temp_dir`` property provides a temporary directory to store the generated file. - -- ``export_to_video`` from the ``diffusers`` package converts the frames into a video file stored at ``output_path``. -- The method returns a ``Path`` object pointing to the generated video file. The return type is annotated with a content type validator, indicating that the endpoint returns a video file. - -Run ``bentoml serve`` to start the BentoML server. - -.. code-block:: bash - - $ bentoml serve service:StableDiffusionVideo - - 2024-02-28T01:01:17+0000 [WARNING] [cli] Converting 'StableDiffusionVideo' to lowercase: 'stablediffusionvideo'. - 2024-02-28T01:01:18+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:StableDiffusionVideo" listening on http://localhost:3000 (Press CTRL+C to quit) - -The server is active at `http://localhost:3000 `_. You can interact with it in different ways. - -.. tab-set:: - - .. tab-item:: CURL - - .. code-block:: bash - - curl -X 'POST' \ - 'http://localhost:3000/generate' \ - -H 'accept: video/*' \ - -H 'Content-Type: multipart/form-data' \ - -F 'image=@assets/girl-image.png;type=image/png' \ - -o generated.mp4 \ - -F 'decode_chunk_size=2' \ - -F 'seed=null' - - .. tab-item:: Python client - - This client returns the image as a ``Path`` object. You can use it to access, read, or process the file. See :doc:`/guides/clients` for details. - - .. code-block:: python - - import bentoml - from pathlib import Path - - with bentoml.SyncHTTPClient("http://localhost:3000") as client: - result = client.generate( - decode_chunk_size=2, - image=Path("girl-image.png"), - seed=0, - ) - - .. tab-item:: Swagger UI - - Visit `http://localhost:3000 `_, scroll down to **Service APIs**, click the ``generate`` endpoint, specify the parameters, and click **Execute**. - - .. image:: ../../_static/img/use-cases/diffusion-models/svd/service-ui.png - -Expected output: - -.. image:: ../../_static/img/use-cases/diffusion-models/svd/girl-image-output.gif - -Deploy to BentoCloud --------------------- - -After the Service is ready, you can deploy the project to BentoCloud for better management and scalability. `Sign up `_ for a BentoCloud account and get $10 in free credits. - -First, specify a configuration YAML file (``bentofile.yaml``) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project: - -.. code-block:: yaml - :caption: `bentofile.yaml` - - service: "service:StableDiffusionVideo" - labels: - owner: bentoml-team - project: gallery - include: - - "*.py" - python: - requirements_txt: "./requirements.txt" - docker: - distro: debian - system_packages: - - ffmpeg - - git - -:ref:`Log in to BentoCloud ` by running ``bentoml cloud login``, then run the following command to deploy the project. - -.. code-block:: bash - - bentoml deploy . - -Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL. - -.. image:: ../../_static/img/use-cases/diffusion-models/svd/svd-bentocloud.png - -.. 
note:: - - For custom deployment in your own infrastructure, use BentoML to :doc:`generate an OCI-compliant image`. diff --git a/docs/source/use-cases/embeddings/clip-embeddings.rst b/docs/source/use-cases/embeddings/clip-embeddings.rst deleted file mode 100644 index f08a57076be..00000000000 --- a/docs/source/use-cases/embeddings/clip-embeddings.rst +++ /dev/null @@ -1,222 +0,0 @@ -==== -CLIP -==== - -CLIP (Contrastive Language-Image Pre-training) is a machine learning model developed by OpenAI. It is versatile and excels in tasks like zero-shot learning, image classification, and image-text matching without needing specific training for each task. This makes it ideal for a wide range of applications, including content recommendation, image captioning, visual search, and automated content moderation. - -This document demonstrates how to build a CLIP application using BentoML, powered by the `clip-vit-base-patch32 `_ model. - -All the source code in this tutorial is available in the `BentoClip GitHub repository `_. - -Prerequisites -------------- - -- Python 3.9+ and ``pip`` installed. See the `Python downloads page `_ to learn more. -- You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read :doc:`/get-started/quickstart` first. -- (Optional) We recommend you create a virtual environment for dependency isolation. See the `Conda documentation `_ or the `Python documentation `_ for details. - -Install dependencies --------------------- - -Clone the project repository and install all the dependencies. - -.. code-block:: bash - - git clone https://github.com/bentoml/BentoClip.git - cd BentoClip - pip install -r requirements.txt - -Create a BentoML Service ------------------------- - -Define a :doc:`BentoML Service ` in a ``service.py`` file to wrap the capabilities of the CLIP model, making them accessible and easy to use in a wide range of applications. Here is an example file in the project: - -.. 
code-block:: python
    :caption: `service.py`

    import bentoml
    from PIL.Image import Image
    import numpy as np
    from typing import Dict
    from typing import List
    from pydantic import Field

    MODEL_ID = "openai/clip-vit-base-patch32"

    @bentoml.service(
        resources={
            "cpu": 1,
            "memory": "4Gi"
        }
    )
    class CLIP:

        def __init__(self) -> None:
            import torch
            from transformers import CLIPModel, CLIPProcessor
            self.device = "cuda" if torch.cuda.is_available() else "cpu"
            self.model = CLIPModel.from_pretrained(MODEL_ID).to(self.device)
            self.processor = CLIPProcessor.from_pretrained(MODEL_ID)
            self.logit_scale = self.model.logit_scale.item() if self.model.logit_scale.item() else 4.60517
            print("Model clip loaded", "device:", self.device)

        @bentoml.api(batchable=True)
        async def encode_image(self, items: List[Image]) -> np.ndarray:
            '''
            generate the 512-d embeddings of the images
            '''
            inputs = self.processor(images=items, return_tensors="pt", padding=True).to(self.device)
            image_embeddings = self.model.get_image_features(**inputs)
            return image_embeddings.cpu().detach().numpy()

        @bentoml.api(batchable=True)
        async def encode_text(self, items: List[str]) -> np.ndarray:
            '''
            generate the 512-d embeddings of the texts
            '''
            inputs = self.processor(text=items, return_tensors="pt", padding=True).to(self.device)
            text_embeddings = self.model.get_text_features(**inputs)
            return text_embeddings.cpu().detach().numpy()

        @bentoml.api
        async def rank(self, queries: List[Image], candidates: List[str] = Field(["picture of a dog", "picture of a cat"], description="list of description candidates")) -> Dict[str, List[List[float]]]:
            '''
            return the similarity between the query images and the candidate texts
            '''
            # Encode embeddings
            query_embeds = await self.encode_image(queries)
            candidate_embeds = await self.encode_text(candidates)

            # Compute cosine similarities
            cosine_similarities = self.cosine_similarity(query_embeds, candidate_embeds)
            logit_scale = np.exp(self.logit_scale)
            # Compute softmax scores
            prob_scores = self.softmax(logit_scale * cosine_similarities)
            return {
                "probabilities": prob_scores.tolist(),
                "cosine_similarities": cosine_similarities.tolist(),
            }

        @staticmethod
        def cosine_similarity(query_embeds, candidates_embeds):
            # Normalize each embedding to a unit vector
            query_embeds /= np.linalg.norm(query_embeds, axis=1, keepdims=True)
            candidates_embeds /= np.linalg.norm(candidates_embeds, axis=1, keepdims=True)

            # Compute cosine similarity
            cosine_similarities = np.matmul(query_embeds, candidates_embeds.T)

            return cosine_similarities

        @staticmethod
        def softmax(scores):
            # Compute softmax scores (probabilities)
            exp_scores = np.exp(
                scores - np.max(scores, axis=-1, keepdims=True)
            )  # Subtract max for numerical stability
            return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)

Here is a breakdown of the Service code:

1. The script uses the ``@bentoml.service`` decorator to annotate the ``CLIP`` class as a BentoML Service. You can set more configurations for the Service as needed with the decorator.
2. In the ``__init__`` method, the CLIP model and processor are loaded based on the specified ``MODEL_ID``. The model is transferred to a GPU if available; otherwise, it uses the CPU. The ``logit_scale`` is set to the model's logit scale or a default value if not available.
3. 
The Service defines the following three API endpoints: - - - ``encode_image``: Takes a list of images and generates 512-dimensional embeddings for them. - - ``encode_text``: Takes a list of text strings and generates 512-dimensional embeddings for them. - - ``rank``: Computes the similarity between a list of query images and candidate text descriptions. It uses the embeddings generated by the previous two endpoints to calculate cosine similarities and softmax scores, indicating how closely each text candidate matches each image. - -4. The Service defines the following two static methods: - - - ``cosine_similarity``: Computes the cosine similarity between query embeddings and candidate embeddings. It normalizes each embedding to a unit vector before computing the similarity. - - ``softmax``: Computes softmax scores from the similarity scores, turning them into probabilities. This method includes a numerical stability trick by subtracting the maximum score before exponentiation. - -This Service can be used for the following use cases: - -- **Image and text embedding**: Convert images and text into embeddings, which can then be utilized for various machine learning tasks like clustering and similarity search. -- **Image-text matching**: Find the most relevant text descriptions for a set of images, which is useful in applications like image captioning and content recommendation. - -Run ``bentoml serve`` in your project directory to start the Service. - -.. code-block:: bash - - $ bentoml serve service:CLIP - - 2024-01-08T09:07:28+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:CLIP" listening on http://localhost:3000 (Press CTRL+C to quit) - Model clip loaded device: cuda - -The server is active at `http://localhost:3000 `_. You can interact with it in different ways. - -.. tab-set:: - - .. tab-item:: CURL - - .. code-block:: bash - - curl -s \ - -X POST \ - -F 'items=@image.jpg' \ - http://localhost:3000/encode_image - - .. tab-item:: Python client - - .. code-block:: python - - import bentoml - from pathlib import Path - - with bentoml.SyncHTTPClient("http://localhost:3000") as client: - result = client.encode_image( - items=[ - Path("image.jpg"), - ], - ) - - .. tab-item:: Swagger UI - - Visit `http://localhost:3000 `_, scroll down to **Service APIs**, and select the desired API endpoint for interaction. - - .. image:: ../../_static/img/use-cases/embeddings/clip-embeddings/service-ui.png - -`This is the image `_ sent in the request. Expected output: - -.. code-block:: bash - - [[-0.04361145198345184,0.23694464564323425, - ... - ... - -0.17775200307369232,0.33587712049484253]] - -Deploy to BentoCloud --------------------- - -After the Service is ready, you can deploy the project to BentoCloud for better management and scalability. `Sign up `_ for a BentoCloud account and get $10 in free credits. - -First, specify a configuration YAML file (``bentofile.yaml``) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project: - -.. code-block:: yaml - :caption: `bentofile.yaml` - - service: "service:CLIP" - labels: - owner: bentoml-team - project: gallery - include: - - "*.py" - python: - requirements_txt: "./requirements.txt" - -:ref:`Log in to BentoCloud ` by running ``bentoml cloud login``, then run the following command to deploy the project. - -.. code-block:: bash - - bentoml deploy . - -Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL. - -.. 
image:: ../../_static/img/use-cases/embeddings/clip-embeddings/clip-bentocloud.png - -.. note:: - - For custom deployment in your own infrastructure, use BentoML to :doc:`generate an OCI-compliant image`. diff --git a/docs/source/use-cases/embeddings/index.rst b/docs/source/use-cases/embeddings/index.rst deleted file mode 100644 index a689611a298..00000000000 --- a/docs/source/use-cases/embeddings/index.rst +++ /dev/null @@ -1,31 +0,0 @@ -========== -Embeddings -========== - -This section provides example projects for embeddings. - -.. grid:: 1 2 2 2 - :gutter: 3 - :margin: 0 - :padding: 3 4 0 0 - - .. grid-item-card:: :doc:`/use-cases/embeddings/sentence-transformer` - :link: /use-cases/embeddings/sentence-transformer - :link-type: doc - - Deploy a sentence embedding application with BentoML. - - - .. grid-item-card:: :doc:`/use-cases/embeddings/clip-embeddings` - :link: /use-cases/embeddings/clip-embeddings - :link-type: doc - - Deploy a CLIP embedding application with BentoML. - -.. toctree:: - :maxdepth: 1 - :titlesonly: - :hidden: - - sentence-transformer - clip-embeddings diff --git a/docs/source/use-cases/embeddings/sentence-transformer.rst b/docs/source/use-cases/embeddings/sentence-transformer.rst deleted file mode 100644 index e6957522ad9..00000000000 --- a/docs/source/use-cases/embeddings/sentence-transformer.rst +++ /dev/null @@ -1,196 +0,0 @@ -==================== -Sentence Transformer -==================== - -In natural language processing (NLP), embeddings enable computers to understand the underlying semantics of language by transforming words, phrases, or even documents into numerical vectors. It covers a variety of use cases, from recommending products based on textual descriptions to translating languages and identifying relevant images through semantic understanding. - -This document demonstrates how to build a sentence embedding application Sentence Transformer using BentoML. It uses the `all-MiniLM-L6-v2 `_ model, a specific kind of language model developed for generating embeddings. Due to its smaller size, all-MiniLM-L6-v2 is efficient in terms of computational resources and speed, making it an ideal choice for embedding generation in environments with limited resources. - -All the source code in this tutorial is available in the `BentoSentenceTransformers GitHub repository `_. - -Prerequisites -------------- - -- Python 3.9+ and ``pip`` installed. See the `Python downloads page `_ to learn more. -- You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read :doc:`/get-started/quickstart` first. -- (Optional) We recommend you create a virtual environment for dependency isolation. See the `Conda documentation `_ or the `Python documentation `_ for details. - -Install dependencies --------------------- - -Clone the project repository and install all the dependencies. - -.. code-block:: bash - - git clone https://github.com/bentoml/BentoSentenceTransformers.git - cd BentoSentenceTransformers - pip install -r requirements.txt - -Create a BentoML Service ------------------------- - -Define a :doc:`BentoML Service ` to use a model for generating sentence embeddings. The example ``service.py`` file in this project uses ``sentence-transformers/all-MiniLM-L6-v2``: - -.. 
code-block:: python
    :caption: `service.py`

    from __future__ import annotations

    import typing as t

    import numpy as np
    import bentoml


    SAMPLE_SENTENCES = [
        "The sun dips below the horizon, painting the sky orange.",
        "A gentle breeze whispers through the autumn leaves.",
        "The moon casts a silver glow on the tranquil lake.",
        "A solitary lighthouse stands guard on the rocky shore.",
        "The city awakens as morning light filters through the streets.",
        "Stars twinkle in the velvety blanket of the night sky.",
        "The aroma of fresh coffee fills the cozy kitchen.",
        "A curious kitten pounces on a fluttering butterfly."
    ]

    MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"

    @bentoml.service(
        traffic={
            "timeout": 60,
            "concurrency": 32,
        },
        resources={
            "gpu": "1",
            "gpu_type": "nvidia-tesla-t4",
        },
    )
    class SentenceTransformers:

        def __init__(self) -> None:

            import torch
            from sentence_transformers import SentenceTransformer, models

            # Load model and tokenizer
            self.device = "cuda" if torch.cuda.is_available() else "cpu"
            # define layers
            first_layer = SentenceTransformer(MODEL_ID)
            pooling_model = models.Pooling(first_layer.get_sentence_embedding_dimension())
            self.model = SentenceTransformer(modules=[first_layer, pooling_model])
            print("Model loaded", "device:", self.device)


        @bentoml.api(batchable=True)
        def encode(
            self,
            sentences: t.List[str] = SAMPLE_SENTENCES,
        ) -> np.ndarray:
            print("encoding sentences:", len(sentences))
            # Encode the sentences into embeddings
            sentence_embeddings = self.model.encode(sentences)
            return sentence_embeddings

Here is a breakdown of the Service code:

- The script uses the ``@bentoml.service`` decorator to annotate the ``SentenceTransformers`` class as a BentoML Service, with traffic settings (timeout and concurrency) and GPU resources specified. You can set more configurations as needed.
- ``__init__`` loads the model and tokenizer when an instance of the ``SentenceTransformers`` class is created. The model is loaded onto the appropriate device (GPU if available, otherwise CPU).
- The model consists of two layers: The first layer is the pre-trained MiniLM model (``all-MiniLM-L6-v2``), and the second layer is a pooling layer to aggregate word embeddings into sentence embeddings.
- The ``encode`` method is defined as a BentoML API endpoint. It takes a list of sentences as input and uses the sentence transformer model to generate sentence embeddings. The returned embeddings are NumPy arrays.

Run ``bentoml serve`` in your project directory to start the Service.

.. code-block:: bash

    $ bentoml serve service:SentenceTransformers

    2023-12-27T07:49:25+0000 [WARNING] [cli] Converting 'all-MiniLM-L6-v2' to lowercase: 'all-minilm-l6-v2'.
    2023-12-27T07:49:26+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:SentenceTransformers" listening on http://localhost:3000 (Press CTRL+C to quit)
    Model loaded device: cuda

The server is active at `http://localhost:3000 `_. You can interact with it in different ways.

.. tab-set::

    .. tab-item:: CURL

        .. code-block:: bash

            curl -X 'POST' \
              'http://localhost:3000/encode' \
              -H 'accept: application/json' \
              -H 'Content-Type: application/json' \
              -d '{
              "sentences": [
                "hello world"
              ]
            }'

    .. tab-item:: Python client

        .. code-block:: python

            import bentoml

            with bentoml.SyncHTTPClient("http://localhost:3000") as client:
                result = client.encode(
                    sentences=[
                        "hello world"
                    ],
                )

    ..
tab-item:: Swagger UI - - Visit `http://localhost:3000 `_, scroll down to **Service APIs**, and click **Try it out**. In the **Request body** box, enter your prompt and click **Execute**. - - .. image:: ../../_static/img/use-cases/embeddings/sentence-embeddings/service-ui.png - -Expected output: - -.. code-block:: bash - - [ - [ - -0.19744610786437988, - 0.17766520380973816, - ...... - 0.2229892462491989, - 0.17298966646194458 - ] - ] - -Deploy to BentoCloud --------------------- - -After the Service is ready, you can deploy the project to BentoCloud for better management and scalability. `Sign up `_ for a BentoCloud account and get $10 in free credits. - -First, specify a configuration YAML file (``bentofile.yaml``) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project: - -.. code-block:: yaml - :caption: `bentofile.yaml` - - service: "service:SentenceTransformers" - labels: - owner: bentoml-team - project: gallery - include: - - "*.py" - python: - requirements_txt: "./requirements.txt" - docker: - env: - NORMALIZE : "True" - -:ref:`Log in to BentoCloud ` by running ``bentoml cloud login``, then run the following command to deploy the project. - -.. code-block:: bash - - bentoml deploy . - -Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL. - -.. image:: ../../_static/img/use-cases/embeddings/sentence-embeddings/sentence-embedding-bentocloud.png - -.. note:: - - For custom deployment in your own infrastructure, use BentoML to :doc:`generate an OCI-compliant image`. diff --git a/docs/source/use-cases/index.rst b/docs/source/use-cases/index.rst index a29bd20f2b4..17625e48717 100644 --- a/docs/source/use-cases/index.rst +++ b/docs/source/use-cases/index.rst @@ -21,30 +21,18 @@ This section provides a variety of example projects for you to learn how BentoML Deploy diffusion models with BentoML. - .. grid-item-card:: :doc:`/use-cases/embeddings/index` - :link: /use-cases/embeddings/index - :link-type: doc - - Deploy embedding applications with BentoML. - - .. grid-item-card:: :doc:`/use-cases/audio/index` - :link: /use-cases/audio/index - :link-type: doc - - Deploy audio applications with BentoML. - - .. grid-item-card:: :doc:`/use-cases/multimodality/index` - :link: /use-cases/multimodality/index - :link-type: doc - - Deploy multimodal applications with BentoML. - .. grid-item-card:: :doc:`/use-cases/custom-models/index` :link: /use-cases/custom-models/index :link-type: doc Deploy custom models with BentoML. + .. grid-item-card:: :doc:`/use-cases/more-examples/index` + :link: /use-cases/more-examples/index + :link-type: doc + + More example projects to explore BentoML. + .. toctree:: :maxdepth: 1 @@ -53,7 +41,5 @@ This section provides a variety of example projects for you to learn how BentoML large-language-models/index diffusion-models/index - embeddings/index - audio/index - multimodality/index custom-models/index + more-examples/index diff --git a/docs/source/use-cases/more-examples/index.rst b/docs/source/use-cases/more-examples/index.rst new file mode 100644 index 00000000000..8910a515953 --- /dev/null +++ b/docs/source/use-cases/more-examples/index.rst @@ -0,0 +1,23 @@ +============= +More examples +============= + +More examples to deploy AI systems with BentoML. + +.. grid:: 1 2 2 2 + :gutter: 3 + :margin: 0 + :padding: 3 4 0 0 + + .. 
grid-item-card:: :doc:`/use-cases/more-examples/inference-apis` + :link: /use-cases/more-examples/inference-apis + :link-type: doc + + An index of examples to deploy AI systems with BentoML. + +.. toctree:: + :maxdepth: 1 + :titlesonly: + :hidden: + + inference-apis diff --git a/docs/source/use-cases/more-examples/inference-apis.rst b/docs/source/use-cases/more-examples/inference-apis.rst new file mode 100644 index 00000000000..87958430b0e --- /dev/null +++ b/docs/source/use-cases/more-examples/inference-apis.rst @@ -0,0 +1,25 @@ +============== +Inference APIs +============== + +Check out the following examples to deploy different inference APIs with BentoML. + +- `BentoVLLM `_ - Accelerate your model inference and improve serving throughput by using vLLM as your LLM backend. +- `BentoDiffusion `_ - Self-host diffusion models with BentoML to generate custom images and video clips. +- `BentoXTTS `_ - Convert text to speech based on your custom audio data. +- `BentoWhisperX `_ - Convert spoken words into text for AI scenarios like virtual assistants, voice-controlled devices, and automated transcription services. +- `Sentence Transformer `_ - Transform text into numerical vectors for a variety of natural language processing (NLP) tasks. +- `BentoCLIP `_ - Build a CLIP (Contrastive Language-Image Pre-training) application for tasks like zero-shot learning, image classification, and image-text matching. +- `BentoBLIP `_ - Leverage BLIP (Bootstrapping Language Image Pre-training) to improve the way AI models understand and process the relationship between images and textual descriptions. +- `BentoBark `_ - Generate highly realistic audio like music, background noise and simple sound effects with Bark. +- `BentoYolo `_ - Build an object detection inference API server with YOLO. +- `RAG `_ - Self-host a private RAG app using custom embedding and language models. +- `BentoChatTTS `_ - Deploy a text-to-speech model ChatTTS for dialogue scenarios like chatbots and virtual assistants. +- `BentoMoirai `_ - Create a forecasting inference API for time-series data. +- `BentoResnet `_ - Build an image classification inference API server with ResNet. +- `BentoFunctionCalling `_ - Build LLM function calling capabilities with BentoML. +- `BentoShield `_ - Build an AI assistant using BentoML and ShieldGemma to evaluate the safety of prompts and filter out harmful content. +- `BentoLangGraph `_ - Deploy a LangGraph AI agent application with BentoML. +- `BentoCrewAI `_ - Deploy a CrewAI multi-agent application with BentoML. + +See `bentoml/examples `_ for more examples. \ No newline at end of file diff --git a/docs/source/use-cases/multimodality/blip.rst b/docs/source/use-cases/multimodality/blip.rst deleted file mode 100644 index 6c8e1b89772..00000000000 --- a/docs/source/use-cases/multimodality/blip.rst +++ /dev/null @@ -1,155 +0,0 @@ -==== -BLIP -==== - -BLIP (Bootstrapping Language Image Pre-training) is a technique to improve the way AI models understand and process the relationship between images and textual descriptions. It has a variety of use cases in the AI field, particularly in applications that require a nuanced understanding of both visual and textual data, such as image captioning, visual question answering (VQA), and image-text matching. This document demonstrates how to build an image captioning application on top of a BLIP model with BentoML. - -All the source code in this tutorial is available in the `BentoBlip GitHub repository `_. 
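Before wrapping the model in a BentoML Service, it can help to see the underlying captioning call on its own. The following is a minimal sketch using the same ``Salesforce/blip-image-captioning-large`` checkpoint that the Service below loads; the local image file name ``demo.jpeg`` and the prompt text are illustrative assumptions.

.. code-block:: python

    # Minimal sketch of the raw BLIP captioning call (assumes a local demo.jpeg).
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    model_id = "Salesforce/blip-image-captioning-large"
    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)

    image = Image.open("demo.jpeg").convert("RGB")
    # The second argument is an optional text prompt that conditions the caption.
    inputs = processor(image, "unicorn at sunset", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=100)
    print(processor.decode(out[0], skip_special_tokens=True))

The BentoML Service defined later in this document wraps this same processor, generate, and decode sequence behind an HTTP API.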
- -Prerequisites -------------- - -- Python 3.9+ and ``pip`` installed. See the `Python downloads page `_ to learn more. -- You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read :doc:`/get-started/quickstart` first. -- (Optional) We recommend you create a virtual environment for dependency isolation. See the `Conda documentation `_ or the `Python documentation `_ for details. - -Install dependencies --------------------- - -Clone the project repository and install all the dependencies. - -.. code-block:: bash - - git clone https://github.com/bentoml/BentoBlip.git - cd BentoBlip - pip install -r requirements.txt - -Create a BentoML Service ------------------------- - -Define a :doc:`BentoML Service ` to customize the serving logic. The example ``service.py`` file in the project uses the BLIP model ``Salesforce/blip-image-captioning-large``, which is capable of generating captions for given images, optionally using additional text input for context. You can choose another model based on your need. - -.. code-block:: python - :caption: `service.py` - - from __future__ import annotations - - import typing as t - - import bentoml - from PIL.Image import Image - - MODEL_ID = "Salesforce/blip-image-captioning-large" - - @bentoml.service( - resources={ - "cpu" : 1, - "memory" : "4Gi" - } - ) - class BlipImageCaptioning: - - def __init__(self) -> None: - import torch - from transformers import BlipProcessor, BlipForConditionalGeneration - self.device = "cuda" if torch.cuda.is_available() else "cpu" - self.model = BlipForConditionalGeneration.from_pretrained(MODEL_ID).to(self.device) - self.processor = BlipProcessor.from_pretrained(MODEL_ID) - print("Model blip loaded", "device:", self.device) - - @bentoml.api - async def generate(self, img: Image, txt: t.Optional[str] = None) -> str: - if txt: - inputs = self.processor(img, txt, return_tensors="pt").to(self.device) - else: - inputs = self.processor(img, return_tensors="pt").to(self.device) - - out = self.model.generate(**inputs, max_new_tokens=100, min_new_tokens=20) - return self.processor.decode(out[0], skip_special_tokens=True) - -Here is a breakdown of the Service code: - -- The ``@bentoml.service`` decorator defines the ``BlipImageCaptioning`` class as a BentoML Service, specifying that it requires ``4Gi`` of memory. You can customize the Service configurations if necessary. -- The Service loads the BLIP model based on ``MODEL_ID`` and moves the model to a GPU if available, otherwise it uses the CPU. -- The ``generate`` method is exposed as an asynchronous API endpoint. It accepts an image (``img``) and an optional ``txt`` parameter as inputs. If text is provided, the model generates a caption considering both the image and text context; otherwise, it generates a caption based only on the image. The generated tokens are then decoded into a human-readable caption. - -Run ``bentoml serve`` in your project directory to start the Service. - -.. code-block:: bash - - $ bentoml serve service:BlipImageCaptioning - - 2024-01-02T08:32:35+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:BlipImageCaptioning" listening on http://localhost:3000 (Press CTRL+C to quit) - Model blip loaded device: cuda - -The server is active at http://localhost:3000. You can interact with it in different ways. - -.. tab-set:: - - .. tab-item:: CURL - - .. code-block:: bash - - curl -s -X POST \ - -F txt='unicorn at sunset' \ - -F 'img=@image.jpg' \ - http://localhost:3000/generate - - .. 
tab-item:: Python client - - .. code-block:: python - - import bentoml - from pathlib import Path - - with bentoml.SyncHTTPClient("http://localhost:3000") as client: - result = client.generate( - img=Path("image.jpg"), - txt="unicorn at sunset", - ) - - .. tab-item:: Swagger UI - - Visit `http://localhost:3000 `_, scroll down to **Service APIs**, and click **Try it out**. In the **Request body** box, select an image, optionally enter your prompt text and click **Execute**. - - .. image:: ../_static/img/use-cases/blip/service-ui.png - -`This is the image `_ sent in the request. Expected output: - -.. code-block:: bash - - unicorn at sunset by a pond with a beautiful landscape in the background, with a reflection of the sun in the water - -Deploy to BentoCloud --------------------- - -After the Service is ready, you can deploy the project to BentoCloud for better management and scalability. `Sign up `_ for a BentoCloud account and get $10 in free credits. - -First, specify a configuration YAML file (``bentofile.yaml``) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project: - -.. code-block:: yaml - :caption: `bentofile.yaml` - - service: "service:BlipImageCaptioning" - labels: - owner: bentoml-team - project: gallery - include: - - "*.py" - - "demo.jpeg" - python: - requirements_txt: "./requirements.txt" - -:ref:`Log in to BentoCloud ` by running ``bentoml cloud login``, then run the following command to deploy the project. - -.. code-block:: bash - - bentoml deploy . - -Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL. - -.. image:: ../../_static/img/use-cases/blip/blip-bentocloud.png - -.. note:: - - For custom deployment in your own infrastructure, use BentoML to :doc:`generate an OCI-compliant image`. diff --git a/docs/source/use-cases/multimodality/index.rst b/docs/source/use-cases/multimodality/index.rst deleted file mode 100644 index 4b3aedbd5b7..00000000000 --- a/docs/source/use-cases/multimodality/index.rst +++ /dev/null @@ -1,23 +0,0 @@ -============= -Multimodality -============= - -This section provides example projects for deploying multimodal AI models. - -.. grid:: 1 2 2 2 - :gutter: 3 - :margin: 0 - :padding: 3 4 0 0 - - .. grid-item-card:: :doc:`/use-cases/multimodality/blip` - :link: /use-cases/multimodality/blip - :link-type: doc - - Deploy a BLIP (Bootstrapping Language Image Pre-training) application with BentoML. - -.. toctree:: - :maxdepth: 1 - :titlesonly: - :hidden: - - blip From 46838f450c43c6cb991c4feffc5d9f5a16f9692a Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 8 Oct 2024 02:43:45 +0000 Subject: [PATCH 2/2] ci: auto fixes from pre-commit.ci For more information, see https://pre-commit.ci --- docs/source/use-cases/more-examples/inference-apis.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/use-cases/more-examples/inference-apis.rst b/docs/source/use-cases/more-examples/inference-apis.rst index 87958430b0e..1bfdb3b6a36 100644 --- a/docs/source/use-cases/more-examples/inference-apis.rst +++ b/docs/source/use-cases/more-examples/inference-apis.rst @@ -22,4 +22,4 @@ Check out the following examples to deploy different inference APIs with BentoML - `BentoLangGraph `_ - Deploy a LangGraph AI agent application with BentoML. - `BentoCrewAI `_ - Deploy a CrewAI multi-agent application with BentoML. 
-See `bentoml/examples `_ for more examples.
\ No newline at end of file
+See `bentoml/examples `_ for more examples.
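Whichever example repository from the new ``inference-apis.rst`` index you choose, local testing follows the same client pattern shown in the tutorials above. A minimal sketch, assuming a Service with a ``generate`` endpoint is already running at ``http://localhost:3000`` (endpoint names and parameters differ between examples, so treat these as placeholders and check each repository's README):

.. code-block:: python

    from pathlib import Path

    import bentoml

    # Sketch only: "generate", "img", and "txt" mirror the BLIP example above and
    # are placeholders for whichever endpoint the chosen example actually exposes.
    with bentoml.SyncHTTPClient("http://localhost:3000") as client:
        result = client.generate(
            img=Path("image.jpg"),
            txt="unicorn at sunset",
        )
        print(result)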