diff --git a/docs/conf.py b/docs/conf.py
index b183f8ac7..6980387e2 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -80,6 +80,8 @@
html_show_sourcelink = False
html_favicon = '_static/favicon.png'
+intersphinx_mapping = {'docarray': ('https://docarray.jina.ai/', None), 'finetuner': ('https://finetuner.jina.ai/', None)}
+
latex_documents = [(master_doc, f'{slug}.tex', project, author, 'manual')]
man_pages = [(master_doc, slug, project, [author], 1)]
texinfo_documents = [
diff --git a/docs/index.md b/docs/index.md
index 6887fb6f2..7d8183d53 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -178,7 +178,6 @@ It means the client and the server are now connected. Well done!
user-guides/client
user-guides/server
user-guides/faq
-
```
```{toctree}
diff --git a/docs/user-guides/finetuner.md b/docs/user-guides/finetuner.md
new file mode 100644
index 000000000..2962c0730
--- /dev/null
+++ b/docs/user-guides/finetuner.md
@@ -0,0 +1,187 @@
+(Finetuner)=
+# Fine-tune Models
+
+Although CLIP-as-service provides a list of pre-trained models, you can also fine-tune your own models.
+This guide will show you how to use [Finetuner](https://finetuner.jina.ai) to fine-tune models and use them in CLIP-as-service.
+
+For installation and basic usage of Finetuner, please refer to [Finetuner documentation](https://finetuner.jina.ai).
+You can also [learn more about fine-tuning CLIP](https://finetuner.jina.ai/tasks/text-to-image/).
+
+## Prepare Training Data
+
+Finetuner accepts training data and evaluation data in the form of {class}`~docarray.array.document.DocumentArray`.
+The training data for CLIP is a list of (text, image) pairs.
+Each pair is stored in a {class}`~docarray.document.Document` that wraps two [`chunks`](https://docarray.jina.ai/fundamentals/document/nested/) with the `text` and `image` modalities, respectively.
+You can push the resulting {class}`~docarray.array.document.DocumentArray` to the cloud using the {meth}`~docarray.array.document.DocumentArray.push` method.
+
+We use the [fashion captioning dataset](https://github.com/xuewyang/Fashion_Captioning) as a sample dataset in this tutorial.
+The following are example descriptions and image URLs from the dataset.
+
+| Description | Image URL |
+|---------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| subtly futuristic and edgy this liquid metal cuff bracelet is shaped from sculptural rectangular link | [https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg](https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg) |
+| high quality leather construction defines a hearty boot one-piece on a tough lug sole | [https://n.nordstrommedia.com/id/sr3/21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg](https://n.nordstrommedia.com/id/sr3/21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg) |
+| this shimmering tricot knit tote is traced with decorative whipstitching and diamond cut chain the two hallmark of the falabella line | [https://n.nordstrommedia.com/id/sr3/1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg](https://n.nordstrommedia.com/id/sr3/1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg) |
+| ... | ... |
+
+You can use the following script to transform the first three entries of the dataset to a {class}`~docarray.array.document.DocumentArray` and push it to the cloud using the name `fashion-sample`.
+
+```python
+from docarray import Document, DocumentArray
+
+train_da = DocumentArray(
+ [
+ Document(
+ chunks=[
+ Document(
+ content='subtly futuristic and edgy this liquid metal cuff bracelet is shaped from sculptural rectangular link',
+ modality='text',
+ ),
+ Document(
+ uri='https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg',
+ modality='image',
+ ),
+ ],
+ ),
+ Document(
+ chunks=[
+ Document(
+ content='high quality leather construction defines a hearty boot one-piece on a tough lug sole',
+ modality='text',
+ ),
+ Document(
+ uri='https://n.nordstrommedia.com/id/sr3/21e7a67c-0a54-4d09-a4a4-6a0e0840540b.jpeg',
+ modality='image',
+ ),
+ ],
+ ),
+ Document(
+ chunks=[
+ Document(
+ content='this shimmering tricot knit tote is traced with decorative whipstitching and diamond cut chain the two hallmark of the falabella line',
+ modality='text',
+ ),
+ Document(
+ uri='https://n.nordstrommedia.com/id/sr3/1d8dd635-6342-444d-a1d3-4f91a9cf222b.jpeg',
+ modality='image',
+ ),
+ ],
+ ),
+ ]
+)
+train_da.push('fashion-sample')
+```
+
+The full dataset has been converted to `clip-fashion-train-data` and `clip-fashion-eval-data` and pushed to the cloud, so it can be used directly in Finetuner.
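+
+If you want to inspect these datasets locally before training, a minimal sketch using {meth}`~docarray.array.document.DocumentArray.pull` could look like this (assuming your account has access to the named datasets):
+
+```python
+from docarray import DocumentArray
+
+# pull the training data from the cloud by name and print a short overview
+train_da = DocumentArray.pull('clip-fashion-train-data')
+train_da.summary()
+
+# each top-level Document wraps a text chunk and an image chunk
+print(train_da[0].chunks[0].content)  # the description text
+print(train_da[0].chunks[1].uri)  # the image URL
+```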
+
+## Start Finetuner
+
+You may now create and run a fine-tuning job after logging in to the Jina ecosystem.
+
+```python
+import finetuner
+
+finetuner.login()
+run = finetuner.fit(
+ model='openai/clip-vit-base-patch32',
+ run_name='clip-fashion',
+ train_data='clip-fashion-train-data',
+ eval_data='clip-fashion-eval-data', # optional
+ epochs=5,
+ learning_rate=1e-5,
+ loss='CLIPLoss',
+ cpu=False,
+)
+```
+
+After the job has started, you can use {meth}`~finetuner.run.Run.status` to check its status.
+
+```python
+import finetuner
+
+finetuner.login()
+run = finetuner.get_run('clip-fashion')
+print(run.status())
+```
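+
+If you prefer to wait for completion programmatically, a minimal polling sketch built only on `status()` is shown below (how the status is reported is an assumption here; adapt the check to whatever `run.status()` returns for you):
+
+```python
+import time
+
+import finetuner
+
+finetuner.login()
+run = finetuner.get_run('clip-fashion')
+
+# poll the run status every 60 seconds until it reports FINISHED
+while 'FINISHED' not in str(run.status()):
+    # in practice you may also want to stop early on a failure status
+    time.sleep(60)
+print('fine-tuning finished')
+```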
+
+When the status is `FINISHED`, you can download the tuned model to your local machine.
+
+```python
+import finetuner
+
+finetuner.login()
+run = finetuner.get_run('clip-fashion')
+run.save_artifact('clip-model')
+```
+
+You should now have a zip file named `clip-fashion.zip` containing the tuned model under the folder `clip-model`.
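+
+You can unpack it with any archive tool; for example, a small sketch using the Python standard library (paths as above):
+
+```python
+from zipfile import ZipFile
+
+# extract clip-model/clip-fashion.zip into the current directory,
+# which produces the clip-fashion/ folder shown in the next section
+with ZipFile('clip-model/clip-fashion.zip') as zf:
+    zf.extractall('.')
+```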
+
+## Use the Model
+
+After unzipping the model downloaded in the previous step, you will get a folder with the following structure:
+
+```text
+.
+└── clip-fashion/
+ ├── config.yml
+ ├── metadata.yml
+ ├── metrics.yml
+ └── models/
+ ├── clip-text/
+ │ ├── metadata.yml
+ │ └── model.onnx
+ ├── clip-vision/
+ │ ├── metadata.yml
+ │ └── model.onnx
+ └── input-map.yml
+```
+
+Since the tuned model generated by Finetuner contains richer information such as metadata and config, we now transform it into the simpler structure used by CLIP-as-service.
+
+* First, create a new folder named `clip-fashion-cas` (or a name of your choice). This folder will store the models to be used in CLIP-as-service.
+
+* Second, copy the textual model `clip-fashion/models/clip-text/model.onnx` into the folder `clip-fashion-cas` and rename it to `textual.onnx`.
+
+* Similarly, copy the visual model `clip-fashion/models/clip-vision/model.onnx` into the folder `clip-fashion-cas` and rename it to `visual.onnx`.
+
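+These copy-and-rename steps can also be scripted, for example with a small Python helper (the paths assume the folder layout shown above):
+
+```python
+import os
+import shutil
+
+src = 'clip-fashion/models'
+dst = 'clip-fashion-cas'
+os.makedirs(dst, exist_ok=True)
+
+# copy the two ONNX models and rename them to the names expected by clip_server
+shutil.copy(os.path.join(src, 'clip-text', 'model.onnx'), os.path.join(dst, 'textual.onnx'))
+shutil.copy(os.path.join(src, 'clip-vision', 'model.onnx'), os.path.join(dst, 'visual.onnx'))
+```
+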
+This is the expected structure of `clip-fashion-cas`:
+
+```text
+.
+└── clip-fashion-cas/
+ ├── textual.onnx
+ └── visual.onnx
+```
+
+To use the fine-tuned model, create a custom YAML file `finetuned_clip.yml` as shown below. Learn more about [Flow YAML configuration](https://docs.jina.ai/fundamentals/flow/yaml-spec/) and [`clip_server` YAML configuration](https://clip-as-service.jina.ai/user-guides/server/#yaml-config).
+
+```yaml
+jtype: Flow
+version: '1'
+with:
+ port: 51000
+executors:
+ - name: clip_o
+ uses:
+ jtype: CLIPEncoder
+ metas:
+ py_modules:
+ - clip_server.executors.clip_onnx
+ with:
+ name: ViT-B/32
+ model_path: 'clip-fashion-cas' # path to clip-fashion-cas
+ replicas: 1
+```
+
+```{warning}
+Note that Finetuner currently only supports the ViT-B/32 CLIP model. The model name should match the fine-tuned model, otherwise you will get incorrect output.
+```
+
+You can now start the `clip_server` with the fine-tuned model to get a performance boost:
+
+```bash
+python -m clip_server finetuned_clip.yml
+```
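+
+Once the server is running, you can query it with the regular `clip_client` API just like with the pre-trained models; for example (assuming the server is reachable locally on the port configured above):
+
+```python
+from clip_client import Client
+
+# connect to the server started from finetuned_clip.yml
+c = Client('grpc://0.0.0.0:51000')
+
+# encode one text and one image with the fine-tuned model
+vectors = c.encode(
+    [
+        'subtly futuristic and edgy liquid metal cuff bracelet',
+        'https://n.nordstrommedia.com/id/sr3/58d1a13f-b6b6-4e68-b2ff-3a3af47c422e.jpeg',
+    ]
+)
+print(vectors.shape)
+```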
+
+That's it, enjoy 🚀
diff --git a/docs/user-guides/server.md b/docs/user-guides/server.md
index 740ac43b3..b1069e3b1 100644
--- a/docs/user-guides/server.md
+++ b/docs/user-guides/server.md
@@ -75,6 +75,23 @@ Open AI has released 9 models so far. `ViT-B/32` is used as default model in all
| ViT-L/14 | ✅ | ✅ | ❌ | 768 | 933 | 3.66 | 2.04 |
| ViT-L/14@336px | ✅ | ✅ | ❌ | 768 | 934 | 3.74 | 2.23 |
+### Use custom model
+
+You can also use your own model with the ONNX runtime by specifying the model name and the path to the model directory in the YAML file.
+The model directory should have the following structure:
+
+```text
+.
+└── custom-model/
+ ├── textual.onnx
+ └── visual.onnx
+```
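+
+If you want to quickly verify that the two files load as valid ONNX models before starting the server, an optional sketch using `onnxruntime` could look like this:
+
+```python
+import onnxruntime as ort
+
+# try loading both models on CPU and print their input names
+for part in ('textual', 'visual'):
+    sess = ort.InferenceSession(f'custom-model/{part}.onnx', providers=['CPUExecutionProvider'])
+    print(part, [i.name for i in sess.get_inputs()])
+```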
+
+You may wonder how to produce a model in this format.
+Fortunately, you can simply use [Finetuner](https://finetuner.jina.ai) to fine-tune your model on a custom dataset.
+[Finetuner](https://finetuner.jina.ai) is a cloud service that makes fine-tuning simple and fast.
+By moving the process into the cloud, [Finetuner](https://finetuner.jina.ai) handles all related complexity and infrastructure, making models performant and production-ready.
+{ref}`Click here for detailed instructions <Finetuner>`.
## YAML config
@@ -230,11 +247,11 @@ executors:
For all backends, you can set the following parameters via `with`:
-| Parameter | Description |
-|-----------|--------------------------------------------------------------------------------------------------------------------------------|
-| `name` | Model weights, default is `ViT-B/32`. Support all OpenAI released pretrained models. |
+| Parameter | Description |
+|-------------------------|--------------------------------------------------------------------------------------------------------------------------------|
+| `name` | Model weights, default is `ViT-B/32`. Supports all OpenAI released pretrained models. |
-| `num_worker_preprocess` | The number of CPU workers for image & text prerpocessing, default 4. |
+| `num_worker_preprocess` | The number of CPU workers for image & text preprocessing, default 4. |
-| `minibatch_size` | The size of a minibatch for CPU preprocessing and GPU encoding, default 64. Reduce the size of it if you encounter OOM on GPU. |
+| `minibatch_size` | The size of a minibatch for CPU preprocessing and GPU encoding, default 64. Reduce the size of it if you encounter OOM on GPU. |
There are also runtime-specific parameters listed below:
@@ -252,6 +269,7 @@ There are also runtime-specific parameters listed below:
| Parameter | Description |
|-----------|--------------------------------------------------------------------------------------------------------------------------------|
| `device` | `cuda` or `cpu`. Default is `None` means auto-detect.
+| `model_path` | The path to a custom CLIP model, default `None`. |
````
@@ -278,6 +296,33 @@ executors:
- executors/clip_torch.py
```
+To use a custom model with the ONNX runtime, you can do:
+
+```{code-block} yaml
+---
+emphasize-lines: 9-11
+---
+
+jtype: Flow
+version: '1'
+with:
+ port: 51000
+executors:
+ - name: clip_o
+ uses:
+ jtype: CLIPEncoder
+ with:
+ name: ViT-B/32
+ model_path: 'custom-model'
+ metas:
+ py_modules:
+ - executors/clip_onnx.py
+```
+
+```{warning}
+The model `name` should match the fine-tuned model, otherwise you will get incorrect output.
+```
+
### Executor config
The full list of configs for Executor can be found via `jina executor --help`. The most important one is probably `replicas`, which **allows you to run multiple CLIP models in parallel** to achieve horizontal scaling.