feat: upload torch executor (#723)
* feat: add hub push runner

* fix: hub push yaml

* fix: hub push yaml

* fix: debug script

* fix: debug script

* fix: debug script

* fix: debug script

* fix: debug script

* fix: debug script

* fix: debug script

* fix: debug script

* fix: debug script

* fix: comment manifest

* fix: revert manifest

* fix: use relative import

* fix: change base folder

* fix: hub push

* fix: bumb jina version

* fix: get requirments.txt

* fix: turnon workflow on PR

* fix: update dockerfile

* fix: error

* fix: executor name

* fix: use jinahub auth token

* fix: test torch upload

* fix: docker

* fix: upload gpu executor

* fix: gpu tag

* fix: gpu tag

* feat: upload onnx executor

* fix: debug onnx upload

* fix: debug onnx upload

* fix: minor revision

* fix: add torch exec readme

* fix: add onnx exec readme

* chore: update exec readme

* fix: update readme

* chore: update readme

* chore: onnx readme

* chore: update readme

* docs: fix batch_size

* docs: fix batch_size

* chore: updates

* chore: upload pytorch and onnx runtime based executors

* fix: use relative imports

Co-authored-by: numb3r3 <[email protected]>
ZiniuYu and numb3r3 authored Jun 15, 2022
1 parent 1869e61 commit 4d069a8
Showing 15 changed files with 614 additions and 42 deletions.
177 changes: 177 additions & 0 deletions .github/README-exec/onnx.readme.md
@@ -0,0 +1,177 @@
# CLIPOnnxEncoder

**CLIPOnnxEncoder** is an executor implemented in [clip-as-service](https://github.com/jina-ai/clip-as-service).
It serves the [CLIP](https://github.com/openai/CLIP) models released by OpenAI with ONNX Runtime (🚀 **3x** speedup).
An introduction to the CLIP model [can be found here](https://openai.com/blog/clip/).

- 🔀 **Automatic**: Auto-detect image and text documents depending on their content.
- **Efficiency**: Faster CLIP model inference on CPU and GPU via ONNX Runtime.
- 📈 **Observability**: Monitor the service via Prometheus and Grafana (see [Usage Guide](https://docs.jina.ai/how-to/monitoring/#deploying-locally)).


## Model support

OpenAI has released 9 models so far. `ViT-B/32` is the default model. Note that different models produce **output embeddings of different dimensions**.

| Model | ONNX | Output dimension |
|----------------|-----| --- |
| RN50 || 1024 |
| RN101 || 512 |
| RN50x4 || 640 |
| RN50x16 || 768 |
| RN50x64 || 1024 |
| ViT-B/32 || 512 |
| ViT-B/16 || 512 |
| ViT-L/14 || 768 |
| ViT-L/14@336px || 768 |
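
Because the output dimension depends on the chosen model, downstream code (for example, a vector index) usually needs to know it up front. A minimal sketch, assuming you simply hard-code the values from the table above; `OUTPUT_DIM` and `output_dim` are illustrative names and not part of the executor:

```python
# Output dimensions copied from the table above.
OUTPUT_DIM = {
    'RN50': 1024,
    'RN101': 512,
    'RN50x4': 640,
    'RN50x16': 768,
    'RN50x64': 1024,
    'ViT-B/32': 512,
    'ViT-B/16': 512,
    'ViT-L/14': 768,
    'ViT-L/14@336px': 768,
}


def output_dim(name: str = 'ViT-B/32') -> int:
    """Return the embedding size for a model name, or raise on an unknown name."""
    try:
        return OUTPUT_DIM[name]
    except KeyError:
        raise ValueError(f'unknown CLIP model: {name!r}') from None


if __name__ == '__main__':
    print(output_dim())          # default model
    print(output_dim('RN50x4'))
```

Failing loudly on an unknown name is a deliberate choice here: a silently wrong dimension tends to surface much later as a shape mismatch.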

## Usage

### Use in Jina Flow

- **via Docker image (recommended)**

```python
from jina import Flow
from docarray import Document
import numpy as np

f = Flow().add(
    uses='jinahub+docker://CLIPOnnxEncoder',
)
```

- **via source code**

```python
from jina import Flow
from docarray import Document
import numpy as np

f = Flow().add(
    uses='jinahub://CLIPOnnxEncoder',
)
```

You can set the following parameters via `with`:

| Parameter               | Description                                                                                                             |
|-------------------------|-------------------------------------------------------------------------------------------------------------------------|
| `name`                  | Model weights, default is `ViT-B/32`. All pretrained models released by OpenAI are supported.                           |
| `num_worker_preprocess` | The number of CPU workers for image and text preprocessing, default is 4.                                               |
| `minibatch_size`        | The size of a minibatch for CPU preprocessing and GPU encoding, default is 16. Reduce it if you encounter OOM on GPU.   |
| `device`                | `cuda` or `cpu`. Default is `None`, which means auto-detect.                                                            |
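
As a rough illustration of passing these parameters, the sketch below builds the keyword arguments for `Flow().add(...)`, using Jina's `uses_with` argument to forward them to the executor. The helper names `encoder_kwargs` and `build_flow` are hypothetical, and `jina` is imported lazily so that the dictionary-building part can be tried without it installed.

```python
from typing import Any, Dict, Optional


def encoder_kwargs(
    name: str = 'ViT-B/32',
    minibatch_size: int = 16,
    num_worker_preprocess: int = 4,
    device: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for `Flow().add(...)` (illustrative helper)."""
    uses_with: Dict[str, Any] = {
        'name': name,
        'minibatch_size': minibatch_size,
        'num_worker_preprocess': num_worker_preprocess,
    }
    # Leave `device` out entirely so the executor can auto-detect it.
    if device is not None:
        uses_with['device'] = device
    return {
        'uses': 'jinahub+docker://CLIPOnnxEncoder',
        'uses_with': uses_with,
    }


def build_flow(**overrides: Any):
    """Create a Flow with the encoder; requires `jina` to be installed."""
    from jina import Flow  # imported lazily on purpose

    return Flow().add(**encoder_kwargs(**overrides))


print(encoder_kwargs(minibatch_size=8, device='cpu'))
```

Calling `build_flow(minibatch_size=8)` would then give a Flow whose encoder uses smaller minibatches, which is the first thing to try when the GPU runs out of memory.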

### Encoding

Encoding here means getting the fixed-length vector representation of a sentence or image.

```python
from jina import Flow
from docarray import Document, DocumentArray

da = DocumentArray(
    [
        Document(text='she smiled, with pain'),
        Document(uri='apple.png'),
        Document(uri='apple.png').load_uri_to_image_tensor(),
        Document(blob=open('apple.png', 'rb').read()),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
        Document(
            uri=''
        ),
    ]
)

f = Flow().add(
    uses='jinahub+docker://CLIPOnnxEncoder',
)
with f:
    # post() returns the processed documents with embeddings attached
    r = f.post(on='/', inputs=da)
    r.summary()
```

From the output, you will see all the text and image docs have `embedding` attached.

```text
╭──────────────────────────── Documents Summary ─────────────────────────────╮
│ │
│ Length 6 │
│ Homogenous Documents False │
│ 4 Documents have attributes ('id', 'mime_type', 'uri', 'embedding') │
│ 1 Document has attributes ('id', 'mime_type', 'text', 'embedding') │
│ 1 Document has attributes ('id', 'embedding') │
│ │
╰────────────────────────────────────────────────────────────────────────────╯
╭────────────────────── Attributes Summary ───────────────────────╮
│ │
│ Attribute Data type #Unique values Has empty value │
│ ───────────────────────────────────────────────────────────── │
│ embedding ('ndarray',) 6 False │
│ id ('str',) 6 False │
│ mime_type ('str',) 5 False │
│ text ('str',) 2 False │
│ uri ('str',) 4 False │
│ │
╰─────────────────────────────────────────────────────────────────╯
```
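
Fixed-length embeddings are typically compared with cosine similarity. The sketch below uses tiny made-up vectors in place of real CLIP embeddings (which would have 512 entries for `ViT-B/32`); `cosine_similarity` is an illustrative helper, not something the executor provides.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError('embeddings must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for `doc.embedding`; real values come from the executor.
text_emb = [0.1, 0.3, 0.5]
image_emb = [0.2, 0.6, 1.0]   # same direction as text_emb
other_emb = [0.5, 0.0, -0.1]  # orthogonal to text_emb

print(cosine_similarity(text_emb, image_emb))
print(cosine_similarity(text_emb, other_emb))
```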

👉 Try the embedding playground in the **clip-as-service** [docs](https://clip-as-service.jina.ai/playground/embedding): type a sentence or an image URL and see the **live embedding**!

### Ranking

You can also rank cross-modal matches via the `/rank` endpoint.
First, construct a *cross-modal* Document whose root contains an image and whose `.matches` contain the sentences to rerank.

```python
from docarray import Document

d = Document(
    uri='rerank.png',
    matches=[
        Document(text=f'a photo of a {p}')
        for p in (
            'control room',
            'lecture room',
            'conference room',
            'podium indoor',
            'television studio',
        )
    ],
)
```

Then send the request via the `/rank` endpoint:

```python
from jina import Flow

f = Flow().add(
    uses='jinahub+docker://CLIPOnnxEncoder',
)
with f:
    # `d` is the cross-modal Document constructed above
    r = f.post(on='/rank', inputs=[d])
    print(r['@m', ['text', 'scores__clip_score__value']])
```

Finally, the response shows the matches re-ranked according to `.scores['clip_score']`:

```bash
[['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'],
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]
```
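
The sample scores sum to roughly 1, which is what you would expect if `clip_score` were a softmax over the image-text similarities of the matches. Assuming that is the case (an assumption, not something this README states; the logit values below are made up for illustration), the normalisation can be sketched as:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical similarity logits for five candidate sentences.
logits = [25.1, 20.0, 18.2, 17.9, 15.9]
probs = softmax(logits)
print(probs)

# The scores reported in the sample output above also sum to ~1.
sample = [
    0.9920725226402283,
    0.006038925610482693,
    0.0009973491542041302,
    0.00078492151806131,
    0.00010626466246321797,
]
print(sum(sample))
```

A practical consequence of this kind of normalisation is that scores are only comparable within one set of matches, not across different requests.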

You can also construct a text-to-image rerank request as below:

```python
from docarray import Document

d = Document(
    text='a photo of conference room',
    matches=[
        Document(uri='https://picsum.photos/300'),
        Document(uri='https://picsum.photos/id/331/50'),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
    ],
)
```

👉 Try the ranking playground in the **clip-as-service** [docs](https://clip-as-service.jina.ai/playground/reasoning/). Enter the candidate texts as prompts; the server ranks them and returns the sorted prompts with their scores.
179 changes: 179 additions & 0 deletions .github/README-exec/torch.readme.md
@@ -0,0 +1,179 @@
# CLIPTorchEncoder

**CLIPTorchEncoder** is an executor implemented in [clip-as-service](https://github.com/jina-ai/clip-as-service).
It serves the [CLIP](https://github.com/openai/CLIP) models released by OpenAI with the PyTorch runtime.
An introduction to the CLIP model [can be found here](https://openai.com/blog/clip/).

- 🔀 **Automatic**: Auto-detect image and text documents depending on their content.
- **Efficiency**: Faster CLIP model inference on CPU and GPU by following best practices.
- 📈 **Observability**: Monitor the service via Prometheus and Grafana (see [Usage Guide](https://docs.jina.ai/how-to/monitoring/#deploying-locally)).

Thanks to ONNX Runtime, you can use `CLIPOnnxEncoder` (see [link](https://hub.jina.ai/executor/2a7auwg2)) instead to achieve a **3x** speedup in model inference.

## Model support

OpenAI has released **9 models** so far. `ViT-B/32` is the default model. Note that different models produce **output embeddings of different dimensions**.

| Model | PyTorch | Output dimension |
|----------------|---------|------------------|
| RN50 || 1024 |
| RN101 || 512 |
| RN50x4 || 640 |
| RN50x16 || 768 |
| RN50x64 || 1024 |
| ViT-B/32 || 512 |
| ViT-B/16 || 512 |
| ViT-L/14 || 768 |
| ViT-L/14@336px || 768 |

## Usage

### Use in Jina Flow

- **via Docker image (recommended)**

```python
from jina import Flow
from docarray import Document
import numpy as np

f = Flow().add(
    uses='jinahub+docker://CLIPTorchEncoder',
)
```

- **via source code**

```python
from jina import Flow
from docarray import Document
import numpy as np

f = Flow().add(
    uses='jinahub://CLIPTorchEncoder',
)
```

You can set the following parameters via `with`:

| Parameter               | Description                                                                                                             |
|-------------------------|-------------------------------------------------------------------------------------------------------------------------|
| `name`                  | Model weights, default is `ViT-B/32`. All pretrained models released by OpenAI are supported.                           |
| `num_worker_preprocess` | The number of CPU workers for image and text preprocessing, default is 4.                                               |
| `minibatch_size`        | The size of a minibatch for CPU preprocessing and GPU encoding, default is 32. Reduce it if you encounter OOM on GPU.   |
| `device`                | `cuda` or `cpu`. Default is `None`, which means auto-detect.                                                            |
| `jit`                   | Whether to enable TorchScript JIT, default is `False`.                                                                  |
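
To see why lowering `minibatch_size` reduces peak GPU memory, it helps to picture how a request is split into chunks. The sketch below is not the executor's implementation; `minibatches` is a hypothetical helper that only illustrates the chunking idea.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar('T')


def minibatches(items: Sequence[T], size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError('size must be positive')
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


docs = [f'doc-{i}' for i in range(70)]
print([len(b) for b in minibatches(docs)])          # chunk sizes with the default
print([len(b) for b in minibatches(docs, size=8)])  # more, smaller chunks
```

Only one chunk needs to be on the device at a time, so a smaller `size` trades throughput for a lower memory ceiling.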

### Encoding

Encoding here means getting the fixed-length vector representation of a sentence or image.

```python
from jina import Flow
from docarray import Document, DocumentArray

da = DocumentArray(
    [
        Document(text='she smiled, with pain'),
        Document(uri='apple.png'),
        Document(uri='apple.png').load_uri_to_image_tensor(),
        Document(blob=open('apple.png', 'rb').read()),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
        Document(
            uri=''
        ),
    ]
)

f = Flow().add(
    uses='jinahub+docker://CLIPTorchEncoder',
)
with f:
    # post() returns the processed documents with embeddings attached
    r = f.post(on='/', inputs=da)
    r.summary()
```

From the output, you will see all the text and image docs have `embedding` attached.

```text
╭──────────────────────────── Documents Summary ─────────────────────────────╮
│ │
│ Length 6 │
│ Homogenous Documents False │
│ 4 Documents have attributes ('id', 'mime_type', 'uri', 'embedding') │
│ 1 Document has attributes ('id', 'mime_type', 'text', 'embedding') │
│ 1 Document has attributes ('id', 'embedding') │
│ │
╰────────────────────────────────────────────────────────────────────────────╯
╭────────────────────── Attributes Summary ───────────────────────╮
│ │
│ Attribute Data type #Unique values Has empty value │
│ ───────────────────────────────────────────────────────────── │
│ embedding ('ndarray',) 6 False │
│ id ('str',) 6 False │
│ mime_type ('str',) 5 False │
│ text ('str',) 2 False │
│ uri ('str',) 4 False │
│ │
╰─────────────────────────────────────────────────────────────────╯
```

👉 Try the embedding playground in the **clip-as-service** [docs](https://clip-as-service.jina.ai/playground/embedding): type a sentence or an image URL and see the **live embedding**!

### Ranking

You can also rank cross-modal matches via the `/rank` endpoint.
First, construct a *cross-modal* Document whose root contains an image and whose `.matches` contain the sentences to rerank.

```python
from docarray import Document

d = Document(
    uri='rerank.png',
    matches=[
        Document(text=f'a photo of a {p}')
        for p in (
            'control room',
            'lecture room',
            'conference room',
            'podium indoor',
            'television studio',
        )
    ],
)
```

Then send the request via the `/rank` endpoint:

```python
from jina import Flow

f = Flow().add(
    uses='jinahub+docker://CLIPTorchEncoder',
)
with f:
    # `d` is the cross-modal Document constructed above
    r = f.post(on='/rank', inputs=[d])
    print(r['@m', ['text', 'scores__clip_score__value']])
```

Finally, the response shows the matches re-ranked according to `.scores['clip_score']`:

```bash
[['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'],
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]
```
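
The printed result is a pair of parallel lists: the texts first, then their scores. A small sketch of pairing them up to pick the best caption, using the values from the sample output above (rounded here for brevity); `top_match` is an illustrative helper, not part of the executor:

```python
from typing import List, Tuple


def top_match(texts: List[str], scores: List[float]) -> Tuple[str, float]:
    """Return the (text, score) pair with the highest score."""
    if len(texts) != len(scores):
        raise ValueError('texts and scores must be the same length')
    return max(zip(texts, scores), key=lambda pair: pair[1])


texts = [
    'a photo of a television studio',
    'a photo of a conference room',
    'a photo of a lecture room',
    'a photo of a control room',
    'a photo of a podium indoor',
]
scores = [0.99207, 0.00604, 0.00100, 0.00078, 0.00011]

best_text, best_score = top_match(texts, scores)
print(best_text, best_score)
```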

You can also construct a text-to-image rerank request as below:

```python
from docarray import Document

d = Document(
    text='a photo of conference room',
    matches=[
        Document(uri='https://picsum.photos/300'),
        Document(uri='https://picsum.photos/id/331/50'),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
    ],
)
```

👉 Try the ranking playground in the **clip-as-service** [docs](https://clip-as-service.jina.ai/playground/reasoning/). Enter the candidate texts as prompts; the server ranks them and returns the sorted prompts with their scores.
8 changes: 3 additions & 5 deletions .github/workflows/force-docker-build.yml
@@ -21,7 +21,7 @@ jobs:
         env:
           release_token: ${{ secrets.CAS_RELEASE_TOKEN }}

-  regular-release:
+  docker-release:
     needs: token-check
     runs-on: ubuntu-latest
     strategy:
@@ -104,7 +104,7 @@ jobs:
         if: ${{ matrix.engine_tag == '' && matrix.pip_tag != 'tensorrt' }}
         uses: docker/build-push-action@v2
         with:
-          context: .
+          context: server
           file: Dockerfiles/base.Dockerfile
           platforms: linux/amd64
           cache-from: type=registry,ref=jinaai/clip_executor:latest
@@ -116,13 +116,12 @@ jobs:
             CAS_VERSION=${{env.CAS_VERSION}}
             VCS_REF=${{env.VCS_REF}}
             BACKEND_TAG=${{env.BACKEND_TAG}}
-            PIP_TAG=${{matrix.pip_tag}}
       - name: CUDA Build and push
         id: cuda_docker_build
         if: ${{ matrix.engine_tag == 'cuda' }}
         uses: docker/build-push-action@v2
         with:
-          context: .
+          context: server
           file: Dockerfiles/cuda.Dockerfile
           platforms: linux/amd64
           cache-from: type=registry,ref=jinaai/clip_executor:latest-cuda
@@ -134,4 +133,3 @@ jobs:
             CAS_VERSION=${{env.CAS_VERSION}}
             VCS_REF=${{env.VCS_REF}}
             BACKEND_TAG=${{env.BACKEND_TAG}}
-            PIP_TAG=${{matrix.pip_tag}}