feat: upload torch executor #723

Merged · 46 commits · Jun 15, 2022

Commits
417f2ae
feat: add hub push runner
numb3r3 May 18, 2022
a543ff7
fix: hub push yaml
numb3r3 May 18, 2022
fca95a6
fix: hub push yaml
numb3r3 May 18, 2022
f672a1c
fix: debug script
numb3r3 May 18, 2022
d4c5ca8
fix: debug script
numb3r3 May 18, 2022
2507cd2
fix: debug script
numb3r3 May 18, 2022
e010525
fix: debug script
numb3r3 May 18, 2022
a067afc
fix: debug script
numb3r3 May 18, 2022
db1d77c
fix: debug script
numb3r3 May 18, 2022
075ed4e
fix: debug script
numb3r3 May 18, 2022
42799ec
fix: debug script
numb3r3 May 18, 2022
0f4b543
fix: debug script
numb3r3 May 18, 2022
9dbba46
fix: comment manifest
numb3r3 May 18, 2022
03f6c1c
fix: revert manifest
numb3r3 May 18, 2022
3d3a2f4
fix: use relative import
numb3r3 May 26, 2022
88c2a74
fix: change base folder
numb3r3 May 26, 2022
03cd653
fix: hub push
numb3r3 May 27, 2022
684b661
fix: bumb jina version
numb3r3 Jun 9, 2022
4f3b11d
fix: get requirments.txt
numb3r3 Jun 14, 2022
7c07615
fix: turnon workflow on PR
numb3r3 Jun 14, 2022
93f11b0
fix: update dockerfile
numb3r3 Jun 14, 2022
249deb6
fix: error
numb3r3 Jun 14, 2022
3fff618
fix: executor name
numb3r3 Jun 14, 2022
fecdb90
fix: use jinahub auth token
numb3r3 Jun 14, 2022
924ad16
fix: test torch upload
numb3r3 Jun 14, 2022
cf7f725
fix: docker
numb3r3 Jun 14, 2022
66042e0
fix: upload gpu executor
numb3r3 Jun 14, 2022
0761a14
fix: gpu tag
numb3r3 Jun 14, 2022
f8b5eca
fix: gpu tag
numb3r3 Jun 14, 2022
e449a78
feat: upload onnx executor
numb3r3 Jun 14, 2022
463a1df
fix: debug onnx upload
numb3r3 Jun 14, 2022
12afd99
fix: debug onnx upload
numb3r3 Jun 14, 2022
e4558d9
fix: minor revision
numb3r3 Jun 15, 2022
a3f78c5
fix: add torch exec readme
numb3r3 Jun 15, 2022
ecadf68
fix: add onnx exec readme
numb3r3 Jun 15, 2022
99879dc
chore: update exec readme
numb3r3 Jun 15, 2022
95f91b6
fix: update readme
numb3r3 Jun 15, 2022
22b5034
chore: update readme
numb3r3 Jun 15, 2022
54b05bb
chore: onnx readme
numb3r3 Jun 15, 2022
afeaa62
chore: update readme
numb3r3 Jun 15, 2022
ba3bd55
docs: fix batch_size
ZiniuYu Jun 15, 2022
b899ff7
docs: fix batch_size
ZiniuYu Jun 15, 2022
da597e6
chore: updates
numb3r3 Jun 15, 2022
add54a5
Merge branch 'clip_jina_hub' of github.com:jina-ai/clip-as-service in…
numb3r3 Jun 15, 2022
0e3e86a
chore: upload pytorch and onnx runtime based executors
numb3r3 Jun 15, 2022
2cb016a
fix: use relative imports
numb3r3 Jun 15, 2022
177 changes: 177 additions & 0 deletions .github/README-exec/onnx.readme.md
@@ -0,0 +1,177 @@
# CLIPOnnxEncoder

**CLIPOnnxEncoder** is an executor implemented in [clip-as-service](https://github.com/jina-ai/clip-as-service).
It serves OpenAI's [CLIP](https://github.com/openai/CLIP) models with the ONNX Runtime (🚀 **3x** speed-up).
An introduction to the CLIP model [can be found here](https://openai.com/blog/clip/).

- 🔀 **Automatic**: Auto-detects image and text documents based on their content.
- ⚡ **Efficiency**: Faster CLIP model inference on CPU and GPU via the ONNX Runtime.
- 📈 **Observability**: Monitor the service via Prometheus and Grafana (see the [Usage Guide](https://docs.jina.ai/how-to/monitoring/#deploying-locally); a minimal sketch follows below).
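
For example, here is a minimal sketch of exposing Prometheus metrics on a Flow; the `monitoring` and `port_monitoring` options are assumptions based on the Jina monitoring guide linked above:

```python
from jina import Flow

# Sketch: expose Prometheus metrics for the gateway and the encoder.
# `monitoring`/`port_monitoring` are assumed from the Jina monitoring guide.
f = Flow(monitoring=True, port_monitoring=9090).add(
    uses='jinahub+docker://CLIPOnnxEncoder',
    monitoring=True,
    port_monitoring=9091,
)

with f:
    f.block()  # metrics are then scrapable on ports 9090 (gateway) and 9091 (encoder)
```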


## Model support

OpenAI has released 9 models so far; `ViT-B/32` is the default. Note that different models produce **different output dimensions**.

| Model | ONNX | Output dimension |
|----------------|-----| --- |
| RN50 | ✅ | 1024 |
| RN101 | ✅ | 512 |
| RN50x4 | ✅ | 640 |
| RN50x16 | ✅ | 768 |
| RN50x64 | ✅ | 1024 |
| ViT-B/32 | ✅ | 512 |
| ViT-B/16 | ✅ | 512 |
| ViT-L/14 | ✅ | 768 |
| ViT-L/14@336px | ✅ | 768 |

## Usage

### Use in Jina Flow

- **via Docker image (recommended)**

```python
from jina import Flow

f = Flow().add(
    uses='jinahub+docker://CLIPOnnxEncoder',
)
```

- **via source code**

```python
from jina import Flow

f = Flow().add(
    uses='jinahub://CLIPOnnxEncoder',
)
```

You can set the following parameters via `with`:

| Parameter | Description |
|-----------|-------------------------------------------------------------------------------------------------------------------------------|
| `name` | Model weights; default is `ViT-B/32`. Supports all OpenAI released pretrained models. |
| `num_worker_preprocess` | Number of CPU workers for image and text preprocessing; default 4. |
| `minibatch_size` | Minibatch size for CPU preprocessing and GPU encoding; default 16. Reduce it if you encounter an OOM error on GPU. |
| `device` | `cuda` or `cpu`. Default is `None`, which auto-detects the device. |
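
For example, here is a minimal sketch of overriding these defaults with the Flow's `uses_with` argument (the Python counterpart of `with`); the particular values are illustrative assumptions:

```python
from jina import Flow

# Sketch: override executor defaults; the values below are only examples.
f = Flow().add(
    uses='jinahub+docker://CLIPOnnxEncoder',
    uses_with={
        'name': 'ViT-B/16',  # pick another OpenAI pretrained model
        'minibatch_size': 8,  # reduce this if you encounter GPU OOM
        'device': 'cpu',  # force CPU inference
    },
)
```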

### Encoding

Encoding here means getting the fixed-length vector representation of a sentence or image.

```python
from jina import Flow
from docarray import Document, DocumentArray

da = DocumentArray(
    [
        Document(text='she smiled, with pain'),
        Document(uri='apple.png'),
        Document(uri='apple.png').load_uri_to_image_tensor(),
        Document(blob=open('apple.png', 'rb').read()),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
        Document(
            uri='data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7'
        ),
    ]
)

f = Flow().add(
    uses='jinahub+docker://CLIPOnnxEncoder',
)
with f:
    f.post(on='/', inputs=da)
da.summary()
```

From the output, you will see that all the text and image documents have an `embedding` attached.

```text
╭──────────────────────────── Documents Summary ─────────────────────────────╮
│ │
│ Length 6 │
│ Homogenous Documents False │
│ 4 Documents have attributes ('id', 'mime_type', 'uri', 'embedding') │
│ 1 Document has attributes ('id', 'mime_type', 'text', 'embedding') │
│ 1 Document has attributes ('id', 'embedding') │
│ │
╰────────────────────────────────────────────────────────────────────────────╯
╭────────────────────── Attributes Summary ───────────────────────╮
│ │
│ Attribute Data type #Unique values Has empty value │
│ ───────────────────────────────────────────────────────────── │
│ embedding ('ndarray',) 6 False │
│ id ('str',) 6 False │
│ mime_type ('str',) 5 False │
│ text ('str',) 2 False │
│ uri ('str',) 4 False │
│ │
╰─────────────────────────────────────────────────────────────────╯
```
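
To work with the vectors directly, you can stack them into a single array; a small sketch, where the 512-dim shape assumes the default `ViT-B/32` model:

```python
# Sketch: after f.post(...) returns, every Document carries an embedding.
print(da.embeddings.shape)  # (6, 512) for the six documents above under ViT-B/32
```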

👉 Access the embedding playground in the **clip-as-service** [doc](https://clip-as-service.jina.ai/playground/embedding), type a sentence or an image URL, and see the **live embedding**!

### Ranking

You can also rank cross-modal matches via the `/rank` endpoint.
First, construct a *cross-modal* Document where the root contains an image and `.matches` contains the sentences to rerank.

```python
from docarray import Document

d = Document(
    uri='rerank.png',
    matches=[
        Document(text=f'a photo of a {p}')
        for p in (
            'control room',
            'lecture room',
            'conference room',
            'podium indoor',
            'television studio',
        )
    ],
)
```

Then send the request via `/rank` endpoint:

```python
f = Flow().add(
    uses='jinahub+docker://CLIPOnnxEncoder',
)
with f:
    r = f.post(on='/rank', inputs=d)
    print(r['@m', ['text', 'scores__clip_score__value']])
```

Finally, in the response you can observe that the matches are re-ranked according to `.scores['clip_score']`:

```text
[['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'],
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]
```

You can also construct a *text-to-image* rerank as below:

```python
from docarray import Document

d = Document(
    text='a photo of conference room',
    matches=[
        Document(uri='https://picsum.photos/300'),
        Document(uri='https://picsum.photos/id/331/50'),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
    ],
)
```
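
Sending it follows the same `/rank` pattern as before; a sketch reusing the Flow `f` defined above:

```python
with f:
    r = f.post(on='/rank', inputs=d)
    # For image matches, inspect the URIs alongside their CLIP scores.
    print(r['@m', ['uri', 'scores__clip_score__value']])
```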

👉 Access the ranking playground in the **clip-as-service** [doc](https://clip-as-service.jina.ai/playground/reasoning/). Just input the reasoning texts as prompts; the server will rank them and return the sorted prompts with scores.
179 changes: 179 additions & 0 deletions .github/README-exec/torch.readme.md
@@ -0,0 +1,179 @@
# CLIPTorchEncoder

**CLIPTorchEncoder** is an executor implemented in [clip-as-service](https://github.com/jina-ai/clip-as-service).
It serves OpenAI's [CLIP](https://github.com/openai/CLIP) models with the PyTorch runtime.
An introduction to the CLIP model [can be found here](https://openai.com/blog/clip/).

- 🔀 **Automatic**: Auto-detects image and text documents based on their content.
- ⚡ **Efficiency**: Faster CLIP model inference on CPU and GPU by leveraging best practices.
- 📈 **Observability**: Monitor the service via Prometheus and Grafana (see the [Usage Guide](https://docs.jina.ai/how-to/monitoring/#deploying-locally)).

Thanks to advances in the ONNX Runtime, you can instead use `CLIPOnnxEncoder` (see [link](https://hub.jina.ai/executor/2a7auwg2)) to achieve a **3x** model-inference speed-up.

## Model support

OpenAI has released **9 models** so far; `ViT-B/32` is the default. Note that different models produce **different output dimensions**.

| Model | PyTorch | Output dimension |
|----------------|---------|------------------|
| RN50 | ✅ | 1024 |
| RN101 | ✅ | 512 |
| RN50x4 | ✅ | 640 |
| RN50x16 | ✅ | 768 |
| RN50x64 | ✅ | 1024 |
| ViT-B/32 | ✅ | 512 |
| ViT-B/16 | ✅ | 512 |
| ViT-L/14 | ✅ | 768 |
| ViT-L/14@336px | ✅ | 768 |

## Usage

### Use in Jina Flow

- **via Docker image (recommended)**

```python
from jina import Flow

f = Flow().add(
    uses='jinahub+docker://CLIPTorchEncoder',
)
```

- **via source code**

```python
from jina import Flow

f = Flow().add(
    uses='jinahub://CLIPTorchEncoder',
)
```

You can set the following parameters via `with`:

| Parameter | Description |
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| `name` | Model weights; default is `ViT-B/32`. Supports all OpenAI released pretrained models. |
| `num_worker_preprocess` | Number of CPU workers for image and text preprocessing; default 4. |
| `minibatch_size` | Minibatch size for CPU preprocessing and GPU encoding; default 32. Reduce it if you encounter an OOM error on GPU. |
| `device` | `cuda` or `cpu`. Default is `None`, which auto-detects the device. |
| `jit` | Whether to enable TorchScript JIT; default is `False`. |
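
As with the ONNX executor, here is a minimal sketch of overriding these defaults via the Flow's `uses_with` argument; the particular values are illustrative assumptions:

```python
from jina import Flow

# Sketch: override executor defaults; the values below are only examples.
f = Flow().add(
    uses='jinahub+docker://CLIPTorchEncoder',
    uses_with={
        'name': 'RN50',  # pick another OpenAI pretrained model
        'minibatch_size': 16,  # reduce this if you encounter GPU OOM
        'jit': True,  # enable TorchScript JIT
    },
)
```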

### Encoding

Encoding here means getting the fixed-length vector representation of a sentence or image.

```python
from jina import Flow
from docarray import Document, DocumentArray

da = DocumentArray(
    [
        Document(text='she smiled, with pain'),
        Document(uri='apple.png'),
        Document(uri='apple.png').load_uri_to_image_tensor(),
        Document(blob=open('apple.png', 'rb').read()),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
        Document(
            uri='data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7'
        ),
    ]
)

f = Flow().add(
    uses='jinahub+docker://CLIPTorchEncoder',
)
with f:
    f.post(on='/', inputs=da)
da.summary()
```

From the output, you will see that all the text and image documents have an `embedding` attached.

```text
╭──────────────────────────── Documents Summary ─────────────────────────────╮
│ │
│ Length 6 │
│ Homogenous Documents False │
│ 4 Documents have attributes ('id', 'mime_type', 'uri', 'embedding') │
│ 1 Document has attributes ('id', 'mime_type', 'text', 'embedding') │
│ 1 Document has attributes ('id', 'embedding') │
│ │
╰────────────────────────────────────────────────────────────────────────────╯
╭────────────────────── Attributes Summary ───────────────────────╮
│ │
│ Attribute Data type #Unique values Has empty value │
│ ───────────────────────────────────────────────────────────── │
│ embedding ('ndarray',) 6 False │
│ id ('str',) 6 False │
│ mime_type ('str',) 5 False │
│ text ('str',) 2 False │
│ uri ('str',) 4 False │
│ │
╰─────────────────────────────────────────────────────────────────╯
```
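
To work with the vectors directly, you can stack them into a single array; a small sketch, where the 512-dim shape assumes the default `ViT-B/32` model:

```python
# Sketch: after f.post(...) returns, every Document carries an embedding.
print(da.embeddings.shape)  # (6, 512) for the six documents above under ViT-B/32
```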

👉 Access the embedding playground in the **clip-as-service** [doc](https://clip-as-service.jina.ai/playground/embedding), type a sentence or an image URL, and see the **live embedding**!

### Ranking

You can also rank cross-modal matches via the `/rank` endpoint.
First, construct a *cross-modal* Document where the root contains an image and `.matches` contains the sentences to rerank.

```python
from docarray import Document

d = Document(
    uri='rerank.png',
    matches=[
        Document(text=f'a photo of a {p}')
        for p in (
            'control room',
            'lecture room',
            'conference room',
            'podium indoor',
            'television studio',
        )
    ],
)
```

Then send the request via `/rank` endpoint:

```python
f = Flow().add(
    uses='jinahub+docker://CLIPTorchEncoder',
)
with f:
    r = f.post(on='/rank', inputs=d)
    print(r['@m', ['text', 'scores__clip_score__value']])
```

Finally, you can observe that the matches are re-ranked based on `.scores['clip_score']`:

```text
[['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'],
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]
```

You can also construct a *text-to-image* rerank as below:

```python
from docarray import Document

d = Document(
    text='a photo of conference room',
    matches=[
        Document(uri='https://picsum.photos/300'),
        Document(uri='https://picsum.photos/id/331/50'),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
    ],
)
```
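
Sending it follows the same `/rank` pattern as before; a sketch reusing the Flow `f` defined above:

```python
with f:
    r = f.post(on='/rank', inputs=d)
    # For image matches, inspect the URIs alongside their CLIP scores.
    print(r['@m', ['uri', 'scores__clip_score__value']])
```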

👉 Access the ranking playground in the **clip-as-service** [doc](https://clip-as-service.jina.ai/playground/reasoning/). Just input the reasoning texts as prompts; the server will rank them and return the sorted prompts with scores.
8 changes: 3 additions & 5 deletions .github/workflows/force-docker-build.yml
@@ -21,7 +21,7 @@ jobs:
env:
release_token: ${{ secrets.CAS_RELEASE_TOKEN }}

- regular-release:
+ docker-release:
needs: token-check
runs-on: ubuntu-latest
strategy:
@@ -104,7 +104,7 @@ jobs:
if: ${{ matrix.engine_tag == '' && matrix.pip_tag != 'tensorrt' }}
uses: docker/build-push-action@v2
with:
- context: .
+ context: server
file: Dockerfiles/base.Dockerfile
platforms: linux/amd64
cache-from: type=registry,ref=jinaai/clip_executor:latest
@@ -116,13 +116,12 @@
CAS_VERSION=${{env.CAS_VERSION}}
VCS_REF=${{env.VCS_REF}}
BACKEND_TAG=${{env.BACKEND_TAG}}
- PIP_TAG=${{matrix.pip_tag}}
- name: CUDA Build and push
id: cuda_docker_build
if: ${{ matrix.engine_tag == 'cuda' }}
uses: docker/build-push-action@v2
with:
- context: .
+ context: server
file: Dockerfiles/cuda.Dockerfile
platforms: linux/amd64
cache-from: type=registry,ref=jinaai/clip_executor:latest-cuda
@@ -134,4 +133,3 @@
CAS_VERSION=${{env.CAS_VERSION}}
VCS_REF=${{env.VCS_REF}}
BACKEND_TAG=${{env.BACKEND_TAG}}
- PIP_TAG=${{matrix.pip_tag}}