
💫 Patch v0.8.1

github-actions released this 15 Nov 11:16 · e4717a3

Release Note (0.8.1)

Release time: 2022-11-15 11:15:48

This release contains 1 new feature, 1 performance improvement, 2 bug fixes and 4 documentation improvements.

🆕 Features

Allow custom callback in clip_client (#849)

This feature lets clip-client users send a request to a server and then process the response with custom callback functions. Three callbacks can be customized: on_done, on_error and on_always.

The following code snippet shows how to send a request to a server and save the response to a database.

from clip_client import Client

db = {}

def my_on_done(resp):
    for doc in resp.docs:
        db[doc.id] = doc


def my_on_error(resp):
    # append the error response to a log file; cast to str,
    # since the response object is not a string
    with open('error.log', 'a') as f:
        f.write(str(resp))


def my_on_always(resp):
    print(f'{len(resp.docs)} docs processed')


c = Client('grpc://0.0.0.0:12345')
c.encode(
    ['hello', 'world'], on_done=my_on_done, on_error=my_on_error, on_always=my_on_always
)

For more details, please refer to the CLIP client documentation.

🚀 Performance

Integrate flash attention (#853)

We have integrated the flash attention module as a faster replacement for nn.MultiHeadAttention. To take advantage of this feature, you will need to install the flash attention module manually:

pip install git+https://github.com/HazyResearch/flash-attention.git

If flash attention is present, clip_server will automatically try to use it.
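A quick way to confirm the package is visible to Python is to probe for it the same way an import would — a minimal check, assuming the package installs under the module name flash_attn (the import name used by the HazyResearch package):

```python
import importlib.util


def flash_attention_available() -> bool:
    # True if the flash_attn module can be imported in this environment
    return importlib.util.find_spec('flash_attn') is not None


print(flash_attention_available())
```

If this prints False, clip_server will silently fall back to the stock nn.MultiHeadAttention implementation.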

The table below compares CLIP performance with and without the flash attention module. All tests were conducted on a Tesla T4 GPU; we timed how long it took to encode a batch of documents 100 times.

Model     Input data  Input shape        w/o flash attention  w/ flash attention  Speedup
ViT-B-32  text        (1, 77)            0.42692              0.37867             1.1274
ViT-B-32  text        (8, 77)            0.48738              0.45324             1.0753
ViT-B-32  text        (16, 77)           0.4764               0.44315             1.07502
ViT-B-32  image       (1, 3, 224, 224)   0.4349               0.40392             1.0767
ViT-B-32  image       (8, 3, 224, 224)   0.47367              0.45316             1.04527
ViT-B-32  image       (16, 3, 224, 224)  0.51586              0.50555             1.0204
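The Speedup column is simply the ratio of the two timings; for example, reproducing the first row of the table:

```python
# timings from the first row: ViT-B-32, text input, shape (1, 77)
without_flash = 0.42692
with_flash = 0.37867

speedup = without_flash / with_flash
print(f'{speedup:.4f}')  # → 1.1274
```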

Based on our experiments, performance improvements vary depending on the model and GPU, but in general, the flash attention module improves performance.

🐞 Bug Fixes

Increase timeout at startup for Executor docker images (#854)

Downloading model parameters during Executor initialization can take a long time. If a model is very large or the connection is slow, the Executor may time out before it even starts. We have increased the startup timeout to 3,000,000 ms (50 minutes).

Install transformers for Executor docker images (#851)

We have added the transformers package to Executor docker images, in order to support the multilingual CLIP model.

📗 Documentation Improvements

  • Update Finetuner docs (#843)
  • Add tips for client parallelism usage (#846)
  • Move benchmark conclusion to beginning (#847)
  • Add instructions for using clip server hosted by Jina (#848)

🤟 Contributors

We would like to thank all contributors to this release: