💫 Patch v0.8.1
Release Note (0.8.1)
Release time: 2022-11-15 11:15:48
This release contains 1 new feature, 1 performance improvement, 2 bug fixes and 4 documentation improvements.
🆕 Features
Allow custom callback in `clip_client` (#849)
This feature allows `clip-client` users to send a request to a server and then process the response with custom callback functions. Three callbacks can be customized: `on_done` (invoked when a request succeeds), `on_error` (invoked when a request fails), and `on_always` (invoked after every request, regardless of outcome).
The following code snippet shows how to send a request to a server and save the response to a database.
```python
from clip_client import Client

db = {}


def my_on_done(resp):
    # store each returned document in the database
    for doc in resp.docs:
        db[doc.id] = doc


def my_on_error(resp):
    # append failures to an error log
    with open('error.log', 'a') as f:
        f.write(resp)


def my_on_always(resp):
    # runs after every request, whether it succeeded or failed
    print(f'{len(resp.docs)} docs processed')


c = Client('grpc://0.0.0.0:12345')
c.encode(
    ['hello', 'world'], on_done=my_on_done, on_error=my_on_error, on_always=my_on_always
)
```
For more details, please refer to the CLIP client documentation.
🚀 Performance
Integrate flash attention (#853)
We have integrated the flash attention module as a faster replacement for `nn.MultiheadAttention`. To take advantage of this feature, you will need to install the flash attention package manually:

```bash
pip install git+https://github.com/HazyResearch/flash-attention.git
```

If flash attention is present, `clip_server` will automatically try to use it.
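As a rough illustration, this kind of detection can be as simple as attempting the import and falling back otherwise. The sketch below is a minimal example, not the exact logic used by `clip_server`, and the import path is an assumption about how the flash-attention package exposes its module:

```python
# Minimal sketch: detect the optional flash attention package at import
# time and fall back to stock PyTorch attention if it is missing.
# The import path below is an assumption about the flash-attention package.
try:
    from flash_attn.flash_attention import FlashMHA  # noqa: F401

    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False

print('using flash attention' if HAS_FLASH_ATTN else 'using nn.MultiheadAttention')
```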
The table below compares CLIP performance with and without the flash attention module. We conducted all tests on a Tesla T4 GPU, measuring how long it took to encode a batch of documents 100 times.
| Model | Input data | Input shape | w/o flash attention | w/ flash attention | Speedup |
|---|---|---|---|---|---|
| ViT-B-32 | text | (1, 77) | 0.42692 | 0.37867 | 1.1274 |
| ViT-B-32 | text | (8, 77) | 0.48738 | 0.45324 | 1.0753 |
| ViT-B-32 | text | (16, 77) | 0.4764 | 0.44315 | 1.07502 |
| ViT-B-32 | image | (1, 3, 224, 224) | 0.4349 | 0.40392 | 1.0767 |
| ViT-B-32 | image | (8, 3, 224, 224) | 0.47367 | 0.45316 | 1.04527 |
| ViT-B-32 | image | (16, 3, 224, 224) | 0.51586 | 0.50555 | 1.0204 |
Based on our experiments, performance improvements vary depending on the model and GPU, but in general, the flash attention module improves performance.
🐞 Bug Fixes
Increase timeout at startup for Executor docker images (#854)
During `Executor` initialization, downloading model parameters can take a long time. If a model is very large and downloads slowly, the `Executor` may time out and fail before it even starts. We have increased the startup timeout to 3,000,000 ms (50 minutes).
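For reference, if you compose your own Jina Flow around an Executor, the corresponding knob is the `timeout_ready` argument (in milliseconds; `-1` waits indefinitely). The sketch below is illustrative only, and the Executor reference in `uses` is a placeholder, not the exact image shipped with this release:

```python
from jina import Flow

# Hypothetical deployment sketch: `timeout_ready` bounds how long Jina
# waits for the Executor to finish starting up before giving up.
f = Flow(port=12345).add(
    uses='jinahub+docker://CLIPTorchEncoder',  # illustrative Executor reference
    timeout_ready=3000000,  # milliseconds; -1 waits forever
)

with f:
    f.block()
```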
Install transformers for Executor docker images (#851)
We have added the `transformers` package to `Executor` docker images, in order to support the multilingual CLIP model.
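Once a server is running with a multilingual CLIP model, the client call is unchanged. A minimal sketch, assuming such a server is listening at the address below (the address and input strings are placeholders):

```python
from clip_client import Client

# Assumes a clip_server instance configured with a multilingual CLIP model.
c = Client('grpc://0.0.0.0:12345')

# Non-English text can be embedded directly.
vecs = c.encode(['你好，世界', 'hola mundo', 'bonjour le monde'])
print(vecs.shape)
```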
📗 Documentation Improvements
- Update Finetuner docs (#843)
- Add tips for client parallelism usage (#846)
- Move benchmark conclusion to beginning (#847)
- Add instructions for using clip server hosted by Jina (#848)
🤟 Contributors
We would like to thank all contributors to this release:
- Ziniu Yu (@ZiniuYu)
- Jie Fu (@jemmyshin)
- felix-wang (@numb3r3)
- YangXiuyu (@OrangeSodahub)