feat: in-place result in clip_client; preserve output order by uid #815

ZiniuYu · 2022-09-06T06:01:07Z

This PR introduces the following changes to CAS:

clip_client used to return a copy of the input with the embeddings and ranking information filled in. It now writes these results into the original input in place.
The output order is preserved using the uid, so it now matches the input order.
Note: the result is incomplete if the input ids are not unique.
Empty inputs now return an empty list or an empty DocumentArray.
Test cases are added for the changes above.
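
To illustrate the in-place update and the id-based ordering described above, here is a minimal sketch. It does not use clip_client or docarray: `Doc` and `merge_in_place` are hypothetical stand-ins, and the lookup-by-id strategy is an assumption about how such a merge could work, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a docarray Document (id plus embedding)."""
    id: str
    embedding: Optional[List[float]] = None


def merge_in_place(inputs: List[Doc], responses: List[Doc]) -> List[Doc]:
    """Copy embeddings from responses into inputs, matching on id.

    Responses may arrive in any order; the inputs keep their original order.
    """
    by_id: Dict[str, Doc] = {d.id: d for d in inputs}
    for r in responses:
        target = by_id.get(r.id)
        if target is not None:
            target.embedding = r.embedding
    return inputs


docs = [Doc('a'), Doc('b'), Doc('c')]
# Simulate responses that arrive out of order.
shuffled = [Doc('c', [3.0]), Doc('a', [1.0]), Doc('b', [2.0])]
result = merge_in_place(docs, shuffled)

# With duplicate ids the lookup table keeps only one document per id,
# so the other document with the same id never receives a result.
dupes = [Doc('x'), Doc('x')]
merge_in_place(dupes, [Doc('x', [9.0])])
```

In this sketch `result` is the same list object as `docs`, which is the sense of "in place" used here, and the duplicate-id case shows why a result can be incomplete when ids are not unique.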

Basic tests for memory usage:

from clip_client import Client
from docarray import Document, DocumentArray
import psutil

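def func():
    # Connect to a locally running CLIP server.
    c = Client('grpc://0.0.0.0:51000')

    # Build one batch of 8 image documents.
    da = DocumentArray()
    for _ in range(8):
        da.append(Document(uri='toy.jpeg'))

    # Return both the input and the result so both stay alive when memory is measured.
    r = c.encode(da)
    return da, r

if __name__ == '__main__':
    mem = []
    for _ in range(10):
        a, b = func()
        # Resident set size of the current process, converted from bytes to MB.
        mem.append(psutil.Process().memory_info().rss / 1024**2)
    print(mem)
    print(sum(mem)/len(mem))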

The script sends 10 batches of 8 images each.

before (no in-place) in MB:

76.23828125, 76.98046875, 78.0234375, 78.99609375, 79.8125, 79.953125, 73.06640625, 73.42578125, 73.6484375, 74.0703125
avg 76.421484375

after (in-place) in MB:

73.96875, 71.5703125, 72.72265625, 73.1875, 73.3671875, 73.67578125, 74.2265625, 74.234375, 74.26171875, 72.33203125
avg 73.3546875

Basic tests for time usage:

from clip_client import Client
from docarray import Document, DocumentArray
import time

def func():
    # Connect to a locally running CLIP server.
    c = Client('grpc://0.0.0.0:51000')

    # Build 1000 image documents and encode them in one call.
    da = DocumentArray()
    for _ in range(1000):
        da.append(Document(uri='toy.jpeg'))
    c.encode(da, show_progress=True)

if __name__ == '__main__':
    timecost = []
    for _ in range(10):
        # Wall-clock time of one full run, including client setup.
        t1 = time.time()
        func()
        t2 = time.time()
        timecost.append(t2-t1)
    print(timecost)
    print(sum(timecost)/len(timecost))

Time for encoding 1000 images (local machine with CPU):

before (no in-place) in seconds:

229.51802515983582, 228.4744529724121, 228.77719569206238, 229.18066692352295, 228.6370599269867, 228.76180005073547, 228.8146288394928, 234.90695881843567, 264.61701130867004, 268.92049384117126
avg 237.0608293533325

after (in-place) in seconds:

230.03532719612122, 229.0823450088501, 228.96690702438354, 229.40257811546326, 228.63984608650208, 228.8968267440796, 229.03895783424377, 229.23136687278748, 229.0418107509613, 229.43238496780396
avg 229.17683506011963

Another test to encode 10000 images (local with CPU):

before (no in-place) 2307.465388059616 s

after (in-place) 2306.363124847412 s

Test to encode 10000 images 10 times (GPU server):

before (no in-place) in seconds:

33.92143511772156, 33.70487380027771, 33.75519323348999, 33.83939599990845, 33.79131293296814, 33.8446307182312, 33.81287407875061, 33.930410385131836, 33.99196481704712, 33.83139514923096
avg 33.842348623275754

after (in-place) in seconds:

34.06313490867615, 33.81302094459534, 33.66564702987671, 33.78828310966492, 33.66629767417908, 33.921727657318115, 33.7433865070343, 33.724066972732544, 33.69422483444214, 33.75126600265503
avg 33.783105564117434

We don't observe a substantial difference after this change.
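
As a rough check on that conclusion, the sketch below compares the reported memory numbers and the 1000-image CPU timings by both mean and median. The values are copied from the lists above and rounded to two decimals; `rel_change` is a helper introduced here for illustration only.

```python
from statistics import mean, median

# Values copied from the results above, rounded to two decimals.
mem_before = [76.24, 76.98, 78.02, 79.00, 79.81, 79.95, 73.07, 73.43, 73.65, 74.07]
mem_after = [73.97, 71.57, 72.72, 73.19, 73.37, 73.68, 74.23, 74.23, 74.26, 72.33]
cpu_before = [229.52, 228.47, 228.78, 229.18, 228.64, 228.76, 228.81, 234.91, 264.62, 268.92]
cpu_after = [230.04, 229.08, 228.97, 229.40, 228.64, 228.90, 229.04, 229.23, 229.04, 229.43]


def rel_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


for name, before, after in (
    ('memory (MB)', mem_before, mem_after),
    ('CPU time, 1000 images (s)', cpu_before, cpu_after),
):
    by_mean = rel_change(mean(before), mean(after))
    by_median = rel_change(median(before), median(after))
    print(f'{name}: mean {by_mean:+.2f}%, median {by_median:+.2f}%')
```

The median is less sensitive than the mean to the two slow "before" runs near 265 s, so it is the more conservative number to read for the CPU timings.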

codecov · 2022-09-06T06:06:10Z

Codecov Report

Merging #815 (5e030e4) into main (8d9725f) will increase coverage by 0.37%.
The diff coverage is 95.23%.

@@            Coverage Diff             @@
##             main     #815      +/-   ##
==========================================
+ Coverage   84.30%   84.67%   +0.37%     
==========================================
  Files          21       21              
  Lines        1548     1573      +25     
==========================================
+ Hits         1305     1332      +27     
+ Misses        243      241       -2
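
The percentages in the table above follow from the hit and line counts. The sketch below reproduces them; note that truncating to two decimals (rather than rounding) is an assumption inferred from the reported 84.67%, and `coverage_pct` is a hypothetical helper, not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage in percent, truncated (not rounded) to two decimals."""
    return (hits * 10000 // lines) / 100


# Hits plus misses should equal the total line count in each column.
assert 1305 + 243 == 1548
assert 1332 + 241 == 1573

before = coverage_pct(1305, 1548)  # main
after = coverage_pct(1332, 1573)   # this PR
delta = round(after - before, 2)
print(before, after, delta)
```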

Flag	Coverage Δ
cas	`84.67% <95.23%> (+0.37%)`	⬆️

Impacted Files	Coverage Δ
client/clip_client/client.py	`89.02% <95.17%> (+1.56%)`	⬆️
client/clip_client/__init__.py	`100.00% <100.00%> (ø)`
server/clip_server/__init__.py	`100.00% <100.00%> (ø)`

jemmyshin · 2022-09-16T07:06:28Z

client/clip_client/client.py


-    def _gather_result(self, response, results: 'DocumentArray'):
+    def _gather_result(
+        self, response, results: 'DocumentArray', attribute: Optional[str] = ''


Please add a docstring for attribute.

The default value should be None, not ''.
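
A minimal sketch of what the two requests above could look like together. This is not the client's `_gather_result`: the function below is a standalone, hypothetical version that works on plain dicts, and the meaning given to `attribute` (copy one named field, or every field when it is None) is an assumption made for illustration.

```python
from typing import Dict, List, Optional


def gather_result(
    response: List[Dict], results: List[Dict], attribute: Optional[str] = None
) -> None:
    """Copy fields from response documents into results, matching on 'id'.

    :param response: documents returned by the server (hypothetical dict form)
    :param results: the caller's documents, updated in place
    :param attribute: name of the single field to copy, e.g. 'embedding';
        if None, every field of the response document is copied
    """
    by_id = {doc['id']: doc for doc in results}
    for doc in response:
        target = by_id.get(doc['id'])
        if target is None:
            continue  # response for an id that was not in the input
        if attribute is None:
            target.update(doc)
        else:
            target[attribute] = doc[attribute]


docs = [{'id': '1'}, {'id': '2'}]
gather_result([{'id': '2', 'embedding': [0.5], 'tags': {}}], docs, attribute='embedding')
```

Using None rather than '' as the default keeps the `Optional[str]` annotation meaningful: None means "no attribute given", while any string is treated as a field name.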

jemmyshin · 2022-09-16T07:09:07Z

client/clip_client/client.py

        from rich import filesize
        from docarray import Document

        if hasattr(self, '_pbar'):
            self._pbar.start_task(self._s_task)

-        for c in content:
+        for i, c in enumerate(content):


Where is this i used?

It was used to mark the order of the input, but it is no longer needed. I will remove it.

jemmyshin · 2022-09-16T07:19:04Z

client/clip_client/client.py

        self._prepare_streaming(
            not kwargs.get('show_progress'),
-            total=len(docs),
+            total=len(docs) if hasattr(docs, '__len__') else None,


Please add tests for generator input in rank and index.
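
For context on why the `hasattr(docs, '__len__')` guard matters for generator input, here is a small standalone sketch. `total_or_none` is a hypothetical helper that mirrors the expression in the diff above; it is not a function in clip_client.

```python
from typing import Iterable, Optional


def total_or_none(docs: Iterable) -> Optional[int]:
    """Return len(docs) when the input is sized, otherwise None."""
    return len(docs) if hasattr(docs, '__len__') else None


sized = ['a.jpg', 'b.jpg']
lazy = (name for name in ['a.jpg', 'b.jpg'])

print(total_or_none(sized))  # a list has __len__
print(total_or_none(lazy))   # a generator does not, so no total is known

# The check does not consume the generator, so its items are still available.
remaining = list(lazy)
```

Calling `len()` directly on a generator raises TypeError, which is the failure the guard avoids.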

jemmyshin · 2022-09-16T07:36:58Z

tests/test_client.py

+
+
+@pytest.mark.asyncio
+async def test_str_input(make_torch_flow):


Rename test_str_input -> test_wrong_type_input.

numb3r3

LGTM

github-actions bot added size/s component/client labels Sep 6, 2022

github-actions bot added size/m size/s area/testing and removed size/s size/m labels Sep 7, 2022

numb3r3 requested changes Sep 8, 2022

github-actions bot added size/l and removed size/m labels Sep 8, 2022

ZiniuYu marked this pull request as ready for review September 8, 2022 06:27

jemmyshin reviewed Sep 8, 2022

tests/test_asyncio.py (resolved)

tests/test_ranker.py (resolved)

tests/test_simple.py (resolved)

ZiniuYu requested a review from a team September 8, 2022 10:26

ZiniuYu changed the title ~~feat: inplace for embeddings; preserve output order~~ feat: in-place result in clip_client; preserve output order Sep 8, 2022

ZiniuYu changed the title ~~feat: in-place result in clip_client; preserve output order~~ feat: in-place result in clip_client; preserve output order by uid Sep 13, 2022

ZiniuYu force-pushed the inplace-order branch from 4d41716 to 69ef7b4 Compare September 13, 2022 09:41

ZiniuYu added 11 commits September 15, 2022 18:48

feat: inplace for embeddings; preserve output order

c8f48a3

feat: inplace for embeddings; preserve output order

0196312

feat: does not allow empty input

7a90da8

fix: check type before unboxing result

9a02098

chore: clean up gather_result logic

f1f8c39

feat: in place for rank

10ff90f

feat: does not allow empty input

cc7c0a7

test: clip_client in place

44b157f

test: empty input

a126d38

test: preserve original order

d179bf1

test: empty input

d0859d2

ZiniuYu added 6 commits September 15, 2022 18:48

fix: return for empty input

aea569b

fix: improve unboxde_result

7b99d03

test: preserve order

42ee35b

fix: refactor gather_result

4535808

fix: in-place for index and search

7b7c37e

chore: remove unused import

f5eadd5

ZiniuYu force-pushed the inplace-order branch from 69ef7b4 to f5eadd5 Compare September 15, 2022 10:51

ZiniuYu requested review from numb3r3 and jemmyshin September 16, 2022 05:56

ZiniuYu added 3 commits September 16, 2022 14:22

fix: typo

befd7ea

test: str input

ebaa9d0

fix: typo in test

4b45dd1

jemmyshin suggested changes Sep 16, 2022

ZiniuYu added 3 commits September 16, 2022 16:24

test: generator rank

2697488

fix: typo

f4ee3a2

fix: wrong input exception

5e030e4

ZiniuYu mentioned this pull request Sep 17, 2022

feat: add option on server to remove input image content after encoding to save space #817

Closed

numb3r3 approved these changes Sep 19, 2022

numb3r3 merged commit bcce990 into main Sep 19, 2022

numb3r3 deleted the inplace-order branch September 19, 2022 06:03

numb3r3 mentioned this pull request Oct 10, 2022

chore: draft release note v0.8.0 #840

Closed
