Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: in-place result in clip_client; preserve output order by uid #815

Merged
merged 23 commits into from
Sep 19, 2022

Conversation

ZiniuYu
Copy link
Member

@ZiniuYu ZiniuYu commented Sep 6, 2022

This PR introduces the following changes to CAS:

  • clip_client used to return a copy of the input with embeddings and ranking Infos in it. It now updates these into the original input.
  • the output order is preserved using uid and is now the same as the input order
    Notice: the result is not complete if the ids of input are not unique
  • return empty list or docarray for empty inputs
  • test cases

Basic tests for memory usage:

from clip_client import Client
from docarray import Document, DocumentArray
import psutil

def func():
    c = Client('grpc://0.0.0.0:51000')

    da = DocumentArray()
    for _ in range(8):
        da.append(Document(uri='toy.jpeg'))

    r = c.encode(da)
    return da, r

if __name__ == '__main__':
    mem = []
    for _ in range(10):
        a, b = func()
        mem.append(psutil.Process().memory_info().rss / 1024**2)
    print(mem)
    print(sum(mem)/len(mem))

Send 10 batches of images with each batch_size = 8

before (no in-place) in MB:

76.23828125, 76.98046875, 78.0234375, 78.99609375, 79.8125, 79.953125, 73.06640625, 73.42578125, 73.6484375, 74.0703125
avg 76.421484375

after (in-place) in MB:

73.96875, 71.5703125, 72.72265625, 73.1875, 73.3671875, 73.67578125, 74.2265625, 74.234375, 74.26171875, 72.33203125
avg 73.3546875

Basic tests for time usage:

from clip_client import Client
from docarray import Document, DocumentArray
import time

def func():
    c = Client('grpc://0.0.0.0:51000')

    da = DocumentArray()
    for _ in range(1000):
        da.append(Document(uri='toy.jpeg'))
    c.encode(da, show_progress=True)

if __name__ == '__main__':
    timecost = []
    for _ in range(10):
        t1 = time.time()
        func()
        t2 = time.time()
        timecost.append(t2-t1)
    print(timecost)
    print(sum(timecost)/len(timecost))

Time for encoding 1000 images (local machine with CPU):

before (no in-place) in second:

229.51802515983582, 228.4744529724121, 228.77719569206238, 229.18066692352295, 228.6370599269867, 228.76180005073547, 228.8146288394928, 234.90695881843567, 264.61701130867004, 268.92049384117126
avg 237.0608293533325

after (in-place) in second:

230.03532719612122, 229.0823450088501, 228.96690702438354, 229.40257811546326, 228.63984608650208, 228.8968267440796, 229.03895783424377, 229.23136687278748, 229.0418107509613, 229.43238496780396
avg 229.17683506011963

Another test to encode 10000 images (local with CPU):

before (no in-place) 2307.465388059616 s

after (in-place) 2306.363124847412 s

Test to encode 10000 images 10 times (GPU server):

before (no in-place) in second:

33.92143511772156, 33.70487380027771, 33.75519323348999, 33.83939599990845, 33.79131293296814, 33.8446307182312, 33.81287407875061, 33.930410385131836, 33.99196481704712, 33.83139514923096
avg 33.842348623275754

after (in-place) in second:

34.06313490867615, 33.81302094459534, 33.66564702987671, 33.78828310966492, 33.66629767417908, 33.921727657318115, 33.7433865070343, 33.724066972732544, 33.69422483444214, 33.75126600265503
avg 33.783105564117434

we don't observe a substantial difference after this change

@codecov
Copy link

codecov bot commented Sep 6, 2022

Codecov Report

Merging #815 (5e030e4) into main (8d9725f) will increase coverage by 0.37%.
The diff coverage is 95.23%.

@@            Coverage Diff             @@
##             main     #815      +/-   ##
==========================================
+ Coverage   84.30%   84.67%   +0.37%     
==========================================
  Files          21       21              
  Lines        1548     1573      +25     
==========================================
+ Hits         1305     1332      +27     
+ Misses        243      241       -2     
Flag Coverage Δ
cas 84.67% <95.23%> (+0.37%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
client/clip_client/client.py 89.02% <95.17%> (+1.56%) ⬆️
client/clip_client/__init__.py 100.00% <100.00%> (ø)
server/clip_server/__init__.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

client/clip_client/client.py Outdated Show resolved Hide resolved
client/clip_client/client.py Outdated Show resolved Hide resolved
client/clip_client/client.py Outdated Show resolved Hide resolved
client/clip_client/client.py Outdated Show resolved Hide resolved
client/clip_client/client.py Outdated Show resolved Hide resolved
client/clip_client/client.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added size/l and removed size/m labels Sep 8, 2022
@ZiniuYu ZiniuYu marked this pull request as ready for review September 8, 2022 06:27
tests/test_asyncio.py Show resolved Hide resolved
tests/test_ranker.py Show resolved Hide resolved
tests/test_simple.py Show resolved Hide resolved
@ZiniuYu ZiniuYu requested a review from a team September 8, 2022 10:26
@ZiniuYu ZiniuYu changed the title feat: inplace for embeddings; preserve output order feat: in-place result in clip_client; preserve output order Sep 8, 2022
@ZiniuYu ZiniuYu changed the title feat: in-place result in clip_client; preserve output order feat: in-place result in clip_client; preserve output order by uid Sep 13, 2022

def _gather_result(self, response, results: 'DocumentArray'):
def _gather_result(
self, response, results: 'DocumentArray', attribute: Optional[str] = ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add docstring for attribute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default value is None, not ''

from rich import filesize
from docarray import Document

if hasattr(self, '_pbar'):
self._pbar.start_task(self._s_task)

for c in content:
for i, c in enumerate(content):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this i used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is used to mark the order of input, and it is now deprecated. Will remove it

self._prepare_streaming(
not kwargs.get('show_progress'),
total=len(docs),
total=len(docs) if hasattr(docs, '__len__') else None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test for generator in rank and index

tests/test_client.py Show resolved Hide resolved


@pytest.mark.asyncio
async def test_str_input(make_torch_flow):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename test_str_input -> test_wrong_type_input

Copy link
Member

@numb3r3 numb3r3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@numb3r3 numb3r3 merged commit bcce990 into main Sep 19, 2022
@numb3r3 numb3r3 deleted the inplace-order branch September 19, 2022 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants