feat: add traversal paths #750
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##             main     #750      +/-   ##
==========================================
+ Coverage   81.49%   81.74%   +0.24%
==========================================
  Files          17       17
  Lines        1232     1205      -27
==========================================
- Hits         1004      985      -19
+ Misses        228      220       -8
```
```python
num_worker_preprocess: int = 4,
minibatch_size: int = 32,
traversal_paths: str = '@r',
```
Suggested change:
```diff
- traversal_paths: str = '@r',
```
```python
:param traversal_paths: Default traversal paths for encoding, used if
    the traversal path is not passed as a parameter with the request.
```
Suggested change:
```diff
- :param traversal_paths: Default traversal paths for encoding, used if
-     the traversal path is not passed as a parameter with the request.
```
""" | ||
super().__init__(*args, **kwargs) | ||
self._minibatch_size = minibatch_size | ||
|
||
self._use_default_preprocessing = use_default_preprocessing | ||
self._max_length = max_length | ||
self._traversal_paths = traversal_paths |
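The pattern under review here, a construction-time default that a request parameter can override, can be sketched without any Jina dependency. The class and method names below are hypothetical stand-ins, not code from this PR:

```python
# Jina-free sketch of the default-with-override pattern in this diff.
# EncoderSketch and resolve_paths are hypothetical stand-in names.

class EncoderSketch:
    def __init__(self, traversal_paths: str = '@r'):
        # Per-deployment default, fixed at construction time.
        self._traversal_paths = traversal_paths

    def resolve_paths(self, parameters: dict) -> str:
        # A request parameter wins; otherwise fall back to the default.
        return parameters.get('traversal_paths', self._traversal_paths)
```

With this shape, each deployed encoder carries its own default while a client can still override it per request.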
Suggested change:
```diff
- self._traversal_paths = traversal_paths
```
""" | ||
|
||
traversal_paths = parameters.get('traversal_paths', self._traversal_paths) |
Suggested change:
```diff
- traversal_paths = parameters.get('traversal_paths', self._traversal_paths)
+ traversal_paths = parameters.get('traversal_paths', '@r')
```
@hanxiao I don't agree with this suggestion. It will break the following use case:

gateway -> encoder #1 (works on the root level) -> encoder #2 (works on the chunk level)

It is impossible to pass the proper parameters:

`client.post(on='/', parameters={'traversal_paths': '?????'})`
By defining the default traversal path in `__init__`, `client.post(on='/')` works:

gateway -> encoder #1 (traversal_paths='@r') -> encoder #2 (traversal_paths='@c')
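A minimal, Jina-free simulation of that flow (the stub class and variable names are hypothetical) shows both encoders resolving the right level when the request carries no `traversal_paths`:

```python
# Two encoders with different construction-time defaults; a request
# without 'traversal_paths' (i.e. client.post(on='/')) reaches both
# at the correct level. EncoderStub is a hypothetical stand-in.

class EncoderStub:
    def __init__(self, traversal_paths: str):
        self._traversal_paths = traversal_paths

    def resolve_paths(self, parameters: dict) -> str:
        return parameters.get('traversal_paths', self._traversal_paths)

encoder_1 = EncoderStub('@r')  # works on root-level documents
encoder_2 = EncoderStub('@c')  # works on chunk-level documents

request_parameters = {}  # the client sends no 'traversal_paths'
assert encoder_1.resolve_paths(request_parameters) == '@r'
assert encoder_2.resolve_paths(request_parameters) == '@c'
```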
Why is it impossible? I don't get it.
@hanxiao From the flow example above, the two encoders work on documents at different levels (one on the root level, the other on the chunk level):
- Use `@r` as the request parameter at the client, `client.post(on='/', parameters={'traversal_paths': '@r'})` -> encoder 2 cannot work.
- Use `@c` as the request parameter at the client, `client.post(on='/', parameters={'traversal_paths': '@c'})` -> encoder 1 cannot work.
But you should be able to send parameters to one particular Executor.
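Jina lets a client target a specific Executor by nesting parameters under that Executor's name. The function below is a simplified stand-in for that routing, not Jina's actual merge logic, and the executor names are hypothetical:

```python
# Simplified sketch of per-Executor parameter routing: top-level keys
# are shared by all Executors, and a dict keyed by an Executor's name
# overrides them for that Executor only. Not Jina's real implementation.

def resolve_params(parameters: dict, executor_name: str) -> dict:
    shared = {k: v for k, v in parameters.items() if not isinstance(v, dict)}
    shared.update(parameters.get(executor_name, {}))
    return shared

params = {
    'traversal_paths': '@r',                 # shared default for the flow
    'encoder_2': {'traversal_paths': '@c'},  # targets encoder_2 only
}
assert resolve_params(params, 'encoder_1')['traversal_paths'] == '@r'
assert resolve_params(params, 'encoder_2')['traversal_paths'] == '@c'
```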
client/clip_client/client.py
Outdated
```diff
@@ -184,10 +181,15 @@ def _iter_doc(self, content) -> Generator['Document', None, None]:
         )

     def _get_post_payload(self, content, kwargs):
         parameters = {}
         if 'batch_size' in kwargs:
             parameters['minibatch_size'] = kwargs['batch_size']
```
I disagree with exposing `minibatch_size` to the public client. It can easily overload a CAS server. Imagine the user now has the capability of controlling both `request_size` and `minibatch_size`: they can easily occupy the full GPU on our Berlin GPU server, and can make the GPU OOM by setting a large `request_size` and `minibatch_size`.
In a client-server architecture, one should not aim to expose every server arg to the client; it is very risky.
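One way to keep `minibatch_size` server-controlled even if a client supplies it, sketched here with an assumed cap (`MAX_MINIBATCH_SIZE` is hypothetical, not from this PR), is to clamp the value on the server side:

```python
# Server-side guard: even if a client smuggles in a huge
# minibatch_size, the effective batch is clamped to an assumed cap.
MAX_MINIBATCH_SIZE = 64  # hypothetical server-chosen limit

def effective_minibatch(parameters: dict, server_default: int = 32) -> int:
    requested = int(parameters.get('minibatch_size', server_default))
    return min(requested, MAX_MINIBATCH_SIZE)
```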
I see, that makes sense. Then we need to update the documentation about how to control the batch size.
client/clip_client/client.py
Outdated
```diff
@@ -187,7 +184,7 @@ def _get_post_payload(self, content, kwargs):
         return dict(
             on='/',
             inputs=self._iter_doc(content),
-            request_size=kwargs.get('batch_size', 8),
+            request_size=kwargs.get('batch_size', 32),
```
Suggested change:
```diff
- request_size=kwargs.get('batch_size', 32),
+ request_size=kwargs.get('batch_size', 8),
```
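After the suggested revert, the client-side payload maps `batch_size` only to `request_size` (default 8) and leaves `minibatch_size` server-controlled. A simplified sketch (standalone function, not the real method, and `inputs` here skips the Document wrapping the real client does):

```python
# Standalone sketch of _get_post_payload after the revert:
# batch_size controls only request_size; no minibatch_size is sent.
def get_post_payload(content, kwargs):
    return dict(
        on='/',
        inputs=content,  # the real method wraps content into Documents
        request_size=kwargs.get('batch_size', 8),
    )
```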