fix: Make it so the configured max_batch_size is respected when batching inference requests together #3741

RShang97 · 2023-04-03T22:45:47Z

What does this PR address?

Addresses issue #3710 by narrowing the types passed to functions so that the batch size can be read when a client pre-batches its input. It also raises an error if a single batch submitted by the client exceeds max_batch_size. Together, these let the main dispatcher loop batch requests accurately, so that a combined batch does not exceed the configured max_batch_size.

I didn't touch any of the logic that dealt with inferring the queue size.
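
For readers who want the idea in isolation, here is a minimal sketch of the behavior described above. It is not the code in dispatcher.py: the names `BatchTooLargeError`, `enqueue`, and `take_batch` are hypothetical, and the queue is a plain deque rather than the dispatcher's real queue. It only illustrates rejecting an oversized pre-batched request and then grouping whole requests while their combined size stays within max_batch_size.

```python
from collections import deque
from typing import Deque, List, Sequence


class BatchTooLargeError(ValueError):
    """Hypothetical error for a single request larger than max_batch_size."""


def enqueue(queue: Deque[Sequence], payload: Sequence, max_batch_size: int) -> None:
    # A pre-batched payload bigger than the limit can never fit in one batch,
    # so reject it up front instead of queueing it.
    if len(payload) > max_batch_size:
        raise BatchTooLargeError(
            f"batch of size {len(payload)} exceeds max_batch_size={max_batch_size}"
        )
    queue.append(payload)


def take_batch(queue: Deque[Sequence], max_batch_size: int) -> List[Sequence]:
    # Pop whole requests from the front of the queue while the combined
    # number of rows still fits within max_batch_size.
    batch: List[Sequence] = []
    total = 0
    while queue and total + len(queue[0]) <= max_batch_size:
        payload = queue.popleft()
        batch.append(payload)
        total += len(payload)
    return batch


if __name__ == "__main__":
    pending: Deque[Sequence] = deque()
    for request in ([1, 2, 3], [4, 5], [6], [7, 8, 9, 10]):
        enqueue(pending, request, max_batch_size=5)
    while pending:
        group = take_batch(pending, max_batch_size=5)
        print(sum(len(r) for r in group), group)
```

The key point is that the size check counts rows inside each request (`len(payload)`) rather than counting requests, which is what makes pre-batched client input respect the limit.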

Testing

In Bentoml/examples/quickstart, I created a configuration.yaml:

# configuration.yaml
runners:
  batching:
    max_batch_size: 5

Started the server with the following command:

BENTOML_CONFIG=./configuration.yaml b serve --production

Added custom logging to sklearn.py:

def add_runnable_method(method_name: str, options: ModelSignature):
    def _run(
        self: SklearnRunnable, input_data: ext.NpNDArray | ext.PdDataFrame
    ) -> ext.NpNDArray:
        # Custom logging added for this test: print the size of each batch the runner receives.
        print(f"\n\nReceived input size:{len(input_data)}\n\n")
        # TODO: set inner_max_num_threads and n_jobs param here base on strategy env vars
        with parallel_backend(backend="loky"):
            return getattr(self.model, method_name)(input_data)

Swarmed the server with the locust module included in the examples/quickstart directory.

Saw that the printed input sizes were never greater than 5 and that exceptions were raised when batches larger than max_batch_size were submitted, e.g.
output.txt
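
A check like this can also be automated rather than read by eye. The sketch below is only an illustration and was not part of the testing above: `oversized_batches` is a hypothetical helper, and it assumes the server output was captured as text containing the "Received input size:N" lines produced by the print statement added to sklearn.py.

```python
import re
from typing import List

# Matches the lines emitted by the custom print statement in sklearn.py.
SIZE_PATTERN = re.compile(r"Received input size:(\d+)")


def oversized_batches(log_text: str, max_batch_size: int) -> List[int]:
    """Return every logged input size that exceeds max_batch_size."""
    sizes = [int(match.group(1)) for match in SIZE_PATTERN.finditer(log_text)]
    return [size for size in sizes if size > max_batch_size]


if __name__ == "__main__":
    # Made-up sample text for illustration, not the contents of output.txt.
    sample = "\n\nReceived input size:3\n\n\n\nReceived input size:5\n\n"
    print(oversized_batches(sample, max_batch_size=5))
```

An empty list means no batch reaching the runner exceeded the configured limit.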

Fixes #3710

Before submitting:

Does the Pull Request follow Conventional Commits specification naming? Here is GitHub's guide on how to create a pull request.
Does the code follow BentoML's code style, both make format and make lint script have passed (instructions)?
Did you read through contribution guidelines and follow development guidelines?
Did your changes require updates to the documentation? Have you updated those accordingly? Here are documentation guidelines and tips on writing docs.
Did you write tests to cover your changes?

RShang97 · 2023-04-03T22:47:39Z

cc @ssheng @sauyon @bojiang

I don't think I have permission to add specific reviewers, so I'm flagging you here.

src/bentoml/_internal/marshal/dispatcher.py

sauyon

Just needs a format; I believe make format should do the trick!

sauyon

LGTM! 🎉

RShang97 added 5 commits April 2, 2023 21:46

raise an error if an inbound call exceeds the maxs batch size and res…

742615d

…pect the max_batch_size while batching

update error message if batching doesnt work

c310b18

fix error message

f9a967f

remove batch sample logging

11a86c4

remove space

acc8a90

RShang97 requested a review from a team as a code owner April 3, 2023 22:45

RShang97 requested review from parano and removed request for a team April 3, 2023 22:45

Merge branch 'main' into rich/adaptive-batching/3710

9751dc4

aarnphm requested review from ssheng, sauyon and aarnphm April 3, 2023 23:04

add black formatting

ee75655

sauyon reviewed Apr 4, 2023

src/bentoml/_internal/marshal/dispatcher.py (outdated)

aarnphm removed their request for review April 4, 2023 02:23

fix isort

7c3cefa

RShang97 changed the title ~~fix: Make it so the configured max_batch_size is respecteed when batching inference requests together~~ fix: Make it so the configured max_batch_size is respected when batching inference requests together Apr 4, 2023

fix merge conflicts and incorporate Job object into queue

1b28aac

RShang97 requested a review from sauyon April 4, 2023 20:25

sauyon previously approved these changes Apr 4, 2023

make format

452a07c

RShang97 dismissed sauyon’s stale review via 452a07c April 4, 2023 21:06

RShang97 requested a review from sauyon April 4, 2023 21:07

sauyon approved these changes Apr 4, 2023

sauyon merged commit 99c9b2d into bentoml:main Apr 4, 2023

RShang97 deleted the rich/adaptive-batching/3710 branch April 11, 2023 17:56

ssheng mentioned this pull request May 17, 2023

bug: max_batch_size does not take into account input batch size #3859

Closed
