
fix: Make it so the configured max_batch_size is respected when batching inference requests together #3741

Merged: 10 commits into bentoml:main on Apr 4, 2023

Conversation

@RShang97 (Contributor) commented Apr 3, 2023

What does this PR address?

Addresses issue #3710 by narrowing the types passed to functions so we can access the batch size when a client pre-batches their input. Also raises an error if a single batch submitted by the client exceeds max_batch_size. This lets the main dispatcher loop batch requests accurately, so that each dispatched batch stays within the configured max_batch_size.

I didn't touch any of the logic that dealt with inferring the queue size.
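
As a rough illustration of the guard described above (the class, function, and argument names below are hypothetical and chosen for clarity; this is not BentoML's actual internal code):

# Hypothetical sketch, not BentoML code: reject a client-submitted batch that
# already exceeds the configured max_batch_size before it reaches the dispatcher.
class BatchTooLargeError(RuntimeError):
    """Raised when a single pre-batched request exceeds max_batch_size."""


def enqueue_request(queue: list, batch, max_batch_size: int) -> None:
    # Narrowing the input types is what makes len() on the client batch reliable.
    batch_size = len(batch)
    if batch_size > max_batch_size:
        raise BatchTooLargeError(
            f"Batch of size {batch_size} exceeds configured max_batch_size={max_batch_size}"
        )
    # Otherwise queue it; the dispatcher loop groups queued requests so each
    # dispatched batch stays at or below max_batch_size.
    queue.append((batch_size, batch))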

Testing

In BentoML's examples/quickstart directory, created a configuration.yaml:

# configuration.yaml
runners:
  batching:
    max_batch_size: 5

Started the server with the following command:

BENTOML_CONFIG=./configuration.yaml b serve --production

Added custom logging to sklearn.py:

def add_runnable_method(method_name: str, options: ModelSignature):
    def _run(
        self: SklearnRunnable, input_data: ext.NpNDArray | ext.PdDataFrame
    ) -> ext.NpNDArray:
        print(f"\n\nReceived input size: {len(input_data)}\n\n")
        # TODO: set inner_max_num_threads and n_jobs param here based on strategy env vars
        with parallel_backend(backend="loky"):
            return getattr(self.model, method_name)(input_data)

Swarmed the server with the Locust load test included in the examples/quickstart directory.
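
For reference, a minimal sketch of what such a Locust script might look like (the /classify endpoint name and the payload shape are assumptions, not necessarily what the quickstart locustfile actually sends):

# locustfile.py: minimal sketch; endpoint name and payload shape are assumptions.
from locust import HttpUser, between, task


class QuickstartUser(HttpUser):
    wait_time = between(0.01, 0.1)

    @task
    def classify(self) -> None:
        # Each simulated user sends single-row requests; the server-side
        # dispatcher is expected to group them into batches of at most
        # max_batch_size (5 in the configuration above).
        self.client.post("/classify", json=[[5.9, 3.0, 5.1, 1.8]])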

Saw that the print statements never reported an input size greater than 5, and that exceptions were raised when a submitted batch was larger than max_batch_size; see the attached output.txt for an example.

Fixes #3710


@RShang97 RShang97 requested a review from a team as a code owner April 3, 2023 22:45
@RShang97 RShang97 requested review from parano and removed request for a team April 3, 2023 22:45
@RShang97 (Contributor, Author) commented Apr 3, 2023

cc @ssheng @sauyon @bojiang

I don't think I have permission to add specific reviewers, so flagging here.

@aarnphm aarnphm removed their request for review April 4, 2023 02:23
@RShang97 RShang97 changed the title fix: Make it so the configured max_batch_size is respecteed when batching inference requests together fix: Make it so the configured max_batch_size is respected when batching inference requests together Apr 4, 2023
@RShang97 RShang97 requested a review from sauyon April 4, 2023 20:25
sauyon previously approved these changes Apr 4, 2023

@sauyon (Contributor) left a comment:

Just needs a format; I believe make format should do the trick!

@sauyon (Contributor) left a comment:

LGTM! 🎉

@sauyon sauyon merged commit 99c9b2d into bentoml:main Apr 4, 2023
@RShang97 RShang97 deleted the rich/adaptive-batching/3710 branch April 11, 2023 17:56
Linked issue: bug: Adaptive batching exceeds max_batch_size configuration (#3710)