Releases: bentoml/BentoML
BentoML - v1.1.0
🍱 We're thrilled to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.
- Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
- Official gRPC Support: We've transitioned gRPC support in BentoML from experimental to official status, expanding your toolkit for high-performance, low-latency services.
- Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve to enable users to deploy Bento applications in a Ray cluster without modifying code or configuration.
- Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffusers models and pipelines can be seamlessly imported and integrated into BentoML applications through the Transformers and Diffusers framework libraries; see the sketch after this list.
- Enhanced Model Version Management: Enjoy greater flexibility with the improved model version management, enabling flexible configuration and synchronization of model versions with your remote model store.
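For example, a Diffusers pipeline can be pulled from Hugging Face into the model store and turned into a Runner with a couple of calls. A minimal sketch using the `bentoml.diffusers` APIs shown in the v1.0.15 notes below; the model name `sd2` is just an example:

```python
import bentoml

# Import a Hugging Face Diffusers pipeline into the local BentoML model store
bentoml.diffusers.import_model("sd2", "stabilityai/stable-diffusion-2")

# Turn the stored pipeline into a Runner that can be added to a bentoml.Service
sd2_runner = bentoml.diffusers.get("sd2:latest").to_runner()
```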
🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for Llama 2 models.
- GPU and CPU Support: Running Llama 2 is supported on both GPU and CPU.
- Model variations and parameter sizes: Supports all Llama model weights and parameter sizes on Hugging Face:
  meta-llama/llama-2-70b-chat-hf, meta-llama/llama-2-13b-chat-hf, meta-llama/llama-2-7b-chat-hf, meta-llama/llama-2-70b-hf, meta-llama/llama-2-13b-hf, meta-llama/llama-2-7b-hf, openlm-research/open_llama_7b_v2, openlm-research/open_llama_3b_v2, openlm-research/open_llama_13b, huggyllama/llama-65b, huggyllama/llama-30b, huggyllama/llama-13b, huggyllama/llama-7b
  Users can use any weights on Hugging Face (e.g. TheBloke/Llama-2-13B-chat-GPTQ), custom weights from a local path (e.g. /path/to/llama-1), or fine-tuned weights, as long as they adhere to LlamaModelForCausalLM.
- Stay tuned for fine-tuning capabilities in OpenLLM: Fine-tuning of various Llama 2 models will be added in a future release. Try the experimental script for fine-tuning Llama 2 with QLoRA under the OpenLLM playground.
```bash
python -m openllm.playground.llama2_qlora --help
```
BentoML - v1.0.22
🍱 The BentoML v1.0.22 release brings a number of well-anticipated updates.
- Added support for Pydantic 2 for better validation performance. A usage sketch with the `JSON` IO descriptor is shown below.
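A sketch of how a Pydantic model is typically paired with the `JSON` IO descriptor for request validation; the `iris_clf` model tag and the schema below are hypothetical:

```python
import bentoml
from bentoml.io import JSON
from pydantic import BaseModel


# Hypothetical input schema; any Pydantic model can be used for validation
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


iris_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_runner])


@svc.api(input=JSON(pydantic_model=IrisFeatures), output=JSON())
def classify(features: IrisFeatures) -> dict:
    # The request body is validated and parsed into a Pydantic model instance
    row = [[
        features.sepal_length,
        features.sepal_width,
        features.petal_length,
        features.petal_width,
    ]]
    return {"prediction": iris_runner.predict.run(row).tolist()}
```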
- Added support for CUDA 12 versions in builds and containerization.
- Introduced service lifecycle events `on_deployment`, `on_startup`, and `on_shutdown`, allowing custom logic to be added at each stage. State can be managed using the context `ctx` variable during the `on_startup` and `on_shutdown` events and during request serving in the API.

```python
@svc.on_deployment
def on_deployment():
    pass


@svc.on_startup
def on_startup(ctx: bentoml.Context):
    ctx.state["object_key"] = create_object()


@svc.on_shutdown
def on_shutdown(ctx: bentoml.Context):
    cleanup_state(ctx.state["object_key"])


@svc.api
def predict(input_data, ctx):
    object = ctx.state["object_key"]
    pass
```
- Added support for traffic control for both the API Server and Runners. Timeout and maximum concurrency can now be configured through configuration.

```yaml
api_server:
  traffic:
    timeout: 10 # API Server request timeout in seconds
    max_concurrency: 32 # Maximum concurrent requests in the API Server
runners:
  iris:
    traffic:
      timeout: 10 # Runner request timeout in seconds
      max_concurrency: 32 # Maximum concurrent requests in the Runner
```
- Improved `bentoml push` performance for large Bentos.
🚀 One more thing: the team is delighted to unveil our latest endeavor, OpenLLM. This innovative project allows you to effortlessly build with state-of-the-art open-source or fine-tuned Large Language Models.
- Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out of the box. Fully customizable with model-specific arguments.

```bash
openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```
- Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.

```python
llm_runner = openllm.Runner("dolly-v2")
```
- Builds LLM applications into the Bento format, which can be deployed to BentoCloud or containerized into OCI images.

```bash
openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```
Our dedicated team is working hard to pioneer more integrations of advanced models for upcoming releases of OpenLLM. Stay tuned for the unfolding developments.
BentoML - v1.0.20
🍱 BentoML v1.0.20 is released with improved usability and compatibility features.
- Production Mode by Default: The `bentoml serve` command now runs with the `--production` option by default. This change was made to simulate production behavior during development. The `--reload` option will continue to work as expected. To achieve the previous serving behavior, use `--development` instead.
- Optional Dependency for OpenTelemetry Exporter: The `opentelemetry-exporter-otlp-proto-http` dependency has been moved from a required dependency to an optional one to address a `protobuf` dependency incompatibility issue. ⚠️ If you are currently using the Model Monitoring and Inference Data Collection feature, you must install the package with the `monitor-otlp` option from this release onwards to include the necessary dependency.

```bash
pip install "bentoml[monitor-otlp]"
```
- OpenTelemetry Trace ID Configuration Option: A new configuration option has been added to return the OpenTelemetry Trace ID in the response. This feature is particularly helpful when tracing has not been initialized in the upstream caller, but the caller still wishes to log the Trace ID in case of an error.

```yaml
api_server:
  http:
    response:
      trace_id: True
```
- Start from a Service: Added the ability to start a server from a `bentoml.Service` object. This is helpful for troubleshooting a project in a development environment where no Bentos have been built yet.

```python
import bentoml

# import the Service defined in the `/clip_api_service/service.py` file
from clip_api_service.service import svc

if __name__ == "__main__":
    # start a server
    server = bentoml.HTTPServer(svc)
    server.start(blocking=False)
    client = server.get_client()
    client.predict(..)
```
What's Changed
- fix(dispatcher): handling empty o_stat in `trigger_refresh` by @larme in #3796
- fix(framework): adjust diffusers device_map default behavior by @larme in #3779
- chore(dispatcher): cancel jobs with a for loop by @sauyon in #3788
- fix: correctly reraise `CancelledError` by @sauyon in #3801
- use path as resource for non-OS paths by @sauyon in #3800
- chore(deps): bump coverage[toml] from 7.2.3 to 7.2.4 by @dependabot in #3803
- feat: embedded runner by @larme in #3735
- feat(tensorflow): support list types inputs by @enmanuelmag in #3807
- chore(deps): bump ruff from 0.0.263 to 0.0.264 by @dependabot in #3817
- feat: subprocess build by @aarnphm in #3814
- docs: update community slack links by @parano in #3824
- chore(deps): bump pyarrow from 11.0.0 to 12.0.0 by @dependabot in #3820
- chore(deps): remove imageio by @aarnphm in #3812
- chore(deps): bump tritonclient[all] from 2.32.0 to 2.33.0 by @dependabot in #3795
- ci: add Pillow to tests dependencies by @aarnphm in #3830
- feat(observability): support `service.name` by @aarnphm in #3825
- feat: optional returning trace_id in response by @aarnphm in #3827
- chore: 3.11 support by @PeterJCLaw in #3792
- fix: Eliminate the exception during shutdown by @frostming in #3826
- chore: expose scheduling_strategy in to_runner by @bojiang in #3831
- feat: allow starting server with bentoml.Service instance by @parano in #3829
- chore(deps): bump bufbuild/buf-setup-action from 1.17.0 to 1.18.0 by @dependabot in #3838
- fix: make sure to set content-type for file type by @aarnphm in #3837
- docs: update default docs to use env as key:value instead of list type by @aarnphm in #3841
- deps: move exporter-proto to optional by @aarnphm in #3840
- feat(server): improve server APIs by @aarnphm in #3834
New Contributors
- @enmanuelmag made their first contribution in #3807
- @PeterJCLaw made their first contribution in #3792
Full Changelog: v1.0.19...v1.0.20
BentoML - v1.0.19
🍱 BentoML v1.0.19 is released with enhanced GPU utilization and expanded ML framework support.
- Optimized GPU resource utilization: Enabled scheduling of multiple instances of the same runner using the `workers_per_resource` scheduling strategy configuration. The following configuration allows scheduling 2 instances of the "iris" runner per GPU instance. `workers_per_resource` is 1 by default.

```yaml
runners:
  iris:
    resources:
      nvidia.com/gpu: 1
    workers_per_resource: 2
```
- New ML framework support: We've added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks; a minimal EasyOCR sketch follows.
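A minimal EasyOCR sketch, assuming the new framework follows the same `save_model`/`get`/`to_runner` pattern as the other BentoML frameworks; the model name `en_reader` is just an example:

```python
import bentoml
import easyocr

# Save an EasyOCR reader into the local BentoML model store
# (assumes the standard save_model(name, model) signature)
reader = easyocr.Reader(["en"], gpu=False)
bentoml.easyocr.save_model("en_reader", reader)

# Later, retrieve the saved model and turn it into a Runner for serving
ocr_runner = bentoml.easyocr.get("en_reader:latest").to_runner()
```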
- Enhanced runner communication: Implemented PEP 574 out-of-band pickling to improve runner communication by eliminating memory copying, resulting in better performance and efficiency (see the illustration below).
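For background, PEP 574 (pickle protocol 5) lets large buffers travel out-of-band so they are not copied into the pickle byte stream. A small standard-library illustration of the mechanism, not BentoML's internal code:

```python
import pickle

import numpy as np

# A large array whose underlying buffer we want to move without an extra copy
arr = np.ones((1024, 1024), dtype="float32")

buffers = []
# With protocol 5, the raw buffer is handed to buffer_callback out-of-band
# instead of being serialized into the pickle stream itself
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The receiving side reconstructs the object from the stream plus the buffers
restored = pickle.loads(payload, buffers=buffers)
assert (restored == arr).all()
```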
- Backward compatibility for Hugging Face Transformers: Resolved compatibility issues with Hugging Face Transformers versions prior to `v4.18`, ensuring a seamless experience for users with older versions.
⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML's cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture enables the individual models to run in their own pods, utilizing the most optimal hardware for their respective tasks and enabling independent scaling.
💡 With each release, we consistently update our blog, documentation and examples to empower the community in harnessing the full potential of BentoML.
- Learn more about the scheduling strategy to get better resource utilization.
- Learn more about model monitoring and drift detection in BentoML and its integration with various monitoring frameworks.
- Learn more about using Nvidia Triton Inference Server as a runner to improve your application’s performance and throughput.
What's Changed
- fix(env): using `python -m` to run pip commands by @frostming in #3762
- chore(deps): bump pytest from 7.3.0 to 7.3.1 by @dependabot in #3766
- feat: lazy load `bentoml.server` by @aarnphm in #3763
- fix(client): service route prefix by @aarnphm in #3765
- chore: add test with many requests by @sauyon in #3768
- fix: using http config for grpc server by @aarnphm in #3771
- feat: apply pep574 out-of-band pickling to DefaultContainer by @larme in #3736
- fix: passing serve_cmd and passthrough kwargs by @aarnphm in #3764
- feat: Detectron by @aarnphm in #3711
- chore(dispatcher): (re-)factor out training code by @sauyon in #3767
- feat: EasyOCR by @aarnphm in #3712
- feat(build): support 3.11 by @aarnphm in #3774
- patch: backports module availability for transformers<4.18 by @aarnphm in #3775
- fix(dispatcher): set wait to 0 while training by @sauyon in #3664
- chore(deps): bump ruff from 0.0.261 to 0.0.262 by @dependabot in #3778
- feat: add `model#load_model` method by @parano in #3780
- feat: Allow spawning more than 1 worker on each resource by @frostming in #3776
- docs: Fix TensorFlow `save_model` parameter order by @ssheng in #3781
- chore(deps): bump yamllint from 1.30.0 to 1.31.0 by @dependabot in #3782
- chore(deps): bump imageio from 2.27.0 to 2.28.0 by @dependabot in #3783
- chore(deps): bump ruff from 0.0.262 to 0.0.263 by @dependabot in #3790
- fix: allow import service defined under a Python package by @parano in #3794
New Contributors
- @frostming made their first contribution in #3762
Full Changelog: v1.0.18...v1.0.19
BentoML - v1.0.18
🍱 BentoML v1.0.18 brings a new way of creating the server and client natively from Python.
- Start an HTTP or gRPC server and client asynchronously with a context manager.

```python
server = HTTPServer("iris_classifier:latest", production=True, port=3000)

# Start the server in a separate process and connect to it using a client
with server.start() as client:
    res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
```
- Start an HTTP or gRPC server synchronously.

```python
server = HTTPServer("iris_classifier:latest", production=True, port=3000)
server.start(blocking=True)
```
- As always, a client can be created and connected to a running server.

```python
client = Client.from_url("http://localhost:3000")
res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
```
What's Changed
- chore(deps): bump coverage[toml] from 7.2.2 to 7.2.3 by @dependabot in #3746
- bugs: Fix an f-string bug in Tranformers framework. by @ssheng in #3753
- chore(deps): bump pytest from 7.2.2 to 7.3.0 by @dependabot in #3751
- chore(deps): bump bufbuild/buf-setup-action from 1.16.0 to 1.17.0 by @dependabot in #3750
- fix: BufferError when pushing model to BentoCloud by @aarnphm in #3737
- chore: remove codecov dependencies by @aarnphm in #3754
- feat: implement new serve API by @sauyon in #3696
- examples: Add a client example to quickstart by @ssheng in #3752
Full Changelog: v1.0.17...v1.0.18
BentoML - v1.0.17
🍱 We are excited to announce the release of BentoML v1.0.17, which includes support for 🤗 Hugging Face Transformers pre-trained instances. Prior to this release, only pipelines could be saved and loaded using the `bentoml.transformers` APIs. However, based on the community's demand to work with pre-trained models, tokenizers, preprocessors, etc., without pipelines, we have expanded the capabilities of the `bentoml.transformers` APIs. With this release, all pre-trained instances can be saved and loaded into either built-in Transformers framework runners or custom runners. This update opens up new possibilities for users to work with pre-trained models, and we are thrilled to see what the community will create using this feature. To learn more, visit the BentoML Transformers framework documentation.
- Pre-trained models and instances, such as tokenizers, preprocessors, and feature extractors, can also be saved as standalone models using the `bentoml.transformers.save_model` API.

```python
import bentoml
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

bentoml.transformers.save_model("speecht5_tts_processor", processor)
bentoml.transformers.save_model(
    "speecht5_tts_model",
    model,
    signatures={"generate_speech": {"batchable": False}},
)
bentoml.transformers.save_model("speecht5_tts_vocoder", vocoder)
```
- Pre-trained models and instances can be run either independently as Transformers framework runners or jointly in a custom runner. To use pre-trained models and instances as individual framework runners, simply get the model references and convert them to runners using the `to_runner` method.

```python
import bentoml
import torch
from bentoml.io import Text, NumpyNdarray
from datasets import load_dataset

processor_runner = bentoml.transformers.get("speecht5_tts_processor").to_runner()
model_runner = bentoml.transformers.get("speecht5_tts_model").to_runner()
vocoder_runner = bentoml.transformers.get("speecht5_tts_vocoder").to_runner()

embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

svc = bentoml.Service("text2speech", runners=[processor_runner, model_runner, vocoder_runner])

@svc.api(input=Text(), output=NumpyNdarray())
def generate_speech(inp: str):
    inputs = processor_runner.run(text=inp, return_tensors="pt")
    speech = model_runner.generate_speech.run(
        input_ids=inputs["input_ids"],
        speaker_embeddings=speaker_embeddings,
        vocoder=vocoder_runner.run,
    )
    return speech.numpy()
```
- To use the pre-trained models and instances together in a custom runner, use the `bentoml.transformers.get` API to get the model references and load them in a custom runner. The pre-trained instances can then be used for inference in the custom runner.

```python
import bentoml
import torch
from datasets import load_dataset

processor_ref = bentoml.models.get("speecht5_tts_processor:latest")
model_ref = bentoml.models.get("speecht5_tts_model:latest")
vocoder_ref = bentoml.models.get("speecht5_tts_vocoder:latest")


class SpeechT5Runnable(bentoml.Runnable):
    def __init__(self):
        self.processor = bentoml.transformers.load_model(processor_ref)
        self.model = bentoml.transformers.load_model(model_ref)
        self.vocoder = bentoml.transformers.load_model(vocoder_ref)
        self.embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
        self.speaker_embeddings = torch.tensor(self.embeddings_dataset[7306]["xvector"]).unsqueeze(0)

    @bentoml.Runnable.method(batchable=False)
    def generate_speech(self, inp: str):
        inputs = self.processor(text=inp, return_tensors="pt")
        speech = self.model.generate_speech(inputs["input_ids"], self.speaker_embeddings, vocoder=self.vocoder)
        return speech.numpy()


text2speech_runner = bentoml.Runner(
    SpeechT5Runnable,
    name="speecht5_runner",
    models=[processor_ref, model_ref, vocoder_ref],
)
svc = bentoml.Service("talk_gpt", runners=[text2speech_runner])


@svc.api(input=bentoml.io.Text(), output=bentoml.io.NumpyNdarray())
async def generate_speech(inp: str):
    return await text2speech_runner.generate_speech.async_run(inp)
```
What's Changed
- feat(containerize): caching pip/conda installation layers by @smidm in #3673
- docs(batching): update docs to 503 by @sauyon in #3677
- chore(deps): bump ruff from 0.0.255 to 0.0.256 by @dependabot in #3676
- fix(type): annotate PdSeries with pandas-stubs by @aarnphm in #3466
- chore(dispatcher): refactor out training code by @sauyon in #3663
- fix: makes containerize for triton examples to all amd64 by @aarnphm in #3678
- chore(deps): bump coverage[toml] from 7.2.1 to 7.2.2 by @dependabot in #3679
- revert: "chore(dispatcher): refactor out training code (#3663)" by @sauyon in #3680
- doc: add more links to Bentoml/examples by @larme in #3631
- perf: serialization optimization by @larme in #3606
- examples: Kubeflow by @ssheng in #3656
- chore(deps): bump pytest-asyncio from 0.20.3 to 0.21.0 by @dependabot in #3688
- chore(deps): bump ruff from 0.0.256 to 0.0.257 by @dependabot in #3689
- chore(deps): bump imageio from 2.26.0 to 2.26.1 by @dependabot in #3690
- chore(deps): bump yamllint from 1.29.0 to 1.30.0 by @dependabot in #3694
- fix: remove duplicate dependabot check for pip by @aarnphm in #3691
- chore(deps): bump ruff from 0.0.257 to 0.0.258 by @dependabot in #3699
- docs: Update the Kubeflow example by @ssheng in #3703
- chore(deps): bump ruff from 0.0.258 to 0.0.259 by @dependabot in #3709
- docs: add link to pyfilesystem plugins by @sauyon in #3716
- docs: Kubeflow integration documentation by @ssheng in #3704
- docs: replace load_runner() to get().to_runner() by @KimSoungRyoul in #3715
- chore(deps): bump imageio from 2.26.1 to 2.27.0 by @dependabot in #3720
- fix(readme): format markdown table by @aarnphm in #3722
- fix: copy files before running `setup_script` by @aarnphm in #3713
- chore: remove experimental warning for `bentoml.metrics` by @aarnphm in #3725
- ci: temporary disable coverage by @aarnphm in #3726
- chore(deps): bump ruff from 0.0.259 to 0.0.260 by @dependabot in #3734
- chore(deps): bump tritonclient[all] from 2.31.0 to 2.32.0 by @dependabot in #3730
- fix(type): `bentoml.container.build` should accept multiple `image_tag` by @pmayd in #3719
- chore(deps): bump bufbuild/buf-setup-action from 1.15.1 to 1.16.0 by @dependabot in #3738
- feat: add query params to request context by @sauyon in #3717
- chore(dispatcher): use attr class instead of a tuple by @sauyon in #3731
- fix: Make it so the configured max_batch_size is respected when batching inference requests together by @RShang97 in #3741
- feat(transformers): pretrained protocol support by @aarnphm in #3684
- fix(tests): broken CI by @aarnphm in #3742
- chore(deps): bump ruff from 0.0.260 to 0.0.261 by @dependabot in #3744
- docs: Transformers documentation on pre-trained instances support by @ssheng in #3745
New Contributors
- @smidm made their first contribution in #3673
- @pmayd made their first contribution in #3719
- @RShang97 made their first contribution in #3741
Full Changelog: v1.0.16...v1.0.17
BentoML - v1.0.16
🍱 The BentoML v1.0.16 release is here, featuring the introduction of the `bentoml.triton` framework. With this integration, BentoML now supports running NVIDIA Triton Inference Server as a Runner. See the Triton Inference Server documentation to learn more!
- Triton Inference Server can be configured as a Runner in BentoML with its model repository and CLI arguments specified as parameters.

```python
import bentoml

triton_runner = bentoml.triton.Runner(
    "triton_runner",
    model_repository="s3://bucket/path/to/model_repository",
    cli_args=["--load-model=torchscrip_yolov5s", "--model-control-mode=explicit"],
)
```
- Models served by the Triton Inference Server Runner can be called as a method on the runner handle both synchronously and asynchronously.

```python
@svc.api(
    input=bentoml.io.Image.from_sample("./data/0.png"),
    output=bentoml.io.NumpyNdarray(),
)
async def bentoml_torchscript_mnist_infer(im: Image) -> NDArray[t.Any]:
    arr = np.array(im) / 255.0
    arr = np.expand_dims(arr, (0, 1)).astype("float32")
    InferResult = await triton_runner.torchscript_mnist.async_run(arr)
    return InferResult.as_numpy("OUTPUT__0")
```
- Build bentos and containerize images with Triton Runners by specifying the `nvcr.io/nvidia/tritonserver` base image in `bentofile.yaml`.

```yaml
service: service:svc
include:
  - /model_repository
  - /data/*.png
  - /*.py
exclude:
  - /__pycache__
  - /venv
  - /train.py
  - /build_bento.py
  - /containerize_bento.py
python:
  packages:
    - bentoml[triton]
docker:
  base_image: nvcr.io/nvidia/tritonserver:22.12-py3
```
💡 If you are an existing Triton user, the integration provides simpler ways to add custom logic in Python, deploy distributed multi-model inference graphs, unify model management across different ML frameworks and workflows, and standardize the model packaging format with versioning and collaboration features. If you are an existing BentoML user, the integration improves runner efficiency and throughput under high load thanks to Triton's efficient C++ runtime.
What's Changed
- fix(container): podman virtual machine healthcheck (#3575) by @timc in #3576
- chore(aiohttp): remove deprecated verify_ssl to ssl by @aarnphm in #3574
- feat(triton): support HTTP client by @aarnphm in #3502
- fix(grpc): handle backward protocol version by @aarnphm in #3332
- chore(deps): bump ruff from 0.0.246 to 0.0.247 by @dependabot in #3579
- chore(test): using container API for testing by @aarnphm in #3582
- fix(serve-cli): Make sure to use BENTOML_CONFIG value by @aarnphm in #3597
- docs: Update documentation with an examples link by @ssheng in #3599
- chore: lock starlette version by @sauyon in #3600
- feature(diffusers): support `enable_attention_slicing` by @larme in #3598
- chore(cli): figlet to show on CLI only by @aarnphm in #3603
- chore(cli): using default background as color by @aarnphm in #3608
- feat: Flax by @aarnphm in #3123
- feat(gRPC): client implementation by @aarnphm in #3280
- fix: invalid option dtype=True for pd.read_csv by @parano in #3601
- chore(deps): bump coverage[toml] from 7.1.0 to 7.2.0 by @dependabot in #3616
- chore(deps): bump ruff from 0.0.247 to 0.0.252 by @dependabot in #3617
- docs: containerisation API by @aarnphm in #3518
- chore(deps): bump coverage[toml] from 7.2.0 to 7.2.1 by @dependabot in #3621
- chore(deps): bump imageio from 2.25.1 to 2.26.0 by @dependabot in #3620
- fix(docs): missing space bug causes table not to render by @aarnphm in #3622
- chore(deps): bump ruff from 0.0.252 to 0.0.253 by @dependabot in #3624
- feat: enable cork for non-batched workloads by @sauyon in #3602
- docs: Fix typo in concepts/service by @FelixSchuSi in #3627
- chore(deps): bump tritonclient[all] from 2.30.0 to 2.31.0 by @dependabot in #3628
- fix(docs): broken inline docstring by @aarnphm in #3538
- fix: use a semaphore to limit runner connections by @sauyon in #3607
- fix: make inference_api handle None type by @aarnphm in #3611
- fix: make sure not to override user set values for from_sample by @aarnphm in #3610
- docs: add exceptions API section by @aarnphm in #3609
- revert(pyproject): add back pytest plugins by @aarnphm in #3633
- fix(configuration): CORS docs, `allow_origins` and `allow_headers` by @larme in #3643
- chore(deps): bump ruff from 0.0.253 to 0.0.254 by @dependabot in #3641
- chore(deps): bump pytest from 7.2.1 to 7.2.2 by @dependabot in #3642
- chore: http client healthcheck by @denyszhak in #3636
- docs: typo in configuration.rst by @davkime in #3644
- docs: correct links to configuration source code by @davkime in #3645
- example: add fraud detection and benchmark examples by @parano in #3647
- fix(containerize): remove autoconfig for buildctl by @aarnphm in #3484
- feat: name in bentofile.yaml by @aarnphm in #3604
- chore: ensure all labels are dict[str,str] by @aarnphm in #3605
- fix(triton): enable runtime options by @aarnphm in #3649
- docs: Triton Inference Server by @aarnphm in #3519
- example: Triton Inference Server by @aarnphm in #3471
- chore(deps): bump pytest from 7.2.1 to 7.2.2 in /requirements by @dependabot in #3639
- chore(deps): bump bufbuild/buf-setup-action from 1.14.0 to 1.15.0 by @dependabot in #3638
- fix: some missing logics for triton examples by @aarnphm in #3650
- fix: use async implementation by @characat0 in #3654
- feat: add ray deploy support by @parano in #3632
- chore(deps): bump pytest-xdist[psutil] from 3.2.0 to 3.2.1 by @dependabot in #3659
- chore(deps): bump bufbuild/buf-setup-action from 1.15.0 to 1.15.1 by @dependabot in #3655
- fix: update scheme logic using ssl.enabled by @aarnphm in #3660
- feat: `from_sample` docstring by @aarnphm in #3318
- fix(ci): locking starlette for container tests by @aarnphm in #3666
- chore: better exception for numpy by @sauyon in #3665
- feat: make file io descriptor allow any mime type by default by @sauyon in #3626
- fix(docs): broken link by @aarnphm in #3537
- chore(stubs): remove unused by @aarnphm in #3612
- docs: Update Triton documentation and examples by @ssheng in #3668
- chore(deps): bump ruff from 0.0.254 to 0.0.255 by @dependabot in #3671
- docs: Update integration docs by @ssheng in #3672
New Contributors
- @FelixSchuSi made their first contribution in #3627
- @denyszhak made their first contribution in #3636
- @davkime made their first contribution in #3644
Full Changelog: v1.0.15...v1.0.16
BentoML - v1.0.15
🍱 The BentoML v1.0.15 release is here, featuring the introduction of the `bentoml.diffusers` framework.
- Learn more about the capabilities of the `bentoml.diffusers` framework in the Creating Stable Diffusion 2.0 Service With BentoML And Diffusers blog and the BentoML Diffusers example project.
- Import a diffusion model with the `bentoml.diffusers.import_model` API.

```python
import bentoml

bentoml.diffusers.import_model(
    "sd2",
    "stabilityai/stable-diffusion-2",
)
```
- Create a `text2img` service using a Stable Diffusion 2.0 model runner with the familiar `to_runner` API from the `bentoml.diffusers` framework.

```python
import torch
from diffusers import StableDiffusionPipeline

import bentoml
from bentoml.io import Image, JSON, Multipart

bento_model = bentoml.diffusers.get("sd2:latest")
stable_diffusion_runner = bento_model.to_runner()

svc = bentoml.Service("stable_diffusion_v2", runners=[stable_diffusion_runner])

@svc.api(input=JSON(), output=Image())
def txt2img(input_data):
    images, _ = stable_diffusion_runner.run(**input_data)
    return images[0]
```
🍱 Fixed an incompatibility introduced in `starlette==0.25.0` that resulted in the type `MultiPartMessage` not being found in `starlette.formparsers`.

```
ImportError: cannot import name 'MultiPartMessage' from 'starlette.formparsers' (/opt/miniconda3/envs/bentoml/lib/python3.10/site-packages/starlette/formparsers.py)
```
What's Changed
- chore(deps): bump pytest-xdist[psutil] from 3.1.0 to 3.2.0 by @dependabot in #3536
- fix: include dockerfile_template to Bento for containerize by @aarnphm in #3501
- chore: add missing logger and fix types by @aarnphm in #3453
- chore(rtd): disable epub and pdf as format by @aarnphm in #3544
- feat(torchscript): support `_extra_files` by @aarnphm in #3480
- refactor(ci): make sure to run types on py,pyi files by @aarnphm in #3545
- fix(server): deprecate client and cache get_client by @aarnphm in #3547
- chore(serve): update options for triton_options by @aarnphm in #3503
- tools(linter): Ruff by @aarnphm in #3539
- chore(deps): bump ruff from 0.0.243 to 0.0.244 by @dependabot in #3548
- chore(type): remove cattr type ignore by @aarnphm in #3550
- chore: bumping otlp deps to 1.15 by @aarnphm in #3351
- docs: Add an example index by @ssheng in #3551
- revert: "chore: bumping otlp deps to 1.15" by @bojiang in #3553
- chore(deps): bump bufbuild/buf-setup-action from 1.13.1 to 1.14.0 by @dependabot in #3554
- chore(deps): bump ruff from 0.0.244 to 0.0.246 by @dependabot in #3559
- chore(deps): bump imageio from 2.25.0 to 2.25.1 by @dependabot in #3557
- chore: update README.md by @timliubentoml in #3565
- feat(containerization): support 11.7 by @aarnphm in #3567
- chore: remove deprecation warning when building bentos by @CheeksTheGeek in #3566
- feature(framework): diffusers by @larme in #3534
- fix: update formparser for new starlette by @sauyon in #3569
New Contributors
- @CheeksTheGeek made their first contribution in #3566
Full Changelog: v1.0.14...v1.0.15
BentoML - v1.0.14
🍱 Fixed the backward incompatibility introduced in `starlette` version 0.24.0. Upgrade BentoML to v1.0.14 if you encounter an error related to `content_type` like the one below.
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/server/service_app.py", line 305, in api_func
    input_data = await api.input.from_http_request(request)
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/io_descriptors/multipart.py", line 208, in from_http_request
    reqs = await populate_multipart_requests(request)
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/utils/formparser.py", line 188, in populate_multipart_requests
    form = await multipart_parser.parse()
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/utils/formparser.py", line 158, in parse
    multipart_file = UploadFile(
TypeError: __init__() got an unexpected keyword argument 'content_type'
```
BentoML - v1.0.13
🍱 BentoML v1.0.13 is released, featuring a preview of batch inference with Spark.
- Run the batch inference job using the `bentoml.batch.run_in_spark()` method. This method takes the API name, the Spark DataFrame containing the input data, and the Spark session itself as parameters, and it returns a DataFrame containing the results of the batch inference job.

```python
import bentoml

# Import the bento from a repository or get the bento from the bento store
bento = bentoml.import_bento("s3://bentoml/quickstart")

# Run the run_in_spark function with the bento, API name, and Spark session
results_df = bentoml.batch.run_in_spark(bento, "classify", df, spark)
```
- Internally, what happens when you run `run_in_spark` is as follows (an end-to-end sketch follows this list):
  - First, the bento is distributed to the cluster. Note that if the bento has already been distributed, i.e. you have already run a computation with that bento, this step is skipped.
  - Next, a process function is created, which starts a BentoML server on each of the Spark workers, then uses a client to process all the data. This is done so that the workers take advantage of the batch processing features of the BentoML server. PySpark pickles this process function and dispatches it, along with the relevant data, to the workers.
  - Finally, the function is evaluated on the given dataframe. Once all methods that the user defined in the script have been executed, the data is returned to the master node.
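Putting it together, a minimal end-to-end sketch using the call form shown above, assuming a Spark session is available and the input DataFrame's columns match what the quickstart bento's `classify` API expects; the column names and values below are hypothetical:

```python
import bentoml
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Hypothetical input data: four feature columns for the quickstart classifier
schema = StructType([StructField(f"f{i}", DoubleType()) for i in range(4)])
df = spark.createDataFrame([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]], schema=schema)

# Import the bento and run batch inference on the Spark cluster
bento = bentoml.import_bento("s3://bentoml/quickstart")
results_df = bentoml.batch.run_in_spark(bento, "classify", df, spark)
results_df.show()
```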
The `bentoml.batch` API may undergo incompatible changes until general availability is announced in a later minor version release.
🥂 Shout out to jeffthebear, KimSoungRyoul, Robert Fernandez, Marco Vela, Quan Nguyen, and y1450 from the community for their contributions in this release.
What's Changed
- docs: add inline notes and better exception by @bojiang in #3296
- chore(deps): bump pytest-asyncio from 0.20.2 to 0.20.3 by @dependabot in #3334
- feat: bentoserver client by @qu8n in #3321
- fix(transformers): check for task aliases by @jeffthebear in #3337
- chore(framework): add partial_kwargs to picklable and pytorch runners by @bojiang in #3338
- feat: protobuf shim by @aarnphm in #3333
- fix: CI breakage by @aarnphm in #3350
- chore(deps): bump black[jupyter] from 22.10.0 to 22.12.0 by @dependabot in #3354
- chore(deps): bump isort from 5.10.1 to 5.11.1 by @dependabot in #3355
- feat(http server): pass-through openapi of mounted apps by @bojiang in #3358
- fix(pytorch): runnable method collision by @bojiang in #3357
- fix(torchscript): runnable method collision by @bojiang in #3364
- chore(deps): bump isort from 5.11.1 to 5.11.2 by @dependabot in #3361
- chore(deps): bump isort from 5.11.2 to 5.11.3 in /requirements by @dependabot in #3374
- chore(deps): bump bufbuild/buf-setup-action from 1.9.0 to 1.10.0 by @dependabot in #3370
- chore(deps): bump coverage[toml] from 6.5.0 to 7.0.0 in /requirements by @dependabot in #3373
- chore(deps): bump pylint from 2.15.8 to 2.15.9 in /requirements by @dependabot in #3372
- chore(deps): bump imageio from 2.22.4 to 2.23.0 in /requirements by @dependabot in #3371
- fix: make sure to handle relative path for templates by @aarnphm in #3375
- fix(containerize): fs path format on windows by @bojiang in #3378
- chore(deps): bump isort from 5.11.3 to 5.11.4 by @dependabot in #3380
- docs: tracing and configuration by @aarnphm in #3067
- fix: use relative urls in swagger UI by @sauyon in #3381
- chore(deps): bump bufbuild/buf-setup-action from 1.10.0 to 1.11.0 by @dependabot in #3382
- chore(deps): bump coverage[toml] from 7.0.0 to 7.0.1 by @dependabot in #3383
- chore(config): ignore blank lines in bentoml config options by @bojiang in #3385
- chore(deps): bump coverage[toml] from 7.0.1 to 7.0.2 by @dependabot in #3386
- fix: log error when runnable instantiation fails by @sauyon in #3388
- chore(deps): bump coverage[toml] from 7.0.2 to 7.0.3 by @dependabot in #3390
- fix: don't use logger for CLI output by @sauyon in #3395
- fix: allow passing server URLs with paths by @sauyon in #3394
- fix(sdk): handling container platform from CLI separately by @aarnphm in #3366
- fix: wrong self annotations by @aarnphm in #3397
- chore(deps): bump imageio from 2.23.0 to 2.24.0 by @dependabot in #3410
- chore(deps): bump coverage[toml] from 7.0.3 to 7.0.4 by @dependabot in #3409
- chore(deps): bump pylint from 2.15.9 to 2.15.10 by @dependabot in #3407
- fix: serve missing logic from #3321 by @aarnphm in #3336
- chore(deps): bump coverage[toml] from 7.0.4 to 7.0.5 by @dependabot in #3413
- chore(deps): bump yamllint from 1.28.0 to 1.29.0 by @dependabot in #3414
- fix: regression f-string by @aarnphm in #3416
- fix(runner): log correct error types during model validation by @characat0 in #3421
- fix(client): make sure tags is available in specs by @KimSoungRyoul in #3359
- fix: handling KeyError when accessing IODescriptor spec by @aarnphm in #3398
- chore(deps): bump build[virtualenv] from 0.9.0 to 0.10.0 by @dependabot in #3419
- feat: support bentos and tags in bentoml.bentos.serve by @sauyon in #3424
- feat: add endpoints list to client by @sauyon in #3423
- fix: #3399 during `containerize` by @aarnphm in #3400
- feat: add context manager support for `bentoml.client` by @y1450 in #3402
- chore: migrate to newer API in docstring by @KimSoungRyoul in #3429
- chore(deps): bump bufbuild/buf-setup-action from 1.11.0 to 1.12.0 by @dependabot in #3430
- chore(deps): bump pytest from 7.2.0 to 7.2.1 by @dependabot in #3433
- feat: openapi_components method for Multipart by @RobbieFernandez in #3438
- ci: disable 3.10 e2e for gRPC on Mac X86 by @aarnphm in #3441
- chore(exportable): update exception message and errors imports by @aarnphm in #3435
- feat: make `load_bento` take Tag and Bento by @sauyon in #3444
- chore: add setuptools-scm as dev deps by @aarnphm in #3443
- fix: load_bento Tag import by @sauyon in #3445
- feat: support batch inference with Spark by @sauyon in #3425
- chore: add pandas-stubs as dev-dependencies by @aarnphm in #3442
- fix: raise more specific error in `from_spec` by @sauyon in #3447
- fix(cli): overriding memoized options via `--opt` by @aarnphm in #3401
- fix(exception): wrong variable reference by @aarnphm in #3450
- fix: make sure to run migration for envvar by @aarnphm in #3339
- feat: YataiClient context to communicate with multiple Yatai instances by @ssheng in #3448
New Contributors
- @characat0 made their first contribution in #3421
- @y1450 made their first contribution in #3402
- @RobbieFernandez made their first contribution in #3438
Full Changelog: v1.0.12...v1.0.13