Release BentoML - v1.1.0 · bentoml/BentoML

🍱 We're thrilled to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.

Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
Official gRPC Support: We've transitioned gRPC support in BentoML from experimental to official status, expanding your toolkit for high-performance, low-latency services.
Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve to enable users to deploy Bento applications in a Ray cluster without modifying code or configuration.
Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffusers models and pipelines can be imported and integrated into BentoML applications through the Transformers and Diffusers framework libraries.
Enhanced Model Version Management: Improved model version management lets you configure model versions flexibly and synchronize them with your remote model store.

🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for Llama 2 models.

GPU and CPU Support: Running Llama is supported on both GPU and CPU.

Model variations and parameter sizes: Supports all model weights and parameter sizes on Hugging Face.

meta-llama/llama-2-70b-chat-hf
meta-llama/llama-2-13b-chat-hf
meta-llama/llama-2-7b-chat-hf
meta-llama/llama-2-70b-hf
meta-llama/llama-2-13b-hf
meta-llama/llama-2-7b-hf
openlm-research/open_llama_7b_v2
openlm-research/open_llama_3b_v2
openlm-research/open_llama_13b
huggyllama/llama-65b
huggyllama/llama-30b
huggyllama/llama-13b
huggyllama/llama-7b
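
The identifiers above follow a loose `organization/name` pattern with the parameter count embedded in the name. As a rough illustration only, the sketch below extracts the parameter size and the chat/base variant from such an ID with a regular expression. The helper `describe_model_id` and the `ModelInfo` tuple are hypothetical and not part of OpenLLM or BentoML; the parsing rule is an assumption that happens to hold for the IDs listed here.

```python
import re
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    organization: str
    name: str
    size_billions: Optional[int]
    is_chat: bool


# Matches a parameter count such as "7b" or "70b" that is delimited
# by "-", "_" or the start/end of the name (case-insensitive).
_SIZE_RE = re.compile(r"(?:^|[-_])(\d+)b(?:$|[-_])", re.IGNORECASE)


def describe_model_id(model_id: str) -> ModelInfo:
    """Split an 'organization/name' ID and guess size and variant (hypothetical helper)."""
    organization, _, name = model_id.partition("/")
    if not name:
        raise ValueError(f"expected 'organization/name', got {model_id!r}")
    match = _SIZE_RE.search(name)
    size = int(match.group(1)) if match else None
    # Assumption: chat-tuned variants carry a standalone "chat" token in the name.
    is_chat = "chat" in re.split(r"[-_]", name.lower())
    return ModelInfo(organization, name, size, is_chat)


if __name__ == "__main__":
    for model_id in ("meta-llama/llama-2-70b-chat-hf", "openlm-research/open_llama_3b_v2"):
        print(describe_model_id(model_id))
```

A helper like this can be handy for grouping the listed models by size before picking one that fits the available GPU or CPU memory.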

Users can use any weights on Hugging Face (e.g. TheBloke/Llama-2-13B-chat-GPTQ), custom weights from a local path (e.g. /path/to/llama-1), or fine-tuned weights, as long as they adhere to LlamaForCausalLM.
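
To make the distinction between Hub weights and local weights concrete, here is a minimal sketch of how a model identifier might be classified. It assumes the common convention that an existing directory is treated as local weights and anything else as a Hugging Face repository ID. The function `weights_source` is hypothetical and is not an OpenLLM API.

```python
from pathlib import Path


def weights_source(model_id: str) -> str:
    """Classify a model identifier as a local directory or a Hub repository ID."""
    # Assumption: an existing directory wins; everything else is treated as a Hub ID.
    if Path(model_id).expanduser().is_dir():
        return "local"
    return "hub"


if __name__ == "__main__":
    print(weights_source("TheBloke/Llama-2-13B-chat-GPTQ"))
```

Checking the filesystem first means a local checkout that happens to share a name with a Hub repository takes precedence, which is usually what you want for custom or fine-tuned weights.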

Stay tuned for fine-tuning capabilities in OpenLLM: Support for fine-tuning Llama 2 models will be added in a future release. In the meantime, try the experimental script for fine-tuning Llama 2 with QLoRA in the OpenLLM playground:
```
python -m openllm.playground.llama2_qlora --help
```
