
BentoML - v1.1.0

@ssheng released this 24 Jul 20:34 · 2ab6de7

🍱 We're thrilled to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.

  • Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
  • Official gRPC Support: gRPC support in BentoML has graduated from experimental to official status, expanding your toolkit for high-performance, low-latency services (see the gRPC sketch after this list).
  • Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve, enabling users to deploy Bento applications in a Ray cluster without modifying code or configuration (see the Ray sketch after this list).
  • Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffusers models and pipelines can be seamlessly imported into BentoML applications through the Transformers and Diffusers framework libraries (see the Diffusers sketch after this list).
  • Enhanced Model Version Management: Improved model version management enables flexible configuration and synchronization of model versions with your remote model store (see the model store sketch after this list).
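
For gRPC, a v1.0-style service can be served over gRPC with the bentoml serve-grpc CLI. Below is a minimal sketch; the model tag iris_clf:latest is an assumed example, not something shipped with this release.

    # service.py -- minimal BentoML service; iris_clf:latest is an assumed
    # example tag, substitute any model from your local model store.
    import bentoml
    from bentoml.io import NumpyNdarray

    runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
    svc = bentoml.Service("iris_classifier", runners=[runner])

    @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
    async def classify(input_array):
        return await runner.predict.async_run(input_array)

    # Serve over gRPC instead of HTTP:
    #   $ bentoml serve-grpc service:svc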
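
For the Ray integration, a rough sketch follows. The bentoml.ray.deployment helper and its signature are assumptions for illustration only; consult BentoML's Ray integration guide for the exact entry point and options.

    # Sketch only: bentoml.ray.deployment is an assumed entry point here;
    # check the BentoML Ray integration docs for the exact API.
    from ray import serve

    import bentoml

    # Wrap an existing Bento as a Ray Serve application without code changes
    # ("iris_classifier:latest" is a hypothetical Bento tag).
    app = bentoml.ray.deployment("iris_classifier:latest")
    serve.run(app)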
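
For Diffusers, a pipeline on the Hugging Face Hub can be pulled directly into the local BentoML model store. The sketch below uses the diffusers framework library's import_model API; the local name "sd2" is an arbitrary example.

    import bentoml

    # Import a Diffusers pipeline from the Hugging Face Hub into the local
    # model store under the name "sd2" (an arbitrary example tag).
    bentoml.diffusers.import_model("sd2", "stabilityai/stable-diffusion-2")

    # Load it back as a runner for use inside a bentoml.Service.
    runner = bentoml.diffusers.get("sd2:latest").to_runner()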
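
For model version management, a sketch of the expected workflow is below, assuming a remote model store (e.g. Yatai) is configured; the iris_clf tag is a placeholder.

    # Push and pull synchronize model versions with the remote model store:
    #   $ bentoml models pull iris_clf:latest   # fetch the latest remote version
    #   $ bentoml models push iris_clf:latest   # publish a local version
    import bentoml

    model = bentoml.models.get("iris_clf:latest")  # resolve the local latest version
    print(model.tag)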

🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for Llama 2 models.


  • GPU and CPU Support: Running Llama 2 is supported on both GPUs and CPUs.

  • Model variations and parameter sizes: Supports all Llama model weights and parameter sizes available on Hugging Face, including:

    meta-llama/llama-2-70b-chat-hf
    meta-llama/llama-2-13b-chat-hf
    meta-llama/llama-2-7b-chat-hf
    meta-llama/llama-2-70b-hf
    meta-llama/llama-2-13b-hf
    meta-llama/llama-2-7b-hf
    openlm-research/open_llama_7b_v2
    openlm-research/open_llama_3b_v2
    openlm-research/open_llama_13b
    huggyllama/llama-65b
    huggyllama/llama-30b
    huggyllama/llama-13b
    huggyllama/llama-7b

    Users can serve any weights on Hugging Face (e.g. TheBloke/Llama-2-13B-chat-GPTQ), custom weights from a local path (e.g. /path/to/llama-1), or fine-tuned weights, as long as they adhere to LlamaForCausalLM; see the serving sketch at the end of this list.

  • Stay tuned for fine-tuning capabilities in OpenLLM: Support for fine-tuning various Llama 2 models will be added in a future release. In the meantime, try the experimental QLoRA fine-tuning script in the OpenLLM playground:

    python -m openllm.playground.llama2_qlora --help
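
As for serving the weights above, a minimal sketch is to start an OpenLLM server and query it from the Python client. The model id is one of the supported weights listed earlier; the prompt and the default port 3000 are examples.

    # Start a Llama 2 server (shell), then query it from Python:
    #   $ openllm start llama --model-id meta-llama/Llama-2-7b-chat-hf
    import openllm

    client = openllm.client.HTTPClient("http://localhost:3000")
    print(client.query("Explain the difference between Llama and Llama 2."))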