BentoML is an open-source platform for high-performance ML model serving.
What does BentoML do?
- Turn a trained ML model into a production API endpoint with a few lines of code
- Support all major machine learning training frameworks
- End-to-end model serving solution with DevOps best practices baked-in
- Micro-batching support, bringing the advantage of batch processing to online serving
- Model management for teams, providing CLI access and Web UI dashboard
- Flexible model deployment orchestration supporting Docker, Kubernetes, AWS Lambda, SageMaker, Azure ML and more
👉 Join BentoML Slack to follow the latest development updates and roadmap discussions.
Getting Machine Learning models into production is hard. Data Scientists are not experts in building production services or in DevOps best practices. The trained models produced by a Data Science team are hard to test and hard to deploy. This often leads to a time-consuming and error-prone workflow, where a pickled model or weights file is handed over to a software engineering team.
BentoML is an end-to-end solution for model serving, making it possible for Data Science teams to build production-ready model serving endpoints, with common DevOps best practices and performance optimizations baked in.
Check out the Frequently Asked Questions page to see how BentoML compares to TensorFlow Serving, Clipper, AWS SageMaker, MLflow, etc.
Before starting, make sure your Python version is 3.6 or above, then install BentoML with pip:
pip install bentoml
A minimal prediction service in BentoML looks something like this:
# https://github.com/bentoml/BentoML/blob/master/guides/quick-start/iris_classifier.py
from bentoml import env, artifacts, api, BentoService
from bentoml.handlers import DataframeHandler
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(DataframeHandler)
    def predict(self, df):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)
This code defines a prediction service that bundles a scikit-learn model and provides an
API. The API here is the entry point for accessing this prediction service, and an API
with DataframeHandler converts an HTTP JSON request into a pandas.DataFrame object
before passing it to the user-defined API function for inference.
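To make the handler idea concrete, here is a minimal sketch of the decode, call, encode flow described above. It is not BentoML's implementation: `handle_json_request` and `dummy_predict` are hypothetical names, and plain Python lists stand in for the pandas.DataFrame that DataframeHandler would build.

```python
import json
from typing import Callable, List

Row = List[float]


def handle_json_request(body: bytes, predict: Callable[[List[Row]], list]) -> str:
    """Decode a JSON request body, call the user function, encode the result.

    Simplified stand-in: rows stay as nested lists instead of a DataFrame.
    """
    rows = json.loads(body)
    if not isinstance(rows, list) or not all(isinstance(r, list) for r in rows):
        raise ValueError("expected a JSON array of rows, e.g. [[5.1, 3.5, 1.4, 0.2]]")
    return json.dumps(predict(rows))


def dummy_predict(rows: List[Row]) -> list:
    # Hypothetical stand-in for self.artifacts.model.predict(df):
    # returns class 0 for every input row.
    return [0 for _ in rows]


response = handle_json_request(b"[[5.1, 3.5, 1.4, 0.2]]", dummy_predict)
print(response)
```

The user-defined function only ever sees parsed rows, which is why pre-processing and post-processing code can live inside it without touching HTTP details.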
The following code trains a scikit-learn model and bundles the trained model with an
IrisClassifier instance. The IrisClassifier instance is then saved to disk in the
BentoML SavedBundle format, which is a versioned file archive that is ready for
production model serving deployment.
# https://github.com/bentoml/BentoML/blob/master/guides/quick-start/main.py
from sklearn import svm
from sklearn import datasets

from iris_classifier import IrisClassifier

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Model Training
    clf = svm.SVC(gamma='scale')
    clf.fit(X, y)

    # Create an iris classifier service instance
    iris_classifier_service = IrisClassifier()

    # Pack the newly trained model artifact
    iris_classifier_service.pack('model', clf)

    # Save the prediction service to disk for model serving
    saved_path = iris_classifier_service.save()
By default, BentoML stores SavedBundle files under the ~/bentoml directory. Users
can also customize BentoML to use a different directory or cloud storage like
AWS S3. BentoML also comes with a model management component, YataiService,
which provides advanced model management features including a dashboard web UI.
To start a REST API server with the saved IrisClassifier service, use the bentoml serve command:
bentoml serve IrisClassifier:latest
The IrisClassifier model is now served at localhost:5000. Use the curl command to send
a prediction request:
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
http://localhost:5000/predict
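The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl command above; `build_predict_request` and `send` are hypothetical helper names, and the URL assumes the server started by bentoml serve is listening on localhost:5000.

```python
import json
import urllib.request


def build_predict_request(rows, url="http://localhost:5000/predict"):
    """Build a POST request carrying the rows as a JSON array."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


request = build_predict_request([[5.1, 3.5, 1.4, 0.2]])

# Uncomment once the API server is running:
# print(send(request))
```

Building the request separately from sending it makes the payload easy to inspect before a server is available.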
The BentoML API server also provides a web UI for accessing predictions and debugging the server. Visit http://localhost:5000 in the browser and use the Web UI to send a prediction request.
BentoML provides a convenient way to containerize the model API server with Docker:
- Find the SavedBundle directory with the bentoml get command
- Run docker build with the SavedBundle directory, which contains a generated Dockerfile
- Run the generated docker image to start a docker container serving the model
# If jq command not found, install jq (the command-line JSON processor) here: https://stedolan.github.io/jq/download/
saved_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")
docker build -t {docker_username}/iris-classifier $saved_path
docker run -p 5000:5000 -e BENTOML_ENABLE_MICROBATCH=True {docker_username}/iris-classifier
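For scripting these steps without jq, the following sketch does the same thing from Python. It assumes, based on the jq expression above, that bentoml get ... -q prints JSON with the bundle location under uri.uri. The helper names and the sample path are hypothetical, and the image name is a placeholder.

```python
import json
import subprocess


def saved_bundle_path(bento_json: str) -> str:
    """Python equivalent of: jq -r ".uri.uri"."""
    return json.loads(bento_json)["uri"]["uri"]


def docker_commands(saved_path: str, image: str, port: int = 5000):
    """Return the docker build and docker run argument lists."""
    build = ["docker", "build", "-t", image, saved_path]
    run = [
        "docker", "run",
        "-p", "{0}:{0}".format(port),
        "-e", "BENTOML_ENABLE_MICROBATCH=True",
        image,
    ]
    return build, run


def containerize(bento_tag: str, image: str) -> None:
    """Look up the bundle, then build and run the image (needs bentoml and docker)."""
    result = subprocess.run(
        ["bentoml", "get", bento_tag, "-q"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    )
    build, run = docker_commands(saved_bundle_path(result.stdout), image)
    subprocess.run(build, check=True)
    subprocess.run(run, check=True)


# Hypothetical sample of the JSON shape, for illustration only.
sample = '{"uri": {"uri": "/tmp/bentoml/IrisClassifier/20200121114004_360ECB"}}'
print(saved_bundle_path(sample))

# With bentoml and docker installed:
# containerize("IrisClassifier:latest", "my-user/iris-classifier")
```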
This makes it possible to deploy BentoML-bundled ML models on platforms such as Kubeflow, Knative, and Kubernetes, which provide advanced model deployment features such as auto-scaling, A/B testing, scale-to-zero, canary rollout, and multi-armed bandit.
BentoML can also deploy a SavedBundle directly to cloud services such as AWS Lambda or AWS SageMaker, with the bentoml CLI command:
$ bentoml get IrisClassifier
BENTO_SERVICE CREATED_AT APIS ARTIFACTS
IrisClassifier:20200121114004_360ECB 2020-01-21 19:40 predict<DataframeHandler> model<SklearnModelArtifact>
IrisClassifier:20200120082658_4169CF 2020-01-20 16:27 predict<DataframeHandler> clf<PickleArtifact>
...
$ bentoml lambda deploy test-deploy -b IrisClassifier:20200121114004_360ECB
...
$ bentoml deployment list
NAME NAMESPACE PLATFORM BENTO_SERVICE STATUS AGE
test-deploy dev aws-lambda IrisClassifier:20200121114004_360ECB running 2 days and 11 hours
...
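The BENTO_SERVICE values in the listing above have the form name:version. When scripting around the CLI it can help to split them, as in the sketch below. `BentoTag`, `parse_bento_tag`, and `version_timestamp` are hypothetical helpers, not part of BentoML, and reading the version prefix as a YYYYMMDDHHMMSS timestamp is an assumption inferred from the sample output, not a documented guarantee.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class BentoTag(NamedTuple):
    name: str
    version: Optional[str]  # None when the tag has no ":version" part


def parse_bento_tag(tag: str) -> BentoTag:
    """Split 'Name:version' into its parts; the version is optional."""
    name, sep, version = tag.partition(":")
    if not name:
        raise ValueError("empty service name in tag: {!r}".format(tag))
    return BentoTag(name, version if sep and version else None)


def version_timestamp(version: str) -> Optional[datetime]:
    """Best-effort parse of the leading timestamp (assumed format)."""
    prefix = version.split("_", 1)[0]
    try:
        return datetime.strptime(prefix, "%Y%m%d%H%M%S")
    except ValueError:
        return None  # e.g. "latest" or any other non-timestamp version


tag = parse_bento_tag("IrisClassifier:20200121114004_360ECB")
print(tag.name, tag.version, version_timestamp(tag.version))
```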
Check out the deployment guides and other deployment options with BentoML here.
BentoML full documentation: https://docs.bentoml.org/
- Quick Start Guide: https://docs.bentoml.org/en/latest/quickstart.html
- Core Concepts: https://docs.bentoml.org/en/latest/concepts.html
- Deployment Guides: https://docs.bentoml.org/en/latest/deployment/index.html
- API References: https://docs.bentoml.org/en/latest/api/index.html
- Frequently Asked Questions: https://docs.bentoml.org/en/latest/faq.html
Visit the bentoml/gallery repository for more examples and tutorials.
- Pet Image Classification - Google Colab | nbviewer | source
- Salary Range Prediction - Google Colab | nbviewer | source
- Sentiment Analysis - Google Colab | nbviewer | source
- Fashion MNIST - Google Colab | nbviewer | source
- CIFAR-10 Image Classification - Google Colab | nbviewer | source
- Fashion MNIST - Google Colab | nbviewer | source
- Text Classification - Google Colab | nbviewer | source
- Toxic Comment Classifier - Google Colab | nbviewer | source
- tf.Function model - Google Colab | nbviewer | source
- Fashion MNIST - Google Colab | nbviewer | source
- Movie Review Sentiment with BERT - Google Colab | nbviewer | source
- Titanic Survival Prediction - Google Colab | nbviewer | source
- League of Legend win Prediction - Google Colab | nbviewer | source
- Titanic Survival Prediction - Google Colab | nbviewer | source
- Loan Default Prediction - Google Colab | nbviewer | source
- Prostate Cancer Prediction - Google Colab | nbviewer | source
- Text Classification - Google Colab | nbviewer | source
- End-to-end deployment management with BentoML
- Deployment guides for open-source platforms
- Deployment guides for cloud service providers
Have questions or feedback? Post a new GitHub issue or discuss in our Slack channel.
Want to help build BentoML? Check out our contributing guide and the development guide.
BentoML is under active development and is evolving rapidly. It is currently a Beta release, and we may change APIs in future releases.
Read more about the latest features and changes in BentoML from the releases page.
By default, BentoML collects anonymous usage data using Amplitude. It only collects the BentoML library's own actions and parameters; no user or model data is collected. Here is the code that does it.
This helps the BentoML team understand how the community is using this tool and what to build next. You can easily opt out of usage tracking by running the following command:
# From terminal:
bentoml config set usage_tracking=false
# From python:
import bentoml
bentoml.config().set('core', 'usage_tracking', 'False')