This brief guide shows how to get started quickly with ModelMesh Serving. You will need the following:
- A Kubernetes cluster v1.16+ with cluster administrative privileges
- kubectl and kustomize (v4.0.0+)
- At least 4 vCPU and 8 GB memory. For more details, please see here.
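Before installing, it can help to confirm that the required command-line tools are on your PATH. The following is a minimal sketch; `check_tools` is a hypothetical helper that is not part of the ModelMesh Serving repository, and it only checks that each tool exists, not that its version meets the minimums listed above.

```shell
# Report any required CLI tools that are not on PATH.
# Returns 0 when every tool is found, 1 otherwise.
# check_tools is a hypothetical helper, not part of ModelMesh Serving.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this guide, it could be called as `check_tools git kubectl kustomize`.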
Clone the repository and run the quickstart install script in a new namespace:
git clone git@github.com:kserve/modelmesh-serving.git
cd modelmesh-serving
kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart
This installs ModelMesh Serving in the modelmesh-serving namespace, along with etcd and MinIO instances.
When the script finishes, you should see a "Successfully installed ModelMesh Serving!" message.
To see more details about installation, click here.
Check that the pods are running:
kubectl get pods
NAME                                        READY   STATUS    RESTARTS   AGE
pod/etcd                                    1/1     Running   0          5m
pod/minio                                   1/1     Running   0          5m
pod/modelmesh-controller-547bfb64dc-mrgrq   1/1     Running   0          5m
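Rather than re-running `kubectl get pods` by hand, you can block until the pods report Ready with `kubectl wait`. This is a sketch under stated assumptions: `wait_for_pods` and the `KUBECTL` override variable are hypothetical names introduced here, and the default timeout of 300 seconds is an arbitrary choice.

```shell
# Allow the kubectl binary to be overridden (hypothetical convenience variable).
KUBECTL=${KUBECTL:-kubectl}

# Block until every pod in the namespace is Ready, or the timeout expires.
# Usage: wait_for_pods [namespace] [timeout]
wait_for_pods() {
  ns=${1:-modelmesh-serving}
  timeout=${2:-300s}
  "$KUBECTL" wait --for=condition=Ready pods --all \
    --namespace "$ns" --timeout="$timeout"
}
```

Calling `wait_for_pods` with no arguments waits on the modelmesh-serving namespace used in this guide.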
Check that the ServingRuntimes are available:
kubectl get servingruntimes
NAME           DISABLED   MODELTYPE    CONTAINERS   AGE
mlserver-0.x              sklearn      mlserver     5m
triton-2.x                tensorflow   triton       5m
ServingRuntimes are automatically provisioned based on the framework of the model deployed.
Two ServingRuntimes are included with ModelMesh Serving by default. The current mappings for these are:
ServingRuntime | Supported Frameworks |
---|---|
triton-2.x | tensorflow, pytorch, onnx, tensorrt |
mlserver-0.x | sklearn, xgboost, lightgbm |
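The mapping in the table can be expressed as a small lookup, which may be handy in deployment scripts that need to know which runtime pods to expect. This sketch only mirrors the table above; `runtime_for_framework` is a hypothetical helper, and the actual runtime selection is performed by ModelMesh Serving itself.

```shell
# Print the default ServingRuntime for a model framework, per the table above.
# Returns 1 for frameworks that the table does not list.
# runtime_for_framework is a hypothetical helper for illustration only.
runtime_for_framework() {
  case "$1" in
    tensorflow|pytorch|onnx|tensorrt) echo "triton-2.x" ;;
    sklearn|xgboost|lightgbm)         echo "mlserver-0.x" ;;
    *) echo "no default runtime listed for: $1" >&2; return 1 ;;
  esac
}
```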
With ModelMesh Serving now installed, try deploying a model using the Predictor CRD.
Note: ModelMesh Serving also supports deployment using KServe's InferenceService interface. Please refer to these instructions for information on alternatively using InferenceServices.
Here, we deploy an SKLearn MNIST model which is served from the local MinIO container:
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: Predictor
metadata:
  name: example-mnist-predictor
spec:
  modelType:
    name: sklearn
  path: sklearn/mnist-svm.joblib
  storage:
    s3:
      secretKey: localMinIO
EOF
After applying this predictor, you should see it in the Loading state:
kubectl get predictors
NAME                      TYPE      AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
example-mnist-predictor   sklearn   false       Loading                     UpToDate     7s
Eventually, you should see the ServingRuntime pods that will hold the SKLearn model become Running.
kubectl get pods
...
modelmesh-serving-mlserver-0.x-7db675f677-twrwd 3/3 Running 0 2m
modelmesh-serving-mlserver-0.x-7db675f677-xvd8q 3/3 Running 0 2m
Then check the predictors again. The example predictor should now be available:
kubectl get predictors
NAME                      TYPE      AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
example-mnist-predictor   sklearn   true        Loaded                      UpToDate     2m
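Because loading takes a little while, a script may want to poll until the predictor becomes available instead of checking by hand. The sketch below uses two hypothetical helpers, `retry_until` and `predictor_available`. It assumes that the AVAILABLE column shown above is backed by the `.status.available` field of the Predictor resource; verify this against your installed CRD before relying on it.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry_until <max_attempts> <sleep_seconds> <command> [args...]
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Succeed only when the named Predictor reports available=true.
# Assumption: the status field is .status.available.
predictor_available() {
  state=$(kubectl get predictor "$1" -o jsonpath='{.status.available}' 2>/dev/null)
  [ "$state" = "true" ]
}
```

For example, `retry_until 60 5 predictor_available example-mnist-predictor` would check every 5 seconds, up to 60 times.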
To see more detailed instructions and information, click here.
Now that a model is loaded and available, you can perform inference. ModelMesh itself currently supports only gRPC inference requests; REST support is provided by a REST proxy container. By default, ModelMesh Serving uses a headless Service, because a normal Service has issues load balancing gRPC requests. See more info here.
To test out gRPC inference requests, you can port-forward the headless service in a separate terminal window:
kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8033 -n modelmesh-serving
Then a gRPC client generated from the KFServing grpc_predict_v2.proto file can be used with localhost:8033. A ready-to-use Python example of this can be found here.
Alternatively, you can test inference with grpcurl. On macOS, it can be installed with brew install grpcurl.
With grpcurl, a request can be sent to the SKLearn MNIST model as follows. Run the command from the root of the repository, since the -proto path is relative, and make sure that the MODEL_NAME variable is set to the name of your Predictor/InferenceService.
MODEL_NAME=example-mnist-predictor
grpcurl \
-plaintext \
-proto fvt/proto/kfs_inference_v2.proto \
-d '{ "model_name": "'"${MODEL_NAME}"'", "inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "contents": { "fp32_contents": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0] }}]}' \
localhost:8033 \
inference.GRPCInferenceService.ModelInfer
This should give you output like the following:
{
  "modelName": "example-mnist-predictor__ksp-7702c1b55a",
  "outputs": [
    {
      "name": "predict",
      "datatype": "FP32",
      "shape": ["1"],
      "contents": {
        "fp32Contents": [8]
      }
    }
  ]
}
Note: The REST proxy is currently in an alpha state and may still have issues with certain usage scenarios.
REST requests use a different port than gRPC, so port-forward port 8008 instead:
kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8008 -n modelmesh-serving
With curl, a request can be sent to the SKLearn MNIST model like the following. Make sure that the MODEL_NAME variable below is set to the name of your Predictor/InferenceService.
MODEL_NAME=example-mnist-predictor
curl -X POST -k http://localhost:8008/v2/models/${MODEL_NAME}/infer -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0]}]}'
This should give you a response like the following:
{
  "model_name": "example-mnist-predictor__ksp-7702c1b55a",
  "outputs": [
    {
      "name": "predict",
      "datatype": "FP32",
      "shape": [1],
      "data": [8]
    }
  ]
}
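The REST request body used above has a simple structure: one named input whose shape must match the number of values in its data array. As an illustration, the body can be assembled from a list of values so that the shape always agrees with the data. `build_infer_payload` is a hypothetical helper, and it covers only the single-input, [1, N] FP32 case shown in this guide.

```shell
# Build a v2 REST inference request body for a single [1, N] FP32 input.
# Usage: build_infer_payload <input_name> <value>...
# build_infer_payload is a hypothetical helper for illustration only.
build_infer_payload() {
  name=$1
  shift
  count=$#
  data=""
  for v in "$@"; do
    # Append each value, comma-separating all but the first.
    data="${data:+$data, }$v"
  done
  printf '{"inputs": [{"name": "%s", "shape": [1, %d], "datatype": "FP32", "data": [%s]}]}\n' \
    "$name" "$count" "$data"
}
```

The result can be passed to curl with `-d "$(build_infer_payload predict 0.0 0.0 1.0 ...)"`, supplying all 64 pixel values for the MNIST example.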
To see more detailed instructions and information, click here.
To delete all ModelMesh Serving resources that were installed, run the following from the root of the project:
./scripts/delete.sh --namespace modelmesh-serving