To quickly get started using ModelMesh Serving, here is a brief guide. You will need:
- A Kubernetes cluster (v1.16+) with cluster administrative privileges
- kubectl and kustomize (v4.0.0+); a quick version check is sketched below this list
- At least 4 vCPUs and 8 GB of memory. For more details, please see here.
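Before installing, you may want to confirm the client tools are present and recent enough; a minimal check, assuming kubectl and kustomize are already on your PATH:

```shell
# Print client tool versions to confirm the prerequisites above
kubectl version --client
kustomize version
```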
Clone the repository at the desired release branch and run the install script:
RELEASE=release-0.9
git clone -b $RELEASE --depth 1 --single-branch https://github.com/kserve/modelmesh-serving.git
cd modelmesh-serving
kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart
This will install ModelMesh Serving in the modelmesh-serving namespace, along with an etcd instance and a MinIO instance. After the script completes, you should see a "Successfully installed ModelMesh Serving!" message.
Note: These etcd and MinIO deployments are intended for development/experimentation and not for production.
To see more details about installation, click here.
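If you want to block until the controller rollout has finished before checking anything else, you can wait on its Deployment; the name modelmesh-controller is inferred from the pod listing below:

```shell
# Wait for the ModelMesh controller Deployment to finish rolling out
kubectl rollout status deployment/modelmesh-controller -n modelmesh-serving --timeout=180s
```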
Check that the pods are running:
kubectl get pods
NAME READY STATUS RESTARTS AGE
pod/etcd 1/1 Running 0 5m
pod/minio 1/1 Running 0 5m
pod/modelmesh-controller-547bfb64dc-mrgrq 1/1 Running 0 5m
Check that the ServingRuntimes are available:
kubectl get servingruntimes
NAME DISABLED MODELTYPE CONTAINERS AGE
mlserver-0.x sklearn mlserver 5m
triton-2.x tensorflow triton 5m
ServingRuntimes are automatically provisioned based on the framework of the model deployed. Two ServingRuntimes are included with ModelMesh Serving by default. The current mappings for these are:
| ServingRuntime | Supported Frameworks                |
| -------------- | ----------------------------------- |
| triton-2.x     | tensorflow, pytorch, onnx, tensorrt |
| mlserver-0.x   | sklearn, xgboost, lightgbm          |
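This mapping lives in each ServingRuntime's spec, so you can also read it directly from the cluster. A sketch using custom columns; the supportedModelFormats field name is taken from the ServingRuntime CRD and is worth verifying against your version:

```shell
# List each runtime alongside the model formats it declares support for
kubectl get servingruntimes -n modelmesh-serving \
  -o custom-columns=NAME:.metadata.name,FORMATS:.spec.supportedModelFormats[*].name
```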
With ModelMesh Serving now installed, try deploying a model using the KServe InferenceService CRD.
Note: While both the KServe controller and the ModelMesh controller reconcile InferenceService resources, the ModelMesh controller will only handle those InferenceServices that carry the serving.kserve.io/deploymentMode: ModelMesh annotation; otherwise, the KServe controller handles reconciliation. Likewise, the KServe controller will not reconcile an InferenceService with the serving.kserve.io/deploymentMode: ModelMesh annotation, and will defer under the assumption that the ModelMesh controller will handle it.
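Once you have applied the example below, you can confirm which controller will pick it up by looking for this annotation on the resource; a minimal check:

```shell
# Show the deploymentMode annotation on the example InferenceService
kubectl get isvc example-sklearn-isvc -n modelmesh-serving -o yaml | grep deploymentMode
```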
Here, we deploy an SKLearn MNIST model which is served from the local MinIO container:
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storage:
        key: localMinIO
        path: sklearn/mnist-svm.joblib
EOF
Note: The above YAML uses the InferenceService predictor storage spec. You can also continue using the storageUri field in lieu of the storage spec:
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    serving.kserve.io/secretKey: localMinIO
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
EOF
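In both forms, localMinIO refers to an entry in the common storage-config secret that the quickstart's MinIO setup provides; the secret name and key here are assumptions based on the standard quickstart install, so adjust if yours differs. To inspect that entry:

```shell
# Decode the localMinIO entry of the storage-config secret used above
# (use "base64 -D" instead of "base64 --decode" on older macOS)
kubectl get secret storage-config -n modelmesh-serving \
  -o jsonpath='{.data.localMinIO}' | base64 --decode; echo
```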
After applying this InferenceService, you should see that it is likely not yet ready.
kubectl get isvc
NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE
example-sklearn-isvc False 3s
Eventually, you should see the ServingRuntime pods that will hold the SKLearn model become Running.
kubectl get pods
...
modelmesh-serving-mlserver-0.x-7db675f677-twrwd 3/3 Running 0 2m
modelmesh-serving-mlserver-0.x-7db675f677-xvd8q 3/3 Running 0 2m
Then, checking on the InferenceServices again, you should see that the one we deployed is now ready with a provided URL:
kubectl get isvc
NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE
example-sklearn-isvc grpc://modelmesh-serving.modelmesh-serving:8033 True 97s
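If you are scripting this, you can also block until the InferenceService reports ready instead of polling; a sketch using kubectl wait on the Ready condition:

```shell
# Wait (up to 5 minutes) for the InferenceService to report Ready
kubectl wait --for=condition=Ready isvc/example-sklearn-isvc -n modelmesh-serving --timeout=300s
```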
You can describe the InferenceService to get more status information:
kubectl describe isvc example-sklearn-isvc
Name: example-sklearn-isvc
...
Status:
  Components:
    Predictor:
      Grpc URL:  grpc://modelmesh-serving.modelmesh-serving:8033
      Rest URL:  http://modelmesh-serving.modelmesh-serving:8008
      URL:       grpc://modelmesh-serving.modelmesh-serving:8033
  Conditions:
    Last Transition Time:  2022-07-18T18:01:54Z
    Status:                True
    Type:                  PredictorReady
    Last Transition Time:  2022-07-18T18:01:54Z
    Status:                True
    Type:                  Ready
  Model Status:
    Copies:
      Failed Copies:  0
      Total Copies:   2
    States:
      Active Model State:  Loaded
      Target Model State:
    Transition Status:  UpToDate
  URL:  grpc://modelmesh-serving.modelmesh-serving:8033
...
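For scripting, individual status fields can be pulled out with jsonpath rather than parsing the describe output. For example, the active model state shown above; the field path mirrors the status structure and is worth double-checking against your KServe version:

```shell
# Print just the active model state of the example InferenceService
kubectl get isvc example-sklearn-isvc -n modelmesh-serving \
  -o jsonpath='{.status.modelStatus.states.activeModelState}'; echo
```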
To see more detailed instructions and information, click here.
Now that a model is loaded and available, you can perform inference. Currently, only gRPC inference requests are supported by ModelMesh, but REST support is enabled via a REST proxy container. By default, ModelMesh Serving uses a headless Service, since a normal Service has issues load balancing gRPC requests. See more info here.
To test out gRPC inference requests, you can port-forward the headless service in a separate terminal window:
kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8033 -n modelmesh-serving
Then a gRPC client generated from the KServe grpc_predict_v2.proto file can be used with localhost:8033. A ready-to-use Python example of this can be found here.
Alternatively, you can test inference with grpcurl, which can easily be installed with brew install grpcurl on macOS.
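If you are not on macOS, grpcurl can also be installed with a Go toolchain (one common route; pre-built binaries are available from its releases page as well):

```shell
# Install grpcurl into $GOPATH/bin (or $HOME/go/bin); requires Go 1.17+
go install github.com/fullstorydev/grpcurl/cmd/grpcurl@latest
```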
With grpcurl, a request can be sent to the SKLearn MNIST model like the following. Make sure that the MODEL_NAME variable below is set to the name of your InferenceService.
MODEL_NAME=example-sklearn-isvc
grpcurl \
-plaintext \
-proto fvt/proto/kfs_inference_v2.proto \
-d '{ "model_name": "'"${MODEL_NAME}"'", "inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "contents": { "fp32_contents": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0] }}]}' \
localhost:8033 \
inference.GRPCInferenceService.ModelInfer
This should give you output like the following:
{
  "modelName": "example-sklearn-isvc__isvc-3642375d03",
  "outputs": [
    {
      "name": "predict",
      "datatype": "INT64",
      "shape": ["1"],
      "contents": {
        "int64Contents": ["8"]
      }
    }
  ]
}
Note: The REST proxy is currently in an alpha state and may still have issues with certain usage scenarios.
You will need to port-forward a different port for REST.
kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8008 -n modelmesh-serving
With curl, a request can be sent to the SKLearn MNIST model like the following. Make sure that the MODEL_NAME variable below is set to the name of your InferenceService.
MODEL_NAME=example-sklearn-isvc
curl -X POST -k http://localhost:8008/v2/models/${MODEL_NAME}/infer -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0]}]}'
This should give you a response like the following:
{
  "model_name": "example-sklearn-isvc__ksp-7702c1b55a",
  "outputs": [
    {
      "name": "predict",
      "datatype": "FP32",
      "shape": [1],
      "data": [8]
    }
  ]
}
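The REST proxy follows the KServe V2 inference protocol, so you should also be able to query model metadata over the same port-forward; a sketch, assuming the proxy exposes the standard V2 metadata endpoint:

```shell
# Fetch V2 model metadata (name, inputs/outputs) for the deployed model
curl -s http://localhost:8008/v2/models/${MODEL_NAME}
```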
To see more detailed instructions and information, click here.
To delete all ModelMesh Serving resources that were installed, run the following from the root of the project:
./scripts/delete.sh --namespace modelmesh-serving
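If the modelmesh-serving namespace was created only for this quickstart, you can remove it afterwards as well:

```shell
# Remove the namespace created earlier for this quickstart
kubectl delete namespace modelmesh-serving
```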