-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Add PVC storage support #267
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: chinhuang007 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007, please see inline comments.
controllers/modelmesh/runtime.go
Outdated
volumeMounts = append([]corev1.VolumeMount{ | ||
{ | ||
Name: pvcName, | ||
MountPath: PVCRootDir + "/" + pvcName, | ||
ReadOnly: true, | ||
}, | ||
}, volumeMounts...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be simplified:
volumeMounts = append([]corev1.VolumeMount{ | |
{ | |
Name: pvcName, | |
MountPath: PVCRootDir + "/" + pvcName, | |
ReadOnly: true, | |
}, | |
}, volumeMounts...) | |
volumeMounts = append(volumeMounts, corev1.VolumeMount{ | |
Name: pvcName, | |
MountPath: PVCRootDir + "/" + pvcName, | |
ReadOnly: true, | |
}) |
controllers/modelmesh/modelmesh.go
Outdated
@@ -53,6 +53,7 @@ type Deployment struct { | |||
RESTProxyImage string | |||
RESTProxyResources *corev1.ResourceRequirements | |||
RESTProxyPort uint16 | |||
PVCs map[string]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could just be a []string
?
controllers/predictor_controller.go
Outdated
s := &corev1.Secret{} | ||
if err := pr.Client.Get(context.TODO(), types.NamespacedName{ | ||
Name: modelmesh.StorageSecretName, | ||
Namespace: predictor.Namespace, | ||
}, s); err != nil { | ||
return "Could not get the storage secret" | ||
} | ||
var storageConfig map[string]string | ||
for _, storageData := range s.Data { | ||
if err := json.Unmarshal(storageData, &storageConfig); err != nil { | ||
return "Could not parse storage configuration json" | ||
} | ||
if storageConfig["type"] == StoragePVCType && storageConfig["name"] == storage.PersistentVolumeClaim.ClaimName { | ||
return "" | ||
} | ||
} | ||
|
||
return "PersistentVolumeClaim name is not in the storage config secret" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want this here. That field in the CRD is deprecated, we decided against "hard coded" storage types so it will just be included as the "type" parameter in the StorageSpec.
We could change the returned error message however to indicate this e.g. "spec.storage.PersistentVolumeClaim is not supported, use 'pvc' type storage in the storageSpec instead"
controllers/modelmesh/puller.go
Outdated
{ | ||
Name: PVCMount, | ||
MountPath: PullerPVCPath, | ||
ReadOnly: true, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, we should add all the same PVC mounts here as done for the other containers below. Will need to add a new arg to this func and the one that calls it to pass through the list of PVCs.
controllers/modelmesh/const.go
Outdated
@@ -29,6 +29,7 @@ const ( | |||
SocketVolume = "domain-socket" | |||
|
|||
ConfigStorageMount = "storage-config" | |||
PVCMount = "pvc-mount" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed.
main.go
Outdated
// default is true | ||
envVarVal, _ := os.LookupEnv(envVar) | ||
if envVarVal != FalseString { | ||
err = cl.Get(context.Background(), client.ObjectKey{Name: "foo", Namespace: ControllerNamespace}, resourceObject) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be best to just check doing a Get
on a secret called storage-config
if parameters["type"] == pvcStorageType { | ||
p.Spec.Storage.PersistentVolumeClaim = &corev1.PersistentVolumeClaimVolumeSource{} | ||
p.Spec.Storage.PersistentVolumeClaim.ClaimName = parameters["name"] | ||
p.Spec.Storage.PersistentVolumeClaim.ReadOnly = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this per previous comment
case pvcStorageType: | ||
modelPath = strings.TrimPrefix(u.Path, "/") | ||
uriParameters["type"] = pvcStorageType | ||
uriParameters["name"] = u.Host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -39,6 +40,7 @@ const ( | |||
runtimeAnnotation = "serving.kserve.io/servingRuntime" | |||
|
|||
azureBlobHostSuffix = "blob.core.windows.net" | |||
pvcStorageType = "pvc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a constant is needed here, best to keep it consistent with the others.
@@ -623,3 +676,16 @@ func (r *ServingRuntimeReconciler) clusterServingRuntimeRequests(csr *kserveapi. | |||
|
|||
return requests | |||
} | |||
|
|||
func (r *ServingRuntimeReconciler) secretRequests(secret *corev1.Secret, namespace string) []reconcile.Request { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to storageSecretRequests
? (otherwise a bit generic)
Also don't need a separate namespace
arg, can just get it from the secret inside this func.
457c64e
to
ea63c61
Compare
@njhill thanks for the detailed review as always! Just made a few changes so please take a look again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007 it's getting closer :)
One of the things is nontrivial and I'll aim to add some more detail for that.
} | ||
// add pvcs for predictors when the global config is enabled | ||
if r.DeployPVCForPredictor { | ||
list := &v1beta1.InferenceServiceList{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need to change this per prior comment, to support different predictor "sources" (could be predictor or inferencservice). I will aim to write an example of what I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 here is what I was thinking of, sorry again for the delay:
if cfg.AllowAnyPVC {
restProxyEnabled := cfg.RESTProxy.Enabled
f := func(p *api.Predictor) bool {
if runtimeSupportsPredictor(rt, p, restProxyEnabled, req.Name) &&
p.Spec.Storage != nil && p.Spec.Storage.Parameters != nil {
params := *p.Spec.Storage.Parameters
if stype := params["type"]; stype == "pvc" {
if name := params["name"]; name != "" {
pvcsMap[name] = struct{}{}
}
}
}
return false
}
for _, pr := range r.RegistryMap {
if found, err := pr.Find(ctx, req.Namespace, f); found || err != nil {
return nil, err
}
}
}
controllers/predictor_controller.go
Outdated
if err := json.Unmarshal(storageData, &storageConfig); err != nil { | ||
return "Could not parse storage configuration json" | ||
} | ||
if storageConfig["type"] == StoragePVCType && storageConfig["name"] == parameters["name"] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check for the existence of the "name" key in both cases... with a wrong configuration this could panic.
controllers/predictor_controller.go
Outdated
@@ -239,6 +240,33 @@ func validatePredictor(predictor *api.Predictor) string { | |||
return "Only one of spec.schemaPath and spec.storage.schemaPath can be specified" | |||
} | |||
|
|||
if storage.StorageSpec.Parameters != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we can probably omit this whole section. If the PVC isn't present it will cause the predictor load to fail (with an informative message from the puller).
main.go
Outdated
ClusterScope: clusterScopeMode, | ||
EnableCSRWatch: enableCSRWatch, | ||
EnableSecretWatch: enableSecretWatch, | ||
DeployPVCForPredictor: conf.EnableDeployPVCForPredictor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the value here means it won't be dynamic. We can just grab it from the config where we use it.
} | ||
} | ||
// add pvcs for predictors when the global config is enabled | ||
if r.DeployPVCForPredictor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per other comment this doesn't need to be a reconciler field, just pass cfg
into this function and get the value from there.
pkg/config/config.go
Outdated
ModelMeshEndpoint string // For dev use only | ||
EtcdSecretName string // DEPRECATED - should be removed in the future | ||
ModelMeshEndpoint string // For dev use only | ||
EnableDeployPVCForPredictor bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should name this differently, how about AllowAnyPVC
?
We should also add to the docs ... the config params table and also in the storage part of the docs explain how this parameter works.
cdf26b6
to
080d438
Compare
if _, exists := storageConfig["name"]; !exists { | ||
return nil, fmt.Errorf("Missing PVC name in storage configuration") | ||
} | ||
pvcsMap[storageConfig["name"]] = struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can avoid double lookup: Edit: ignore this, see follow-on comment
if _, exists := storageConfig["name"]; !exists { | |
return nil, fmt.Errorf("Missing PVC name in storage configuration") | |
} | |
pvcsMap[storageConfig["name"]] = struct{}{} | |
if name, exists := storageConfig["name"]; exists { | |
pvcsMap[name] = struct{}{} | |
} else { | |
return nil, fmt.Errorf("Missing PVC name in storage configuration") | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually on second thoughts, simplest would just be this. I also think we shouldn't return a failure for this, best to just log a warning.
if _, exists := storageConfig["name"]; !exists { | |
return nil, fmt.Errorf("Missing PVC name in storage configuration") | |
} | |
pvcsMap[storageConfig["name"]] = struct{}{} | |
if name := storageConfig["name"]; name != "" { | |
pvcsMap[name] = struct{}{} | |
} else { | |
// Log warning here | |
} |
} | ||
} | ||
// add pvcs for predictors when the global config is enabled | ||
if r.ConfigProvider.GetConfig().AllowAnyPVC { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that we use a consistent snapshot of the config for the whole reconciliation, it would be better to pass the already retrieved config struct into this function instead of getting via the provider. Can just add a cfg
arg to getPVCs()
.
e7d58bf
to
dc49a3b
Compare
if p != nil && predicate(p) { | ||
inferenceService := &list.Items[i] | ||
nname := types.NamespacedName{Name: inferenceService.Name, Namespace: inferenceService.Namespace} | ||
p, err := BuildBasePredictorFromInferenceService(inferenceService) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 we should probably check for p == nil
here like before.
dc49a3b
to
75c0303
Compare
Add PVC support according to the design and discussions captured in the issue, kserve#230 Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
Signed-off-by: Chin Huang <[email protected]>
75c0303
to
537344b
Compare
s := &corev1.Secret{} | ||
if err := r.Client.Get(ctx, types.NamespacedName{ | ||
Name: modelmesh.StorageSecretName, | ||
Namespace: req.Namespace, | ||
}, s); err != nil { | ||
return nil, fmt.Errorf("Could not get the storage secret: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use PVC, why do we need to check StorageSecretName The default "storage-config" secret usually has credentials for accessing cloud storage such as S3. However, PVC does not need credentials to attach to the pod.
With this code block, the controller keeps checking if the secret exists in a namespace. So although it is not needed, it must exist in the namespace to deploy an inference pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes back to some of the design decisions for the implementation, i.e. ...
The controller will watch the storage config secret in each namespace and keep the pvcs mounted to runtime deployments based on any pvc type entries
Predictors can always reference one of these PVCs for storage
The entry type will be checked and the name used to create volumes and volume mounts for runtime deployments, while the keys are irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so even though users want to use PVC only, creating storage-config
Secret with dummy data is required, am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use the allowAnyPVC
config parameter, and don't want to deploy the minio-examples image, I suppose you would not need it
Certainly it should not be required to create it with dummy data, so I think we should tolerate it not being there. @njhill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ckadner yes as discussed offline we should change it so that in that mode the storage-config secret isn't required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007 @ckadner!
/lgtm |
Related: kserve#230, kserve#267 Signed-off-by: Christian Kadner <[email protected]>
Related: kserve#230, kserve#267 Signed-off-by: Christian Kadner <[email protected]>
* Add PVC support Add PVC support according to the design and discussions captured in the issue, kserve#230 Signed-off-by: Chin Huang <[email protected]> * add predictor controller login Signed-off-by: Chin Huang <[email protected]> * code restructure, cleanup based on review Signed-off-by: Chin Huang <[email protected]> * fix addPullerSidecar to include all pvcs Signed-off-by: Chin Huang <[email protected]> * restructure and simplify code, use global configmap rather than env var Signed-off-by: Chin Huang <[email protected]> * make AllowAnyPVC dynamic, update docs Signed-off-by: Chin Huang <[email protected]> * add runtimeSupportsPredictor check Signed-off-by: Chin Huang <[email protected]> * use PredictorRegistry and add storage to find() Signed-off-by: Chin Huang <[email protected]> --------- Signed-off-by: Chin Huang <[email protected]>
Motivation Address PVC follow-up work items outlined in #337 for PVC storage introduced in #267 Modifications Code changes: - Sort PVC mounts on serving runtime specs to avoid unstable repeated runtime rollouts as Kubernetes treat two otherwise identical deployment specs as different if the same set of volume mounts are in different order - Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is enabled as this would cause all serving pods for that runtime to stay in - Pending state with unbound (pending) volumes - Tolerate missing storage-config secret when allowAnyPVC is enabled - Lint: fix "io/ioutil" deprecations FVT changes: - Add Storage test suite - Add helper methods to add PVC to storage-config during FVT - Allow for additional time in WaitForReadyDeployStatus but allow early abort on success - Check if pod still running before gRPC/REST requests, reconnect if necessary - Only choose "Ready" runtime pod for port-forwards - Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests Resolves #337 Signed-off-by: Christian Kadner <[email protected]>
Motivation Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267 Modifications Code changes: - Sort PVC mounts on serving runtime specs to avoid unstable repeated runtime rollouts as Kubernetes treat two otherwise identical deployment specs as different if the same set of volume mounts are in different order - Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is enabled as this would cause all serving pods for that runtime to stay in - Pending state with unbound (pending) volumes - Tolerate missing storage-config secret when allowAnyPVC is enabled - Lint: fix "io/ioutil" deprecations FVT changes: - Add Storage test suite - Add helper methods to add PVC to storage-config during FVT - Allow for additional time in WaitForReadyDeployStatus but allow early abort on success - Check if pod still running before gRPC/REST requests, reconnect if necessary - Only choose "Ready" runtime pod for port-forwards - Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests Resolves kserve#337 Signed-off-by: Christian Kadner <[email protected]>
Motivation Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267 Modifications Code changes: - Sort PVC mounts on serving runtime specs to avoid unstable repeated runtime rollouts as Kubernetes treat two otherwise identical deployment specs as different if the same set of volume mounts are in different order - Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is enabled as this would cause all serving pods for that runtime to stay in - Pending state with unbound (pending) volumes - Tolerate missing storage-config secret when allowAnyPVC is enabled - Lint: fix "io/ioutil" deprecations FVT changes: - Add Storage test suite - Add helper methods to add PVC to storage-config during FVT - Allow for additional time in WaitForReadyDeployStatus but allow early abort on success - Check if pod still running before gRPC/REST requests, reconnect if necessary - Only choose "Ready" runtime pod for port-forwards - Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests Resolves kserve#337 Signed-off-by: Christian Kadner <[email protected]>
Motivation Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267 Modifications Code changes: - Sort PVC mounts on serving runtime specs to avoid unstable repeated runtime rollouts as Kubernetes treat two otherwise identical deployment specs as different if the same set of volume mounts are in different order - Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is enabled as this would cause all serving pods for that runtime to stay in - Pending state with unbound (pending) volumes - Tolerate missing storage-config secret when allowAnyPVC is enabled - Lint: fix "io/ioutil" deprecations FVT changes: - Add Storage test suite - Add helper methods to add PVC to storage-config during FVT - Allow for additional time in WaitForReadyDeployStatus but allow early abort on success - Check if pod still running before gRPC/REST requests, reconnect if necessary - Only choose "Ready" runtime pod for port-forwards - Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests Resolves kserve#337 Signed-off-by: Christian Kadner <[email protected]>
Motivation
Support PVC as a storage option, to address issue #230
Modifications
Result
Related Issues
Resolves #230