Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Add PVC storage support #267

Merged
merged 8 commits into from
Mar 1, 2023
Merged

Feat: Add PVC storage support #267

merged 8 commits into from
Mar 1, 2023

Conversation

chinhuang007
Copy link
Contributor

@chinhuang007 chinhuang007 commented Oct 24, 2022

Motivation

Support PVC as a storage option, to address issue #230

Modifications

  • Add PVCs to deployment
  • Add secret watch to ServingRuntime controller to update deployment
  • Add logic to ServingRuntime controller to create PVCs for predictors if the global setting is true
  • Create volumes and PVCs based on deployment for ServingRuntimes
  • Update Predictor controller to validate for PVCs

Result

  • PVCs are read from the storage secret and created for all ServingRuntimes
  • PVCs are read from the predictor spec and created for all ServingRuntimes if the global setting is true

Related Issues


Resolves #230

@kserve-oss-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chinhuang007
To complete the pull request process, please assign njhill after the PR has been reviewed.
You can assign the PR to them by writing /assign @njhill in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chinhuang007, please see inline comments.

Comment on lines 144 to 150
volumeMounts = append([]corev1.VolumeMount{
{
Name: pvcName,
MountPath: PVCRootDir + "/" + pvcName,
ReadOnly: true,
},
}, volumeMounts...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simplified:

Suggested change
volumeMounts = append([]corev1.VolumeMount{
{
Name: pvcName,
MountPath: PVCRootDir + "/" + pvcName,
ReadOnly: true,
},
}, volumeMounts...)
volumeMounts = append(volumeMounts, corev1.VolumeMount{
Name: pvcName,
MountPath: PVCRootDir + "/" + pvcName,
ReadOnly: true,
})

@@ -53,6 +53,7 @@ type Deployment struct {
RESTProxyImage string
RESTProxyResources *corev1.ResourceRequirements
RESTProxyPort uint16
PVCs map[string]struct{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could just be a []string?

Comment on lines 245 to 262
s := &corev1.Secret{}
if err := pr.Client.Get(context.TODO(), types.NamespacedName{
Name: modelmesh.StorageSecretName,
Namespace: predictor.Namespace,
}, s); err != nil {
return "Could not get the storage secret"
}
var storageConfig map[string]string
for _, storageData := range s.Data {
if err := json.Unmarshal(storageData, &storageConfig); err != nil {
return "Could not parse storage configuration json"
}
if storageConfig["type"] == StoragePVCType && storageConfig["name"] == storage.PersistentVolumeClaim.ClaimName {
return ""
}
}

return "PersistentVolumeClaim name is not in the storage config secret"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want this here. That field in the CRD is deprecated, we decided against "hard coded" storage types so it will just be included as the "type" parameter in the StorageSpec.

We could change the returned error message however to indicate this e.g. "spec.storage.PersistentVolumeClaim is not supported, use 'pvc' type storage in the storageSpec instead"

Comment on lines 95 to 99
{
Name: PVCMount,
MountPath: PullerPVCPath,
ReadOnly: true,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we should add all the same PVC mounts here as done for the other containers below. Will need to add a new arg to this func and the one that calls it to pass through the list of PVCs.

@@ -29,6 +29,7 @@ const (
SocketVolume = "domain-socket"

ConfigStorageMount = "storage-config"
PVCMount = "pvc-mount"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed.

main.go Outdated
// default is true
envVarVal, _ := os.LookupEnv(envVar)
if envVarVal != FalseString {
err = cl.Get(context.Background(), client.ObjectKey{Name: "foo", Namespace: ControllerNamespace}, resourceObject)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be best to just check doing a Get on a secret called storage-config

Comment on lines 327 to 331
if parameters["type"] == pvcStorageType {
p.Spec.Storage.PersistentVolumeClaim = &corev1.PersistentVolumeClaimVolumeSource{}
p.Spec.Storage.PersistentVolumeClaim.ClaimName = parameters["name"]
p.Spec.Storage.PersistentVolumeClaim.ReadOnly = true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this per previous comment

Comment on lines 167 to 168
case pvcStorageType:
modelPath = strings.TrimPrefix(u.Path, "/")
uriParameters["type"] = pvcStorageType
uriParameters["name"] = u.Host
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -39,6 +40,7 @@ const (
runtimeAnnotation = "serving.kserve.io/servingRuntime"

azureBlobHostSuffix = "blob.core.windows.net"
pvcStorageType = "pvc"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a constant is needed here, best to keep it consistent with the others.

@@ -623,3 +676,16 @@ func (r *ServingRuntimeReconciler) clusterServingRuntimeRequests(csr *kserveapi.

return requests
}

func (r *ServingRuntimeReconciler) secretRequests(secret *corev1.Secret, namespace string) []reconcile.Request {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to storageSecretRequests? (otherwise a bit generic)

Also don't need a separate namespace arg, can just get it from the secret inside this func.

@njhill njhill changed the title (WIP) Add pvc Feat: Add PVC storage support Nov 12, 2022
@chinhuang007 chinhuang007 force-pushed the add-pvc branch 6 times, most recently from 457c64e to ea63c61 Compare November 18, 2022 21:24
@chinhuang007 chinhuang007 marked this pull request as ready for review November 19, 2022 00:52
@chinhuang007
Copy link
Contributor Author

@njhill thanks for the detailed review as always! Just made a few changes so please take a look again.

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chinhuang007 it's getting closer :)

One of the things is nontrivial and I'll aim to add some more detail for that.

}
// add pvcs for predictors when the global config is enabled
if r.DeployPVCForPredictor {
list := &v1beta1.InferenceServiceList{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need to change this per prior comment, to support different predictor "sources" (could be predictor or inferencservice). I will aim to write an example of what I mean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinhuang007 here is what I was thinking of, sorry again for the delay:

	if cfg.AllowAnyPVC {
		restProxyEnabled := cfg.RESTProxy.Enabled
		f := func(p *api.Predictor) bool {
			if runtimeSupportsPredictor(rt, p, restProxyEnabled, req.Name) &&
				p.Spec.Storage != nil && p.Spec.Storage.Parameters != nil {
				
				params := *p.Spec.Storage.Parameters
				if stype := params["type"]; stype == "pvc" {
					if name := params["name"]; name != "" {
						pvcsMap[name] = struct{}{}
					}
				}
			}
			return false
		}

		for _, pr := range r.RegistryMap {
			if found, err := pr.Find(ctx, req.Namespace, f); found || err != nil {
				return nil, err
			}
		}
	}

if err := json.Unmarshal(storageData, &storageConfig); err != nil {
return "Could not parse storage configuration json"
}
if storageConfig["type"] == StoragePVCType && storageConfig["name"] == parameters["name"] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check for the existence of the "name" key in both cases... with a wrong configuration this could panic.

@@ -239,6 +240,33 @@ func validatePredictor(predictor *api.Predictor) string {
return "Only one of spec.schemaPath and spec.storage.schemaPath can be specified"
}

if storage.StorageSpec.Parameters != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we can probably omit this whole section. If the PVC isn't present it will cause the predictor load to fail (with an informative message from the puller).

main.go Outdated
ClusterScope: clusterScopeMode,
EnableCSRWatch: enableCSRWatch,
EnableSecretWatch: enableSecretWatch,
DeployPVCForPredictor: conf.EnableDeployPVCForPredictor,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the value here means it won't be dynamic. We can just grab it from the config where we use it.

}
}
// add pvcs for predictors when the global config is enabled
if r.DeployPVCForPredictor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per other comment this doesn't need to be a reconciler field, just pass cfg into this function and get the value from there.

ModelMeshEndpoint string // For dev use only
EtcdSecretName string // DEPRECATED - should be removed in the future
ModelMeshEndpoint string // For dev use only
EnableDeployPVCForPredictor bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should name this differently, how about AllowAnyPVC?

We should also add to the docs ... the config params table and also in the storage part of the docs explain how this parameter works.

Comment on lines 318 to 321
if _, exists := storageConfig["name"]; !exists {
return nil, fmt.Errorf("Missing PVC name in storage configuration")
}
pvcsMap[storageConfig["name"]] = struct{}{}
Copy link
Member

@njhill njhill Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can avoid double lookup: Edit: ignore this, see follow-on comment

Suggested change
if _, exists := storageConfig["name"]; !exists {
return nil, fmt.Errorf("Missing PVC name in storage configuration")
}
pvcsMap[storageConfig["name"]] = struct{}{}
if name, exists := storageConfig["name"]; exists {
pvcsMap[name] = struct{}{}
} else {
return nil, fmt.Errorf("Missing PVC name in storage configuration")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually on second thoughts, simplest would just be this. I also think we shouldn't return a failure for this, best to just log a warning.

Suggested change
if _, exists := storageConfig["name"]; !exists {
return nil, fmt.Errorf("Missing PVC name in storage configuration")
}
pvcsMap[storageConfig["name"]] = struct{}{}
if name := storageConfig["name"]; name != "" {
pvcsMap[name] = struct{}{}
} else {
// Log warning here
}

}
}
// add pvcs for predictors when the global config is enabled
if r.ConfigProvider.GetConfig().AllowAnyPVC {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that we use a consistent snapshot of the config for the whole reconciliation, it would be better to pass the already retrieved config struct into this function instead of getting via the provider. Can just add a cfg arg to getPVCs().

if p != nil && predicate(p) {
inferenceService := &list.Items[i]
nname := types.NamespacedName{Name: inferenceService.Name, Namespace: inferenceService.Namespace}
p, err := BuildBasePredictorFromInferenceService(inferenceService)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinhuang007 we should probably check for p == nil here like before.

Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
@fOO223Fr
Copy link

@pvaneck @ckadner is there any update on this Feature? this would be a great addition for our use case. Thank you

@njhill
Copy link
Member

njhill commented Jan 18, 2023

@fOO223Fr yes this is the next thing we are focused on completing. It didn't make the 0.10 release but is mostly complete. The main thing remaining to do is some tests, which @ckadner is working on.

Comment on lines +302 to +308
s := &corev1.Secret{}
if err := r.Client.Get(ctx, types.NamespacedName{
Name: modelmesh.StorageSecretName,
Namespace: req.Namespace,
}, s); err != nil {
return nil, fmt.Errorf("Could not get the storage secret: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use PVC, why do we need to check StorageSecretName The default "storage-config" secret usually has credentials for accessing cloud storage such as S3. However, PVC does not need credentials to attach to the pod.
With this code block, the controller keeps checking if the secret exists in a namespace. So although it is not needed, it must exist in the namespace to deploy an inference pod.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes back to some of the design decisions for the implementation, i.e. ...

#230 (comment)

The controller will watch the storage config secret in each namespace and keep the pvcs mounted to runtime deployments based on any pvc type entries
Predictors can always reference one of these PVCs for storage

#230 (comment)

The entry type will be checked and the name used to create volumes and volume mounts for runtime deployments, while the keys are irrelevant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so even though users want to use PVC only, creating storage-config Secret with dummy data is required, am I right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the allowAnyPVC config parameter, and don't want to deploy the minio-examples image, I suppose you would not need it
Certainly it should not be required to create it with dummy data, so I think we should tolerate it not being there. @njhill

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ckadner yes as discussed offline we should change it so that in that mode the storage-config secret isn't required.

@ckadner ckadner self-assigned this Feb 16, 2023
@ckadner ckadner added this to the v0.11.0 milestone Feb 24, 2023
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@njhill
Copy link
Member

njhill commented Mar 1, 2023

/lgtm

@njhill njhill merged commit f794ffc into kserve:main Mar 1, 2023
@ckadner ckadner mentioned this pull request Mar 1, 2023
6 tasks
ckadner added a commit to ckadner/modelmesh-serving that referenced this pull request Mar 6, 2023
ckadner added a commit to ckadner/modelmesh-serving that referenced this pull request Mar 6, 2023
VedantMahabaleshwarkar referenced this pull request in VedantMahabaleshwarkar/modelmesh-serving Mar 20, 2023
* Add PVC support

Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>

* add predictor controller login

Signed-off-by: Chin Huang <[email protected]>

* code restructure, cleanup based on review

Signed-off-by: Chin Huang <[email protected]>

* fix addPullerSidecar to include all pvcs

Signed-off-by: Chin Huang <[email protected]>

* restructure and simplify code, use global configmap rather than env var

Signed-off-by: Chin Huang <[email protected]>

* make AllowAnyPVC dynamic, update docs

Signed-off-by: Chin Huang <[email protected]>

* add runtimeSupportsPredictor check

Signed-off-by: Chin Huang <[email protected]>

* use PredictorRegistry and add storage to find()

Signed-off-by: Chin Huang <[email protected]>

---------

Signed-off-by: Chin Huang <[email protected]>
njhill pushed a commit that referenced this pull request Mar 21, 2023
Motivation

Address PVC follow-up work items outlined in #337 for PVC storage introduced in #267

Modifications

Code changes:
- Sort PVC mounts on serving runtime specs to avoid unstable repeated
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
- Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
enabled as this would cause all serving pods for that runtime to stay in
- Pending state with unbound (pending) volumes
- Tolerate missing storage-config secret when allowAnyPVC is enabled
- Lint: fix "io/ioutil" deprecations

FVT changes:
- Add Storage test suite
- Add helper methods to add PVC to storage-config during FVT
- Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
- Check if pod still running before gRPC/REST requests, reconnect if necessary
- Only choose "Ready" runtime pod for port-forwards
- Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves #337

Signed-off-by: Christian Kadner <[email protected]>
VedantMahabaleshwarkar referenced this pull request in VedantMahabaleshwarkar/modelmesh-serving Mar 23, 2023
Motivation

Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267

Modifications

Code changes:
- Sort PVC mounts on serving runtime specs to avoid unstable repeated
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
- Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
enabled as this would cause all serving pods for that runtime to stay in
- Pending state with unbound (pending) volumes
- Tolerate missing storage-config secret when allowAnyPVC is enabled
- Lint: fix "io/ioutil" deprecations

FVT changes:
- Add Storage test suite
- Add helper methods to add PVC to storage-config during FVT
- Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
- Check if pod still running before gRPC/REST requests, reconnect if necessary
- Only choose "Ready" runtime pod for port-forwards
- Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves kserve#337

Signed-off-by: Christian Kadner <[email protected]>
VedantMahabaleshwarkar referenced this pull request in VedantMahabaleshwarkar/modelmesh-serving Mar 23, 2023
Motivation

Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267

Modifications

Code changes:
- Sort PVC mounts on serving runtime specs to avoid unstable repeated
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
- Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
enabled as this would cause all serving pods for that runtime to stay in
- Pending state with unbound (pending) volumes
- Tolerate missing storage-config secret when allowAnyPVC is enabled
- Lint: fix "io/ioutil" deprecations

FVT changes:
- Add Storage test suite
- Add helper methods to add PVC to storage-config during FVT
- Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
- Check if pod still running before gRPC/REST requests, reconnect if necessary
- Only choose "Ready" runtime pod for port-forwards
- Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves kserve#337

Signed-off-by: Christian Kadner <[email protected]>
VedantMahabaleshwarkar referenced this pull request in VedantMahabaleshwarkar/modelmesh-serving Mar 23, 2023
Motivation

Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267

Modifications

Code changes:
- Sort PVC mounts on serving runtime specs to avoid unstable repeated
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
- Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
enabled as this would cause all serving pods for that runtime to stay in
- Pending state with unbound (pending) volumes
- Tolerate missing storage-config secret when allowAnyPVC is enabled
- Lint: fix "io/ioutil" deprecations

FVT changes:
- Add Storage test suite
- Add helper methods to add PVC to storage-config during FVT
- Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
- Check if pod still running before gRPC/REST requests, reconnect if necessary
- Only choose "Ready" runtime pod for port-forwards
- Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves kserve#337

Signed-off-by: Christian Kadner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PVC storage support
6 participants