Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PVC storage support #230

Closed
njhill opened this issue Sep 7, 2022 · 3 comments · Fixed by #267
Closed

PVC storage support #230

njhill opened this issue Sep 7, 2022 · 3 comments · Fixed by #267
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@njhill
Copy link
Member

njhill commented Sep 7, 2022

Support for serving models from PVCs is an important capability, we need to have a robust approach. It differs from our other cloud storage abstractions since it must be configured at the pod level, which means PVCs can't be configured dynamically in the multi-model case without restarting pods.

Thoughts:

  1. The controller will need to be aware of Predictors tied to pvc storage. We need to think about whether we update the multi-model serving runtime Deployments dynamically if there's a new Predictor targeted for that deployment which uses a PVC not currently mounted there, and whether we refresh the Deployment again if/when a particular PVC is no longer needed.

    • We could make this configurable with reasonable defaults, for example reasonable default behaviour would probably be to avoid restarting a Deployment to just to remove pvcs which are no longer needed.
    • We should add a ServingRuntime field allowing to specify pvcs that should always be mounted to the given runtime, and possibly a list in the global config of PVCs which should be mounted to all (multi-model) runtimes. This will help with common use-cases where there's only one (or a small/fixed number) used to store all models.
    • This isn't needed for single-model runtimes for hopefully obvious reasons
  2. PVC can be one of our abstract storage types, the corresponding puller logic would just symlink the targeted file or dir from the mounted pvc volume into the model dir (which itself may be subsequently symlinked by adapter logic). We can read-only mount all of the "current" pvcs into one area (e.g. /pvc_mounts/<pvc-name>/).

  3. Generally it probably doesn't make sense to have a config entry in our storage secret for pvcs since they are already separately configured and all that should be needed is the claimName, i.e. Predictor would contain something like:

     storage:
       type: pvc
       parameters:
           claimName: myPvc
    
  4. We could consider retaining specific fields in the CRD since it's a special case. This would also help with 1 and 3 above.

Related (but different): #229


Pull Requests:

@njhill njhill added the enhancement New feature or request label Sep 7, 2022
@njhill
Copy link
Member Author

njhill commented Sep 23, 2022

There are some parts of this that we can start on before finalizing the overall design:

  • Add puller logic for pvc type storage which just:
    • Verifies that a /pvc_mounts/<pvc-name> dir exists and also that /pvc_mounts/<pvc-name>/<model-path> exists (dir or file), where <pvc-name> is the value of the claimName storage parameter in the load request
    • Rewrites the model path in the request passed along to be this new full path
  • Add a pvcs string slice to the Deployment struct used to construct the serving runtime deployments and for each entry add a PVC type Volume with the name of that pvc and readOnly: true, and mount it to all of the same containers that have the /models mount, at a location /pvc_mounts/<pvc-name>/

@njhill
Copy link
Member Author

njhill commented Sep 27, 2022

Some design decisions following further discussion:

  • The controller will watch the storage config secret in each namespace and keep the pvcs mounted to runtime deployments based on any pvc type entries
  • Predictors can always reference one of these PVCs for storage
  • For predictors that reference a PVC name that doesn't have an entry in the storage config secret, behaviour will depend upon a new global config flag. Either:
    1. it will fail to load (this could be caught at the controller and/or puller level)
    2. the controller adds the pvc to the list of pvcs mounted to all runtimes, resulting in a reconfig of all the deployments

Nice to have in the case of (2):

  • Only add it to the runtimes which could host that predictor, based on the type matching that's already done in the controller
  • Include some delay (for example an hour) before removing a pvc from the deployment-mounted list when the last predictor to use it is deleted. This should help to cut down on deployment restart churn.

It would be fine to ignore one or both of these for a first pass.

We also may need to think about what happens in the case of (1) when a pvc is removed from the storage-secret that's already in use by a predictor, or the global config flag changes from (2) to (1) while there's a predictor with a pvc mounted that isn't in the config secret list. Probably the simplest/correct thing is just to allow the predictor to enter a failed state.

Finally, this proposal means the controller will now require RBAC permission to access the storage secret, which it doesn't currently. We should ensure everything apart from out-of-band configured PVCs still works if it does not have that permission.

chinhuang007 added a commit to chinhuang007/modelmesh-serving that referenced this issue Oct 12, 2022
Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
chinhuang007 added a commit to chinhuang007/modelmesh-serving that referenced this issue Oct 12, 2022
Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
chinhuang007 added a commit to chinhuang007/modelmesh-serving that referenced this issue Oct 13, 2022
Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
@chinhuang007
Copy link
Contributor

chinhuang007 commented Oct 13, 2022

Just to clarify, the pvc entries in the storage config secret would look like:
stringData: pvc1: | { “type”: “pvc” "name": "pvc1", } pvc2: | { “type”: “pvc” "name": "pvc2", }
The entry type will be checked and the name used to create volumes and volume mounts for runtime deployments, while the keys are irrelevant.
The predictor storage in an InferenceService is in storageUri: pvc://${PVC_NAME}/ModelPath format, for example, storageUri: pvc://tf-pvc/mnist

chinhuang007 added a commit to chinhuang007/modelmesh-serving that referenced this issue Oct 17, 2022
Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
chinhuang007 added a commit to chinhuang007/modelmesh-serving that referenced this issue Nov 3, 2022
Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
chinhuang007 added a commit to chinhuang007/modelmesh-serving that referenced this issue Nov 14, 2022
Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
chinhuang007 added a commit to chinhuang007/modelmesh-serving that referenced this issue Nov 18, 2022
Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
chinhuang007 added a commit to chinhuang007/modelmesh-serving that referenced this issue Dec 14, 2022
Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>
@ckadner ckadner added this to the v0.11.0 milestone Feb 24, 2023
njhill pushed a commit that referenced this issue Mar 1, 2023
* Add PVC support

Add PVC support according to the design and discussions
captured in the issue, #230

Signed-off-by: Chin Huang <[email protected]>

* add predictor controller login

Signed-off-by: Chin Huang <[email protected]>

* code restructure, cleanup based on review

Signed-off-by: Chin Huang <[email protected]>

* fix addPullerSidecar to include all pvcs

Signed-off-by: Chin Huang <[email protected]>

* restructure and simplify code, use global configmap rather than env var

Signed-off-by: Chin Huang <[email protected]>

* make AllowAnyPVC dynamic, update docs

Signed-off-by: Chin Huang <[email protected]>

* add runtimeSupportsPredictor check

Signed-off-by: Chin Huang <[email protected]>

* use PredictorRegistry and add storage to find()

Signed-off-by: Chin Huang <[email protected]>

---------

Signed-off-by: Chin Huang <[email protected]>
ckadner added a commit to ckadner/modelmesh-serving that referenced this issue Mar 6, 2023
ckadner added a commit to ckadner/modelmesh-serving that referenced this issue Mar 6, 2023
VedantMahabaleshwarkar pushed a commit to VedantMahabaleshwarkar/modelmesh-serving that referenced this issue Mar 20, 2023
* Add PVC support

Add PVC support according to the design and discussions
captured in the issue, kserve#230

Signed-off-by: Chin Huang <[email protected]>

* add predictor controller login

Signed-off-by: Chin Huang <[email protected]>

* code restructure, cleanup based on review

Signed-off-by: Chin Huang <[email protected]>

* fix addPullerSidecar to include all pvcs

Signed-off-by: Chin Huang <[email protected]>

* restructure and simplify code, use global configmap rather than env var

Signed-off-by: Chin Huang <[email protected]>

* make AllowAnyPVC dynamic, update docs

Signed-off-by: Chin Huang <[email protected]>

* add runtimeSupportsPredictor check

Signed-off-by: Chin Huang <[email protected]>

* use PredictorRegistry and add storage to find()

Signed-off-by: Chin Huang <[email protected]>

---------

Signed-off-by: Chin Huang <[email protected]>
ckadner added a commit that referenced this issue Mar 23, 2023
- PVC storage support (#230, #337) 
- Payload logging/events (#284)

Signed-off-by: Christian Kadner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants