When following this guide, https://www.kubeflow.org/docs/components/tfserving_new/, I am unable to serve a model using `ks param set ${MODEL_COMPONENT} numGpus 1`. Doing so results in the error `0/1 nodes are available: 1 Insufficient nvidia.com/gpu.`, which presumably means that the `nvidia.com/gpu` device plugin has not been deployed. I am at a loss as to how exactly this should be done. Documentation on the NVIDIA website is quite scant, and the link provided in the guide for a GPU example (https://github.com/kubeflow/examples/blob/master/object_detection/tf_serving_gpu.md) offers no explanation whatsoever.
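A quick way to confirm this diagnosis is to check whether any node actually advertises the `nvidia.com/gpu` resource and whether a device-plugin DaemonSet is running. This is a sketch; the commands need access to the cluster, and the DaemonSet name varies between installs:

```shell
# List the GPU capacity each node advertises; "<none>" or 0 means the
# device plugin is not registered on that node.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

# Look for an NVIDIA device-plugin DaemonSet in any namespace
# (the exact name depends on how it was installed).
kubectl get daemonset --all-namespaces | grep -i nvidia
```

If the first command shows no GPUs on any node, the scheduler has nothing to satisfy the `nvidia.com/gpu` request, which produces exactly the `Insufficient nvidia.com/gpu` error above.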
As a side note, if I leave out `ks param set ${MODEL_COMPONENT} numGpus 1` (or set `numGpus` to 0), it also fails, resulting in:
```
Error: failed to start container "testmodel": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"setenv: invalid argument\"": unknown
```
**EDIT:** The solution is to deploy the NVIDIA device plugin DaemonSet, as follows:

```shell
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.12/nvidia-device-plugin.yml
```
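For reference, setting `numGpus 1` corresponds to a container resource limit like the following in the generated serving Deployment. This is a sketch of just the relevant fragment; the surrounding manifest produced by ksonnet is elided:

```yaml
# Fragment of the TF Serving container spec. nvidia.com/gpu is the
# extended resource that the NVIDIA device plugin registers with the
# kubelet; without the plugin, no node can satisfy this limit.
resources:
  limits:
    nvidia.com/gpu: 1
```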
I think it is really necessary that the guide describe these requirements.
Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.69. Please mark this comment with 👍 or 👎 to give our bot feedback!
Note that this issue refers to an AWS deployment. The TensorFlow Serving guide should explain the situation in general terms (for clouds other than AWS) and can give an AWS-specific example where useful.
I'm marking this issue for the doc sprint. It will take some testing to ensure the updates are correct.
sarahmaddox changed the title from "TensorFlow Serving: 0/1 nodes are available: 1 Insufficient nvidia.com/gpu." to "AWS: TensorFlow Serving: 0/1 nodes are available: 1 Insufficient nvidia.com/gpu." on Jan 2, 2020.
AWS installs the nvidia-device-plugin by default as of 0.7. We can close this issue.
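Before relying on that default, it may be worth verifying on the cluster that the plugin is actually present. A sketch, assuming a typical install into `kube-system` (the DaemonSet name and namespace may differ by version):

```shell
# Look for the device-plugin DaemonSet shipped with the cluster.
kubectl get daemonset -n kube-system | grep -i nvidia

# Confirm nodes now report the nvidia.com/gpu resource.
kubectl describe nodes | grep -i 'nvidia.com/gpu'
```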
/close