This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

gpu not found in a docker container #92

Closed
mijung-kim opened this issue May 6, 2016 · 7 comments

@mijung-kim

Hello,

I recently built a CUDA 7.5 + cuDNN 4 devel Docker image with TensorFlow, but the GPUs are not recognized inside the container.

The log is as below:
import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so.4 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so.7.5 locally
sess = tf.Session()
E tensorflow/stream_executor/cuda/cuda_driver.cc:481] failed call to cuInit: CUresult(-1)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:114] retrieving CUDA diagnostic information for host: 18a61c37a941
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:121] hostname: 18a61c37a941
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:146] libcuda reported version is: Invalid argument: expected %d.%d form for driver version; got "1"
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:257] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.93 Tue Apr 5 18:18:24 PDT 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:150] kernel reported version is: 352.93
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.

I also tried the flag "--device /dev/nvidia0:~", but the result was the same as above. In addition, I could not run "nvidia-smi" inside the container; it failed with a "command not found" error. Is there any way to fix this?
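For anyone hitting the same symptoms, a quick check from inside the container can show whether the NVIDIA device files and the nvidia-smi binary are visible at all. This is only a minimal sketch: the function name gpu_visibility_report and the /dev/nvidia* glob pattern are assumptions made for illustration, not something the nvidia-docker project ships.

```python
import glob
import shutil


def gpu_visibility_report(dev_pattern="/dev/nvidia*", tool="nvidia-smi"):
    """Collect basic hints about whether NVIDIA devices are exposed here.

    Hypothetical helper: the default pattern and tool name are assumptions.
    """
    return {
        # Device files matching the pattern (for example /dev/nvidia0), sorted.
        "device_nodes": sorted(glob.glob(dev_pattern)),
        # Absolute path of the tool if it is on PATH, otherwise None.
        "tool_path": shutil.which(tool),
    }


if __name__ == "__main__":
    report = gpu_visibility_report()
    print("device nodes:", report["device_nodes"] or "none found")
    print("nvidia-smi:", report["tool_path"] or "not on PATH")
```

An empty device list or a missing nvidia-smi path would match the "No GPU devices available" and "command not found" symptoms described above, though it does not by itself say why the files are missing.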

@flx42
Member

flx42 commented May 6, 2016

Which command did you use to start tensorflow? You should use the prebuilt image available on GCR:

nvidia-docker run -ti --rm gcr.io/tensorflow/tensorflow:latest-devel-gpu

If it doesn't work, try starting with the following:

nvidia-docker run --rm nvidia/cuda nvidia-smi

@mijung-kim
Author

Thank you, it worked. But I need to customize the nvidia-docker image for my research. Can I build the image from a Dockerfile, as with docker build? More generally, can I use nvidia-docker the same way as docker, with the same flags and commands? Thank you in advance!

@flx42
Member

flx42 commented May 9, 2016

Your custom image must be based on one of the images we provide on Docker Hub:
https://hub.docker.com/r/nvidia/
For instance, use FROM nvidia/cuda:7.5, or FROM nvidia/cuda:7.5-cudnn4-devel like TensorFlow does.
In this case, nvidia-docker will automatically mount the right files and devices inside the container.

Yes, you can use all the flags and commands supported by docker with nvidia-docker.
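To make the advice above concrete, here is a rough sketch that renders a minimal Dockerfile on top of one of the base images named in this thread and assembles the matching nvidia-docker command lines. The helper names, the image tag my-research-image, and the extra RUN line are all hypothetical; only the base image tag and the nvidia-docker run flags come from the comments above. Nothing is executed, the script only prints what you would run.

```python
import shlex

# Base image tag taken from the comment above.
BASE_IMAGE = "nvidia/cuda:7.5-cudnn4-devel"


def render_dockerfile(base_image=BASE_IMAGE, extra_run_lines=()):
    """Return Dockerfile text: one FROM line plus optional RUN lines."""
    lines = ["FROM " + base_image]
    lines += ["RUN " + cmd for cmd in extra_run_lines]
    return "\n".join(lines) + "\n"


def build_command(tag, context_dir="."):
    """Argument list for building the image (same flags as docker build)."""
    return ["nvidia-docker", "build", "-t", tag, context_dir]


def run_command(tag, *container_args):
    """Argument list for starting an interactive, auto-removed container."""
    return ["nvidia-docker", "run", "-ti", "--rm", tag, *container_args]


if __name__ == "__main__":
    # The RUN line is a placeholder for whatever your research setup needs.
    print(render_dockerfile(extra_run_lines=["echo 'install extras here'"]))
    print(shlex.join(build_command("my-research-image")))
    print(shlex.join(run_command("my-research-image", "nvidia-smi")))
```

Running nvidia-smi in the resulting image mirrors the sanity check suggested earlier in the thread; if it lists the GPUs, the custom image is inheriting from a suitable base.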

@mijung-kim
Author

Thank you! It works perfectly for me. :-)

@flx42
Member

flx42 commented May 9, 2016

Great to know! Note that you can also inherit from the tensorflow image, since it is itself based on one of our images.

@saikishor

saikishor commented Oct 18, 2017

I have a question: will I be able to use the GPU of another system connected to the same network using Docker? If yes, how can I do that? Will I be able to use these GPUs to train models from the model zoo in the TensorFlow pre-trained models?

@flx42
Member

flx42 commented Oct 18, 2017

With nvidia-docker 2.0 and a remote docker daemon (e.g. DOCKER_HOST or docker -H), it should be easy.
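As a rough illustration of the remote-daemon approach, the sketch below prepares an environment with DOCKER_HOST set and a docker run command that requests the NVIDIA runtime. The host address tcp://gpu-box.example:2375 and the function name are made up for the example, and it assumes the remote machine already has nvidia-docker 2.0 installed and its Docker daemon reachable over the network. The code only builds the command; it does not contact any daemon.

```python
import os
import shlex


def remote_gpu_invocation(docker_host, image="nvidia/cuda", command=("nvidia-smi",)):
    """Return (env, argv) for running a GPU container on a remote daemon.

    Hypothetical helper: docker_host is whatever address your daemon listens on.
    """
    # Copy the current environment and point the docker CLI at the remote daemon.
    env = dict(os.environ, DOCKER_HOST=docker_host)
    # With nvidia-docker 2.0 the GPU support is selected via --runtime=nvidia.
    argv = ["docker", "run", "--runtime=nvidia", "--rm", image, *command]
    return env, argv


if __name__ == "__main__":
    # Placeholder address; replace with your own GPU machine.
    env, argv = remote_gpu_invocation("tcp://gpu-box.example:2375")
    print("DOCKER_HOST=" + env["DOCKER_HOST"], shlex.join(argv))
```

To actually launch the container you could pass the pair to subprocess.run(argv, env=env). Exposing a Docker daemon over plain TCP has security implications, so check how your daemon is secured before relying on this.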

Locking, please don't resurrect old issues.

@NVIDIA NVIDIA locked and limited conversation to collaborators Oct 18, 2017