nvidia-smi: Failed to initialize NVML: Unknown Error (in Docker)

Hi,
I need to run Zeppelin and the CUDA packages in a Docker container on a GPU machine.
So I downloaded the Zeppelin Docker image and created a container from it. Then I installed the CUDA libraries inside the Zeppelin container using the link below:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=deblocal

After following all the steps in the link, I get this error from nvidia-smi:
Failed to initialize NVML: Unknown Error
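
As a first debugging step, it may help to check whether the NVIDIA device nodes are visible inside the container at all, since NVML cannot talk to the driver without them. The sketch below rests on assumptions about a typical setup: the node names in EXPECTED_NODES and the helper names are illustrative and not taken from this thread, and the exact set of nodes depends on the driver version and the number of GPUs.

```python
import glob
import os

# Device nodes the NVIDIA user-space libraries normally open.
# ASSUMPTION: a single-GPU machine; the real set varies by driver and GPU count.
EXPECTED_NODES = ("/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0")


def missing_nodes(present, expected=EXPECTED_NODES):
    """Return the expected device nodes that are absent from `present`."""
    present = set(present)
    return [node for node in expected if node not in present]


def unreadable_nodes(present):
    """Return nodes that exist but are not readable and writable by this user."""
    return [n for n in present if not os.access(n, os.R_OK | os.W_OK)]


if __name__ == "__main__":
    # Run this inside the container that reports the NVML error.
    found = sorted(glob.glob("/dev/nvidia*"))
    print("found:", found)
    print("missing:", missing_nodes(found))
    print("no read/write access:", unreadable_nodes(found))
```

If the nodes are missing inside the failing container but present on the host, the container was probably started without GPU devices attached, which would be consistent with the error above.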

I have one more Docker image on the same machine in which “nvidia-smi” does not throw any error,
so I have ruled out the fix that involves changing the BIOS settings.

I need steps for debugging, or any alternative solution.
I may be missing something basic about the issue at this stage, but I am looking into it.

Thanks

You should be using the NVIDIA Docker runtime plugin:

https://github.com/NVIDIA/nvidia-docker

Oh yes, my bad. I was expecting some magic. I understand now that plain Docker only virtualizes the CPU, while nvidia-docker makes the available GPU usable inside the container.
My issue with setting up nvidia-docker from scratch is that I have to run “systemctl restart docker”, which I was avoiding since it will restart the existing Docker services.
However, when I run it, I get: nvidia-docker: command not found.

I am just wondering: if nvidia-docker is not present, how are the existing GPU containers running on this machine?
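
One way to answer that question might be to look at how the working container was started. The sketch below reads the output of `docker inspect` and pulls out the fields that usually reveal GPU access: the runtime, any /dev/nvidia* devices passed through, NVIDIA-related bind mounts, and NVIDIA_* environment variables. The function names and the container name in the usage note are hypothetical, and the "nvidia" substring match is only a heuristic.

```python
import json
import subprocess


def gpu_access_summary(inspect_json):
    """Summarise how a container was given GPU access.

    `inspect_json` is the text printed by `docker inspect <container>`,
    which is a JSON array with one object per container.
    """
    info = json.loads(inspect_json)[0]
    host = info.get("HostConfig", {})
    config = info.get("Config", {})
    # Fields may be null in the JSON, hence the `or []` fallbacks.
    devices = [d["PathOnHost"] for d in host.get("Devices") or []
               if "nvidia" in d.get("PathOnHost", "")]
    binds = [b for b in host.get("Binds") or [] if "nvidia" in b]
    env = [e for e in config.get("Env") or [] if e.startswith("NVIDIA_")]
    return {
        "runtime": host.get("Runtime"),
        "privileged": host.get("Privileged", False),
        "devices": devices,
        "binds": binds,
        "nvidia_env": env,
    }


def inspect_container(name):
    """Run `docker inspect` for one container and summarise the result."""
    out = subprocess.run(["docker", "inspect", name], check=True,
                         capture_output=True, text=True).stdout
    return gpu_access_summary(out)
```

For example, `inspect_container("working-gpu-container")` (a placeholder name) could be compared against the same call for the Zeppelin container; a difference in `devices`, `binds`, or `runtime` would suggest how the working one reaches the GPU.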

Obviously I am missing something; I am looking into it now.

Thanks

A GPU Docker container can run without nvidia-docker.

I encourage you to read the link I gave you thoroughly.

The nvidia-docker runtime will harmonize driver components between the base machine and the container.

If you don’t do this, then the onus is on you to make sure the driver components in your container match the driver components installed on the base machine. If they do, your container will work. If they don’t, you’ll have driver problems that may manifest as the exact error message you’ve reported here.

So it is possible for a GPU-based docker container to run correctly on a machine without the nvidia-docker runtime. It’s just more difficult.
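
To make the "driver components must match" point concrete, here is a rough sketch that compares the kernel module version reported by the host driver with the version of the NVML library found inside the container. It assumes that /proc/driver/nvidia/version is readable from inside the container and that the library follows the usual libnvidia-ml.so.<version> naming; the library directories and function names are assumptions, not something the nvidia-docker documentation prescribes.

```python
import glob
import re


def kernel_driver_version(proc_text):
    """Extract the driver version from /proc/driver/nvidia/version text."""
    for line in proc_text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"\d+\.\d+(?:\.\d+)*", line)
            return match.group(0) if match else None
    return None


def library_versions(paths):
    """Extract versions from file names such as libnvidia-ml.so.<version>."""
    versions = set()
    for path in paths:
        # Requires at least one dot, so the libnvidia-ml.so.1 symlink is skipped.
        match = re.search(r"libnvidia-ml\.so\.(\d+(?:\.\d+)+)$", path)
        if match:
            versions.add(match.group(1))
    return sorted(versions)


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/version") as fh:
            kernel = kernel_driver_version(fh.read())
    except OSError:
        kernel = None
    # ASSUMPTION: library directories differ by distribution.
    libs = library_versions(
        glob.glob("/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*")
        + glob.glob("/usr/lib64/libnvidia-ml.so.*")
    )
    print("kernel module:", kernel)
    print("user-space NVML:", libs)
    if kernel and libs and libs != [kernel]:
        print("mismatch between kernel module and container libraries")
```

Run inside the container, a mismatch here would point to the situation described above, where the libraries installed in the image do not correspond to the driver on the base machine.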

Well, thanks for the explanation. It ran successfully with nvidia-docker.