CUDA 8.0 and libnvidia-ml.so.xxx on cloud platforms that use nvidia-docker underneath

When using cloud vendor machines with GPUs that expect the driver directories and files to be mounted into the container, I am having trouble getting the management library installed so that I can query the GPU. If I use Docker to create the machine and force the NVIDIA drivers in, along with the standard CUDA software, to get a copy of the shared library, then when the machine starts on Azure or AWS I get errors from Kubernetes (which is used to start these machines) saying that the directories are getting in the way of the nvidia-docker style mounts.

So, if I cannot install the NVIDIA drivers myself, how on earth do I get the shared library and tools like nvidia-smi into containers?

Use the NVIDIA container runtime plugin (previously called nvidia-docker 2.0):

https://github.com/NVIDIA/nvidia-container-runtime

When you do that, you don’t install the driver bits into the container (the runtime does it for you).
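A quick way to confirm that the runtime really did mount the management library is to try loading it from inside the running container. The following is only a minimal sketch: it assumes the library is exposed under its usual soname `libnvidia-ml.so.1`, and the helper name `gpu_count` is made up for illustration. It calls the NVML C entry points `nvmlInit_v2`, `nvmlDeviceGetCount_v2` and `nvmlShutdown` through `ctypes`.

```python
import ctypes


def gpu_count(lib_names=("libnvidia-ml.so.1", "libnvidia-ml.so")):
    """Return the GPU count reported by NVML, or None if NVML is unavailable.

    lib_names is an assumption: the names under which the runtime is
    expected to expose the management library inside the container.
    """
    for name in lib_names:
        try:
            nvml = ctypes.CDLL(name)
        except OSError:
            continue  # this name is not visible in the container, try the next
        if nvml.nvmlInit_v2() != 0:  # 0 is NVML_SUCCESS
            return None
        try:
            count = ctypes.c_uint(0)
            if nvml.nvmlDeviceGetCount_v2(ctypes.byref(count)) != 0:
                return None
            return count.value
        finally:
            nvml.nvmlShutdown()
    return None  # none of the candidate names could be loaded


if __name__ == "__main__":
    print("GPUs visible through NVML:", gpu_count())
```

If this prints `None`, the library was not mounted into the container, which points to the runtime not being used for that container rather than to a missing package in the image.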

With respect to Kubernetes, this may be useful:

https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

Note that this situation is changing pretty rapidly (as indicated on the Kubernetes page), so the “recipe” may be different 6 months from now.
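To make the device plugin approach concrete, here is a rough sketch of a pod that asks for one GPU through the `nvidia.com/gpu` resource limit, which is the resource name the NVIDIA device plugin advertises. It assumes the plugin is already deployed on the cluster. The function name `gpu_pod_manifest`, the pod name, and the image tag `nvidia/cuda:8.0-runtime` are placeholders, not a tested recipe for any particular cloud. The manifest is built as a Python dict and printed as JSON, which kubectl accepts in place of YAML.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a pod manifest that requests GPUs via the device plugin resource."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # placeholder image, an assumption
                    # nvidia-smi is expected to be supplied by the runtime,
                    # not installed in the image
                    "command": ["nvidia-smi"],
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:8.0-runtime")
    print(json.dumps(manifest, indent=2))
```

Saved as, say, `make_pod.py` (a hypothetical file name), the output can be piped straight into the cluster with `python make_pod.py | kubectl apply -f -`. Note that the image itself carries no driver files, so nothing in it can collide with the mounts.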

Do you know of an Azure recipe for the plugin approach on k8s?

Thanks

A Google search on “gpu kubernetes azure” seemed to turn up several promising hits.

I was not able to find any results, using Google etc., that mention the plugin approach. Older articles and blogs abound for the older-style alpha.kubernetes.io/nvidia-gpu: 1 approach.

I will try Microsoft and see if I can find out anything about the ‘nvidia container runtime plugin’ approach, but it does not seem to have been on the MSDN radar at least.