Enabling GPUs in the Container Runtime Ecosystem

Originally published at: https://developer.nvidia.com/blog/gpu-containers-runtime/

NVIDIA uses containers to develop, test, benchmark, and deploy deep learning (DL) frameworks and HPC applications. We wrote about building and deploying GPU containers at scale using NVIDIA-Docker roughly two years ago. Since then, NVIDIA-Docker has been downloaded close to 2 million times. A variety of customers have used NVIDIA-Docker to containerize and run GPU-accelerated workloads. NVIDIA…

I wish you'd talk more about Singularity.

I really liked it. [translated from Portuguese]

This is nice! But the GPUs required, as documented elsewhere on the NVIDIA site, are:

NVIDIA TITAN V (Volta)
NVIDIA TITAN X (Pascal)
NVIDIA TITAN Xp (Pascal)
NVIDIA Quadro GV100 (Volta)
NVIDIA Quadro GP100 (Pascal)
NVIDIA Quadro P6000 (Pascal)

The website https://docs.nvidia.com/ngc... has no information about the RTX cards as yet. Your documentation is all over the place, not up to date, and seems to confuse issues for the sake of clearing inventory...

I want to run my cloud hybrid or on-premises with commodity(-ish) hardware, so my old Titan or new GTX 1080 Ti doesn't get a look in, irrespective of its capabilities. I could try to flash the BIOS so that my 1080 Ti looks like something acceptable, but reliable sources tell me that NVIDIA has precluded this with watchdogs that will brick my new card. Nice. So it looks as if I should try for second-hand Titan X (Pascal) or Titan Xp cards.

The trouble with this is that I expect the next round of iterations to jerk the rug out from under my feet again, forcing me to submit to the scrutiny and rental costs of the cloud, or go the extremely expensive route. Or go with Intel/AMD. It may be that I just freeze in time with a Titan X (Pascal) or two, after the RTX 2080 and 2080 Ti (and god knows what other jack-in-the-boxes are about to be foisted on me) depress the market for lesser cards.

This makes it very difficult for sole operators like me, who are being frozen out of developing their ideas inexpensively. And I know that if I do go cloud/hybrid with some of the juicier ones, it won't be too long before some bright-eyed Ivy Leaguer launches another billion-dollar company... Thanks, NVIDIA!

This works fine for me using a GTX 1080Ti?

Wow, really? You can spin up on-premises (aka on your home box) GPU cloud instances on a GTX 1080 Ti? Can you detail what you did, please? This is very good news and will save me a LOT of stuffing around. CHEERS :))

I'm not sure if this is exactly the same as GPU Cloud, but you can definitely spin up nvidia-docker containers on a home/on-premises machine with GTX 1080 Tis. If you're not familiar with Docker, a Docker container is, roughly speaking, like a small headless virtual machine with low overhead and a scripted build process, so it's repeatable.

I installed Ubuntu 18.04 - I tried Debian and Fedora but it's far less painful with Ubuntu.

You then need to install CUDA 10.0 with the driver that ships with it (410.48); if you try to download the driver by itself you get 396.xx, which doesn't support CUDA 10.0. You may also need to disable the open-source nouveau driver. I first disabled nouveau by following https://linuxconfig.org/how... (I'm not sure if this step is required or not).
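
For what it's worth, guides like the one linked generally boil down to blacklisting nouveau via modprobe and rebuilding the initramfs. On Ubuntu the blacklist file typically looks like this (a sketch of the standard approach, not necessarily the exact contents from that link):

# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

After creating it, run sudo update-initramfs -u and reboot so the blacklist takes effect.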

Then install CUDA 10.0. First the prerequisites:

$ sudo apt-get install build-essential dkms
$ sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev

Now download the .deb installer and follow the instructions here:

https://developer.nvidia.co...

$ sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
$ sudo apt-key adv --fetch-keys https://developer.download....
$ sudo apt-get update
$ sudo apt-get install cuda
$ reboot

Then install docker-ce by following the instructions here:

https://docs.docker.com/ins...

Then the nvidia docker runtime following the instructions here:

https://github.com/nvidia/n...
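
If the runtime registered correctly, docker info should list nvidia under Runtimes. The nvidia-docker2 package does this by adding an entry to Docker's daemon config; to my understanding the result looks roughly like this (a sketch; the installer writes the real file):

# /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Restart the daemon afterwards with sudo systemctl restart docker so Docker picks up the new runtime.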

Now you should be able to bring up an nvidia-docker container:

# Test nvidia-smi with CUDA 9.0
$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
# Test nvidia-smi with CUDA 10.0
$ docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi

There are a ton of different base images you can use, e.g. https://hub.docker.com/r/nv...

I've been using nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04, as it matches the requirements for tensorflow-gpu, for example.
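
As a sketch of how that image can serve as a base (the pip package name is real, but the version pin and Python setup here are my assumptions; check the TensorFlow build compatibility table for your setup):

# Dockerfile: tensorflow-gpu on top of the CUDA 9.0 / cuDNN 7 devel image
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

# Install Python and pip, then clean the apt cache to keep the image small
RUN apt-get update && apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# tensorflow-gpu 1.12.x was built against CUDA 9.0 + cuDNN 7
RUN pip3 install tensorflow-gpu==1.12.0

Build with docker build -t tf-gpu . and check GPU visibility with docker run --runtime=nvidia --rm tf-gpu python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())".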

Hey, thanks for this. I'm about to embark on another build quest, so it will be interesting to compare your instructions with what I've already done. I'm still not holding my breath on the NVIDIA GPU Cloud images working, though...

What about Windows?
Windows support is actually much needed.
I need it to run DirectX 12.

I have a question: if I have a server with 4 Tesla M10s, can I only launch 4 containers?

Docker can run on both the Windows and macOS operating systems. This is enabled by the Docker architecture…